Reproduce our backtest — tradingarbor

Why this page exists

Most trading-SaaS backtests are black boxes. Sharpe ratios get cited; methodology stays hidden; numbers can't be independently verified. We do the opposite.

The 5-year backtest behind the Performance Expectations card is a walk-forward run against pinned Alpaca bars with deterministic seeding. A given commit produces byte-identical output for the same input. You can clone the repo, run one command, and verify the published numbers, with no API keys, no calls home, and no proprietary data feed.

If a competitor won't show their methodology this clearly, that's a signal worth thinking about.

The exact commands

All three steps run from the repo root with Python 3.12+ and the dev dependencies installed. No broker keys are required: the backtest reads pinned bars, not live data.

1. Clone + install

# Clone the repo (private at launch — request access via founders@tradingarbor.com)
git clone https://github.com/pataskad/automated-trading-assistant
cd automated-trading-assistant

# Install dev dependencies (includes pandas, numpy, scipy)
pip install -r requirements.txt

2. Pin the bars

The committed snapshot at backtest/data/pinned_bars/ is what every shipped number was computed against. Set the env var to point at it.

# Use the committed snapshot (deterministic; matches published numbers)
export PINNED_BARS_PATH=$(pwd)/backtest/data/pinned_bars/snapshot.json
export PYTHONHASHSEED=0    # Belt-and-suspenders: pins str hashing so set iteration order is stable across runs
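
To illustrate why the hash seed is worth pinning, here is a minimal sketch that prints the iteration order of a small set of strings from a fresh interpreter. The ticker names and the helper `set_order` are hypothetical and not part of the repo; the only thing taken from the steps above is the `PYTHONHASHSEED` variable itself.

```python
import os
import subprocess
import sys

# Child program: prints the iteration order of a small set of strings.
# The tickers are illustrative only, not the repo's universe.
CHILD = "print(list({'AAPL', 'MSFT', 'NVDA', 'TSLA', 'AMZN'}))"


def set_order(seed: str) -> str:
    """Run CHILD in a fresh interpreter with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Same seed: the set is iterated in the same order on every run.
    print(set_order("0") == set_order("0"))
    # "random" restores per-process hash randomization, so these two may differ.
    print(set_order("random"))
    print(set_order("random"))
```

The variable has to be set before the interpreter starts, which is why it is exported in the shell rather than assigned inside the backtest code.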

3. Run the walk-forward

# Per-year breakdown: returns, Sharpe, drawdowns, trade counts
python -m backtest.per_year_return_report_v2 --years 2021,2022,2023,2024,2025

# Verification battery: ablations + cross-checks before any param change ships
python -m backtest.verify_swing_growth_stacked

# Compare to live signal log (catches drift between backtest + production)
python -m scripts.parity_check

Output is reproducible to the byte. If you re-run the same command on the same commit, the results match exactly. That's the whole point.
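
One way to check the byte-for-byte claim yourself is to hash the output of two runs and compare digests. The sketch below is not part of the repo: `stdout_digest` and `is_reproducible` are hypothetical helpers, and it assumes the command under test writes its report to stdout.

```python
import hashlib
import subprocess
import sys


def stdout_digest(cmd: list[str]) -> str:
    """Run `cmd` and return the SHA-256 hex digest of its raw stdout bytes."""
    result = subprocess.run(cmd, capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def is_reproducible(cmd: list[str], runs: int = 2) -> bool:
    """Return True if `runs` executions of `cmd` produce byte-identical stdout."""
    digests = {stdout_digest(cmd) for _ in range(runs)}
    return len(digests) == 1


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in a report command.
    demo_cmd = [sys.executable, "-c", "print('same bytes every time')"]
    print(is_reproducible(demo_cmd))
```

From the repo root, `demo_cmd` could be replaced with `[sys.executable, "-m", "backtest.per_year_return_report_v2", "--years", "2021,2022,2023,2024,2025"]`. If a report writes to a file instead of stdout, hash that file's bytes in the same way.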

What you should see

These are the shipped numbers from the committed hardened run (2026-06 re-validation). Regenerate them with python -m backtest.bake_honest_performance against the pinned 10-year bars and you should match these to the rounding shown.

Metric · deployed engine, $25K, minimum (1%) risk	Backtest (hardened)	Live-expected
Avg / year (compounded, injection-adjusted)	+12%	+6%
Avg / year (no injection)	+15%	—
Worst year of 10	−14%	−21%
Max drawdown (injected mean)	~28%	worse
Sharpe (yearly mean/σ)	0.87	0.61

Numbers reflect the committed dashboard_data/honest_performance.json, which the dashboard's Performance card reads directly. "Hardened" means three things: returns are compounded in dollars (never summed percentages), every stop fill is repriced through overnight gaps, and a random 2%-per-trade total loss is injected to proxy blow-ups that the still-listed universe can't show. An earlier methodology summed per-trade percentages and reported five-year averages near +182%/yr with a positive worst year; the 2026-06 re-validation showed those figures to be an artifact of that accounting, and the methodology was retired. Full trail: backtest/REPLAY_RESEARCH.md.
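
The gap between summed percentages and compounded dollars is easy to see on a toy series. The sketch below uses made-up per-trade returns, expressed as fractions of account equity (an assumption for illustration, not the repo's data model), and hypothetical function names.

```python
from math import prod


def summed_return(trade_returns: list[float]) -> float:
    """Add per-trade returns as if each applied to the starting balance."""
    return sum(trade_returns)


def compounded_return(trade_returns: list[float]) -> float:
    """Apply each return to the running balance, then report the net change."""
    return prod(1.0 + r for r in trade_returns) - 1.0


# Hypothetical trades: +50%, -50%, +50%, -50% of equity at the time.
trades = [0.50, -0.50, 0.50, -0.50]

print(summed_return(trades))      # 0.0      (looks like break-even)
print(compounded_return(trades))  # -0.4375  (the account is down 43.75%)
```

Summing hides the fact that a loss after a gain is taken from a larger balance, so a summed figure can overstate what the account would actually hold.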

Honest caveats

Backtests over-promise. We compress the numbers before surfacing live expectations:

Average return × 0.5 — slippage, fills, theta decay vs simulation
Worst-year drawdown × 1.5 — live tail risk has been worse than backtest in every honest study we've read
Sharpe × 0.7 — friction compounds; live variance is higher than simulated

So with the hardened backtest at +12%/yr (injection-adjusted, Sharpe 0.87, worst year −14%), our planning number for live trading is +6%/yr at Sharpe ~0.61 with a worst year near −21%. That's the number we plan around — not a headline. These figures describe the engine at its deployed minimum-risk setting; higher-risk configurations exist in the research but do not surface here until they clear live fill-parity and paper validation gates.
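
The haircut arithmetic above can be reproduced in a few lines. The multipliers and backtest figures come from this page; the dictionary keys and the `live_expected` helper are hypothetical names chosen for the sketch.

```python
# Multipliers from the caveats list: return x 0.5, worst year x 1.5, Sharpe x 0.7.
HAIRCUTS = {"avg_return_pct": 0.5, "worst_year_pct": 1.5, "sharpe": 0.7}

# Hardened backtest figures as published on this page.
BACKTEST = {"avg_return_pct": 12.0, "worst_year_pct": -14.0, "sharpe": 0.87}


def live_expected(metrics: dict[str, float], haircuts: dict[str, float]) -> dict[str, float]:
    """Scale each backtest metric by its haircut to get the planning number."""
    return {name: value * haircuts[name] for name, value in metrics.items()}


live = live_expected(BACKTEST, HAIRCUTS)
print(live["avg_return_pct"], live["worst_year_pct"], round(live["sharpe"], 2))
# 6.0 -21.0 0.61
```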

Live trading also carries costs and risks this backtest doesn't model: regimes outside 2016–2026 (the data contains no dot-com or 2008-style secular bear); broker outages; your specific account's fee structure; tax drag on short-term capital gains; and the psychological cost of seeing a −20%+ drawdown on a real account. Paper-trade for at least two to three weeks before going live; the engines run identically against paper and live brokers, so you're stress-testing your own tolerance, not just the algorithm.

The deep technical doc

For someone doing real due diligence on the methodology — investors, prospects with quant backgrounds, auditors — the full breakdown is in our documentation.

It covers:

Why pinned bars matter (and the non-determinism bug we hit + fixed before shipping)
The 4-test verification battery every parameter change goes through
Per-year regime breakdown (2022 bear, 2024 mega-cap concentration, etc.)
The theta-honest projection math + why we use it
The nightly parity check that compares live signals to backtest math
