🚧 BETA Pre-launch product. We are not a registered investment adviser. Trading involves substantial risk of loss. Legal notice →
SHOW YOUR WORK · DETERMINISTIC · 5-YEAR WALK-FORWARD

Reproduce our backtest

Three commands. Same input. Same output, byte-for-byte. We don't ask you to trust the Sharpe number; we hand you the recipe.

Why this page exists

Most trading-SaaS backtests are black boxes. Sharpe ratios get cited; methodology stays hidden; numbers can't be independently verified. We do the opposite.

The 5-year backtest behind the Performance Expectations card is a walk-forward run against pinned Alpaca bars with deterministic seeding. At any given commit, the same input produces byte-identical output. You can clone the repo, run the three commands below, and verify the published numbers: no API keys, no calls home, no proprietary data feed.

If a competitor won't show their methodology this clearly, that's a signal worth thinking about.

The exact commands

All three are run from the repo root with Python 3.12+ and the dev dependencies installed. No broker keys required: the backtest reads pinned bars, not live data.

1. Clone + install

# Clone the repo (private at launch; request access via founders@tradingarbor.com)
git clone https://github.com/pataskad/automated-trading-assistant
cd automated-trading-assistant

# Install dev dependencies (includes pandas, numpy, scipy)
pip install -r requirements.txt

2. Pin the bars

The committed snapshot at backtest/data/pinned_bars/ is what every shipped number was computed against. Set the env var to point at it.

# Use the committed snapshot (deterministic; matches published numbers)
export PINNED_BARS_PATH="$(pwd)/backtest/data/pinned_bars/snapshot.json"
export PYTHONHASHSEED=0    # Belt-and-suspenders: disables str hash randomization so set iteration order is stable
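
Why the hash seed matters: unless PYTHONHASHSEED is set, Python randomizes string hashing per process, so iterating a set of strings can come out in a different order on each run. The sketch below is not code from the repo, and the ticker names are only illustrative. It starts fresh interpreters with a fixed seed and checks that they agree on the iteration order.

```python
import os
import subprocess
import sys

# Child program: print the iteration order of a set of strings.
CHILD = "print(list({'AAPL', 'MSFT', 'NVDA', 'TSLA', 'AMZN', 'GOOG'}))"


def set_order(hash_seed: str) -> str:
    """Run CHILD in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# With a fixed seed, two separate processes agree on the order.
fixed_a = set_order("0")
fixed_b = set_order("0")
print(fixed_a == fixed_b)
```

Passing "random" instead of "0" restores the default behavior, where the order may differ from one process to the next.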

3. Run the walk-forward

# Per-year breakdown: returns, Sharpe, drawdowns, trade counts
python -m backtest.per_year_return_report_v2 --years 2021,2022,2023,2024,2025

# Verification battery: ablations + cross-checks before any param change ships
python -m backtest.verify_swing_growth_stacked

# Compare to live signal log (catches drift between backtest + production)
python -m scripts.parity_check

Output is reproducible to the byte. If you re-run the same command on the same commit, the results match exactly. That's the whole point.
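
One way to check the byte-for-byte claim yourself is to run a command twice and compare SHA-256 digests of the raw output. This is a minimal sketch, not a script shipped in the repo: the helper names are hypothetical, and it assumes the command writes its results to stdout (if a module writes files instead, hash those files).

```python
import hashlib
import subprocess
import sys


def stdout_digest(cmd: list[str]) -> str:
    """Run cmd and return the SHA-256 hex digest of its raw stdout bytes."""
    result = subprocess.run(cmd, capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def is_reproducible(cmd: list[str], runs: int = 2) -> bool:
    """True if every run of cmd produces byte-identical stdout."""
    digests = {stdout_digest(cmd) for _ in range(runs)}
    return len(digests) == 1


# Stand-in command for illustration; it always prints the same bytes.
demo_cmd = [sys.executable, "-c", "print('same bytes every time')"]
print("reproducible:", is_reproducible(demo_cmd))
```

To apply it to the backtest, replace demo_cmd with the step 3 command split into a list, for example [sys.executable, "-m", "backtest.per_year_return_report_v2", "--years", "2021,2022,2023,2024,2025"], run from the repo root with the environment variables from step 2 set.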

What you should see

The shipped numbers from the committed hardened run (2026-06 re-validation). Regenerate with python -m backtest.bake_honest_performance against the pinned 10-year bars and you should match these to the rounding shown.

| Metric · deployed engine, $25K, minimum (1%) risk | Backtest (hardened) | Live-expected |
| --- | --- | --- |
| Avg / year (compounded, injection-adjusted) | +12% | +6% |
| Avg / year (no injection) | +15% | n/a |
| Worst year of 10 | −14% | −21% |
| Max drawdown (injected mean) | ~28% | worse |
| Sharpe (yearly mean/σ) | 0.87 | 0.61 |
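
For readers unfamiliar with the metric labels, here is a minimal sketch of a "yearly mean/σ" Sharpe and a peak-to-trough max drawdown. It is not the repo's implementation. It assumes no risk-free rate is subtracted and that σ is the sample standard deviation, neither of which this page specifies, and the input numbers are made up.

```python
from statistics import mean, stdev


def yearly_sharpe(yearly_returns: list[float]) -> float:
    """Mean of yearly returns divided by their sample standard deviation."""
    return mean(yearly_returns) / stdev(yearly_returns)


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Made-up inputs for illustration, NOT the published series.
example_years = [0.20, -0.10, 0.15, 0.05, 0.30]
example_equity = [100.0, 120.0, 90.0, 110.0]

print(round(yearly_sharpe(example_years), 2))  # → 0.79
print(max_drawdown(example_equity))            # → 0.25
```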

Numbers reflect the committed dashboard_data/honest_performance.json, which the dashboard's Performance card reads directly. "Hardened" means: compounded dollars (never summed percentages), every stop fill repriced through overnight gaps, and a random 2%-per-trade total loss injected to proxy blow-ups the still-listed universe can't show. An earlier methodology summed per-trade percentages and reported five-year averages near +182%/yr with a positive worst year; the 2026-06 re-validation measured that accounting as the artifact it was and retired it. Full trail: backtest/REPLAY_RESEARCH.md.
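
To make the accounting difference concrete, the sketch below contrasts summed per-trade percentages with compounded account equity, then applies a random total-loss injection. It is an illustration under stated assumptions, not the repo's code: the function names, the trade list, and the position_fraction sizing are all hypothetical, and it does not model the overnight-gap repricing of stop fills.

```python
import random


def summed_return(trade_returns: list[float]) -> float:
    """Retired accounting: add up per-trade percentage returns."""
    return sum(trade_returns)


def compounded_return(trade_returns: list[float], position_fraction: float) -> float:
    """Compound account equity, putting position_fraction of it into each trade."""
    equity = 1.0
    for r in trade_returns:
        equity *= 1.0 + position_fraction * r
    return equity - 1.0


def inject_total_losses(
    trade_returns: list[float], probability: float, rng: random.Random
) -> list[float]:
    """Replace each trade with a -100% result with the given probability."""
    return [-1.0 if rng.random() < probability else r for r in trade_returns]


# Made-up trade list (NOT real results): 100 trades cycling through four outcomes.
trades = [0.10, -0.05, 0.08, -0.04] * 25
rng = random.Random(0)  # fixed seed so the injection is repeatable
hardened = inject_total_losses(trades, probability=0.02, rng=rng)

print(f"summed:                {summed_return(trades):+.1%}")
print(f"compounded:            {compounded_return(trades, 0.10):+.1%}")
print(f"compounded + injected: {compounded_return(hardened, 0.10):+.1%}")
```

Summing treats every trade as if it were a full-account bet whose gains simply add, which is why it can produce averages far above what a sized, compounding account would actually earn.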

Honest caveats

Backtests over-promise. We compress the numbers before surfacing live expectations:
  • Average return × 0.5: slippage, fills, theta decay vs simulation
  • Worst-year drawdown × 1.5: live tail risk has been worse than backtest in every honest study we've read
  • Sharpe × 0.7: friction compounds; live variance is higher than simulated

So with the hardened backtest at +12%/yr (injection-adjusted, Sharpe 0.87, worst year −14%), our planning number for live trading is +6%/yr at Sharpe ~0.61 with a worst year near −21%. That's the number we plan around, not a headline. These figures describe the engine at its deployed minimum-risk setting; higher-risk configurations exist in the research but do not surface here until they clear live fill-parity and paper validation gates.
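
The arithmetic behind those planning numbers is just the three multipliers from the caveats list applied to the hardened backtest figures. A minimal sketch, with hypothetical constant and function names:

```python
# Haircut multipliers taken from the caveats list above.
RETURN_HAIRCUT = 0.5
WORST_YEAR_MULTIPLIER = 1.5
SHARPE_HAIRCUT = 0.7


def live_expected(avg_return_pct: float, worst_year_pct: float, sharpe: float) -> dict:
    """Apply the three haircuts to backtest figures (percent values, not fractions)."""
    return {
        "avg_return_pct": avg_return_pct * RETURN_HAIRCUT,
        "worst_year_pct": worst_year_pct * WORST_YEAR_MULTIPLIER,
        "sharpe": round(sharpe * SHARPE_HAIRCUT, 2),
    }


planning = live_expected(avg_return_pct=12.0, worst_year_pct=-14.0, sharpe=0.87)
print(planning)
# → {'avg_return_pct': 6.0, 'worst_year_pct': -21.0, 'sharpe': 0.61}
```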

Other things this backtest doesn't account for that live trading will: regimes outside 2016–2026 (no dot-com or 2008-style secular bear exists in the data); broker outages; your specific account's fee structure; tax drag on short-term capital gains; the psychological cost of seeing a −20%+ drawdown on a real account. Paper trade for at least 2-3 weeks before going live; the engines run identically against paper and live brokers, so you're stress-testing your own tolerance, not just the algorithm.

The deep technical doc

For someone doing real due diligence on the methodology (investors, prospects with quant backgrounds, auditors), the full breakdown is in our documentation.

It covers: