In April 2026, we killed a strategy that looked, on paper, like the best thing we had ever evolved.
Champion eb686789 — internally called Frankenstein — was a 5-minute BTC scalper. Backtested win rate: 82-86%. Backtested profit factor: 15-16. DNA: BBands(18,2.3) + RSI(12), timeframe 5m/15m, direction both, leverage 4x. In backtesting across multiple windows, it was consistently the top performer.
We deployed it live with $20 capital, compounding, targeting BTC/ETH/SOL pairs.
After three weeks, the live win rate was 49%.
The Math Behind That Drift
An 82% win rate means you expect roughly 4 wins out of every 5 trades. A 49% win rate means you are right about as often as a coin flip.
The PnL per trade also differed. Backtested average PnL per trade: roughly +$0.15 net. Live average: approximately +$0.04 per trade, a little over a quarter of the backtested figure. Measured per hour, that was about 23 times less profit than the main executor running a portfolio of strategies.
The strategy was not losing money. It was making money so slowly that transaction fees and slippage were eating a disproportionate share of each trade.
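To show how much a win-rate drop alone can cost, here is a minimal sketch based on the standard identity profit factor = (win rate x average win) / (loss rate x average loss). The function names are illustrative. The inputs are the low end of the backtested ranges quoted above. Holding the payoff ratio fixed at a 49% win rate is an assumption made for illustration, not a measured live figure.

```python
def implied_payoff_ratio(win_rate: float, profit_factor: float) -> float:
    """Average win / average loss implied by a win rate and a profit factor."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return profit_factor * (1.0 - win_rate) / win_rate


def profit_factor_at(win_rate: float, payoff_ratio: float) -> float:
    """Profit factor at a given win rate if the payoff ratio is held fixed."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return payoff_ratio * win_rate / (1.0 - win_rate)


# Low end of the backtested range: 82% win rate, profit factor 15.
backtest_ratio = implied_payoff_ratio(win_rate=0.82, profit_factor=15.0)

# Assumption: same payoff ratio, but the live 49% win rate.
live_pf_if_same_payoff = profit_factor_at(win_rate=0.49, payoff_ratio=backtest_ratio)

print(round(backtest_ratio, 2), round(live_pf_if_same_payoff, 2))
```

Under that assumption, the implied payoff ratio is about 3.3 and the profit factor falls from 15 to roughly 3.2 on the win-rate change alone, before fees and slippage are counted.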
Why Backtests Lie
Frankenstein was evolved on historical data. The genetic algorithm found the parameter combination that produced the best fitness score across that data window. The problem is that a 5-minute scalper captures microstructure patterns — order flow imbalances, spread behavior, the way liquidity pools around round price levels. These patterns are noisier and less persistent than the patterns a 4-hour or daily strategy uses.
When the genetic algorithm ran, it found a pattern that was real in the training data. But it was not persistent enough to survive into live trading. The pattern degraded within weeks of deployment.
This is the standard overfitting failure mode. What was surprising was how abrupt the break was: the live win rate did not drift gradually from 82% to 79% to 73%. It landed at 49% almost immediately and stayed there.
What the System Does Now to Catch This
We built three layers of drift detection after this incident.
Layer 1: Forward-only promotion. No strategy gets to trade real capital based on backtest numbers alone. Every champion must complete at least 15 arena trades with a forward win rate above 25%, positive PnL, and profit factor above 1.2. The paper arena runs every 2 minutes on real-time prices. Frankenstein's arena stats were the earliest warning sign: its arena win rate was lower than backtested, though we did not catch it fast enough.
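The four promotion thresholds can be sketched as a single check over a list of forward trade PnLs. This is a minimal illustration, not the production gate: the function name, the constant names, and the choice to treat a loss-free record as having an unbounded profit factor are all assumptions.

```python
from typing import Sequence

MIN_ARENA_TRADES = 15     # from the text: at least 15 arena trades
MIN_WIN_RATE = 0.25       # forward win rate must be above 25%
MIN_PROFIT_FACTOR = 1.2   # profit factor must be above 1.2


def passes_promotion_gate(trade_pnls: Sequence[float]) -> bool:
    """Return True if forward (paper arena) trade PnLs clear every threshold."""
    if len(trade_pnls) < MIN_ARENA_TRADES:
        return False

    wins = [p for p in trade_pnls if p > 0]
    losses = [p for p in trade_pnls if p < 0]

    win_rate = len(wins) / len(trade_pnls)
    total_pnl = sum(trade_pnls)
    gross_profit = sum(wins)
    gross_loss = -sum(losses)

    # Assumption: with no losing trades the profit factor is unbounded.
    profit_factor = float("inf") if gross_loss == 0 else gross_profit / gross_loss

    return (
        win_rate > MIN_WIN_RATE
        and total_pnl > 0
        and profit_factor > MIN_PROFIT_FACTOR
    )
```

A record of 8 wins and 7 losses of equal size has positive PnL and a 53% win rate, but its profit factor of about 1.14 fails the gate, which is the kind of marginal edge the rule is meant to keep out.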
Layer 2: All-time max baseline. The system now tracks the highest-ever forward win rate for each strategy across its lifetime. When the current win rate falls below that baseline by more than a set threshold, the drift monitor flags it. This catches gradual decay that a rolling window can miss, because a rolling baseline slides down along with the strategy.
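A minimal sketch of the all-time max baseline idea follows. The class name and the default `max_drop` value are hypothetical; the text does not state the actual threshold.

```python
from dataclasses import dataclass


@dataclass
class DriftMonitor:
    """Flags a strategy whose win rate falls too far below its lifetime peak."""

    max_drop: float = 0.15        # hypothetical threshold, not from the source
    peak_win_rate: float = 0.0    # highest forward win rate seen so far

    def update(self, current_win_rate: float) -> bool:
        """Record the latest forward win rate; return True if drift is flagged."""
        self.peak_win_rate = max(self.peak_win_rate, current_win_rate)
        return (self.peak_win_rate - current_win_rate) > self.max_drop


monitor = DriftMonitor(max_drop=0.10)
flags = [monitor.update(wr) for wr in (0.60, 0.58, 0.55, 0.49)]
print(flags)
```

In this example no single step drops by more than a few points, yet the final reading is flagged because it sits more than 10 points below the 60% peak. The peak never moves down, so a slow slide cannot reset its own reference point.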
Layer 3: Confidence drift model. An XGBoost model (48 features, retrained weekly) produces a confidence score for each signal. The model includes features for how recently a strategy was trained, how far its current win rate is from its peak, and whether the current regime matches the regime it was optimized for. If the model returns a confidence below 45%, the signal does not execute.
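The confidence gate can be sketched without any model code by treating the trained model as a plain callable that maps features to a score between 0 and 1. Everything here is illustrative: the feature names, the function names, and the callable interface are assumptions, and only three of the 48 features are shown.

```python
from typing import Callable, Dict, Mapping

CONFIDENCE_THRESHOLD = 0.45  # from the text: below 45% the signal does not execute


def drift_features(
    days_since_training: float,
    current_win_rate: float,
    peak_win_rate: float,
    current_regime: str,
    native_regime: str,
) -> Dict[str, float]:
    """Build the three drift-related features described in the text."""
    return {
        "days_since_training": float(days_since_training),
        "win_rate_gap_from_peak": peak_win_rate - current_win_rate,
        "regime_match": 1.0 if current_regime == native_regime else 0.0,
    }


def should_execute(
    features: Mapping[str, float],
    predict_confidence: Callable[[Mapping[str, float]], float],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> bool:
    """Execute the signal only if the model's confidence reaches the threshold."""
    return predict_confidence(features) >= threshold
```

In a real pipeline `predict_confidence` would wrap the trained classifier. Keeping the gate separate from the model makes the threshold easy to test with a stub.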
None of these layers existed when Frankenstein was deployed. They exist now because of it.
The Actual Decision to Kill It
The formal kill decision was straightforward once we had the data. The system compared Frankenstein's PnL per hour to the main executor running a portfolio of strategies. The main executor was generating 23 times more PnL per hour with less capital concentration.
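The comparison itself is simple arithmetic. The sketch below shows one way to express it. The function names, the `max_ratio` cutoff, and the sample figures are hypothetical and chosen only to reproduce a 23x gap.

```python
def pnl_per_hour(total_pnl: float, hours_live: float) -> float:
    """Net PnL divided by the hours the strategy has been live."""
    if hours_live <= 0:
        raise ValueError("hours_live must be positive")
    return total_pnl / hours_live


def should_retire(
    candidate_rate: float,
    benchmark_rate: float,
    max_ratio: float = 10.0,  # hypothetical cutoff, not from the source
) -> bool:
    """Flag the candidate when the benchmark out-earns it by more than max_ratio."""
    if candidate_rate <= 0:
        return True  # a strategy earning nothing per hour is always flagged
    return benchmark_rate / candidate_rate > max_ratio


# Hypothetical figures: the benchmark earns 23x more per hour than the candidate.
candidate = pnl_per_hour(total_pnl=5.0, hours_live=500.0)
benchmark = pnl_per_hour(total_pnl=115.0, hours_live=500.0)
print(should_retire(candidate, benchmark))
```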
Running Frankenstein in parallel was not just low-return — it was consuming position slots and cooldown budget that could go to higher-quality signals. The cron job was commented out, its DNA was archived to the killed-strategies ledger with a weak_bull regime tag, and the capital was folded back into the main executor.
The archive matters. If a future market regime matches the microstructure environment Frankenstein was optimized for, it can be considered for resurrection. The system has a resurrection scanner that checks killed strategies every hour against current regime and their historical native-regime performance.
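A resurrection scan of this kind can be sketched as a filter over the killed-strategies ledger. The record fields, the function name, the `min_native_win_rate` cutoff, and the sample entries are all hypothetical. The real scanner's criteria are not described beyond regime match and native-regime performance.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class KilledStrategy:
    strategy_id: str
    native_regime: str              # regime tag stored when the strategy was archived
    native_regime_win_rate: float   # forward win rate observed in that regime


def resurrection_candidates(
    ledger: Iterable[KilledStrategy],
    current_regime: str,
    min_native_win_rate: float = 0.5,  # hypothetical cutoff
) -> List[KilledStrategy]:
    """Return archived strategies whose native regime matches the current one
    and whose historical performance in that regime clears the cutoff."""
    return [
        s
        for s in ledger
        if s.native_regime == current_regime
        and s.native_regime_win_rate >= min_native_win_rate
    ]


# Hypothetical ledger entries for illustration only.
ledger = [
    KilledStrategy("example-a", "weak_bull", 0.61),
    KilledStrategy("example-b", "weak_bull", 0.42),
    KilledStrategy("example-c", "strong_bear", 0.70),
]
print([s.strategy_id for s in resurrection_candidates(ledger, "weak_bull")])
```

A match would only nominate a strategy for reconsideration. It would still need to pass the same forward-validation gates as any other champion.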
What This Means for VIP Signals
Every signal posted to the free and VIP Telegram channels comes from the live executor — strategies that passed the paper arena forward-validation gate and the XGBoost confidence gate. Backtest stats are never shown as the primary metric. When a champion's card shows win rate and profit factor, those numbers are from the forward paper arena, not the backtester.
The /how-it-works page covers the full evaluation pipeline if you want the technical detail.
Risk disclaimer: Past performance is not indicative of future results.