AI-Directed Options Trading
The system that taught itself to sell volatility.
This AI options trading system started with no strategy preference. After processing thousands of signals across 26 market features, 27 regime classifications, and nightly self-learning cycles, it converged on credit spreads: not because I told it to, but because the data pointed there.
Paper Trading
79.3% win rate · $2,880 PnL · 214 resolved trades
Backtested
96.1% win rate · 1,299 trades · 365-day lookback
Strategy Convergence
Debits disabled after analysis · confirmed across SPX + BTC
How three layers produce one decision
Three layers, one decision
Every trade passes through three independent layers. Each can refine, gate, or veto the signal from the layer above. The output isn't a simple buy/sell — it's a probability that's been stress-tested through backtested constants, regime awareness, and real-time context.
AutoResearcher
Backtested structural constants that define the system's baseline thresholds. These rarely change — they're the foundation.
Regime Playbook
Adjusts thresholds and sizing per market environment. What works in low-vol contango doesn't work in fear momentum.
LLM Override
Reviews the math model's output with context the math model can't see: news, macro events, recent trade outcomes. Can trim, boost, or pass through.
Each layer can veto the one above it. The LLM sees everything the math model sees, plus news, the macro calendar, and recent trade outcomes. For example, a 64.7% bullish probability can be trimmed to 59% if FOMC is two days away.
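The layering can be sketched as a chain of functions where each one may return `None` to veto the trade. This is a minimal illustration, not the system's actual code: the `Signal` type, the regime playbook values, and the function names are hypothetical. The 0.55/0.45 thresholds are the live values mentioned in the bug list further down.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signal:
    prob_up: float          # probability of an up move, 0..1
    size_mult: float = 1.0  # position-size multiplier


def baseline_layer(prob_up: float, upper: float = 0.55, lower: float = 0.45) -> Optional[Signal]:
    """Layer 1: structural thresholds. Inside the dead zone there is no trade."""
    if lower < prob_up < upper:
        return None
    return Signal(prob_up=prob_up)


def regime_layer(sig: Optional[Signal], regime: str) -> Optional[Signal]:
    """Layer 2: per-regime sizing. Unknown regimes are vetoed."""
    if sig is None:
        return None
    # Hypothetical playbook: regime name -> size multiplier.
    playbook = {"low_vol_contango": 1.0, "fear_momentum": 0.5}
    if regime not in playbook:
        return None
    return Signal(sig.prob_up, sig.size_mult * playbook[regime])


def llm_layer(sig: Optional[Signal], delta: float = 0.0, skip: bool = False) -> Optional[Signal]:
    """Layer 3: contextual adjustment, clamped to a valid probability."""
    if sig is None or skip:
        return None
    adjusted = max(0.0, min(1.0, sig.prob_up + delta))
    return Signal(adjusted, sig.size_mult)


def decide(prob_up: float, regime: str, llm_delta: float = 0.0, llm_skip: bool = False) -> Optional[Signal]:
    sig = baseline_layer(prob_up)
    sig = regime_layer(sig, regime)
    return llm_layer(sig, llm_delta, llm_skip)


# The FOMC example from the text: 64.7% trimmed by 5.7 points.
result = decide(0.647, "low_vol_contango", llm_delta=-0.057)
```

Passing `None` down the chain keeps the veto logic in one place: any layer can stop the trade, and no later layer can revive it.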
26 features converge into one probability (13 core shown)
26 features, one probability
Every signal flows into the aggregator independently. No single feature decides a trade on its own: the system weighs them in combination, recalibrated nightly by the self-learner.
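One way to picture the aggregator is as a logistic model over standardized features. That matches the "global weights" and "logistic models" wording in the roadmap, but the source does not confirm the exact form. The feature names, weights, and inputs below are hypothetical placeholders.

```python
import math


def aggregate(features: dict[str, float], weights: dict[str, float], bias: float = 0.0) -> float:
    """Combine standardized feature values into one probability of an up move."""
    score = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights; in the system described here they would be recalibrated nightly.
weights = {"vix_level": -0.8, "iv_rank": 0.3, "rsi": 0.2}
# Made-up z-scored inputs for a single scan.
features = {"vix_level": 0.5, "iv_rank": 1.0, "rsi": -0.5}
prob_up = aggregate(features, weights)
```

With no features at all the function returns 0.5, so the output only moves away from neutral when the weighted evidence does.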
| Feature | What it measures |
|---|---|
| VIX Level | Current volatility index, the market's fear gauge |
| IV Rank | Implied vol percentile over 52 weeks |
| IV-HV Spread | Implied vs realized vol gap (over- or underpriced options) |
| Realized Vol 5d | Short-term actual price movement |
| Realized Vol 20d | Medium-term actual price movement |
| Vol Regime | Cheap · Fair · Expensive classification |
| GEX | Gamma exposure (dealer hedging pressure) |
| Term Structure | VIX contango vs backwardation |
| RSI | Relative strength momentum signal |
| MACD | Trend direction and strength |
| Put/Call Ratio | Market sentiment gauge |
| Macro Events | FOMC, CPI, NFP, PCE calendar |
| News Headlines | Real-time market news via IB + Brave Search |
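Several of the volatility features have standard textbook definitions, sketched below. The source does not say which formulas the system uses, so treat these as assumptions. In particular, "IV Rank" is described above as a percentile, and rank and percentile are different calculations, so both are shown. All function names are illustrative.

```python
import math
from statistics import stdev


def iv_rank(iv_history: list[float], current_iv: float) -> float:
    """Where current IV sits between the 52-week low and high, 0..100."""
    lo, hi = min(iv_history), max(iv_history)
    if hi == lo:
        return 0.0
    return 100.0 * (current_iv - lo) / (hi - lo)


def iv_percentile(iv_history: list[float], current_iv: float) -> float:
    """Share of past observations with IV below the current value, 0..100."""
    below = sum(1 for iv in iv_history if iv < current_iv)
    return 100.0 * below / len(iv_history)


def realized_vol(closes: list[float], window: int, periods_per_year: int = 252) -> float:
    """Annualized close-to-close volatility over the last `window` log returns."""
    if window < 2 or len(closes) < window + 1:
        raise ValueError("need window >= 2 and at least window + 1 closes")
    recent = closes[-(window + 1):]
    log_returns = [math.log(b / a) for a, b in zip(recent, recent[1:])]
    return stdev(log_returns) * math.sqrt(periods_per_year)


def iv_hv_spread(current_iv: float, closes: list[float], window: int = 20) -> float:
    """Implied minus realized vol; both must be annualized decimals (0.18 = 18%)."""
    return current_iv - realized_vol(closes, window)
```

A positive IV-HV spread means options are priced for more movement than the underlying has recently delivered, which is the condition premium sellers look for.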
How credit spreads won
The system didn't start with a preference. It found one.
Early on, the system tried everything — iron condors, debit spreads, bull puts, bear calls. The nightly learner reviewed every outcome and recalibrated. Over hundreds of trades, the data was clear: credit spreads dominated. Debit strategies had negative expected value and were manually disabled in code after analysis confirmed what the learner was already showing — selling premium beats buying it.
SPX Options (Paper)
81.9%
Bull put spread win rate (104W-23L)
SPX alone: $2,915 profit — carries the entire system
BTC / Deribit (Testnet)
91.8% vs 40.2%
Credit vs debit win rate
Independent validation on uncorrelated asset
Two uncorrelated markets. Same conclusion: credit spreads dominate.
10 markets, 10 playbooks
What works depends on where you are
The same strategy that prints money in low-vol contango will destroy you in fear momentum. The regime playbook classifies the current market into one of 10 environments and adjusts strategy, sizing, and thresholds accordingly.
When the system chooses not to trade
Knowing when to sit out
On high-impact event days, the system blocks iron condors on SPX — the strategy most exposed to sudden moves. Directional spreads stay open because they benefit from the volatility.
Events that trigger the gate
Knowing when NOT to trade is as important as knowing when to trade. The macro gate has been live since March 2026, blocking only the highest-risk strategy during the highest-risk moments.
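The gate described above reduces to one narrow rule: block a single strategy on a single ticker when a high-impact event falls on the trading day. A minimal sketch follows, assuming a calendar keyed by date. The function name, the strategy labels, and the calendar entry are hypothetical.

```python
from datetime import date

# Event types the text names as high impact.
HIGH_IMPACT_EVENTS = {"FOMC", "CPI", "PPI", "PCE", "NFP"}


def macro_gate(strategy: str, ticker: str, today: date,
               calendar: dict[date, set[str]]) -> tuple[bool, str]:
    """Return (allowed, reason). Only SPX iron condors are blocked on event days."""
    events = calendar.get(today, set()) & HIGH_IMPACT_EVENTS
    if events and ticker == "SPX" and strategy == "iron_condor":
        return False, "blocked: " + ", ".join(sorted(events))
    return True, "allowed"


# Placeholder calendar entry, not a real release date.
calendar = {date(2030, 1, 15): {"CPI"}}

condor = macro_gate("iron_condor", "SPX", date(2030, 1, 15), calendar)
spread = macro_gate("bull_put_spread", "SPX", date(2030, 1, 15), calendar)
```

Returning a reason string alongside the boolean makes the block easy to log, which matters when the point of the gate is to explain why no trade was placed.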
AI reviewing AI
The math sees numbers. The LLM sees context.
Five times a day, the LLM reviews every signal, every headline, and every recent outcome — then decides if the math model's probability needs adjustment.
- Trim probability: reduce confidence on macro risk.
- Boost probability: increase when signals align strongly.
- Pass through: the math model's call stands.
- Flag for skip: override to no-trade on red flags.
The LLM doesn't make the trade decision. It reviews the math model's decision with context the math can't see. It's a judgment layer, not a replacement.
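The four review outcomes map naturally onto a small enum plus one function that applies them. The sketch below assumes that "trim" moves the probability toward neutral (0.5) and "boost" moves it away, which fits the 64.7% to 59% example but is not spelled out in the source. The clamp mirrors the fix for the unclamped `adjustedProbUp` bug listed in the technical details. `Action` and `apply_review` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    TRIM = "trim"
    BOOST = "boost"
    PASS = "pass"
    SKIP = "skip"


def apply_review(prob_up: float, action: Action, amount: float = 0.0) -> Optional[float]:
    """Return the adjusted probability, or None when the review flags a no-trade."""
    if action is Action.SKIP:
        return None
    # Confidence is distance from 0.5, so the adjustment direction depends on the side.
    direction = 1.0 if prob_up >= 0.5 else -1.0
    if action is Action.TRIM:
        step = min(abs(amount), abs(prob_up - 0.5))  # never trim past neutral
        prob_up -= direction * step
    elif action is Action.BOOST:
        prob_up += direction * abs(amount)
    # PASS falls through unchanged; the clamp keeps the result a valid probability.
    return max(0.0, min(1.0, prob_up))
```

Because the function only adjusts a number or returns `None`, the trade decision itself stays with the math model and its thresholds.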
A system that reviews its own homework
Every night at 10 PM, the system grades itself
The 10 PM cron reviews every trade from the day, recalibrates thresholds, and adjusts feature weights. After 20+ trades, it starts recommending optimizer threshold changes. The system gets sharper over time — not because I tune it, but because it tunes itself.
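The source does not describe the learner's internals, so the following is only one plausible shape for the "after 20+ trades" rule: stay silent below the minimum sample, then compare realized win rates at a few candidate confidence thresholds. The candidate values, the bucket minimum, and the function name are all assumptions.

```python
MIN_TRADES = 20  # from the text: recommendations start after 20+ trades


def recommend_threshold(trades: list[tuple[float, bool]],
                        candidates: tuple[float, ...] = (0.55, 0.60, 0.65),
                        min_bucket: int = 10):
    """trades: (confidence, won) pairs. Returns the best candidate threshold or None."""
    if len(trades) < MIN_TRADES:
        return None
    best, best_rate = None, -1.0
    for threshold in candidates:
        outcomes = [won for confidence, won in trades if confidence >= threshold]
        if len(outcomes) < min_bucket:
            continue  # too few trades above this threshold to judge it
        rate = sum(outcomes) / len(outcomes)
        if rate > best_rate:  # ties keep the lower (less restrictive) threshold
            best, best_rate = threshold, rate
    return best
```

This version ranks thresholds by win rate alone. A real learner would also need to weigh how many trades each threshold filters out, since a stricter cutoff trades less often.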
Initial values
Learned values
The scorecard
What the numbers look like
Win rate by entry window
Compact scorecard
| Metric | Value |
|---|---|
| Paper trades (resolved) | 214 (169W-44L) |
| Paper win rate | 79.3% |
| Backtest win rate | 96.1% (1,299 trades) |
| Paper → Backtest gap | 79% vs 96% (execution reality) |
| Best window | 2:30 PM (90%) / 10:05 AM (86%) |
| Best strategy | Bull Put Spreads (81.9%) |
| SPX carries the system | $2,915 vs $2,880 total PnL (other tickers net slightly negative) |
| Risk of ruin ($5K) | 0.11% |
| Median max drawdown | $963 |
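The source does not say how the risk-of-ruin and drawdown figures were produced. One common approach is a bootstrap: resample the realized per-trade PnLs many times and track how often equity hits zero. The sketch below assumes that method. The path count, trades per path, and function name are hypothetical, and the input must be the actual trade log.

```python
import random
from statistics import median


def simulate(trade_pnls: list[float], start_capital: float = 5_000.0,
             n_paths: int = 2_000, n_trades: int = 250, seed: int = 0) -> tuple[float, float]:
    """Return (risk of ruin, median max drawdown) from resampled trade PnLs."""
    rng = random.Random(seed)
    ruined = 0
    drawdowns = []
    for _ in range(n_paths):
        equity = peak = start_capital
        max_dd = 0.0
        for _ in range(n_trades):
            equity += rng.choice(trade_pnls)  # sample with replacement
            peak = max(peak, equity)
            max_dd = max(max_dd, peak - equity)
            if equity <= 0:
                ruined += 1
                break  # this path is ruined; stop trading it
        drawdowns.append(max_dd)
    return ruined / n_paths, median(drawdowns)
```

Resampling individual trades assumes outcomes are independent. Losses that cluster on event days would make the true risk of ruin higher than this estimate.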
By ticker
| Ticker | Win Rate | Avg PnL | Share of total PnL |
|---|---|---|---|
| SPX | 71.4% | $46.27/trade | 101% |
| QQQ | 80.0% | $1.87/trade | 3% |
| IWM | 84.1% | ~$0/trade | 0% |
| SPY | 83.6% | -$1.95/trade | — |
What happens when a model meets the market
The reality gap: 96% → 79%
The backtester says 96.1% win rate. Paper trading says 79.3%. That 17-point gap is the most honest number on this page — it's what execution reality does to theoretical models. Slippage, intraday stops, timing, and regime shifts all eat into backtest performance.
Backtest: 1,248W-51L · No intraday stops · No execution friction
Paper: 169W-44L · Position monitor active · Real market hours
The gap IS the insight. Any backtest that reports 96%+ win rates on credit spreads is structurally correct: 2% OTM placement makes that nearly guaranteed. The real question is how much of that survives execution. In our case: ~79%. That's still profitable, but it's a different conversation than "96% win rate."
Macro impact
The event day tax
FOMC, CPI, PPI, PCE, NFP days are where the system bleeds. The macro gate now blocks iron condors on these days, but the data shows why it matters.
Normal days
+$3,812
High-event days
-$933
Without event days, paper PnL would be $3,812 instead of $2,880. The macro gate is the single biggest structural improvement since launch.
What the model actually pays attention to
SHAP feature importance
Not all signals are equal. SHAP analysis on 1,278 trades reveals what drives win/loss predictions. VIX dominates everything — the market's fear gauge is the single most predictive feature.
The top 3 features are all volatility-related. The system's edge is fundamentally a vol regime classifier — everything else is noise reduction.
What Went Wrong
The $5 lesson
The system looked invincible — until we changed one parameter. Narrowing the spread width from $10 to $5 turned a perfect record into a losing streak overnight.
$10 wide spreads
$5 wide spreads
Root cause: The backtester showed 99.1% win rate but modeled hold-to-expiry. Reality has intraday volatility. A stop loss at 50% of max loss was too tight for narrower spreads — the position monitor killed trades that would have expired profitable.
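The arithmetic behind the stop can be made concrete. For a credit spread, max loss per contract is width minus credit, times the 100 multiplier, and the stop sits at 50% of that. The credits below are made-up values for illustration, and the sketch only computes stop levels. It does not model intraday price paths.

```python
def stop_levels(width: float, credit: float, stop_fraction: float = 0.5,
                multiplier: int = 100) -> dict[str, float]:
    """Dollar figures for one credit spread with a stop at a fraction of max loss."""
    max_loss = (width - credit) * multiplier
    stop_loss = stop_fraction * max_loss
    # Spread price (per share) at which the stop triggers.
    exit_price = credit + stop_loss / multiplier
    return {"max_loss": max_loss, "stop_loss": stop_loss, "exit_price": exit_price}


wide = stop_levels(width=10.0, credit=1.00)   # hypothetical credit
narrow = stop_levels(width=5.0, credit=0.50)  # hypothetical credit
```

With these inputs the $10-wide spread tolerates a $450 adverse move per contract before the stop fires, while the $5-wide spread tolerates $225. Halving the width halves the dollar cushion against intraday swings.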
The system's 97% structural win rate comes from 2% OTM placement, not the direction model. The model adds ~2 percentage points of alpha. Understanding that distinction is the difference between confidence and hubris.
Honest constraints
- Paper trading — not live capital yet.
- Fee drag matters — only SPX is live-viable. QQQ/IWM fees exceed 100% of credit at $1-wide.
- Model contamination — underlying LLMs trained on post-period data.
- 97% is structural — 2% OTM placement, not model alpha.
What the system proves
- Credit > debit is structural, not accidental.
- Regime awareness adds measurable value.
- Time-of-day matters — 8:55 AM is death, 10:05 AM is gold.
- Self-learning works — thresholds drifted to better values on their own.
What's next
- Live trading transition with real capital.
- Position sizing optimization via Kelly criterion.
- Multi-timeframe regime detection.
- Per-regime logistic models instead of global weights.
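For the Kelly item, the standard formula is f* = p - (1 - p) / b, where p is the win probability and b is the ratio of average win to average loss. The source gives p but not b, so the sketch below leaves b as an input. Function names are illustrative.

```python
def kelly_fraction(win_prob: float, payoff_ratio: float) -> float:
    """Kelly fraction f* = p - (1 - p) / b, floored at zero for a negative edge."""
    if not 0.0 <= win_prob <= 1.0:
        raise ValueError("win_prob must be in [0, 1]")
    if payoff_ratio <= 0.0:
        raise ValueError("payoff_ratio must be positive")
    return max(0.0, win_prob - (1.0 - win_prob) / payoff_ratio)


def breakeven_payoff_ratio(win_prob: float) -> float:
    """Smallest average-win / average-loss ratio at which Kelly sizing turns positive."""
    if not 0.0 < win_prob <= 1.0:
        raise ValueError("win_prob must be in (0, 1]")
    return (1.0 - win_prob) / win_prob
```

At the 79.3% paper win rate, `breakeven_payoff_ratio(0.793)` is about 0.26, so the average win has to exceed roughly a quarter of the average loss before Kelly recommends any position at all.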
Bottom line
The point is not the win rate. It's the convergence.
The system started with every strategy available. Through self-learning and regime awareness, it converged to selling premium via credit spreads. That same convergence happened independently on BTC/Deribit — two uncorrelated markets, same conclusion.
When two markets point the same direction through independent analysis, the signal isn't in any individual trade — it's in the pattern the system discovered on its own.
That's a more interesting use of AI in trading than "make the computer buy and sell faster."
"I built an AI options system with no strategy preference. After 1,500+ trades across backtest and paper, it converged on credit spreads. The same convergence happened independently on BTC/Deribit. Two uncorrelated markets. Same conclusion."
ankitchandola.com
"The most honest number on the page: 96% backtest win rate drops to 79% in paper trading. That 17-point gap is what execution reality does to theoretical models. Understanding the gap matters more than the win rate."
"Event days cost us $933 while normal days made $3,812. The macro gate — blocking iron condors on FOMC/CPI days — is the single biggest structural improvement. Knowing when NOT to trade is the edge."
Full technical details
Architecture, data sources, bugs, model hierarchy, and the full hardening path.
System architecture
Model hierarchy
| Model | Role | Responsibilities |
|---|---|---|
| Opus | Principal | Conversation, QC, 10 PM learner, judgment calls |
| Sonnet | Workhorse | Intraday crons, LLM review, trade execution, Discord posts |
| Codex 5.3 | Coder | Feature development only; never runs crons or analysis |
Data sources
| Source | Used for |
|---|---|
| Yahoo Finance | SPX price, VIX, basic signals |
| ThetaData | Options EOD, strikes, expirations |
| Polygon.io | SPX options chain, economic calendar |
| IB Gateway | Live quotes, news, order execution |
| Brave Search | Real-time macro context, news |
| Internal | Paper trade log, calibration state |
Daily cron schedule (Mon–Fri)
| Time | Job |
|---|---|
| 9:35 AM CT | Intraday scan #1 |
| 11:00 AM CT | Intraday scan #2 |
| 12:30 PM CT | Intraday scan #3 |
| 2:00 PM CT | Intraday scan #4 |
| 3:30 PM CT | Intraday scan #5 |
| 10:00 PM CT | Nightly learner |
10 bugs found in the initial audit
From a double Codex audit + Opus deep QC session. All fixed before paper trading began.
| # | Bug | Cause | Fix |
|---|---|---|---|
| 1 | Iron butterfly unreachable | Precedence bug — checked >60 after >30 | Reordered strategy precedence |
| 2 | Yahoo Finance empty | No User-Agent header | Added Chrome UA header |
| 3 | Brave Search 422 | Missing Accept header + invalid freshness | Fixed request headers |
| 4 | Polygon API key missing | MASSIVE_API_KEY alias needed | Added env alias mapping |
| 5 | Self-learning mismatch | Thresholds 0.65/0.35 vs live 0.55/0.45 | Synced calibration values |
| 6 | Missing holiday | Juneteenth 2026 not in NYSE calendar | Added to static calendar |
| 7 | Preflight crash | Guards crash on missing files | Made fail-soft with defaults |
| 8 | Learning loop broken | Daily review not linked to optimizer | Connected pipeline end-to-end |
| 9 | LLM override unclamped | adjustedProbUp could exceed 1.0 | Added 0–1 clamping |
| 10 | Discord formatting | ERROR messages incomplete | Full error context in output |
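Bug 1 is a classic ordering mistake: when the looser condition is tested first, the stricter branch can never run. The sketch below reproduces the pattern. The source gives only the two cutoffs and the unreachable strategy, so the `score` variable and the other strategy labels are hypothetical.

```python
def pick_strategy_buggy(score: float) -> str:
    if score > 30:
        return "iron_condor"
    if score > 60:  # unreachable: anything above 60 already matched the branch above
        return "iron_butterfly"
    return "no_trade"


def pick_strategy_fixed(score: float) -> str:
    # Test the strictest condition first so every branch is reachable.
    if score > 60:
        return "iron_butterfly"
    if score > 30:
        return "iron_condor"
    return "no_trade"
```

A test that exercises one value per branch catches this kind of bug immediately, since the buggy version never returns `"iron_butterfly"` for any input.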
Structural limitation: The underlying LLMs were trained on data that includes post-period market outcomes. Latent hindsight contamination remains a structural issue. This is a proof-of-concept for AI-directed options trading, not a validated alpha generator. The 97% structural win rate comes from OTM placement — the model adds marginal alpha on top. I state this because intellectual honesty matters more than impressive numbers.