AI-Directed Options Trading

The system that taught itself to sell volatility.

An AI options trading system that started with no strategy preference. After processing thousands of signals across 26 market features, 27 regime classifications, and nightly self-learning cycles, it converged on credit spreads — not because I told it to, but because the data did.

Paper Trading

169W-44L

79.3% win rate · $2,880 PnL · 214 resolved trades

Backtested

96.1% win rate · 1,299 trades · 365-day lookback

Strategy Convergence

Credit Spreads

Debits disabled after analysis · confirmed across SPX + BTC

⚠️ Paper trading results · Not live capital · Past performance ≠ future results

The Architecture

How three layers produce one decision

Three layers, one decision

Every trade passes through three independent layers. Each can refine, gate, or veto the signal from the layer above. The output isn't a simple buy/sell — it's a probability that's been stress-tested through backtested constants, regime awareness, and real-time context.

📊

Layer 1 · 365-day backtest

AutoResearcher

Backtested structural constants that define the system's baseline thresholds. These rarely change — they're the foundation.

Bullish: 0.64 · Bearish: 0.50 · Neutral: 0.42–0.58

🎯

Layer 2 · 10 regimes

Regime Playbook

Adjusts thresholds and sizing per market environment. What works in low-vol contango doesn't work in fear momentum.

Sizing: 0.3×–1.5× · Threshold ±0.05 · Strategy gating

🧠

Layer 3 · Real-time

LLM Override

Reviews the math model's output with context it can't see — news, macro events, recent trade outcomes. Can trim, boost, or pass through.

64.7% → 59% · Macro caution · 30 min freshness

Each layer can veto the one above it. The LLM sees everything the math model sees, plus news, macro calendar, and recent trade outcomes. A 64.7% bullish probability can become 59% if FOMC is in two days.
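To make the layering concrete, here is a minimal Python sketch of how the three layers could compose. It is illustrative only and not the system's actual code: the `decide` function, the `Decision` type, and the idea that the LLM adjustment arrives as an additive `llm_delta` are all assumptions. Only the 0.64 baseline, the ±0.05 threshold band, the 0.3×–1.5× sizing range, and the 0.647 → 0.590 example come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    probability: float      # probability after the LLM adjustment
    threshold: float        # bullish threshold after the regime shift
    size_multiplier: float  # position sizing chosen by the regime layer
    trade: bool             # True only if no layer vetoed and prob clears the bar


def clamp(value, low, high):
    return max(low, min(high, value))


def decide(math_prob, regime_shift=0.0, size_multiplier=1.0, llm_delta=0.0,
           veto=False, base_threshold=0.64):
    """Run one bullish signal through the three layers in order."""
    # Layer 1: backtested baseline threshold (0.64 in the text).
    # Layer 2: the regime may move it by at most +/-0.05 and scale size 0.3x-1.5x.
    threshold = base_threshold + clamp(regime_shift, -0.05, 0.05)
    size = clamp(size_multiplier, 0.3, 1.5)
    # Layer 3: the LLM trims or boosts the probability; keep it inside [0, 1].
    prob = clamp(math_prob + llm_delta, 0.0, 1.0)
    trade = (not veto) and prob >= threshold
    return Decision(round(prob, 3), round(threshold, 3), size, trade)


# The example from the text: 0.647 trimmed to 0.590 ahead of FOMC.
decision = decide(math_prob=0.647, llm_delta=-0.057)
print(decision)
```

In this sketch the trimmed probability falls below the 0.64 bar, so the signal that would have traded on the math alone no longer does.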

The Signal Stack

26 features converge into one probability (13 core shown)

26 features, one probability

Every signal flows into the aggregator independently. No single feature dominates — the system weighs them in combination, recalibrated nightly by the self-learner.
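One common way to turn many features into a single probability is a weighted sum passed through a logistic function. The Python sketch below assumes that form; the page does not say how the aggregator is actually built. The feature names, weights, and the assumption that inputs are pre-normalized are all hypothetical.

```python
import math

# Hypothetical weights; the nightly learner would be what updates these.
WEIGHTS = {"vix_level": -0.8, "iv_hv_spread": 0.6, "rsi": 0.3}


def aggregate(features, weights, bias=0.0):
    """Combine normalized feature values into one probability in (0, 1)."""
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


neutral = aggregate({"vix_level": 0.0, "iv_hv_spread": 0.0, "rsi": 0.0}, WEIGHTS)
tilted = aggregate({"vix_level": -1.0, "iv_hv_spread": 0.5, "rsi": 0.2}, WEIGHTS)
print(round(neutral, 3), round(tilted, 3))
```

With every input at zero the output is exactly 0.5; no single feature decides the result, because each one only shifts the combined score.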

Output

0.64

Feature	What it measures
📈 VIX Level	Current volatility index, the market's fear gauge
📉 IV Rank	Implied vol percentile over 52 weeks
⚡ IV-HV Spread	Implied vs realized vol gap (over/underpriced options)
📊 Realized Vol 5d	Short-term actual price movement
📊 Realized Vol 20d	Medium-term actual price movement
🌡️ Vol Regime	Cheap · Fair · Expensive classification
🧲 GEX	Gamma exposure (dealer hedging pressure)
📐 Term Structure	VIX contango vs backwardation
💪 RSI	Relative strength momentum signal
📈 MACD	Trend direction and strength
⚖️ Put/Call Ratio	Market sentiment gauge
🗓️ Macro Events	FOMC, CPI, NFP, PCE calendar
📰 News Headlines	Real-time market news via IB + Brave Search

The Convergence

How credit spreads won

The system didn't start with a preference. It found one.

Early on, the system tried everything — iron condors, debit spreads, bull puts, bear calls. The nightly learner reviewed every outcome and recalibrated. Over hundreds of trades, the data was clear: credit spreads dominated. Debit strategies had negative expected value and were manually disabled in code after analysis confirmed what the learner was already showing — selling premium beats buying it.

Strategy	Status	Avg PnL	Win rate
Bull Put Spreads	Dominant	$11.82/trade	81.9%
Iron Condors	High reward, volatile	$16.03/trade	75.6%
Debit Spreads	Disabled in code	Negative EV	40.2%

SPX Options (Paper)

81.9%

Bull put spread win rate (104W-23L)

SPX alone: $2,915 profit — carries the entire system

BTC / Deribit (Testnet)

91.8% vs 40.2%

Credit vs debit win rate

Independent validation on uncorrelated asset

Two uncorrelated markets. Same conclusion: credit spreads dominate.

The Regime Map

10 markets, 10 playbooks

What works depends on where you are

The same strategy that prints money in low-vol contango will destroy you in fear momentum. The regime playbook classifies the current market into one of 10 environments and adjusts strategy, sizing, and thresholds accordingly.
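A regime playbook can be pictured as a classifier that produces a regime key, plus a lookup table keyed on it. The Python sketch below is an assumption-heavy illustration: the VIX cutoffs, the two playbook entries, and the fallback are invented. Only the key format (as in `vix_high__backwardation`), the 0.3×–1.5× sizing range, and the ±0.05 threshold band come from this page.

```python
def classify_regime(vix, vix3m, low_cut=15.0, high_cut=25.0):
    """Build a regime key from the VIX level and the VIX/VIX3M term structure.

    The cutoffs are hypothetical placeholders, not the system's values.
    """
    if vix < low_cut:
        level = "vix_low"
    elif vix < high_cut:
        level = "vix_mid"
    else:
        level = "vix_high"
    # Spot VIX above 3-month VIX means the curve is inverted (backwardation).
    structure = "backwardation" if vix > vix3m else "contango"
    return f"{level}__{structure}"


# Illustrative entries only; the real playbook covers 10 regimes.
PLAYBOOK = {
    "vix_low__contango": {
        "size": 1.5, "threshold_shift": -0.05,
        "allowed": {"bull_put", "bear_call", "iron_condor"},
    },
    "vix_high__backwardation": {
        "size": 0.3, "threshold_shift": 0.05,
        "allowed": {"bull_put"},
    },
}

# Unknown regime: sit out rather than guess.
DEFAULT_PLAY = {"size": 0.3, "threshold_shift": 0.05, "allowed": set()}


def playbook_for(regime):
    return PLAYBOOK.get(regime, DEFAULT_PLAY)


regime = classify_regime(vix=28.3, vix3m=26.0)
print(regime, playbook_for(regime))
```

Falling back to an empty set of allowed strategies for an unrecognized regime is a design choice in this sketch: an unmapped environment is treated as "sit out" rather than "trade with defaults".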

Legend: Full send · Cautious · Sit out / minimal

The News Gate

When the system chooses not to trade

Knowing when to sit out

On high-impact event days, the system blocks iron condors on SPX — the strategy most exposed to sudden moves. Directional spreads stay open because they benefit from the volatility.

Day	Status
Mon	✓ OPEN
Tue	✓ OPEN
Wed	🚫 FOMC (IC blocked)
Thu	✓ OPEN
Fri	✓ OPEN

Events that trigger the gate

FOMC · CPI · NFP · PCE · GDP

Knowing when NOT to trade is as important as knowing when to trade. The macro gate has been live since March 2026, blocking only the highest-risk strategy during the highest-risk moments.
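The gate described here is simple enough to sketch in a few lines of Python. The function name, the calendar shape (a dict from date to event names), the strategy labels, and the dates are assumptions for illustration; the dates are not a real event schedule.

```python
from datetime import date

# Event types that trigger the gate, as listed above.
HIGH_IMPACT = {"FOMC", "CPI", "NFP", "PCE", "GDP"}


def allowed_strategies(day, calendar, candidates):
    """Drop iron condors on high-impact event days; leave other days untouched."""
    events = calendar.get(day, set()) & HIGH_IMPACT
    if not events:
        return list(candidates)
    return [s for s in candidates if s != "iron_condor"]


# Hypothetical calendar entry, for illustration only.
event_day = date(2026, 4, 1)
normal_day = date(2026, 4, 2)
calendar = {event_day: {"FOMC"}}
candidates = ["bull_put", "bear_call", "iron_condor"]

print(allowed_strategies(event_day, calendar, candidates))
print(allowed_strategies(normal_day, calendar, candidates))
```

The gate filters one strategy rather than halting trading, which matches the text: directional spreads stay available on event days.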

The LLM Layer

AI reviewing AI

The math sees numbers. The LLM sees context.

Five times a day, the LLM reviews every signal, every headline, and every recent outcome — then decides if the math model's probability needs adjustment.

// What the LLM sees at decision time

Math probability: 0.647 (bullish)

VIX: 28.3 | IV Rank: 67% | Vol regime: expensive

GEX: -5.51 | Term structure: backwardation

Macro: FOMC in 2 days

News: "SPX drops 1.5% on oil spike"

Recent: 12W-0L today | Regime: vix_high__backwardation

LLM adjustment: 0.647 → 0.590 (trimmed, macro caution)

📉

Trim probability

Reduce confidence on macro risk

📈

Boost probability

Increase when signals align strongly

➡️

Pass through

Math model's call stands

⛔

Flag for skip

Override to no-trade on red flags

The LLM doesn't make the trade decision. It reviews the math model's decision with context the math can't see. It's a judgment layer, not a replacement.
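The four outcomes above (trim, boost, pass through, skip) can be sketched as a small function that applies an LLM review to the math model's probability. This is a hedged Python illustration, not the contents of `llm_override.js`: the review dictionary shape is hypothetical, and reading "30 min freshness" as "ignore reviews older than 30 minutes" is an assumption. The 0–1 clamp mirrors the fix for bug 9 in the audit table further down.

```python
MAX_REVIEW_AGE_S = 30 * 60  # assumed meaning of "30 min freshness"


def apply_override(math_prob, review, now_ts):
    """Return (probability, skip_flag) after applying an LLM review.

    review is a hypothetical dict: {"ts": seconds, "action": str, "adjusted_prob": float}
    """
    # No review, or a stale one: the math model's call stands.
    if review is None or now_ts - review["ts"] > MAX_REVIEW_AGE_S:
        return math_prob, False
    action = review["action"]
    if action == "skip":
        return math_prob, True
    if action == "pass":
        return math_prob, False
    # "trim" or "boost": take the adjusted value, clamped to a valid probability.
    adjusted = min(1.0, max(0.0, review["adjusted_prob"]))
    return adjusted, False


now = 1_000_000
trim = {"ts": now - 60, "action": "trim", "adjusted_prob": 0.590}
print(apply_override(0.647, trim, now))
```

Treating a stale review as "pass through" keeps the LLM in the role the text gives it: a judgment layer on top of the math, never a prerequisite for it.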

The Nightly Learner

A system that reviews its own homework

Every night at 10 PM, the system grades itself

The 10 PM cron reviews every trade from the day, recalibrates thresholds, and adjusts feature weights. After 20+ trades, it starts recommending optimizer threshold changes. The system gets sharper over time — not because I tune it, but because it tunes itself.
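As a rough picture of what "recommending threshold changes after 20+ trades" could look like, the Python sketch below sweeps candidate bullish thresholds over resolved trades and picks the one with the best realized win rate. This is one simple approach under stated assumptions, not the learner's actual method: the candidate grid, the `min_kept` sample floor, and the trade tuple shape are all hypothetical.

```python
def recommend_threshold(trades, current, min_trades=20, min_kept=10):
    """Suggest a bullish threshold from resolved trades.

    trades: list of (probability, won) pairs for resolved bullish trades.
    Returns the current threshold unchanged until enough trades exist.
    """
    if len(trades) < min_trades:
        return current  # not enough evidence yet
    best, best_rate = current, -1.0
    for step in range(50, 71):  # candidate thresholds 0.50 .. 0.70
        candidate = step / 100
        kept = [won for prob, won in trades if prob >= candidate]
        if len(kept) < min_kept:
            continue  # too few trades would survive this threshold
        rate = sum(kept) / len(kept)
        if rate > best_rate:  # strict: the lowest threshold wins ties
            best, best_rate = candidate, rate
    return best


# Synthetic data for illustration: higher-probability signals win more often.
history = [(0.66, True)] * 15 + [(0.56, True)] * 5 + [(0.56, False)] * 5
print(recommend_threshold(history, current=0.55))
```

The sample floor matters: without it, the sweep would happily pick a threshold that keeps two lucky trades and call it a 100% win rate.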

📊 Trade → ✓ Resolve → 🧠 Learn → ⚙️ Adjust

Parameter	Initial	Learned
Bullish threshold	0.55	0.64
Bearish threshold	0.45	0.37
IC IV threshold	50	37
Max contracts	1	2

Results

The scorecard

What the numbers look like

Win rate by entry window

Entry window	Note	Win rate
8:55 AM	Weakest (morning chop)	68%
10:05 AM	The sweet spot	85%
11:30 AM	Midday	73%
1:00 PM	Gamma acceleration	73%
2:30 PM	Late day edge	90%

Compact scorecard

Paper trades (resolved)	214 (169W-44L)
Paper win rate	79.3%
Backtest win rate	96.1% (1,299 trades)
Paper → Backtest gap	79% vs 96% (execution reality)
Best window	2:30 PM (90%) / 10:05 AM (86%)
Best strategy	Bull Put Spreads (81.9%)
SPX carries the system	$2,915 vs. $2,880 total PnL (other tickers net about -$35)
Risk of ruin ($5K)	0.11%
Median max drawdown	$963

By ticker

Ticker	Win Rate	Avg PnL	Share
SPX	71.4%	$46.27/trade	101%
QQQ	80.0%	$1.87/trade	3%
IWM	84.1%	~$0/trade	0%
SPY	83.6%	-$1.95/trade	—

Theory vs Reality

What happens when a model meets the market

The reality gap: 96% → 79%

The backtester says 96.1% win rate. Paper trading says 79.3%. That 17-point gap is the most honest number on this page — it's what execution reality does to theoretical models. Slippage, intraday stops, timing, and regime shifts all eat into backtest performance.

Backtest (365-day, hold-to-expiry) 96.1%

1,248W-51L · No intraday stops · No execution friction

Paper trading (live execution, real stops) 79.3%

169W-44L · Position monitor active · Real market hours

The gap IS the insight. Any backtest that reports 96%+ win rates on credit spreads is structurally correct — 2% OTM placement makes that nearly guaranteed. The real question is how much of that survives execution. In our case: ~79%. That's still profitable, but it's a different conversation than "98% win rate."

Macro impact

The event day tax

FOMC, CPI, PPI, PCE, NFP days are where the system bleeds. The macro gate now blocks iron condors on these days, but the data shows why it matters.

Normal days

+$3,812

131W-27L 82.9% WR

High-event days

-$933

38W-17L 69.1% WR

Without event days, paper PnL would be $3,812 instead of $2,880. The macro gate is the single biggest structural improvement since launch.
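The two buckets reconcile with the headline paper-trading numbers, which is worth checking explicitly. The short Python snippet below uses only the figures quoted above; the variable names are incidental.

```python
buckets = {
    "normal": {"wins": 131, "losses": 27, "pnl": 3812},
    "event": {"wins": 38, "losses": 17, "pnl": -933},
}


def win_rate(bucket):
    """Win rate in percent for one bucket of resolved trades."""
    return 100.0 * bucket["wins"] / (bucket["wins"] + bucket["losses"])


total_wins = sum(b["wins"] for b in buckets.values())
total_losses = sum(b["losses"] for b in buckets.values())
total_pnl = sum(b["pnl"] for b in buckets.values())

for name, bucket in buckets.items():
    print(f"{name}: {win_rate(bucket):.1f}% WR, ${bucket['pnl']:+,}")
print(f"combined: {total_wins}W-{total_losses}L, ${total_pnl:,}")
```

The combined record is 169W-44L and the combined PnL is $2,879, which matches the $2,880 headline figure to within rounding.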

What the model actually pays attention to

SHAP feature importance

Not all signals are equal. SHAP analysis on 1,278 trades reveals what drives win/loss predictions. VIX dominates everything — the market's fear gauge is the single most predictive feature.

VIX 0.832

VIX3M Ratio (term structure) 0.644

IV-HV Spread 0.601

Direction Probability 0.373

Realized Vol 5d 0.366

The top 3 features are all volatility-related. The system's edge is fundamentally a vol regime classifier — everything else is noise reduction.

What Went Wrong

The $5 lesson

The system looked invincible — until we changed one parameter. Narrowing the spread width from $10 to $5 turned a perfect record into a losing streak overnight.

Spread width	Record	PnL	Stop loss	Result
$10 wide	22W-0L	+$6,809	$350 (50% of $700)	Never triggered
$5 wide	3W-5L	-$704.50	$150 (50% of $300)	Triggered by normal vol

Root cause: The backtester showed 99.1% win rate but modeled hold-to-expiry. Reality has intraday volatility. A stop loss at 50% of max loss was too tight for narrower spreads — the position monitor killed trades that would have expired profitable.

The system's 97% structural win rate comes from 2% OTM placement, not the direction model. The model adds ~2 percentage points of alpha. Understanding that distinction is the difference between confidence and hubris.
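The stop-loss arithmetic behind the $5 lesson can be written out directly. The Python sketch below assumes the standard credit-spread relation (max loss = width × 100 − credit received) and a 100-share contract multiplier. The credits of $300 and $200 are not stated on this page; they are the values implied by the quoted max losses of $700 and $300.

```python
def spread_risk(width, credit, stop_fraction=0.5, multiplier=100):
    """Return (max_loss, stop_loss) in dollars for a one-lot credit spread.

    width: distance between strikes in index points
    credit: premium received in dollars (assumed, see lead-in)
    """
    max_loss = width * multiplier - credit
    stop_loss = stop_fraction * max_loss
    return max_loss, stop_loss


wide = spread_risk(width=10, credit=300)
narrow = spread_risk(width=5, credit=200)
print("wide:", wide)
print("narrow:", narrow)
```

The same 50% rule puts the stop $350 away on the wide spread but only $150 away on the narrow one, so a much smaller adverse move in the spread's price is enough to trigger it.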

Honest constraints

Paper trading — not live capital yet.
Fee drag matters — only SPX is live-viable. QQQ/IWM fees exceed 100% of credit at $1-wide.
Model contamination — underlying LLMs trained on post-period data.
97% is structural — 2% OTM placement, not model alpha.

What the system proves

Credit > debit is structural, not accidental.
Regime awareness adds measurable value.
Time-of-day matters — 8:55 AM is death, 10:05 AM is gold.
Self-learning works — thresholds drifted to better values on their own.

What's next

Live trading transition with real capital.
Position sizing optimization via Kelly criterion.
Multi-timeframe regime detection.
Per-regime logistic models instead of global weights.

Bottom line

The point is not the win rate. It's the convergence.

The system started with every strategy available. Through self-learning and regime awareness, it converged to selling premium via credit spreads. That same convergence happened independently on BTC/Deribit — two uncorrelated markets, same conclusion.

When two markets point the same direction through independent analysis, the signal isn't in any individual trade — it's in the pattern the system discovered on its own.

That's a more interesting use of AI in trading than "make the computer buy and sell faster."

Share-ready

"I built an AI options system with no strategy preference. After 1,500+ trades across backtest and paper, it converged on credit spreads. The same convergence happened independently on BTC/Deribit. Two uncorrelated markets. Same conclusion."

ankitchandola.com

"The most honest number on the page: 96% backtest win rate drops to 79% in paper trading. That 17-point gap is what execution reality does to theoretical models. Understanding the gap matters more than the win rate."

ankitchandola.com

"Event days cost us $933 while normal days made $3,812. The macro gate — blocking iron condors on FOMC/CPI days — is the single biggest structural improvement. Knowing when NOT to trade is the edge."

ankitchandola.com

Deep Dive

Architecture, data sources, bugs, model hierarchy, and the full hardening path.

System architecture

// Signal flow

Yahoo Finance + ThetaData + Polygon → signal_aggregator.js

IB Gateway + Brave Search → macro_context.js + ib_news.js

All 13 signals → strategy_engine.js → direction + strategy selection

Strategy → llm_override.js → probability adjustment

Final decision → executor.js → IB Gateway (paper orders)

// Learning loop

10 PM → trade_resolver.js → self_learner.js → threshold updates

Model hierarchy

Model	Role	Responsibilities
Opus	Principal	Conversation, QC, 10 PM learner, judgment calls
Sonnet	Workhorse	Intraday crons, LLM review, trade execution, Discord posts
Codex 5.3	Coder	Feature development only; never runs crons or analysis

Data sources

Source	Used for
Yahoo Finance	SPX price, VIX, basic signals
ThetaData	Options EOD, strikes, expirations
Polygon.io	SPX options chain, economic calendar
IB Gateway	Live quotes, news, order execution
Brave Search	Real-time macro context, news
Internal	Paper trade log, calibration state

Daily cron schedule (Mon–Fri)

Time (CT)	Job
9:35 AM	Intraday scan #1
11:00 AM	Intraday scan #2
12:30 PM	Intraday scan #3
2:00 PM	Intraday scan #4
3:30 PM	Intraday scan #5
10:00 PM	Nightly learner

10 bugs found in the initial audit

From a double Codex audit + Opus deep QC session. All fixed before paper trading began.

#	Bug	Cause	Fix
1	Iron butterfly unreachable	Precedence bug — checked >60 after >30	Reordered strategy precedence
2	Yahoo Finance empty	No User-Agent header	Added Chrome UA header
3	Brave Search 422	Missing Accept header + invalid freshness	Fixed request headers
4	Polygon API key missing	MASSIVE_API_KEY alias needed	Added env alias mapping
5	Self-learning mismatch	Thresholds 0.65/0.35 vs live 0.55/0.45	Synced calibration values
6	Missing holiday	Juneteenth 2026 not in NYSE calendar	Added to static calendar
7	Preflight crash	Guards crash on missing files	Made fail-soft with defaults
8	Learning loop broken	Daily review not linked to optimizer	Connected pipeline end-to-end
9	LLM override unclamped	adjustedProbUp could exceed 1.0	Added 0–1 clamping
10	Discord formatting	ERROR messages incomplete	Full error context in output
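Bug 1 is a classic ordering mistake and is easy to reproduce. The Python sketch below assumes the `>60` and `>30` checks were on IV rank and that the looser branch selected an iron condor; the table does not say either, so both function names and the strategy labels are hypothetical.

```python
def pick_strategy_buggy(iv_rank):
    """Original ordering: the looser check runs first and shadows the stricter one."""
    if iv_rank > 30:
        return "iron_condor"
    if iv_rank > 60:  # unreachable: anything above 60 already matched above
        return "iron_butterfly"
    return "no_trade"


def pick_strategy_fixed(iv_rank):
    """Reordered precedence: test the strictest condition first."""
    if iv_rank > 60:
        return "iron_butterfly"
    if iv_rank > 30:
        return "iron_condor"
    return "no_trade"


for rank in (20, 45, 70):
    print(rank, pick_strategy_buggy(rank), pick_strategy_fixed(rank))
```

Only the input above 60 differs between the two versions, which is why a bug like this can sit unnoticed until someone asks why one strategy never fires.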

Structural limitation: The underlying LLMs were trained on data that includes post-period market outcomes. Latent hindsight contamination remains a structural issue. This is a proof-of-concept for AI-directed options trading, not a validated alpha generator. The 97% structural win rate comes from OTM placement — the model adds marginal alpha on top. I state this because intellectual honesty matters more than impressive numbers.
