Multi-Agent Intelligence
A new way to analyze a business: let 100 agents debate until the signal appears.
Not a faster memo. A different analytical system — one where many perspectives start with data, argue across rounds, and either converge on fragility or preserve real disagreement.
The key layperson takeaway: when agents with opposing incentives — a dedicated subscriber and a short seller — still turn bearish after five rounds of debate, that convergence under adversarial pressure is itself the signal. It suggests the business narrative requires too many things to stay true at once.
The case study
I tested the system on Peloton as of mid-2021, with Lululemon as a control. Peloton later unraveled. Lululemon did not. The value of the experiment is that the panel treated them differently before that difference was fully obvious in the market.
Peloton
Avg confidence: 8.9 / 10 · 36 agents at max conviction
Lululemon control
Avg confidence: 8.12 / 10 · only 2 agents at 10/10
How structured adversarial debate produces signal
Structured social debate, not just a bigger panel
The core idea: spawn many agents, each starting with real data and a distinct incentive structure. Then force them to argue over multiple rounds. Agents must engage with counterarguments, update their confidence, and justify every shift. The output isn't a single opinion — it's a convergence map that reveals whether fragility is real or just noise.
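The loop described above can be sketched in a few lines. This is a toy skeleton, not the actual system: the LLM call is replaced by a simple rule where an agent flips only under strong opposing peer pressure, and all field names are illustrative.

```python
import random

def run_debate(agents, rounds=5, seed=0):
    """Toy skeleton of the debate loop. Each round, every agent samples
    peers' current stances and may flip under strong opposing pressure.
    Stances are -1 (bearish) / +1 (bullish); confidence is 1-10."""
    rng = random.Random(seed)
    for _ in range(rounds):
        snapshot = [a["stance"] for a in agents]          # freeze round state
        for a in agents:
            peers = rng.sample(snapshot, k=min(5, len(snapshot)))
            majority = sum(peers)
            if majority * a["stance"] < 0 and abs(majority) >= 3:
                a["stance"] = -a["stance"]                # forced engagement: flip
                a["confidence"] = max(1, a["confidence"] - 2)
            else:
                a["confidence"] = min(10, a["confidence"] + 1)
    return agents
```

The final `agents` list is the raw material for the convergence map: what matters is not any single stance but the shape of the distribution after repeated pressure.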
Diverse starting positions
Each agent gets a distinct persona with built-in incentives — some want the bull case, others are structurally skeptical. No one starts neutral.
Shared evidence, rotated views
All agents draw from the same pre-cutoff data, but see partially rotated subsets. 60% core evidence stays constant; 40% rotates per agent and round.
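One way to implement that deterministic 60/40 split is to key the rotation off a hash of the agent ID and round number, so the same agent in the same round always sees the same pack. A minimal sketch, with hypothetical function and variable names:

```python
import hashlib

def evidence_pack(sources, agent_id, rnd, core_frac=0.6):
    """Deterministic 60/40 evidence split: the first 60% of sources
    ("core") is identical for every agent and round; the remaining 40%
    is rotated by a hash of (agent_id, round), so each agent gets a
    stable but distinct view. Sketch only, not the production schema."""
    n_core = int(len(sources) * core_frac)
    core, pool = sources[:n_core], sources[n_core:]
    if not pool:
        return core
    offset = int(hashlib.sha256(f"{agent_id}:{rnd}".encode()).hexdigest(), 16) % len(pool)
    rotated = pool[offset:] + pool[:offset]
    return core + rotated[: max(1, len(pool) // 2)]
```

With ten candidate sources this yields packs of eight, consistent with the 5-8 sources per agent described below; rerunning with the same `(agent_id, rnd)` always reproduces the same pack.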
Adversarial rounds
Over 5 rounds, agents see competing arguments and must update. They can't just restate — evolution rules force genuine engagement with counterpoints.
Convergence as signal
If agents with opposing incentives converge, that's meaningful. If they stay split despite pressure, that's meaningful too. Both outcomes are informative.
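Convergence can be scored directly from the final stance counts. A simple choice, assumed here rather than taken from the actual system, is one minus the normalized entropy of the stance distribution: 1.0 means total agreement, 0.0 an even split. Applied to the round-5 counts reported later in this piece (91 bearish / 2 bullish for Peloton, 47 / 46 for Lululemon, remainder neutral):

```python
from math import log2

def convergence(bear, bull, neutral=0):
    """Convergence score in [0, 1]: 1 minus the normalized entropy of the
    final stance distribution. 1.0 = unanimity, 0.0 = an even 3-way split."""
    total = bear + bull + neutral
    probs = [c / total for c in (bear, bull, neutral) if c > 0]
    entropy = -sum(p * log2(p) for p in probs)
    return 1 - entropy / log2(3)

peloton = convergence(91, 2, 7)   # strong convergence
lulu = convergence(47, 46, 7)     # genuinely split
assert peloton > lulu
```

The point of a scalar like this is comparability: the same metric, run on two companies through the same pipeline, makes "one converged, one didn't" a number rather than an impression.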
How one agent experiences the debate
Persona assigned
Incentive + role
Evidence pack
5–8 pre-cutoff sources
Independent thesis
Round 1 stance
Debate rounds 2–4
Counterarguments + updates
Final commitment
Stance + confidence
100 agents, 8 archetypes, each with distinct incentives and data focus
Emotionally invested in the product and community. Wants Peloton to succeed.
Data focus: Personal usage data, community engagement, product satisfaction, habit strength
Bought in during COVID lockdowns. Asking: was this a habit or a phase?
Data focus: Subscription renewal patterns, gym reopening data, usage frequency trends
Already drifting away. Looking for reasons to stay — or permission to leave.
Data focus: Cancellation rates, competitive alternatives, price sensitivity signals
Evaluating whether the premium price point justifies itself vs alternatives.
Data focus: Product comparisons, pricing data, consumer reviews, value perception
Sees Peloton as a competitor. Financially motivated to find weakness.
Data focus: Market share data, reopening trends, consumer behavior shifts post-lockdown
Needs a defensible rating. Career risk from being wrong in either direction.
Data focus: Financial statements, TAM models, comparable company analysis, guidance
Looking for the thesis that breaks. Financially motivated to find fragility.
Data focus: Cash flow analysis, insider selling, competitive dynamics, overvaluation signals
Needs a compelling narrative. Drawn to both hype and controversy equally.
Data focus: Product launches, executive statements, consumer sentiment, cultural trends
Panel composition
10×10 map
Why the mix matters
The biggest cohorts were Pandemic Converts and Sell-Side Analysts. That was intentional: one group pressure-tests demand durability, the other pressure-tests the narrative. I used the same mix on both companies, so the difference in output comes from the business and evidence — not from changing the room.
What each agent received per turn
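Pulling the lifecycle stages together, the per-turn payload can be sketched as a small record. The field names below are illustrative, inferred from the pipeline description above rather than taken from the actual prompt schema:

```python
from dataclasses import dataclass, field

@dataclass
class AgentTurn:
    """Sketch of what one agent receives per turn, per the lifecycle above.
    Field names are hypothetical, not the production schema."""
    agent_id: int
    round_no: int                 # 1 = independent thesis, 5 = final commitment
    persona: str                  # archetype: incentives + role
    evidence: list                # 5-8 pre-cutoff source excerpts (60/40 rotated)
    prior_state: str = ""         # compressed summary of the agent's own history
    counterarguments: list = field(default_factory=list)  # peers' points, rounds 2+
```

Keeping `prior_state` as a compressed summary rather than a full transcript is what the later changelog calls "prior-state compression": enough memory to stay consistent, small enough to fit the context window.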
The verdict
Peloton converged. Lululemon stayed split.
The same system, same archetypes, same process. One turned into a wall of red. The other preserved genuine disagreement.
Confidence: 8.9/10 avg · 36 agents at maximum conviction
Confidence: 8.12/10 avg · only 2 agents at 10/10
Peloton final map
10×10 map · Each cell = one agent's final stance.
Lululemon final map
10×10 map · The control stays genuinely mixed.
Convergence flow
One converged. The other didn't.
Peloton converged twice. The control stayed split. That is the more faithful picture of what was actually run.
Round by round
How the debate unfolded
Independent orientation
No social signal yet. Agents form views from evidence + persona alone. Already ~60% bearish on Peloton — the evidence pack does the work before any debate begins.
~60–65% bearish
~50/50 split
Bearish % at round 1
Structural health comparison
Where the two businesses differed
Not whether one ticker went up or down — whether the business was fragile across multiple load-bearing dimensions.
Why this matters
A new paradigm for business analysis
Traditional diligence: a small team, finite calls, one memo under time pressure. The output is useful but narrow — a few people, one narrative, one recommendation.
This tests a different model. Instead of one analyst, create a structured room of 100 synthetic stakeholders with different incentives and let them argue. The output isn't a memo — it's a convergence signal, a fragility map, and a record of where disagreement held or broke.
When a dedicated subscriber and a short seller still turn bearish after five rounds of debate, that convergence under adversarial pressure is itself the signal: the business narrative requires too many things to stay true at once.
Systems like this don't replace investors. They change the shape of good judgment. The job becomes directing attention, validating reality, and knowing which arguments matter — while the system does the parallel stress-testing underneath.
Old model vs. new model
| | Traditional | Synthetic panel |
|---|---|---|
| Perspectives | 2–5 analysts | 100 agents, 8 archetypes |
| Process | Sequential drafts | 5 adversarial rounds |
| Output | Narrative | Convergence signal + fragility map |
| Time | Days–weeks | Hours |
What happened in reality
Peloton's narrative broke. Demand softened, margins compressed, credibility eroded — the stock fell ~95% from peak. Lululemon kept compounding through the same environment.
Why Peloton converged
- Demand durability: pandemic pull-forward, not a durable mass-market habit.
- Unit economics: premium hardware + weakening cohorts is a bad combination.
- Management credibility: narrative required trust; trust weakened fast.
Why the control matters
- Brand power: pricing power and community stayed real.
- International runway: durable growth vectors beyond the pandemic.
- No collapse: the system stayed split — it's not a doom machine.
Scope & limitations
- This is not proof that AI cleanly predicted Peloton.
- Models contain post-2021 world knowledge at the weight level — contamination is structural.
- A proof-of-concept for fragility mapping, not validated forecasting.
Reproducibility
Peloton converged in both runs
Not proof of calibration. But harder to dismiss as one lucky prompt artifact.
Compact scorecard
| Metric | Peloton | Lululemon |
|---|---|---|
| Bearish (R5) | 91 | 47 |
| Bullish (R5) | 2 | 46 |
| Avg confidence | 8.9 / 10 | 8.12 / 10 |
| Max conviction | 36 agents | 2 agents |
| Consensus | Strong convergence | Split |
Bottom line
The point is not prediction. It is attention direction.
A structured adversarial panel can push attention toward the right failure modes earlier than a static memo often does. For Peloton, the simulation repeatedly surfaced a business whose story required too many things to stay true at once. For Lululemon, it surfaced real debate without the same multi-vector collapse.
That is why I think convergence under adversarial pressure matters. When agents with opposing incentives still end up in the same place after repeated debate, the disagreement itself has been stress-tested away. What remains is often the load-bearing weakness in the business.
That is a much more interesting use of AI in diligence than "write me a faster memo."
Social block 1
"What happens when 100 AI analysts with competing incentives debate a company's future? For Peloton, they converged on fragility. For the control, they stayed genuinely split. The signal was in the difference."
ankitchandola.com
Social block 2
"I didn't build this to predict stock prices. I built it to find where a business narrative requires too many things to stay true at once. That's a different — and more useful — question."
ankitchandola.com
Social block 3
"91 out of 100 synthetic analysts turned bearish on Peloton. The same system stayed split on Lululemon. Not prediction. Fragility mapping. The most useful AI in analysis isn't the one that writes a faster memo — it's the one that tells you which questions your team isn't asking."
ankitchandola.com
Social block 4
"The future of business analysis: not one analyst writing a memo, but 100 perspectives arguing across 5 rounds until fragility either surfaces or doesn't. The convergence is the signal."
ankitchandola.com
Methodology, model choices, hardening path, and full limitations.
▸ Full methodology and limitations
Why Peloton, and why this way
I chose Peloton because the eventual breakdown was not a simple fraud story — it was a multi-stakeholder collapse across demand durability, unit economics, narrative fragility, operating execution, and management credibility. That makes it a better probe than a cleaner accounting problem.
I paired it with Lululemon because I needed a control in adjacent consumer/wellness territory that did not rely on the same hardware-plus-subscription narrative. If both had collapsed into the same profile, the experiment would have been useless.
| Parameter | Setting |
|---|---|
| Time gate | All sources published on or before 2021-06-30 |
| Control | Lululemon through identical framework |
| Framing | Fragility mapping, not prediction |
| Disclosure | Model weights contain post-cutoff knowledge |
| Runs | 2× Peloton scale + 1× Lululemon control |
How the experiment evolved
Pilots (v1.0)
Small live runs to validate architecture: coherent outputs, citation behavior, actual view evolution.
Hardening (v1.1)
Citation hygiene, deterministic source rotation, prior-state compression, evolution rules, sharper archetype prompts.
Control validation
Clean Lululemon control was the key checkpoint — if it had collapsed too, the framework would have been a doom machine.
Scale runs
Peloton twice at 100 agents, Lululemon once. Repeat confirmed core signal lived in the evidence pack.
Hardening changelog
| Area | v1.0 | v1.1 fix |
|---|---|---|
| Citations | Formatting artifacts triggered false flags | Strict source-ID normalization |
| Source assignment | Random allocation | Deterministic rotation (60/40) |
| Memory | Agents faked consistency | Richer prior-state compression |
| Reasoning | Restatement instead of updating | Stronger evolution rules |
| Personas | Consumer types sounded similar | Sharper archetype prompts |
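The "stronger evolution rules" row above can be made concrete with a toy validator: an update is accepted only if it cites at least one counterargument the agent was actually shown, and any confidence shift comes with a stated reason. Field names are hypothetical:

```python
def valid_update(reply, counter_ids, prev_conf):
    """Toy evolution-rule check: reject restatement. The reply must cite
    at least one counterargument ID it was shown, and any confidence
    change must carry a justification. Sketch, not the real validator."""
    cites = any(cid in reply["text"] for cid in counter_ids)
    shifted = reply["confidence"] != prev_conf
    justified = (not shifted) or bool(reply.get("shift_reason"))
    return cites and justified
```

A gate like this is what separates genuine engagement from agents politely restating round-1 positions, which the v1.0 pilots showed was the default failure mode.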
Models and runtime
Model
gemini-3-flash-preview
Temperature
0.4
Agents
100
Rounds
5
Cutoff
2021-06-30
Social signal
Round 4 only
Peloton runs
2
Lulu runs
1
Key lessons
The control matters as much as the flagship
Peloton alone = dramatic story. Peloton + a control that stayed split = methodology worth taking seriously.
Process discipline beats agent count
Citation hygiene, deterministic rotation, and prior-state compression mattered more than swarm size.
Best used for attention direction
The win: "the system kept surfacing the same load-bearing fragilities that later mattered."
Structural limitation: Even with a clean prompt-level cutoff, the underlying models were trained after 2021. Latent hindsight contamination remains a structural issue. This is a strong proof-of-concept for fragility mapping, not a fully validated forecasting system. I state this not as a disclaimer, but because getting this right matters for the methodology to be taken seriously.