Multi-Agent Intelligence

A new way to analyze a business: let 100 agents debate until the signal appears.

Not a faster memo. A different analytical system — one where many perspectives start with data, argue across rounds, and either converge on fragility or preserve real disagreement.

The key layperson takeaway: when agents with opposing incentives — a dedicated subscriber and a short seller — still turn bearish after five rounds of debate, that convergence under adversarial pressure is itself the signal. It suggests the business narrative requires too many things to stay true at once.

The case study

I tested the system on Peloton as of mid-2021, with Lululemon as a control. Peloton later unraveled. Lululemon did not. The value of the experiment is that the panel treated them differently before that difference was fully obvious in the market.

Peloton

91% bearish at round 5

Avg confidence: 8.9 / 10 · 36 agents at max conviction

Lululemon control

47 / 46 bearish / bullish split

Avg confidence: 8.12 / 10 · only 2 agents at 10/10

⚠️ Data cutoff: June 30, 2021 · Models carry post-2021 weight knowledge — I treat this as fragility mapping, not clean prediction
The Method

How structured adversarial debate produces signal

Structured social debate, not just a bigger panel

The core idea: spawn many agents, each starting with real data and a distinct incentive structure. Then force them to argue over multiple rounds. Agents must engage with counterarguments, update their confidence, and justify every shift. The output isn't a single opinion — it's a convergence map that reveals whether fragility is real or just noise.

🧬 Diverse starting positions

Each agent gets a distinct persona with built-in incentives — some want the bull case, others are structurally skeptical. No one starts neutral.

📊 Shared evidence, rotated views

All agents draw from the same pre-cutoff data, but see partially rotated subsets. 60% core evidence stays constant; 40% rotates per agent and round.
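The 60/40 split might be implemented along these lines. This is a sketch of what "deterministic rotation" could look like, not the actual implementation: the hashing scheme, function name, and slice sizes are my assumptions.

```python
import hashlib

def evidence_subset(sources, agent_id, round_no, core_frac=0.6):
    """Deterministically pick an agent's evidence pack: a fixed core (~60%)
    plus a rotating slice drawn from the remaining ~40%, keyed on
    (agent_id, round_no). Illustrative sketch, not the real system."""
    n_core = int(len(sources) * core_frac)
    core, pool = sources[:n_core], sources[n_core:]
    # Hash the key so the same agent/round always gets the same rotation,
    # with no RNG state to manage across runs.
    key = int(hashlib.sha256(f"{agent_id}:{round_no}".encode()).hexdigest(), 16)
    offset = key % len(pool)
    rotating = (pool[offset:] + pool[:offset])[: max(1, len(pool) // 2)]
    return core + rotating
```

Because the subset is a pure function of agent and round, a run can be replayed exactly, which matters for the replication discussed later.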

⚔️ Adversarial rounds

Over 5 rounds, agents see competing arguments and must update. They can't just restate — evolution rules force genuine engagement with counterpoints.

🔍 Convergence as signal

If agents with opposing incentives converge, that's meaningful. If they stay split despite pressure, that's meaningful too. Both outcomes are informative.
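The loop these four principles describe can be sketched as follows. The names (`Agent`, `debate`) and the shape of the update step are illustrative; in the real system the update is an LLM call governed by the evolution rules, not the no-op shown here.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Agent:
    """One panel member: a persona with a built-in incentive and a current view."""
    persona: str
    stance: str        # "bearish" | "bullish" | "conflicted" | "neutral"
    confidence: float  # 1-10 self-reported conviction

    def update(self, counterarguments):
        # Stand-in for an LLM call: the real system requires the agent to
        # engage each counterargument and justify any confidence shift.
        pass

def debate(agents, rounds=5):
    """Run the adversarial rounds; return a per-round map of stance counts."""
    history = []
    for _ in range(rounds):
        # Every agent sees the competing arguments from the rest of the panel.
        counterarguments = [(a.persona, a.stance) for a in agents]
        for agent in agents:
            agent.update(counterarguments)
        history.append(Counter(a.stance for a in agents))
    return history
```

The output is exactly the "convergence map" described above: one stance distribution per round, so you can see whether disagreement collapses or holds.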

How one agent experiences the debate

1. Persona assigned: incentive + role
2. Evidence pack: 5–8 pre-cutoff sources
3. Independent thesis: round 1 stance
4. Debate rounds 2–4: counterarguments + updates
5. Final commitment: stance + confidence

The Panel

100 agents, 8 archetypes, each with distinct incentives and data focus

Dedicated Subscriber · 16 agents

Emotionally invested in the product and community. Wants Peloton to succeed.

Data focus: Personal usage data, community engagement, product satisfaction, habit strength

Pandemic Convert · 20 agents

Bought in during COVID lockdowns. Asking: was this a habit or a phase?

Data focus: Subscription renewal patterns, gym reopening data, usage frequency trends

Lapsed / At-Risk User · 8 agents

Already drifting away. Looking for reasons to stay — or permission to leave.

Data focus: Cancellation rates, competitive alternatives, price sensitivity signals

Prospective Buyer · 8 agents

Evaluating whether the premium price point justifies itself vs alternatives.

Data focus: Product comparisons, pricing data, consumer reviews, value perception

Gym Operator / Competitor · 8 agents

Sees Peloton as a competitor. Financially motivated to find weakness.

Data focus: Market share data, reopening trends, consumer behavior shifts post-lockdown

Sell-Side Analyst · 20 agents

Needs a defensible rating. Career risk from being wrong in either direction.

Data focus: Financial statements, TAM models, comparable company analysis, guidance

Short Seller / Skeptic · 10 agents

Looking for the thesis that breaks. Financially motivated to find fragility.

Data focus: Cash flow analysis, insider selling, competitive dynamics, overvaluation signals

Consumer-Tech Journalist · 10 agents

Needs a compelling narrative. Drawn to both hype and controversy equally.

Data focus: Product launches, executive statements, consumer sentiment, cultural trends

Panel composition

[10×10 agent grid: one cell per agent, keyed A1–A8 by archetype]

Why the mix matters

The biggest cohorts were Pandemic Converts and Sell-Side Analysts. That was intentional: one group pressure-tests demand durability, the other pressure-tests the narrative. I used the same mix on both companies, so the difference in output comes from the business and evidence — not from changing the room.

What each agent received per turn

Persona prompt: Stable role + incentive structure that persists across all rounds
Evidence subset: 5–8 pre-cutoff sources, partially rotated (60% core / 40% rotating)
Prior-state memory: Structured compression of previous stance, confidence, evidence used, and open questions
Panel summary: Anonymized group-level summary — injected in round 4 only
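The four inputs above can be assembled into a single per-turn context roughly like this. The function and section labels are hypothetical; the sketch only mirrors the stated rules, including the round-4-only injection of the panel summary.

```python
def build_turn_context(persona_prompt, evidence, prior_state, panel_summary, round_no):
    """Assemble one agent's prompt for a turn from the four per-turn inputs.
    Illustrative sketch; section names are invented for clarity."""
    sections = [
        f"ROLE:\n{persona_prompt}",
        "EVIDENCE (pre-cutoff sources):\n" + "\n".join(f"- {s}" for s in evidence),
    ]
    if prior_state:
        # Compressed record of previous stance, confidence, evidence used,
        # and open questions ("prior-state memory").
        sections.append(f"YOUR PRIOR STATE:\n{prior_state}")
    if round_no == 4 and panel_summary:
        # Anonymized group-level signal, injected in round 4 only.
        sections.append(f"PANEL SUMMARY (anonymized):\n{panel_summary}")
    return "\n\n".join(sections)
```

Keeping the social signal out of rounds 1–3 is what makes the round-1 distribution a genuinely independent baseline.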
Results

The verdict

Peloton converged. Lululemon stayed split.

The same system, same archetypes, same process. One turned into a wall of red. The other preserved genuine disagreement.

Peloton — round 5 stance distribution: 91 bearish · 2 bullish · 3 conflicted · 4 neutral

Confidence: 8.9/10 avg · 36 agents at maximum conviction

Lululemon — round 5 stance distribution: 47 bearish · 46 bullish · 5 conflicted · 2 neutral

Confidence: 8.12/10 avg · only 2 agents at 10/10
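The summary statistics above (stance distribution, average confidence, agents at max conviction) reduce from per-agent final commitments in an obvious way. A sketch, with the input shape assumed as (stance, confidence) pairs:

```python
from collections import Counter

def summarize(final_commitments):
    """Reduce per-agent (stance, confidence) pairs to the reported stats:
    stance distribution, average confidence, count at maximum conviction."""
    stances = Counter(s for s, _ in final_commitments)
    confidences = [c for _, c in final_commitments]
    return {
        "distribution": dict(stances),
        "avg_confidence": round(sum(confidences) / len(confidences), 2),
        "max_conviction": sum(1 for c in confidences if c == 10),
    }
```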

Peloton final map

[10×10 grid: one cell per agent, colored by final stance (bearish / bullish / conflicted / neutral)]

Lululemon final map

[10×10 grid, same legend. The control stays genuinely mixed.]

Convergence flow

One converged. The other didn't.

Peloton converged twice. The control stayed split. That is the more faithful picture of what was actually run.

Bearish share by round:

Round | Peloton run 1 | Peloton run 2 (R4–R5 replication)
R1 | 62% | 34%
R2 | 73% | 71%
R3 | 82% | 84%
R4 | 88% | 87%
R5 | 91% | 91%

The control stayed split throughout.
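One way to turn the round-by-round trajectory from the chart above into a "converged vs stayed split" label. The function and threshold are illustrative choices, not part of the original methodology; the data is Peloton run 1's bearish share per round.

```python
def converged(bearish_pct_by_round, threshold=85):
    """Label a run 'converged' if the bearish share rises round over round
    and clears the threshold by the final round. Threshold is an assumption."""
    rising = all(b >= a for a, b in zip(bearish_pct_by_round, bearish_pct_by_round[1:]))
    return rising and bearish_pct_by_round[-1] >= threshold

# Peloton run 1, bearish % per round, from the chart above.
peloton_run1 = [62, 73, 82, 88, 91]
```

Applied to the control, which ended near 47% bearish, the same rule returns False: both outcomes are informative, as the method section argues.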

Round by round

How the debate unfolded

Round 1 · Independent orientation

No social signal yet. Agents form views from evidence + persona alone. Already ~60% bearish on Peloton — the evidence pack does the work before any debate begins.

Peloton

~60–65% bearish

Lululemon

~50/50 split

Bearish % at round 1: Peloton 62% · control bearish 51% · control bullish 45%

Structural health comparison

Where the two businesses differed

Not whether one ticker went up or down — whether the business was fragile across multiple load-bearing dimensions.

Dimension | Question | Peloton | Lululemon
Demand durability | Is the core demand pull-forward or structural? | Critical | Non-issue
Unit economics | Hardware + subscription margin trajectory | Critical | Non-issue
Narrative fragility | How many things must stay true for the story to hold? | Critical | Non-issue
Competitive moat | Defensibility against well-funded alternatives | Severe | Minor
Operational execution | Supply chain, logistics, cost structure discipline | Critical | Minor
Customer sentiment | NPS trajectory and churn signals | Moderate-Severe | Non-issue
Management credibility | Trust in leadership to navigate transition | Critical | Non-issue

Why this matters

A new paradigm for business analysis

Traditional diligence: a small team, finite calls, one memo under time pressure. The output is useful but narrow — a few people, one narrative, one recommendation.

This tests a different model. Instead of one analyst, create a structured room of 100 synthetic stakeholders with different incentives and let them argue. The output isn't a memo — it's a convergence signal, a fragility map, and a record of where disagreement held or broke.

That is the key layperson takeaway: when agents with opposing incentives — a dedicated subscriber and a short seller — still turn bearish after five rounds of debate, that convergence under adversarial pressure is itself the signal. It suggests the business narrative requires too many things to stay true at once.

Systems like this don't replace investors. They change the shape of good judgment. The job becomes directing attention, validating reality, and knowing which arguments matter — while the system does the parallel stress-testing underneath.

Old model vs. new model

Dimension | Traditional | Synthetic panel
Perspectives | 2–5 analysts | 100 agents, 8 archetypes
Process | Sequential drafts | 5 adversarial rounds
Output | Narrative | Convergence signal + fragility map
Time | Days–weeks | Hours

What happened in reality

Peloton's narrative broke. Demand softened, margins compressed, credibility eroded — the stock fell ~95% from peak. Lululemon kept compounding through the same environment.

Why Peloton converged

  • Demand durability: pandemic pull-forward, not a durable mass-market habit.
  • Unit economics: premium hardware + weakening cohorts is a bad combination.
  • Management credibility: narrative required trust; trust weakened fast.

Why the control matters

  • Brand power: pricing power and community stayed real.
  • International runway: durable growth vectors beyond the pandemic.
  • No collapse: the system stayed split — it's not a doom machine.

Scope & limitations

  • This is not proof that AI cleanly predicted Peloton.
  • Models contain post-2021 world knowledge at the weight level — contamination is structural.
  • A proof-of-concept for fragility mapping, not validated forecasting.

Reproducibility

Peloton did it twice

> RUN 2 / same setup / tightened round-4 framing
> Round 4: 84 / 100 bearish
> Round 5: 91 / 100 bearish
> The evidence pack, not just phrasing, drove convergence

Not proof of calibration. But harder to dismiss as one lucky prompt artifact.

Compact scorecard

Metric | Peloton | Lululemon
Bearish (R5) | 91 | 47
Bullish (R5) | 2 | 46
Avg confidence | 8.9 / 10 | 8.12 / 10
Max conviction | 36 agents | 2 agents
Consensus | Strong convergence | Split

Bottom line

The point is not prediction. It is attention direction.

A structured adversarial panel can push attention toward the right failure modes earlier than a static memo often does. For Peloton, the simulation repeatedly surfaced a business whose story required too many things to stay true at once. For Lululemon, it surfaced real debate without the same multi-vector collapse.

That is why I think convergence under adversarial pressure matters. When agents with opposing incentives still end up in the same place after repeated debate, the disagreement itself has been stress-tested away. What remains is often the load-bearing weakness in the business.

That is a much more interesting use of AI in diligence than "write me a faster memo."

Share-ready

Social block 1

"What happens when 100 AI analysts with competing incentives debate a company's future? For Peloton, they converged on fragility. For the control, they stayed genuinely split. The signal was in the difference."

ankitchandola.com

Social block 2

"I didn't build this to predict stock prices. I built it to find where a business narrative requires too many things to stay true at once. That's a different — and more useful — question."


Social block 3

"91 out of 100 synthetic analysts turned bearish on Peloton. The same system stayed split on Lululemon. Not prediction. Fragility mapping. The most useful AI in analysis isn't the one that writes a faster memo — it's the one that tells you which questions your team isn't asking."


Social block 4

"The future of business analysis: not one analyst writing a memo, but 100 perspectives arguing across 5 rounds until fragility either surfaces or doesn't. The convergence is the signal."


Deep Dive

Methodology, model choices, hardening path, and full limitations.

▸ Full methodology and limitations

Why Peloton, and why this way

I chose Peloton because the eventual breakdown was not a simple fraud story — it was a multi-stakeholder collapse across demand durability, unit economics, narrative fragility, operating execution, and management credibility. That makes it a better probe than a cleaner accounting problem.

I paired it with Lululemon because I needed a control in adjacent consumer/wellness territory that did not rely on the same hardware-plus-subscription narrative. If both had collapsed into the same profile, the experiment would have been useless.

Time gate: all sources published on or before 2021-06-30
Control: Lululemon run through the identical framework
Framing: fragility mapping, not prediction
Disclosure: model weights contain post-cutoff knowledge
Runs: 2× Peloton at scale + 1× Lululemon control

How the experiment evolved

1. Pilots (v1.0): small live runs to validate the architecture (coherent outputs, citation behavior, actual view evolution).
2. Hardening (v1.1): citation hygiene, deterministic source rotation, prior-state compression, evolution rules, sharper archetype prompts.
3. Control validation: the clean Lululemon control was the key checkpoint; if it had collapsed too, the framework would have been a doom machine.
4. Scale runs: Peloton twice at 100 agents, Lululemon once. The repeat confirmed the core signal lived in the evidence pack.

Hardening changelog

Area | v1.0 | v1.1 fix
Citations | Formatting artifacts triggered false flags | Strict source-ID normalization
Source assignment | Random allocation | Deterministic rotation (60/40)
Memory | Agents faked consistency | Richer prior-state compression
Reasoning | Restatement instead of updating | Stronger evolution rules
Personas | Consumer types sounded similar | Sharper archetype prompts

Models and runtime

Model: gemini-3-flash-preview
Temperature: 0.4
Agents: 100
Rounds: 5
Cutoff: 2021-06-30
Social signal: round 4 only
Peloton runs: 2
Lululemon runs: 1
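Gathered into one place, the runtime parameters above might look like the following config sketch. The dataclass and field names are illustrative, not the project's actual code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunConfig:
    """Runtime parameters for one panel run, as listed above."""
    model: str = "gemini-3-flash-preview"
    temperature: float = 0.4
    n_agents: int = 100
    n_rounds: int = 5
    data_cutoff: str = "2021-06-30"
    social_signal_round: int = 4  # anonymized panel summary injected here only

# Run counts from the experiment.
PELOTON_RUNS = 2
LULULEMON_RUNS = 1
```

Freezing the config (and the deterministic evidence rotation) is what makes a literal re-run, like the Peloton replication, meaningful.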

Key lessons

The control matters as much as the flagship

Peloton alone = dramatic story. Peloton + a control that stayed split = methodology worth taking seriously.

Process discipline beats agent count

Citation hygiene, deterministic rotation, and prior-state compression mattered more than swarm size.

Best used for attention direction

The win: "the system kept surfacing the same load-bearing fragilities that later mattered."

Structural limitation: Even with a clean prompt-level cutoff, the underlying models were trained after 2021. Latent hindsight contamination remains a structural issue. This is a strong proof-of-concept for fragility mapping, not a fully validated forecasting system. I state this not as a disclaimer, but because getting this right matters for the methodology to be taken seriously.