Multi-Agent Intelligence
A new way to analyze a business: let 100 agents debate until the signal appears.
Not a faster memo. A different analytical system — one where many perspectives start with data, argue across rounds, and either converge on fragility or preserve real disagreement.
The key layperson takeaway: when agents with opposing incentives — a dedicated subscriber and a short seller — still turn bearish after five rounds of debate, that convergence under adversarial pressure is itself the signal. It suggests the business narrative requires too many things to stay true at once.
The case study
I tested the system on Peloton as of mid-2021, with Lululemon as a control. Peloton later unraveled. Lululemon did not. The value of the experiment is that the panel treated them differently before that difference was fully obvious in the market.
Peloton
Avg confidence: 8.9 / 10 · 36 agents at max conviction
Lululemon control
Avg confidence: 8.12 / 10 · only 2 agents at 10/10
How structured adversarial debate produces signal
Structured social debate, not just a bigger panel
The core idea: spawn many agents, each starting with real data and a distinct incentive structure. Then force them to argue over multiple rounds. Agents must engage with counterarguments, update their confidence, and justify every shift. The output isn't a single opinion — it's a convergence map that reveals whether fragility is real or just noise.
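The loop described above can be sketched in a few lines. This is a toy skeleton, not the actual system: the LLM call is replaced by a simple rule where an agent flips only under strong opposing peer pressure, and all field names are illustrative.

```python
import random

def run_debate(agents, rounds=5, seed=0):
    """Toy skeleton of the debate loop. Each round, every agent samples
    peers' current stances and may flip under strong opposing pressure.
    Stances are -1 (bearish) / +1 (bullish); confidence is 1-10."""
    rng = random.Random(seed)
    for _ in range(rounds):
        snapshot = [a["stance"] for a in agents]          # freeze round state
        for a in agents:
            peers = rng.sample(snapshot, k=min(5, len(snapshot)))
            majority = sum(peers)
            if majority * a["stance"] < 0 and abs(majority) >= 3:
                a["stance"] = -a["stance"]                # forced engagement: flip
                a["confidence"] = max(1, a["confidence"] - 2)
            else:
                a["confidence"] = min(10, a["confidence"] + 1)
    return agents
```

The final `agents` list is the raw material for the convergence map: what matters is not any single stance but the shape of the distribution after repeated pressure.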
Diverse starting positions
Each agent gets a distinct persona with built-in incentives — some want the bull case, others are structurally skeptical. No one starts neutral.
Shared evidence, rotated views
All agents draw from the same pre-cutoff data, but see partially rotated subsets. 60% core evidence stays constant; 40% rotates per agent and round.
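One way to implement that deterministic 60/40 split is to key the rotation off a hash of the agent ID and round number, so the same agent in the same round always sees the same pack. A minimal sketch, with hypothetical function and variable names:

```python
import hashlib

def evidence_pack(sources, agent_id, rnd, core_frac=0.6):
    """Deterministic 60/40 evidence split: the first 60% of sources
    ("core") is identical for every agent and round; the remaining 40%
    is rotated by a hash of (agent_id, round), so each agent gets a
    stable but distinct view. Sketch only, not the production schema."""
    n_core = int(len(sources) * core_frac)
    core, pool = sources[:n_core], sources[n_core:]
    if not pool:
        return core
    offset = int(hashlib.sha256(f"{agent_id}:{rnd}".encode()).hexdigest(), 16) % len(pool)
    rotated = pool[offset:] + pool[:offset]
    return core + rotated[: max(1, len(pool) // 2)]
```

With ten candidate sources this yields packs of eight, consistent with the 5-8 sources per agent described below; rerunning with the same `(agent_id, rnd)` always reproduces the same pack.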
Adversarial rounds
Over 5 rounds, agents see competing arguments and must update. They can't just restate — evolution rules force genuine engagement with counterpoints.
Convergence as signal
If agents with opposing incentives converge, that's meaningful. If they stay split despite pressure, that's meaningful too. Both outcomes are informative.
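Convergence can be scored directly from the final stance counts. A simple choice, assumed here rather than taken from the actual system, is one minus the normalized entropy of the stance distribution: 1.0 means total agreement, 0.0 an even split. Applied to the round-5 counts reported later in this piece (91 bearish / 2 bullish for Peloton, 47 / 46 for Lululemon, remainder neutral):

```python
from math import log2

def convergence(bear, bull, neutral=0):
    """Convergence score in [0, 1]: 1 minus the normalized entropy of the
    final stance distribution. 1.0 = unanimity, 0.0 = an even 3-way split."""
    total = bear + bull + neutral
    probs = [c / total for c in (bear, bull, neutral) if c > 0]
    entropy = -sum(p * log2(p) for p in probs)
    return 1 - entropy / log2(3)

peloton = convergence(91, 2, 7)   # strong convergence
lulu = convergence(47, 46, 7)     # genuinely split
assert peloton > lulu
```

The point of a scalar like this is comparability: the same metric, run on two companies through the same pipeline, makes "one converged, one didn't" a number rather than an impression.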
How one agent experiences the debate
Persona assigned
Incentive + role
Evidence pack
5–8 pre-cutoff sources
Independent thesis
Round 1 stance
Debate rounds 2–4
Counterarguments + updates
Final commitment
Stance + confidence
100 agents, 8 archetypes, each with distinct incentives and data focus
Emotionally invested in the product and community. Wants Peloton to succeed.
Data focus: Personal usage data, community engagement, product satisfaction, habit strength
Bought in during COVID lockdowns. Asking: was this a habit or a phase?
Data focus: Subscription renewal patterns, gym reopening data, usage frequency trends
Already drifting away. Looking for reasons to stay — or permission to leave.
Data focus: Cancellation rates, competitive alternatives, price sensitivity signals
Evaluating whether the premium price point justifies itself vs alternatives.
Data focus: Product comparisons, pricing data, consumer reviews, value perception
Sees Peloton as a competitor. Financially motivated to find weakness.
Data focus: Market share data, reopening trends, consumer behavior shifts post-lockdown
Needs a defensible rating. Career risk from being wrong in either direction.
Data focus: Financial statements, TAM models, comparable company analysis, guidance
Looking for the thesis that breaks. Financially motivated to find fragility.
Data focus: Cash flow analysis, insider selling, competitive dynamics, overvaluation signals
Needs a compelling narrative. Drawn to both hype and controversy equally.
Data focus: Product launches, executive statements, consumer sentiment, cultural trends
Panel composition
10×10 map
Why the mix matters
The biggest cohorts were Pandemic Converts and Sell-Side Analysts. That was intentional: one group pressure-tests demand durability, the other pressure-tests the narrative. I used the same mix on both companies, so the difference in output comes from the business and evidence — not from changing the room.
What each agent received per turn
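Pulling the lifecycle stages together, the per-turn payload can be sketched as a small record. The field names below are illustrative, inferred from the pipeline description above rather than taken from the actual prompt schema:

```python
from dataclasses import dataclass, field

@dataclass
class AgentTurn:
    """Sketch of what one agent receives per turn, per the lifecycle above.
    Field names are hypothetical, not the production schema."""
    agent_id: int
    round_no: int                 # 1 = independent thesis, 5 = final commitment
    persona: str                  # archetype: incentives + role
    evidence: list                # 5-8 pre-cutoff source excerpts (60/40 rotated)
    prior_state: str = ""         # compressed summary of the agent's own history
    counterarguments: list = field(default_factory=list)  # peers' points, rounds 2+
```

Keeping `prior_state` as a compressed summary rather than a full transcript is what the later changelog calls "prior-state compression": enough memory to stay consistent, small enough to fit the context window.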
The verdict
Peloton converged. Lululemon stayed split.
The same system, same archetypes, same process. One turned into a wall of red. The other preserved genuine disagreement.
Confidence: 8.9/10 avg · 36 agents at maximum conviction
Confidence: 8.12/10 avg · only 2 agents at 10/10
Peloton final map
10×10 map · Each cell = one agent's final stance.
Lululemon final map
10×10 map · The control stays genuinely mixed.
Convergence flow
One converged. The other didn't.
Peloton converged twice. The control stayed split. That is the more faithful picture of what was actually run.
Round by round
How the debate unfolded
Independent orientation
No social signal yet. Agents form views from evidence + persona alone. Already ~60% bearish on Peloton — the evidence pack does the work before any debate begins.
~60–65% bearish
~50/50 split
Bearish % at round 1
Structural health comparison
Where the two businesses differed
Not whether one ticker went up or down — whether the business was fragile across multiple load-bearing dimensions.
Why this matters
A new paradigm for business analysis
Traditional diligence: a small team, finite calls, one memo under time pressure. The output is useful but narrow — a few people, one narrative, one recommendation.
This tests a different model. Instead of one analyst, create a structured room of 100 synthetic stakeholders with different incentives and let them argue. The output isn't a memo — it's a convergence signal, a fragility map, and a record of where disagreement held or broke.
When a dedicated subscriber and a short seller still turn bearish after five rounds of debate, that convergence under adversarial pressure is itself the signal: the business narrative requires too many things to stay true at once.
Systems like this don't replace investors. They change the shape of good judgment. The job becomes directing attention, validating reality, and knowing which arguments matter — while the system does the parallel stress-testing underneath.
Old model vs. new model
| | Traditional | Synthetic panel |
|---|---|---|
| Perspectives | 2–5 analysts | 100 agents, 8 archetypes |
| Process | Sequential drafts | 5 adversarial rounds |
| Output | Narrative | Convergence signal + fragility map |
| Time | Days–weeks | Hours |
What happened in reality
Peloton's narrative broke. Demand softened, margins compressed, credibility eroded — the stock fell ~95% from peak. Lululemon kept compounding through the same environment.
Why Peloton converged
- Demand durability: pandemic pull-forward, not a durable mass-market habit.
- Unit economics: premium hardware + weakening cohorts is a bad combination.
- Management credibility: narrative required trust; trust weakened fast.
Why the control matters
- Brand power: pricing power and community stayed real.
- International runway: durable growth vectors beyond the pandemic.
- No collapse: the system stayed split — it's not a doom machine.
Scope & limitations
- This is not proof that AI cleanly predicted Peloton.
- Models contain post-2021 world knowledge at the weight level — contamination is structural.
- A proof-of-concept for fragility mapping, not validated forecasting.
Reproducibility
Peloton converged in both runs
Not proof of calibration. But harder to dismiss as one lucky prompt artifact.
Compact scorecard
| Metric | Peloton | Lululemon |
|---|---|---|
| Bearish (R5) | 91 | 47 |
| Bullish (R5) | 2 | 46 |
| Avg confidence | 8.9 / 10 | 8.12 / 10 |
| Max conviction | 36 agents | 2 agents |
| Consensus | Strong convergence | Split |
Bottom line
The point is not prediction. It is attention direction.
A structured adversarial panel can push attention toward the right failure modes earlier than a static memo often does. For Peloton, the simulation repeatedly surfaced a business whose story required too many things to stay true at once. For Lululemon, it surfaced real debate without the same multi-vector collapse.
That is why I think convergence under adversarial pressure matters. When agents with opposing incentives still end up in the same place after repeated debate, the disagreement itself has been stress-tested away. What remains is often the load-bearing weakness in the business.
That is a much more interesting use of AI in diligence than "write me a faster memo."
Social block 1
"What happens when 100 AI analysts with competing incentives debate a company's future? For Peloton, they converged on fragility. For the control, they stayed genuinely split. The signal was in the difference."
ankitchandola.com
Social block 2
"I didn't build this to predict stock prices. I built it to find where a business narrative requires too many things to stay true at once. That's a different — and more useful — question."
ankitchandola.com
Social block 3
"91 out of 100 synthetic analysts turned bearish on Peloton. The same system stayed split on Lululemon. Not prediction. Fragility mapping. The most useful AI in analysis isn't the one that writes a faster memo — it's the one that tells you which questions your team isn't asking."
ankitchandola.com
Social block 4
"The future of business analysis: not one analyst writing a memo, but 100 perspectives arguing across 5 rounds until fragility either surfaces or doesn't. The convergence is the signal."
ankitchandola.com
Methodology, model choices, hardening path, and full limitations.
▸ Full methodology and limitations
Why Peloton, and why this way
I chose Peloton because the eventual breakdown was not a simple fraud story — it was a multi-stakeholder collapse across demand durability, unit economics, narrative fragility, operating execution, and management credibility. That makes it a better probe than a cleaner accounting problem.
I paired it with Lululemon because I needed a control in adjacent consumer/wellness territory that did not rely on the same hardware-plus-subscription narrative. If both had collapsed into the same profile, the experiment would have been useless.
| Parameter | Setting |
|---|---|
| Time gate | All sources published on or before 2021-06-30 |
| Control | Lululemon through identical framework |
| Framing | Fragility mapping, not prediction |
| Disclosure | Model weights contain post-cutoff knowledge |
| Runs | 2× Peloton scale + 1× Lululemon control |
How the experiment evolved
Pilots (v1.0)
Small live runs to validate architecture: coherent outputs, citation behavior, actual view evolution.
Hardening (v1.1)
Citation hygiene, deterministic source rotation, prior-state compression, evolution rules, sharper archetype prompts.
Control validation
Clean Lululemon control was the key checkpoint — if it had collapsed too, the framework would have been a doom machine.
Scale runs
Peloton twice at 100 agents, Lululemon once. Repeat confirmed core signal lived in the evidence pack.
Hardening changelog
| Area | v1.0 | v1.1 fix |
|---|---|---|
| Citations | Formatting artifacts triggered false flags | Strict source-ID normalization |
| Source assignment | Random allocation | Deterministic rotation (60/40) |
| Memory | Agents faked consistency | Richer prior-state compression |
| Reasoning | Restatement instead of updating | Stronger evolution rules |
| Personas | Consumer types sounded similar | Sharper archetype prompts |
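The "stronger evolution rules" row above can be made concrete with a toy validator: an update is accepted only if it cites at least one counterargument the agent was actually shown, and any confidence shift comes with a stated reason. Field names are hypothetical:

```python
def valid_update(reply, counter_ids, prev_conf):
    """Toy evolution-rule check: reject restatement. The reply must cite
    at least one counterargument ID it was shown, and any confidence
    change must carry a justification. Sketch, not the real validator."""
    cites = any(cid in reply["text"] for cid in counter_ids)
    shifted = reply["confidence"] != prev_conf
    justified = (not shifted) or bool(reply.get("shift_reason"))
    return cites and justified
```

A gate like this is what separates genuine engagement from agents politely restating round-1 positions, which the v1.0 pilots showed was the default failure mode.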
Models and runtime
Model
gemini-3-flash-preview
Temperature
0.4
Agents
100
Rounds
5
Cutoff
2021-06-30
Social signal
Round 4 only
Peloton runs
2
Lulu runs
1
Key lessons
The control matters as much as the flagship
Peloton alone = dramatic story. Peloton + a control that stayed split = methodology worth taking seriously.
Process discipline beats agent count
Citation hygiene, deterministic rotation, and prior-state compression mattered more than swarm size.
Best used for attention direction
The win: "the system kept surfacing the same load-bearing fragilities that later mattered."
Structural limitation: Even with a clean prompt-level cutoff, the underlying models were trained after 2021. Latent hindsight contamination remains a structural issue. This is a strong proof-of-concept for fragility mapping, not a fully validated forecasting system. I state this not as a disclaimer, but because getting this right matters for the methodology to be taken seriously.