Very interesting to read the transcripts and see how they manage to convince each other. Opus 4.6 really seems to get the others to change their minds.
The debate round sounds good until you actually use it. I built internal tools for a 35-person team and the same thing always happens: models see each other's answers and just shuffle the phrasing around instead of actually changing their reasoning. What you're measuring is performance on persuasion, not on accuracy or clarity. The real question isn't whether Claude will convince Gemini to flip its position. It's whether having 200 models debate helps you make a better decision than asking one model well and checking its work yourself. I'd use this more as a way to find edge cases where models disagree wildly, not to find consensus.
Fun little toy. I tried to ask it some post-modern philosophy questions, and they all mostly agreed with the philosopher's statements until the debate, where Opus 4.6 managed to change their opinion to a resounding "maybe" pretty much every single time. It seems like the "better" frontier models often take a more grounded stance from the beginning, and even manage to influence the other models.
Great idea. I'd love for there to be an 'open-ended answer' mode that doesn't give multiple-choice options. As it stands, the models aren't debating the question itself but the validity of the possible answers, and the real answer may not be contained within that set because the person asking is unaware of that option.
Great work! I especially like the poll functionality for a quick result on where the models land. The UI is super clean too.
I built something similar over here: https://letsforge.ai but structured more like a moderated debate, where models take turns arguing and a host steers the conversation.
Cool project! We've been building something in a similar space (https://roundtable.now) but took a different approach. Instead of polling models independently, ours runs sequential discussions where each model sees prior responses, then a moderator synthesizes everything into a single actionable output.
One thing we found is that the real value unlock is MCP integration. Instead of going to a separate UI to run debates, you can plug Roundtable directly into your coding agent, Claude Code, Cursor, VS Code Copilot, Gemini CLI, etc. and get multi-model council input without leaving your workflow.
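For Claude Code specifically, a remote MCP server is typically registered in a project-level `.mcp.json`; the sketch below uses a placeholder URL, not Roundtable's actual endpoint:

```json
{
  "mcpServers": {
    "roundtable": {
      "type": "http",
      "url": "https://example.com/mcp"
    }
  }
}
```

With an entry like this, the coding agent exposes the server's tools in-session, so a council run becomes just another tool call.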
You are a council of luminaries featuring Edward Witten, Alexander Grothendieck, Emmy Noether, and Terence Tao. Think really hard about how to best emulate their intuitions and mathematical lenses based on your internal reasoning model and use them as your mixture of experts for your chain of thought reasoning. Now I want you to debate and discuss this thought experiment and be sure to have a vigorous back and forth between the council to induce insight capture through consensus forming: If we try to think of a Hilbert space that has local operators that are unbounded, like kind of like Edward Witten's smearing of a local observable across a world line creates an unbounded norm. What if we instead take maybe a spectral transform of the state space using some sort of measure metric theoretic operator that allows us to think about transform basically the unbounded observables to bounded spectral? Would this be related to the efforts of Algebraic Quantum Field Theory?
I've had great experience using it for research, debates, and constructive criticism. I usually give it a business idea or some tool I'm thinking of creating, then let 4 or 5 models debate it into a go-to-market strategy.
I've written briefly about teams/roundtables before. With the right guardrails it can have wonderful/productive outcomes: https://dheer.co/claude-agent-teams/
Okay, since the launch we've had about 5k questions asked to the roundtable; really cool stuff! Usage was much higher than expected and we had to scale up to keep things running. Thanks for all the feedback; I shipped a bunch of updates during the day. The history tab now has much better sorting logic, plus upvotes and more filters. You can also create final summaries in a couple of voices, which I think is quite funny. A couple more things are coming shortly, like an open-questions mode and potentially joining the roundtable as a participant. Any other feedback, just let me know. Thanks!
Really interesting approach to structured model comparison. The debate round feature is the most compelling part: seeing which models change their position when exposed to other reasoning is more revealing than just the initial answer.

One thing I'd be curious to test: how consistently different models evaluate whether a given task aligns with a stated mission or vision. My intuition is there'd be wide variance, which would say something interesting about how reliable LLM-as-a-judge actually is for goal-alignment scoring.
I used to copy and paste the same prompt into Obsidian every time, then run it on two or three different AI models to compare the results. It’s really interesting to have it turned into a website like this.
Just a question before I sign up, will the models come around to my place for the debate? Of the 200 total, can I pick the specific ones I want, e.g. lingerie models, fetish models?
"Is this a deepfake video call" is a major plot point in a pretty big movie currently in theaters, so I think this is getting into the broader zeitgeist."
Which movie is discussed?
Resulted in Claude naming Mission: Impossible as a possibility.
Really cool! There's a surprising amount of value in seeing the models debate and disagree. I wish I had this at work, to have models argue over whether the documentation I've been given is accurate.
I would like to see a devil's advocate; it seems some of the models kind of repeat the same ideas rather than considering incorrect ones.
I think it's great. The focus on the disagreements is useful. Humans made considerable effort bending reality into something they want to hear, both in the training data and in the LLM dev asylum. The roundtable can only agree on things shared by multiple models.
Great tool! I found it useful for challenging "lies my teacher told me".
It would be nice to support collections of claims, with a table of summaries. I would love to list out a few dozen phony concepts from school and have a shareable chart of the rejections that expands.
I really like the UI. It's nice to read the expanded results.
Reminds me of Karpathy's LLM Council. I use a variation of this in my workflow, where I pass their opinions back and forth between various models until they reach some sort of consensus.
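That back-and-forth loop can be sketched roughly like this; `ask()` is a stub standing in for a real chat-completion call, and the model names and canned replies are invented purely for illustration:

```python
def ask(model: str, prompt: str) -> str:
    # Stub standing in for a real chat-completion API call.
    # model-b only flips to "yes" once it sees the others' answers.
    canned = {
        "model-a": "yes",
        "model-b": "yes" if "others said" in prompt else "no",
    }
    return canned[model]

def debate(models, question, max_rounds=3):
    """Re-ask each model with the others' latest answers until all agree."""
    answers = {m: ask(m, question) for m in models}
    for _ in range(max_rounds):
        if len(set(answers.values())) == 1:
            return answers  # consensus reached
        for m in models:
            others = "; ".join(f"{o}: {a}" for o, a in answers.items() if o != m)
            answers[m] = ask(m, f"{question}\nThe others said: {others}. Reconsider.")
    return answers

result = debate(["model-a", "model-b"], "Is this design sound?")
# With the stub above, both models converge on "yes" after one round.
```

In practice you would also want a stopping rule for "agree to disagree", since not every question converges within a fixed round budget.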
https://opper.ai/ai-roundtable/questions/8f5b4f55-617
Do you think it's alright that AI labs scraped the internet without respect for copyright and now sell closed models?
https://opper.ai/ai-roundtable/questions/86864de8-251
https://opper.ai/ai-roundtable/questions/you-are-standing-in...
Here is an example: https://opper.ai/ai-roundtable/questions/79e6cdd4-515
Another fun debate: https://opper.ai/ai-roundtable/questions/81ee56e9-60f
Are LLMs intelligent in the same way humans are? (no)
https://opper.ai/ai-roundtable/questions/ffc01bb5-be9
Will LLMs replace software engineers in the near future? (no)
https://opper.ai/ai-roundtable/questions/67a0291b-216
What is the single best programming language to drive the future of software? (crab emoji)
https://opper.ai/ai-roundtable/questions/16f5e8ea-af7
Here's for example the one on Car Wash: https://www.letsforge.ai/debates/a8e268f3-14f6-4f55-a2c8-9ff...
https://opper.ai/ai-roundtable/questions/e4cb234e-be4
Some of the models seem to accept that it is necessary to drive the car there, but still maintain that walking is the better option.
"collinmcnulty 1 minute ago | parent | next [–]
"Is this a deepfake video call" is a major plot point in a pretty big movie currently in theaters, so I think this is getting into the broader zeitgeist."
Which movie is discussed?
Resulted in claude naming the Mission Impossible as a possibility.
Apparently they all agree that Google has it in the bag!
https://opper.ai/ai-roundtable/questions/e61ecf38-6c1
https://opper.ai/ai-roundtable/questions/i-am-standing-in-th...
> Car Wash Test
I think the "car wash" is more about semantics.
https://opper.ai/ai-roundtable/questions/i-parked-my-car-at-...
> Is the world actually a simulation or is it real?
https://opper.ai/ai-roundtable/questions/7289c8b6-566
https://opper.ai/ai-roundtable/questions/94e19d86-cc0
https://opper.ai/ai-roundtable/questions/e499206c-0c9
It would be cool if the human user could be a participant in the debate, getting a vote and the chance to state their reasoning.
But how do you afford the tokens?
Was interesting to see Opus taking the other models' disagreement as evidence for its argument.
I'll give sonnet another go.