I've always said this, but AI will win a Fields Medal before being able to manage a McDonald's.
Math seems difficult to us because it's like using a hammer (the brain) to drive in a screw (math).

LLMs are discovering a lot of new math because they are great at low-depth, high-breadth situations.
I predict that in the future people will ditch LLMs in favor of AlphaGo style RL done on Lean syntax trees. These should be able to think on much larger timescales.
Any professional mathematician will tell you that their arsenal is ~ 10 tricks. If we can codify those tricks as latent vectors it's GG

Ergo these tricks are latent vectors in our brain. We use analogies like geometry in order to use Algebraic Geometry to solve problems in Number Theory.

An AI trained on Lean syntax trees might develop its own weird versions of intuition that might actually properly contain ours.

If this sounds far-fetched, look at Chess. I wonder if anyone has dug into Stockfish using mechanistic interpretability.

https://arxiv.org/abs/2504.13837

[1] https://www.vice.com/en/article/a-human-amateur-beat-a-top-g...
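To make that concrete, here is a rough sketch of what "AlphaGo-style RL on Lean syntax trees" could reduce to at inference time: a policy/value net guiding search over tactic states. Everything below is hypothetical scaffolding -- run_tactic and policy_value are toy stubs, not a real Lean API (a real bridge would go through something like LeanDojo):

    import heapq

    TACTICS = ["intro h", "simp", "ring", "linarith", "exact h"]  # toy action set

    def run_tactic(state, tactic):
        """Stub for a Lean bridge: new goal state, 'done' on success, None on failure."""
        if state == "|- P -> P" and tactic == "intro h":
            return "h : P |- P"
        if state == "h : P |- P" and tactic == "exact h":
            return "done"
        return None

    def policy_value(state):
        """Stub for the learned net: uniform tactic priors, flat value estimate."""
        return {t: 1.0 / len(TACTICS) for t in TACTICS}, 0.5

    def best_first_search(root, budget=100):
        # AlphaZero-style guidance collapsed to a priority queue for clarity:
        # always expand the state with the highest (prior * value) score.
        frontier = [(-1.0, root, [])]
        while frontier and budget > 0:
            budget -= 1
            _, state, path = heapq.heappop(frontier)
            priors, value = policy_value(state)
            for tactic, prior in priors.items():
                nxt = run_tactic(state, tactic)
                if nxt == "done":
                    return path + [tactic]
                if nxt is not None:
                    heapq.heappush(frontier, (-prior * value, nxt, path + [tactic]))
        return None

    print(best_first_search("|- P -> P"))  # -> ['intro h', 'exact h']

The open question is whether the learned priors would end up encoding the ~10 human tricks or something stranger.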
Some DeepMind researchers used mechanistic interpretability techniques to find concepts in AlphaZero and teach them to human chess Grandmasters: https://www.pnas.org/doi/10.1073/pnas.2406675122
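For anyone who wants the one-line version of the technique: the workhorse in that line of work is the linear concept probe -- fit a direction in the network's activations that predicts a human-legible label, then study (or teach) that direction. A toy illustration on synthetic data, not their actual setup:

    import numpy as np

    rng = np.random.default_rng(0)
    acts = rng.normal(size=(500, 64))        # stand-in for a layer's activations
    concept_dir = rng.normal(size=64)        # "ground truth" concept direction
    labels = (acts @ concept_dir > 0).astype(float)  # stand-in concept labels

    # Least-squares probe: the recovered weights should align with concept_dir.
    w, *_ = np.linalg.lstsq(acts, labels - 0.5, rcond=None)
    cos = w @ concept_dir / (np.linalg.norm(w) * np.linalg.norm(concept_dir))
    print(f"cosine(probe, concept) ~ {cos:.2f}")  # high on this synthetic setup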
This argument, that LLMs can develop new crazy strategies using RLVR on math problems (like what happened with Chess), turns out to be false without a serious paradigm shift. Essentially, the search space is far too large, and the model will need help to explore better, probably with human feedback.
Yes but "the search space is too large" is something that has been said about innumerable AI-problems that were then solved. So it's not unreasonable that one doubts the merit of the statement when it's said for the umpteenth time.
I should have been more specific then. The problem isn't that the search space is too large to explore. The problem is that the search space is so large that the training procedure actively prefers to restrict the search space to maximise short term rewards, regardless of hyperparameter selection. There is a tradeoff here that could be ignored in the case of chess, but not for general math problems.
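A toy version of that tradeoff (my illustration, not from any of the papers discussed): a two-armed bandit where one arm is a cheap trick that pays a little every time and the other is a deep proof line that pays big but rarely. Plain REINFORCE with no entropy bonus typically collapses onto the cheap arm, pruning the rare high-reward line out of the search space:

    import math, random

    random.seed(0)
    logits = [0.0, 0.0]   # arm 0: easy trick, arm 1: deep proof line
    LR = 0.5

    def softmax(z):
        m = max(z)
        e = [math.exp(v - m) for v in z]
        return [v / sum(e) for v in e]

    for step in range(2000):
        p = softmax(logits)
        arm = 0 if random.random() < p[0] else 1
        # Easy arm: small reward every time. Hard arm: big reward, 1% of the time.
        r = 0.1 if arm == 0 else (1.0 if random.random() < 0.01 else 0.0)
        for i in range(2):  # REINFORCE update, no baseline, no entropy bonus
            logits[i] += LR * r * ((1.0 if i == arm else 0.0) - p[i])

    p = softmax(logits)
    entropy = -sum(q * math.log(q + 1e-12) for q in p)
    print(f"P(deep line) = {p[1]:.4f}, policy entropy = {entropy:.4f}")

Hyperparameters mostly move when the collapse happens, not whether, which is the sense in which it isn't fixable by tuning alone.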
This is far from unsolvable. It just means that the "apply RL like AlphaGo" attitude is laughably naive. We need at least one more trick.

As you said, brute forcing the search space as the starting procedure would take way too long for the AI to build intuition. But if we could give it a million or so lemmas of human math, that would be a great starting point.
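One concrete shape that "starting point" could take (my framing, not the commenter's): premise selection over the human corpus, i.e. rank known lemmas against the current goal so search begins near human math instead of from nothing. Bag-of-words purely for illustration; a real system would embed the formal statements of a mathlib-scale corpus:

    from collections import Counter

    corpus = {
        "Nat.add_comm":  "a + b = b + a",
        "Nat.mul_comm":  "a * b = b * a",
        "Nat.add_assoc": "a + b + c = a + (b + c)",
    }

    def score(goal, stmt):
        g, s = Counter(goal.split()), Counter(stmt.split())
        overlap = sum((g & s).values())     # multiset token overlap
        return overlap / len(stmt.split())  # normalized by statement length

    goal = "x + y = y + x"
    ranked = sorted(corpus, key=lambda name: -score(goal, corpus[name]))
    print(ranked[0])  # -> Nat.add_comm, the best-overlapping premise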
I agree that LLMs are a bad fit for mathematical reasoning, but it's very hard for me to buy that humans are a better fit than a computational approach. Search will always beat our intuition.
Yes and no. I think we have vastly underestimated the extent of the search space for math problems. I also think we underestimate the degree to which our worldview influences the directions in which we attempt proofs. Problems are derived from constructions that we can relate to, often physically. Consequently, the technique in the solution often involves a construction that is similarly physical in its form. I think measure theory is a prime example of this, and it effectively unlocked solutions to a lot of long-standing statistical problems.
That linked article says it's about RLVR but then conflates other RL with it, and it doesn't address much of the core thinking in the paper it was partially responding to, published a month earlier [0], which laid out findings and theory reasonably well, including work that runs counter to the main criticism in the article you cited, i.e., performance at or above base models only being observed at low k.
That said, reachability and novel strategies are somewhat overlapping areas of consideration, and I don't see many ways in which RL in general, as mainly practiced, improves upon models' reachability. And even when it isn't clipping weights it's just too much of a black box approach.
But none of this takes away from the question of raw model capability on novel strategies; it only bears on what RL contributes.

[0] https://arxiv.org/pdf/2506.14245
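For readers outside this sub-debate, the k there is pass@k, and the standard unbiased estimator from n samples with c correct (Chen et al. 2021) is pass@k = 1 - C(n-c, k) / C(n, k). The numbers below are made up (not from either paper) and just show the shape of the claim: RL concentrates mass on problems it already solves and wins at small k, while the base model's broader coverage catches up once k is large:

    from math import comb

    def pass_at_k(n, c, k):
        # Probability at least one of k draws (from n samples, c correct) passes.
        if n - c < k:
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    n = 200
    base = [3, 2]   # correct samples per problem: diffuse coverage
    rl = [40, 0]    # concentrated: problem 1 nearly mastered, problem 2 never hit
    for k in (1, 16, 128):
        avg = lambda cs: sum(pass_at_k(n, c, k) for c in cs) / len(cs)
        print(f"k={k:3d}  base={avg(base):.3f}  rl={avg(rl):.3f}")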
Why must it involve understanding? I feel like you’re operating under the assumption that functionalism is the “correct” philosophical framework without considering alternative views.
Even that is probably too much. It has no understanding of what "chess" is, or what a chess board is, or even what a game is. And yet it crushes every human with ease. It's pretty nuts haha.
As a professional mathematician, I would say that a good proof requires a very good representation of the problem, and then pulling out the tricks. The latter part is easy to get working with LLMs; they can do it already. It's the former part that still needs humans, and I'm perfectly fine with that.
> I've always said this, but AI will win a Fields Medal before being able to manage a McDonald's.
I love this and have a corollary saying: the last job to be automated will be QA.
This wave of technology has triggered more discussion about the types of knowledge work that exist than any other, and I think we will be sharper for it.
Are they actually producing new math? In the most recent ACM issue there was an article about testing AI against a math benchmark privately built by mathematicians, and what they found is that even though AI can solve some problems, it has never truly come up with something novel in mathematics; it is just good at drawing connections between existing research and putting a spin on it.
> I predict that in the future people will ditch LLMs in favor of AlphaGo style RL done on Lean syntax trees. These should be able to think on much larger timescales.
This is certainly my hope.
In my spare time, I'm slowly, very slowly, inching towards a prototype of something that could work like that.
I think this is mostly about existing legislation, not about technology.
In any other context than when your paycheck depends on it, you would probably not be following orders from a random manager. If your paycheck depended on following the instructions of an AI robot, the world might start to look pretty scary real soon.
Like so many things, the evolution of AI math will, I think, follow trajectories hinted at in the 90s by the all-time great sci-fi author Greg Egan. The nature of math won't change -- but the why of it definitely will. In Diaspora, Egan imagined a future AI civilization where "math discovery" -- by then perhaps accurately described as "mechanistic math discovery" -- is treated by society as a kind of salt mine: you can dig for arbitrarily long amounts of time and keep finding new nuggets. The nuggets themselves have a kind of pure value as mathematical objects, even if they might not have any knowable value outside the mines. Some personalities valued the nuggets for their own sake; others didn't, but recognized that nuggets with broader appeal were occasionally found in the mine.
Research institutes like those founded by Terence Tao in our current present feel like they will align to this future almost perfectly on a long enough timeline. Though on a shorter timeline, this area of research is almost certain to provide a ton of useful ways to advance our current AI systems: we are still at a point where literally anything that can generate new information that is "accurate" in some way -- like our current theorem-prover engines -- is an enormously valuable part of our still manually curated training loops.
Interesting but not surprising to me. Once a field expert guides the models, they will most likely reach a solution. The models are good at doing the tedious work for experts. On hard or complicated questions, the models often have blind spots.
There are people who think knowledge discovery is just a matter of parroting past behavior and trying things at random until something sticks. I don’t.
In the paper, they give part of their system prompt:
> * After EVERY exploreXX.py run, IMMEDIATELY update this file [plan.md] before doing anything else. * No exceptions. Do not start the next exploration until the previous one is documented here.
Is this known to improve performance for advanced problem solving? If so, why this specific prompt?
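I haven't seen an ablation isolating that exact rule, but a plausible mechanism is context hygiene: it forces a run -> summarize -> re-plan loop, so each exploration gets compressed into plan.md and later steps condition on distilled notes rather than the raw transcript. You could even enforce the rule harness-side; a sketch, with agent_document_run as a placeholder for the model writing its notes:

    import os, subprocess, sys

    PLAN = "plan.md"
    open(PLAN, "a").close()  # make sure the plan file exists

    def agent_document_run(script):
        # Placeholder: in the real system the LLM appends its findings here.
        with open(PLAN, "a") as f:
            f.write(f"\n## {script}\n- (model-written summary of the run)\n")

    scripts = sorted(f for f in os.listdir(".")
                     if f.startswith("explore") and f.endswith(".py"))
    for script in scripts:
        subprocess.run([sys.executable, script], check=True)
        size_before = os.path.getsize(PLAN)
        agent_document_run(script)
        if os.path.getsize(PLAN) <= size_before:  # "No exceptions."
            raise RuntimeError(f"{PLAN} not updated after {script}")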
Ramanujan is a good analogy for this situation. Theories could be right or wrong until there's a proof. Same with anything AI produces: there's always a "told you so" baked in with its response.
When I was younger, I remember a point of demarcation for me was learning the 4chan adage "trolls trolling trolls" and approaching all internet interactions with skepticism. And while I have been sure for a while that Reddit has succumbed to the "dead internet," this thread is another such moment for me: I can no longer recognize who is a bot and who has honest intentions.
> Any professional mathematician will tell you that their arsenal is ~ 10 tricks. If we can codify those tricks as latent vectors it's GG
And if we can train the systems to discover new tricks, whoa Nelly.
> AI will win a Fields Medal before being able to manage a McDonald's
Of course, because it takes multi-modal intelligence to manage a McDonald's. I.e., it requires human intelligence.
> I predict that in the future people will ditch LLMs in favor of AlphaGo style RL
Same for coding as well. LLMs might be the interface we use with other forms of AI, though.
How long will it take before they rob a bank?
If they do either of those things, will the results have been intentional from the simian's POV?