In math, rigor is vital, but are digitized proofs taking it too far? (quantamagazine.org)

by isaacfrond 107 comments 128 points


[−] WhitneyLand 46d ago
Great quote from Hilbert, I think it’s also a useful thought for software development.

“The edifice of science is not raised like a dwelling, in which the foundations are first firmly laid and only then one proceeds to construct and to enlarge the rooms,” the great mathematician David Hilbert wrote in 1905 (opens a new tab). Rather, scientists should first find “comfortable spaces to wander around and only subsequently, when signs appear here and there that the loose foundations are not able to sustain the expansion of the rooms, [should they] support and fortify them.”

[−] nicoburns 46d ago
Yeah, I see a lot of people (especially on HN) bemoaning any science that isn't a controlled double-blind experiment with a large sample size. But exploratory science is just as important as the science that proves things. Otherwise we wouldn't know which hypotheses are useful/interesting to test.
[−] harshreality 46d ago
Are they bemoaning that science is being done, or are they bemoaning that the experimental results have not yet reached high enough confidence to justify the conclusions being suggested?
[−] JumpCrisscross 46d ago

> Are they bemoaning that science is being done

The reflexive "in mice" comments seem to be bemoaning how science is done.

[−] cwillu 46d ago
As someone who has made several comments consisting entirely of “…in mice.”, let me assure you that the reflex only kicks in after reading the paper and noticing that the experimental subjects were exclusively mice.

The problem is not mice experiments on arXiv; the problem is posting those papers for broader dissemination to the public, with titles suggesting to the public that cancer has been cured, without prominently pointing out that the experiments were not about cancer in humans.

[−] JumpCrisscross 46d ago

> problem is posting those papers for broader dissemination to the public, with titles suggesting to the public that cancer has been cured

Fair enough. I'm thinking of cases where a good study that isn't turned into PR slop is dismissed because it was done in mice. Which is fine for most people. But not great if we're treating real science that way.

[−] cwillu 46d ago
Dismissing good science is entirely the correct decision when the good science isn't ready for broad dissemination to the audience which it is being presented to.
[−] dyauspitr 46d ago
I disagree. I think people understand studies have to begin in mice. It’s what the GP said: you can’t release those studies because there’s not a high enough confidence rate in what most people are interested in, i.e. how it affects humans.
[−] JumpCrisscross 46d ago

> You can’t release those studies because there’s not a high enough confidence rate in what most people are interested in, i.e. how it affects humans

This is science by ignoramus. It isn't how science works, at least not when it works at its best. Someone advocating for censoring science because it might be misread is not on the side of science.

[−] dyauspitr 45d ago
I’m not advocating for censoring them. I’m advocating for less hype in science media reporting around mice studies because, let’s be frank, the vast majority of the population are ignoramuses who cannot make the distinctions themselves, and that has real political consequences through lack of trust in scientific organizations.
[−] cap11235 46d ago
More Doctors Smoke Camels!™
[−] dyauspitr 46d ago
It depends, especially coming from fields like psychology. You can prove anything with a small enough group. A lot of those just end up adding a lot of noise and reduce the reliability of the entire field in general. It just ends up with people getting conflicting information every other week and then they just tune out.
[−] potsandpans 46d ago
Like anything else, it's easier to complain about the legitimacy of something and nitpick it to death than it is to do the actual thing.

Most people on HN aren't scientists, even if they fancy themselves as such.

[−] GuB-42 46d ago
The problem is more about how it is reported to the public. Science is ugly, but when a discovery is announced to the public, a high level of confidence is expected, and journalists certainly act like there is. Kind of like you are not supposed to ship untested development versions of software to customers.

But sometimes, some of the ugly science gets out of the lab a bit too soon, and it usually doesn't end well. Usually people get their hopes up, and when it doesn't live up to the hype, people get confused.

It really stood out during the covid pandemic. We didn't have time to wait for the long trials we normally expect, waiting could mean thousands of deaths, and we had to make do with uncertainty. That's how we got all sorts of conflicting information and policies that changed all the time. The virus spreads by contact, no, it is airborne, masks, no masks, hydroxychloroquine, no, that's bullshit, etc... that sort of thing. That's the kind of thing that usually doesn't get publicized outside of scientific papers, but the circumstances made it so that everyone got to see it, including science deniers unfortunately.

Edit: Still, I really enjoyed the LK-99 saga (the supposed room-temperature superconductor). It was overhyped, and it came to its expected conclusion (it isn't one); however, it sparked widespread interest in superconductors and plenty of replication attempts.

[−] godelski 46d ago

> The problem is more about how it is reported to the public.
Yes and no.

From scientific communicators there's a lot of slop and it's getting worse. Even places like Nature and Scientific American are making unacceptable mistakes (a famous one being the quantum machine learning black hole BS that Quanta published)

But I frequently see those HN comments on ArXiv links. That's not a science communication issue. Those are papers. That's researcher to researcher communication. It's open, but not written for the public. People will argue it should be, but then where does researcher to researcher communication happen? You really want that behind closed doors?

There is a certain arrogance that plays a role. Small sample size? There's a good chance it's a paper arguing for the community to study at a larger scale. You're not going to start out by recruiting a million people to figure out if an effect might even exist. Yet I see those papers routinely scoffed at. They're scientifically sound but laughing at them is as big of an error as treating them like absolute truth, just erring in the opposite direction.

People really do not understand how science works and they get extremely upset if you suggest otherwise. As if not understanding something they haven't spent decades studying implies they're dumb. Scientists don't expect non-scientists to understand how science works. There's a reason you're only a junior scientist after getting an entire PhD. You can be smart and not understand tons of stuff. I got a PhD and I'll happily say I look like a bumbling idiot even outside my niche, in my own domain! I think we've just got to stop trying to prove how smart we are before we're all dumb as shit. We're just kinda not dumb at some things, and that's perfectly okay. Learning is the interesting part. And it's extra ironic that the Less Wrong crowd doesn't take those words to heart, because that's what it's all about. We're all wrong. It's not about being right, it's about being less wrong.

[−] ratmice 46d ago
My only complaint with the article is that it doesn't seem to mention that digitized proofs can contain gaps, but that those gaps must be explicit, like Lean's `sorry` placeholder, or axioms.
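In Lean, for instance, the two kinds of explicit gap look roughly like this (an illustrative sketch; `my_lemma` and `unproven_fact` are made-up names):

```lean
-- A gap inside a proof: `sorry` closes the goal, but Lean
-- flags the declaration as incomplete.
theorem my_lemma (a b : Nat) : a + b = b + a := by
  sorry

-- A gap assumed outright as an axiom; downstream results
-- visibly depend on it.
axiom unproven_fact : ∀ n : Nat, n + 0 = n

theorem uses_it : 5 + 0 = 5 := unproven_fact 5

#print axioms uses_it  -- reports the dependence on unproven_fact
```

Either way, the checker refuses to let a gap hide: it is tracked in every result built on top of it.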
[−] raincom 45d ago
That’s similar to Neurath’s boat: “We are like sailors who on the open sea must reconstruct their ship but are never able to start afresh from the bottom. Where a beam is taken away a new one must at once be put there, and for this the rest of the ship is used as support. In this way, by using the old beams and driftwood the ship can be shaped entirely anew, but only by gradual reconstruction.”
[−] DoctorOetker 44d ago
Hilbert's quote is entirely out of context:

1) While many formalists in his day were stress-testing definitions for unexpected gotchas, some vocal minority were doing formalization as an eccentric art form.

2) Commoditized computers running verification software were not available in his day and age.

As long as the weakest link was reliance on human brains faithfully attempting to maintain consistency anyway, then it was more productive and fruitful for the economy to focus on translating observations into the language of mathematics.

Once commoditized hardware and minimalistic verification software becomes available, it makes sense to step back and start a machine readable formalization program to translate or verify the main body of mathematics.

Quoting mathematicians of Hilbert's caliber in 2026 doesn't mean it's great guidance in the face of questions Hilbert was never confronted with: with cheap affordable compute, and an enormously expanded number of mathematicians, perhaps it's time to formalize the bulk of mathematics.

And it could happen quickly.

A government can mandate that a certain fraction of student scores be assessed on formalization tasks. Basically, turn the job of formalizing mathematics into homework exercises for students. There are students at all levels: undergraduate, graduate, ... If a result isn't proven yet, turn it into a temporary axiom, which goes on the collective TODO list.

In a few years all of mathematics that is regularly touched on in academia could be formalized.

Nation states that enforce this will have a large number of mathematicians capable of formalizing systems into machine readable form, and will benefit tremendously compared to nation states that don't (even if the resulting formalizations were public domain: having a sword available is not the same as having workers experienced in smithing such a sword).

[−] jl6 46d ago
Imagine a future where proofs are discovered autonomously and proved rigorously by machines, and the work of the human mathematician becomes to articulate the most compelling motivations, the clearest explanations, and the most useful maps between intuitions, theorems, and applications. Mathematicians as illuminators and bards of their craft.
[−] layer8 46d ago
The question is whether the capabilities that would let AI take over the discovery part wouldn’t also let them take over the other parts.
[−] thaumasiotes 46d ago

> Imagine a future where proofs are discovered autonomously and proved rigorously by machines, and the work of the human mathematician becomes to articulate the most compelling motivations

You've got the wrong idea of what mathematicians do now. There's not a proof shortage! We've had autonomously discovered proofs since at least Automated Mathematician, and we can have more whenever we want them - a basic result in logic is that you can enumerate valid proofs mechanically.

But we don't want them, because most proofs have no value. The work of a mathematician today is to determine what proofs would be interesting to have ("compelling motivations"), and try to prove them.
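The enumeration point can be made concrete with a toy formal system; Hofstadter's MIU system is a classic example. A breadth-first search mechanically generates theorem after theorem, none of which anyone particularly wants (an illustrative sketch, not anything from the comment above):

```python
from collections import deque

# Rewrite rules of Hofstadter's MIU system (single axiom: "MI").
def miu_successors(s):
    out = set()
    if s.endswith("I"):
        out.add(s + "U")                      # rule 1: xI  -> xIU
    if s.startswith("M"):
        out.add(s + s[1:])                    # rule 2: Mx  -> Mxx
    for i in range(len(s) - 2):
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])  # rule 3: III -> U
    for i in range(len(s) - 1):
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])        # rule 4: UU  -> (deleted)
    return out

def enumerate_theorems(limit=20, max_len=12):
    """Breadth-first mechanical enumeration of theorems from the axiom."""
    seen, queue, theorems = {"MI"}, deque(["MI"]), []
    while queue and len(theorems) < limit:
        s = queue.popleft()
        theorems.append(s)
        for t in sorted(miu_successors(s)):
            if t not in seen and len(t) <= max_len:  # cap string growth
                seen.add(t)
                queue.append(t)
    return theorems

print(enumerate_theorems()[:3])  # ['MI', 'MII', 'MIU']
```

The machine happily churns out valid theorems forever; deciding which ones are worth having is the part that isn't mechanical.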

[−] tines 46d ago
But in this future, why will “the most compelling motivations, the clearest explanations, and the most useful maps between intuitions, theorems, and applications” be necessary? Catering to hobbyists?
[−] fasterik 46d ago
Most mathematicians don't understand the fields outside of their specialization (at a research level). Your assumption that intuition and applications are limited to hobbyists ignores the possibility of enabling mathematicians to work and collaborate more effectively at the cutting edge of multiple fields.
[−] ndriscoll 46d ago
Very far in the future when AI runs everything, of course math will be a hobby (and it will be great! As a professional programmer I'm happy that I now have a research-level tutor/mentor for my math/physics hobby). In the nearer term, it seems apparent to me that people with stronger mental models of the world are able (without even trying!) to formulate better prompts and get better output from models. i.e. as long as people are asking the questions, they'll do better to have some idea of the nuance within the problem/solution spaces. Math can provide vocabulary to express such nuance.
[−] layer8 46d ago
Mapping theorems to applications is certainly necessary for mathematics to be useful.
[−] tines 46d ago
Sure, applications are necessary, but why will humans do that?
[−] layer8 46d ago
I agree (https://news.ycombinator.com/item?id=47575890), but the parent assumes that AI will lack the ability.
[−] rtpg 46d ago
Proofs of what?

Proofs tend to get generated upstream of people trying to investigate something concrete about our models.

A computer might be able to autonomously prove that some function has some property, and this proof is entirely useless when nobody cares about that function!

Imagine if you had an autonomous SaaS generator. You’d end up with “flipping these pixels from red to blue as a service”, “adding 14 to numbers as a service”, “writing the word ‘dog’ into a database as a service”.

That is what autonomous proof discovery might end up being. A bunch of things that might be true but not many people around to care.

I do think there’s a loooot of value in the more restricted “testing the truthfulness of an idea with automation as a step 1”, and this is something that is happening a lot already by my understanding.

[−] umutisik 46d ago
With sufficient automation, there shouldn't really be a trade-off between rigor and anything else. The goal should be to automate as much as possible, so that whatever well-defined useful thing can come out of theory comes out faster and more easily. Formal proofs make sense as part of this goal.
[−] _alternator_ 46d ago
Let’s not forget that mathematics is a social construct as much as (and perhaps more than) a true science. It’s about techniques, stories, relationships between ideas, and ultimately, it’s a social endeavor that involves curiosity satisfaction for (somewhat pedantic) people. If we automate ‘all’ of mathematics, then we’ve removed the people from it.

There are things that need to be done by humans to make it meaningful and worthwhile. I’m not saying that automation won’t make us more able to satisfy our intellectual curiosity, but we can’t offload everything and have something of value that we could rightly call ‘mathematics’.

[−] justonceokay 46d ago

> mathematics is a social construct

If you believe Wittgenstein, then all of math is more and more complicated stories amounting to 1=1. Like a ribbon that we figure out how to tie in ever more beautiful knots. These stories are extremely valuable and useful, because we find equivalents of these knots in nature. But boiled down, that is what we do when we do math.

[−] ianhorn 46d ago
I like the Kronecker quote, "Natural numbers were created by god, everything else is the work of men" (translated). I figure that (like programming) it turns out that putting our problems and solutions into precise reusable generalizable language helps us use and reuse them better, and that (like programming language evolution) we're always finding new ways to express problems precisely. Reusability of ideas and solutions is great, but sometimes the "language" gets in the way, whether that's a programming language or a particular shape of the formal expression of something.
[−] anthk 46d ago
More like 1 = 0 + 1.

Read about Lisp, The Computational Beauty of Nature, the 64K Lisp from https://t3x.org, and how all numbers can be composed by counting nested lists all the way down.

List of a single item:

     (cons 1 nil)
nil is the empty list, so this reads as:

[ 1 | nil ]

List of three items:

    (cons 1 (cons 2 (cons 3 nil)))
Which is the same as

    (list 1 2 3)
Internally, it's composed like chained domino pieces: the right part of the first one points to the second one, and so on.

[ 1 | • ] -> [ 2 | • ] -> [ 3 | nil ]

A function call is a list too: the operation is applied to the rest of the items:

     (+ 1 2 3)
returns 6.

Which is like saying:

    (eval '(+ 1 2 3))
'(+ 1 2 3) is just a list of 4 items, not a function call.

Eval will apply the '+' operation to the rest of the list, recursively.

Which is the default for every list written in parentheses without the leading ' .

    (+ 1 (+ 2 3))
will evaluate to 6, while

    (+ 1 '(+ 2 3))
will give you an error, as you are adding a number and a list, and those are distinct kinds of objects.

How arithmetic is made from 'nothing':

https://t3x.org/lisp64k/numbers.html

Table of contents:

https://t3x.org/lisp64k/toc.html

Logic, too:

https://t3x.org/lisp64k/logic.html
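The "numbers from nested lists" idea can be sketched in a few lines of Python, modeling a cons cell as a pair (a rough illustration of the construction, not the t3x.org code):

```python
# A cons cell modeled as a Python pair; nil is None.
def cons(a, d):
    return (a, d)

def cdr(c):
    return c[1]

nil = None

def numeral(n):
    """The numeral for n is n nested cons cells: counting all the way down."""
    return nil if n == 0 else cons(nil, numeral(n - 1))

def value(num):
    """Read a numeral back as an ordinary int by counting nesting depth."""
    return 0 if num is nil else 1 + value(cdr(num))

def add(a, b):
    """Addition just re-conses the cells of a onto b."""
    return b if a is nil else cons(nil, add(cdr(a), b))

print(value(add(numeral(2), numeral(3))))  # 5
```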

[−] _alternator_ 46d ago
You don’t really have to believe Wittgenstein; any logician will tell you that if your proof is not logically equivalent to 1=1 then it’s not a proof.
[−] justonceokay 46d ago
Sure, I just personally like his distinction between a “true” statement like “I am typing right now” and a “tautological” statement like “3+5=8”.

In other words, declarative statements relate to objects in the world, but mathematical statements categorize possible declarative statements and do not relate directly to the world.

[−] IsTom 46d ago
If you look from far enough, it becomes "Current world ⊨ I am typing right now" which becomes tautological again.
[−] sesm 46d ago
In my view mathematics builds tools that help solve problems in science.
[−] _alternator_ 46d ago
This is known as “applied mathematics”.
[−] nathan_compton 46d ago
Sounds lame and boring to me.
[−] adrianN 46d ago
There is a bit about this in Greg Egan's Diaspora, where a parallel is drawn between maths and art. It is not difficult to automate art in the sense that you can enumerate all possible pictures, but it takes sentient input to find the beautiful areas in the problem space.
[−] SabrinaJewson 46d ago
I do not think this parallel works, because I think you would struggle to find a discipline for which this is not the case. It is trivial to enumerate all the possible scientific or historical hypotheses, or all the possible building blueprints, or all the possible programs, or all the possible recipes, or legal arguments…

The fact that the domain of study is countable and computable is obvious because humans can’t really study uncountable or uncomputable things. The process of doing anything at all can always be thought of as narrowing down a large space, but this doesn’t provide more insight than the view that it’s building things up.

[−] seanmcdirmid 46d ago
Automating proofs is like automating calculations: neither is what math is, they are just things in the way that need to be done in the process of doing math.

Mathematicians will just adopt the tools and use them to get even more math done.

[−] quietbritishjim 46d ago
I don't think that's true. Often, to come up with a proof of a particular theorem of interest, it's necessary to invent a whole new branch of mathematics that is interesting in its own right e.g. Galois theory for finding roots of polynomials. If the proof is automated then it might not be decomposed in a way that makes some new theory apparent. That's not true of a simple calculation.
[−] seanmcdirmid 46d ago

> I don't think that's true. Often, to come up with a proof of a particular theorem of interest, it's necessary to invent a whole new branch of mathematics that is interesting in its own right e.g. Galois theory for finding roots of polynomials. If the proof is automated then it might not be decomposed in a way that makes some new theory apparent. That's not true of a simple calculation.

Ya, so? Even if automation is only going to work well on the well understood stuff, mathematicians can still work on mysteries, they will simply have more time and resources to do so.

[−] ndriscoll 46d ago
This is literally the same thing as having the model write well factored, readable code. You can tell it to do things like avoid mixing abstraction levels within a function/proof, create interfaces (definitions/axioms) for useful ideas, etc. You can also work with it interactively (this is how I work with programming), so you can ask it to factor things in the way you prefer on the fly.
[−] integralid 46d ago

> This is literally the same thing as

No.

> You can

Not right now, right? I don't think current AI automated proofs are smart enough to introduce nontrivial abstractions.

Anyway, I think you're missing the point of the parent's posts. Math is not proofs. The four color theorem "proof" was very controversial back in the day, because it was a computer-assisted exhaustive check of every possibility, impossible to verify by a human. It didn't bring any insight.

In general, on some level, proofs are not that important to mathematicians. For example, Riemann hypothesis or P?=NP proofs would be groundbreaking not because anyone has doubts that P≠NP, but because we expect the proofs will be enlightening and will use some novel technique.

[−] ndriscoll 45d ago
Right, in the same way that programs are not opcodes. They're written to be read and understood by people. Language models can deal with this.

I'm not sure what your threshold for "trivial" is (e.g. would inventing groups from nothing be trivial? Would figuring out what various definitions in condensed mathematics "must be" to establish a correspondence with existing theory be trivial?), but I see LLMs come up with their own reasonable abstractions/interfaces just fine.

[−] jhanschoo 46d ago
There are areas of mathematics where the standard proofs are very interesting and require insight, often new statements and definitions and theorems for their sake, but the theorems and definitions are banal. For an extreme example, consider Fermat's Last Theorem.

Note on the other hand that proving standard properties of many computer programs is frequently just tedious and should be automated.

[−] seanmcdirmid 46d ago
Yes, but >90% of the proof work to be done is not that interesting, insightful stuff. It is rather pattern matching from existing proofs to find what works for the proof you are currently working on.

If you've ever worked on a proof for formal verification, then it's... work... and the nature of the proof probably (most probably) is not going to be something new and interesting for other people to read about; it is just work that you have to do.

[−] jhanschoo 45d ago
You're right, I misread your comment. Apologies.
[−] 3yr-i-frew-up 46d ago
[dead]
[−] anthk 46d ago
[flagged]
[−] integralid 46d ago
First of all, I think your comment is against HN guidelines.

And I expect the GP actually has a lot of experience in mathematics; they are exactly right, and this is how professional mathematicians see math (at least most of them, including the ones I interact with).

[−] anthk 46d ago
Engineers, maybe. Not the case with Mathematicians.
[−] storus 46d ago
There are still many major oversimplifications in the core of math, making it correspond weirdly with the real world. For example, if you want to model human reasoning, you need to step away from binary logic that uses "weird" material implication, which is a neat shortcut that allows math's formalization but doesn't map well to reasoning. Then you might find out that e.g. medicine uses counterfactuals instead of material implication. Logics that tried to make implication more "reasonable", like relevance logic, are too weak to allow formalization of math. So you either decide to treat material implication as correct (getting the incompleteness theorem in the end), making you sound autistic among other humans, or you can't really do rigorous math.
[−] jojomodding 46d ago
People keep getting hung up on material implication but I cannot understand why. It's more than an encoding hack: falsity (i.e. the atomic logical statement equivalent to 0=1) indicates that a particular case is unreachable, and falsity elimination (aka "from falsity follows everything") expresses that you have reached such a case as part of the case distinctions happening in every proof.

Or more poetically, "if my grandmother had wheels she would have been a bike" [1] is folk wisdom precisely because it makes so much sense.

1: https://www.youtube.com/watch?v=A-RfHC91Ewc

[−] YetAnotherNick 46d ago
The thing is, if something is proved by checking a million different cases automatically, it is hard to carry any learning from it into other proofs.
[−] pfdietz 46d ago
A few comments:

(1) Math journals are being flooded with AI slop papers loaded with errors. I can see a time when they will require papers to be accompanied by formal proofs of the results. This will enable much of the slop to be filtered out.

(2) Formalization enables AI to do extensive search while staying grounded.

(3) Formalization of the historical math literature (about 3.5M papers) will allow all those results to become available for training and mining, to a greater extent than if they're just given as plain text input to LLMs.

[−] casey2 46d ago
In the long run, creating a certificate that guarantees a certain probability of correctness will take much less energy. Right now we can run Miller–Rabin and show with 1-(1/10^100) certainty that a number is/isn't prime. Similarly for hash collisions: after a certain point these can't happen in reality. If Anthropic can get their uptime from 1 nine to 9 nines (software isn't the bottleneck for 9 nines), then we don't need formally checked proofs.
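The Miller–Rabin idea can be sketched briefly: each round with a random base catches a composite number with probability at least 3/4, so the error bound shrinks exponentially with the number of rounds (a minimal textbook sketch, not tuned for performance):

```python
import random

def is_probable_prime(n, rounds=64):
    """Miller-Rabin test: False means certainly composite; True means
    prime with error probability at most 4**-rounds."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):  # handle small primes directly
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)                 # fast modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                 # 'a' witnesses that n is composite
    return True

print(is_probable_prime(2**127 - 1))  # True: a Mersenne prime
print(is_probable_prime(561))         # False: a Carmichael number
```

Note the asymmetry: a "composite" answer is a certainty, while a "prime" answer is only probabilistic, which is exactly the kind of certificate the comment has in mind.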
[−] johnbender 46d ago
I’m confused by the calculus example, and I’m hoping someone here can clarify: why can’t one state the needed assumptions for a roughed-out theory that still need to be proven? That is, I’m curious whether the critical concern the article is highlighting is the requirement to “prove all assumptions before use”, or instead the idea that sometimes we can’t even define the blind spots as assumptions in a theory before we use it.
[−] zitterbewegung 46d ago
I think the future of Lean as a tool is mathematicians using this or similar software and having it create corresponding Lean code. [1] This is an LLM that outputs Lean code given a mathematical paper. It can also reason within Lean projects and enhance or fix Lean code.

[1] https://aristotle.harmonic.fun

[−] ux266478 46d ago
Rigor was never vital to mathematics. ZFC was explicitly pushed as the foundation for mathematics because Type Theory was too rigorous and demanding. That mathematicians are coming around to TT is a bit of funny irony lost on many. Now we just need to restore Logicism...
[−] anthk 46d ago
LLMs are not reproducible. Common Lisp, Coq and the like certainly are.
[−] dbvn 46d ago
There's no such thing as being too rigorous when you're talking about proofs in math. It either proves it or it doesn't. You get as rigorous as you need to.
[−] j45 46d ago
Are digitized proofs another way of saying the equivalent of a calculator, when the calculator was new?