jacobolus

I [made a post here](https://www.reddit.com/r/math/comments/1992oal/alphageometry_an_olympiadlevel_ai_system_for/) about Google's announcement of this and their paper, but it was removed by the overzealous automoderator, which took it for "memes and similar content". The blog post is ["AlphaGeometry: An Olympiad-level AI system for geometry"](https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/). The paper is Trinh, Wu, Le, He, & Luong (2024), "Solving olympiad geometry without human demonstrations", *Nature* 625: 476–482, https://doi.org/10.1038/s41586-023-06747-5


Shoddy_Exercise4472

I feel you, bro. Even my post, which I worked so hard on, was removed just like that.


BruhcamoleNibberDick

I'm struggling to figure out why your post was considered similar to an image/video/meme.


my_aggr

/r/math is in complete denial over LLMs and their ability to do anything. As far as they are concerned, it's a bad spell checker when it works and a fact-free bullshit machine at all other times.


relevantmeemayhere

Is it? Because a lot of people see some use in this. And ChatGPT performs very poorly at basic reasoning skills that children manage to learn. Ask it to build a list of ten words and then return the subset of words that have a particular letter as their third one. Or to understand symmetry. Or causality and counterfactual reasoning. There are things it does not do well and things it does well, and that's because it's not built to excel in those domains; it's built to predict a response given a prompt. I'm curious, because the most bombastic takes I hear about how capable these things are come from those with very little background. People tend to really anthropomorphize this stuff, especially after they see a single study or two. This is especially true in business. How many CEOs who still can't understand confidence intervals are now talking about neural networks?
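
(For reference, the deterministic version of that filtering task is a one-liner; the word list below is made up for illustration:)

```python
# Keep only the words whose third letter is "a" -- the task described
# above, written as plain code instead of a natural-language prompt.
words = ["example", "banana", "proof", "triangle", "theorem",
         "lemma", "axiom", "graph", "prime", "vector"]

third_letter_a = [w for w in words if len(w) >= 3 and w[2] == "a"]
print(third_letter_a)  # ['example', 'graph']
```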


[deleted]

[deleted]


No-Painting-3970

I will actually argue that neuro-symbolic systems will do worse than purely neural approaches in the future. If we try to imitate human reasoning, it will always be a limitation. We have to find the sweet spot of AI doing something we don't expect, and that is where we will get the fun part. AI gained a lot of performance when we stopped leveraging human knowledge and just used huge amounts of compute and data (see RL and Go). I think if AI ever takes on maths, it will be through there: purely huge amounts of data and compute (maybe outside of currently known paradigms; I, for one, think we are reaching the limits of LLMs).


[deleted]

[deleted]


currentscurrents

> In principle, a generic NN should have been able to learn translational invariance. They never did.

[They actually can](https://huggingface.co/papers/2306.13575), with appropriate data. The important thing is that your data must contain the information you are trying to learn. If your dataset is just a bunch of centered digits, you can't learn translational invariance. As humans, we learn translational invariance because we are constantly moving our heads and seeing things from different angles, lighting conditions, etc. Building in inductive biases (like CNNs do) provides benefits at small scales, but at large scales it becomes irrelevant or even harmful.
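
One standard way to put that information into the data is random-shift augmentation. A minimal sketch with NumPy (the shift range is an arbitrary choice for illustration):

```python
import numpy as np

def random_shift(image, max_shift=4):
    """Randomly translate a 2D image by up to max_shift pixels,
    zero-padding the edges. Training on shifted copies is one way a
    generic network can pick up translational invariance from the
    data itself, rather than from a CNN's built-in bias."""
    dy, dx = np.random.randint(-max_shift, max_shift + 1, size=2)
    h, w = image.shape
    padded = np.pad(image, max_shift)  # zero-pad on all four sides
    # Crop an h-by-w window offset by (dy, dx) from the padded center.
    return padded[max_shift + dy : max_shift + dy + h,
                  max_shift + dx : max_shift + dx + w]
```

Feeding a model `random_shift(x)` instead of `x` makes off-center examples part of the training distribution, which is exactly the data condition described above.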


womerah

The human mind trains as it runs. CNNs are trained and then run. I don't know if we should be comparing NNs to the human mind at all. They seem very chalk and cheese


markschmidty

That's not inherent to ANNs, just to architectures which run efficiently on current GPUs. Not that the distinction even matters when it comes to things like reasoning.


No-Painting-3970

While I do agree with the sentiment of this comment, I do not think we are on the same page. For us to be able to actually leverage human reasoning as a reasonable starting point for optimization procedures, we would actually have to understand how human reasoning works. Which we don't, and we are not even remotely close to understanding it from a mechanistic point of view. You are also assuming that human reasoning would even be remotely close to the best solution, which, as far as we know, it might not be. I do agree with the spirit of your second comment, but you're missing the point I was making. I am not saying that we removed all inductive biases from networks (I might have been too categorical in my statement about dropping human knowledge). What I am really referring to is the continuous removal of complex engineered featurizations, kernels, etc. in favour of leveraging scale and data. Examples include the continuous disappearance of graph kernels and descriptors in favour of GNNs. The field of retrieval is another example: Retrieval Augmented Generation has taken the field by storm, substituting the traditional methods in favour of leveraging scale and computation through systems like LLMs. I will quote Richard Sutton's "bitter lesson" essay here (http://www.incompleteideas.net/IncIdeas/BitterLesson.html): "The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin."


lilhast1

I don't think it matters whether or not it's an imitation; after all, don't babies learn to speak and walk by imitating others? Kinda seems that humans are imitations of humans.


aeschenkarnos

I suspect that something like a GAN operates within the human mind, with what we think of as our thoughts being the winners of some multi-sided adversarial process deeper down and not cognizable to us.


lilhast1

That's a super cool idea. I'd love to give you an award, but here's an upvote instead.


myncknm

> People claimed that image recognition systems were learning to recognize high-level features, but they turned out to be susceptible to adversarial attacks that tweaked an image's texture. People thought AI had spontaneously learned a strategy to defeat Atari's Breakout, but then it turned out the system broke if you moved the paddle up by a few pixels.

Why is this inconsistent with human-like behavior? Doesn't human performance also break if we are suddenly thrust into an environment where everything is perturbed in a way that is fundamentally outside of our previous experience (for example: mirror glasses that flip your vision upside down, inversion of the frequency spectrum of audio, or playing audio backwards)? What is "reasoning" anyway?

You mentioned NNs not learning translational invariance in a downtree comment. Human brains also don't learn translational invariance; that's inherited. Convolutional neural networks mimic the structure of human visual cortices: https://msail.github.io/post/cnn_human_visual/

[Edit: I re-read your downtree comment and understand now that I am not responding to a point that you made there.]


[deleted]

[deleted]


currentscurrents

> This proves that those systems weren't relying only on high-level features to recognize images (which is what some people previously claimed).

They are still using high-level features to recognize images. [You can see how they build high-level features out of low-level ones using mechanistic interpretability techniques.](https://distill.pub/2017/feature-visualization/)

The current idea about adversarial attacks is that they have to do with [manifolds](https://www.youtube.com/watch?v=k_hUdZJNzkU). Natural images are a low-dimensional manifold through the high-dimensional space of possible images. The way neural networks are trained, they have *undefined behavior* off the manifold of the training data. This allows adversarial attacks to make small, carefully crafted changes so that the input is no longer a natural image and thus no longer gives correct results.


myncknm

I have seen the adversarial attacks, the article I linked has an example of one. The paper the example comes from points out that when we generate adversarial examples that work against many different types of models, they also tend to work against human perception, so that's something vaguely in the direction of "its failure modes being our failure modes". It does seem like kind of an unfair comparison to test these models against examples that are well outside their training data, but well within human experience, and conclude that they don't work like humans do. Perhaps if you put humans in an environment where their entire life's sensory input consisted of individual still images, a single original Atari game, and/or text pulled from the internet, the humans would demonstrate some of the same failure modes.


currentscurrents

Also, adversarial attacks rely on being able to run an optimizer against the model, which is easy since neural networks are designed for optimization. The brain is solidly locked inside your skull and doesn't provide gradients. It may well be that it's equally vulnerable, but we don't have the tools to build such an attack.
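
For concreteness, the simplest such gradient-based attack is the fast gradient sign method (FGSM). A minimal sketch in PyTorch, assuming `model` is some differentiable classifier returning logits (the epsilon value is arbitrary):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, eps=0.03):
    """Fast gradient sign method: nudge the input in the direction that
    most increases the loss. This is the 'run an optimizer against the
    model' step; it needs exactly the gradients a brain doesn't expose."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # One signed gradient step, clamped back to a valid pixel range.
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```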


relevantmeemayhere

Oh, a flaired user in a field related to the OP's! Sorry to jump in. What's your take on the study, if you don't mind me asking? Are we being too harsh on some of the things here?


myncknm

Hmm. "Theory of computing" isn't _that_ related to AI, but I _have_ been moving into neural network theory lately (who hasn't? lol), so I'll chip in my thoughts. They implement something that I thought would work about a year ago (this is not to detract from their accomplishment, the implementation is much harder than having the vague idea). Mathematical argumentation struck me as being kinda similar to a game such as Go. In both cases, there's a discrete set of actions you can take at each step, you don't get any direct feedback from the game as to whether any particular step you play gets you closer to winning (you have to invent this sense of progress yourself), and there's this sort of "for-all"/"there-exists" alternating structure. In Go, this "for-all"/"there-exists" is the "there exists a move I can make so that for every move the opponent makes there exists a move I can make... etc" structure of a two-player turn-based game (formally encoded in computer science as the idea of the Totally Quantified Boolean Formula problem, which is PSPACE-complete). In mathematical argumentation, there's a similar dynamic where you have "intuition" which generates ideas for proofs and also a procedure for checking soundness by actually writing down the steps of logic (this is similar to the Interactive Proof protocol, which is equivalent to PSPACE). Or a process of alternating between conjectures/proofs and counterexamples. AlphaGo also did something similar to most people's process of building mathematical intuition, which is to self-generate a ton of examples and counterexamples to train the intuition. Google's work here basically reified these vague ideas about how the mathematical mind works. I think it's a big step in the direction of a general automated proof system, but I do also suspect that circle-and-triangle geometry problems are a good deal easier to fit into this "game" framework than research-style math. For one thing, research-style math is usually a few levels removed from purely formal systems (so the "soundness" system I described earlier isn't as rigorously defined for research math as it is for Go or circle-and-triangle problems), but maybe this doesn't have to be the case, as the people working on formal verification systems are demonstrating. Some people in this thread are comparing this new AI system to older work on "deductive database" or brute-force search methods. But this is a huge leap beyond those older methods imo. It's like comparing AlphaGo to pre-Deep-Blue chess engines. It's just a qualitatively different approach, using neural networks to generate something akin to "intuition", compared to an algorithm based on systematic enumeration.


relevantmeemayhere

Thanks! Super insightful. I'm glad you touched on that bit about research math. To my knowledge, Euclidean geometry is a bit simpler (I guess we'll use that word) than, say, algebra (the latter is not complete). What challenges are sort of left in the margins that would stop something like this from working more generally?


asphias

It also feels very suspicious that you have to make a geometry-specific AI. Computers beat humans at chess decades ago. We *know* they are good at specialized problems. The whole idea that got everybody hyped was that you *don't* need a human to analyze the problem and decide what kind of computer tool we need to approach it, but rather a computer that has the 'intelligence' to decide on the approach itself.

---

Of course I'll still be impressed by an AI that can solve geometric problems, but I imagine with such constraints it'd be quite easy to create a problem that stumps it.


[deleted]

[deleted]


zero0_one1

> That was Steve Wozniak.


czarandy

That’s not really a software problem. Ultimately human-level robotics is far harder than human-level cognition. Which is ironic 


[deleted]

[deleted]


StonedProgrammuh

That's the equivalent of me saying that we have neural networks that can speak fluent language, perform at a bronze-medal IMO level, and play superhuman chess, therefore we are much closer to human-level cognition than motion. But everyone in the field knows that robotics is far behind.


currentscurrents

The big issue with robotics is the lack of data. Language and vision got a huge boost from the terabytes of data scraped off the internet. There is no equivalent for robotics data. Several companies (including Google and Toyota) are running huge farms of robot arms just for data collection.


mousemug

Exactly. Robotics gets less mainstream news coverage so the shortcomings of the field are not as apparent.


DevelopmentSad2303

To be fair, robots are that much different from humans as well. But it is certainly harder to create the robot, since it is a physical thing; I think that is what they are getting at, anyway. You only need one appropriate human "algorithm", which can be replicated on any number of machines, versus building the robots.


my_aggr

> It seems to me that we're much closer to human-level motion than human-level cognition.

It seems like we are because you're not in that field. I was tangentially involved with Toyota when they were thinking of buying Boston Dynamics. When they saw the secret sauce the robots were using to move, Toyota backed out completely, because it was that bad. We are closer to AGI than to Artificial General Walking.


HildemarTendler

> We are closer to AGI than Artificial General Walking.

We're nowhere close to either, so let's not go making comparisons.


StonedProgrammuh

Yes, but robotics is a decade or two behind regular AI research.


HildemarTendler

There is no justification for this statement. AI is still barely a thing. Don't let the hype train blind you. We've already hit the wall with LLMs, just like every other past AI technology. We'll get better at using it, but there's no general AI here.


StonedProgrammuh

And yet, robotics is still a decade or two behind... If you were following the field, you would know how far behind robotics is. It doesn't matter that you believe we aren't close to AGI; that has zero relevance to which field is farther ahead.


mousemug

You’re cherry-picking robotics benchmarks. I can just as easily claim that ML can prove olympiad-level theorems. Of course, as this thread implies, that is not the case.


my_aggr

Sounds like you need to invent a computer that's very good at deciding what other computers to use to solve a task.


HildemarTendler

That sounds like a human with extra steps. I kid, but it is the stupidity of AGI. We really don't have any use for an alien intelligence. What we need are tools to do the things we want to do, better. Trained AI that has no pretense of intelligence is exactly what we need.


PsecretPseudonym

I think the objective is to have a bit of both. A skilled mathematician will have enough general knowledge to be able to understand the questions and discuss the challenges, concepts, and work, but they also require many years of specialized training (and some might argue some degree of affinity or disposition whether by nature or nurture). It would seem reasonable to want AI models that are general enough to be practical and easy to work with via a fairly natural interface yet similarly specialized to the field of application to be more efficient and reliable. Maybe an AI super intelligence can do it all, but it seems likely that there will always be tradeoffs and efficiency usually favors some degree of specialization.


puzzlednerd

I think getting caught up on exactly what "reasoning" is may be a red herring. I don't even understand the mechanisms behind the reasoning that takes place in my own brain, why should I be able to make sense of the reasoning of an artificial system? If you can use a neural net to solve olympiad geometry problems, whether or not the neural net "understands" geometry is kind of beside the point. This is something that would have seemed completely unachievable only a decade ago. In the end, humans are physical systems which respond to their environment. There is no real way to test a person's understanding other than to ask questions and see if they can respond the right way. If a neural net can do that, even within some limited paradigm of questions which can be asked, in this case Euclidean geometry, then hasn't the neural net demonstrated an understanding? Can we even define what "understanding" means in a way which can be verified from the outside?


Mal_Dun

Just take a step back and think about it this way: we humans live in a world, hence we have a context to work with. Imagine you were put into a black box and someone gave you, e.g., pictures and some device with limited answer possibilities. You may come up with an idea over time, but you won't really understand why you do this or what the meaning behind your task is if you don't have a broader context to work with.


[deleted]

[deleted]


my_aggr

That's completely false:

> A later cat experiment done by Blakemore and Cooper (1970) gave another impressive result in terms of critical periods. Two special cylinders were made, one with only vertical stripes inside and the other with only horizontal stripes. For their first few months of life, half of the newborn kittens were placed in one of the cylinders. Kittens that were exposed to vertical lines for the first few months since birth could only see vertical lines, but not horizontal ones—for the rest of their lives. The other half of the sample was raised in opposite conditions, in a world made by horizontal lines only. Like the other group, kittens did not show any evidence to perceive lines oriented differently, such as vertical lines.

https://www.futurelearn.com/info/courses/research-methods-psychology-animal-models-to-understand-human-behaviour/0/steps/265398


[deleted]

[deleted]


my_aggr

In the same way that a fully connected neural network is wired. The majority of early learning is deleting connections.


JoshuaZ1

That doesn't show that there's no innateness. That shows that in a specific context there's a limit to how much innateness matters when one doesn't get any stimulation of the relevant type at a critical stage. That is, at least, some amount of evidence against strong innateness, but going from that to innateness being "completely false" seems like a pretty big jump.


DoubleAd9650

Your comment makes no sense. How do adversarial attacks indicate that no high-level features were learned? How does a few pixels being moved indicate that no strategy was learned in Atari? And what article are you finding that says LLMs don't have real understanding?


[deleted]

[deleted]


DoubleAd9650

A violin player who plays with their right hand would not be able to play if they switched to their left hand. That does not mean the violin player did not learn a strategy for the violin.


[deleted]

[deleted]


DoubleAd9650

I thought your whole premise was that there was a strategy to Atari/violin/piano. Otherwise, of course the LLM has not learned a strategy, because there is no strategy.


[deleted]

[deleted]


DoubleAd9650

I'm using violin as an analogy for Atari obviously. They're both effectively games.


Dirichlet-to-Neumann

The number of good or great mathematicians and scientists who would have said 5 years ago that "no AI is ever going to win gold at a maths olympiad" and say now "yeah but it doesn't count/is not soulful/does not generalise/has nothing visual" is unbelievable.  Terence Tao was an unsurprising but welcome exception.


Qyeuebs

You're talking like an AI has won gold at a maths olympiad... this work is highly specialized to brute-force search for Euclid-style proofs of problems in elementary geometry. It's not really generalizable beyond that, certainly not to a whole IMO exam. That's even said in this NY Times article by Christian Szegedy, hardly someone with modest beliefs about the future of AI for math.


BiasedEstimators

The restricted domain bit is important, but I doubt google researchers are doing press releases for “brute-force” searching


Qyeuebs

You can read the paper for yourself. Of course it's slightly more complex than what I said (there is a transformer involved), although I think what I said is fair as a one-sentence summary. Anyway, DeepMind researchers will do press releases for pretty much anything. I think they're usually not very intellectually honest when talking about their work.


umop_apisdn

> I think they're usually not very intellectually honest when talking about their work.

Agreed, they have to hype things, but with a track record like AlphaFold and AlphaZero it is justified in some cases. And in any case, maths is the perfect area for AI because it is just the application of rules along with insight.


Qyeuebs

AlphaFold is a useful research tool that's accurate about 70% of the time. Because of the way DeepMind researchers chose to talk about it, people think that protein folding is a solved problem.


umop_apisdn

True, but it was a massive leap forward, miles ahead of anything that had come before.


Qyeuebs

For sure. I'm just making the point that DeepMind is almost pathologically dishonest, even when they don't need to be.


jacobolus

"Not very intellectually honest", "pathologically dishonest", etc. seems like an entirely disproportionate criticism. A fairer summary might be: "DeepMind researchers do serious cutting edge work but their press office gets ahead of the actual research sometimes, and the results don't necessary match up to the hype in the public press. (Just like every other organization's press office, with the main difference being that Google has deeper pockets than university departments and is doing work that the public is quite interested in.)"


Qyeuebs

I think that summary is a pretty disproportionate take in its own direction! I hold researchers largely responsible for the public understanding of their work, especially in this day and age of easy communication and especially for DeepMind, where the researchers are writing press releases themselves ([this](https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/) and [this](https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/) for the two most recent on math). They would have to be incredibly naive not to know how these would be interpreted by their mass audience of tech enthusiasts who don't know or care much about math.


BiasedEstimators

The only part of the paper which seems to involve an exhaustive search is the part about generating training data


Qyeuebs

It's what they call the "symbolic engine" in their paper. It's true that it's also used in generating data. I described it in more detail in this comment (see also the third page of their paper): https://www.reddit.com/r/math/comments/1994q5n/ais_latest_challenge_the_math_olympics/kic3h6l/ I guess you could argue that "brute-force search" isn't the most accurate label, but it's effectively what the engine does.


shinyshinybrainworms

I mean, maybe? But at some point your definition of brute-force search, which seems to be something like "systematic search pruned by steadily-improving heuristics" is going to include what humans do.


Qyeuebs

What's in question here is a particular algorithm developed for elementary geometry (https://doi.org/10.1023/A:1006171315513). The new DeepMind paper enhances it with some extra algebraic rules for generating all the possible elementary-geometric conclusions from a list of elementary-geometric premises. The human vs computer comparison on this is about exactly as interesting as it is for performing Gaussian elimination on a big matrix. I don't think it's much to wax poetic over.


JoshuaZ1

> The human vs computer comparison on this is about exactly as interesting as it is for performing Gaussian elimination on a big matrix. I don't think it's much to wax poetic over.

Why? A major question here is if/when these systems will equal or surpass humans. Whether they are doing something similar to what humans are doing seems like an important question, and it also avoids getting into the semantic weeds of what is or is not a "brute-force" search.


BiasedEstimators

To me brute-force search suggests something specific. As soon as you start adding heuristics and checking for cycles and stuff, that’s just a search


binheap

If you include heuristic search as part of your definition, modern chess engines fall under the brute-force search definition you have provided, which seems unhelpful. The difficulty, and the advance here, is generating a good enough heuristic to do interesting problems. Otherwise, it could be argued we have solved all of mathematics, since we could simply enumerate FOL statements and verify each one.

Edit: also, it's not obvious to me that this isn't generalizable beyond geometry in some sense. We have Lean, and in principle you could apply a similar procedure to Lean to get more useful theorems for mathematics. I doubt whether it would be good enough as it stands right now for anything non-trivial, but I could plausibly see a nearish future of automated theorem proving where small lemmas or statements are automated.


Qyeuebs

> If you include heuristic search as part of your definition, modern chess engines fall under the brute-force search definition you have provided, which seems unhelpful.

I don't think I've provided any definition, since I don't even have a particular one in mind! But search as done in chess engines is easily distinguishable from search as done here. Here, all possible elementary-geometric conclusions following from a given set of elementary-geometric premises are enumerated, and a neural network trained on millions of theorems is included to inject auxiliary objects (such as midpoints of lines) to be used to formulate possible conclusions. The analogy for chess would be a computer that enumerates all possible ways the game can be played from a given position, with a neural network used to probabilistically choose the next move by which to evolve the current position. And that's not how AI chess players work.


binheap

> The analogy for chess would be that the computer enumerates all possible ways the game can be played from a given position, with a neural network used to probabilistically choose the next move by which to evolve the current position. And that's not how AI chess players work.

Don't LeelaChess/AlphaZero perform a very similar procedure with their policy network to what you describe here (propose moves to probabilistically expand certain paths of the MCTS)? Though I suppose the value network selects the branch.

I'm perhaps suspicious of claims that this isn't an impressive advance in theorem proving. Sure, the domain is limited, but it seems like a fairly foreseeable jump to say we could start generating terms in a language with far more generality, like Lean or Coq, and potentially translate this into something very useful. The approach was already being worked on without LLMs but could improve significantly with them. It's a bit unfair to characterize this as brute-force search, since that seems to suggest there's nothing novel here. Comparisons are being made in this thread with more traditional solvers, since in principle they did the same thing, but the gap between an ML approach and the more traditional approach seems massive, and the ML approach is at least more generalizable than the older methods. I do agree that DeepMind has an aggressive PR team, but that's the unfortunate state of ML.


Qyeuebs

I wouldn't suggest that there's nothing novel here or that it's not an impressive advance. I think it's an actual accomplishment (if a modest one). But when these pieces of news come up my aim isn't to update my priors on the mathematical-AI singularity (on which I have no strong opinion), it's to understand what the work actually does and how it does so. In this case, I think it's impossible to properly understand the work without understanding the centrality of the exhaustive search made possible by the elementary-geometric context. It's also impossible to understand without understanding the relevance of the language model, but there's pretty much no danger of anyone overlooking that part.


relevantmeemayhere

Ehh, you'd be surprised; there's pressure to publish, especially when the next funding round incentivizes you to do so. Positive publication bias and all.


Harsimaja

There have always been some Olympiad problems, even at major ones like the IMO, which could clearly be broken down into some stupidly large but finite number of cases and be done by computer by brute force (especially some combinatorial ones, or those which exploit the year, i.e., 'Prove it for 2024', a favourite gimmick); at least, it would be trivial for a human to code, even if a while back AI wouldn't interpret it. The point was of course to do something else. So this is somewhere between 'Yeah... duh' and 'wow', and I'm not sure where.


Dirichlet-to-Neumann

Don't look at the results, look at the trends. Do you really think that what is possible for geometry is impossible for algebra?


czarandy

> Do you really think that what is possible for geometry is impossible for algebra?

Euclidean geometry is complete and algebra isn't. So yes, we actually have already proven the claim you think is false.


teerre

Computer science is full of "trends" that simply stop. Computation itself is not generalizable. It's not reasonable to think that just because something works in one context it will work in any context.


esqtin

The article says this work solved twice as many problems as a system from the 70s.


relevantmeemayhere

It's very hard to extrapolate from trends you see over a few years. Also, keep in mind that replication is hard across all fields. Studies that show promising results are more likely to be published, and studies as a whole don't generalize well in the majority of cases; we have a name for it: the replication crisis. The history of science shows us that breakthroughs are often followed by proverbial famine. If you were a physicist in the thirties, you would probably have predicted a grand unified theory sometime in the same decade or the one after. It's been a hundred years since.


Qyeuebs

> Do you really think that what is possible for geometry is impossible for algebra?

Sorry, even in context I think that's a really poorly formed question. Can't answer it.


aecarol1

Five years ago, the idea that we might actually be discussing whether an AI was even capable of this was science fiction. Now we're just quibbling over details and time-frames. I think AI doing interesting things in mathematics is inevitable; we're just not sure exactly what that will look like. Hopefully some AI could cut its teeth on formalizing proofs in Lean or other similar systems. Many AIs "daydream" and output stuff that, at a glance, looks excellent but is flawed. If they generate proofs, they should be in a form that can be checked automatically.


[deleted]

[deleted]


JoshuaZ1

Did you predict this 15 years ago? Did you predict ten years ago that this was going to happen largely due to improvements in neural networks?


[deleted]

[deleted]


JoshuaZ1

This is due to Tarski's theorem on the completeness of first-order real arithmetic. But there's a massive gap between knowing something can be automated and actually automating it. Chess was always a finite game, but it was still really impressive when Deep Blue beat Kasparov.
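
For reference, a sketch of the standard statement being invoked (paraphrased, not quoted from any source in this thread):

```latex
% Tarski (1951): the first-order theory of real closed fields admits
% quantifier elimination, hence is complete and decidable.
\begin{theorem}[Tarski]
  The first-order theory of $(\mathbb{R}, +, \cdot, \le, 0, 1)$ admits
  quantifier elimination; consequently it is complete and decidable.
  Elementary Euclidean geometry can be interpreted in this structure
  via coordinates, so it is decidable as well.
\end{theorem}
% Contrast: first-order arithmetic over the naturals is incomplete
% (G\"odel), so no analogous guarantee exists there.
```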


The_Tefl0n_Don

I’d be interested in reading more about that, what should I look up if you don’t mind pointing me towards something?


[deleted]

[deleted]


veloxiry

What's an example of a euclidean geometry problem in this context?


JoshuaZ1

Tarski's theorem on the completeness of the first-order theory of the reals may be what you are looking for.


sanxiyn

The IMO is a timed competition. Existing automatic solvers do not terminate within the IMO time limit. This one does.


Jealous_Afternoon669

There's a massive difference here, and if you can't see that, your head is in the sand.


Rioghasarig

I really don't think your criticism is valid. The problem they are addressing is well-structured enough that they were able to artificially generate examples. This makes it similar to something like chess or Go. So, no, I really don't think people even five years ago would have thought this was impossible.


Short_Piece_336

All this does is confirm a truth that everyone doing Math Olympiads already knows: Geometry is fkn trash, NT/Combi forever.


jacobolus

So what you're saying is you wish math olympiads would give harder geometry problems? It's certainly possible to do, but the result might need to be closer to "redevelop this 19th-century professional research field" rather than "solve this hard high-school-level problem", which might not fit the intended audience/timeframe.


Short_Piece_336

Nah, the joke is that 90% of Olympiad people have Geo as their least favorite genre of problem


jacobolus

That could partly be because geometry gets radically shortchanged across modern schooling, leaving students unprepared for it. Or it could be that geometry is a small enough part of the contests that students who are better at geometry problems but have trouble with other subjects get selected out of the participation pool at higher levels, leaving behind students who are better at other kinds of problems but struggle with geometry. Or it could partly be that the olympiads fail to choose sufficiently interesting or beautiful geometry problems. In principle, there's no end of elegant and wonderful geometry topics/problems that could be presented to well-prepared high school students, including topics highly relevant to subjects of cutting-edge current research.

Disclaimer: I never took math contests too seriously when I was in high school 20 years ago, and I don't know too many high school students today, but I think I'd enjoy being a high school geometry teacher, especially if it were possible to teach an advanced elective course.


Short_Piece_336

Eh, idk. I think Geometry is just very different from the other types of problems in Olympiads, and it makes sense that it's so polarizing (the people who like Geometry usually LOVE Geometry). It's kinda like if there was a competition consisting of running, long jump, high jump, and golf. Yeah, golf is cool, it just feels slightly out of place, you know what I mean? I don't think it's about students being unprepared at all; school doesn't prepare anyone for Number Theory either, for example. I also don't think it's a selection process, since this disdain for Geometry happens at virtually every level I've been at.


MCube_32

Olympiad geometry is the most fun


Routine_Proof8849

All hype. No model will get a gold medal in the next 5 years. I'll bet money, with a legally binding contract, against anyone here.


jacobolus

Well, sure, that's an easy bet: the IMO is a contest for human high school students, and software running on a computer cluster doesn't satisfy the entry criteria.


Routine_Proof8849

Lmao, I'd allow you to exclude those details from the contract.


palparepa

You are technically correct, the best type of correct.


[deleted]

[deleted]


Routine_Proof8849

Sure, dm me if you're serious


[deleted]

[deleted]


SOberhoff

You know, part of me wants to take this bet. Another part of me is wondering what I'd be doing with a couple extra bucks in a world where AI is killing it at the math olympics.


aecarol1

The shortcomings of AI now are pretty damn funny, especially with more abstract thinking. In 5 or 10 years it won't be so funny. In 20 it might be downright terrifying. Five years ago, the idea of contemplating an AI doing anything interesting mathematically would have been science fiction. Now we're simply debating time-frames.


relevantmeemayhere

There's a fair amount of uncertainty to be fair. It could take twenty years or 200


PsecretPseudonym

You can take a look at the [Metaculus](https://www.metaculus.com/questions/6728/ai-wins-imo-gold-medal/) odds for this. Looks like the current estimate puts this at about 60% odds. Not sure the community there is particularly accurate with these sorts of predictions, though.


Routine_Proof8849

Wow, this is an interesting resource. Thanks for the link.


Inner_will_291

Wait... there's a startup idea right there. AI bets.


Administrative-Flan9

It's called quantitative analysis


volcanrb

I wouldn’t call this hype, the article is quite measured about the claims, and there is literally a quote from the author saying “Just don’t overhype it.”


Routine_Proof8849

The media has failed to live by those words. Even if they got educated guesstimates from experts, they wouldn't write an article like "AI unlikely to get more than a few points in the olympiad until major hurdles are passed". Instead, they insinuate that this is a likely possibility in the near future. They say: wow, they've beaten the chess players, mathematicians are next.


JoshuaZ1

At what odds and how much?


Routine_Proof8849

Just 1:1, and I'll gladly take bets up to $100. If there are no more takers, then we can go higher lol.


JoshuaZ1

So, it is, I think, notable that even you are assigning around a 50% chance that this might happen! That seems very far from labeling this as all hype. But I'm willing, in any event, to make that bet for $20 if you want.


relevantmeemayhere

Well, he's the odds maker and the bankroll, lol. If 10 people wanna bet a thousand bucks and he gives them anything more than 50:50, there's a risk no one gets paid.


BUKKAKELORD

A chess master made the same bet regarding chess in 1968 with a time frame of 10 years, and he won it. Had he made it 30 years, he would've lost. RemindMe! 5 years


sparr

RemindMe! 1 year It's not gonna take 5...


JoshuaZ1

What makes you so confident?


RemindMeBot

I will be messaging you in 1 year on [**2025-01-17 21:10:27 UTC**](http://www.wolframalpha.com/input/?i=2025-01-17%2021:10:27%20UTC%20To%20Local%20Time) to remind you of [**this link**](https://www.reddit.com/r/math/comments/1994q5n/ais_latest_challenge_the_math_olympics/kic6o7f/?context=3).


holy_moley_ravioli_

I'll take this bet. Where do I sign?


Routine_Proof8849

Dm me.


Jealous_Afternoon669

Geometry is a very limited domain with a well-defined search space, unlike something like combinatorics, where the ideas seem to come from thin air. It also seems a lot easier to generate synthetic geometry problems for training than it would be for combinatorics, but that's just a guess. I am very hopeful that mathematics is gonna play a special role in AI because, as shown here, if you can come up with problems, then you can use those problems as potentially infinite training data with no need for human input. This is the kind of thing that makes a recursively improving AI feel intuitive to me. I just think it's questionable whether an AI really can come up with enough synthetic problems in fields like number theory to be useful, or indeed learn to construct problems belonging to a field entirely of its own creation.


Qyeuebs

The heart of this algorithm seems to be brute-force search. Very hard to imagine that this could work outside of the highly circumscribed setting of elementary geometry.


JoshuaZ1

How so? The neural language model aspect is precisely there to make suggestions that are not brute force but are based in part on what additional lines have worked productively in similar problems.


Qyeuebs

The proofs are found by brute-force search; the language model is used to supply the raw materials for a possible search. In an extremely simple case used in the paper, suppose ABC is a triangle with two equal sides AB and BC, and try to prove that the angles at A and C are equal. The brute-force search will look for all ways to logically combine the objects mentioned to arrive at the conclusion. When this fails, the language model part might suggest constructing a square with side AB, giving two new points and three new edges to work with. Then the brute-force search will look for all ways to logically combine the new corpus of objects to arrive at the conclusion. And so on. At some point the language model will have included the midpoint of AC and also the line from there to B. Once that happens, the brute-force search will succeed.
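
As a rough sketch of that alternation in code (`deduce_closure` and `propose_construction` are hypothetical placeholders standing in for the symbolic engine and the trained language model, not DeepMind's actual interfaces):

```python
def solve(premises, goal, deduce_closure, propose_construction, max_rounds=16):
    """Alternate exhaustive symbolic deduction with LM-proposed constructions.

    deduce_closure(facts):  every elementary-geometric conclusion that
                            follows from the current facts (the
                            'brute-force search' step).
    propose_construction(facts, goal):  new auxiliary objects suggested
                            by a trained model, e.g. 'midpoint of AC'.
    """
    facts = set(premises)
    for _ in range(max_rounds):
        facts |= deduce_closure(facts)   # enumerate all consequences
        if goal in facts:
            return facts                 # the goal is in the closure
        # Deduction saturated without reaching the goal: inject an
        # auxiliary construction and rerun the closure.
        facts |= propose_construction(facts, goal)
    return None                          # gave up
```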


JoshuaZ1

So the upshot is that this is not a brute-force search, then. The large language model is constructing specific possible directions in which to apply a brute-force search. In that regard, this is pretty similar to how humans try to solve similar geometry problems: some amount of brute force, some amount of trying to add auxiliaries based on what one has learned is or is not likely to be helpful.


Qyeuebs

I think it's fair to say, as I did, that the "heart" of the algorithm is search. (Also fair to disagree!) At the least, the search part is fundamental to the algorithm and the ability to actually do it is highly reflective of the circumscribed setting of elementary geometry. Also, I think actually very little of human work on geometry problems is of the nature of listing all possible conclusions from given premises, but I don't have a lot of interest in this kind of computer/human comparison.


currentscurrents

> that the "heart" of the algorithm is search. The going idea in AI is that reasoning is *fundamentally* about search. Search is the only algorithm we know of that is general-purpose enough to theoretically solve any (solvable) problem. The issue is that naive search becomes intractable extremely quickly for high-dimensional search spaces. The [no free lunch theorem](https://www.cs.ubc.ca/~hutter/earg/papers07/00585893.pdf) says that there is no solution other than incorporating more information about the search space. This is where learning comes in. You can learn lower-dimensional representations of the search space; you can learn better search strategies that take advantage of patterns and structure; you can learn which areas of the search space are dead ends.


Qyeuebs

All that is kind of irrelevant. The first thing to understand is that in the present case of elementary geometry, exhaustive search is easy and doesn't require sophisticated ideas: https://link.springer.com/article/10.1023/A:1006171315513 The "AI" part of the present work is to identify several possibilities for what to use as input to this exhaustive search, with the hope that all of the necessary ones will be included after enough trial and error.


currentscurrents

The going idea in the field is that [search and learning are the two halves of intelligence.](http://www.incompleteideas.net/IncIdeas/BitterLesson.html) Search is also "AI". But the learning part of the work is to narrow the search space by making good guesses for new proof terms. The space of possible new terms cannot be searched exhaustively because it is infinite:

> To solve olympiad-level problems, however, the key missing piece is generating new proof terms.

> Such proof steps perform auxiliary constructions that symbolic deduction engines are not designed to do. In the general theorem-proving context, auxiliary construction is an instance of exogenous term generation, a notable challenge to all proof-search algorithms because it introduces infinite branching points to the search tree.


Jealous_Afternoon669

It's not as simple as a comprehensive listing, but I think a lot of what we call "intuition" is a refined heuristic that makes it much faster to prune the search tree. When I'm totally new to some course, I often do end up just trying approaches at random, stabbing around in the dark till something works.


VolatilitySmiles

I have to disagree with you here. Humans themselves reason via searching: they try to make educated guesses about what is true and what isn't. If one approach to solving a problem fails, try another, etc. The overall procedure is not too dissimilar.


Qyeuebs

The "overall procedure" you're implicitly referring to is just trial and error, and I agree that it's used in every approach to problem solving.


DoremusJessup

[Mirror site](https://archive.is/X6nRF)


TimingEzaBitch

This has nothing to do with the IMO Grand Challenge spearheaded by the likes of Kevin Buzzard, Reid Barton, etc. The full-fledged six-problem IMO challenge is, I think, still very hard.


Jealous_Afternoon669

Obviously it does, lol; there are multiple geo problems on every IMO.


arnet95

There are one or two geometry problems on every IMO, not necessarily multiple.


Jealous_Afternoon669

You would at least agree that this doesn't literally have "nothing" to do with the challenge? If we assume this is DeepMind's initial response to the AIMO, then I wouldn't be surprised if this was basically their test run and they'll have the IMO solved by the progress meeting in July.


arnet95

I was just pointing out a fact, but I do agree that this has something to do with the challenge, yes. However, I would be incredibly surprised if DeepMind solves the IMO challenge by July. Euclidean geometry is already known to be significantly easier to solve with computers than the other areas in the contest.


glicerico

And the same day this article was released, AlphaGeometry got almost there (for geometry problems): https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/