Tag: DeepMind
DeepMind’s New AI Found A Strange New Way To Think
The Signal
DeepMind’s new system reports solving nine previously unsolved Erdős problems from a curated subset of 350. The presenter argues this success marks a shift from relying on raw model intelligence to building tighter, iterative harnesses, though the result remains contested due to potential selection bias and the system's reliance on large base models.
The Case
- AlphaProof Nexus, the reported system, uses Lean—a formal proof language—and an iterative loop where an AI agent attempts proofs, a critique model refines them, and a judge model selects winners.
- The system reportedly failed on 95.7% of the 350-problem subset; the nine previously unsolved problems it did solve came at an estimated cost of a couple hundred dollars per verified proof.
- The subset tested was explicitly chosen for ease of formalization, leaving it unsettled whether this approach can scale to the full set of approximately 1,200 Erdős problems.
- Smaller models solved nothing in this configuration, which suggests that a complex harness is not yet a substitute for strong base-model capability.
- The claim that "everyone is doing" this formalization loop today is an unsupported marketing-style assertion; similarly, the presenter's framing that the judge "cannot lie" is rhetorical rather than a proven reliability metric.
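To make the attempt, critique, and judge loop described above concrete, here is a minimal sketch in Python. It is not DeepMind's implementation: every name in it (`tournament_loop`, `Candidate`, the `toy_*` functions, the `rounds` and `width` parameters) is hypothetical, the `check` callback merely stands in for compiling a proof with Lean, and feeding the critique back into the next round of attempts is an assumption about how the refinement step might work.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    proof: str     # candidate proof text (Lean source in the reported system)
    checked: bool  # True if the formal checker accepted it


def tournament_loop(
    problem: str,
    propose: Callable[[str, str, int], str],     # agent: writes one proof attempt
    critique: Callable[[str, str], str],         # critic: returns feedback on an attempt
    check: Callable[[str], bool],                # stand-in for a Lean compile check
    judge: Callable[[List[Candidate]], Candidate],  # picks the best candidate in a pool
    rounds: int = 3,
    width: int = 4,
) -> Optional[Candidate]:
    """Run a few rounds of propose, check, judge, critique (hypothetical sketch)."""
    feedback = ""
    for _ in range(rounds):
        attempts = [propose(problem, feedback, i) for i in range(width)]
        pool = [Candidate(p, check(p)) for p in attempts]
        accepted = [c for c in pool if c.checked]
        if accepted:
            # At least one attempt passed the checker: let the judge pick a winner.
            return judge(accepted)
        # Nothing passed: critique the most promising failure and try again.
        feedback = critique(problem, judge(pool).proof)
    return None  # budget exhausted without a checked proof


# Toy stand-ins so the sketch runs without any model or Lean installation.
def toy_propose(problem: str, feedback: str, seed: int) -> str:
    return f"{problem}:attempt{seed}:{feedback}"


def toy_critique(problem: str, proof: str) -> str:
    return "fixed"


def toy_check(proof: str) -> bool:
    return proof.endswith(":fixed")


def toy_judge(pool: List[Candidate]) -> Candidate:
    return min(pool, key=lambda c: len(c.proof))


result = tournament_loop("p", toy_propose, toy_critique, toy_check, toy_judge)
```

With these toy stubs the first round fails the check, the critique feeds back into the second round, and the loop then returns a checked candidate. Keeping `check` separate from `judge` mirrors the caveat in the last bullet: a mechanical checker can reject invalid proofs, while the judge's ranking is still a model's opinion.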
The 1 Minute Signal Take
This result is a legitimate, documented milestone in proof-search automation, but the presenter’s framing of a "clear" progression remains speculative. The video is worth a watch to see the exact tournament loop architecture, but skip it if you are looking for an unbiased analysis of the method's generalizability beyond the cherry-picked benchmark.
