Story
MerLean: An Agentic Framework for Autoformalization in Quantum Computation
Key takeaway
MerLean is a new tool that can automatically convert mathematical concepts described in research papers into verified computer code, making it easier for scientists to build on each other's work in quantum computing.
Quick Explainer
MerLean is an innovative framework that automates the process of translating mathematical statements from research papers into verified formal code. The core idea is to extract the key definitions, theorems, and lemmas from the paper's LaTeX source, iteratively formalize them into Lean 4 code, and then verify that the formal representation accurately captures the original intent. This end-to-end automation, spanning extraction, formalization, and faithfulness checking, is the key innovation that enables MerLean to scale the rigorous validation of rapidly growing theoretical quantum computing research.
Deep Dive
Technical Deep Dive: MerLean: An Agentic Framework for Autoformalization in Quantum Computation
Overview
MerLean is a fully automated framework for translating mathematical statements from research papers into verified Lean 4 code. It operates in two complementary pipelines:
- Autoformalization: MerLean extracts mathematical statements from LaTeX source files, formalizes them into Lean 4 code, and verifies the formalization is faithful to the original intent.
- Autoinformalization: MerLean translates the verified Lean code back into human-readable LaTeX for expert review, highlighting any unverified assumptions.
The key innovation is MerLean's ability to perform this end-to-end formalization process fully autonomously, without human-in-the-loop intervention during the core formalization work.
Problem & Context
The volume of theoretical quantum computing research has been growing rapidly, with over 11,000 arXiv submissions in 2025. This creates a verification bottleneck that threatens to outpace the peer-review system. Automating the formalization of these mathematical results is critical to scaling rigorous validation.
While prior work has demonstrated autoformalization capabilities on isolated statements, MerLean is designed to formalize entire research papers, handling the complex dependencies and logical gaps across interconnected definitions, lemmas, and theorems.
Methodology
MerLean's autoformalization pipeline consists of three main stages:
- Statement Extraction: MerLean first extracts all mathematical statements (definitions, theorems, lemmas, etc.) from the LaTeX source into a structured JSON representation.
- Iterative Formalization: For each statement, MerLean generates Lean 4 code, compiles it, and iteratively fixes errors based on compiler feedback and tool-assisted exploration of the Mathlib library.
- Faithfulness Checking: After successfully formalizing a statement, MerLean reflects on whether the result matches the original meaning, rejecting misrepresentations.
The autoinformalization pipeline then reverses this process, translating the verified Lean code back into human-readable LaTeX.
Data & Experimental Setup
MerLean was evaluated on three papers in theoretical quantum computing:
- Balanced Product Codes (published)
- Fault-Tolerant Quantum Computation (published)
- Quantum Topology (unpublished)
These papers cover a range of mathematical concepts, including homological algebra, stabilizer codes, fault-tolerant protocols, and algebraic topology.
Results
Across the three papers, MerLean formalized a total of 114 statements into 2,050 Lean declarations in under 42 hours. Key results include:
- MerLean achieved end-to-end formalization on all three papers, including an unpublished manuscript to ensure zero data contamination.
- The formalization required explicit axiom declarations for 9.1% of statements, corresponding to mathematical machinery not yet available in Mathlib.
- Theorems were the most challenging, requiring 39 minutes on average and 22 compile attempts, while remarks were the easiest at 11 minutes and 7 attempts.
- MerLean frequently introduced auxiliary lemmas not stated in the original papers, bridging logical gaps in the formalization.
Interpretation
MerLean demonstrates that fully automated formalization of frontier research papers is feasible, even for highly technical domains like quantum computation. The bidirectional pipeline, coupling powerful LLM reasoning with interactive theorem proving tools, enables both machine-verified correctness and human-verifiable semantic alignment.
This work opens up several exciting applications:
- Research Assistant: The autoinformalization blueprints can help researchers improve the rigor of their own definitions and proofs.
- Formalized Peer Review: Agentic autoformalization could transform peer review, providing machine-verified guarantees to focus reviewers on scientific significance.
- Synthetic Data Flywheel: High-quality (natural language, formal code) pairs mined from research papers can train the next generation of reasoning models.
- Contributing to Physics Libraries: MerLean can accelerate the formalization of physics concepts in dedicated Lean libraries.
Limitations & Uncertainties
- Mathlib Gaps: The evaluation highlighted gaps in Mathlib's coverage of specialized physics concepts, requiring explicit axiom declarations. Expanding Mathlib's domain-specific support is an important direction.
- Faithfulness Checking: Ensuring the formalized code faithfully represents the original intent remains a critical challenge. MerLean's multi-stage approach, including autoinformalization, aims to minimize hallucinations, but further research is needed.
What Comes Next
Going forward, the authors plan to:
- Evaluate MerLean on other scientific domains beyond quantum computation, such as algebraic geometry and number theory.
- Create a dedicated benchmark comprising diverse research papers across multiple mathematical fields.
- Explore applications in research assistance, formalized peer review, synthetic data generation, and contributing to domain-specific Lean libraries.
