Curious Now


Can LLMs generate interesting mathematical research problems?

Artificial Intelligence · Math & Economics

Key takeaway

Large language models have shown they can generate novel mathematical research problems, suggesting they may assist human mathematicians in discovering new frontiers.


Quick Explainer

The researchers developed an AI agent called DeepMath-generate to explore whether large language models can autonomously generate novel and valuable mathematical research problems. The agent consists of a problem generator and an evaluator that iterate to refine the problems. The key insight is that rather than merely solving existing mathematical problems, the agent attempts to define new concepts, methods, and objects that could open up new areas of inquiry. While the generated problems show mathematical promise, the researchers note they are not yet at the groundbreaking level of the Poincaré Conjecture, and further work on the agent's prompting and design is needed to enhance its mathematical creativity.

Deep Dive

Technical Deep Dive: Can LLMs Generate Interesting Mathematical Research Problems?

Problem & Context

This research investigates whether large language models (LLMs) can generate valuable and cutting-edge mathematical research problems. The authors note that most current research on applying LLMs to mathematics focuses on whether they can solve existing problems, rather than on their ability to generate new mathematical concepts, methods, or objects.

The authors define three key criteria for evaluating the mathematical creativity of LLMs:

  1. Generation of New Concepts: Introducing unprecedented mathematical ideas that open up new fields of research, such as the Riemannian metric laying the foundation for modern differential geometry.
  2. Invention of New Methods: Devising innovative techniques to solve previously intractable problems, like the Bochner technique connecting geometry and analysis.
  3. Creation of New Mathematical Objects: Constructing specific mathematical objects, such as auxiliary functions or counterexamples, that enable proofs and a priori estimates.

Methodology

The authors develop an agent called "DeepMath-generate" to explore whether LLMs can produce valuable and unknown mathematical research problems. The agent consists of two components:

  1. Generator: Generates mathematical problems based on instructions in a system prompt.
  2. Evaluator: Assesses whether the generated problems meet the criteria for a "good" mathematical problem, as defined in the introduction.

The agent iterates between the generator and evaluator, using feedback to refine the generated problems.
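The generate-evaluate loop described above can be sketched in a few lines. This is a minimal illustration of the control flow, not the authors' implementation: the function names, the feedback structure, and the stand-in logic inside `generate_problem` and `evaluate_problem` are all assumptions, taking the place of real LLM calls and the paper's actual evaluation criteria.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Feedback:
    accepted: bool
    notes: str

def generate_problem(direction: str, prior_feedback: Optional[str]) -> str:
    # Hypothetical stand-in for an LLM call conditioned on the research
    # direction and any evaluator feedback from the previous round.
    suffix = f" (revised: {prior_feedback})" if prior_feedback else ""
    return f"Open problem in {direction}{suffix}"

def evaluate_problem(problem: str, round_idx: int) -> Feedback:
    # Hypothetical evaluator. In the paper this would check the criteria
    # for a "good" problem (new concepts, methods, objects); here we
    # accept after one revision purely to illustrate the iteration.
    if round_idx >= 1:
        return Feedback(True, "meets novelty criteria")
    return Feedback(False, "sharpen the geometric hypothesis")

def refine(direction: str, max_rounds: int = 5) -> str:
    # Alternate between generator and evaluator, feeding the evaluator's
    # notes back into the next generation round.
    feedback_notes: Optional[str] = None
    problem = ""
    for i in range(max_rounds):
        problem = generate_problem(direction, feedback_notes)
        fb = evaluate_problem(problem, i)
        if fb.accepted:
            return problem
        feedback_notes = fb.notes
    return problem  # best effort after max_rounds
```

The design point the sketch captures is that the evaluator's feedback is part of the generator's input on the next round, so each iteration refines the previous candidate rather than starting from scratch.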

Results

The authors applied DeepMath-generate to 200 research directions in differential geometry, generating 5 problems for each. Through human verification, they found that many of these problems were unknown to experts and possessed unique research value.

The paper presents two example problems generated by the agent:

  1. Problem 1: Exploring the existence of nonnegatively curved metrics on vector bundles over exotic spheres, and whether the exotic smooth structure can be detected by the topology of the moduli space of such metrics.
  2. Problem 2: Investigating whether the topology of the moduli space of nonnegatively curved metrics on an exotic sphere can differ from the moduli space on the standard sphere, potentially revealing geometric traces of the exotic smooth structure.

The authors note that while these problems are mathematically sound and target important open questions, the agent did not generate problems as groundbreaking as the Poincaré Conjecture. Further research is needed to improve the prompts and design of the agent to generate even more impactful mathematical questions.

Limitations & Uncertainties

  • The agent did not generate problems as novel and significant as the Poincaré Conjecture, suggesting room for improvement in the prompting and agent design.
  • The paper only explores problem generation in differential geometry, and it is unclear whether the agent would perform as well in other mathematical domains.
  • The evaluation of the generated problems was done through human verification, which may be subjective. More rigorous and systematic evaluation methods could be developed.

What Comes Next

The authors plan to incorporate reinforcement learning into the DeepMath-generate agent to further enhance its mathematical capabilities and creativity. They also suggest exploring problem generation in other areas of mathematics beyond differential geometry.
