Story
APEX-Searcher: Augmenting LLMs' Search Capabilities through Agentic Planning and Execution
Key takeaway
Researchers developed APEX-Searcher, a system that improves how AI language models search for and use external knowledge to answer complex questions, which could make these models more useful for real-world tasks.
Quick Explainer
APEX-Searcher is a two-stage framework that enhances the search capabilities of large language models. First, a Planning Agent trained with reinforcement learning (RL) decomposes complex queries into a logical sequence of sub-questions. Then, an Execution Agent trained via supervised fine-tuning (SFT) systematically solves each sub-question through iterative retrieval and synthesis, building up an "accumulated knowledge base" to provide context. This decoupled approach addresses the ambiguous execution paths and sparse rewards that plague end-to-end retrieval-augmented generation methods. The key innovation is the division of the retrieval process into specialized planning and execution stages, which enables more efficient and better-informed information gathering for complex, multi-hop queries.
Deep Dive
APEX-Searcher: Augmenting LLMs' Search Capabilities through Agentic Planning and Execution
Overview
APEX-Searcher is a novel framework that aims to enhance the search capabilities of large language models (LLMs) for complex, multi-hop information retrieval tasks. The key innovation is the decoupling of the retrieval process into two specialized stages:
- Agentic Planning: An RL-trained Planning Agent decomposes the complex query into a logical sequence of sub-questions.
- Iterative Sub-Task Execution: An SFT-trained Execution Agent systematically solves each sub-question through a multi-round retrieval and synthesis process.
This two-stage approach addresses challenges with ambiguous execution trajectories and sparse rewards in end-to-end retrieval-augmented generation (RAG) methods.
Problem & Context
- Existing RAG systems struggle when faced with complex, multi-hop queries that require synthesis of information from multiple sources.
- Iterative RAG and agentic RAG approaches have improved performance, but still face challenges:
  - Ambiguous execution trajectories, lacking a global view to guide retrieval
  - Over-reliance on end-to-end training, leading to ill-defined optimization objectives and sparse rewards
Methodology
Agentic Planning
- The Planning Agent uses RL with task decomposition-based rewards to learn an optimal policy for generating logical and efficient reasoning plans.
- It decomposes the complex query into a sequence of sub-questions, which can have conditional dependencies.
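The paper does not publish the plan format, but the decomposition with conditional dependencies can be sketched as follows. The numbered-list layout and the `{n}` placeholder convention for referencing earlier answers are illustrative assumptions, not the paper's actual interface:

```python
import re
from dataclasses import dataclass, field

@dataclass
class SubQuestion:
    """One step of a decomposition plan. `depends_on` lists earlier
    steps whose answers this step references via {n} placeholders."""
    index: int
    text: str
    depends_on: list[int] = field(default_factory=list)

def parse_plan(plan_text: str) -> list[SubQuestion]:
    """Parse a numbered plan (one sub-question per line) into
    SubQuestion objects, inferring conditional dependencies from
    {n} placeholders embedded in later questions."""
    subs = []
    for line in plan_text.strip().splitlines():
        m = re.match(r"(\d+)\.\s*(.+)", line.strip())
        if m:
            text = m.group(2)
            deps = [int(d) for d in re.findall(r"\{(\d+)\}", text)]
            subs.append(SubQuestion(int(m.group(1)), text, deps))
    return subs

plan = """1. Who directed the film Inception?
2. What other films has {1} directed?"""
# Step 2 depends on step 1's answer, so it must execute after step 1.
```

Representing dependencies explicitly is what lets the Execution Agent substitute earlier answers before retrieving for later steps.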
Iterative Sub-Task Execution
- The Execution Agent uses SFT on a curated dataset of high-quality multi-turn retrieval instructions.
- It interacts with the knowledge base to retrieve and synthesize information for each sub-question.
- The process maintains an "accumulated knowledge base" to provide context for subsequent sub-questions.
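The multi-round loop for a single sub-question can be sketched as below. The `retrieve` and `synthesize` callables stand in for the retriever and the Execution Agent's synthesis step; their signatures are assumptions for illustration, not the paper's API:

```python
def execute_sub_question(question, retrieve, synthesize, knowledge, max_rounds=3):
    """Multi-round loop for one sub-question. `retrieve(query)` returns
    documents; `synthesize(question, docs, knowledge)` returns an
    (output, done) pair, conditioned on the accumulated knowledge from
    earlier sub-questions. Both interfaces are illustrative."""
    docs, query, result = [], question, None
    for _ in range(max_rounds):
        docs.extend(retrieve(query))
        result, done = synthesize(question, docs, knowledge)
        if done:
            break
        query = result  # treat an unfinished output as a refined search query
    knowledge.append((question, result))  # grow the accumulated knowledge base
    return result

# Toy stand-ins: a one-shot retriever and a synthesizer that answers immediately.
kb = []
answer = execute_sub_question(
    "Who directed Inception?",
    retrieve=lambda q: [f"doc for: {q}"],
    synthesize=lambda q, docs, kn: ("Christopher Nolan", True),
    knowledge=kb,
)
```

Appending each (sub-question, answer) pair to `knowledge` is what gives later sub-questions their context.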
Inference Pipeline
- The Planning Agent decomposes the complex query into a sub-question sequence.
- The Execution Agent processes each sub-question iteratively, retrieving relevant information and building up the accumulated knowledge base.
- The Execution Agent performs final answer synthesis using the complete accumulated knowledge.
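The three inference steps above can be wired together as a single loop. All function names and interfaces here are hypothetical stand-ins for the Planning Agent, Execution Agent, and final synthesis call:

```python
def apex_infer(question, plan_fn, execute_fn, finalize_fn):
    """End-to-end sketch of the two-stage pipeline: plan, execute each
    sub-question against the accumulated knowledge, then synthesize a
    final answer. `plan_fn`, `execute_fn`, and `finalize_fn` are
    illustrative interfaces, not the paper's."""
    knowledge = []  # accumulated (sub-question, answer) pairs
    answers = {}
    for i, sub_q in enumerate(plan_fn(question), start=1):
        # Substitute earlier answers into {n} placeholders so that
        # conditionally dependent sub-questions become concrete.
        for k, v in answers.items():
            sub_q = sub_q.replace("{%d}" % k, v)
        answers[i] = execute_fn(sub_q, knowledge)
    return finalize_fn(question, knowledge)

# Toy demonstration with hard-coded agents.
def fake_plan(q):
    return ["Who directed Inception?", "Name another film by {1}."]

def fake_execute(sub_q, knowledge):
    ans = "Christopher Nolan" if "directed Inception" in sub_q else "Memento"
    knowledge.append((sub_q, ans))
    return ans

result = apex_infer("Which film did Inception's director also make?",
                    fake_plan, fake_execute,
                    lambda q, kn: kn[-1][1])
```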
Data & Experimental Setup
- Evaluation benchmarks: 2WikiMultiHopQA, HotpotQA, MuSiQue, Bamboogle
- Compared to various RAG baselines, including standard, iterative, and agentic approaches
- Experiments conducted on Qwen-2.5-3B-Instruct and Qwen-2.5-7B-Instruct models
Results
- APEX-Searcher outperforms the strongest baseline by 8.2% and 13.1% EM on the 3B and 7B models, respectively.
- Ablation studies confirm the importance of both the Planning and Execution stages:
  - Planning + RL improves over Planning alone by 3.5 and 5.5 EM points on 3B and 7B.
  - Execution SFT improves over Execution alone by 10.1 and 8.4 EM points on 3B and 7B.
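EM here is exact match. For reference, the SQuAD-style EM conventionally used on these benchmarks normalizes both strings before comparing; the sketch below follows that common normalization (this is standard evaluation practice, not code from the paper):

```python
import re
import string

def normalize(s: str) -> str:
    """SQuAD-style answer normalization: lowercase, strip punctuation
    and the articles a/an/the, collapse whitespace."""
    s = "".join(ch for ch in s.lower() if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred: str, gold: str) -> int:
    """1 if the normalized prediction equals the normalized gold answer."""
    return int(normalize(pred) == normalize(gold))

print(exact_match("The Eiffel Tower.", "eiffel tower"))  # 1
```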
Limitations & Uncertainties
- While the proposed framework demonstrates significant performance gains, the authors note that further research is needed to:
  - Explore the integration of RL for the Execution stage to better optimize the multi-round retrieval process.
  - Expand the scope from local database retrieval to web search, broadening the application scenarios.
What Comes Next
The authors highlight two key areas for future work:
- Investigating RL-based optimization of the multi-round retrieval process during the Execution stage.
- Expanding the framework to leverage web search, further broadening the application scenarios.
