Story
APEX-Searcher: Augmenting LLMs' Search Capabilities through Agentic Planning and Execution
Key takeaway
Researchers developed APEX-Searcher, a system that improves how AI language models search for and use external knowledge to answer complex questions, which could make these models more useful for real-world tasks.
Quick Explainer
APEX-Searcher is a two-stage framework that enhances the search capabilities of large language models. First, a Planning Agent trained with reinforcement learning (RL) decomposes complex queries into a logical sequence of sub-questions. Then, an Execution Agent trained via supervised fine-tuning (SFT) systematically solves each sub-question through iterative retrieval and synthesis, building up an "accumulated knowledge base" to provide context. This decoupled approach addresses the ambiguous execution paths and sparse rewards that plague end-to-end retrieval-augmented generation methods. The key innovation is the division of the retrieval process into specialized planning and execution stages, which enables more efficient and better-informed information gathering for complex, multi-hop queries.
Deep Dive
APEX-Searcher: Augmenting LLMs' Search Capabilities through Agentic Planning and Execution
Overview
APEX-Searcher is a novel framework that aims to enhance the search capabilities of large language models (LLMs) for complex, multi-hop information retrieval tasks. The key innovation is the decoupling of the retrieval process into two specialized stages:
- Agentic Planning: An RL-trained Planning Agent decomposes the complex query into a logical sequence of sub-questions.
- Iterative Sub-Task Execution: An SFT-trained Execution Agent systematically solves each sub-question through a multi-round retrieval and synthesis process.
This two-stage approach addresses challenges with ambiguous execution trajectories and sparse rewards in end-to-end retrieval-augmented generation (RAG) methods.
Problem & Context
- Existing RAG systems struggle when faced with complex, multi-hop queries that require synthesis of information from multiple sources.
- Iterative RAG and agentic RAG approaches have improved performance, but still face challenges:
  - Ambiguous execution trajectories, lacking a global view to guide retrieval
  - Over-reliance on end-to-end training, leading to ill-defined optimization objectives and sparse rewards
Methodology
Agentic Planning
- The Planning Agent uses RL with task decomposition-based rewards to learn an optimal policy for generating logical and efficient reasoning plans.
- It decomposes the complex query into a sequence of sub-questions, which can have conditional dependencies.
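The paper does not publish the plan format, but the decomposition with conditional dependencies can be sketched as follows. The numbered-list layout and the `{n}` placeholder convention for referencing earlier answers are illustrative assumptions, not the paper's actual interface:

```python
import re
from dataclasses import dataclass, field

@dataclass
class SubQuestion:
    """One step of a decomposition plan. `depends_on` lists earlier
    steps whose answers this step references via {n} placeholders."""
    index: int
    text: str
    depends_on: list[int] = field(default_factory=list)

def parse_plan(plan_text: str) -> list[SubQuestion]:
    """Parse a numbered plan (one sub-question per line) into
    SubQuestion objects, inferring conditional dependencies from
    {n} placeholders embedded in later questions."""
    subs = []
    for line in plan_text.strip().splitlines():
        m = re.match(r"(\d+)\.\s*(.+)", line.strip())
        if m:
            text = m.group(2)
            deps = [int(d) for d in re.findall(r"\{(\d+)\}", text)]
            subs.append(SubQuestion(int(m.group(1)), text, deps))
    return subs

plan = """1. Who directed the film Inception?
2. What other films has {1} directed?"""
# Step 2 depends on step 1's answer, so it must execute after step 1.
```

Representing dependencies explicitly is what lets the Execution Agent substitute earlier answers before retrieving for later steps.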
Iterative Sub-Task Execution
- The Execution Agent uses SFT on a curated dataset of high-quality multi-turn retrieval instructions.
- It interacts with the knowledge base to retrieve and synthesize information for each sub-question.
- The process maintains an "accumulated knowledge base" to provide context for subsequent sub-questions.
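The multi-round loop for a single sub-question can be sketched as below. The `retrieve` and `synthesize` callables stand in for the retriever and the Execution Agent's synthesis step; their signatures are assumptions for illustration, not the paper's API:

```python
def execute_sub_question(question, retrieve, synthesize, knowledge, max_rounds=3):
    """Multi-round loop for one sub-question. `retrieve(query)` returns
    documents; `synthesize(question, docs, knowledge)` returns an
    (output, done) pair, conditioned on the accumulated knowledge from
    earlier sub-questions. Both interfaces are illustrative."""
    docs, query, result = [], question, None
    for _ in range(max_rounds):
        docs.extend(retrieve(query))
        result, done = synthesize(question, docs, knowledge)
        if done:
            break
        query = result  # treat an unfinished output as a refined search query
    knowledge.append((question, result))  # grow the accumulated knowledge base
    return result

# Toy stand-ins: a one-shot retriever and a synthesizer that answers immediately.
kb = []
answer = execute_sub_question(
    "Who directed Inception?",
    retrieve=lambda q: [f"doc for: {q}"],
    synthesize=lambda q, docs, kn: ("Christopher Nolan", True),
    knowledge=kb,
)
```

Appending each (sub-question, answer) pair to `knowledge` is what gives later sub-questions their context.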
Inference Pipeline
- The Planning Agent decomposes the complex query into a sub-question sequence.
- The Execution Agent processes each sub-question iteratively, retrieving relevant information and building up the accumulated knowledge base.
- The Execution Agent performs final answer synthesis using the complete accumulated knowledge.
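The three inference steps above can be wired together as a single loop. All function names and interfaces here are hypothetical stand-ins for the Planning Agent, Execution Agent, and final synthesis call:

```python
def apex_infer(question, plan_fn, execute_fn, finalize_fn):
    """End-to-end sketch of the two-stage pipeline: plan, execute each
    sub-question against the accumulated knowledge, then synthesize a
    final answer. `plan_fn`, `execute_fn`, and `finalize_fn` are
    illustrative interfaces, not the paper's."""
    knowledge = []  # accumulated (sub-question, answer) pairs
    answers = {}
    for i, sub_q in enumerate(plan_fn(question), start=1):
        # Substitute earlier answers into {n} placeholders so that
        # conditionally dependent sub-questions become concrete.
        for k, v in answers.items():
            sub_q = sub_q.replace("{%d}" % k, v)
        answers[i] = execute_fn(sub_q, knowledge)
    return finalize_fn(question, knowledge)

# Toy demonstration with hard-coded agents.
def fake_plan(q):
    return ["Who directed Inception?", "Name another film by {1}."]

def fake_execute(sub_q, knowledge):
    ans = "Christopher Nolan" if "directed Inception" in sub_q else "Memento"
    knowledge.append((sub_q, ans))
    return ans

result = apex_infer("Which film did Inception's director also make?",
                    fake_plan, fake_execute,
                    lambda q, kn: kn[-1][1])
```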
Data & Experimental Setup
- Evaluation benchmarks: 2WikiMultiHopQA, HotpotQA, MuSiQue, Bamboogle
- Compared to various RAG baselines, including standard, iterative, and agentic approaches
- Experiments conducted on Qwen-2.5-3B-Instruct and Qwen-2.5-7B-Instruct models
Results
- APEX-Searcher outperforms the strongest baseline by 8.2% and 13.1% EM on the 3B and 7B models, respectively.
- Ablation studies confirm the importance of both the Planning and Execution stages:
  - Planning + RL improves over Planning alone by 3.5 and 5.5 EM points on 3B and 7B.
  - Execution SFT improves over Execution alone by 10.1 and 8.4 EM points on 3B and 7B.
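EM here is exact match. For reference, the SQuAD-style EM conventionally used on these benchmarks normalizes both strings before comparing; the sketch below follows that common normalization (this is standard evaluation practice, not code from the paper):

```python
import re
import string

def normalize(s: str) -> str:
    """SQuAD-style answer normalization: lowercase, strip punctuation
    and the articles a/an/the, collapse whitespace."""
    s = "".join(ch for ch in s.lower() if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred: str, gold: str) -> int:
    """1 if the normalized prediction equals the normalized gold answer."""
    return int(normalize(pred) == normalize(gold))

print(exact_match("The Eiffel Tower.", "eiffel tower"))  # 1
```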
Limitations & Uncertainties
- While the proposed framework demonstrates significant performance gains, the authors note that further research is needed to:
  - Explore the integration of RL for the Execution stage to better optimize the multi-round retrieval process.
  - Expand the scope from local database retrieval to web search, broadening the application scenarios.
What Comes Next
The authors highlight two key areas for future work:
- Investigating RL-based optimization of the multi-round retrieval process during the Execution stage.
- Expanding the framework to leverage web search, further broadening the application scenarios.
