Curious Now


APEX-Searcher: Augmenting LLMs' Search Capabilities through Agentic Planning and Execution


Key takeaway

Researchers developed APEX-Searcher, a system that improves how AI language models search for and use external knowledge to answer complex questions, which could make these models more useful for real-world tasks.


Quick Explainer

APEX-Searcher is a two-stage framework that enhances the search capabilities of large language models. First, an RL-trained Planning Agent decomposes complex queries into a logical sequence of sub-questions. Then, an SFT-trained Execution Agent systematically solves each sub-question through iterative retrieval and synthesis, building up an "accumulated knowledge base" to provide context. This decoupled approach addresses challenges with ambiguous execution paths and sparse rewards that plague end-to-end retrieval-augmented generation methods. The key innovation is the division of the retrieval process into specialized planning and execution stages, which enables more efficient and informed information gathering for complex, multi-hop queries.

Deep Dive

Overview

APEX-Searcher is a novel framework that aims to enhance the search capabilities of large language models (LLMs) for complex, multi-hop information retrieval tasks. The key innovation is the decoupling of the retrieval process into two specialized stages:

  1. Agentic Planning: An RL-trained Planning Agent decomposes the complex query into a logical sequence of sub-questions.
  2. Iterative Sub-Task Execution: An SFT-trained Execution Agent systematically solves each sub-question through a multi-round retrieval and synthesis process.

This two-stage approach addresses challenges with ambiguous execution trajectories and sparse rewards in end-to-end retrieval-augmented generation (RAG) methods.

Problem & Context

  • Existing RAG systems struggle when faced with complex, multi-hop queries that require synthesis of information from multiple sources.
  • Iterative RAG and agentic RAG approaches have improved performance, but still face two challenges:
    • Ambiguous execution trajectories: individual retrieval steps lack a global view of the overall query to guide them
    • Over-reliance on end-to-end training, which yields ill-defined optimization objectives and sparse reward signals

Methodology

Agentic Planning

  • The Planning Agent uses RL with task decomposition-based rewards to learn an optimal policy for generating logical and efficient reasoning plans.
  • It decomposes the complex query into a sequence of sub-questions, which can have conditional dependencies.
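The planner's output can be pictured as a small data structure. A minimal Python sketch, assuming a `SubQuestion` record with explicit dependencies and a hand-written plan standing in for the RL-trained Planning Agent (all names here are illustrative, not from the paper):

```python
from dataclasses import dataclass, field

@dataclass
class SubQuestion:
    """One step of a decomposed multi-hop query."""
    qid: int
    text: str
    depends_on: list[int] = field(default_factory=list)  # earlier steps whose answers this one needs

def plan(query: str) -> list[SubQuestion]:
    # Hand-written stand-in for the RL-trained Planning Agent's output on a 2-hop query.
    return [
        SubQuestion(0, "Who directed Film X?"),
        SubQuestion(1, "In what year was that director born?", depends_on=[0]),
    ]

steps = plan("In what year was the director of Film X born?")
# A valid plan is a logical sequence: each step depends only on earlier steps.
assert all(d < s.qid for s in steps for d in s.depends_on)
```

Conditional dependencies are what distinguish a plan from a flat list of queries: step 1 here cannot be issued as a retrieval query until step 0 has resolved "that director" to a name.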

Iterative Sub-Task Execution

  • The Execution Agent is trained via SFT on a curated dataset of high-quality multi-turn retrieval instructions.
  • It interacts with the knowledge base to retrieve and synthesize information for each sub-question.
  • The process maintains an "accumulated knowledge base" to provide context for subsequent sub-questions.
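The execution loop described above can be sketched in a few lines of Python. The retriever and synthesizer below are toy stand-ins (a hard-coded two-passage corpus and a pick-first rule) for the knowledge base and the SFT-trained agent; a real system would also rewrite later sub-questions using the accumulated knowledge:

```python
def retrieve(sub_q: str) -> list[str]:
    # Toy stand-in for knowledge-base retrieval: keyword match over two passages.
    corpus = {
        "direct": "Film X was directed by Jane Doe.",
        "born": "Jane Doe was born in 1970.",
    }
    return [p for key, p in corpus.items() if key in sub_q.lower()]

def synthesize(sub_q: str, passages: list[str]) -> str:
    # Toy stand-in for the Execution Agent's per-step answer synthesis.
    return passages[0] if passages else "no evidence found"

def execute(sub_questions: list[str], rounds: int = 2) -> list[str]:
    knowledge: list[str] = []  # accumulated knowledge base, grown step by step
    for sq in sub_questions:
        passages: list[str] = []
        for _ in range(rounds):  # multi-round retrieval for one sub-question
            passages += [p for p in retrieve(sq) if p not in passages]
        knowledge.append(synthesize(sq, passages))  # becomes context for later steps
    return knowledge

kb = execute(["Who directed Film X?", "When was Jane Doe born?"])
```

The key design point is that `knowledge` persists across sub-questions, so each step can condition on everything resolved so far rather than starting from scratch.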

Inference Pipeline

  1. The Planning Agent decomposes the complex query into a sub-question sequence.
  2. The Execution Agent processes each sub-question iteratively, retrieving relevant information and building up the accumulated knowledge base.
  3. The Execution Agent performs final answer synthesis using the complete accumulated knowledge.
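Putting the three steps together, a toy end-to-end pipeline might look like the following. The `plan` and `solve` stubs replace the trained Planning and Execution Agents, and the final line is a crude stand-in for answer synthesis over the accumulated knowledge (all of it illustrative, not the paper's implementation):

```python
def plan(query: str) -> list[str]:
    # Stand-in for the Planning Agent: a fixed 2-hop decomposition.
    return ["Who directed Film X?", "When was Jane Doe born?"]

def solve(sub_q: str, knowledge: list[str]) -> str:
    # Stand-in for the Execution Agent's retrieve-and-synthesize loop.
    facts = {
        "Who directed Film X?": "Jane Doe directed Film X.",
        "When was Jane Doe born?": "Jane Doe was born in 1970.",
    }
    return facts[sub_q]

def answer(query: str) -> str:
    knowledge: list[str] = []                   # accumulated knowledge base
    for sq in plan(query):                      # 1. planning
        knowledge.append(solve(sq, knowledge))  # 2. iterative sub-task execution
    # 3. final synthesis over the complete accumulated knowledge (toy: extract the year)
    return knowledge[-1].rsplit(" ", 1)[-1].rstrip(".")

result = answer("In what year was the director of Film X born?")
```

Running `answer` on the multi-hop query returns "1970", with each intermediate fact preserved in `knowledge` for the final synthesis step.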

Data & Experimental Setup

  • Evaluation benchmarks: 2WikiMultiHopQA, HotpotQA, MuSiQue, Bamboogle
  • Compared to various RAG baselines, including standard, iterative, and agentic approaches
  • Experiments conducted on Qwen-2.5-3B-Instruct and Qwen-2.5-7B-Instruct models

Results

  • APEX-Searcher outperforms the strongest baseline by 8.2% and 13.1% EM on the 3B and 7B models, respectively.
  • Ablation studies confirm the importance of both the Planning and Execution stages:
    • Adding RL to the Planning stage improves EM by 3.5 and 5.5 points on the 3B and 7B models, respectively.
    • Adding SFT to the Execution stage improves EM by 10.1 and 8.4 points on the 3B and 7B models, respectively.

Limitations & Uncertainties

  • While the proposed framework demonstrates significant performance gains, the authors note that further research is needed to:
    • Explore the integration of RL for the Execution stage to better optimize the multi-round retrieval process.
    • Expand the scope from local database retrieval to web search, broadening the application scenarios.

What Comes Next

The authors highlight two key areas for future work:

  1. Investigating RL-based optimization of the multi-round retrieval process during the Execution stage.
  2. Expanding the framework to leverage web search, further broadening the application scenarios.
