Curious Now

Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History

Computing / Artificial Intelligence

Key takeaway

Researchers built a benchmark that tests whether web agents can interpret ambiguous user queries by drawing on the user's browsing history, a capability that could make web agents more personalized and context-aware.

Quick Explainer

Persona2Web benchmarks how well personalized web agents infer user context and preferences from browsing histories rather than relying on explicit instructions. It constructs realistic user profiles and ambiguous query sets that force agents to personalize their responses. Its key components are a personalization module with planner, retriever, and generator sub-modules, and a reasoning-aware evaluation that distinguishes personalization failures from navigation failures. Persona2Web is the first benchmark of its kind. It exposes fundamental gaps in current personalization capabilities and provides a foundation for building web agents that make effective use of user context.

Deep Dive

Introduction

  • Large language models have advanced web agents, yet current agents lack personalization capabilities
  • Users rarely specify every detail of their intent, so practical web agents must infer user preferences and contexts
  • Existing benchmarks fail to provide realistic user context or handle ambiguous queries that require personalization

Methodology

User History

  • Rigorously constructed user histories reveal preferences implicitly over long time spans, rather than providing them explicitly
  • Profiles include demographic information and domain-specific preferences across 21 web domains
  • Event seeds define recurring activity patterns based on user preferences and routines
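
As a rough illustration of how a profile and its event seeds might expand into a history that reveals preferences only implicitly, here is a minimal sketch. The class names, field names (`every_n_days`, `seeds`), and example values are hypothetical and are not the benchmark's actual schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class EventSeed:
    """A recurring activity pattern (hypothetical schema)."""
    domain: str        # web domain the activity belongs to
    action: str        # what the user does on each occurrence
    every_n_days: int  # recurrence interval in days


@dataclass
class UserProfile:
    """Demographics plus per-domain preferences (hypothetical schema)."""
    demographics: dict
    preferences: dict  # domain -> preference description
    seeds: list


def expand_history(profile: UserProfile, start: date, days: int) -> list:
    """Expand event seeds into dated history entries.

    The preference text is never copied into the history. It only shows up
    through repeated actions, so an agent has to infer it.
    """
    history = []
    for seed in profile.seeds:
        for offset in range(0, days, seed.every_n_days):
            history.append({
                "date": (start + timedelta(days=offset)).isoformat(),
                "domain": seed.domain,
                "action": seed.action,
            })
    # ISO dates sort correctly as strings.
    return sorted(history, key=lambda entry: entry["date"])


profile = UserProfile(
    demographics={"age_range": "30-39"},
    preferences={"food_delivery": "vegetarian"},
    seeds=[EventSeed("food_delivery", "ordered a veggie burrito bowl", 7)],
)
history = expand_history(profile, date(2025, 1, 1), 28)
```

In this toy example a weekly seed over 28 days yields four entries, none of which states the preference outright.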

Ambiguous Query Sets

  • Follow a "clarify-to-personalize" principle: explicit details are intentionally masked, so agents must infer the missing context from user history
  • Query sets have 3 levels: Level 0 (clear), Level 1 (preference explicit, website masked), Level 2 (both masked)
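
The three masking levels can be sketched as a small transformation over a fully specified task. This is only an illustration: `TaskSpec`, `mask_query`, `render`, and the example website are assumed names, not part of the benchmark.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskSpec:
    """A fully or partially specified task (hypothetical structure)."""
    action: str
    preference: Optional[str]
    website: Optional[str]


def mask_query(spec: TaskSpec, level: int) -> TaskSpec:
    """Apply the masking rule for Level 0, 1, or 2."""
    if level not in (0, 1, 2):
        raise ValueError("level must be 0, 1, or 2")
    # Level 0 keeps everything; Level 1 hides the website;
    # Level 2 hides both the website and the preference.
    website = spec.website if level == 0 else None
    preference = spec.preference if level <= 1 else None
    return TaskSpec(spec.action, preference, website)


def render(spec: TaskSpec) -> str:
    """Turn a task spec into the query text the agent would see."""
    text = spec.action
    if spec.preference is not None:
        text += f", {spec.preference}"
    if spec.website is not None:
        text += f", on {spec.website}"
    return text


full = TaskSpec("Order dinner", "vegetarian", "example-delivery.com")
queries = [render(mask_query(full, level)) for level in (0, 1, 2)]
```

At Level 2 the agent sees only "Order dinner" and has to recover both the preference and the website from the user's history.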

Reasoning-aware Evaluation

  • Evaluates personalization, intent satisfaction, and success rate using structured rubrics
  • Distinguishes personalization failures from navigation failures by examining reasoning traces
  • Uses GPT-5-mini as an LLM judge to score agent trajectories
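
To show why separating the two failure modes matters, the sketch below assumes the judge has already reduced each trajectory to two verdicts. The `JudgedTrajectory` fields and the label strings are hypothetical, and the LLM judging step itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class JudgedTrajectory:
    """Verdicts assumed to come from a rubric-based judge (hypothetical)."""
    personalization_ok: bool  # reasoning trace resolved the right user context
    navigation_ok: bool       # web actions were carried out correctly


def diagnose(t: JudgedTrajectory) -> str:
    """Attribute the outcome to personalization, navigation, or both."""
    if t.personalization_ok and t.navigation_ok:
        return "success"
    if t.personalization_ok:
        return "navigation_failure"
    if t.navigation_ok:
        return "personalization_failure"
    return "both_failed"


def success_rate(trajectories: list) -> float:
    """Fraction of trajectories where both aspects succeeded."""
    if not trajectories:
        return 0.0
    wins = sum(diagnose(t) == "success" for t in trajectories)
    return wins / len(trajectories)


batch = [
    JudgedTrajectory(True, True),
    JudgedTrajectory(True, False),
    JudgedTrajectory(False, True),
    JudgedTrajectory(False, False),
]
labels = [diagnose(t) for t in batch]
rate = success_rate(batch)
```

A plain success rate would count three of these four trajectories as identical failures, while the per-aspect labels show that one of them personalized correctly and failed only at navigation.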

Agent Architecture

  • Augment generic web agent architectures (AgentOccam, Browser-Use) with a personalization module
  • Personalization module has 3 components: planner, retriever, generator
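
The planner, retriever, and generator pipeline can be sketched as three plain functions. In the benchmark's agents each stage would presumably be model-driven; here they are simple rule-based stand-ins, and every name, field, and website below is an assumption made for illustration.

```python
from collections import Counter


def planner(task: dict) -> list:
    """Stand-in planner: list the slots the task leaves unspecified."""
    return [slot for slot, value in task.items() if value is None]


def retriever(history: list, domain: str, k: int = 5) -> list:
    """Stand-in retriever: the k most recent entries from the task's domain."""
    matches = [entry for entry in history if entry["domain"] == domain]
    return matches[-k:]


def generator(task: dict, slots: list, evidence: list) -> dict:
    """Stand-in generator: fill each slot with its most frequent value."""
    resolved = dict(task)
    for slot in slots:
        values = [entry[slot] for entry in evidence if slot in entry]
        if values:
            resolved[slot] = Counter(values).most_common(1)[0][0]
    return resolved


def personalize(task: dict, history: list) -> dict:
    """Run planner, retriever, and generator in sequence."""
    slots = planner(task)
    evidence = retriever(history, task["domain"])
    return generator(task, slots, evidence)


history = [
    {"domain": "food_delivery", "website": "example-eats.com", "item": "veggie bowl"},
    {"domain": "travel", "website": "example-air.com", "item": "aisle seat"},
    {"domain": "food_delivery", "website": "example-eats.com", "item": "veggie bowl"},
    {"domain": "food_delivery", "website": "example-grub.com", "item": "falafel wrap"},
]
task = {"domain": "food_delivery", "action": "order dinner",
        "website": None, "item": None}
resolved = personalize(task, history)
```

The resolved task carries the website and item that occur most often in the matching history entries, which a downstream web agent could then act on.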

History Access Schemes

  • On-demand: Agent accesses user history dynamically during execution
  • Pre-execution: Agent retrieves all relevant histories before execution
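
The difference between the two schemes is mainly one of timing, which the following sketch makes visible through an event log. The step format, the dictionary-based history, and the function names are hypothetical simplifications.

```python
def run_pre_execution(steps: list, history: dict):
    """Retrieve everything any step needs once, before acting."""
    log, context = [], {}
    needed = sorted({s["needs"] for s in steps if s["needs"] is not None})
    for key in needed:
        context[key] = history[key]
        log.append(f"retrieve {key}")
    for step in steps:
        log.append(f"act {step['name']}")
    return log, context


def run_on_demand(steps: list, history: dict):
    """Retrieve a piece of history only when a step first needs it."""
    log, context = [], {}
    for step in steps:
        key = step["needs"]
        if key is not None and key not in context:
            context[key] = history[key]
            log.append(f"retrieve {key}")
        log.append(f"act {step['name']}")
    return log, context


# Toy history already reduced to key-value facts (an assumption).
history = {"website": "example-eats.com", "item": "veggie bowl"}
steps = [
    {"name": "open site", "needs": "website"},
    {"name": "search", "needs": None},
    {"name": "pick item", "needs": "item"},
]
pre_log, pre_context = run_pre_execution(steps, history)
demand_log, demand_context = run_on_demand(steps, history)
```

Both runs end with the same context, but the pre-execution log front-loads all retrievals while the on-demand log interleaves them with actions.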

Results

  • Without user history, all agents fail completely on ambiguous queries (0% success rate)
  • Even with user history, performance improves only marginally (13% success rate at best)
  • Proprietary models (o3, GPT-4.1) outperform open-source models (Qwen3-80B, Llama-3.3-70B)
  • Task completion alone cannot capture personalization capability: agents can succeed at personalization but fail at navigation, or vice versa

Error Analysis

Four key error types:

  1. Redundant History Access: Agent generates underspecified personalization queries, leading to unsuccessful retrieval and repeated attempts.
  2. Personalization Hallucination: Agent fabricates user information without accessing relevant histories.
  3. History Retrieval Failure: Agent fails to identify essential information in the user histories.
  4. History Utilization Failure: Agent fails to properly apply the retrieved user histories during execution.
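
One way to picture the taxonomy is as a decision rule over a few trace-level signals for a trajectory that has already been judged a personalization failure. The signals, the lookup threshold, and the order of the checks below are all assumptions made for illustration; the source does not specify how errors are assigned.

```python
from enum import Enum
from typing import Optional


class PersonalizationError(Enum):
    REDUNDANT_HISTORY_ACCESS = "redundant history access"
    PERSONALIZATION_HALLUCINATION = "personalization hallucination"
    HISTORY_RETRIEVAL_FAILURE = "history retrieval failure"
    HISTORY_UTILIZATION_FAILURE = "history utilization failure"


def classify(accessed_history: bool,
             lookup_attempts: int,
             retrieved_relevant: bool,
             applied_retrieved: bool,
             max_attempts: int = 3) -> Optional[PersonalizationError]:
    """Map hypothetical trace signals to one of the four error types."""
    if not accessed_history:
        # User details appeared without any history lookup.
        return PersonalizationError.PERSONALIZATION_HALLUCINATION
    if lookup_attempts > max_attempts:
        # Repeated, unsuccessful lookups from underspecified queries.
        return PersonalizationError.REDUNDANT_HISTORY_ACCESS
    if not retrieved_relevant:
        return PersonalizationError.HISTORY_RETRIEVAL_FAILURE
    if not applied_retrieved:
        return PersonalizationError.HISTORY_UTILIZATION_FAILURE
    return None  # no pattern matched


example = classify(accessed_history=True, lookup_attempts=1,
                   retrieved_relevant=True, applied_retrieved=False)
```

Here the agent found the right history but did not act on it, so the sketch labels the trajectory a history utilization failure.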

Conclusion

  • Persona2Web is the first benchmark for evaluating personalized web agents on the real open web
  • Findings reveal fundamental gaps in current personalization capabilities, highlighting the need for methods that can effectively leverage user context
  • Provides a strong foundation for advancing personalization in web agents

Impact Statement

  • This work uses synthetically generated user data, but deploying personalized agents raises privacy considerations
  • Encourag
