Story
MedClarify: An information-seeking AI agent for medical diagnosis with case-specific follow-up questions
Key takeaway
Researchers developed an AI agent that asks case-specific follow-up questions to refine its medical diagnoses, which could improve the accuracy and efficiency of clinical decision-making.
Quick Explainer
MedClarify is an AI agent that uses language models to iteratively diagnose medical conditions. Unlike existing systems that make a single prediction, MedClarify generates a differential diagnosis with confidence scores, then selects the most informative follow-up question to refine its beliefs. As the patient responds, MedClarify updates its diagnostic probabilities using Bayesian reasoning. This interactive, information-seeking approach helps MedClarify overcome the limitations of static diagnosis when key clinical details are missing. The system's novel "diagnostic expected information gain" metric allows it to prioritize questions that are expected to maximally reduce uncertainty and converge on the correct condition.
Deep Dive
Technical Deep Dive: MedClarify — An Information-Seeking AI Agent for Medical Diagnosis
Overview
MedClarify is an AI agent that uses large language models (LLMs) to perform iterative, information-seeking medical diagnosis. Unlike existing LLM-based diagnostic systems that rely on a single-shot prediction, MedClarify proactively generates case-specific follow-up questions to reduce diagnostic uncertainty and identify the correct condition.
Key features of MedClarify:
- Generates a differential diagnosis (candidate conditions) and associated confidence scores
- Selects the most informative follow-up question to refine the differential diagnosis
- Updates the diagnostic beliefs using a Bayesian framework as new evidence is gathered
- Continues the interactive diagnostic process until sufficient confidence in a single diagnosis is reached
MedClarify aims to address a key limitation of existing medical LLMs — their inability to handle incomplete patient information, which is common in real-world clinical practice. The system demonstrates significant improvements in diagnostic accuracy, especially when key clinical details are missing from the initial case presentation.
Problem & Context
- In clinical practice, reaching an accurate diagnosis often requires an iterative process of history-taking and targeted questioning to resolve uncertainty
- However, existing medical LLMs are typically designed to operate in a single-shot mode, generating a diagnosis directly from the initial case description
- This static input-output paradigm performs poorly when patient information is incomplete or missing, a common scenario in real-world settings
- To address this limitation, MedClarify introduces an information-seeking approach that generates follow-up questions to actively gather the most relevant details and refine the differential diagnosis
Methodology
MedClarify's workflow consists of four key steps:
- Assessing candidate diagnoses: The system uses an LLM to generate a differential diagnosis — a set of potential conditions with associated confidence scores.
- Generating follow-up questions: MedClarify prompts the LLM to produce candidate questions that can either refute the top-ranked diagnosis or explore alternative conditions.
- Selecting the most informative question: MedClarify scores each candidate question with a novel "diagnostic expected information gain" (DEIG) metric, an information-theoretic measure of how much the question is expected to reduce diagnostic uncertainty (see the sketch after this list).
- Updating diagnostic beliefs: After receiving the patient's response, MedClarify updates the diagnostic probabilities using Bayesian reasoning, integrating the new evidence with the previous belief state.
These steps are repeated iteratively until the system is sufficiently confident in a single diagnosis or reaches a maximum number of turns.
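The paper's prompts and exact scoring details aren't reproduced here, but the loop lends itself to a compact sketch. In the Python below, `generate_differential`, `generate_questions`, and `answer_likelihood` are hypothetical stand-ins for LLM calls, and patient answers are simplified to yes/no; under those assumptions, DEIG reduces to the expected drop in Shannon entropy of the differential once the answer arrives:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def bayes_update(prior, likelihood):
    """Posterior over diagnoses: P(d | evidence) is proportional to P(evidence | d) * P(d)."""
    unnorm = {d: prior[d] * likelihood[d] for d in prior}
    z = sum(unnorm.values()) or 1.0
    return {d: p / z for d, p in unnorm.items()}

def deig(prior, yes_lik):
    """Expected information gain of one yes/no question.

    yes_lik[d] is the (LLM-estimated) probability of a 'yes' answer given
    diagnosis d; the score is H(prior) - E_answer[H(posterior)].
    """
    p_yes = sum(prior[d] * yes_lik[d] for d in prior)
    post_yes = bayes_update(prior, yes_lik)
    post_no = bayes_update(prior, {d: 1.0 - yes_lik[d] for d in prior})
    expected_h = (p_yes * entropy(post_yes.values())
                  + (1.0 - p_yes) * entropy(post_no.values()))
    return entropy(prior.values()) - expected_h

def diagnose(case, llm, ask_patient, max_turns=10, threshold=0.85):
    """Ask the highest-scoring question each turn; stop once one diagnosis dominates."""
    belief = llm.generate_differential(case)        # hypothetical LLM call: {diagnosis: confidence}
    for _ in range(max_turns):
        if max(belief.values()) >= threshold:
            break
        candidates = llm.generate_questions(case, belief)
        cand = [(q, llm.answer_likelihood(q, belief)) for q in candidates]
        question, yes_lik = max(cand, key=lambda qy: deig(belief, qy[1]))
        answer = ask_patient(question)              # 'yes' or 'no' in this simplified sketch
        lik = yes_lik if answer == "yes" else {d: 1.0 - yes_lik[d] for d in yes_lik}
        belief = bayes_update(belief, lik)
        case += f"\nQ: {question}\nA: {answer}"
    return max(belief, key=belief.get), belief
```

With binary answers this reduces to classic expected information gain; the sketch omits the ICD-based medical knowledge discussed under Interpretation.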
Data & Experimental Setup
MedClarify was evaluated on 469 patient cases across three medical datasets: NEJM Image Challenge, MediQ, and MedQA. These cases span eight medical specialties, including cardiology, pulmonology, neurology, and endocrinology.
To simulate incomplete information, the researchers applied "feature masking" — selectively hiding categories of clinical data (e.g., symptoms, lab results, imaging) from the initial case description. This allowed them to assess how MedClarify's performance compares to baseline LLMs under varying levels of missing information.
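The exact masking procedure isn't specified here; a minimal sketch of the idea, assuming each case is stored as a dictionary of clinical feature categories (the category names below are illustrative), might look like:

```python
import random

# Illustrative feature categories; the paper masks categories such as
# symptoms, lab results, and imaging from the initial case description.
CATEGORIES = ["symptoms", "physical_exam", "labs", "imaging", "history"]

def mask_features(case: dict, n_masked: int, seed: int = 0) -> dict:
    """Return a copy of the case with n_masked feature categories hidden."""
    rng = random.Random(seed)
    hidden = set(rng.sample(CATEGORIES, n_masked))
    return {k: ("[MASKED]" if k in hidden else v) for k, v in case.items()}

# Example: hide two categories from a toy heart-failure case
case = {
    "symptoms": "dyspnea on exertion, orthopnea",
    "physical_exam": "bilateral crackles, elevated JVP",
    "labs": "BNP 1450 pg/mL",
    "imaging": "chest X-ray shows pulmonary edema",
    "history": "hypertension, prior MI",
}
print(mask_features(case, n_masked=2))
```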
The evaluation framework used four agent components:
- Patient agent: Simulates the patient and provides responses to questions
- Doctor agent: Runs MedClarify to generate differential diagnoses and follow-up questions
- Update agent: Integrates new information into the patient case
- Evaluator agent: Assesses the correctness of the final diagnosis
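The interfaces between these components aren't spelled out beyond their roles, but a simulation loop consistent with that description could be wired up as follows (all four agent objects and their method names are hypothetical):

```python
def run_episode(full_case, masked_case, patient, doctor, updater, evaluator, max_turns=10):
    """One simulated diagnostic encounter wiring together the four evaluation agents."""
    case = masked_case
    for _ in range(max_turns):
        question = doctor.next_question(case)          # doctor agent runs MedClarify; None once confident
        if question is None:
            break
        answer = patient.respond(question, full_case)  # answers are drawn from the unmasked case
        case = updater.integrate(case, question, answer)
    diagnosis = doctor.final_diagnosis(case)
    return evaluator.is_correct(diagnosis, full_case)  # judged against the ground-truth diagnosis
```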
Results
- Baseline LLMs struggle with incomplete information: When key clinical features were masked, the diagnostic accuracy of baseline LLMs dropped by up to 19.8 percentage points.
- MedClarify improves accuracy under incomplete information: When clinical features were masked, MedClarify improved top-1 diagnostic accuracy by up to 27 percentage points over the baseline.
- MedClarify is effective across LLM backbones: The performance gains of MedClarify were robust across different LLM models, including GPT-5.1, Deepseek-R1-0528, and Llama-3.3-70B.
- MedClarify's confidence estimates are well-calibrated: Unlike the baseline, MedClarify produced confidence scores that were strongly aligned with actual diagnostic correctness.
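The paper's calibration analysis isn't detailed here, but a standard way to test such a claim is expected calibration error (ECE): bin predictions by stated confidence and compare average confidence against accuracy within each bin. A minimal sketch:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted average of |accuracy - confidence| per confidence bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0 into the top bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece

# Example: confidence scores that track correctness yield a low ECE
print(expected_calibration_error([0.9, 0.8, 0.6, 0.3], [True, True, True, False]))
```

Low ECE means the model's stated confidence tracks how often it is actually right, which is what "well-calibrated" asserts above.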
Interpretation
- MedClarify's information-seeking approach, driven by its novel DEIG question selection strategy, enables it to effectively recover missing clinical details and converge on the correct diagnosis
- By incorporating medical knowledge through ICD code relationships, MedClarify can prioritize questions that rule out entire branches of related conditions, leading to more efficient diagnostic reasoning (see the sketch after this list)
- The Bayesian updating mechanism helps MedClarify maintain a coherent, evidence-backed belief state, unlike the baseline's tendency to anchor on salient but potentially misleading case details
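How ICD relationships enter question selection isn't reproduced here. One plausible reading, sketched below with a toy differential (real ICD-10 codes, hypothetical function names), is that diagnoses sharing an ICD prefix form a branch, so a single discriminating question can shift probability mass for the whole branch at once:

```python
from collections import defaultdict

def group_by_icd_branch(differential, prefix_len=1):
    """Group candidate diagnoses by a shared ICD-10 prefix.

    prefix_len=1 groups by chapter letter (e.g. 'I' = circulatory,
    'J' = respiratory); longer prefixes give finer-grained branches.
    """
    branches = defaultdict(dict)
    for dx, (icd, prob) in differential.items():
        branches[icd[:prefix_len]][dx] = prob
    return dict(branches)

# Toy differential: {diagnosis: (ICD-10 code, probability)}
differential = {
    "heart failure":     ("I50.9", 0.35),
    "acute MI":          ("I21.9", 0.15),
    "pneumonia":         ("J18.9", 0.30),
    "COPD exacerbation": ("J44.1", 0.20),
}
for prefix, members in group_by_icd_branch(differential).items():
    # One question aimed at the respiratory branch ('J', total mass 0.50)
    # can rule several related conditions in or out with a single answer.
    print(prefix, round(sum(members.values()), 2), sorted(members))
```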
Limitations & Uncertainties
- The evaluation relies on simulated patient interactions, which may not fully capture the ambiguity and communication patterns of real clinical encounters
- Performance may be affected by the evolution of LLM capabilities over time, and the system has not yet been validated in actual clinical workflows
- The current framework does not incorporate visual modalities like medical imaging, which are essential for many diagnostic decisions
What Comes Next
- Extend MedClarify to handle multimodal patient data, including medical images and biosensor readings
- Integrate MedClarify into real-world clinical decision support systems and evaluate its performance and usability in prospective studies
- Explore ways to make MedClarify's reasoning more transparent and interactive for clinicians, potentially leveraging recent advances in explainable AI