Story
ClinicalTrialsHub: Bridging Registries and Literature for Comprehensive Clinical Trial Access
Key takeaway
A new platform helps people find and understand clinical trials by combining trial registry data with relevant research articles, making it easier to access information about ongoing studies that could affect their health.
Quick Explainer
ClinicalTrialsHub is a unified search platform that combines clinical trial data from the ClinicalTrials.gov registry with automatically extracted information from biomedical research papers. It addresses the challenge of siloed data by translating user queries into structured searches across both sources, merging related entries, and employing language models to extract detailed trial data from unstructured paper text. This integrated approach aims to provide comprehensive access to clinical trial information for patients, clinicians, and researchers, overcoming the limitations of searching the registry and literature independently.
Deep Dive
Technical Deep Dive: ClinicalTrialsHub
Overview
ClinicalTrialsHub is a unified search platform that combines structured data from the ClinicalTrials.gov registry with automatically extracted information from PubMed research articles. It aims to improve access to comprehensive clinical trial data for patients, clinicians, researchers, and policymakers.
Problem & Context
- ClinicalTrials.gov (CTG) is the primary resource for clinical trial information, containing over 500,000 registered trials. However, many trials are not registered on CTG, especially those conducted outside the US. Information about these unregistered trials is available only in papers indexed by PubMed, which contains over 35 million biomedical papers.
- The lack of integration between CTG and PubMed creates silos, making it difficult to search across both sources. PubMed articles are in free-text format, which is especially challenging for non-researchers to parse and filter.
Methodology
ClinicalTrialsHub uses the following key components:
Unified Search
- Translates natural language queries into structured database searches across both CTG and PubMed
- Applies BM25 relevance ranking and deduplication to merge related entries from both sources
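The ranking and merging step can be illustrated with a small sketch. This is not the platform's implementation. It assumes that records from both sources are plain dictionaries with hypothetical `source`, `id`, and `text` keys, and that a PubMed article is linked to a registry entry when its text mentions an NCT identifier. The BM25 parameters `k1` and `b` are common defaults, not values reported by the authors.

```python
import math
import re
from collections import Counter

# Registry identifiers have the form "NCT" followed by eight digits.
NCT_PATTERN = re.compile(r"NCT\d{8}")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def merge_key(record):
    """Hypothetical linking rule: registry ID, or an NCT ID cited in a paper."""
    if record["source"] == "ctg":
        return record["id"]
    match = NCT_PATTERN.search(record["text"])
    return match.group(0) if match else record["id"]


def merge_and_rank(query, records):
    """Score all records, merge those sharing a key, and sort by best score."""
    scores = bm25_scores(query, [r["text"] for r in records])
    merged = {}
    for record, score in zip(records, scores):
        key = merge_key(record)
        entry = merged.setdefault(key, {"key": key, "sources": [], "score": 0.0})
        entry["sources"].append(record["source"])
        entry["score"] = max(entry["score"], score)
    return sorted(merged.values(), key=lambda e: e["score"], reverse=True)
```

With this linking rule, a registry entry and a paper that cites its NCT identifier collapse into one result that lists both sources, while a paper with no identifier stays a separate entry keyed by its own ID.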
Information Extraction
- Employs large language models (e.g. GPT-5.1, Gemini-3-Pro) to extract structured trial data from PubMed articles
- Extracts over 200 fields covering protocol details, results, and derived metadata
- Validates extracted data against CTG schema and controlled vocabularies
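Validation against a schema and controlled vocabularies can be sketched as follows. The field names (`brief_title`, `study_type`, `phase`, `enrollment_count`) and the vocabulary values are illustrative assumptions modeled loosely on registry-style enumerations. They are not the platform's actual schema, which covers over 200 fields.

```python
# Illustrative subset of controlled vocabularies (assumed, not the real schema).
ALLOWED_VALUES = {
    "study_type": {"INTERVENTIONAL", "OBSERVATIONAL", "EXPANDED_ACCESS"},
    "phase": {"EARLY_PHASE1", "PHASE1", "PHASE2", "PHASE3", "PHASE4", "NA"},
}

# Illustrative required fields and their expected Python types.
REQUIRED_FIELDS = {
    "brief_title": str,
    "study_type": str,
    "enrollment_count": int,
}


def validate_extraction(record):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    # Check that required fields are present and have the expected type.
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    # Check that vocabulary-controlled fields use an allowed value.
    for field, allowed in ALLOWED_VALUES.items():
        if field not in record:
            continue
        value = record[field]
        if not isinstance(value, str) or value not in allowed:
            errors.append(f"value not in vocabulary for {field}: {value!r}")
    return errors
```

A check of this kind catches common extraction slips, such as a model returning "Interventional" instead of the controlled term or a participant count as a string, before the record is shown alongside registry data.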
Grounded Q&A
- Provides an interactive chatbot interface that generates evidence-grounded answers to user questions about individual trials
- Uses Gemini-3-Pro, selected as the best-performing model in an evaluation of factual grounding and conciseness
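One simple way to think about evidence grounding is to check that each passage an answer cites actually appears in the trial record. The sketch below is an assumption made for illustration, not the evaluation method used for the platform. It supposes that the chatbot returns supporting quotes alongside its answer, and the function name `ungrounded_quotes` is hypothetical.

```python
import re


def normalize(text):
    """Collapse whitespace and lowercase so line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def ungrounded_quotes(evidence_quotes, trial_text):
    """Return the quotes that do not appear verbatim in the trial text.

    Empty quotes are treated as ungrounded because they support nothing.
    """
    haystack = normalize(trial_text)
    missing = []
    for quote in evidence_quotes:
        needle = normalize(quote)
        if not needle or needle not in haystack:
            missing.append(quote)
    return missing
```

Verbatim matching is deliberately strict: it flags paraphrased evidence as ungrounded, so a real system would likely need a softer comparison on top of it.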
Results
- ClinicalTrialsHub increases access to structured clinical trial data by 83.8% compared to CTG alone
- User study with medical professionals shows strong approval for search features, information extraction accuracy, and Q&A quality
- Quantitative evaluation demonstrates Gemini-3-Pro outperforms other frontier models on factual grounding and conciseness for clinical trial Q&A
Limitations & Uncertainties
- Current extraction evaluation focuses on protocol-level fields; extending to results sections requires additional benchmark development
- User study was limited to 7 participants; larger-scale evaluation is needed to fully validate the system's utility
- Performance of dense retrieval methods compared to BM25 ranking was not assessed
What Comes Next
- Expand extraction coverage to results-oriented fields with a validated benchmark
- Investigate dense retrieval techniques for unified search
- Conduct larger-scale user studies across diverse stakeholder groups
- Explore domain-adapted fine-tuning to further improve extraction accuracy