Curious Now

Augmenting Rating-Scale Measures with Text-Derived Items Using the Information-Determined Scoring (IDS) Framework

Mind & Behavior · Computing

Key takeaway

Researchers created a new way to analyze text data from psychological assessments, allowing for more nuanced and personalized measurement of complex experiences.

Quick Explainer

The Information-Determined Scoring (IDS) framework augments traditional psychological assessments by deriving additional test items from free-text responses using language models. Instead of relying solely on rating-scale questions, IDS selects the most informative text-derived items to supplement the assessment, providing a more holistic and precise measure of the target trait. The key steps involve collating rating-scale and text data, generating candidate text-based items, evaluating their psychometric properties, and incorporating the most informative ones into the final augmented assessment. This novel approach shifts away from replicating expert judgments, instead optimizing for the information content that the text-derived items contribute to the overall measurement.

Technical Deep Dive

Overview

This study introduced the Information-Determined Scoring (IDS) framework, a method for augmenting existing rating-scale psychological assessments with items derived from free-text responses. Using depression as a case study, the authors evaluated the performance of IDS-augmented tests against a baseline rating-scale-only test.

Problem & Context

Psychological assessments often rely on rating-scale items, which require respondents to condense complex experiences into predefined categories. However, rich, unstructured text data are frequently collected alongside these scales but are rarely utilized to measure the target trait.

Methodology

The IDS framework involves four key stages:

  1. Collating inputs: Obtaining responses to a calibrated rating-scale test and qualitative text from the same individuals.
  2. Generating candidate items: Applying multiple simple scoring prompts to the text to create a pool of LLM-derived candidate items.
  3. Evaluating candidates: Co-calibrating each candidate item with the baseline rating-scale test and quantifying the psychometric information it provides.
  4. Constructing augmented tests: Selecting the most informative LLM items to create final augmented tests.
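The evaluation and selection stages (3 and 4) can be sketched in code. This is a minimal illustration assuming a two-parameter logistic (2PL) item response model and a simple average-information ranking; the function names, item parameters, and selection criterion here are invented for illustration and may differ from the authors' actual models (rating-scale assessments are often calibrated with graded response models instead).

```python
import numpy as np

def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta.

    a: discrimination, b: difficulty (hypothetical values below).
    """
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def expected_information(a, b, theta_grid):
    """Average information an item contributes across a trait grid."""
    return item_information(theta_grid, a, b).mean()

def construct_augmented_test(candidates, theta_grid, k=5):
    """Stage 4 sketch: keep the k most informative LLM-derived items.

    candidates: list of (name, a, b) tuples whose parameters would come
    from co-calibrating each candidate with the baseline test (stage 3).
    """
    ranked = sorted(
        candidates,
        key=lambda c: expected_information(c[1], c[2], theta_grid),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical candidate items, one per scoring prompt.
theta_grid = np.linspace(-3, 3, 61)
candidates = [
    ("prompt_a", 1.8, 0.0),   # highly discriminating -> most informative
    ("prompt_b", 0.6, 1.0),   # weakly discriminating -> least informative
    ("prompt_c", 1.2, -0.5),
]
top = construct_augmented_test(candidates, theta_grid, k=2)
```

The key design point is that candidates are ranked purely by the measurement information they add, not by agreement with human ratings.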

The authors evaluated the framework using real-world data from high-school students in China (n=693) and a matched synthetic dataset (n=3,000).

Results

Both the "All Texts" and "Top 5 Texts" augmented tests showed significant advantages over the baseline test:

  • Increased measurement precision: Augmented tests yielded trait estimates with lower standard errors than the baseline test, especially in the early stages of the adaptive test.
  • Improved accuracy: In the synthetic dataset, augmented tests produced trait estimates closer to the true values compared to the baseline.
  • Enhanced convergent validity: The augmented tests showed stronger associations with an external measure of suicidality in the real-world data.

The information contributed by the LLM-derived items was equivalent to adding up to 6.3 average rating-scale items in the real-world data and up to 16.0 in the synthetic data.
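The precision result follows directly from item response theory, where the standard error of a trait estimate is the inverse square root of the total test information, so any informative added item shrinks it. The snippet below illustrates this with a 2PL model and made-up item parameters; it is not the paper's analysis, just the underlying arithmetic.

```python
import numpy as np

def info_2pl(theta, a, b):
    """Fisher information of a single 2PL item at trait level theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

theta = 0.0  # trait level at which precision is evaluated

# Hypothetical item parameters (a: discrimination, b: difficulty).
baseline_items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]  # rating-scale items
llm_items = [(1.5, -0.5), (1.5, 0.5)]                    # text-derived items

i_base = sum(info_2pl(theta, a, b) for a, b in baseline_items)
i_aug = i_base + sum(info_2pl(theta, a, b) for a, b in llm_items)

se_base = 1.0 / np.sqrt(i_base)  # standard error, baseline test
se_aug = 1.0 / np.sqrt(i_aug)    # standard error, augmented test
print(f"SE baseline: {se_base:.3f}, SE augmented: {se_aug:.3f}")
```

Because information is additive across items, the augmented test's standard error is strictly smaller wherever the added items carry any information, which is why the gains were largest early in the adaptive test, when few items had been administered.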

Interpretation

The IDS framework marks a conceptual shift from traditional automated text scoring approaches, which typically aim to replicate expert human judgments. Instead, IDS selects LLM items based on the psychometric information they provide, rather than their fidelity to a predefined rubric or training data.

By integrating unstructured text responses with established rating-scale tests, the IDS approach enables more holistic and precise psychological assessment, leveraging the rich information embedded in natural language.

Limitations & Uncertainties

  • The effectiveness of the IDS framework depends on the size and diversity of the candidate LLM item pool, which was limited in this study to a single model and a constrained set of scoring prompts.
  • The generalizability of the findings requires further examination, as the data focused on depression in Chinese high-school students.
  • Future research should explore applying the IDS framework to diverse psychological constructs and populations, as well as leveraging larger LLM item pools and more sophisticated scoring techniques.

What Comes Next

The authors propose several avenues for future research, including:

  • Expanding the candidate LLM item pool by applying a wider array of scoring prompts and leveraging multiple LLM models.
  • Exploring the generalizability of the IDS framework across different psychological constructs and populations.
  • Investigating the integration of IDS-augmented assessments with diverse sources of unstructured text data, such as therapy transcripts, reflective journals, and customer reviews.

As this approach matures, the authors argue, IDS-style augmentation could move psychological measurement beyond fixed rating scales while preserving their psychometric rigor, making use of the detail that respondents express in their own words.
