Story
Beyond Polarity: Multi-Dimensional LLM Sentiment Signals for WTI Crude Oil Futures Return Prediction
Key takeaway
Researchers show that scoring energy news on several sentiment dimensions with large language models, rather than on positive/negative tone alone, improves forecasts of whether WTI crude oil futures will rise or fall the following week. This could help investors and risk managers navigate volatile energy markets.
Quick Explainer
This paper explores using multi-dimensional sentiment signals extracted from large language models (LLMs) to improve forecasts of weekly crude oil futures returns. The key insight is that conventional polarity-based sentiment measures may miss important information captured by other sentiment dimensions like uncertainty and forward-looking orientation. By combining sentiment features from models like GPT-4 and FinBERT, the study demonstrates that richer multi-dimensional representations of news sentiment can provide incremental predictive power beyond traditional approaches. The SHAP analysis shows that intensity and uncertainty-related features are among the most important predictors, indicating that the informational content of news sentiment extends beyond simple positive or negative classification.
Deep Dive
Technical Deep Dive: Beyond Polarity - Multi-Dimensional LLM Sentiment Signals for WTI Crude Oil Futures Return Prediction
Overview
This paper examines whether multi-dimensional sentiment signals extracted by large language models (LLMs) can improve the prediction of weekly returns for WTI crude oil futures. The key findings are:
- LLM-derived measures of uncertainty and forward-looking orientation exhibit statistically and economically significant predictive power beyond traditional polarity-based sentiment.
- The combination of GPT-4 and FinBERT sentiment features achieves the highest AUROC (0.652), suggesting that richer multi-dimensional representations contain incremental forecasting information, although GPT-4 features alone achieve the highest mean IC.
- SHAP analysis shows that intensity and uncertainty-related features are among the most important predictors, indicating that the predictive content of news sentiment extends beyond simple positive/negative classification.
The results suggest that commodity investors and risk managers could benefit from supplementing traditional sentiment monitoring with LLM-based measures of uncertainty and intensity when forecasting crude oil futures returns.
Problem & Context
- Forecasting crude oil prices remains challenging due to the influence of expectation-driven shocks like geopolitical events and demand uncertainty, which are not fully captured by conventional polarity-based sentiment measures.
- Recent progress in large language models (LLMs) provides a new framework for extracting multi-dimensional sentiment attributes from text, including relevance, polarity, intensity, uncertainty, and forward-looking orientation.
- These dimensions may better align with the theoretical drivers of commodity pricing, particularly in markets where geopolitical risk and expectation formation play central roles.
Methodology
- Collected energy-related news articles published in 2020-2025 from the AlphaVantage News Sentiment API.
- Scored articles on five sentiment dimensions (relevance, polarity, intensity, uncertainty, and forwardness, i.e. forward-looking orientation) using GPT-4, Llama 3.2, and FinBERT.
- Aggregated article-level scores to the weekly level using relevance-weighted means; a cross-article dispersion measure (GPT-4 uncertainty dispersion) also appears among the features in the SHAP results.
- Trained LightGBM classifiers to predict whether the weekly log return of WTI crude oil futures would be positive or negative.
- Evaluated model performance using AUROC, accuracy, and the information coefficient (IC), i.e. the correlation between predicted scores and realized returns.
- Conducted SHAP analysis to interpret feature importance.
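The weekly aggregation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the dictionary keys, the use of ISO calendar weeks, the score scales, and the choice of a relevance-weighted standard deviation as the "dispersion" feature are all hypothetical, since the summary specifies only relevance-weighted means (a dispersion feature is mentioned in the SHAP results).

```python
import math
from collections import defaultdict
from datetime import date

# Hypothetical score keys. The summary names these dimensions but not their
# scales; "relevance" is assumed to be a non-negative weight.
DIMENSIONS = ("polarity", "intensity", "uncertainty", "forwardness")


def week_key(d: date):
    """Bucket a publication date into an ISO (year, week) pair.

    ISO weeks run Monday to Sunday. A real pipeline should align the week
    boundary with the moment the weekly futures price is sampled, so that no
    article published after that price enters the week's features.
    """
    iso = d.isocalendar()
    return (iso[0], iso[1])


def aggregate_weekly(articles):
    """Return {week: features} with relevance-weighted means and dispersions.

    Each article is a dict with a 'date', a 'relevance' weight, and one score
    per name in DIMENSIONS. Dispersion here is the relevance-weighted standard
    deviation across that week's articles (an assumed definition).
    """
    buckets = defaultdict(list)
    for art in articles:
        buckets[week_key(art["date"])].append(art)

    weekly = {}
    for week in sorted(buckets):
        arts = buckets[week]
        total_w = sum(a["relevance"] for a in arts)
        if total_w <= 0:
            continue  # no relevant coverage this week; the caller decides how to impute
        feats = {"n_articles": len(arts)}
        for dim in DIMENSIONS:
            mean = sum(a["relevance"] * a[dim] for a in arts) / total_w
            var = sum(a["relevance"] * (a[dim] - mean) ** 2 for a in arts) / total_w
            feats[f"{dim}_mean"] = mean
            feats[f"{dim}_disp"] = math.sqrt(var)
        weekly[week] = feats
    return weekly
```

Weighting by relevance lets a marginally related article contribute little to a week's average, while the dispersion term keeps track of how much the week's articles disagree.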
Data & Experimental Setup
- News article corpus: 29,153 articles, 93 articles/week on average.
- Weekly log returns for front-month WTI crude oil futures obtained from Yahoo Finance.
- Predicted whether the weekly log return in week t+1 would be positive or negative, using only news articles published no later than the end of week t.
- Compared six feature sets:
  - AV Baseline: AlphaVantage sentiment scores only
  - Tradition: AlphaVantage + GPT-4
  - GPT: GPT-4 only
  - Llama: Llama 3.2 only
  - LLM: GPT-4 + Llama 3.2
  - GPT + FinBERT: GPT-4 + FinBERT
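The prediction target and the feature-set comparison can be sketched as below. The feature-set compositions follow the list above, but the column-prefix naming scheme (`av_`, `gpt4_`, `llama32_`, `finbert_`) and the helper names are hypothetical. Treating a zero return as "not positive" is also an assumption.

```python
import math

# Hypothetical column prefixes identifying where each feature comes from.
FEATURE_SETS = {
    "AV Baseline": ("av",),
    "Tradition": ("av", "gpt4"),
    "GPT": ("gpt4",),
    "Llama": ("llama32",),
    "LLM": ("gpt4", "llama32"),
    "GPT + FinBERT": ("gpt4", "finbert"),
}


def select_features(row, set_name):
    """Keep only the columns whose prefix belongs to the chosen feature set."""
    prefixes = FEATURE_SETS[set_name]
    return {k: v for k, v in row.items() if k.split("_", 1)[0] in prefixes}


def build_samples(weekly_features, weekly_closes):
    """Pair week-t features with the sign of the week-(t+1) log return.

    weekly_features[t] must use only articles published by the end of week t;
    weekly_closes[t] is the futures close at the end of week t. The final week
    has no next-week return and is dropped.
    """
    if len(weekly_features) != len(weekly_closes):
        raise ValueError("features and closes must cover the same weeks")
    samples = []
    for t in range(len(weekly_closes) - 1):
        p_now, p_next = weekly_closes[t], weekly_closes[t + 1]
        if p_now <= 0 or p_next <= 0:
            # Log returns need positive prices; front-month WTI briefly
            # settled below zero in April 2020.
            raise ValueError(f"non-positive price around week {t}")
        next_ret = math.log(p_next / p_now)  # log return realized over week t+1
        samples.append({"x": weekly_features[t], "y": int(next_ret > 0), "ret_next": next_ret})
    return samples
```

Keeping `ret_next` next to the binary label lets the same samples feed both the classification metrics (AUROC, accuracy) and a return-based IC.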
Results
Inter-Model Agreement
- Polarity scores are moderately and positively correlated across models (correlation coefficients of 0.47 to 0.68), indicating a shared directional signal but also model-specific components.
- GPT-4 consistently yields higher average scores than Llama 3.2 across all five sentiment dimensions, which may reflect greater sensitivity to sentiment cues or a different calibration of the scoring scale.
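Inter-model agreement of this kind is commonly measured with pairwise Pearson correlations. The sketch below is a hypothetical illustration (the summary does not say whether the correlations were computed on article-level or weekly scores). It also shows why the level difference and the correlation are separate findings: Pearson correlation is unchanged when one series is shifted or positively rescaled, so a model that scores systematically higher can still agree perfectly in direction.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pairwise_correlations(series):
    """series: {model_name: polarity scores aligned on the same items}."""
    return {(a, b): pearson(series[a], series[b]) for a, b in combinations(sorted(series), 2)}


def mean_levels(series):
    """Average score per model: the quantity behind 'GPT-4 scores higher'."""
    return {name: sum(v) / len(v) for name, v in series.items()}
```

For example, `pearson([1, 2, 3], [11, 12, 13])` is 1 even though the second series is ten points higher on average.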
Directional Prediction Performance
- All feature sets exceeded the 0.5 AUROC of random guessing, with the GPT + FinBERT combination achieving the highest AUROC (0.652) and an IC of 0.228.
- GPT-4 alone achieved the highest mean IC (0.249), above the combined GPT-4 + Llama 3.2 (LLM) feature set, suggesting that the additional Llama 3.2 features may have introduced noise.
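The three evaluation metrics can be computed roughly as sketched below. AUROC is the probability that a randomly chosen up-week receives a higher predicted score than a randomly chosen down-week, so 0.5 corresponds to random guessing; here it is computed through the Mann-Whitney rank-sum identity. The summary does not define its IC precisely, so this sketch assumes the common rank-IC convention (Spearman correlation between predicted probabilities and realized next-week returns) and a 0.5 threshold for accuracy. The reported mean IC presumably averages such values over evaluation periods or runs, a detail the summary does not give.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError for a constant series


def auroc(labels, scores):
    """Mann-Whitney estimate of P(score of an up-week > score of a down-week)."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def accuracy(labels, scores, threshold=0.5):
    hits = sum(int(s >= threshold) == y for s, y in zip(scores, labels))
    return hits / len(labels)


def spearman_ic(scores, next_returns):
    """Rank IC: correlation between the ranks of forecasts and of realized returns."""
    return pearson(average_ranks(scores), average_ranks(next_returns))
```

AUROC and accuracy only use the sign of the next-week return, while the IC also rewards ranking large moves correctly, which is one reason the two metrics can favor different feature sets.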
SHAP Feature Importance
- The two most important features were GPT-4 intensity and GPT-4 uncertainty (measured as its cross-article dispersion, which ranked second overall), followed by GPT-4 polarity.
- Intensity and uncertainty-related features were among the top predictors, suggesting that dimensions beyond simple polarity carry useful information for forecasting crude oil futures returns.
- The high rank of GPT-4 uncertainty dispersion suggests that disagreement across articles, not only their average tone, is an important channel through which the LLM features contribute to predictive performance.
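A global SHAP ranking like this one is usually obtained by averaging the absolute SHAP value of each feature over all predictions. The sketch below assumes a matrix of per-sample SHAP values has already been computed for the trained LightGBM model (for example with the `shap` package's `TreeExplainer`); the function name is hypothetical, and any numbers fed to it in practice would come from that model, not from this summary. Because the absolute value discards sign, the ranking says how strongly a feature moves predictions, not in which direction.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Global importance: mean |SHAP value| per feature, sorted descending.

    shap_rows[i][j] is the SHAP contribution of feature j to prediction i.
    """
    if not shap_rows:
        raise ValueError("no SHAP values supplied")
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
```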
Interpretation
- The results suggest that multi-dimensional sentiment extraction can improve the informational value of news-based signals in commodity return prediction.
- Incorporating LLM-based measures of uncertainty and sentiment intensity, in addition to polarity, provides incremental predictive power beyond conventional sentiment analysis.
- This finding aligns with the theoretical importance of expectation-driven risk factors in commodity pricing, where ambiguity and forward-looking sentiment may carry more predictive content than directional tone alone.
Limitations & Uncertainties
- The analysis is limited to a single commodity (WTI crude oil) and news source (AlphaVantage).
- The weekly forecasting horizon may not generalize to higher-frequency settings.
- The choice of GPT-4 and Llama 3.2 reflects a cost-performance trade-off that should be revisited as LLMs continue to improve.
- Future research could extend the framework to other energy commodities, incorporate additional data sources, and evaluate performance at shorter horizons.
What Comes Next
- Apply the multi-dimensional sentiment framework to commodity markets beyond crude oil.
- Investigate the robustness of the findings to different news sources, data frequencies, and prediction horizons.
- Analyze how the relative importance of sentiment dimensions evolves during periods of heightened market volatility or geopolitical uncertainty.
- Examine the potential for multi-dimensional sentiment signals to enhance other financial applications, such as asset allocation, risk management, and macroeconomic forecasting.
