Curious Now

Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian Languages

Artificial Intelligence · Mind & Behavior

Key takeaway

Researchers developed a way to better analyze multiple emotions and their intensities in Ethiopian language text, which can help understand social media and customer feedback in those languages.

Quick Explainer

The researchers enhanced the existing EthioEmo dataset by annotating the intensity of each labeled emotion, creating a richer resource for multi-label emotion analysis in Ethiopian languages. They then evaluated a diverse set of language models on this dataset, including general multilingual models as well as models tailored for African languages. The key approach was to leverage the additional emotion intensity information to improve the performance of the models on both emotion classification and intensity prediction tasks. The results highlight the importance of culturally and linguistically specific models for low-resource languages, demonstrating that the inclusion of the intensity feature can significantly boost the models' ability to capture the nuanced emotional expressions in these languages.

Deep Dive

Technical Deep Dive: Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian Languages

Overview

This work extends the existing EthioEmo emotion dataset for Ethiopian languages by adding annotations for the intensity of each labeled emotion. The authors then benchmark state-of-the-art encoder-only Pretrained Language Models (PLMs) and Large Language Models (LLMs) on this enriched dataset, evaluating their performance on multi-label emotion classification and emotion intensity prediction tasks. Additionally, they explore the feasibility of cross-lingual transfer learning among the Ethiopian languages.

Methodology

  • The authors augmented the existing EthioEmo dataset by annotating the intensity of each labeled emotion on a scale of 0 (no intensity), 1 (low), 2 (medium), or 3 (high).
  • Each instance was labeled by a minimum of 3 annotators; Amharic (amh) instances had 5.
  • The final intensity label was determined by majority vote, with a threshold on the average intensity score.
  • For evaluation, the authors selected a diverse set of models:
    • General multilingual PLMs: LaBSE, RemBERT, mBERT, mDeBERTa, XLM-RoBERTa
    • African-centric PLMs: AfriBERTa, AfroLM, AfroXLMR (61 and 76 languages), EthioLLM, AfroXLMR-Social
    • Open-source LLMs: Gemma-3, LLaMa-3.1, LLaMa-3.3, DeepSeek-R1-70B
    • Proprietary LLMs: GPT-4.1-mini, Gemini-2.5-flash
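The aggregation step above (majority vote with a threshold on the average intensity) can be sketched as follows. The paper's exact threshold value and tie-breaking rule are not given here, so the `threshold` default and the rounded-average fallback are assumptions for illustration:

```python
from collections import Counter

def aggregate_intensity(annotations, threshold=0.5):
    """Aggregate per-annotator intensity scores (0-3) into one label.

    Majority vote over annotators; if the average intensity falls below
    `threshold`, the emotion is dropped (label 0). The threshold value
    and the rounded-average tie-break are assumed, not from the paper.
    """
    avg = sum(annotations) / len(annotations)
    if avg < threshold:  # emotion too weak to keep
        return 0
    counts = Counter(annotations)
    top, freq = counts.most_common(1)[0]
    if freq > len(annotations) / 2:  # clear majority wins
        return top
    return round(avg)  # assumed fallback when no majority exists

print(aggregate_intensity([2, 2, 3]))  # majority -> 2
print(aggregate_intensity([0, 0, 1]))  # below threshold -> 0
```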

Results

Multi-Label Emotion Classification

  • The African-centric model AfroXLMR-Social achieved the best performance, with macro F1 scores of 70.66 for amh, 60.74 for orm, 54.75 for som, and 60.24 for tir.
  • Incorporating emotion intensity features into the classification task improved the performance of AfroXLMR-Social, with gains ranging from 1.62 to 11.47 points.
  • Open-source and proprietary LLMs performed worse than the encoder-only PLMs, with their performance highly dependent on model size.
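Macro F1, the metric behind the scores above, averages per-emotion F1 so that rare emotions count as much as frequent ones. A minimal sketch of the computation on toy label vectors (the emotion names in the comment are illustrative, not the dataset's label set):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 for multi-label predictions.

    y_true / y_pred: lists of binary vectors, one slot per emotion.
    """
    n_labels = len(y_true[0])
    f1s = []
    for j in range(n_labels):
        tp = sum(t[j] and p[j] for t, p in zip(y_true, y_pred))
        fp = sum((not t[j]) and p[j] for t, p in zip(y_true, y_pred))
        fn = sum(t[j] and (not p[j]) for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / n_labels

gold = [[1, 0, 1], [0, 1, 0]]  # e.g. [joy, fear, sadness] per text
pred = [[1, 0, 0], [0, 1, 0]]
print(round(macro_f1(gold, pred), 3))  # → 0.667
```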

Emotion Intensity Prediction

  • AfroXLMR-Social achieved the highest Pearson correlation between predicted and true intensity values, with scores of 53.82 for amh, 32.26 for orm, 38.44 for som, and 42.18 for tir.
  • LLMs struggled with emotion intensity prediction, often over-predicting intensity levels or failing to predict any intensity for non-English languages.
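Pearson correlation, the metric reported above, measures how well the predicted intensities track the gold ones. A self-contained sketch; the gold/predicted values below are made up for illustration, not taken from the dataset:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between gold and predicted intensity scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

gold = [0, 1, 2, 3, 2]  # illustrative intensity labels (0-3)
pred = [0, 1, 1, 3, 2]
print(round(pearson(gold, pred), 3))  # → 0.923
```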

Cross-Lingual Emotion Classification

  • AfroXLMR-Social and other multilingual models that included the target languages in pretraining achieved the best cross-lingual performance.
  • Languages using the same script (Amharic and Tigrinya) benefited more from cross-lingual transfer than languages with different scripts (Oromo and Somali).

Limitations and Future Work

  • The authors acknowledge that annotating each instance with a minimum of 3 annotators (5 for Amharic) may not be sufficient, and more annotations could improve the quality of the dataset.
  • The majority vote approach for determining final intensity labels may not capture all annotator perspectives, and the authors plan to release the annotator-level data for further exploration.
  • The authors were limited in their evaluation of LLMs and plan to expand the model set in future work.

Conclusion

This work presents an enhanced version of the EthioEmo dataset with emotion intensity annotations, and benchmarks a diverse set of language models on multi-label emotion classification and intensity prediction tasks for Ethiopian languages. The results highlight the importance of culturally and linguistically tailored models, as well as the challenges in emotion understanding and intensity prediction for low-resource languages.
