Emotion Recognition Capabilities of Large Language Models: A Comparative Analysis

Diatlinko E.S. (ISP RAS, Moscow, Russia; MSU, Moscow, Russia)
Pavlov M.D. (ISP RAS, Moscow, Russia)
Tigranyan S.T. (RAU, Yerevan, Armenia)
Avetisyan A.A. (ISP RAS, Moscow, Russia)

Abstract

Large language models (LLMs) are increasingly integrated into conversational systems, where understanding emotional cues is essential for maintaining coherent, engaging, and safe interactions. This study evaluates how effectively modern instruction-tuned LLMs can recognize emotions from text alone, without task-specific fine-tuning. We benchmark multiple open-weight LLM families (<15B parameters) across four prompting strategies (Baseline, Context, Few-shot, and Context+Few-shot) on two English emotion recognition in conversation (ERC) benchmarks, IEMOCAP and MELD, and one Russian dataset, RESD. We find that the optimal prompting strategy is dataset-dependent: semantically redundant data such as IEMOCAP benefits most from few-shot demonstrations (best weighted F1-score (WF1) of 73.3% with Context+Few-shot), whereas MELD gains primarily from incorporating dialogue history (best WF1 of 60.3% with Context). Robustness experiments show that LLMs are largely insensitive to reordering few-shot examples, but performance degrades substantially when the label space is corrupted, indicating that a coherent label space matters more than the order of the examples or their ground-truth labels. Cross-lingual evaluation reveals a notable drop on the Russian RESD dataset (best WF1 of 45.8%), highlighting a persistent gap between English and Russian affect understanding in current LLMs. Overall, non-fine-tuned LLMs serve as strong prompt-only baselines for ERC, yet remain clearly behind specialized supervised systems.
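The results above are reported as weighted F1-score (WF1). As a minimal sketch of how such a score is commonly computed (per-class F1 averaged with weights proportional to each class's share of the true labels), the following may help. The function name `weighted_f1` and the toy labels are illustrative assumptions and are not taken from the paper, whose exact evaluation code is not shown here.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 (illustrative helper, not the paper's code)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    support = Counter(y_true)      # number of true examples per class
    predicted = Counter(y_pred)    # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its share of true labels
    return score


# Hypothetical toy labels, for illustration only.
y_true = ["happy", "happy", "sad", "angry"]
y_pred = ["happy", "sad", "sad", "angry"]
print(f"{weighted_f1(y_true, y_pred):.4f}")  # prints 0.7500
```

Because the weights follow the true-label distribution, frequent classes dominate the score; on imbalanced emotion datasets this can differ noticeably from an unweighted (macro) average.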

Keywords

large language models; emotion recognition; robustness; few-shot learning; emotion understanding.

Edition

Proceedings of the Institute for System Programming, vol. 38, issue 3, part 4, 2026, pp. 157-174

ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).

DOI: 10.15514/ISPRAS-2026-38(3)-53

For citation

Diatlinko E.S., Pavlov M.D., Tigranyan S.T., Avetisyan A.A. Emotion Recognition Capabilities of Large Language Models: A Comparative Analysis. Proceedings of the Institute for System Programming, vol. 38, issue 3, part 4, 2026, pp. 157-174. DOI: 10.15514/ISPRAS-2026-38(3)-53.
