Image sentiment analysis by combining contextual correlation

LUO Gaifang^{1,2}, ZHANG Hao^{1}, XU Dan^{1}

doi: 10.13700/j.bh.1001-5965.2023.0345

1. School of Information Science & Engineering, Yunnan University, Kunming 650500, China
2. School of Software, Shanxi Agricultural University, Jinzhong 030801, China

Funds:

National Natural Science Foundation of China (62162068, 62061049); Yunnan Province Ten Thousand Talents Program and Yunling Scholars Special Project (YNWR-YLXZ-2018-022); Yunnan Provincial Department of Education Scientific Research Fund (2025J0008)

Corresponding author: E-mail: danxu@ynu.edu.cn

CLC number: TP391

Publication history
- Received: 2023-06-12
- Accepted: 2023-09-08
- Published online: 2023-11-07
- Issue published: 2025-07-31

Abstract:
Image sentiment analysis aims to analyze and understand the emotions conveyed by visual content. Its key challenge is bridging the affective gap between latent visual features and abstract emotions. Existing deep models try to bridge this gap in a single step by directly learning discriminative high-level emotional representations at the global level, but they overlook the hierarchical relationship among features at different layers of the network, so the correlation between contextual features is lost. To address this, a context-hierarchical interaction network (CHINet) is proposed to model the correlation between contextual information and emotion within the feature hierarchy. The model consists of two branches. The bottom-up main branch directly learns a global emotional representation at the high-level semantic level; for the features at different levels of this branch, a shallow style encoder extracts style representations and an emotion activation attention mechanism localizes potential emotion activation regions. The extracted features are then cascaded into a pyramid structure that serves as the top-down branch, which models contextual hierarchical correlations and supplies shallow visual features to the emotion representation. Global and local learning thereby integrate low-level style attributes with high-level image semantics. Experiments show that the proposed model improves emotion recognition accuracy on the FI dataset compared with related methods, including multi-level feature fusion methods and methods that incorporate local emotional regions.

Keywords:
- image sentiment analysis
- affective gap
- contextual correlation
- hierarchical structure
- emotion classification
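
One way to picture the top-down branch described in the abstract is a generic feature-pyramid fusion. The sketch below is only an illustration under assumptions that the abstract does not confirm: single-channel feature maps stored as nested lists, nearest-neighbour 2x upsampling, and element-wise addition as the merge step. The names `upsample2x`, `add`, and `top_down_fuse` are hypothetical and are not taken from CHINet.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel, H x W, as nested lists


def upsample2x(fm: FeatureMap) -> FeatureMap:
    """Nearest-neighbour upsampling by a factor of 2 in both dimensions."""
    out: FeatureMap = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def add(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum of two feature maps with identical shape."""
    assert len(a) == len(b) and all(len(x) == len(y) for x, y in zip(a, b))
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(levels: List[FeatureMap]) -> List[FeatureMap]:
    """Fuse maps ordered shallow -> deep, each level half the size of the previous.

    Starting from the deepest map, each fused map is upsampled and added to the
    next shallower one, so shallow levels receive high-level context.
    """
    fused = [levels[-1]]
    for lateral in reversed(levels[:-1]):
        fused.append(add(lateral, upsample2x(fused[-1])))
    return list(reversed(fused))  # back to shallow -> deep order


# Toy inputs (made-up values): the comments name the role each level would play.
shallow = [[1.0] * 4 for _ in range(4)]   # 4x4, stands in for style features
middle = [[10.0] * 2 for _ in range(2)]   # 2x2, stands in for attended regions
deep = [[100.0]]                          # 1x1, stands in for global semantics
pyramid = top_down_fuse([shallow, middle, deep])
```

In this toy run the deepest value propagates down to every shallower level, which is the sense in which a top-down path lets shallow visual features carry high-level context.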

Figure 1. Affective image examples from the FI dataset

Figure 2. Overall structure of CHINet

Figure 3. Shallow style encoder

Figure 4. Emotion activation attention mechanism

Figure 5. Effect of different α on model performance

Figure 6. Visualization of class activation maps for emotion reasoning
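
Class activation maps of the kind named in Figure 6 are commonly computed with Grad-CAM, which the reference list cites as [43]. The following is a minimal sketch of that general recipe, assuming the activations and gradients of one convolutional layer are already available as nested lists; the function name `grad_cam` and the toy numbers are illustrative and do not come from the paper.

```python
from typing import List

Map2D = List[List[float]]


def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """Grad-CAM heat map from K activation maps and the matching gradients.

    activations[k][i][j] is the activation of channel k at position (i, j);
    gradients[k][i][j] is d(class score)/d(activation) at the same position.
    """
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weight: global average of the gradient over spatial positions.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for wk, act in zip(weights, activations):
        for i in range(h):
            for j in range(w):
                cam[i][j] += wk * act[i][j]
    # ReLU keeps only regions that push the class score up.
    return [[max(0.0, v) for v in row] for row in cam]


# Toy input: two 2x2 channels; values are made up for illustration.
acts = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 3.0], [2.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-2.0, -2.0], [-2.0, -2.0]]]
heatmap = grad_cam(acts, grads)
```

Only the bottom-right cell survives the ReLU in this toy case, because it is the only position where the positively weighted channel outweighs the negatively weighted one.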

Table 1. Details of image emotion datasets (number of images)

Dataset	Amusement	Awe	Contentment	Excitement	Positive subtotal	Anger	Disgust	Fear	Sadness	Negative subtotal	Total
Abstract^[9]	25	15	63	36	139	3	18	36	32	89	228
ArtPhoto^[9]	101	102	70	105	378	77	70	115	166	428	806
EmotionROI^[14]			330	330	660	330	330	330	330	1 320	1 980
FI^[1]	4 942	3 151	5 374	2 963	16 430	1 266	1 658	1 032	2 922	6 878	23 308
Twitter I^[20]					769					500	1 269
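
As a quick arithmetic check on Table 1, the per-category counts can be summed and compared with the reported totals, which also exposes how unbalanced the two polarities are. This is a small sketch using only the rows with complete per-category data; the dictionary layout and the helper name `polarity_share` are choices made for this example.

```python
# Per-category image counts copied from Table 1 (rows with complete data).
datasets = {
    "Abstract": {"positive": [25, 15, 63, 36], "negative": [3, 18, 36, 32], "total": 228},
    "ArtPhoto": {"positive": [101, 102, 70, 105], "negative": [77, 70, 115, 166], "total": 806},
    "FI": {"positive": [4942, 3151, 5374, 2963], "negative": [1266, 1658, 1032, 2922], "total": 23308},
}


def polarity_share(entry):
    """Return (positive count, negative count, positive fraction) for one dataset."""
    pos, neg = sum(entry["positive"]), sum(entry["negative"])
    assert pos + neg == entry["total"], "subtotals should add up to the reported total"
    return pos, neg, pos / (pos + neg)


shares = {name: polarity_share(entry) for name, entry in datasets.items()}
for name, (pos, neg, frac) in shares.items():
    print(f"{name}: {pos} positive, {neg} negative, {frac:.1%} positive")
```

On FI, roughly seven images in ten are positive, so binary accuracy figures on that dataset are best read with this imbalance in mind.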

Table 2. Emotion classification accuracy of CNN models with different layers on FI

Model	Layers	Accuracy/% (c=2)	Accuracy/% (c=8)
AlexNet^[16]	5 Conv + 2 fc	65.43	42.78
VGG-16^[36]	13 Conv + 2 fc	81.35	63.84
VGG-19^[36]	16 Conv + 2 fc	82.62	64.23
DenseNet-100 w/o fc^[37]	99 Conv	83.77	64.76
DenseNet-100^[37]	99 Conv + 2 fc	84.21	65.08
ResNet-18 w/o fc^[34]	17 Conv	80.16	60.76
ResNet-18^[34]	17 Conv + 2 fc	81.27	61.55
ResNet-50 w/o fc^[34]	49 Conv	83.52	62.81
ResNet-50^[34]	49 Conv + 2 fc	84.06	63.74
ResNet-152 w/o fc^[34]	151 Conv	84.73	66.27
ResNet-152^[34]	151 Conv + 2 fc	85.19	66.44
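
Table 2 lists four backbones both with and without the two fc layers, so the contribution of those layers can be read off by subtraction. The snippet below is a small sketch of that calculation; the values are copied from the table and the variable names are chosen for this example.

```python
# (c=2, c=8) accuracies in % copied from Table 2, without and with the fc layers.
without_fc = {"DenseNet-100": (83.77, 64.76), "ResNet-18": (80.16, 60.76),
              "ResNet-50": (83.52, 62.81), "ResNet-152": (84.73, 66.27)}
with_fc = {"DenseNet-100": (84.21, 65.08), "ResNet-18": (81.27, 61.55),
           "ResNet-50": (84.06, 63.74), "ResNet-152": (85.19, 66.44)}

# Gain in percentage points from adding the fc layers, per backbone.
fc_gain = {
    name: tuple(round(w - wo, 2) for w, wo in zip(with_fc[name], without_fc[name]))
    for name in with_fc
}

for name, (gain_c2, gain_c8) in fc_gain.items():
    print(f"{name}: +{gain_c2:.2f} (c=2), +{gain_c8:.2f} (c=8)")
```

Every gain stays below 1.2 percentage points, which is small next to the gap between the shallowest and deepest backbones in the same table.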


Table 3. Emotion classification accuracy based on local and global learning

Model	Accuracy/%
Baseline model	85.19
CHINet w/o G	87.36
CHINet w/o L	88.41
CHINet G&L	89.74
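
The ablation in Table 3 is easier to compare as gains over the baseline. The sketch below computes them from the table values. Reading "w/o G" and "w/o L" as "without global learning" and "without local learning" is an interpretation of the table caption, not something the page states outright.

```python
# Accuracies in % copied from Table 3.
baseline = 85.19
variants = {"CHINet w/o G": 87.36, "CHINet w/o L": 88.41, "CHINet G&L": 89.74}

# Improvement of each variant over the baseline, in percentage points.
gains = {name: round(acc - baseline, 2) for name, acc in variants.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} points over the baseline")
```

The full model gains more than either single-branch variant but less than the sum of their gains, so the two kinds of learning overlap in part.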


Table 4. Comparison of binary emotion polarity classification accuracy on the FI dataset

Model	Network	Accuracy/%
Baseline models	AlexNet^[16]	60.54
Baseline models	VGGNet^[36]	70.64
Baseline models	ResNet^[34]	72.22
Baseline models	Fine-tuned AlexNet^[6]	72.43
Baseline models	Fine-tuned VGGNet^[36]	83.05
Baseline models	Fine-tuned ResNet-50^[34]	85.19
Compared methods	PCNN (VGGNet)^[20]	75.34
Compared methods	DeepSentiBank^[21]	61.54
Compared methods	AR^[29]	86.35
Compared methods	VSF^[23]	88.11
Compared methods	MSRCA^[30]	87.40
Compared methods	MLR^[18]	87.87
Proposed model	CHINet w/o G	87.36
Proposed model	CHINet w/o L	88.41
Proposed model	CHINet G&L	89.74

Table 5. Comparison of binary emotion polarity classification accuracy (%) on small-scale datasets

Model	Network	ArtPhoto^[9]	Abstract^[9]	Twitter I 5^[20]	Twitter I 4^[20]	Twitter I 3^[20]	EmotionROI^[14]
Handcrafted features	PAEF^[13]	67.85	70.05	72.90	69.61	67.92	75.24
Handcrafted features	SentiBank	67.74	64.95	71.32	68.28	66.63	66.18
Deep learning	DeepSentiBank^[21]	68.73	71.19	76.35	70.15	71.25	70.11
Deep learning	PCNN (VGGNet)^[20]	70.96	70.84	82.54	76.52	76.36	73.58
Deep learning	Fine-tuned VGG^[36]	70.09	72.48	84.35	82.26	76.75	77.02
Deep learning	AR^[29]	74.80	76.03	88.65	85.10	81.06	81.26
Deep learning	R-CNNGSR^[31]	75.02	75.89				81.36
Deep learning	VSF^[23]	81.62	81.82			83.10	83.17
Deep learning	MLR^[18]	75.63	77.85	89.77	85.72	81.49	83.08
Proposed model	CHINet w/o G	78.61	79.07	86.54	83.04	81.36	82.26
Proposed model	CHINet w/o L	81.12	81.15	89.51	82.95	82.17	82.47
Proposed model	CHINet G&L	83.27	82.33	90.52	86.43	84.37	83.53

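Table 6. Comparison of multi-category emotion classification performance on the FI dataset

Model	Network	Accuracy/%
Baseline models	AlexNet^[16]	49.54
Baseline models	VGGNet-19^[36]	61.74
Baseline models	ResNet^[34]	53.08
Baseline models	Inception^[38]	60.12
Baseline models	Inception-Resnet^[39]	62.77
Compared models	DeepSentiBank^[21]	51.29
Compared models	PDANet^[40]	69.42
Compared models	MKN^[19]	63.92
Compared models	CycleEmotionGAN^[41]	66.79
Compared models	Deep metric learning^[42]	68.37
Compared models	MldrNet^[16]	67.24
Compared models	OSSCM^[22]	69.32
Compared models	WSCNet^[3]	70.07
Compared models	VSF^[23]	70.46
Compared models	MSRCA^[30]	69.05
Compared models	MLR^[18]	67.49
Proposed model	CHINet w/o G	69.94
Proposed model	CHINet w/o L	70.32
Proposed model	CHINet G&L	71.56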

References (43)

[1]	YOU Q Z, LUO J B, JIN H L, et al. Building a large scale dataset for image emotion recognition: the fine print and the benchmark[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2016: 308-314.
[2]	XU L W, WANG Z T, WU B, et al. MDAN: multi-level dependent attention network for visual emotion analysis[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 9469-9478.
[3]	SHE D Y, YANG J F, CHENG M M, et al. WSCNet: weakly supervised coupled networks for visual sentiment classification and detection[J]. IEEE Transactions on Multimedia, 2020, 22(5): 1358-1371. doi: 10.1109/TMM.2019.2939744
[4]	YANG H S, FAN Y Y, LV G Y, et al. Exploiting emotional concepts for image emotion recognition[J]. The Visual Computer, 2023, 39(5): 2177-2190. doi: 10.1007/s00371-022-02472-8
[5]	YANG J Y, LI J, WANG X M, et al. Stimuli-aware visual emotion analysis[J]. IEEE Transactions on Image Processing, 2021, 30: 7432-7445. doi: 10.1109/TIP.2021.3106813
[6]	LIANG Y, MAEDA K, OGAWA T, et al. Chain centre loss: a psychology inspired loss function for image sentiment analysis[J]. Neurocomputing, 2022, 495: 118-128. doi: 10.1016/j.neucom.2022.04.016
[7]	ZHANG H, LI H P, PENG G Q, et al. Image emotion recognition via fusion multi-level representations[J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(10): 1566-1576 (in Chinese).
[8]	ZHANG H M, XU M. Multiscale emotion representation learning for affective image recognition[J]. IEEE Transactions on Multimedia, 2022, 25: 2203-2212.
[9]	MACHAJDIK J, HANBURY A. Affective image classification using features inspired by psychology and art theory[C]//Proceedings of the 18th ACM International Conference on Multimedia. New York: ACM, 2010.
[10]	RAO T R, LI X X, XU M. Learning multi-level deep representations for image emotion classification[J]. Neural Processing Letters, 2020, 51(3): 2043-2061. doi: 10.1007/s11063-019-10033-9
[11]	ZHAN M. Affective analysis of abstract images using style representation[D]. Jilin: Jilin University, 2023 (in Chinese).
[12]	YIN C. Research on image emotion recognition model based on feature extraction and content generation[J]. System Simulation Technology, 2023, 19(2): 141-147 (in Chinese). doi: 10.3969/j.issn.1673-1964.2023.02.008
[13]	ZHAO S C, YAO X X, YANG J F, et al. Affective image content analysis: two decades review and new perspectives[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(10): 6729-6751. doi: 10.1109/TPAMI.2021.3094362
[14]	PENG K C, CHEN T, SADOVNIK A, et al. A mixed bag of emotions: model, predict, and transfer emotion distributions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 860-868.
[15]	CAMPOS V, SALVADOR A, GIRO-I-NIETO X, et al. Diving deep into sentiment: understanding fine-tuned CNNs for visual sentiment prediction[C]//Proceedings of the 1st International Workshop on Affect & Sentiment in Multimedia. New York: ACM, 2015.
[16]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. doi: 10.1145/3065386
[17]	ZHU X G, LI L, ZHANG W G, et al. Dependency exploitation: a unified CNN-RNN approach for visual emotion recognition[C]//Proceedings of the 26th International Joint Conference on Artificial Intelligence. New York: ACM, 2017: 3595-3601.
[18]	ZHANG H, XU D, LUO G F, et al. Learning multi-level representations for affective image recognition[J]. Neural Computing and Applications, 2022, 34(16): 14107-14120. doi: 10.1007/s00521-022-07139-y
[19]	SHE D Y, SUN M, YANG J F. Learning discriminative sentiment representation from strongly- and weakly supervised CNNs[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2019, 15(3s): 1-19.
[20]	YOU Q Z, LUO J B, JIN H L, et al. Robust image sentiment analysis using progressively trained and domain transferred deep networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2015, 29(1): 1-8.
[21]	CHEN T, BORTH D, DARRELL T, et al. DeepSentiBank: visual sentiment concept classification with deep convolutional neural networks[EB/OL]. (2014-10-30)[2023-06-01]. http://arxiv.org/abs/1410.8586.
[22]	ZHANG J, CHEN M, SUN H, et al. Object semantics sentiment correlation analysis enhanced image sentiment classification[J]. Knowledge-Based Systems, 2020, 191: 105245. doi: 10.1016/j.knosys.2019.105245
[23]	YAMAMOTO T, TAKEUCHI S, NAKAZAWA A. Image emotion recognition using visual and semantic features reflecting emotional and similar objects[J]. IEICE Transactions on Information and Systems, 2021, 104(10): 1691-1701.
[24]	YANG J Y, GAO X B, LI L D, et al. SOLVER: scene-object interrelated visual emotion reasoning network[J]. IEEE Transactions on Image Processing, 2021, 30: 8686-8701. doi: 10.1109/TIP.2021.3118983
[25]	DENG Z L, ZHU Q R, HE P, et al. A saliency detection and gram matrix transform-based convolutional neural network for image emotion classification[J]. Security and Communication Networks, 2021, 2021: 6854586.
[26]	PENG G Q. Research on key issues of image emotional semantic analysis based on deep learning[D]. Kunming: Yunnan University, 2021 (in Chinese).
[27]	PENG K C, SADOVNIK A, GALLAGHER A, et al. Where do emotions come from? predicting the emotion stimuli map[C]//Proceedings of the IEEE International Conference on Image Processing. Piscataway: IEEE Press, 2016: 614-618.
[28]	SUN M, YANG J F, WANG K, et al. Discovering affective regions in deep convolutional neural networks for visual sentiment prediction[C]//Proceedings of the IEEE International Conference on Multimedia and Expo. Piscataway: IEEE Press, 2016: 1-6.
[29]	YANG J F, SHE D Y, SUN M, et al. Visual sentiment prediction based on automatic discovery of affective regions[J]. IEEE Transactions on Multimedia, 2018, 20(9): 2513-2525. doi: 10.1109/TMM.2018.2803520
[30]	ZHANG J, LIU X Y, CHEN M, et al. Image sentiment classification via multi-level sentiment region correlation analysis[J]. Neurocomputing, 2022, 469: 221-233. doi: 10.1016/j.neucom.2021.10.062
[31]	XIONG H T, LIU Q, SONG S Y, et al. Region-based convolutional neural network using group sparse regularization for image sentiment classification[J]. EURASIP Journal on Image and Video Processing, 2019, 2019(1): 30. doi: 10.1186/s13640-019-0433-8
[32]	SHEN Z, CUI C R, DONG G X, et al. Unified image aesthetic and emotional prediction based on deep multi-task learning[J]. Journal of Software, 2023, 34(5): 2494-2506 (in Chinese).
[33]	ZHANG H, LUO G F, YUE Y Y, et al. Affective image recognition with multi-attribute knowledge in deep neural networks[J]. Multimedia Tools and Applications, 2024, 83(6): 18353-18379.
[34]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[35]	ZHANG H, XU D. Ethnic painting analysis based on deep learning[J]. Scientia Sinica (Informationis), 2019, 49(2): 204-215 (in Chinese). doi: 10.1360/N112018-00249
[36]	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2023-06-01]. http://arxiv.org/abs/1409.1556.
[37]	HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 2261-2269.
[38]	SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 2818-2826.
[39]	SZEGEDY C, IOFFE S, VANHOUCKE V, et al. Inception-v4, inception-ResNet and the impact of residual connections on learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2017, 31(1): 1-7.
[40]	ZHAO S C, JIA Z Z, CHEN H, et al. PDANet: polarity-consistent deep attention network for fine-grained visual emotion regression[C]//Proceedings of the 27th ACM International Conference on Multimedia. New York: ACM, 2019.
[41]	SUN Z K, SARMA P, SETHARES W, et al. Learning relationships between text, audio, and video via deep canonical correlation for multimodal language analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020, 34(5): 8992-8999.
[42]	YAO X X, SHE D Y, ZHANG H W, et al. Adaptive deep metric learning for affective image retrieval and classification[J]. IEEE Transactions on Multimedia, 2020, 23: 1640-1653.
[43]	SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[J]. International Journal of Computer Vision, 2020, 128(2): 336-359. doi: 10.1007/s11263-019-01228-7