留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

结合上下文关联的图像情感分析

罗改芳 张浩 徐丹

罗改芳,张浩,徐丹. 结合上下文关联的图像情感分析[J]. 北京麻豆精品秘 国产传媒学报,2025,51(7):2304-2313 doi: 10.13700/j.bh.1001-5965.2023.0345
引用本文: 罗改芳,张浩,徐丹. 结合上下文关联的图像情感分析[J]. 北京麻豆精品秘 国产传媒学报,2025,51(7):2304-2313 doi: 10.13700/j.bh.1001-5965.2023.0345
LUO G F,ZHANG H,XU D. Image sentiment analysis by combining contextual correlation[J]. Journal of Beijing University of Aeronautics and Astronautics,2025,51(7):2304-2313 (in Chinese) doi: 10.13700/j.bh.1001-5965.2023.0345
Citation: LUO G F,ZHANG H,XU D. Image sentiment analysis by combining contextual correlation[J]. Journal of Beijing University of Aeronautics and Astronautics,2025,51(7):2304-2313 (in Chinese) doi: 10.13700/j.bh.1001-5965.2023.0345

结合上下文关联的图像情感分析

doi: 10.13700/j.bh.1001-5965.2023.0345
基金项目: 

国家自然科学基金(62162068,62061049);云南省万人计划“云岭学者”专项(YNWR-YLXZ-2018-022);云南省教育厅科学研究基金(2025J0008)

详细信息
    通讯作者:

    E-mail:danxu@ynu.edu.cn

  • 中图分类号: TP391

Image sentiment analysis by combining contextual correlation

Funds: 

National Natural Science Foundation of China (62162068,62061049); Yunnan Province Ten Thousand Talents Program and Yunling Scholars Special Project (YNWR-YLXZ-2018-022); Yunnan Provincial Department of Education Scientific Research Fund (2025J0008)

More Information
  • 摘要:

    图像情感分析旨在分析和理解视觉内容所传达的情感,其挑战在于弥合潜在视觉特征与抽象情感间的情感鸿沟。现有的深度模型试图一次性通过直接在全局范围内学习有辨别力的高级情感表征来弥合鸿沟,但忽略了深度模型各层特征之间的层次关系,导致上下文特征间的关联缺失。为此,提出一种上下文层次交互网络(CHINet)来建立层次结构中的上下文信息和情感之间的相关性模型。该模型包含2个分支:自下而上的主分支直接在高级语义层次上学习全局情感表征,针对该分支的不同层次特征,通过构建浅层风格编码器和情感激活注意力机制来分别提取风格表示并定位潜在情感激活区域;所提取的特征被级联到金字塔结构作为自上而下分支,从而建模上下文层次相关性并为情感表示提供浅层视觉特征。通过全局和局部学习将低级风格属性和高级图像语义整合到一起。实验结果表明:所提模型在FI数据集上较同类方法(包括多层次特征融合方法和结合了局部情感区域的方法)提升了情感识别准确率。

     

  • 图 1  来自FI数据集的情感图像示例

    Figure 1.  Affective image examples from FI dataset

    图 2  CHINet整体结构

    Figure 2.  Overall structure of CHINet

    图 3  浅层风格编码器

    Figure 3.  Shallow style encoder

    图 4  情感激活注意力机制

    Figure 4.  Emotion activation attention mechanism

    图 5  不同α对模型性能的影响

    Figure 5.  Effect of different α on model performance

    图 6  用于情感推理的类激活图可视化

    Figure 6.  Visualization of class activation map for emotion reasoning

    表  1  图像情感数据集详细信息

    Table  1.   Details of image emotion datasets

    数据集 积极极性数量 消极极性数量 合计
    愉悦 敬畏 满足 激动 小记 愤怒 恶心 恐惧 悲伤 小记
    Abstract[9] 25 15 63 36 139 3 18 36 32 89 228
    ArtPhoto[9] 101 102 70 105 378 77 70 115 166 428 806
    EmotionROI[14] 330 330 660 330 330 330 330 1 320 1 980
    FI[1] 4 942 3 151 5 374 2 963 16 430 1 266 1 658 1 032 2 922 6 878 23 308
    Twitter I[20] 769 500 1 269
    下载: 导出CSV

    表  2  FI上不同层的CNN情感分类准确率

    Table  2.   Emotion classification accuracy of CNN models with different layers on FI

    模型 分类准确率/%
    c=2 c=8
    AlexNet[16] 5 Conv + 2 fc 65.43 42.78
    VGG-16[36] 13 Conv + 2 fc 81.35 63.84
    VGG-19[36] 16 Conv + 2 fc 82.62 64.23
    DenseNet-100 w/o fc[37] 99 Conv 83.77 64.76
    DenseNet-100[37] 99 Conv + 2 fc 84.21 65.08
    ResNet-18 w/o fc[34] 17 Conv 80.16 60.76
    ResNet-18[34] 17 Conv + 2 fc 81.27 61.55
    ResNet-50 w/o fc[34] 49 Conv 83.52 62.81
    ResNet-50[34] 49 Conv + 2 fc 84.06 63.74
    ResNet-152 w/o fc[34] 151 Conv 84.73 66.27
    ResNet-152[34] 151 Conv + 2 fc 85.19 66.44
    下载: 导出CSV

    表  3  基于局部和全局学习的情感分类准确率

    Table  3.   Emotion classification accuracy based on local and global learning

    模型 准确率/%
    基线模型 85.19
    CHINet w/o G 87.36
    CHINet w/o L 88.41
    CHINet G&L 89.74
    下载: 导出CSV

    表  4  FI数据集上二元情感极性分类准确率比较

    Table  4.   Classification accuracy comparison of binary emotion polarity on FI dataset

    模型 网络 准确率/%
    基准模型 AlexNet[16] 60.54
    VGGNet[36] 70.64
    ResNet[34] 72.22
    Fine-tuned AlexNet[6] 72.43
    Fine-tuned VGGNet[36] 83.05
    Fine-tuned ResNet-50[34] 85.19
    对比方法 PCNN (VGGNet)[20] 75.34
    DeepSentiBank[21] 61.54
    AR[29] 86.35
    VSF[23] 88.11
    MSRCA[30] 87.40
    MLR[18] 87.87
    本文模型 CHINet w/o G 87.36
    CHINet w/o L 88.41
    CHINet G&L 89.74
    下载: 导出CSV

    表  5  小规模数据集上的二元情感极性分类准确率比较

    Table  5.   Classification accuracy comparison of binary emotion polarity on small-scale datasets %

    模型 网络 ArtPhoto[9] Abstract[9] Twitter I[20] EmotionROI[14]
    Twitter I 5 Twitter I 4 Twitter I 3
    手工特征 PAEF[13] 67.85 70.05 72.90 69.61 67.92 75.24
    SentiBank 67.74 64.95 71.32 68.28 66.63 66.18
    深度学习 DeepSentiBank[21] 68.73 71.19 76.35 70.15 71.25 70.11
    PCNN (VGGNet)[20] 70.96 70.84 82.54 76.52 76.36 73.58
    Fine-tuned VGG[36] 70.09 72.48 84.35 82.26 76.75 77.02
    AR[29] 74.80 76.03 88.65 85.10 81.06 81.26
    R-CNNGSR[31] 75.02 75.89 81.36
    VSF[23] 81.62 81.82 83.10 83.17
    MLR[18] 75.63 77.85 89.77 85.72 81.49 83.08
    本文模型 CHINet w/o G 78.61 79.07 86.54 83.04 81.36 82.26
    CHINet w/o L 81.12 81.15 89.51 82.95 82.17 82.47
    CHINet G&L 83.27 82.33 90.52 86.43 84.37 83.53
    下载: 导出CSV

    表  6  FI数据集上多类别情感分类性能比较

    Table  6.   Comparison of multi-category emotion classification performance on FI dataset

    模型 网络 准确率/%
    基线模型 AlexNet[16] 49.54
    VGGNet-19[36] 61.74
    ResNet[34] 53.08
    Inception[38] 60.12
    Inception-Resnet[39] 62.77
    对比模型 DeepSentiBank[21] 51.29
    PDANet[40] 69.42
    MKN[19] 63.92
    CycleEmotionGAN[41] 66.79
    Deep metric learning[42] 68.37
    MldrNet[16] 67.24
    OSSCM[22] 69.32
    WSCNet[3] 70.07
    VSF[23] 70.46
    MSRCA[30] 69.05
    MLR[18] 67.49
    本文模型 CHINet w/o G 69.94
    CHINet w/o L 70.32
    CHINet G&L 71.56
    下载: 导出CSV
  • [1] YOU Q Z, LUO J B, JIN H L, et al. Building a large scale dataset for image emotion recognition: the fine print and the benchmark[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2016, 308-314.
    [2] XU L W, WANG Z T, WU B, et al. MDAN: multi-level dependent attention network for visual emotion analysis[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 9469-9478.
    [3] SHE D Y, YANG J F, CHENG M M, et al. WSCNet: weakly supervised coupled networks for visual sentiment classification and detection[J]. IEEE Transactions on Multimedia, 2020, 22(5): 1358-1371. doi: 10.1109/TMM.2019.2939744
    [4] YANG H S, FAN Y Y, LV G Y, et al. Exploiting emotional concepts for image emotion recognition[J]. The Visual Computer, 2023, 39(5): 2177-2190. doi: 10.1007/s00371-022-02472-8
    [5] YANG J Y, LI J, WANG X M, et al. Stimuli-aware visual emotion analysis[J]. IEEE Transactions on Image Processing, 2021, 30: 7432-7445. doi: 10.1109/TIP.2021.3106813
    [6] LIANG Y, MAEDA K, OGAWA T, et al. Chain centre loss: a psychology inspired loss function for image sentiment analysis[J]. Neurocomputing, 2022, 495: 118-128. doi: 10.1016/j.neucom.2022.04.016
    [7] 张浩, 李海鹏, 彭国琴, 等. 多层次特征融合表征的图像情感识别[J]. 计算机辅助设计与图形学学报, 2023, 35(10): 1566-1576.

    ZHANG H, LI H P, PENG G Q, et al. Image emotion recognition via fusion multi-level representations[J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(10): 1566-1576(in Chinese).
    [8] ZHANG H M, XU M. Multiscale emotion representation learning for affective image recognition[J]. IEEE Transactions on Multimedia, 2022, 25: 2203-2212.
    [9] MACHAJDIK J, HANBURY A. Affective image classification using features inspired by psychology and art theory[C]//Proceedings of the 18th ACM International Conference on Multimedia. New York: ACM, 2010.
    [10] RAO T R, LI X X, XU M. Learning multi-level deep representations for image emotion classification[J]. Neural Processing Letters, 2020, 51(3): 2043-2061. doi: 10.1007/s11063-019-10033-9
    [11] 詹明. 融合风格特征的抽象图像情感识别[D]. 吉林: 吉林大学, 2023.

    ZHAN M. Affective analysis of abstract images using style representation[D]. Jilin : Jilin University, 2023(in Chinese).
    [12] 尹朝. 基于内容生成与特征提取的图像情感识别模型研究[J]. 系统仿真技术, 2023, 19(2): 141-147. doi: 10.3969/j.issn.1673-1964.2023.02.008

    YIN C. Research on image emotion recognition model based on feature extraction and content generation[J]. System Simulation Technology, 2023, 19(2): 141-147(in Chinese). doi: 10.3969/j.issn.1673-1964.2023.02.008
    [13] ZHAO S C, YAO X X, YANG J F, et al. Affective image content analysis: two decades review and new perspectives[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(10): 6729-6751. doi: 10.1109/TPAMI.2021.3094362
    [14] PENG K C, CHEN T, SADOVNIK A, et al. A mixed bag of emotions: model, predict, and transfer emotion distributions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 860-868.
    [15] CAMPOS V, SALVADOR A, GIRO-I-NIETO X, et al. Diving deep into sentiment: understanding fine-tuned CNNs for visual sentiment prediction[C]//Proceedings of the 1st International Workshop on Affect & Sentiment in Multimedia. New York: ACM, 2015.
    [16] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. doi: 10.1145/3065386
    [17] ZHU X G, LI L, ZHANG W G, et al. Dependency exploitation: a unified CNN-RNN approach for visual emotion recognition[C]//Proceedings of the 26th International Joint Conference on Artificial Intelligence. New York: ACM, 2017: 3595-3601.
    [18] ZHANG H, XU D, LUO G F, et al. Learning multi-level representations for affective image recognition[J]. Neural Computing and Applications, 2022, 34(16): 14107-14120. doi: 10.1007/s00521-022-07139-y
    [19] SHE D Y, SUN M, YANG J F. Learning discriminative sentiment representation from strongly- and weakly supervised CNNs[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2019, 15(3s): 1-19.
    [20] YOU Q Z, LUO J B, JIN H L, et al. Robust image sentiment analysis using progressively trained and domain transferred deep networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2015, 29(1): 1-8.
    [21] CHEN T, BORTH D, DARRELL T, et al. DeepSentiBank: visual sentiment concept classification with deep convolutional neural networks[EB/OL]. (2014-10-30)[2023-06-01]. http://arxiv.org/abs/1410.8586.
    [22] ZHANG J, CHEN M, SUN H, et al. Object semantics sentiment correlation analysis enhanced image sentiment classification[J]. Knowledge-Based Systems, 2020, 191: 105245. doi: 10.1016/j.knosys.2019.105245
    [23] YAMAMOTO T, TAKEUCHI S, NAKAZAWA A. Image emotion recognition using visual and semantic features reflecting emotional and similar objects[J]. IEICE Transactions on Information and Systems, 2021, 104(10): 1691-1701.
    [24] YANG J Y, GAO X B, LI L D, et al. SOLVER: scene-object interrelated visual emotion reasoning network[J]. IEEE Transactions on Image Processing, 2021, 30: 8686-8701. doi: 10.1109/TIP.2021.3118983
    [25] DENG Z L, ZHU Q R, HE P, et al. A saliency detection and gram matrix transform-based convolutional neural network for image emotion classification[J]. Security and Communication Networks, 2021, 2021: 6854586.
    [26] 彭国琴. 基于深度学习的图像情感语义分析关键问题研究[D]. 昆明: 云南大学, 2021.

    PENG G Q. Research on key issues of image emotional semantic analysis based on deep learning[D]. Kunming: Yunnan University, 2021(in Chinese).
    [27] PENG K C, SADOVNIK A, GALLAGHER A, et al. Where do emotions come from? predicting the emotion stimuli map[C]//Proceedings of the IEEE International Conference on Image Processing. Piscataway: IEEE Press, 2016: 614-618.
    [28] SUN M, YANG J F, WANG K, et al. Discovering affective regions in deep convolutional neural networks for visual sentiment prediction[C]//Proceedings of the IEEE International Conference on Multimedia and Expo. Piscataway: IEEE Press, 2016: 1-6.
    [29] YANG J F, SHE D Y, SUN M, et al. Visual sentiment prediction based on automatic discovery of affective regions[J]. IEEE Transactions on Multimedia, 2018, 20(9): 2513-2525. doi: 10.1109/TMM.2018.2803520
    [30] ZHANG J, LIU X Y, CHEN M, et al. Image sentiment classification via multi-level sentiment region correlation analysis[J]. Neurocomputing, 2022, 469: 221-233. doi: 10.1016/j.neucom.2021.10.062
    [31] XIONG H T, LIU Q, SONG S Y, et al. Region-based convolutional neural network using group sparse regularization for image sentiment classification[J]. EURASIP Journal on Image and Video Processing, 2019, 2019(1): 30. doi: 10.1186/s13640-019-0433-8
    [32] 申朕, 崔超然, 董桂鑫, 等. 基于深度多任务学习的图像美感与情感联合预测研究[J]. 软件学报, 2023, 34(5): 2494-2506.

    SHEN Z, CUI C R, DONG G X, et al. Unified image aesthetic and emotional prediction based on deep multi-task learning[J]. Journal of Software, 2023, 34(5): 2494-2506(in Chinese).
    [33] ZHANG H, LUO G F, YUE Y Y, et al. Affective image recognition with multi-attribute knowledge in deep neural networks[J]. Multimedia Tools and Applications, 2024, 83(6): 18353-18379.
    [34] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Piscataway: IEEE Press, 2016: 770-778.
    [35] 张浩, 徐丹. 基于深度学习的少数民族绘画情感分析方法[J]. 中国科学: 信息科学, 2019, 49(2): 204-215. doi: 10.1360/N112018-00249

    ZHANG H, XU D. Ethnic painting analysis based on deep learning[J]. Scientia Sinica (Informationis), 2019, 49(2): 204-215 (in Chinese). doi: 10.1360/N112018-00249
    [36] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2023-06-01]. http://arxiv.org/abs/1409.1556.
    [37] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 2261-2269.
    [38] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 2818-2826.
    [39] SZEGEDY C, IOFFE S, VANHOUCKE V, et al. Inception-v4, inception-ResNet and the impact of residual connections on learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2017, 31(1): 1-7.
    [40] ZHAO S C, JIA Z Z, CHEN H, et al. PDANet: polarity-consistent deep attention network for fine-grained visual emotion regression[C]//Proceedings of the 27th ACM International Conference on Multimedia. New York: ACM, 2019.
    [41] SUN Z K, SARMA P, SETHARES W, et al. Learning relationships between text, audio, and video via deep canonical correlation for multimodal language analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020, 34(5): 8992-8999.
    [42] YAO X X, SHE D Y, ZHANG H W, et al. Adaptive deep metric learning for affective image retrieval and classification[J]. IEEE Transactions on Multimedia, 2020, 23: 1640-1653.
    [43] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[J]. International Journal of Computer Vision, 2020, 128(2): 336-359. doi: 10.1007/s11263-019-01228-7
  • 加载中
图(6) / 表(6)
计量
  • 文章访问数:  409
  • HTML全文浏览量:  75
  • PDF下载量:  8
  • 被引次数: 0
出版历程
  • 收稿日期:  2023-06-12
  • 录用日期:  2023-09-08
  • 网络出版日期:  2023-11-07
  • 整期出版日期:  2025-07-31

目录

    /

    返回文章
    返回
    常见问答