Analysis of image and text sentiment method based on joint and interactive attention

HU Huijun, DING Ziyi, ZHANG Yaofeng, LIU Maofu

Citation: HU H J, DING Z Y, ZHANG Y F, et al. Analysis of image and text sentiment method based on joint and interactive attention[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(7): 2262-2270 (in Chinese). doi: 10.13700/j.bh.1001-5965.2023.0365

doi: 10.13700/j.bh.1001-5965.2023.0365

Funds: National Natural Science Foundation of China (62271359); Key Project of the National Social Science Foundation of China (23ATJ005); "14th Five-Year Plan" Hubei Province Advantage and Characteristic Discipline (Group) Project (2023D0302); Key Scientific Research Project of Hubei Provincial Department of Education (20192202)

    Corresponding author: E-mail: yfzhang@hbue.edu.cn

  • CLC number: TP391

  • Abstract:

    Image-text sentiment on social media plays an important role in steering public opinion and is drawing growing attention in natural language processing (NLP). Existing research on social media image-text sentiment analysis mostly targets single image-text pairs; image-set-text pairs, which are unordered and diverse, remain comparatively understudied. To effectively mine the sentiment-consistency information between the images in a set and the accompanying text, a sentiment analysis method based on joint and interactive attention (SA-JIA) is proposed. The method uses RoBERTa and a bidirectional gated recurrent unit (Bi-GRU) to extract textual features and ResNet50 to extract visual features, applies joint attention to locate the salient regions where image and text express consistent sentiment and thereby obtains new textual and visual features, then uses interactive attention to model cross-modal feature interaction and performs multimodal feature fusion before completing the sentiment classification task. Experiments on the IsTS-CN and CCIR20-YQ datasets show that the proposed method improves the performance of social media image-text sentiment analysis.
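
    To make the pipeline concrete, below is a minimal PyTorch sketch of the kind of architecture the abstract describes: a Bi-GRU over text features, image-region features, a joint attention stage in which each modality attends to the other, and an interactive attention stage before fusion and classification. All names, dimensions, the three-class output, and the pooling-based fusion head are illustrative assumptions rather than the paper's exact formulation; the RoBERTa and ResNet50 encoders are stood in by precomputed feature tensors.

```python
import torch
import torch.nn as nn

class JointAttention(nn.Module):
    """Cross-modal attention in both directions: text attends to image
    regions and regions attend to text, producing features that emphasize
    sentiment-consistent tokens/regions (assumed formulation)."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.img_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.text_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text, img):
        # text: (B, L, dim) token features; img: (B, R, dim) region features
        new_text, _ = self.img_to_text(text, img, img)
        new_img, _ = self.text_to_img(img, text, text)
        return new_text, new_img

class SAJIASketch(nn.Module):
    def __init__(self, dim: int = 200, heads: int = 5, num_classes: int = 3):
        super().__init__()
        # Text path: RoBERTa embeddings (assumed precomputed) -> Bi-GRU
        self.bigru = nn.GRU(dim, dim // 2, batch_first=True, bidirectional=True)
        self.joint = JointAttention(dim, heads)
        # Interactive attention: joint text features query the image features
        self.interactive = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, text_emb, img_feats):
        text, _ = self.bigru(text_emb)               # (B, L, dim)
        text, img = self.joint(text, img_feats)      # sentiment-consistent features
        fused, _ = self.interactive(text, img, img)  # cross-modal interaction
        pooled = torch.cat([fused.mean(dim=1), img.mean(dim=1)], dim=-1)
        return self.classifier(pooled)               # multimodal fusion -> logits

# Toy run with random stand-ins for RoBERTa token and ResNet50 region features
model = SAJIASketch()
logits = model(torch.randn(2, 200, 200), torch.randn(2, 49, 200))
print(logits.shape)  # torch.Size([2, 3])
```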

     

  • Figure 1. SA-JIA method framework

    Figure 2. Joint attention

    Figure 3. Interactive attention

    Figure 4. Correct prediction example

    Figure 5. Incorrect prediction example

    Table 1. Dataset partitioning

    Dataset     Training set/entries   Validation set/entries   Test set/entries
    IsTS-CN      8 277                  1 035                    1 035
    CCIR20-YQ   54 044                  6 756                    6 756

    Table 2. SA-JIA method parameter settings

    Learning rate   Max sentence length   Dropout rate   Attention heads   Loss               Hidden layer dimension
    2×10⁻⁵          200                   0.5            5                 CrossEntropyLoss   200
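
    For convenience, the Table 2 settings restated as a plain Python dict; the key names are our own, not from any released SA-JIA code, and reading 下降率 as a dropout rate is an assumption.

```python
# Table 2 hyperparameters as a configuration dict (key names are illustrative;
# "dropout_rate" is our reading of the "下降率" column and could instead
# denote a learning-rate decay factor).
config = {
    "learning_rate": 2e-5,
    "max_sentence_length": 200,
    "dropout_rate": 0.5,
    "attention_heads": 5,
    "loss": "CrossEntropyLoss",
    "hidden_dim": 200,
}
```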

    Table 3. Comparative experimental results of image and text sentiment analysis on IsTS-CN and CCIR20-YQ datasets

    Modality    Method              P                       R                       F1
                                    IsTS-CN   CCIR20-YQ     IsTS-CN   CCIR20-YQ     IsTS-CN   CCIR20-YQ
    Text        Att-Bi-LSTM[25]     0.574     0.691         0.601     0.583         0.566     0.607
                BERT[7]             0.617     0.601         0.608     0.634         0.612     0.617
                RoBERTa[9]          0.637     0.682         0.638     0.764         0.637     0.708
    Image-text  VistaNet[3]         0.634     0.627         0.626     0.619         0.628     0.623
                mBERT[26]           0.733     0.709         0.723     0.702         0.727     0.705
                MMBT[15]            0.711     0.732         0.747     0.670         0.722     0.691
                EF-CapTrBERT[16]    0.723     0.717         0.720     0.692         0.721     0.703
                TIBERT[27]          0.716     0.695         0.726     0.687         0.721     0.692
                SA-JIA              0.722     0.741         0.778     0.708         0.734     0.718
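
    The P, R, and F1 columns report standard multi-class precision, recall, and F1. A minimal sketch of how such scores are typically computed with scikit-learn follows; the macro averaging mode and the toy labels are assumptions, since this page does not state how per-class scores are aggregated.

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy labels only, not data from the paper
y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 1, 1, 0, 2]
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
print(f"P={p:.3f}  R={r:.3f}  F1={f1:.3f}")
```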

    Table 4. Ablation experimental results on IsTS-CN and CCIR20-YQ datasets

    Method    P                       R                       F1
              IsTS-CN   CCIR20-YQ     IsTS-CN   CCIR20-YQ     IsTS-CN   CCIR20-YQ
    JIA-JA    0.670     0.738         0.724     0.671         0.611     0.661
    JIA-IA    0.649     0.718         0.614     0.611         0.602     0.608
    SA-JIA    0.722     0.741         0.778     0.708         0.734     0.718
  • [1] XIA E, YUE H, LIU H F. Tweet sentiment analysis of the 2020 U.S. presidential election[C]//Proceedings of the 30th Web Conference. New York: ACM, 2021.
    [2] BONHEME L, GRZES M. SESAM at SemEval-2020 task 8: investigating the relationship between image and text in sentiment analysis of memes[C]//Proceedings of the Fourteenth Workshop on Semantic Evaluation. Barcelona: International Committee for Computational Linguistics, 2020.
    [3] TRUONG Q T, LAUW H W. VistaNet: visual aspect attention network for multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2019, 33(1): 305-312.
    [4] STONE P J. Thematic text analysis: new agendas for analyzing text content[M]. London: Routledge, 2020: 35-54.
    [5] WIEBE J M, BRUCE R F, O’HARA T P. Development and use of a gold-standard data set for subjectivity classifications[C]//Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 1999.
    [6] BASIRI M E, NEMATI S, ABDAR M, et al. ABCDM: an attention-based bidirectional CNN-RNN deep model for sentiment analysis[J]. Future Generation Computer Systems, 2021, 115: 279-294.
    [7] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: Association for Computational Linguistics, 2019: 4171-4186.
    [8] DAI J Q, YAN H, SUN T X, et al. Does syntax matter? A strong baseline for aspect-based sentiment analysis with RoBERTa[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: Association for Computational Linguistics, 2021: 1816-1829.
    [9] LIU Y H, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB/OL]. (2019-07-26)[2023-05-20]. http://arxiv.org/abs/1907.11692v1.
    [10] WU C, XIONG Q Y, GAO M, et al. A relative position attention network for aspect-based sentiment analysis[J]. Knowledge and Information Systems, 2021, 63(2): 333-347. doi: 10.1007/s10115-020-01512-w
    [11] WU T, PENG J J, ZHANG W Q, et al. Video sentiment analysis with bimodal information-augmented multi-head attention[J]. Knowledge-Based Systems, 2022, 235: 107676. doi: 10.1016/j.knosys.2021.107676
    [12] HAN W, CHEN H, GELBUKH A, et al. Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis[C]//Proceedings of the International Conference on Multimodal Interaction. New York: ACM, 2021.
    [13] WANG J H, LIU Z, LIU T T, et al. Multimodal sentiment analysis based on multilevel feature fusion attention network[J]. Journal of Chinese Information Processing, 2022, 36(10): 145-154 (in Chinese).
    [14] WEN H L, YOU S D, FU Y. Cross-modal context-gated convolution for multi-modal sentiment analysis[J]. Pattern Recognition Letters, 2021, 146: 252-259. doi: 10.1016/j.patrec.2021.03.025
    [15] KIELA D, BHOOSHAN S, FIROOZ H, et al. Supervised multimodal bitransformers for classifying images and text[EB/OL]. (2020-11-12)[2023-05-20]. http://arxiv.org/abs/1909.02950.
    [16] KHAN Z, FU Y. Exploiting BERT for multimodal target sentiment classification through input space translation[C]//Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021.
    [17] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[EB/OL]. (2017-06-12)[2023-05-20]. http://arxiv.org/abs/1706.03762.
    [18] TSAI Y H, BAI S J, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[EB/OL]. (2019-06-01)[2023-05-20]. http://arxiv.org/abs/1906.00295.
    [19] CHO K, VAN MERRIENBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[EB/OL]. (2014-06-03)[2023-05-20]. http://arxiv.org/abs/1406.1078v3.
    [20] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2009: 248-255.
    [21] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
    [22] BENGIO Y, SCHWENK H, SENÉCAL J S, et al. Neural probabilistic language models[M]. Berlin: Springer, 2006: 137-186.
    [23] BA J L, KIROS J R, HINTON G E. Layer normalization[EB/OL]. (2016-07-21)[2023-05-22]. http://arxiv.org/abs/1607.06450v1.
    [24] CAO M L. Research on sentiment analysis method of social media graphics and text based on auxiliary information extraction and fusion[D]. Wuhan: Wuhan University of Science and Technology, 2022 (in Chinese).
    [25] ZHOU P, SHI W, TIAN J, et al. Attention-based bidirectional long short-term memory networks for relation classification[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 2016.
    [26] YU J F, JIANG J. Adapting BERT for target-oriented multimodal sentiment classification[C]//Proceedings of the 28th International Joint Conference on Artificial Intelligence. Vienna: IJCAI, 2019: 5408-5414.
    [27] YU B H, WEI J X, YU B, et al. Feature-guided multimodal sentiment analysis towards industry 4.0[J]. Computers and Electrical Engineering, 2022, 100: 107961. doi: 10.1016/j.compeleceng.2022.107961
Publication history
  • Received: 2023-06-15
  • Accepted: 2023-10-12
  • Available online: 2023-11-23
  • Issue publication date: 2025-07-14
