Image and Text Aspect-Level Sentiment Analysis Based on Attentional Mechanisms and Bimodal Fusion

Jinsu Ma (School of Information Engineering, Henan University of Animal Husbandry and Economy, Zhengzhou, China) and Liwu Pan (School of Information Engineering, Henan University of Animal Husbandry and Economy, Zhengzhou, China)
Copyright: © 2025 | Pages: 23
DOI: 10.4018/IJDSST.370388

Abstract

Existing image-text aspect-level sentiment analysis methods suffer from insufficient feature extraction within a single modality, neglect of the association between text and target words, and inadequate interaction between modalities. To address these issues, an image-text aspect-level sentiment analysis method based on the attention mechanism and bimodal fusion (ITASA-AMB) is proposed. The model captures the interaction between aspect words and text, and between aspect words and images, through a self-attention mechanism and a graph convolutional network. It then achieves deep interaction and fusion of inter-modal information through a bimodal fusion mechanism, in order to improve the accuracy of sentiment classification. The experimental findings show that the proposed ITASA-AMB achieves accuracy (ACC) and F1 values of 87.6% and 80.1% on the Twitter-2015 dataset, and 82.3% and 77.2% on the Twitter-2017 dataset. These values are significantly higher than those of several other advanced multimodal sentiment analysis methods.

Introduction

In today’s era of information explosion, sentiment analysis (Sivakumar et al., 2022; Wu et al., 2024), a technique that automatically recognizes and extracts sentiment information from text or images, has received widespread attention and application. Traditional sentiment analysis typically concentrates on identifying general emotional trends, but in real-world applications, users’ emotional expressions often encompass various dimensions. Aspect-based sentiment analysis (Meng et al., 2023; Wang & Li, 2023) has emerged in response: it not only analyzes the overall sentiment but also attributes sentiment tendencies to specific aspects, thus providing more precise and valuable insights. Aspect-level sentiment analysis refers to the identification and categorization of sentiment tendencies for specific aspects in texts, such as user comments or social media posts (Jiang et al., 2011).

With the proliferation of social media and multimedia content, it has become difficult to fully capture users’ emotional expressions with pure text analysis. By analyzing both images and text, sentiment analysis methods that combine these two modalities can offer a more precise and holistic understanding of a user’s emotional state from various perspectives. With the support of computer technology, combined image-text aspect-level sentiment analysis (ITASA) becomes possible. Advances in computer vision (Batch et al., 2023; Li et al., 2024; Yadav & Raj, 2021) and natural language processing techniques (Shivahare et al., 2022) make it possible to extract visual features from images and semantic features from text, and to fuse the two, which enables a comprehensive analysis of emotions. Deep learning models, particularly convolutional neural networks (CNNs; Bhuvaneshwari et al., 2022; Joloudari et al., 2023) and recurrent neural networks (RNNs; Alroobaea, 2022; Topbaş et al., 2021), have demonstrated strong capabilities in image and text processing and have become powerful tools for image and text sentiment analysis. Despite these advantages, CNNs and RNNs have some drawbacks. CNNs can capture localized features, but they have limited ability to capture long-distance dependencies and sequential information when processing text, which may lead to poor results on complex textual sentiment. RNNs, especially traditional RNNs, suffer from vanishing and exploding gradient problems due to the nature of their sequence processing, which impairs the effectiveness of training on long sequences.
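
The gradient problem mentioned for traditional RNNs can be illustrated with a minimal sketch. It is not taken from the article: it assumes a hypothetical one-unit RNN, h_t = tanh(w * h_{t-1} + x_t), and the function name gradient_through_time and the chosen weights are illustrative only. By the chain rule, the gradient of the final state with respect to the initial state is a product of per-step factors w * (1 - h_t^2), so it shrinks or grows roughly geometrically with sequence length.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_T / d h_0| for a scalar RNN h_t = tanh(w * h_{t-1} + x_t).

    Hypothetical toy model for illustration; not the article's architecture.
    """
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return abs(grad)


# Recurrent weight below 1: every per-step factor is at most 0.5,
# so the gradient decays toward zero over 50 steps.
vanishing = gradient_through_time(0.5, [0.1] * 50)

# Recurrent weight above 1 with the state held at the origin (zero inputs),
# where tanh is locally linear: every per-step factor equals 2.
exploding = gradient_through_time(2.0, [0.0] * 50)

print(f"vanishing case: {vanishing:.3e}")
print(f"exploding case: {exploding:.3e}")
```

In this toy setting the first value falls far below any useful training signal, while the second grows very large, which matches the difficulty of training on long sequences described above. Real RNNs use weight matrices rather than a scalar, so the same argument applies to the norms of the per-step Jacobians.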