Sarcasm Detection

Introduction

Sarcasm detection is a key subfield of sentiment analysis because machine learning models often mistake sarcastic text for positive polarity when a negative opinion was actually conveyed.

This paper proposes a hybrid deep learning model, a Self-Attention-based Bidirectional Long Short-Term Memory network combined with a Convolutional Neural Network (sAtt-BLSTM-CNN), for automatically detecting sarcasm in social network content. Punctuation-, sentiment-, and semantic-based auxiliary features are merged into the CNN alongside the feature maps generated by sAtt-BLSTM-CNN. Code can be found in this GitHub Repo.
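To make the data flow described above concrete, the following is a minimal PyTorch sketch, not the implementation from the repository. Everything in it is an assumption made for illustration: the class name `SAttBLSTMCNN`, all layer sizes, the single linear scoring layer used for the attention weights, and the choice to concatenate the auxiliary feature vector with the pooled convolutional feature maps before the final classifier.

```python
import torch
import torch.nn as nn


class SAttBLSTMCNN(nn.Module):
    """Illustrative sketch only; every size and layer choice is an assumption."""

    def __init__(self, vocab_size=5000, embed_dim=100, hidden_dim=64,
                 num_filters=32, kernel_size=3, num_aux=8):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.blstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                             bidirectional=True)
        # One learned attention score per time step (assumed scoring scheme).
        self.attn_score = nn.Linear(2 * hidden_dim, 1)
        self.conv = nn.Conv1d(2 * hidden_dim, num_filters, kernel_size,
                              padding=kernel_size // 2)
        # The classifier sees pooled feature maps plus auxiliary features.
        self.classifier = nn.Linear(num_filters + num_aux, 1)

    def forward(self, token_ids, aux_features):
        x = self.embedding(token_ids)                        # (B, T, E)
        h, _ = self.blstm(x)                                 # (B, T, 2H)
        weights = torch.softmax(self.attn_score(h), dim=1)   # (B, T, 1)
        attended = h * weights                               # re-weighted states
        maps = torch.relu(self.conv(attended.transpose(1, 2)))  # (B, F, T)
        pooled = maps.max(dim=2).values                      # (B, F)
        merged = torch.cat([pooled, aux_features], dim=1)    # (B, F + A)
        return self.classifier(merged).squeeze(1)            # (B,) logits


# Usage with random inputs: 4 tweets of 20 token ids and 8 auxiliary features.
model = SAttBLSTMCNN()
tokens = torch.randint(1, 5000, (4, 20))
aux = torch.rand(4, 8)
logits = model(tokens, aux)
```

The model returns one raw logit per tweet; applying a sigmoid would turn it into a sarcasm probability. Where exactly the auxiliary features join the network is a design choice, and the repository may make it differently.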

Three datasets are used to investigate the robustness of the proposed model: a nearly balanced dataset from SemEval 2018 Task 3A (Van Hee et al., 2018) containing 3,834 annotated tweets, of which 1,923 are not sarcastic and 1,911 are sarcastic; an imbalanced dataset provided by Riloff et al. (2013) containing 877 annotated tweets, of which 721 are not sarcastic and 156 are sarcastic; and a balanced dataset of 2,000 tweets harvested in real time from the Twitter API and annotated by the author of this paper.
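The class counts reported above can be turned into class shares with a few lines of Python. The sketch below uses only the counts stated in the text; the helper name `class_balance` is made up for illustration, and the harvested dataset is left out because its per-class counts are not given.

```python
# (not_sarcastic, sarcastic) tweet counts as reported in the text above.
datasets = {
    "SemEval": (1923, 1911),
    "Riloff": (721, 156),
}


def class_balance(not_sarcastic, sarcastic):
    """Return dataset size, sarcastic share, and majority-class share."""
    total = not_sarcastic + sarcastic
    return {
        "total": total,
        "sarcastic_share": sarcastic / total,
        "majority_share": max(not_sarcastic, sarcastic) / total,
    }


summary = {name: class_balance(*counts) for name, counts in datasets.items()}

for name, stats in summary.items():
    print(f"{name}: n={stats['total']}, "
          f"sarcastic={stats['sarcastic_share']:.1%}, "
          f"majority={stats['majority_share']:.1%}")
# SemEval: n=3834, sarcastic=49.8%, majority=50.2%
# Riloff: n=877, sarcastic=17.8%, majority=82.2%
```

The majority share is the accuracy that a classifier always predicting the larger class would reach on the full dataset, which is useful context when reading accuracy figures on skewed data.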

Model Architecture

Results

The proposed model obtained testing accuracies of 66%, 77%, and 90% on the unseen test data of the SemEval, Riloff, and Harvested datasets, respectively. The figures below also show the proposed model with three other word embedding methods: Word2Vec, GloVe, and BERT.