Corpus Annotation and Analysis of Sarcasm in Twitter: #CatsMovie vs. #TheRiseOfSkywalker


Sentiment analysis is a natural language processing task that has received increased attention in the last decade due to the vast amount of opinionated data on social media platforms such as Twitter. Although the methodologies employed have grown in number and sophistication, analysing irony and sarcasm still poses a serious problem. From a linguistic standpoint, sarcasm has been studied in discourse analysis from several perspectives, but little attention has been given to specific metrics that measure its relevance. In this paper we describe the creation of a manually annotated dataset that includes detailed text markers. This dataset is a sample from a larger corpus of tweets (n = 76,764) on two highly controversial films: Cats and Star Wars: The Rise of Skywalker. We took two different samples for each film, one before and one after its release, to compare reception and the presence of sarcasm. We then used a sentiment analysis tool to measure the impact of sarcasm on polarity detection and manually classified the mechanisms of sarcasm generation. The resulting corpus will be useful for machine learning approaches to sarcasm detection as well as for discourse analysis studies on irony and sarcasm.

Author Biographies

Antonio Moreno-Ortiz, Universidad de Málaga
Antonio Moreno-Ortiz is a Senior Lecturer and researcher at the University of Málaga, where he has worked for more than 20 years. His research interests include computational linguistics, corpus linguistics and language technologies, which have led him to develop multiple linguistic resources for natural language processing, such as BNC Indexer, OntoTerm, Sentitext and Lingmotif.
María García-Gámez, Universidad de Málaga
María García-Gámez is a research assistant at the University of Málaga, where she is currently working on her PhD thesis as a fully funded candidate. She holds a BA degree in English Studies and an MA in English Studies and Multilingual and Intercultural Communication. Her research interests involve corpus linguistics, sentiment analysis and the use of sarcasm in social media.


