Emotion Analysis for Topic “Mudik” during Covid-19 Pandemic

7 min read · Jun 19, 2020

Ramadan 1441 H ended about a month ago, and unlike previous years, this year’s Ramadan was quite different because of the Covid-19 pandemic.

Because of the pandemic, many countries quickly introduced adjustments and restrictions to limit the spread of the virus. Indonesia was no exception.

The Indonesian government has issued several new policies. One of them is Pembatasan Sosial Berskala Besar (PSBB, large-scale social restrictions), a regulation that forbids people from gathering in large numbers. During this year’s Ramadan, the government also introduced a new policy that bans “mudik”.

Mudik is the tradition of traveling back to one’s hometown in the days leading up to Eid al-Fitr. The purpose is to gather with relatives and celebrate the Muslim holiday together. It is a very common activity, as Indonesia has one of the largest Muslim communities in the world.

This homecoming ban by the Indonesian government received various responses from the public. Some people accept it and understand the situation, but quite a few oppose the policy.

I was interested to know how Indonesian netizens reacted to the policy. I did this research and analysis on my own, so please keep in mind that it has many shortcomings.

The analysis uses emotion recognition, a task within Natural Language Processing (NLP). It is similar to sentiment analysis, but it aims to identify the human emotion expressed in the selected media. Here, I use text data to analyze which emotions appear in conversations about mudik.

Previous research has proposed many approaches to emotion analysis, such as keyword-based, lexicon-based, and machine learning approaches. The field has become quite popular over the last few decades.

For my analysis, I used a keyword-based approach, which is the simplest of these. It is probably the least powerful, but I personally think it is enough to give a rough picture of what happened.

In emotion analysis, there is no rule of thumb for which emotion classes the input should be classified into. It is up to the researcher. I decided to use eight emotion classes: joy, trust, fear, surprise, sadness, disgust, anger, and anticipation.

The dataset was retrieved from Twitter between May 15 and May 22, 2020. I used the Twitter Streaming API with the predefined keyword “mudik” to retrieve the data. During that period, a total of 37,093 tweets were collected.

Below are the steps I followed in the analysis.

1. Init Dictionary

Since I used the keyword-based approach, a predefined set of keywords is needed. I started from the keywords collected by Philip R. Shaver in the 2002 study “Structure of the Indonesian Emotion Lexicon”, published in the Asian Journal of Social Psychology. Shaver collected terms related to human emotions such as love, happiness, anger, fear, and sadness.

The emotion classes in Shaver’s research did not fully match my needs, so I still lacked keywords for some of my classes. To fill this gap, I defined the missing keywords manually. The collected keywords for each emotion class are stored in one place, which I will call the init dictionary.
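As a rough illustration, the init dictionary can be thought of as a mapping from each of the eight emotion classes to a set of keywords. The sketch below is only an assumption about how it might be stored in Python: the seed words are illustrative examples, not my actual keyword lists, and `add_keywords` is a hypothetical helper.

```python
# Hypothetical init dictionary: emotion class -> set of Indonesian keywords.
# The seed words are illustrative only, not the lists used in the analysis.
EMOTION_CLASSES = [
    "joy", "trust", "fear", "surprise",
    "sadness", "disgust", "anger", "anticipation",
]

init_dictionary = {
    "joy": {"senang", "bahagia"},
    "trust": {"percaya", "yakin"},
    "fear": {"takut", "cemas"},
    "surprise": {"terkejut", "kaget"},
    "sadness": {"sedih", "kecewa"},
    "disgust": {"jijik", "muak"},
    "anger": {"marah", "kesal"},
    "anticipation": {"antisipasi", "harap"},
}


def add_keywords(dictionary, emotion, keywords):
    """Add manually defined keywords to one emotion class (lowercased)."""
    if emotion not in dictionary:
        raise KeyError(f"unknown emotion class: {emotion}")
    dictionary[emotion].update(k.strip().lower() for k in keywords)


# Manually filling in a class that lacks keywords.
add_keywords(init_dictionary, "sadness", ["Duka"])
```

Using sets keeps each keyword unique within its class, which is convenient when more keywords are merged in later.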

2. Init Dictionary Extension

In the init dictionary prepared in the previous step, I noticed that some emotion classes had only a few keywords. This could affect the analysis later, so I decided to expand them using a thesaurus for Bahasa Indonesia.

I manually looked up the synonyms of each keyword to expand the init dictionary. The thesaurus for Bahasa Indonesia that I used had already been compiled by another developer.
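I did this lookup by hand, but the same idea can be sketched in code. The example below assumes the thesaurus is available as a plain mapping from a word to a list of synonyms; the `toy_thesaurus` entries and the function name `expand_with_synonyms` are hypothetical and only serve as an illustration.

```python
def expand_with_synonyms(dictionary, thesaurus):
    """Return a new dictionary where each class also contains the
    synonyms of its keywords. The input dictionary is not modified."""
    expanded = {}
    for emotion, keywords in dictionary.items():
        candidates = set(keywords)
        for word in keywords:
            # Words missing from the thesaurus simply add nothing.
            candidates.update(thesaurus.get(word, []))
        expanded[emotion] = candidates
    return expanded


# Toy data for illustration only.
seed = {"sadness": {"sedih"}, "anger": {"marah"}}
toy_thesaurus = {
    "sedih": ["duka", "pilu"],
    "marah": ["berang", "gusar"],
}

expanded = expand_with_synonyms(seed, toy_thesaurus)
```

The expanded entries are only candidates at this point. They still have to pass the validation described in the next step.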

3. Keywords Validation

Before the init dictionary is used in the analysis, it needs to be validated. Not all of the keywords collected in the previous steps are used; only those whose meaning matches their emotion class are kept. This process should be done manually by humans to reduce bias.

The validation used Kamus Besar Bahasa Indonesia (KBBI) as a reference. A human reviewer checks the meaning of each keyword in the init dictionary and decides whether it is correctly associated with its emotion class.

4. Preprocessing Input Data

The data (tweets) go through some preprocessing, such as tokenization, normalization, and keyword extraction, before the model extracts the emotion that appears in them. The keywords in each tweet are represented as unigrams, bigrams, and words that begin with a capital letter.
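A minimal sketch of this kind of preprocessing is shown below. It is an assumption rather than my exact pipeline: normalization here means only lowercasing and dropping URLs and @mentions, and the function names `tokenize` and `extract_keywords` are made up for the example.

```python
import re


def tokenize(text):
    """Split a tweet into word tokens, dropping URLs and @mentions."""
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    # Keep runs of letters only (no digits, punctuation, or underscores).
    return re.findall(r"[^\W\d_]+", text)


def extract_keywords(text):
    """Represent a tweet as unigrams, bigrams, and capitalized words."""
    tokens = tokenize(text)
    # Capitalized words are collected before lowercasing.
    capitalized = [t for t in tokens if t[0].isupper()]
    unigrams = [t.lower() for t in tokens]
    bigrams = [" ".join(pair) for pair in zip(unigrams, unigrams[1:])]
    return {
        "unigrams": unigrams,
        "bigrams": bigrams,
        "capitalized": capitalized,
    }


tweet = "Sedih banget, tahun ini tidak bisa mudik ke Yogyakarta https://t.co/abc @detikcom"
keywords = extract_keywords(tweet)
```

Real tweets usually need more normalization than this, for example mapping slang and abbreviations to their standard forms.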

5. Analysis of Similarity

The classification uses cosine similarity to compare the keywords extracted from each tweet with the keywords of each emotion class in the init dictionary. The emotion class with the highest cosine similarity score is chosen as the label for the input.
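The idea can be sketched as follows, treating the tweet and each emotion class as bag-of-words count vectors. This is an illustration under assumptions, not my exact code: the toy dictionary is made up, and returning "unknown" when no class scores above zero is one possible way to handle tweets without any matching keyword.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(tweet_keywords, dictionary):
    """Pick the emotion class whose keywords are most similar to the tweet."""
    tweet_vector = Counter(tweet_keywords)
    scores = {
        emotion: cosine_similarity(tweet_vector, Counter(words))
        for emotion, words in dictionary.items()
    }
    best = max(scores, key=scores.get)
    # Assumption: a tweet that matches no keyword at all is labeled "unknown".
    return best if scores[best] > 0 else "unknown"


# Toy dictionary for illustration only.
toy_dictionary = {
    "sadness": ["sedih", "duka"],
    "anger": ["marah", "kesal"],
}

label = classify(["sedih", "tidak", "mudik"], toy_dictionary)
no_match = classify(["mudik"], toy_dictionary)
```

In this toy case the first tweet shares one keyword with the sadness class and none with anger, so it is labeled sadness, while the second shares nothing with either class.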

The whole process was implemented in Python 3.7 with MySQL as the database. Tweepy was used to crawl the data.

Among all of the emotion classes, TRUST and SADNESS have the highest proportions.

During this period, Trust accounts for about 24.6% and Sadness for about 21.7% of all collected tweets.

Trust (percaya), Sadness (sedih), Joy (senang), Anger (marah), Surprise (terkejut), Fear (takut), Anticipation (antisipasi), Disgust (jijik)

The Proportion of Emotion Classes on Topic Mudik

In this analysis, I excluded tweets whose emotion class is unknown.
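For reference, proportions like these can be computed from the list of predicted labels as sketched below. The labels in the example are toy data, not my results, and `emotion_proportions` is a hypothetical helper name.

```python
from collections import Counter


def emotion_proportions(labels):
    """Share of each emotion class, ignoring tweets labeled 'unknown'."""
    known = [label for label in labels if label != "unknown"]
    if not known:
        return {}
    counts = Counter(known)
    # most_common() orders the classes from most to least frequent.
    return {emotion: count / len(known) for emotion, count in counts.most_common()}


# Toy labels for illustration only.
toy_labels = ["trust", "trust", "sadness", "unknown", "joy"]
proportions = emotion_proportions(toy_labels)
```

With the toy labels above, the single unknown tweet is dropped, so the shares are computed over the four remaining tweets.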

I personally think Trust has the highest proportion because many netizens are skeptical that the community will obey the government’s ban on mudik. I was curious about this, so I visualized a word cloud of the tweets in the Trust class.

Please bear in mind that the color does not represent anything.

Words like mudik, polri (police), pembatas (restriction), covid, aman (safe), and moda transportasi (transportation mode) appear most often in the collected tweets. It seems to me that most of the conversation is about how people will still insist on doing mudik despite the government restriction. I also suspect that people are curious about how strictly the rules are enforced, because the word polri (police) appears.

It seems the Trust here is more closely related to DISTRUST. People might not believe that the restriction will be fully obeyed by the community.

In the Sadness class, words like mudik, polri (police), keluarga (family), Covid, peraturan (policy), and cegah (prevent) appear most often. I personally think the conversation around this emotion is about people feeling sad because of the government restriction. They cannot do what they usually do, and they cannot gather with relatives to celebrate the holiday.

During the monitoring period, several media accounts tweeted news related to mudik. The top 3 are @detikcom, with around 15M followers at that time, followed by @kompascom with 7M followers and @VIVAcoid with 4M followers. The only non-media account in the top 10 is @dwitasaridwita, with around 1.3M followers at that time.

From the composition of the top 10, we can see that this government restriction is a hot topic.

The tweets collected during the period mostly came from users located in big cities in Indonesia such as Yogyakarta, Banda Aceh, Jakarta, Tangerang, and Surabaya. Yogyakarta may appear because I ran the script locally in the Yogyakarta area.

There are still many shortcomings in this analysis, since the dataset is not that big and the method is the simplest one available. Let me know if you have further questions or want to discuss it. Feel free to contact me.

Thanks for reading!

Written by Rio Rizki Aryanto