Hi. We are Project JADE.

And this is our data science project, Time Series Analysis of NTF-ELCAC's Red-tagging of Community Pantries during the COVID-19 Pandemic. Our project aims to determine whether the red-tagging statements by NTF-ELCAC triggered a series of events resulting in the proliferation of fake news centered around community pantries.

Data Science Team

  • Encinares, Christian Dale | WFW
  • Jafri, Ali Mahmood | WFU

Here's an overview of our data science project.

Problem Formulation

On April 14, 2021, Ana Patricia Non started a community pantry on Maginhawa Street. This act inspired other Filipinos, who in turn started their own community pantries. On April 19 and 20, 2021, however, the NTF-ELCAC and other government entities, such as the Quezon City Police Department, red-tagged these community pantries, accusing their organizers of being communists who used the pantries to recruit members and collect funds. Because of these events, we wanted to know how much these statements negatively influenced the public's perception of community pantries.

Research Question

Did the NTF-ELCAC red-tagging statement negatively impact the perception of community pantries?

Hypothesis

The NTF-ELCAC statements heightened the occurrence of negative mis/disinformation tweets towards community pantries.

Null Hypothesis

The NTF-ELCAC statements did not heighten the occurrence of negative mis/disinformation tweets towards community pantries.

Action Plan

Identify the dates on which negative mis/disinformation tweets about community pantries occurred most often.

We looked for tweets containing fake news about community pantries.

Data Collection

During our data collection phase, our team gathered a total of 152 tweets that contained misinformation and disinformation about community pantries. The tweets span the period from the inception of the community pantry movement in 2021 up until 2022. It's worth noting that a tweet did not have to contain red-tagging in its content. It was included in our dataset as long as it spread fake news with the aim of painting community pantries in a negative light.

Collection

Our team used a combination of automatic and manual methods to retrieve posts from Twitter. For automatic collection, we used the snscrape Python library, while for manual collection, we used Twitter's advanced search feature. Our initial scope of tweets was from 2016 to 2022, but since the COVID-19 community pantries only started around 2021, we narrowed our range to 2021 to 2022.

Keywords

We used a number of keywords that we expected to appear in tweets concerning community pantries. These keywords generally involve the community pantries and their organizers, the CPP-NPA-NDF, and the NTF-ELCAC.

Some of the keywords we used are: NTF-ELCAC, NPA-CPP-NDF, Community Pantry, Patricia Non, Angel Locsin, Communist, and Propaganda.
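As a rough illustration of how the keywords and the 2021 to 2022 range could be turned into search strings, the sketch below builds one query per keyword using Twitter's since: and until: search operators. The build_query helper, the exact dates, and the choice to quote every keyword are assumptions made for illustration; the write-up does not record the exact queries that were run.

```python
from datetime import date

# Keywords listed in the write-up.
KEYWORDS = [
    "NTF-ELCAC", "NPA-CPP-NDF", "Community Pantry", "Patricia Non",
    "Angel Locsin", "Communist", "Propaganda",
]


def build_query(keyword, since, until):
    """Build a search string for one keyword (hypothetical helper)."""
    # Quoting asks for the exact phrase, which matters for multi-word keywords.
    return f'"{keyword}" since:{since.isoformat()} until:{until.isoformat()}'


# Assumed date range: the end date is the day after the last day of 2022.
queries = [build_query(k, date(2021, 1, 1), date(2023, 1, 1)) for k in KEYWORDS]

for query in queries:
    print(query)
```

Each resulting string can be pasted into the search box for manual collection or handed to a scraper for automatic collection.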

Review

Some of our peers reviewed the data we collected and identified some errors in our data's formatting and values. These were taken into account and corrected before proceeding with our data exploration.

We filtered out irrelevant features and visualized our data.

Data Exploration

Our data underwent preprocessing and exploration to ensure that the data we use for our modelling is as accurate as possible. We also explored and visualized our data with various visual tools to gain insights into underlying trends and patterns that would otherwise be difficult to see in the raw data.

Preprocessing Steps

Feature Trimming

We identified the relevant features in our dataset and removed the ones we deemed irrelevant to this study. We were left with the following: ID, Account type, Account handle, Joined, Tweet, Tweet Type, Date posted, Content type, Likes, Replies, Retweets, Quote Tweets. Our primary focus of analysis is on the Date posted column.

Handling Missing Values

We ensured that none of our relevant features contained empty or null values. As a side effect of trimming the columns and manually reviewing values prior to importing the dataset, we found that all of the features already had complete values.
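The trimming and completeness check can be sketched with pandas as follows. The column names come from the feature list above; the trim_and_check helper, the extra "Location" and "Remarks" columns, and the placeholder row are hypothetical and only stand in for the real dataset.

```python
import pandas as pd

# Features kept for the study, as listed in the write-up.
RELEVANT_FEATURES = [
    "ID", "Account type", "Account handle", "Joined", "Tweet", "Tweet Type",
    "Date posted", "Content type", "Likes", "Replies", "Retweets", "Quote Tweets",
]


def trim_and_check(raw):
    """Keep only the relevant columns and report any that still have nulls."""
    trimmed = raw[RELEVANT_FEATURES].copy()
    missing = trimmed.isna().sum()
    return trimmed, missing[missing > 0]


# Hypothetical one-row dataset with two extra columns that get dropped.
row = {name: "placeholder" for name in RELEVANT_FEATURES}
row.update({"Location": "n/a", "Remarks": None})
raw = pd.DataFrame([row])

trimmed, incomplete = trim_and_check(raw)
print(trimmed.columns.tolist())
print("Columns with missing values:", incomplete.index.tolist())
```

An empty list of incomplete columns corresponds to the situation described above, where every retained feature already had complete values.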

Handling Outliers

We found that our only numerical features were Likes, Replies, Retweets, and Quote Tweets. Since we are mainly focusing on the Date posted feature, we decided not to perform normalization, standardization, or scaling, and we kept the outliers.

Ensuring Formatting Consistency

We ensured that features, especially our date and time values, had consistent formatting. We also made minor changes to feature labels for consistent capitalization.

Categorical Data Encoding

To help identify categories for each tweet, we performed one-hot encoding on our categorical features: Account type, Content type, and Tweet type. We used MultiLabelBinarizer for Tweet type and Content type, as both of these can contain multiple labels. For Account type, we simply used pandas' get_dummies() function.
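A minimal sketch of this encoding step is shown below, assuming scikit-learn's MultiLabelBinarizer and assuming that multiple labels are stored as comma-separated text in a single cell. The category values, the encode_multilabel helper, and the output column naming are hypothetical and do not reproduce the actual dataset.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical rows; the real labels in the dataset may differ.
df = pd.DataFrame({
    "Account type": ["Anonymous", "Identified", "Anonymous"],
    "Tweet type": ["Text", "Text, Reply", "Image, Reply"],
    "Content type": ["Emotional", "Rational, Emotional", "Emotional"],
})


def encode_multilabel(frame, column, sep=","):
    """One-hot encode a column whose cells may hold several labels."""
    labels = frame[column].str.split(sep).apply(
        lambda parts: [p.strip() for p in parts]
    )
    mlb = MultiLabelBinarizer()
    encoded = mlb.fit_transform(labels)
    names = [f"{column}_{label}" for label in mlb.classes_]
    return pd.DataFrame(encoded, columns=names, index=frame.index)


encoded = pd.concat(
    [
        # Single-label feature: get_dummies is enough.
        pd.get_dummies(df["Account type"], prefix="Account type"),
        # Multi-label features: one indicator column per distinct label.
        encode_multilabel(df, "Tweet type"),
        encode_multilabel(df, "Content type"),
    ],
    axis=1,
)
print(encoded)
```

get_dummies cannot split a cell that holds several labels, which is why the multi-label columns go through MultiLabelBinarizer instead.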

Binning

In preparation for our time series analysis, we binned our time series data, the Date posted feature in particular, by day and by month.
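The binning can be sketched with pandas as below. The four timestamps are made-up examples, and resampling is only one of several ways to produce daily and monthly counts; the write-up does not specify which one was used.

```python
import pandas as pd

# Hypothetical posting times standing in for the Date posted column.
df = pd.DataFrame({
    "Date posted": [
        "2021-04-20 09:15", "2021-04-20 18:40",
        "2021-04-23 07:05", "2021-05-01 12:00",
    ]
})
df["Date posted"] = pd.to_datetime(df["Date posted"])

# One entry per tweet, indexed by its posting time.
tweets = pd.Series(1, index=pd.DatetimeIndex(df["Date posted"]))

# Daily bins: days with no tweets appear with a count of zero.
daily = tweets.resample("D").sum()

# Monthly bins: group by calendar month.
monthly = tweets.groupby(tweets.index.to_period("M")).sum()

print(daily)
print(monthly)
```

Keeping the zero-count days in the daily series matters for the later event detection, since gaps between tweets are part of the signal.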

Embedded below is a Python notebook which contains our data preprocessing and exploration methodology.

Visualizations

Histograms: Engagement Distribution

Almost all of the tweets received little to no engagement.

Bar Graph: Distribution of Account Types

The majority of the users who posted fake news were anonymous, followed by users with known identities. Our dataset did not contain fake news from media accounts.

Bar Graph: Distribution of Content Types

The majority of the fake news tweets appeal to emotion.

Bar Graph: Distribution of Tweet Types

The majority of the fake news tweets were text-based, followed by replies.

Bar Graph: User Join Date

The majority of the users who posted fake news had newly created accounts from 2020 and 2021.

Scatter Plot: Correlation between Join Date and Post Date

There is a notable concentration of users who joined between 2020 and 2021 and posted a mis/disinformation tweet during April and May 2021.

We performed time series analysis on our data.

Data Modelling

Given that the data we want to analyze is a time series, we used Event Detection Models to derive meaningful insights from such data. In particular, our team used Peak Finding and Change Point Detection as our Machine Learning models. Peak Finding was applied to identify the dates with the highest concentration of negative tweets, while Change Point Detection helped us pinpoint the dates when significant shifts occurred in the number of negative tweets.

For Peak Finding, our team only modified one of its parameters, the height parameter. It was set to a value of 4, which means that there must be at least 4 tweets on a given date for it to be considered a peak. With this setting, the Peak Finding Model detected three peaks, occurring on April 20, 2021, April 23, 2021, and April 26, 2021, with 8, 12, and 15 fake news tweets, respectively.
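To show what the height parameter does, here is a small sketch that assumes SciPy's find_peaks function, which the write-up does not name explicitly. Only the three peak values (8, 12, and 15) and their dates come from the results above; every other daily count in the series is an invented placeholder.

```python
from datetime import date, timedelta

from scipy.signal import find_peaks

# Illustrative daily counts from April 18 to April 28, 2021.
# Only the values 8, 12, and 15 are taken from the reported results.
start = date(2021, 4, 18)
daily_counts = [0, 1, 8, 5, 3, 12, 6, 4, 15, 7, 2]
dates = [start + timedelta(days=i) for i in range(len(daily_counts))]

# height=4: a local maximum must reach at least 4 tweets to count as a peak.
peak_idx, props = find_peaks(daily_counts, height=4)

peaks = [(dates[i], int(h)) for i, h in zip(peak_idx, props["peak_heights"])]
for day, count in peaks:
    print(day.isoformat(), count)
```

A date qualifies only if its count is higher than both neighbouring days and meets the height threshold, so a day with many tweets that sits on the slope of a larger spike is not reported as a peak.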

As for Change Point Detection, our team used the PELT algorithm with the penalty value set to one. Based on visual inspection, we believe that this penalty value accurately reflects the trends present in our data. The Change Point Detection Model detected seven change points, occurring on April 21, 2021, May 1, 2021, May 21, 2021, July 30, 2021, August 4, 2021, May 16, 2022, and finally, May 26, 2022. We noticed that these change points can be grouped into 3 major periods: April to May 2021, July to August 2021, and May 2022.
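For readers unfamiliar with PELT, the sketch below is a small self-contained version of the algorithm, not the implementation used in the project. It assumes a squared-error cost for each segment, which the write-up does not specify, and the pelt_l2 function and the toy series are hypothetical.

```python
from itertools import accumulate


def pelt_l2(signal, penalty):
    """Return change-point indices using PELT with a squared-error cost."""
    n = len(signal)
    # Prefix sums give the cost of any segment in constant time.
    s1 = [0.0] + list(accumulate(float(x) for x in signal))
    s2 = [0.0] + list(accumulate(float(x) ** 2 for x in signal))

    def cost(start, end):
        # Sum of squared deviations from the mean of signal[start:end].
        total = s1[end] - s1[start]
        return (s2[end] - s2[start]) - total * total / (end - start)

    best = [0.0] * (n + 1)  # best[t]: lowest penalised cost for signal[:t]
    best[0] = -penalty
    last = [0] * (n + 1)    # last[t]: start of the final segment ending at t
    candidates = [0]

    for t in range(1, n + 1):
        scores = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last[t] = min(scores)
        # Pruning: drop segment starts that can no longer be optimal.
        candidates = [s for score, s in scores if score - penalty <= best[t]]
        candidates.append(t)

    # Walk backwards through the stored segment starts.
    change_points = []
    t = n
    while last[t] > 0:
        change_points.append(last[t])
        t = last[t]
    return sorted(change_points)


# Toy daily counts: a quiet period followed by a sustained jump.
counts = [0, 0, 0, 0, 0, 10, 10, 10, 10, 10]
print(pelt_l2(counts, penalty=1))
```

Each returned index marks the first day of a new segment. A larger penalty makes every extra change point more expensive, so fewer are reported; this is the trade-off behind choosing the penalty value by inspecting the resulting segmentation.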

From the results of these models and the patterns observed from the tweets themselves, we isolated and identified the following events to be probable causes of the peaks and change points in our data:

April to May 2021 Change Points:


July to August 2021 Change Points:


May 2022 Change Points:


We noticed that, prior to April 20, 2021, there was a notable absence of negative mis/disinformation tweets concerning community pantries. Starting from April 20, 2021, however, there was a sudden surge in such tweets, concentrated in April 2021. The frequency of these tweets then declined sharply in the following months, with only two further periods of significant change identified from the change points detected after May 2021.

Thus, our team has decided that we have insufficient evidence to conclude that the NTF-ELCAC red-tagging statements caused a spike in hate and mis/disinformation tweets towards community pantries. We believe that this is due to the lack of data prior to April 20, 2021. On top of this, the relevant and significant events in April 2021 occurred in close proximity to each other, making it difficult to determine the actual impact that the NTF-ELCAC statement had on the peaks that followed. We recommend that future studies gather more data from other social media platforms, such as Facebook and Instagram, to see the trend before the NTF-ELCAC statement.

Here's what we found out.

Data Communication

Introduction

In 2021, community pantries were heavily affected by red-tagging from both citizens and government entities. Some were harassed by the police and had to close down temporarily. Our objective was to examine whether the government's red-tagging of community pantries, specifically by the NTF-ELCAC, influenced negative perceptions among the public towards these initiatives.

Materials and Methods

We collected various tweets that were against community pantries and contained disinformation and misinformation. Our primary focus was on recording and analyzing the dates when these tweets were posted in order to compare them to NTF-ELCAC's statements on April 19 and April 20, 2021. We used snscrape to retrieve most of the tweets and collected the rest manually using Twitter's advanced search function. We then performed peak detection and change point detection to test our hypothesis.

Results and Discussion

Given the absence of tweets prior to April 20, 2021, it is inconclusive whether the NTF-ELCAC statement resulted in an increase in hate and mis/disinformation tweets directed at community pantries. To ascertain the trend preceding the NTF-ELCAC statement, we suggest that future studies on this topic acquire a larger dataset that also includes posts from other social media platforms.

Implications

Establishing a correlation between red-tagging by organizations or prominent individuals and the presence of troll, disinformation, and misinformation posts would highlight the need for regulations concerning red-tagging. Unwarranted and wanton red-tagging should be subject to penalties due to the possible harm it can inflict, as exemplified by what the organizers of the community pantries experienced.

Conclusion

Our team has concluded that we can neither reject nor accept the null hypothesis due to insufficient data and evidence. Though the effect of the NTF-ELCAC statement on negative Twitter posts is uncertain, we would like to emphasize the importance of these types of studies in determining and highlighting the damage that red-tagging can do to organizations and individuals alike. By doing so, we can put measures in place to prevent these kinds of detrimental events from recurring, safeguarding our community's humanitarian movements and programs.

Check out our poster!

Meet the Team

Christian Dale Encinares

Christian Dale Encinares is currently a 4th-year BS Computer Science student at the University of the Philippines Diliman. He is a member of the Department of Computer Science's Computer Security Group, where he conducts research on the use of Serious Games as a means to promote cybersecurity literacy and awareness among Filipinos. In his free time, he enjoys playing video games and binge-watching TV series.

Ali Mahmood Jafri

Ali Jafri is a 2nd-year BS Computer Science student. His hobbies include watching YouTube videos, listening to music, and playing games. He is also trying to learn Japanese.