In a recent study posted to the medRxiv preprint server, researchers evaluated methods that could automatically and quickly detect counterfeit coronavirus disease 2019 (COVID-19) prevention and treatment products from Twitter chatter.
They combined natural language processing (NLP) with time-series anomaly detection, on the premise that as a fraudulent product gains popularity among Twitter users, the volume of chatter mentioning it rises correspondingly. The detection methods are designed to flag sudden increases in the frequency of such mentions on social media platforms, including Twitter and Facebook.
Study: Early detection of fraudulent COVID-19 products from Twitter chatter. Image Credit: Michele Ursi / Shutterstock
This news article was a review of a preliminary scientific report that had not undergone peer review at the time of publication. Since its initial publication, the scientific report has been peer reviewed and accepted for publication in a scientific journal. Links to the preliminary and peer-reviewed reports are available in the Sources section at the bottom of this article.
Background
Amid genuine efforts by public health agencies worldwide to mitigate the impact of the COVID-19 pandemic, the unscrupulous promotion of fraudulent products claiming to treat, prevent, or cure severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection has been a persistent problem.
The United States Food and Drug Administration (FDA) issues warning letters to curb the spread of such products, but typically only after many people have already been exposed to them. In the US, such products cannot be sold or advertised on television or in the news. Entities selling them therefore turn to social media platforms for promotion, which spreads misinformation and feeds an infodemic.
Therefore, there is an urgent need to devise vigilance tools that automatically identify potentially counterfeit COVID-19 products early and generate alerts. Fortunately, it is possible to automate real-time surveillance of fraudulent COVID-19 products on social media.
About the study
In the present study, researchers used time-series anomaly detection to identify abnormal increases in mentions of counterfeit COVID-19 products on Twitter. They systematically curated the Twitter chatter using NLP to generate alerts. The team used real-time data from the Twitter COVID-19 application programming interface (API), which Twitter provided directly to support COVID-19-related research. In total, they gathered 577,872,350 tweets mentioning COVID-19-related keywords, such as "coronavirus" and "covid," posted between February 19, 2020, and December 31, 2020.
The researchers excluded keywords collected after 2020 and 12 keywords that were mentioned fewer than 10 times on Twitter, including their linguistic variants. They gathered data continuously and stored it in a database hosted on the Google Cloud Platform.
Next, the team manually curated a comprehensive list of counterfeit COVID-19 products from the US FDA website. Likewise, they listed the names of the people who owned these products, along with their websites and social media profiles, if any. The researchers also manually reviewed 183 FDA warning letters to compile a list of products and entities, along with the date of the earliest FDA warning letter for each.
Further, they used a data-centric tool to catch spelling variants and misspellings of the names of counterfeit COVID-19 products. The variant generation tool applied semantic and lexical similarity measures to identify such variants automatically, including variants of key phrases and multi-word expressions.
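The lexical half of that idea can be illustrated with a minimal sketch. It is not the tool the researchers used: it relies only on character-level similarity from Python's standard `difflib` module, ignores semantic similarity entirely, and the function name, the 0.8 cutoff, and the sample phrases are all assumptions chosen for illustration.

```python
from difflib import get_close_matches


def lexical_variants(product_name, observed_phrases, cutoff=0.8):
    """Return observed phrases that look like spelling variants of product_name.

    Hypothetical helper: uses character-level similarity only (difflib),
    not the semantic similarity measures described in the study.
    """
    target = product_name.lower()
    # Deduplicate, lowercase, and drop the exact spelling itself.
    candidates = sorted({p.lower() for p in observed_phrases} - {target})
    # n must be positive, so fall back to 1 when there are no candidates.
    return get_close_matches(target, candidates,
                             n=len(candidates) or 1, cutoff=cutoff)


# Illustrative phrases only; not taken from the study's data set.
phrases = ["Clorine dioxide", "chlorine dioxyde", "vitamin c", "chlorine dioxide"]
variants = lexical_variants("chlorine dioxide", phrases)
```

Here the two misspellings are returned as candidate variants, while the unrelated phrase falls below the similarity cutoff.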
The team analyzed all products and key phrase spelling variants with at least 10 mentions in the curated data. They then normalized each daily count by the total number of Twitter posts collected on the same day. The resulting mentions per 1,000 tweets represented the daily relative frequencies of COVID-19-related keywords and phrases.
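The normalization step amounts to simple arithmetic, sketched below. The function name and the dictionary-based inputs are assumptions made for illustration, as is the choice to skip days on which no tweets were collected.

```python
def mentions_per_thousand(daily_mentions, daily_totals):
    """Convert raw daily mention counts into mentions per 1,000 collected tweets.

    daily_mentions: {day: number of tweets mentioning the keyword}
    daily_totals:   {day: total number of tweets collected that day}
    Days with no collected tweets are skipped (an assumption of this sketch).
    """
    rates = {}
    for day, total in daily_totals.items():
        if total == 0:
            continue  # no denominator available for this day
        rates[day] = 1000 * daily_mentions.get(day, 0) / total
    return rates


# Made-up counts: 5 mentions among 10,000 tweets is 0.5 per 1,000.
rates = mentions_per_thousand(
    {"2020-03-01": 5},
    {"2020-03-01": 10000, "2020-03-02": 2000},
)
```

Normalizing this way keeps a keyword from looking anomalous merely because more tweets were collected overall on a given day.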
Lastly, any data point lying more than three standard deviations (SDs) from the 14-day moving average was considered a potential signal. For each COVID-19-related keyword, the researchers then determined whether the first signal occurred before the FDA letter issuance date, within a week of it, or later.
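A rough sketch of this rule follows. Several details are assumptions rather than facts from the study: the moving average and SD are computed over the 14 days preceding each data point, the sample SD is used, deviations in either direction count, and "within a week" is read as up to seven days after the letter. All function names are hypothetical.

```python
from datetime import date
from statistics import fmean, stdev

WINDOW_DAYS = 14    # length of the moving window, from the article
THRESHOLD_SD = 3.0  # number of SDs that marks a potential signal


def find_signals(rates, window=WINDOW_DAYS, k=THRESHOLD_SD):
    """Return indices of days whose rate is more than k SDs away from the
    mean of the preceding `window` days (trailing window is an assumption)."""
    signals = []
    for i in range(window, len(rates)):
        history = rates[i - window:i]
        mean = fmean(history)
        sd = stdev(history)  # sample SD; the study may define it differently
        if abs(rates[i] - mean) > k * sd:
            signals.append(i)
    return signals


def first_signal(rates, **kwargs):
    """Return the index of the earliest signal, or None if there is none."""
    signals = find_signals(rates, **kwargs)
    return signals[0] if signals else None


def classify_timing(signal_date, letter_date):
    """Compare the first signal date with the FDA letter issuance date."""
    if signal_date < letter_date:
        return "before letter"
    if (signal_date - letter_date).days <= 7:
        return "within 7 days of letter"
    return "more than 7 days after letter"


# Made-up daily rates: a stable baseline followed by a sharp spike.
series = [1.0, 2.0] * 7 + [1.5, 10.0]
spike_index = first_signal(series)
timing = classify_timing(date(2020, 3, 1), date(2020, 3, 6))
```

In this toy series, only the final day stands out from its trailing 14-day baseline, so it becomes the first signal.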
Study findings
The FDA warning letters were issued between March 6, 2020, and June 22, 2021. The authors identified 221 potential keywords associated with the counterfeit COVID-19 products or the entities selling them. Of the total, the researchers assessed only 56 keywords because they only considered the first mention of a keyword in their analysis for early detection.
In total, 44 key phrases related to COVID-19 met all the inclusion criteria, and 43 of the 44 showed abnormal increases in their mentions at some point. Notably, 77.3% of keywords (34/44) were detectable through Twitter chatter before the FDA letter issuance dates. An additional 13.6% of keywords showed anomalous increases within seven days of the FDA letter issuance dates.
Conclusions
According to the authors, the current study is the first to use social media-based surveillance to detect counterfeit COVID-19 products early relative to the FDA warning issuance dates. Specifically, the researchers identified products that gained popularity via promotion on Twitter. The study approach was simple, unsupervised (requiring no training data), and economical because it relied on publicly available social media chatter.
Journal references:
- Preliminary scientific report.
Source: Early detection of fraudulent COVID-19 products from Twitter chatter, Abeed Sarker, Sahithi Lakamana, Ruqi Liao, Aamir Abbas, Yuan-Chi Yang, Mohammed Ali Al-Garadi, medRxiv pre-print 2022, DOI: https://doi.org/10.1101/2022.05.09.22274776, https://www.medrxiv.org/content/10.1101/2022.05.09.22274776v1
- Peer reviewed and published scientific report.
Sarker, Abeed, Sahithi Lakamana, Ruqi Liao, Aamir Abbas, Yuan-Chi Yang, and Mohammed Al-Garadi. 2023. “The Early Detection of Fraudulent COVID-19 Products from Twitter Chatter: Data Set and Baseline Approach Using Anomaly Detection.” JMIR Infodemiology 3 (March): e43694. https://doi.org/10.2196/43694. https://infodemiology.jmir.org/2023/1/e43694.
Article Revisions
- May 13 2023 - The preprint preliminary research paper that this article was based upon was accepted for publication in a peer-reviewed Scientific Journal. This article was edited accordingly to include a link to the final peer-reviewed paper, now shown in the sources section.