Scientists have developed an automated knowledge portal known as COVIDScholar that compiles more than 260,000 research articles, patents, and clinical trials related to the coronavirus disease 2019 (COVID-19). The portal has served more than 33,000 users since its release in 2020.
In a recent study published in PLoS One, researchers describe the development and utility of this COVID-19 research collection and analysis platform.
Study: COVIDScholar: An automated COVID-19 research aggregation and analysis platform.
Background
The COVID-19 pandemic, caused by the emergence of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has inflicted significant damage on global healthcare and economic sectors. In an attempt to understand the dynamics of infection and develop interventions to counteract the pandemic, the global scientific community has responded at unprecedented speed, generating an enormous amount of scientific literature.
Preprint servers have taken on a more prominent role relative to conventional journals during the pandemic, helping to manage a sharply increasing volume of literature. This has led to the accumulation of many low-quality articles, which can hamper the dissemination of impactful research.
Considering the need for a single comprehensive repository of COVID-19 literature, the scientists of the current study have developed an automated COVID-19 research collection and analysis platform (COVIDScholar) by using natural language processing (NLP) techniques.
Development of COVIDScholar
COVIDScholar is a data intake and processing pipeline, wherein COVID-19-related research articles, patents, and clinical trials are incorporated and processed to form a reliable repository.
The platform continuously monitors its data sources for new documents. Incorporated documents are cleaned and analyzed with NLP techniques to produce document embeddings, COVID-19 relevance scores, inter-document similarity metrics, keywords, and subject-area tags.
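To make two of these outputs concrete, the following is a minimal sketch of how a relevance score and an inter-document similarity metric could be computed. It is not the COVIDScholar implementation: the term-count "embedding", the keyword-fraction relevance score, and all function names are simplifying assumptions chosen so that the example runs with the Python standard library alone.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy 'embedding': a sparse term-count vector.

    Assumption for illustration only; a production system would
    typically use a learned dense embedding instead.
    """
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_score(text, keywords):
    """Hypothetical relevance score: share of tokens found in a keyword set."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in keywords for t in tokens) / len(tokens)


if __name__ == "__main__":
    # Invented example sentences, not taken from the COVIDScholar corpus.
    doc_a = "SARS-CoV-2 spike protein binding"
    doc_b = "Binding of the SARS-CoV-2 spike protein to ACE2"
    print(cosine_similarity(embed(doc_a), embed(doc_b)))
    print(relevance_score(doc_a, {"sars", "cov", "covid"}))
```

Scores closer to 1 indicate documents that share more vocabulary, which is the kind of signal a portal could use to suggest related papers.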
Processed documents and NLP-derived metadata are finally made available for end-users on the frontend website, https://covidscholar.org.
COVIDScholar database
The COVIDScholar database comprises a total of 260,000 documents as of January 2022. The documents include 252,000 research articles, 3,303 patents, 1,712 clinical trials, 1,194 book chapters, and 1,196 datasets.
Of all research articles, 180,000 are directly related to COVID-19. Other articles are on COVID-19-related diseases, including severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS), and other respiratory diseases.
About 44% and 56% of documents sampled in October 2020 were preprints and peer-reviewed papers, respectively. In contrast, about 80% of current documents are peer-reviewed papers, thus indicating the evolution of high-quality literature.
The scientists used a Latent Dirichlet Allocation model to explore the subject distribution of documents available in COVIDScholar. The model estimates the mixture of topics in each document, as well as the distribution of words within each topic.
The researchers then applied the model to 10,000 randomly selected COVIDScholar documents published between January 2020 and January 2022. According to the model findings, the share of papers on topics related to “case numbers and pandemic growth” has declined significantly, from approximately 18% of papers published each month to less than 7.5%.
In contrast, an increase from 1-2% to more than 5% has been observed for topics related to “virology and mechanism” and “testing.” However, research articles related to these topics account for only 10% of articles published in January 2022.
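For readers unfamiliar with Latent Dirichlet Allocation, the sketch below fits a small model with collapsed Gibbs sampling, one common way to estimate it. This is an illustration under stated assumptions rather than the study's code: the article does not say which implementation or hyperparameters the researchers used, so the function name lda_gibbs and the alpha, beta, and iteration defaults are all hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists.
    Returns (theta, phi, vocab) where
      theta[d][k] is the estimated share of topic k in document d and
      phi[k][w] is the estimated probability of vocab[w] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]            # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens of word w in topic k
    topic_total = [0] * n_topics                           # all tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][wid] + beta)
                    / (topic_total[j] + n_words * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed estimates of the two sets of distributions.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + n_words * beta)
         for w in range(n_words)]
        for k in range(n_topics)
    ]
    return theta, phi, vocab


if __name__ == "__main__":
    # Invented toy documents, not drawn from COVIDScholar.
    toy_docs = [
        "cases growth cases epidemic growth".split(),
        "spike protein receptor binding protein".split(),
        "cases epidemic growth testing".split(),
    ]
    theta, phi, vocab = lda_gibbs(toy_docs, n_topics=2)
    for row in theta:
        print([round(p, 2) for p in row])
```

Each row of theta is a document's topic mixture. Averaging such mixtures over the papers published in a given month is one way monthly topic shares like those reported above could be obtained.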
About 85% of documents sampled in October 2020 in COVIDScholar were assigned at least one category label by the model. The most represented disciplines were “medical sciences” and “biological and chemical sciences,” followed by “public health,” “humanities and social sciences,” and “physical sciences, engineering, and computational studies.”
Furthermore, the scientists calculated the fraction of monthly COVID-19 publications primarily associated with each discipline. An increasing fraction of research in the “humanities/social sciences” category and a decreasing fraction of research in the “medical sciences” category were observed.
The increase in “humanities/social sciences” research coincides with the clear increase in studies investigating the impact of lockdown and social distancing on neuropsychological parameters.
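The monthly discipline fractions described above amount to a simple grouped count. A minimal sketch follows, assuming each publication is represented as a (month, primary discipline) pair. That record layout, the function name, and the sample records are hypothetical and carry no real data.

```python
from collections import Counter, defaultdict


def monthly_discipline_fractions(records):
    """Return {month: {discipline: fraction of that month's publications}}.

    records: iterable of (month, primary_discipline) pairs.
    """
    counts = defaultdict(Counter)
    for month, discipline in records:
        counts[month][discipline] += 1

    fractions = {}
    for month, per_discipline in counts.items():
        total = sum(per_discipline.values())
        fractions[month] = {
            discipline: n / total for discipline, n in per_discipline.items()
        }
    return fractions


if __name__ == "__main__":
    # Invented records for illustration only.
    sample = [
        ("2020-04", "medical sciences"),
        ("2020-04", "medical sciences"),
        ("2020-04", "humanities/social sciences"),
        ("2021-04", "humanities/social sciences"),
        ("2021-04", "medical sciences"),
    ]
    for month, shares in sorted(monthly_discipline_fractions(sample).items()):
        print(month, shares)
```

Comparing the same discipline's fraction across months shows whether its share of the literature is growing or shrinking, independent of the total publication volume.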
Application of COVIDScholar
The COVIDScholar search portal has served over 33,000 users since its release in 2020, with over 8,600 users served weekly at its peak in the summer of 2020. On average, 2,000 active users are served on a monthly basis.
COVIDScholar is a large-scale artificial intelligence-driven platform with an enormous collection of COVID-19-related scientific literature. This platform can serve as a blueprint for future situations, where the rapid production and distribution of new scientific studies are inevitable.
Journal reference:
- Dagdelen, J., Trewartha, A., Huo, H., et al. (2023). COVIDScholar: An automated COVID-19 research aggregation and analysis platform. PLoS ONE. doi:10.1371/journal.pone.0281147.