Introduction to big data in biology
Targeted disease treatment
How does machine learning work in biology?
Applications of machine learning in biology and medicine
References
Further Reading
Biological laboratories across the globe now generate vast quantities of data from genetic sequencing, metabolome analysis, and similar high-throughput methods. These advances have deepened our understanding of the intricacies of human biology and disease.
Mining such data for our benefit increasingly requires higher-level analytical methods. Machine learning, a subset of artificial intelligence (AI), is now being used to navigate complex biological information in search of specific patterns. This makes it a well-suited tool for targeted therapy in medicine.
Introduction to big data in biology
Over the last ten years, there has been a dramatic increase in the number of large, highly complex datasets being generated in biological experimentation. These capture gene, protein, and metabolite abundance, microbiome composition, and population-wide genetic variation amongst other variables. As Camacho et al. (2018) have observed, “we live in the age of big data in biology and medicine, where data are collected on many different layers of biological organization.”
Collaborative research projects in biological big data now typically generate petabytes of data (one petabyte is one thousand million million, or 10^15, bytes; in binary terms, 2^50 bytes). The Cancer Genome Atlas (TCGA), for example, has generated 2.5 petabytes of genomic, transcriptomic, proteomic, and epigenomic data. This ground-breaking cancer genomics program has collected multiple -omics measurements across 33 different cancer types.
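To make the two petabyte definitions concrete, the arithmetic can be sketched in a few lines of Python. The variable names are illustrative only; the 2.5 petabyte figure is the one quoted above for TCGA.

```python
# Decimal vs. binary definitions of a petabyte, in bytes.
PETABYTE_DECIMAL = 10**15   # "one thousand million million" bytes
PETABYTE_BINARY = 2**50     # binary definition (formally a pebibyte)

# How much larger the binary definition is than the decimal one.
ratio = PETABYTE_BINARY / PETABYTE_DECIMAL

# The 2.5 PB quoted for TCGA, in bytes, under the decimal definition.
tcga_bytes = 5 * PETABYTE_DECIMAL // 2

print(f"binary petabyte : {PETABYTE_BINARY:,} bytes")
print(f"binary / decimal: {ratio:.4f}")
print(f"2.5 PB (decimal): {tcga_bytes:,} bytes")
```

The binary definition is roughly 12.6% larger than the decimal one, which is why the two are worth distinguishing at this scale.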
Targeted disease treatment
Targeted disease treatment is a type of precision medicine. An expanding area within this domain is targeted therapy, a cancer treatment that acts on the specific proteins that control how cancer cells grow, divide, and spread. The treatment uses drugs or other substances that are ‘targeted’ to cancerous cells in order to disrupt or destroy them, while healthy cells close to the diseased region remain largely intact. Common types of targeted therapy include monoclonal antibodies and small-molecule drugs.
How does machine learning work in biology?
Machine learning is a branch of AI and computer science that uses algorithms to imitate the way humans learn. It involves computer software that can learn and adapt without being explicitly programmed for each task. By applying algorithms and statistical models, machine learning draws inferences from patterns in data. Real-world examples include voice search and image recognition. In the life sciences laboratory, machine learning is well suited to navigating the large biological datasets now being generated in abundance.
Machine learning-based AI can be used to detect novel cancer targets. The main approaches include classification, clustering, and neural networks. Two commonly used families of algorithms are (1) decision trees and (2) deep learning. A decision tree works by selecting the topological features most informative for cancer. It is a supervised classification algorithm, meaning that it is trained on data that have already been labelled or classified. Specific biomarkers (such as genes or proteins) can therefore be classified as key targets. Such classification-based applications can now draw on genome-wide transcription profiles, protein expression profiles, and/or mutational landscapes to classify tumor subtypes with high accuracy.
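The idea of supervised classification with a decision tree can be sketched with the simplest possible case: a one-level tree (a "decision stump") that picks the single feature and threshold that best separate labelled samples. This is a minimal illustration, not a method from the cited literature. The gene names, expression values, and labels below are made up, and the stump only considers rules of the form "value above threshold means tumor".

```python
# Toy, made-up data: each row is (expression of GENE_A, expression of GENE_B).
# Gene names, values, and labels are hypothetical, for illustration only.
FEATURES = ["GENE_A", "GENE_B"]
X = [
    (2.1, 5.0), (2.4, 3.1), (3.0, 4.8), (3.3, 2.9),   # normal samples
    (5.8, 3.0), (6.1, 5.1), (6.7, 2.8), (7.2, 4.9),   # tumor samples
]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # training labels: 0 = normal, 1 = tumor


def fit_stump(X, y):
    """Return (feature_index, threshold, accuracy) of the best single split.

    Candidate thresholds are midpoints between consecutive distinct values
    of each feature; the rule predicts 1 when the value exceeds the threshold.
    """
    best = (None, None, -1.0)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            preds = [1 if row[j] > threshold else 0 for row in X]
            accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
            if accuracy > best[2]:
                best = (j, threshold, accuracy)
    return best


def predict(stump, row):
    """Classify a new sample with the fitted one-level tree."""
    j, threshold, _ = stump
    return 1 if row[j] > threshold else 0


stump = fit_stump(X, y)
print("selected feature:", FEATURES[stump[0]])
print("training accuracy:", stump[2])
print("new sample (6.5, 4.0) ->", predict(stump, (6.5, 4.0)))
```

In this toy data only GENE_A separates the two classes, so the stump selects it as the informative feature. A full decision tree repeats the same split search recursively on each resulting subset.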
The deep learning approach uses neural networks (artificial networks that loosely mimic the neuronal circuitry of the human brain) to identify cancer targets and support drug discovery. Many neural network models are now being deployed for machine learning-based analysis. These benefit from a strong ability to mine complex biological information through layers of interconnected nodes (i.e., artificial ‘neurons’ modelled on those of the human brain).
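As a rough illustration of what a single node in such a network does, the sketch below trains one artificial neuron (a weighted sum passed through a sigmoid function) by gradient descent. It is a building block rather than a deep network, and the inputs, labels, learning rate, and number of epochs are all assumptions chosen for the example.

```python
import math

# Made-up, already-scaled inputs: two hypothetical features per sample.
X = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]
y = [0, 0, 0, 1, 1, 1]  # hypothetical labels, e.g. 0 = non-target, 1 = target


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(X, y, lr=0.5, epochs=2000):
    """Fit the weights and bias of one neuron by stochastic gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in zip(X, y):
            out = sigmoid(w[0] * x1 + w[1] * x2 + b)
            error = out - target  # gradient of cross-entropy loss w.r.t. the weighted sum
            w[0] -= lr * error * x1
            w[1] -= lr * error * x2
            b -= lr * error
    return w, b


def predict(w, b, x):
    """Return 1 if the neuron's output is at least 0.5, else 0."""
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0


w, b = train_neuron(X, y)
print("weights:", w, "bias:", b)
print("prediction for (0.85, 0.85):", predict(w, b, (0.85, 0.85)))
```

Deep learning models stack many such units in layers, so that the outputs of one layer become the inputs of the next.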
The identification and annotation of genes in a newly sequenced genome presents a specific example of machine learning in the biological context. Here a machine-learning algorithm can learn about the genome and its key features, like a transcriptional start site or specific genomic properties of genes such as the GC content. This knowledge is then utilized to generate a model for finding these key properties. The algorithm can apply what it has learned from the training data to an entirely new genome and make predictions about organization and functional capacity.
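The learn-then-apply workflow described above can be sketched with GC content as the only feature. This is a deliberately simplified illustration: the sequences and labels are made up, the "model" is just a threshold halfway between the mean GC content of the two classes, and real annotation tools rely on far richer features and models.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Made-up labelled training sequences (1 = "gene-like", 0 = "background").
training = [
    ("GCGCGGCCGATCGCGG", 1),
    ("CCGGGCGCATGCGCCG", 1),
    ("ATATTAATGCATATTA", 0),
    ("TTAATATACGATTAAA", 0),
]


def learn_threshold(data):
    """Learn a GC threshold: the midpoint between the two class means."""
    means = {}
    for label in (0, 1):
        values = [gc_content(s) for s, l in data if l == label]
        means[label] = sum(values) / len(values)
    return (means[0] + means[1]) / 2


def annotate(genome, threshold, window=8):
    """Label each non-overlapping window as gene-like (1) or background (0)."""
    calls = []
    for start in range(0, len(genome) - window + 1, window):
        chunk = genome[start:start + window]
        calls.append((start, 1 if gc_content(chunk) > threshold else 0))
    return calls


threshold = learn_threshold(training)

# Apply the learned threshold to a new, previously unseen (made-up) sequence.
new_genome = "ATATTAATGCGCGGCCATTATAAT"
print(annotate(new_genome, threshold))
```

Each tuple in the result pairs a window's start position with its predicted label, which mirrors how a trained model is carried over from labelled training data to an entirely new genome.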
Applications of machine learning in biology and medicine
The recent development of cancer-related multi-omics technologies is crucial for the exploration of novel anticancer targets, and these technologies are well matched to AI-based biological analysis.
Applications of machine learning that are becoming more widely used in biology include genome annotation, prediction of protein binding, identification of key transcriptional drivers of cancer, prediction of metabolic functions in complex microbial communities, and characterization of transcriptional regulatory networks (Camacho et al., 2018). In fact, any task in which a pattern can be learned and then applied to a new dataset is a candidate for machine learning.
References
Further Reading