Integrative Multi-Omics and AI-Driven Biomarker Discovery for Early Diagnosis of Complex Diseases
Abstract
The advent of large-scale data-gathering technologies offers unprecedented opportunities for exploratory, hypothesis-generating research. Because biological systems are complex, such data carry an intricate mixture of biological, technical, laboratory, data-integration, and analytical noise. Consequently, to draw conclusions that genuinely advance knowledge, the first step is to apply validated data-agnostic, data-driven clustering and dimensionality-reduction algorithms to reveal the biological variables that contribute most, and then to study their interactions and interdependence.
This article presents a general multi-omics framework that integrates gene expression, methylation, protein expression from mass spectrometry, and copy number alteration data with clinical follow-up, patient information, key pathways, and the gene-gene networks involved, and that applies unsupervised algorithms to reveal the features most informative of ovarian cancer (OC).
Such an integrated framework, which can incorporate other “omics” data as they become available, offers multiple opportunities, ranging from supervised and unsupervised generation of multi-omics features to the integration of different data types. It opens unexplored avenues for extracting biological knowledge from data of any kind, irrespective of discipline, the bioprocess involved, or dimensionality, whether the data are empirical time series, Boolean, or of another form.
The focus here is strictly on experimental biological data relevant to understanding a dynamic biological system, and on the discovery of its networkome, pathway redundancies, driver(s) under combinations or chains of events, and other outputs informative of such a system. Emphasis is placed on data exploration methodology; on the a priori requirements, types, and advantages of different T.sensor-linked experimental data, observables, and the phase space explored; and on how to prepare the data in a compatible form for subsequent analysis and hypothesis generation. Key aspects of the framework's generality for biological knowledge discovery from any type of experimental data are also discussed.
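As a rough illustration of what preparing data layers in a compatible way and then ranking features without supervision might look like, the sketch below standardizes each omics block per feature, concatenates the blocks across the same samples, and ranks features by the magnitude of their loading on the first principal component. This is not the article's pipeline: the abstract does not name a specific algorithm, so principal component analysis (computed here by power iteration) is an assumed stand-in, and the layer names, feature names (GENE_A, CPG_1, PROT_X), values, and function names are all hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def zscore_columns(block):
    """Standardize each feature (column) of one omics block to mean 0, sd 1."""
    scaled = []
    for col in zip(*block):
        mu, sd = mean(col), pstdev(col)
        # A constant feature carries no information; map it to zeros.
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled)]


def integrate(blocks):
    """Concatenate standardized blocks sample-wise.

    `blocks` maps a layer name to (feature names, rows-by-samples matrix);
    every block is assumed to list the same samples in the same order.
    """
    names, rows = [], None
    for layer, (features, block) in blocks.items():
        z = zscore_columns(block)
        names += [f"{layer}:{f}" for f in features]
        rows = z if rows is None else [r + zr for r, zr in zip(rows, z)]
    return rows, names


def first_component(X, iters=200, seed=0):
    """Leading principal axis via power iteration on the covariance matrix."""
    n, p = len(X), len(X[0])
    cov = [[sum(X[k][i] * X[k][j] for k in range(n)) / n for j in range(p)]
           for i in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def rank_features(blocks):
    """Return (feature, |loading on PC1|) pairs, most informative first."""
    X, names = integrate(blocks)
    pc1 = first_component(X)
    return sorted(zip(names, (abs(x) for x in pc1)), key=lambda t: -t[1])


# Hypothetical toy data: six samples measured in three layers.
toy_blocks = {
    "expression": (["GENE_A", "GENE_B"],
                   [[1.0, 1], [2.1, 3], [2.9, 2], [4.2, 2], [5.0, 3], [6.1, 1]]),
    "methylation": (["CPG_1"],
                    [[0.90], [0.80], [0.65], [0.50], [0.35], [0.20]]),
    "protein": (["PROT_X"], [[10], [12], [15], [17], [21], [22]]),
}

for name, score in rank_features(toy_blocks):
    print(f"{name}\t{score:.3f}")
```

In this toy example the three features that vary together across layers receive large loadings, while the unrelated feature receives a loading near zero. Per-feature standardization is the design choice that keeps a layer with large numeric ranges from dominating the concatenated matrix; real multi-omics data would additionally need handling of missing values, batch effects, and very different feature counts per layer.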