Statistical model constructs SARS-CoV-2 interactome with miRNAs, protein-coding genes and co-infecting microbes

The unprecedented global health crisis caused by the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) outbreak has led to extensive research on the virus's mechanism of infection, which could help in developing effective treatments.

Although previous studies have reported protein-protein interaction links over the lifecycle of viral infection, a fuller understanding of the complete interactome, including human microribonucleic acids (miRNAs), protein-coding genes, and co-infecting microbes, is crucial.

Study: Construction of a multilayer interactome for SARS-CoV-2 in the context of lung disease: binding the virus to human genes and co-infecting microbes. Image Credit: Connect the World /

About the study

In a recent study posted to the preprint server bioRxiv*, a team of researchers developed a statistical modeling method known as multilayer crosstalk (MLCrosstalk).

MLCrosstalk is a statistical model based on Latent Dirichlet Allocation (LDA) that links several types of data to construct the whole interactome for SARS-CoV-2. MLCrosstalk can integrate samples with multiple layers of information, enforce a consistent topic distribution across all data types, and infer relationships at the individual level that may differ from patient to patient.
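To make the idea of one mixture shared across data layers more concrete, here is a minimal generative sketch in Python. It is not the authors' implementation: it assumes a plain LDA-style process in which each sample draws a single topic mixture that is then reused for every layer. The function names, the toy feature names (GENE_A, MICROBE_X, and so on) and all parameter values are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_sample(layer_topics, n_tokens, alpha, rng):
    """Generate toy counts for one sample across several data layers.

    layer_topics: {layer: [ {feature: probability}, ... one dict per topic ]}
    n_tokens:     {layer: number of counts to draw in that layer}
    Both structures are illustrative assumptions, not the paper's format.
    """
    n_topics = len(alpha)
    # One topic mixture per sample, shared by every layer.
    theta = sample_dirichlet(alpha, rng)
    counts = {}
    for layer, topics in layer_topics.items():
        layer_counts = {}
        for _ in range(n_tokens[layer]):
            # Pick a topic from the shared mixture, then a feature from
            # that topic's layer-specific distribution.
            k = rng.choices(range(n_topics), weights=theta)[0]
            features = list(topics[k])
            weights = [topics[k][f] for f in features]
            feature = rng.choices(features, weights=weights)[0]
            layer_counts[feature] = layer_counts.get(feature, 0) + 1
        counts[layer] = layer_counts
    return theta, counts

rng = random.Random(0)
layer_topics = {
    "gene": [{"GENE_A": 0.9, "GENE_B": 0.1}, {"GENE_A": 0.1, "GENE_B": 0.9}],
    "microbe": [{"MICROBE_X": 0.8, "MICROBE_Y": 0.2},
                {"MICROBE_X": 0.2, "MICROBE_Y": 0.8}],
}
theta, counts = generate_sample(
    layer_topics, {"gene": 50, "microbe": 20}, [0.5, 0.5], rng
)
print(theta, counts)
```

Because the gene and microbe counts are drawn from the same mixture, features that load on the same topic tend to co-occur within a sample, which is the kind of cross-layer link such a model can pick up.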

The researchers also implemented a secondary refinement step using network propagation so that microbe-gene links reflect the larger network structure. They first evaluated the trained model by analyzing how samples cluster according to their topic distributions. The model separated people with coronavirus disease 2019 (COVID-19), healthy people, and those with community-acquired pneumonia (CAP) into distinct clusters.

The authors used several known gene sets, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), Virus-Host Protein-Protein Interaction (PPI), WikiPathways, and COVID-19-related gene sets, to annotate the functions of the top-weighted genes.

MLCrosstalk workflow. We transform gene expression, microbe abundance and (pre-)miRNA expression data, which are then fed into the MLCrosstalk model. After training, we apply network propagation to refine the links. Comparing multiple layers and tracing the network can identify shared and specific pathways and connections.

Study results

The MLCrosstalk model had three major advantages for integrating several types of data. It handles sparse and noisy data using Dirichlet priors governed by hyperparameters; it unifies the topic distribution across all patients and samples, making it easier to identify links between data types; and it can be easily extended to many data types with missing samples.
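As a rough illustration of why a Dirichlet prior helps with sparse counts, the sketch below computes the posterior mean of a proportion vector under a symmetric Dirichlet prior. The counts and the value of alpha are made up for the example, and this is a textbook calculation rather than the model's actual inference procedure.

```python
def dirichlet_posterior_mean(counts, alpha):
    """Posterior mean of multinomial proportions under a symmetric Dirichlet(alpha) prior."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

# Toy abundance counts with several zeros (illustrative values only).
sparse_counts = [12, 0, 0, 3, 0]
raw = [c / sum(sparse_counts) for c in sparse_counts]
smoothed = dirichlet_posterior_mean(sparse_counts, alpha=0.1)

print(raw)
print(smoothed)
```

The raw proportions assign exactly zero to unobserved features, whereas the smoothed estimate gives them a small positive weight and shrinks the dominant feature slightly, which makes downstream comparisons less sensitive to sampling noise.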

After analyzing the COVID-19 datasets, MLCrosstalk extracted reduced-dimension patterns that reveal detailed links between host protein-coding genes, non-coding genes, and microbes. MLCrosstalk constructed a complete interactome for the gene-microbe-miRNA network, which was refined by network propagation to integrate pathway data and link host-pathogen interactions with biological relevance.

The researchers computed the Kullback-Leibler divergence between the topic distributions and a random background and found that Topic 9 was the most informative, differing most from the background distribution. Using several sets of known genes, the authors annotated the functions of the most heavily weighted genes in Topic 9 and found that these genes are highly enriched in heat shock response proteins and immune pathways.
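The divergence comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the three toy distributions, the uniform background, and the names used here are hypothetical, and the preprint's exact background construction is not described in this article.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy distributions over four features (illustrative values only).
topics = {
    "topic_1": [0.25, 0.25, 0.25, 0.25],
    "topic_2": [0.40, 0.30, 0.20, 0.10],
    "topic_3": [0.85, 0.05, 0.05, 0.05],
}
# Assumed background: a uniform distribution standing in for "random".
background = [0.25, 0.25, 0.25, 0.25]

scores = {name: kl_divergence(p, background) for name, p in topics.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A distribution identical to the background scores zero, while one concentrated on a few features scores highest, so ranking by divergence surfaces the distributions that carry the most structure.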

The researchers compared microbes with possible associations with SARS-CoV-2 and found that Rothia mucilaginosa, Prevotella melaninogenica, and Haemophilus parainfluenzae showed reduced relative abundance in patients with COVID-19. The results also showed that genes involved in the Notch signaling pathway, such as NOTCH4, HDAC2, and PSEN1, were significantly upregulated in the lungs during COVID-19. Other bacteria, such as Escherichia coli, Staphylococcus aureus, and Klebsiella pneumoniae, were also strongly associated with COVID-19, although many of them, especially gram-negative bacteria, can be nosocomial in some settings.

The researchers inferred COVID-19-specific genes by comparing their occurrence in healthy individuals and in those with COVID-19. They found that VEGFA-VEGFR2 signaling and cytoplasmic ribosomal proteins were associated with COVID-19.

Top-ranked pathways were identified using a random walk with restart (RWR) approach, which highlighted VEGFA-VEGFR2 signaling as well as immune pathways. MLCrosstalk identified genes such as IFNAR1, IFNAR2 and STAT as associated with viral entry.
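For readers unfamiliar with random walk with restart, the sketch below shows the basic iteration on a toy network. It is a generic version of the technique, not the authors' pipeline: the graph, the node names, the restart probability of 0.3 and the function name are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities for a walker that restarts at the seeds.

    adjacency: {node: {neighbor: weight}}; list undirected edges in both directions.
    seeds:     set of nodes the walker jumps back to with probability `restart`.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            out = sum(adjacency[n].values())
            if out == 0:
                # Isolated node: return its probability mass to the seeds.
                for s in nodes:
                    nxt[s] += (1 - restart) * p[n] * p0[s]
                continue
            for m, w in adjacency[n].items():
                nxt[m] += (1 - restart) * p[n] * w / out
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Toy microbe-gene network (hypothetical nodes and unit edge weights).
graph = {
    "MICROBE_X": {"GENE_A": 1.0, "GENE_B": 1.0},
    "GENE_A": {"MICROBE_X": 1.0, "GENE_C": 1.0},
    "GENE_B": {"MICROBE_X": 1.0},
    "GENE_C": {"GENE_A": 1.0, "GENE_D": 1.0},
    "GENE_D": {"GENE_C": 1.0},
}
scores = random_walk_with_restart(graph, seeds={"MICROBE_X"})
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

Nodes closer to the seed in the network receive higher scores, so genes or pathways can be ranked by their proximity to a microbe or gene of interest rather than by direct links alone.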


The MLCrosstalk statistical model developed by the researchers addressed three challenges: heterogeneity and noise in the data, integration of multiple data types, and personalized identification of links. Using MLCrosstalk, the team identified a list of genes and microbes associated with SARS-CoV-2, retrieved latent patterns from multiple datasets, and found sample-specific links supported by biological evidence.

The team identified microbial co-infections associated with COVID-19, with some microbes showing synergistic effects and others antagonistic effects. The genes linked to R. mucilaginosa were highly represented in COVID-19 patients, while those linked to P. melaninogenica were under-represented in the aforementioned pathway, creating an opposite pattern. Such distinct patterns were also observed between these two groups of microbes for other pathways, such as the immune response, type II interferon signaling, and Notch signaling pathways.

*Important Notice

bioRxiv publishes preliminary scientific reports that are not peer reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.
