Article
A Data Science Approach for the Identification of Molecular Signatures of Aggressive Cancers
Citation:
SILVA, Adriano Barbosa et al. A Data Science Approach for the Identification of Molecular Signatures of Aggressive Cancers. Cancers, v. 14, 2325, p. 1 - 24, May 2022.
2072-6694
10.3390/cancers14092325
Authors
Silva, Adriano Barbosa
Magalhães, Milena
Silva, Gilberto Ferreira da
Silva, Fabricio Alves Barbosa da
Carneiro, Flávia Raquel Gonçalves
Carels, Nicolas
Summary
SIMPLE SUMMARY:
Traditionally, chemotherapy has been approached through one-size-fits-all strategies.
However, personalized oncology would allow a rational approach to chemotherapies. Classically,
cancer diagnosis and prognosis are performed through mutation mapping, but this genomic
approach has an indirect relationship with the disease since it is based on the results of statistics
accumulated over time. By contrast, a strategy based on gene expression would enable figuring
out the actual disease phenotype and focusing on its specific molecular targets. In previous reports,
we paved the way in that direction by showing, first, that targeting up-regulated hubs is a
suitable strategy to drive a tumor toward cell death and, second, that the number of proteins to be targeted
is typically between 3 and 10, depending on tumor aggressiveness. In this report, we focused on the
up-regulated genes of crucial cell signaling pathways, which are key hallmarks of unregulated cell
division and apoptosis. By principal component analysis, we identified the genes that best explain
the differences in aggressiveness among cancer types. We also identified the genes that best discriminated
between aggressive and mild cancers using the random forest algorithm. Finally, by mapping these
genes on the human interactome, we showed that they were close neighbors.
ABSTRACT:
The main hallmarks of cancer include sustaining proliferative signaling and resisting cell
death. We analyzed the genes of the WNT pathway and seven cross-linked pathways that may
explain the differences in aggressiveness among cancer types. We divided six cancer types (liver,
lung, stomach, kidney, prostate, and thyroid) into classes of high (H) and low (L) aggressiveness
based on TCGA data and on the correlation between Shannon entropy and 5-year overall
survival (OS). Then, we used principal component analysis (PCA), a random forest classifier (RFC),
and protein–protein interactions (PPI) to find the genes that correlated with aggressiveness. Using
PCA, we found GRB2, CTNNB1, SKP1, CSNK2A1, PRKDC, HDAC1, YWHAZ, YWHAB, and PSMD2.
The RFC analysis produced a different list that shared only PSMD2 with the PCA list: CAD, PSMD14, APH1A,
PSMD2, SHC1, TMEFF2, PSMD11, H2AFZ, PSMB5, and NOTCH1. The two methods rely on different
algorithmic approaches and serve different purposes, which explains the discrepancy between the
two gene lists. The key genes of aggressiveness found by PCA were those that maximized the
separation of the H and L classes along its third component, which represented 19% of the total
variance. By contrast, RFC classified whether the RNA-seq of a tumor sample was of the H or L type.
Interestingly, PPI mapping showed that the genes of the PCA and RFC lists were connected neighbors in the PPI
signaling network of the WNT and cross-linked pathways.
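
The contrast between the two methods described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the gene names (`gene_0` to `gene_7`), sample counts, and effect sizes are hypothetical, and scikit-learn is assumed as the implementation of PCA and the random forest. PCA is fitted without class labels, and the component that best separates the H and L samples is picked afterwards; the random forest is trained directly on the labels and ranks genes by feature importance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix: 120 samples x 8 hypothetical genes.
genes = [f"gene_{i}" for i in range(8)]
n_per_class = 60
labels = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = L, 1 = H

X = rng.normal(size=(2 * n_per_class, len(genes)))
X[:, 0] *= 3.0             # gene_0: high variance that is unrelated to the class
X[labels == 1, 3] += 2.0   # gene_3 and gene_5: shifted upward in H samples
X[labels == 1, 5] += 2.0

# PCA is unsupervised: fit it without labels, then look for the component
# whose scores best separate the H and L samples.
pca = PCA(n_components=4)
scores = pca.fit_transform(X)
gap = np.abs(scores[labels == 1].mean(axis=0) - scores[labels == 0].mean(axis=0))
separation = gap / scores.std(axis=0)
best_pc = int(np.argmax(separation))

# Rank genes by the absolute value of their loading on that component.
loadings = np.abs(pca.components_[best_pc])
pca_top = [genes[i] for i in np.argsort(loadings)[::-1][:2]]

# The random forest is supervised: it is trained to predict H versus L,
# and genes are ranked by impurity-based feature importance.
rfc = RandomForestClassifier(n_estimators=200, random_state=0)
rfc.fit(X, labels)
rfc_top = [genes[i] for i in np.argsort(rfc.feature_importances_)[::-1][:2]]

print("most separating component (0-based):", best_pc)
print("top genes by PCA loading:", pca_top)
print("top genes by RFC importance:", rfc_top)
```

In this toy setting the first component mostly captures the class-independent variance of `gene_0`, so the separating component is not the first one, which mirrors the situation in the abstract where the third component carried the H/L separation. On real expression data the two rankings need not agree, since PCA ranks genes by their contribution to a direction of variance while the classifier ranks them by their usefulness for prediction.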