ROSIE : RObust Sparse ensemble for outlIEr detection and gene selection in cancer omics data

dc.contributor.author: Jensch, Antje
dc.contributor.author: Lopes, Marta B.
dc.contributor.author: Vinga, Susana
dc.contributor.author: Radde, Nicole
dc.date.accessioned: 2024-10-24T09:34:30Z
dc.date.available: 2024-10-24T09:34:30Z
dc.date.issued: 2022 [de]
dc.date.updated: 2023-11-14T02:09:06Z
dc.description.abstract: The extraction of novel information from omics data is a challenging task, in particular because the number of features (e.g. genes) often far exceeds the number of samples. In such a setting, conventional parameter estimation leads to ill-posed optimization problems, and regularization may be required. In addition, outliers can strongly affect classification accuracy. Here we introduce ROSIE, an ensemble classification approach that combines three sparse and robust classification methods for outlier detection and feature selection and additionally performs a bootstrap-based validity check. Outliers of ROSIE are determined by the rank product test using the outlier rankings of all three methods, and important features are those selected by all three methods. We apply ROSIE to RNA-Seq data from The Cancer Genome Atlas (TCGA) to classify observations into Triple-Negative Breast Cancer (TNBC) and non-TNBC tissue samples. The pre-processed dataset consists of 16,600 genes and more than 1,000 samples. We demonstrate that ROSIE selects important features and outliers in a robust way. Identified outliers are concordant with the distribution of the genes commonly selected by the three methods, and results are in line with other independent studies. Furthermore, we discuss the association of some of the selected genes with the TNBC subtype in other investigations. In summary, ROSIE constitutes a robust and sparse procedure to identify outliers and important genes through binary classification. Our approach is readily applicable to other datasets, fulfilling the overall goal of simultaneously identifying outliers and candidate disease biomarkers to be targeted in therapy research and personalized medicine frameworks. [en]
dc.description.sponsorship: Deutsche Forschungsgemeinschaft [de]
dc.description.sponsorship: Fundação para a Ciência e a Tecnologia [de]
dc.description.sponsorship: European Union's Horizon 2020 research and innovation programme [de]
dc.identifier.issn: 1477-0334
dc.identifier.issn: 0962-2802
dc.identifier.other: 1908132477
dc.identifier.uri: http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-151534 [de]
dc.identifier.uri: http://elib.uni-stuttgart.de/handle/11682/15153
dc.identifier.uri: http://dx.doi.org/10.18419/opus-15134
dc.language.iso: en [de]
dc.relation: info:eu-repo/grantAgreement/EC/H2020/951970 [de]
dc.relation.uri: doi:10.1177/09622802211072456 [de]
dc.rights: info:eu-repo/semantics/openAccess [de]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [de]
dc.subject.ddc: 570 [de]
dc.subject.ddc: 610 [de]
dc.title: ROSIE : RObust Sparse ensemble for outlIEr detection and gene selection in cancer omics data [en]
dc.type: article [de]
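ubs.fakultaet: Engineering Design, Production Engineering and Automotive Engineering (Konstruktions-, Produktions- und Fahrzeugtechnik) [de]
ubs.fakultaet: Cross-faculty / other institution (Fakultätsübergreifend / Sonstige Einrichtung) [de]
ubs.institut: Institute for Systems Theory and Automatic Control (Institut für Systemtheorie und Regelungstechnik) [de]
ubs.institut: Cross-faculty / other institution (Fakultätsübergreifend / Sonstige Einrichtung) [de]
ubs.publikation.seiten: 947-958 [de]
ubs.publikation.source: Statistical Methods in Medical Research 31 (2022), pp. 947-958 [de]
ubs.publikation.typ: Journal article (Zeitschriftenartikel) [de]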
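The abstract names two aggregation steps: outliers are called with a rank product test over the three methods' outlier rankings, and important genes are those selected by all three methods. The Python sketch below illustrates both steps on toy inputs. It is an assumption-laden illustration, not the ROSIE authors' implementation: the function names (`rank_products`, `rank_product_pvalues`, `common_features`), the permutation-based p-value, and the 0.05 cut-off are all hypothetical choices, the paper may assess rank product significance differently, and the bootstrap validity check is not shown.

```python
import random
from bisect import bisect_right
from math import prod


# Hypothetical helpers illustrating the two aggregation steps named in the
# abstract; this is NOT the ROSIE authors' code.

def rank_products(rankings):
    """Rank product per sample: geometric mean of its ranks across methods.

    rankings[m][i] is the outlier rank that method m gives sample i
    (rank 1 = most outlying), so small values mean "consistently outlying".
    """
    k, n = len(rankings), len(rankings[0])
    return [prod(r[i] for r in rankings) ** (1.0 / k) for i in range(n)]


def rank_product_pvalues(rankings, n_perm=2000, seed=0):
    """Permutation p-values (assumed null: each method ranks samples at random)."""
    rng = random.Random(seed)
    k, n = len(rankings), len(rankings[0])
    observed = rank_products(rankings)
    null = []
    for _ in range(n_perm):
        random_rankings = [rng.sample(range(1, n + 1), n) for _ in range(k)]
        null.extend(rank_products(random_rankings))
    null.sort()
    # Fraction of null rank products <= observed; the +1 keeps p-values above 0.
    return [(bisect_right(null, rp) + 1) / (len(null) + 1) for rp in observed]


def common_features(selected_sets):
    """Features selected by every method (plain set intersection)."""
    return set.intersection(*map(set, selected_sets))


# Toy example: 5 samples ranked by 3 methods (made-up ranks, not TCGA data).
rankings = [
    [1, 2, 3, 4, 5],  # method 1
    [1, 3, 2, 5, 4],  # method 2
    [1, 2, 4, 3, 5],  # method 3
]
pvals = rank_product_pvalues(rankings)
print([i for i, p in enumerate(pvals) if p < 0.05])  # -> [0]

# Toy gene selections with placeholder names, not results from the paper.
selections = [{"gA", "gB", "gC"}, {"gB", "gC", "gD"}, {"gB", "gC"}]
print(sorted(common_features(selections)))  # -> ['gB', 'gC']
```

Permuting each method's ranking independently mimics a null hypothesis under which the methods agree only by chance, so a sample ranked as highly outlying by all three methods gets a small rank product and a small p-value. The geometric mean keeps the statistic on the same scale as a single rank.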

Files

Original bundle

Name: 10.1177_09622802211072456.pdf
Size: 2.38 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.3 KB
Format: Item-specific license agreed to upon submission