Exploiting domain knowledge to address class imbalance and a heterogeneous feature space in multi-class classification

dc.contributor.author: Hirsch, Vitali
dc.contributor.author: Reimann, Peter
dc.contributor.author: Treder-Tschechlov, Dennis
dc.contributor.author: Schwarz, Holger
dc.contributor.author: Mitschang, Bernhard
dc.date.accessioned: 2025-03-19T13:17:54Z
dc.date.issued: 2023
dc.date.updated: 2024-11-02T09:20:42Z
dc.description.abstract: Real-world data of multi-class classification tasks often show complex data characteristics that lead to a reduced classification performance. Major analytical challenges are a high degree of multi-class imbalance within data and a heterogeneous feature space, which increases the number and complexity of class patterns. Existing solutions to classification or data pre-processing only address one of these two challenges in isolation. We propose a novel classification approach that explicitly addresses both challenges of multi-class imbalance and heterogeneous feature space together. As main contribution, this approach exploits domain knowledge in terms of a taxonomy to systematically prepare the training data. Based on an experimental evaluation on both real-world data and several synthetically generated data sets, we show that our approach outperforms any other classification technique in terms of accuracy. Furthermore, it entails considerable practical benefits in real-world use cases, e.g., it reduces rework required in the area of product quality control. [en]
dc.description.sponsorship: Deutsche Forschungsgemeinschaft
dc.description.sponsorship: Ministerium für Wissenschaft, Forschung und Kunst Baden-Württemberg
dc.identifier.issn: 0949-877X
dc.identifier.issn: 1066-8888
dc.identifier.other: 1923490761
dc.identifier.uri: http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-160370 [de]
dc.identifier.uri: https://elib.uni-stuttgart.de/handle/11682/16037
dc.identifier.uri: https://doi.org/10.18419/opus-16018
dc.language.iso: en
dc.relation.uri: doi:10.1007/s00778-023-00780-6
dc.rights: CC BY
dc.rights: info:eu-repo/semantics/openAccess
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject.ddc: 004
dc.title: Exploiting domain knowledge to address class imbalance and a heterogeneous feature space in multi-class classification [en]
dc.type: article
dc.type.version: publishedVersion
ubs.fakultaet: Informatik, Elektrotechnik und Informationstechnik
ubs.fakultaet: Fakultäts- und hochschulübergreifende Einrichtungen
ubs.institut: Institut für Parallele und Verteilte Systeme
ubs.institut: Graduate School of Excellence for Advanced Manufacturing Engineering (GSaME)
ubs.publikation.seiten: 1037-1064
ubs.publikation.source: The VLDB journal 32 (2023), S. 1037-1064
ubs.publikation.typ: Zeitschriftenartikel

Files

Original bundle

Name: s00778-023-00780-6.pdf
Size: 1.62 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.3 KB
Format: Item-specific license agreed upon to submission