Exploiting domain knowledge to address class imbalance and a heterogeneous feature space in multi-class classification

| Field | Value | Language |
|---|---|---|
| dc.contributor.author | Hirsch, Vitali | |
| dc.contributor.author | Reimann, Peter | |
| dc.contributor.author | Treder-Tschechlov, Dennis | |
| dc.contributor.author | Schwarz, Holger | |
| dc.contributor.author | Mitschang, Bernhard | |
| dc.date.accessioned | 2025-03-19T13:17:54Z | |
| dc.date.issued | 2023 | |
| dc.date.updated | 2024-11-02T09:20:42Z | |
| dc.description.abstract | Real-world data of multi-class classification tasks often show complex data characteristics that lead to a reduced classification performance. Major analytical challenges are a high degree of multi-class imbalance within data and a heterogeneous feature space, which increases the number and complexity of class patterns. Existing solutions to classification or data pre-processing only address one of these two challenges in isolation. We propose a novel classification approach that explicitly addresses both challenges of multi-class imbalance and heterogeneous feature space together. As its main contribution, this approach exploits domain knowledge in terms of a taxonomy to systematically prepare the training data. Based on an experimental evaluation on both real-world data and several synthetically generated data sets, we show that our approach outperforms any other classification technique in terms of accuracy. Furthermore, it entails considerable practical benefits in real-world use cases, e.g., it reduces rework required in the area of product quality control. | en |
| dc.description.sponsorship | Deutsche Forschungsgemeinschaft | |
| dc.description.sponsorship | Ministerium für Wissenschaft, Forschung und Kunst Baden-Württemberg | |
| dc.identifier.issn | 0949-877X | |
| dc.identifier.issn | 1066-8888 | |
| dc.identifier.other | 1923490761 | |
| dc.identifier.uri | http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-160370 | de |
| dc.identifier.uri | https://elib.uni-stuttgart.de/handle/11682/16037 | |
| dc.identifier.uri | https://doi.org/10.18419/opus-16018 | |
| dc.language.iso | en | |
| dc.relation.uri | doi:10.1007/s00778-023-00780-6 | |
| dc.rights | CC BY | |
| dc.rights | info:eu-repo/semantics/openAccess | |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
| dc.subject.ddc | 004 | |
| dc.title | Exploiting domain knowledge to address class imbalance and a heterogeneous feature space in multi-class classification | en |
| dc.type | article | |
| dc.type.version | publishedVersion | |
| ubs.fakultaet | Informatik, Elektrotechnik und Informationstechnik | |
| ubs.fakultaet | Fakultäts- und hochschulübergreifende Einrichtungen | |
| ubs.institut | Institut für Parallele und Verteilte Systeme | |
| ubs.institut | Graduate School of Excellence for Advanced Manufacturing Engineering (GSaME) | |
| ubs.publikation.seiten | 1037-1064 | |
| ubs.publikation.source | The VLDB Journal 32 (2023), pp. 1037-1064 | |
| ubs.publikation.typ | Journal article | |