Relevance of the two adjusting screws in data analytics: data quality and optimization of algorithms

dc.contributor.author: Bettadapura Raghavendra, Shreyas
dc.date.accessioned: 2017-10-26T13:30:56Z
dc.date.available: 2017-10-26T13:30:56Z
dc.date.issued: 2017 [de]
dc.description.abstract: In the context of learning from data, the impact on the performance of a learning algorithm has traditionally been studied from the perspective of data preprocessing and from that of empirical works. We attempt to provide a middle ground by employing an approach that enables a systematic analysis of the interaction between the quality of the data provided for training and the configurations applied to the learning algorithm. This is achieved through the concepts of a Data Quality Profile, which depicts quality indicators for the dataset, and a Classification Configuration Profile, which depicts the configuration parameters applied to the learning algorithm. Both profiles share the characteristic of making variations in their properties distinctly visible and equally represented, allowing for a systematic study. We demonstrate this through a prototypical implementation that considers the data quality indicators of missing values, label imbalance, and high cardinality, and evaluates them against the CART Decision Tree algorithm, configurable by its splitting criteria, early stopping criteria, and training data preprocessing operations. We observed a relationship between decreasing quality of the training data and deterioration in the performance of the algorithm. The flexibility of the approach allows for easy progression to other algorithms and to implementations of more quality indicators. [en]
dc.identifier.other: 496363077
dc.identifier.uri: http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-93311 [de]
dc.identifier.uri: http://elib.uni-stuttgart.de/handle/11682/9331
dc.identifier.uri: http://dx.doi.org/10.18419/opus-9314
dc.language.iso: en [de]
dc.rights: info:eu-repo/semantics/openAccess [de]
dc.subject.ddc: 004 [de]
dc.title: Relevance of the two adjusting screws in data analytics: data quality and optimization of algorithms [en]
dc.type: masterThesis [de]
ubs.fakultaet: Informatik, Elektrotechnik und Informationstechnik [de]
ubs.institut: Institut für Parallele und Verteilte Systeme [de]
ubs.publikation.seiten: 94 [de]
ubs.publikation.typ: Abschlussarbeit (Master) [de]
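
The abstract names three data quality indicators (missing values, label imbalance, and high cardinality) without defining them in this record. As a rough illustration only, the sketch below computes one plausible version of each for a small in-memory dataset. The function name `data_quality_profile`, the row layout, the sample data, and the exact indicator formulas are all assumptions made for this example and are not taken from the thesis.

```python
from collections import Counter


def data_quality_profile(rows, label_key):
    """Compute three illustrative quality indicators for a list of dict rows.

    Hypothetical helper: the indicator definitions are assumptions for this
    sketch, not the formulas used in the thesis.
    """
    features = [key for key in rows[0] if key != label_key]

    # Missing values: share of feature cells that are None.
    cells = [row[f] for row in rows for f in features]
    missing_ratio = sum(value is None for value in cells) / len(cells)

    # Label imbalance: majority-class count divided by minority-class count.
    label_counts = Counter(row[label_key] for row in rows)
    imbalance_ratio = max(label_counts.values()) / min(label_counts.values())

    # Cardinality: number of distinct non-missing values per feature.
    cardinality = {
        f: len({row[f] for row in rows if row[f] is not None})
        for f in features
    }

    return {
        "missing_ratio": missing_ratio,
        "imbalance_ratio": imbalance_ratio,
        "cardinality": cardinality,
    }


# Made-up sample data, used only to exercise the helper.
rows = [
    {"city": "Stuttgart", "age": 31, "label": "yes"},
    {"city": "Berlin", "age": None, "label": "no"},
    {"city": "Munich", "age": 45, "label": "no"},
    {"city": "Stuttgart", "age": 28, "label": "no"},
]
profile = data_quality_profile(rows, "label")
print(profile)
```

A profile of this kind could then be recorded alongside the classifier configuration (splitting criterion, early stopping settings, preprocessing steps) so that changes in data quality and changes in configuration can be compared side by side.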

Files

Original bundle

Name: ShreyasThesisFinal.pdf
Size: 1.22 MB
Format: Adobe Portable Document Format
