Exploring classification algorithms and data feature selection for domain specific industrial text data

Villanueva Zacarías, Alejandro Gabriel

Files

AGVZ Master thesis.pdf (3.65 MB)

Date

2016

Authors

Villanueva Zacarías, Alejandro Gabriel

Abstract

Unstructured text data is a valuable source of information that nonetheless remains underutilised because efficient methods to manipulate it and extract insights from it are lacking. One example of this deficiency is the lack of suitable classification solutions that address the particular nature of domain-specific industrial text data. In this thesis, we explore the factors that affect the performance of classification algorithms, as well as the properties of domain-specific industrial text data, and propose a framework that guides the design of text classification solutions that achieve an optimal trade-off between accuracy and processing time. Our research model investigates the effect that the availability of data features has on the observed performance of a classification algorithm. To explain this relationship, we build a series of prototypical Naïve Bayes algorithm configurations out of existing components and test them on two role datasets from a quality process of an automotive company. A key finding is that properly designed feature selection techniques can play a major role in achieving optimal performance, in terms of both accuracy and processing time, by providing the right number of meaningful features. We test our results for statistical significance, suggest an optimal solution for our application scenario, and conclude by describing the nature of the variable relationships contained in our research model.

URI

http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-99062
http://elib.uni-stuttgart.de/handle/11682/9906
http://dx.doi.org/10.18419/opus-9889

Collections

05 Fakultät Informatik, Elektrotechnik und Informationstechnik
