Please use this identifier to cite or link to this item: http://dx.doi.org/10.18419/opus-9311
|Authors:||Chellathurai Saroja, Shalini|
|Title:||Measurement of the quality of structured and unstructured data accumulating in the product life cycle in a data quality dashboard|
|Other Titles:||Messen der Qualität von strukturierten und unstrukturierten Daten, die im Produktlebenszyklus anfallen in einem Datenqualitätsdashboard|
|Abstract:||This thesis gives an overview of existing data quality metrics for structured and unstructured data and of existing dashboards for measuring the quality of such data. Open research questions on interpreting data quality are discussed. Three metrics were selected and implemented as REST-based web services: percentage of null values, percentage of duplicate values, and percentage of non-domain values. Furthermore, a web application was developed that enables (1) uploading the data file to be assessed in one of two standard formats, JSON or CSV, and (2) flexible integration of further data quality metrics. The latter is achieved through an interface. To illustrate how this interface works, a metric for the percentage of spelling mistakes, provided by the thesis supervisor, was integrated into the web application. Data quality is reported as a percentage from 0 to 100 and is additionally color-coded, both for the whole dataset and for each column. Donut or pie chart visualizations are implemented for the chosen metrics. The web application and metrics were evaluated with example datasets of data accumulating in the product life cycle, provided by the supervisor. Finally, the dashboard is compared with existing data quality dashboards, and the results are tabulated.|
|Appears in Collections:||05 Fakultät Informatik, Elektrotechnik und Informationstechnik|
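The three metrics named in the abstract can be illustrated with a minimal per-column sketch. This is not the thesis implementation: the abstract does not give the exact metric definitions, so the choices here are assumptions, namely which tokens count as null (`NULL_TOKENS`), that a duplicate is every repeat of a non-null value after its first occurrence, and that all percentages are taken over the total number of rows. The function names, the sample data, and the `status` domain are hypothetical.

```python
import csv
import io

# Assumption: these tokens count as "null"; the thesis may define this differently.
NULL_TOKENS = {"", "null", "none", "na", "n/a"}


def is_null(value):
    """Return True if the value is missing or matches an assumed null token."""
    return value is None or str(value).strip().lower() in NULL_TOKENS


def pct_null(values):
    """Percentage of null values in a column (0 to 100)."""
    if not values:
        return 0.0
    return 100.0 * sum(1 for v in values if is_null(v)) / len(values)


def pct_duplicates(values):
    """Percentage of values that repeat an earlier non-null value in the column."""
    if not values:
        return 0.0
    seen = set()
    duplicates = 0
    for v in values:
        if is_null(v):
            continue  # nulls are covered by pct_null, not counted as duplicates
        if v in seen:
            duplicates += 1
        else:
            seen.add(v)
    return 100.0 * duplicates / len(values)


def pct_non_domain(values, domain):
    """Percentage of non-null values that fall outside the allowed domain."""
    if not values:
        return 0.0
    outside = sum(1 for v in values if not is_null(v) and v not in domain)
    return 100.0 * outside / len(values)


def column_report(csv_text, domains):
    """Compute the three metrics for each column of a CSV string.

    `domains` maps a column name to its set of allowed values; columns
    without an entry get None for the non-domain metric.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = rows[0].keys() if rows else []
    report = {}
    for col in columns:
        values = [row[col] for row in rows]
        report[col] = {
            "null": pct_null(values),
            "duplicate": pct_duplicates(values),
            "non_domain": (
                pct_non_domain(values, domains[col]) if col in domains else None
            ),
        }
    return report


# Hypothetical sample data, not taken from the thesis datasets.
SAMPLE = "part_id,status\nP1,ok\nP2,\nP1,broken\nP3,ok\n"

if __name__ == "__main__":
    for column, metrics in column_report(SAMPLE, {"status": {"ok", "defect"}}).items():
        print(column, metrics)
```

Each function returns a value in the range 0 to 100, matching the percentage scale described in the abstract. A dashboard could map these values to colors, but the thresholds for that mapping are not stated in the abstract and are left out here.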
Items in OPUS are protected by copyright, with all rights reserved, unless otherwise indicated.