Deep learning in stream entity resolution

dc.contributor.author: Sangolli, Suhas Devendrakeerti
dc.date.accessioned: 2022-10-24T07:56:49Z
dc.date.available: 2022-10-24T07:56:49Z
dc.date.issued: 2022 [de]
dc.description.abstract: Entity Resolution (ER) determines which virtual representations of entities map to the same real-world entity. Most current ER-related research in big-data scenarios focuses on volume and variety problems. However, with increased digitization, data is generated not only in bulk but also continuously, so velocity is also an issue that needs to be addressed in the ER domain. Another major issue in deep learning-based ER is data labelling. It is hard to find pre-labelled data to train the model, and it becomes even more difficult when new data is being streamed continuously. In this thesis, we aim to address all the aforementioned issues by developing a deep learning-based classification function that takes in continuously streaming entity pairs and classifies them as match or non-match. The end-to-end system has two main layers: one for training and another for prediction. In the training layer, we use a pre-trained language model (DistilBERT) as a base and train it iteratively as newer entity pairs arrive. To train the model, labelled data are obtained through active learning. The prediction layer uses the latest trained model to classify the streaming entity pairs as match or non-match. The training and prediction layers run in parallel and independently of each other. We evaluate the system proposed in this thesis on several benchmark datasets that vary in size, skewness and origin domain. As evaluation metrics, we use F1 score, losses, time and iterations. Our iterative model performs similarly to the non-iterative models, achieving a match-class F1 score of 0.97 on benchmark datasets. [en]
dc.identifier.other: 1819898709
dc.identifier.uri: http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-124741 [de]
dc.identifier.uri: http://elib.uni-stuttgart.de/handle/11682/12474
dc.identifier.uri: http://dx.doi.org/10.18419/opus-12455
dc.language.iso: en [de]
dc.rights: info:eu-repo/semantics/openAccess [de]
dc.subject.ddc: 004 [de]
dc.title: Deep learning in stream entity resolution [en]
dc.type: masterThesis [de]
ubs.fakultaet: Informatik, Elektrotechnik und Informationstechnik [de]
ubs.institut: Institut für Parallele und Verteilte Systeme [de]
ubs.publikation.seiten: 82 [de]
ubs.publikation.typ: Abschlussarbeit (Master) [de]
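
The abstract mentions two ideas that a small example may make more concrete: obtaining labels through active learning and reporting the F1 score of the match class. The following is only a minimal sketch under stated assumptions, not the thesis implementation. The abstract does not say which active learning strategy is used, so uncertainty sampling (picking the pairs whose predicted match probability is closest to 0.5) is assumed here. The names `select_for_labelling`, `match_f1` and `toy_match_prob` are hypothetical, and the token-overlap scorer merely stands in for the fine-tuned DistilBERT classifier.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def toy_match_prob(pair: Pair) -> float:
    """Hypothetical stand-in for the model: Jaccard overlap of tokens."""
    left, right = set(pair[0].lower().split()), set(pair[1].lower().split())
    if not left and not right:
        return 0.0
    return len(left & right) / len(left | right)


def select_for_labelling(
    pairs: Sequence[Pair],
    match_prob: Callable[[Pair], float],
    budget: int,
) -> List[Pair]:
    """Assumed strategy (uncertainty sampling): return the `budget` pairs
    whose predicted match probability lies closest to 0.5."""
    ranked = sorted(pairs, key=lambda p: abs(match_prob(p) - 0.5))
    return ranked[:budget]


def match_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score of the match class (label 1); non-match is label 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    incoming = [
        ("apple iphone 12", "apple iphone 12"),
        ("apple iphone 12", "samsung galaxy s21"),
        ("apple iphone 12", "apple iphone 13"),
    ]
    # The pair the scorer is least sure about is sent for labelling.
    print(select_for_labelling(incoming, toy_match_prob, budget=1))
    print(match_f1([1, 1, 0, 0], [1, 0, 0, 1]))
```

In a streaming setting, a selection step like this would be repeated for each incoming batch, with the newly labelled pairs used for the next training iteration while prediction continues with the most recent model.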

Files

Original bundle

Name: DLStreamER.pdf
Size: 5.53 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.3 KB
Format: Item-specific license agreed upon to submission