Linguistically-informed modeling of potentials for misunderstanding

Abstract

Misunderstandings are prevalent in communication. While there is a large body of work on misunderstandings in conversation, little attention has been given to misunderstandings that arise from text. This is because readers and writers typically do not interact with one another, so such misunderstandings are rarely made explicit. However, texts that potentially evoke different interpretations can be identified by certain linguistic phenomena, especially those related to implicitness and underspecification. In Computational Linguistics, considerable work has been conducted on such phenomena and their computational modeling. Yet most of these studies do not examine when these phenomena cause misunderstandings. This is a crucial aspect, because ambiguous language does not always cause misunderstanding. In this thesis, we take the first steps toward a computational model that automatically identifies whether an instructional text is likely to cause misunderstandings ("potentials for misunderstanding"). To achieve this goal, we build large corpora of potentials for misunderstanding in instructional texts. Following previous work, we define misunderstanding as the existence of multiple plausible interpretations. Because such interpretations may be similar in meaning to one another, we refine this definition: a misunderstanding requires multiple plausible but conflicting interpretations. Accordingly, we find potentials for misunderstanding by looking for passages with several plausible interpretations that conflict with one another. We identify such passages automatically from the revision histories of instructional texts, based on the finding that potentials for misunderstanding can be found by comparing older versions of a text with their clarifications in newer versions. Specifically, we look for unclarified sentences that contain implicit and underspecified language and study their clarifications.
Through several analyses and crowdsourcing studies, we demonstrate that our corpora are valuable resources for studying potentials for misunderstanding, as we find that revised sentences are better than their earlier versions. Furthermore, we show that the corpora can be used for several computational modeling tasks. The three resulting models can be combined to identify whether or not a text potentially causes misunderstanding. First, we develop a model that detects improvements in a text, even when they are subtle and depend closely on context. An analysis verifies that the model's judgements of what makes a better or equally good sentence overlap with human judgements. Second, we build a transformer-based language model that automatically resolves potentials for misunderstanding caused by implicit references. We find that modeling discourse context improves its performance. Further analysis shows that the best model generates not only the gold resolution but also several other plausible resolutions for implicit references in instructional text. We use this finding to build a large dataset of plausible and implausible resolutions of implicit and underspecified elements. This dataset serves a third computational task, in which we train a model to distinguish automatically between plausible and implausible resolutions of such elements. We show that this model and the dataset can be used to find passages with several plausible clarifications. Because our definition of misunderstanding centers on conflicting clarifications, the thesis concludes with a final study in which we propose and validate a crowdsourcing set-up for finding cases with conflicting plausible resolutions.
The set-up and findings could be used in future research to directly train a model to identify passages with implicit elements that have conflicting resolutions.
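The revision-based extraction summarized above can be illustrated with a minimal sketch. It assumes that an older and a newer version of the same sentence are already aligned. It also assumes that a clarification appears as tokens inserted into, or substituted in, the newer version. The helper name `find_clarifying_edits` and the whitespace tokenization are hypothetical and are not taken from the thesis. The sketch uses only Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def find_clarifying_edits(old_sentence: str, new_sentence: str) -> list[tuple[str, str, str]]:
    """Return (edit_type, old_span, new_span) for candidate clarifications.

    Hypothetical helper: tokenizes naively on whitespace and treats every
    insertion or replacement in the newer version as a candidate
    clarification. Deletions are ignored because they remove content
    rather than make it explicit.
    """
    old_tokens = old_sentence.split()
    new_tokens = new_sentence.split()
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            edits.append((tag, " ".join(old_tokens[i1:i2]), " ".join(new_tokens[j1:j2])))
    return edits


if __name__ == "__main__":
    old = "Cut the dough and put it in the oven."
    new = "Cut the dough into squares and put the squares in the oven."
    # The "replace" edit shows an implicit reference ("it") being made explicit.
    print(find_clarifying_edits(old, new))
    # [('insert', '', 'into squares'), ('replace', 'it', 'the squares')]
```

A real extraction pipeline would also need sentence alignment across revisions. It would further need to filter out edits that are not clarifications, such as typo fixes or stylistic rewording. This sketch does neither.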