Methods for Mining Political Opinions from Texts and Large Language Models

Von der Fakultät Informatik, Elektrotechnik und Informationstechnik der Universität Stuttgart zur Erlangung der Würde eines Doktors der Philosophie (Dr. phil.) genehmigte Abhandlung.

Vorgelegt von Tanise Pagnan Ceron aus Turvo (SC), Brasilien

Hauptberichter: Prof. Dr. Sebastian Padó
Mitberichter: Prof. Dr. Katherine A. Keith
Tag der mündlichen Prüfung: 14. Oktober 2024

Institut für Maschinelle Sprachverarbeitung der Universität Stuttgart, 2025

Erklärung (Statement of Authorship)

Hiermit erkläre ich, dass ich die vorliegende Arbeit selbständig verfasst habe und dabei keine andere als die angegebene Literatur verwendet habe. Alle Zitate und sinngemäßen Entlehnungen sind als solche unter genauer Angabe der Quelle gekennzeichnet.

I hereby declare that this text is the result of my own work and that I have not used sources without declaration in the text. Any thoughts from others or literal quotations are clearly marked.

(Tanise Pagnan Ceron)

Zusammenfassung

In demokratischen Gesellschaften ermöglicht die Meinungsvielfalt, dass Individuen ihre Meinung ausdrücken und sich mit unterschiedlichen Perspektiven auseinandersetzen können. Diese Arbeit untersucht politische Meinungen aus zwei Perspektiven: Texte und Modelle. Hierbei werden sowohl ideologische Positionen als auch Präferenzen für politische Themen untersucht. Während die ideologische Analyse gut etabliert ist, stellen die Präferenzen für politische Themen ein nuancierteres, wenig erforschtes Forschungsgebiet dar. Die Untersuchung der Meinungen politischer Parteien ist unerlässlich, um die Wahlentscheidungen der Wähler:innen, die politische Entscheidungsfindung und die Verschiebungen in den Parteiprogrammen im Laufe der Zeit zu verstehen. Im ersten Teil dieser Arbeit konzentriere ich mich auf Methoden zur Erkennung politischer Meinungen aus Parteiprogrammen.
Die Automatisierung der Identifizierung politischer Meinungen hilft bei der Verarbeitung großer Datensätze, minimiert die Annotationszeit und bietet zeitnahe Aktualisierungen zu neu veröffentlichten Informationen von Parteien. Ich untersuche, wie genau Parteipositionen aus Texten mit minimalen Annotationen identifiziert werden können und wie detailliert dieser Prozess ist. Wir untersuchen auch, inwieweit Parteipositionen in großem Umfang über verschiedene Sprachen und Länder hinweg identifiziert werden können. Die Ergebnisse zeigen, dass bei der Identifizierung von Parteipositionen zwischen den Aufgaben der politischen Skalierung und Positionierung unterschieden werden kann, die erhebliche Unterschiede in Bezug auf Bewertung und Anwendung aufweisen. Darüber hinaus deuten die Ergebnisse darauf hin, dass die Verbesserung der Textrepräsentationen durch domäneninternes Fine-tuning die Leistung erheblich verbessert, wenn Methoden von der Textähnlichkeit abhängen. Außerdem wird durch die sprachübergreifende Skalierung von Parteien mit mehrsprachigen Modellen eine hohe Leistung erzielt.

Sprachmodelle sind mit dem Aufkommen von LLMs zu meinem Forschungsgegenstand geworden und werfen neue Fragen hinsichtlich der in ihnen eingebetteten und reproduzierten Vorurteile auf. Angesichts der Wichtigkeit, politische Vorurteile in LLMs zu beleuchten, befasst sich der zweite Teil dieser Arbeit mit der Bewertung und Identifizierung politischer Vorurteile in LLMs. Unsere Forschungsfragen konzentrieren sich auf die robuste Bewertung von LLMs auf Vorurteile und die Identifizierung der politischen Vorurteile in Bezug auf Ideologie und Präferenzen für politische Themen. Diese Arbeit enthält Definitionen von politischer Voreingenommenheit und politischer Weltanschauung, die bei der Entwicklung von Methoden zu deren Bewertung helfen.
Darüber hinaus trägt sie dazu bei, einen Rahmen für eine robuste Bewertung von Voreingenommenheiten in LLMs und einen Datensatz zur Bewertung politischer Meinungen in LLMs zu entwickeln. Schließlich zeigen die Ergebnisse, dass Modelle mit wenigen Parametern keine zuverlässigen Antworten liefern und dass LLMs in Bezug auf einige politische Themen konsistente politische Weltanschauungen vertreten. Insgesamt unterstreichen sie die Notwendigkeit weiterer Forschung, um die Komplexität und die gesellschaftlichen Auswirkungen der Entwicklung von Modellen zu verstehen, die unterschiedliche politische Meinungen in KI-Systeme integrieren.

Abstract

In democratic societies, the diversity of opinions enables individuals to express their values and engage with differing perspectives. This thesis investigates political opinions through two lenses: texts and models, examining both ideological positions and policy issue preferences. While ideological analysis is well established, policy issue preferences represent a more nuanced, underexplored research area.

Investigating the political opinions of political parties is essential for understanding voter choices, policy decision-making, and the shifts in party agendas over time. In the first part of this thesis, I focus on methods for mining political opinions from party manifestos. Automating the identification of political opinions helps process large datasets, minimize annotation time, and offer timely updates on newly released information from parties. I investigate how accurately party positions can be identified from texts with minimal annotations and the level of detail achievable in this process. We also explore the extent to which party positions can be identified on a large scale across different languages and countries.
Results demonstrate that, in identifying party positions, a distinction can be drawn between the tasks of political scaling and positioning, which differ substantially in terms of evaluation and application. Additionally, findings indicate that improving text representations through in-domain fine-tuning significantly benefits performance when methods depend on text similarity. Finally, party scaling across languages achieves high performance with multilingual models.

Models have become my object of study with the advent of LLMs. They introduce new concerns regarding the types of biases embedded in and reproduced by them. Given the importance of shedding light on political biases in LLMs, the second part of this thesis addresses the evaluation and identification of political biases in LLMs. Our research questions center on robustly evaluating LLMs for biases and identifying the political biases regarding ideology and policy issue preferences. This thesis provides definitions of political bias and political worldview, which aid in designing methods for their evaluation. Moreover, it contributes a framework for a robust evaluation of biases in LLMs and a dataset for evaluating political opinions in LLMs. Finally, findings indicate that models with small parameter sizes are not reliable in their answers, and that LLMs do hold consistent political worldviews in relation to some policy issues. Overall, these findings highlight the necessity of continued research to understand the complexities and societal implications of developing models that integrate diverse political opinions into AI systems.

Acknowledgements

Writing and submitting this thesis marks the conclusion of a very important chapter in my life, one that has had a profound impact both on my professional career and my personal life. The learnings I have had and the fruits I have harvested would not have been possible without the support of many people around me.
Although I believe that words won’t be enough to express my gratitude for their support and inspiration, I still want to express my thanks with a few words.

First and foremost, I would like to thank my supervisor, Sebastian Padó. His guidance was always given in precisely the amount I needed throughout the entire PhD. He offered closer supervision when I needed it most and was more hands-off at the right time, when I was exploring new paths of research and career. I’m also really thankful for his support in my exploration of new opportunities that were not related to my PhD. This enabled me to grow as a researcher, broadening my research horizons and considering the impact of my work beyond academia. I am extremely grateful for his dedication to constantly providing me with valuable feedback and advice, which pushed me beyond my own limits. Thanks, Sebastian, for believing in me more than I did myself.

I would like to thank Neele for all the support during my PhD. Her close friendship was another gift from this PhD. Thanks for the walks in the forest in moments of high stress, the dancing before deadlines, the dinners, the games, the shoulder that you have offered for me to complain or cry on, and all the words of advice. I extend my gratitude to Amelie, who embarked with me on this crazy startup journey and has encouraged me to dream high. Thank you for your endless support and encouragement in all our endeavors; without them this idea would never have left the paper. Next, I would particularly like to thank Severin and Ale, my dear friends, who have been my big brothers in a foreign country. They were an important part of the reason I could call Stuttgart home after just a few months of living here. They have always offered a hand to help and have been supportive of impossible ideas. When I thought I was just sharing a silly idea, they would be more like: “It’s a fantastic idea! Yes, you can do it, Tanise.
When and how are you starting?”

Next, I would like to express my gratitude to IMS and its people. It has been an amazing experience to pursue my PhD in this environment. I am grateful for all the friends I have made in the department, the parties celebrated together, the barbecues, the laughter at lunch, the game nights, the KWT nights, pastéis de nata and canelés, the moments in which we shared our struggles and joy, and the moments of distraction that made my past three years lighter and more enjoyable. Thanks, Chris, for reading my thesis and correcting my English. And of course, thanks, Sabine Mohr, for making my bureaucratic life so much smoother at the department.

I would also like to extend my gratitude to Dmitry, my close collaborator, who has offered numerous insights during the research process of our collaborations. Additionally, I appreciate his efforts in reading and commenting on my thesis. Also, thanks for the thought-provoking discussions about world problems.

I would also like to extend my gratitude to Professor Katie Keith for kindly agreeing to be part of my committee and traveling all the way from the United States to Stuttgart for the defense. I truly appreciate it.

Lastly, I extend my heartfelt thanks to my parents who, despite the distance, have always been there for me. And who, despite coming from a simpler background, have made sure to show me the importance of education and have encouraged me to keep pursuing this path.

Table of Contents

I. Synopsis
1. Introduction
1.1. Opinions in the political arena
1.2. Political opinions in large language models
1.3. Thesis Outline
2. Political Opinions in Texts
2.1. Political opinions at low dimensionality
2.2. Fine-grained scaling at low dimensionality
2.3. Modeling political opinions
2.3.1. Motivation
2.3.2.
Computational approaches for mining political opinions
2.4. Tasks: Political scaling vs political positioning
2.4.1. Scaling
2.4.2. Scaling at a policy issue level
2.4.3. Political positioning
2.4.4. (Dis)Advantages of scaling and positioning
2.5. Annotation and text representation for mining political opinions
2.5.1. Annotated Data
2.5.2. More informed text representations
2.6. Overview of contributions and publications
2.6.1. Contributions
2.6.2. Unsupervised methods for party positioning
2.6.3. Unsupervised methods for party positioning at a policy issue level
2.6.4. Supervised methods for political scaling across countries and time
3. Political Opinions in Large Language Models
3.1. Language models
3.2. Biases in language models
3.3. Evaluation of biases in LLMs
3.4. Political biases in LLMs
3.4.1. Political bias vs political worldviews
3.4.2. Evaluation of political biases in LLMs
3.5. Overview of contributions and publication
3.5.1. Contributions
3.5.2. Evaluating political worldviews in large language models
II. Publications
4. Unsupervised Methods for Party Positioning
4.1. Introduction
4.2. Related Work
4.2.1. Party Characterization
4.2.2. Optimizing Text Representations for Similarity
4.3. Data
4.3.1. The Manifesto Dataset
4.3.2. Ground Truth: Wahl-o-Mat
4.4. Methods
4.4.1. Building Informative Text Representations
4.4.2. Four Models for Party Similarities
4.5. Experimental Setup
4.5.1. Datasets
4.5.2. Models
4.5.3. Evaluation
4.6. Results and Discussion
4.7. Conclusion
4.8. Appendix
5. Unsupervised Methods for Party Positioning at a Policy Issue Level
5.1. Introduction
5.2. Related Work
5.3. Methodology
5.3.1. Workflow
5.3.2. Policy Domain Grouping
5.3.3. Policy Domain Prediction
5.3.4. Computing Party (Dis)similarities
5.3.5. Multidimensional Scaling
5.4. Experimental Setup
5.4.1. Data
5.4.2. Policy Domain Grouping
5.4.3. Policy Domain Labelling
5.4.4. Party (dis)similarity – sentence encoders
5.4.5. Evaluation
5.5. Results and Discussion
5.5.1. Annotated Setup
5.5.2. Predicted Setup
5.6. Conclusion
5.7. Limitations
5.8. Appendix
6. Supervised Methods for Political Scaling Across Countries and Time
6.1. Introduction
6.2. MARPOR categories and political scales
6.3. Methods
6.3.1. Operationalization
6.3.2.
Problem settings
6.3.3. Dataset
6.3.4. Models
6.3.5. From regression to classification with LITs
6.3.6. Evaluation metrics
6.4. Results
6.4.1. Predicting MARPOR categories
6.4.2. Computing RILE scores
6.4.3. Error analysis
6.5. Discussion
6.6. Related Work
6.7. Conclusion
6.8. Limitations
6.9. Appendix
7. Evaluating Political Worldviews in Large Language Models
7.1. Introduction
7.2. Related Work
7.3. Reliability-Aware Bias Analysis
7.4. The ProbVAA Dataset
7.4.1. Sources
7.4.2. Policy-Domain Annotation
7.4.3. Robustness to Statement Variations
7.5. Experimental Setup
7.5.1. Models
7.5.2. Prompt Design
7.5.3. Mapping Responses onto Stances
7.5.4. Sampling-based Reliability Testing
7.6. Reliability of Model Answers
7.6.1. Experimental Setup
7.6.2. Results
7.7. Political Consistency of Model Answers
7.7.1. Experimental Setup
7.7.2. Results
7.8. Discussion
7.9. Conclusion
7.10. Limitations
7.11. Appendix
III.
Epilogue
8. Conclusion and Future Directions
8.1. Key findings and reflections
8.2. Limitations
8.3. Outlook
Bibliography

Part I. Synopsis

Publications and My Contributions

This thesis is based on four scientific publications that I co-authored together with my advisor Sebastian Padó and many excellent researchers and PhD fellows: Dmitry Nikolaev (University of Manchester), Neele Falk (University of Stuttgart), Ana Barić (University of Zagreb), Gabriella Lapesa (GESIS and University of Düsseldorf), Nico Blokker and Sebastian Haunss (University of Bremen), and all my colleagues from the MARDY project. I am grateful to all my co-authors for their substantial contributions to these pleasant and fruitful collaborations. Moreover, I thank all the other people who were not co-authors but who gave valuable feedback on my work. In the following, I detail my own contributions to each publication according to CRediT, the Contributor Roles Taxonomy.

Chapter 4 corresponds to the following publication:

Tanise Ceron, Nico Blokker, and Sebastian Padó. 2022. Optimizing text representations to capture (dis)similarity between political parties. In Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), pages 325–338, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.

In this paper, I contributed to the conceptualization of the study by developing the original research concepts, and I executed all experiments and assessments. Sebastian and I developed the methodology for calculating the distance between parties based on their manifestos. I worked on the data collection from the original source. I implemented the similarity computation and trained the classifier, followed by the evaluation at all stages.
My co-author Nico Blokker created the dataset used for evaluating the claim classifier on the test set. I authored the initial draft of the paper and carried out the majority of the revisions with the support of Sebastian. Throughout the process, I consulted with Nico, who provided valuable guidance on the political science aspects of the paper. I was responsible for creating the visualization to illustrate the method and for the project administration throughout the entire study, from conceptualization to the response to reviewers. This amounts to roughly 65% of the total work.

Chapter 5 corresponds to the following publication:

Tanise Ceron, Dmitry Nikolaev, and Sebastian Padó. 2023. Additive manifesto decomposition: A policy domain aware method for understanding party positioning. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7874–7890, Toronto, Canada. Association for Computational Linguistics.

I contributed to the conceptualization of this study by developing the overall research questions. I developed most of the methods used for answering the research questions, while my advisor Sebastian and co-author Dmitry assisted me in shaping them during the computational experiments. I executed most of the computational experiments and the analysis. Dmitry ran the models for classifying policy issues, while I was responsible for their evaluation. Moreover, I created the visualization for the methods implemented in the study. I wrote the first version of the paper and handled most of the subsequent revisions and the responses to the reviewers. My contribution to this paper amounts to approximately 60% of the total work.

Chapter 6 corresponds to the following publication:

Dmitry Nikolaev, Tanise Ceron, and Sebastian Padó. 2023. Multilingual estimation of political-party positioning: From label aggregation to long-input Transformers.
In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9497–9511, Singapore. Association for Computational Linguistics.

In this paper, I contributed to the conceptualization by formulating the research questions addressed during the study. I was very familiar with the data given my experience in the task of political positioning, so I contributed to data curation by accessing and collecting the annotated data and the ground truth used for the evaluation. I contributed to the methodology by developing the design for modeling the task of political scaling, considering different real-world use cases for a robust evaluation of the methods. I also assisted in developing the evaluation metric, given that it was not straightforward precision or accuracy as in other supervised learning models. Dmitry trained and evaluated the models. He also carried out the error analysis. Finally, I contributed to writing the first version of the text, as well as to revising and editing the second and final versions of the paper. I contributed approximately 40% of the total work for this paper.

Chapter 7 corresponds to the following publication:

Tanise Ceron, Neele Falk, Ana Barić, Dmitry Nikolaev and Sebastian Padó. 2024. Beyond prompt brittleness: Evaluating the reliability and consistency of political worldviews in LLMs. Accepted for publication at Transactions of the Association for Computational Linguistics (TACL). https://arxiv.org/html/2402.17649v2

Having developed certain knowledge of the political science field throughout my previous studies, I proposed the initial research questions to my co-authors. In this collaboration, we worked closely together on the development of the framework for reliability-aware bias analysis. Ana, Neele and I partially shared the data curation of the study by compiling the dataset together.
I was responsible for conducting the annotations pertaining to the policy issues and for collecting the human upper-bound annotations via a survey. I participated in the prompt selection and evaluation with Neele, Ana and Dmitry. Ana ran the models for generation given the prompts. I conducted all the analysis regarding policy issues and political leaning. Neele, on the other hand, analyzed the reliability of the models according to our framework. Sebastian contributed to the writing and to shaping the storyline. Finally, I contributed a substantial amount to the writing of the original draft and the revision of the paper. I led the project administration, which included the organization of meetings, keeping track of dates, the submission process, and the responses to the reviewers. Overall, my contributions amount to 40% of the work effort invested in this paper.

Throughout my doctoral studies, I had the privilege of collaborating with excellent researchers on topics both related and unrelated to this thesis. These collaborations could not fit in this thesis, but to ensure thoroughness, I include references to these papers below.

Tanise Ceron, Ana Barić, Andre Blessing, Sebastian Haunss, Jonas Kuhn, Gabriella Lapesa, Sebastian Padó, Sean Papay, and Patricia F. Zauchner. 2024. Automatic Analysis of Political Debates and Manifestos: Successes and Challenges. In Conference on Advances in Robust Argumentation Machines, pages 71–88. Cham: Springer Nature Switzerland.

Nico Blokker, Tanise Ceron, Andre Blessing, Erenay Dayanik, Sebastian Haunss, Jonas Kuhn, Gabriella Lapesa and Sebastian Padó. 2022. Why justifications of claims matter for understanding party positions. In Proceedings of the 2nd Workshop on Computational Linguistics for Political Text Analysis – KONVENS, Potsdam, Germany.

Maximilian Maurer, Tanise Ceron, Sebastian Padó, and Gabriella Lapesa. 2024.
Toeing the Party Line: Election Manifestos as a Key to Understand Political Discourse on Twitter. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 6115–6130, Miami, Florida, USA. Association for Computational Linguistics.

Tanise Ceron, Nhut Truong, and Aurelie Herbelot. 2022. Algorithmic Diversity and Tiny Models: Comparing Binary Networks and the Fruit Fly Algorithm on Document Representation Tasks. In Proceedings of The Third Workshop on Simple and Efficient Natural Language Processing (SustaiNLP), pages 17–28, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.

1. Introduction

“For your own sanity, you have to remember that not all problems can be solved. Not all problems can be solved, but all problems can be illuminated.”
Ursula Franklin, in The Real World of Technology, 1999

People are shaped by numerous factors, including their personal experiences, cultural backgrounds, education, social environments, access to information, and personality traits. These diverse influences lead to people holding various viewpoints and interpretations of the world. An environment where this diversity of opinions can thrive is not only crucial to accommodating a heterogeneous society, but it is also essential to ensure the effective functioning of democracy (Balkin, 1995, 2017). This type of environment allows individuals to articulate their values, engage with different perspectives, and learn from and share their own views with others.

Political opinions play a particularly significant role among the many types of opinions shaped by these factors. Given their impact on governance and societal norms, political opinions permeate multiple levels of our lives, including interpersonal social interactions, professional settings, and the political arena itself. In this thesis, I categorize political opinions into three levels.
At the most fine-grained level, they are stances (i.e., positions) taken by individuals regarding policies. As the example in Figure 1.1 shows, two citizens disagree on whether citizenship should be granted by birth or through long-term residence or parental ties. The second is the level of policy issues. It encompasses a set of policies related to a broader set of beliefs; Figure 1.1 illustrates the example of migration (Wlezien, 2005; Green-Pedersen and Krogstrup, 2008). This level requires some internal consistency, given that people, for example, would generally agree with policies that are either more in favor of open borders or more in favor of restrictive migration policies.

Figure 1.1.: Political opinions categorized into three levels of granularity. The colors represent the levels.

The third and broadest level is the ideological level. It refers to preferences towards sets of policy issues that belong to the ideology under analysis. One example is the left–right scale, which encompasses policy issues such as migration, the economy, and government expenditure. The overall political opinions of individuals or politicians can be captured by placing them on this type of ideological scale.

In this thesis, I work with political opinions mainly at the ideological and policy issue levels, both in the political arena and as potential political opinions reproduced by LLMs. Figure 1.2 illustrates political opinions in these two contexts. The colors represent the spectrum of existing political opinions. Each political party on the left side endorses some existing political opinions. They vary considerably because political parties act as citizens’ representatives, and in a multiparty system, they often mirror the diverse range of opinions held by the community at large. I explore methods for mining the political opinions endorsed by political entities in texts. I discuss more details in Section 1.1.
While in the first part of this thesis the object of study is texts, as Figure 1.2 illustrates, the object of study becomes large language models (LLMs) in the second part. The right side of the figure shows the political opinions that large language models tend to reproduce. Considering that AI systems are integrated into the daily lives of numerous citizens, it is increasingly relevant to analyze the types of biases embedded in these models. This understanding allows us to make well-informed decisions on designing and implementing applications for end users. Current models, for example, are one-size-fits-all models that may incorporate only a limited number of opinions, as shown by the colors in Figure 1.2. Therefore, this thesis develops methods for extracting political opinions from LLMs. These methods help us gauge the diversity of political opinions embedded in LLMs. I discuss this aspect of LLMs in Section 1.2.

Figure 1.2.: This diagram represents the political opinions of political parties and the ones manifested or reproduced by large language models (LLMs). The colors of the flags represent the distinct political opinions that the parties hold. The colors in the squares represent the spectrum of existing opinions. LLMs may run the risk of reproducing a limited number of opinions, as shown by the colors of the squares.

1.1. Opinions in the political arena

The political arena represents an environment where different political opinions are given space to flourish and compete with one another. Parties are formed by individuals who share similar political opinions. They then compete for the electorate’s attention to gain support from people who potentially share similar ideas. They articulate their opinions through various genres and modalities such as parliamentary speeches, public speeches, assemblies, forums, roll-call votes, manifestos, social media posts, and media coverage.
Having a space where parties express their preferences and ideologies and compete for the electorate’s attention is essential for democracies to thrive. Given its importance, this process has consistently attracted scholarly attention in political science, and the area has become known as party competition (Stokes, 1963). Understanding the dynamics of party competition is relevant because the results of these dynamics affect policy decisions, political engagement levels, and the quality of political representation (Baumann et al., 2021).

At the intersection of party competition and political opinions lies the line of research that investigates the positioning of political actors. This research focus is relevant for understanding the factors influencing voter choices in elections, the decision-making behaviors of political parties once they become representatives in certain political roles, and the matches and mismatches between the former and the latter (Benoit and Laver, 2006). Moreover, it is important to keep track of the extent to which parties change their agendas (and strategies) across time (König et al., 2013). Lastly, it can be employed to observe ideological shifts (McDonald et al., 2007) or to identify the political issues on which political parties are most strongly campaigning across different countries (Seeberg, 2017).

One way of analyzing the positioning of political actors is by extracting and characterizing their opinions via their ideologies and policy issue preferences. In this thesis, I investigate approaches for automatically mining political opinions from political texts, applied in the context of party positions. The focus is not on identifying individual policies and the stances of the parties towards single policies (the most fine-grained level shown in Figure 1.1). I also do not focus on detecting and categorizing the argumentation of parties.
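The intuition behind this kind of text-based positioning can be sketched in a few lines: represent each party by an aggregate of its sentence representations and compare parties by the similarity of these aggregates. The sketch below is purely illustrative and is not one of the methods evaluated in this thesis; the party names and the three-dimensional "embeddings" are invented stand-ins for real sentence-encoder output.

```python
import numpy as np

def party_similarity(sents_a: np.ndarray, sents_b: np.ndarray) -> float:
    """Toy positioning sketch: average per-sentence embeddings into one
    vector per party, then compare parties by cosine similarity."""
    a, b = sents_a.mean(axis=0), sents_b.mean(axis=0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 3-d "embeddings" standing in for real sentence-encoder output.
party_x = np.array([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]])  # hypothetical party X
party_y = np.array([[0.5, 0.1, 0.5], [0.6, 0.2, 0.4]])  # hypothetical party Y
party_z = np.array([[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]])  # hypothetical party Z

# On these toy vectors, X comes out closer to Y than to Z.
print(party_similarity(party_x, party_y))  # higher value
print(party_similarity(party_x, party_z))  # lower value
```

Pairwise (dis)similarities of this kind can then be projected onto a low-dimensional scale, for instance via multidimensional scaling, to obtain a party ordering; the actual pipelines are developed in Chapter 2 and Part II.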
My primary goal is to develop methods that extract party positions as an aggregation of stances. In other words, the results of these tasks provide insights into how close parties or political entities are to one another in relation to policy issues and ideologies. I develop and evaluate methods for extracting parties’ opinions from manifestos – electoral programs released by the parties themselves at the beginning of their election campaigns. This task has traditionally been called political positioning or political scaling in the political science and NLP literature (Laver et al., 2003; Benoit and Laver, 2006; Slapin and Proksch, 2008; Glavaš et al., 2017). However, prior research has not explored the potential of text representations fine-tuned for specific domains. Additionally, it has not focused on capturing the scaling of parties in texts that specifically contain information about those ideological scales. Lastly, prior to this thesis, no study has aimed at identifying party positions end-to-end at a more detailed level, such as within specific policy issues. The primary contributions of this thesis address the previously mentioned research gaps. They lie in the development and evaluation of supervised and unsupervised methods for the tasks of political positioning and political scaling. I develop new methods to build more powerful text representations for the political domain that enhance the performance of our tasks. I design an end-to-end pipeline to capture the scaling of parties at the level of policy issues. Finally, I propose methods to capture the scaling of parties in settings across several countries and languages. More detailed information on the tasks, methods, findings, and discussion can be found in Chapter 2. 1.2.
Political opinions in large language models The advances in the technology underlying large language models (LLMs) have made it possible for many people to interact with systems powered by these models. These applications have become easily accessible and widely used, given their benefits in productivity and creativity, becoming pervasive in the private and work lives of users (Wolf and Maier, 2024). This user-friendliness is achieved thanks to LLMs’ ability to produce text based on a free natural language prompt, resulting in “universal” models that are task-agnostic. It enables users to easily interact with applications by giving “human-like” written instructions to perform several tasks such as text generation, summarization, classification, and question-answering – all in one system. This growing interaction raises concerns regarding the manifestation of harmful biases embedded in them – which has drawn the attention of academic research and the public sphere.1 In the context of political opinions, I argue that harmful biases take place when the output of models reinforces a limited number of viewpoints which pertain to only a few groups in society. Aligned with the earlier discussion on fostering a democratic culture by creating a space for diverse opinions and beliefs to coexist (Balkin, 2017), the widespread presence of these systems underscores the importance of understanding the political opinions they reflect. These opinions manifest as biases encoded in the models, which may (or may not) influence the results of the aforementioned downstream tasks. Therefore, I argue that the first step is determining the types of political biases that are encoded in LLMs. In this thesis, I draw from the accumulated knowledge on building and evaluating methods for the tasks of political positioning and scaling. 1Cf. https://www.washingtonpost.com/technology/2023/08/16/chatgpt-ai-political-bias-research/ and https://www.forbes.com/sites/emmawoollacott/2023/08/17/chatgpt-has-liberal-bias-say-researchers/ The main research questions guiding our investigation in the second part of this thesis include how to robustly evaluate LLMs for biases and what political biases these models encode. Specifically, I investigate the extent to which the answers of chat-instructed LLMs are reliable when prompts are reformulated to control for prompt brittleness. Finally, I also evaluate whether LLMs reproduce consistent preferences towards left–right orientation and specific policy issues. The latter aspect, which is a more fine-grained analysis of political biases, has not been previously investigated. Among the main contributions, I formulate definitions for political opinions at three granularity levels: policy preference, policy issue preference, and ideological positioning (described in depth in Chapter 3). These definitions facilitate the design of methods for identifying political biases in LLMs. Next, we propose a framework for evaluating the reliability of the answers generated by LLMs. This framework can be implemented to evaluate other types of biases by taking prompt brittleness into account. In the study from this part of the thesis, we compile and annotate a dataset, ProbVAA, which is valuable for investigating political opinions in LLMs at the refined level of policy issues.
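The reliability idea can be illustrated with a toy consistency check: query a model with several paraphrases of the same policy statement and measure how often the answers agree. The function below is a hypothetical sketch of this measurement, not the actual ProbVAA evaluation framework.

```python
from collections import Counter

def answer_reliability(answers):
    """Share of answers that agree with the majority answer across
    paraphrased prompts; 1.0 means fully consistent, values near chance
    indicate prompt brittleness."""
    counts = Counter(answers)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(answers)

# hypothetical model answers to five paraphrases of one policy statement
label, reliability = answer_reliability(
    ["agree", "agree", "disagree", "agree", "agree"]
)  # majority answer "agree", agreement 4/5
```

A model whose answer flips under reformulation would score close to the chance level of the answer set, which is exactly the behavior the evaluation needs to control for.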
Finally, we analyze the types of political biases encoded in these models, both in terms of left–right scaling and in regard to specific policy issues. 1.3. Thesis Outline The manuscript is structured as described below. Following this Introduction Chapter, Chapter 2 delves deeper into mining political opinions from texts. I define the tasks of political positioning and political scaling more thoroughly. I describe previous work conducted by the political science and the natural language processing (NLP) communities. I address the research gaps and the data used throughout the experiments. Then, I discuss the advantages and disadvantages of positioning and scaling for analyzing political parties, drawn from the findings of our experiments. Finally, I conclude with the contributions made by this thesis in terms of methods and analysis for the tasks of political positioning and scaling. Chapter 3 focuses on mining and evaluating political opinions in LLMs. I discuss the related work concerning general biases in pre-trained language models and how the need to evaluate political biases has become more predominant. I discuss why LLMs lack reliability in their answers, what problems this causes for our evaluation, and the need to build a more robust bias evaluation. Then, I focus on defining political biases, previous studies in this area, and the research gaps. Finally, I highlight our contributions in relation to methods for bias evaluation and the analysis of political biases embedded in these models. Chapters 4, 5, 6 and 7 present the studies in the form of publications that contributed to this thesis. Finally, Chapter 8 summarizes the answers to the research questions mentioned in Sections 1.1 and 1.2. Next, I consider the future of research in positioning and scaling, outlining the next steps for enhancing models and increasing their interpretability.
Additionally, I highlight the necessity of further advancing research in evaluating LLMs for biases. Finally, I explore the societal implications that should be considered when implementing systems for downstream tasks and suggest how the NLP community can contribute to addressing these issues. 2. Political Opinions in Texts This chapter details the research program outlined in Section 1.1. I first describe the tasks from the political science perspective and then dive into them through the lens of computational social science (CSS). 2.1. Political opinions at low dimensionality As outlined in Section 1.1, diverging positioning of political parties creates an environment where a variety of political views can come together, fostering party competition. This environment provides a basis for individuals to choose the party that aligns most closely with their views. Figure 2.1.: Example of low-dimensional scaling based on Benoit and Laver (2006). This endorsement of different opinions has given space to the study of the positioning of political actors, which is crucial for a number of reasons. Firstly, it enables the understanding of parties within the context of party competition – how parties relate to one another and what the relevant topics of discussion are for them. In addition to that, studying this phenomenon is crucial for comprehending the motivations behind voters’ choices in elections, the decisions of political parties when they are in power (Benoit and Laver, 2006), and the strategies of parties to gain terrain during their campaigns (Meguid, 2005; Green and Hobolt, 2008). One of the approaches used for investigating party positions reduces the information regarding a given actor (politician or political party) into a low-dimensional scale that commonly represents ideologies.
This is illustrated in the example in Figure 2.1, following Benoit and Laver (2006, p. 46). In the example, the three parties (social democrat, conservative, and liberal) are placed onto a two-dimensional space representing their position within the left–right and libertarian–conservative ideologies. Besides the scales of the example, a wide variety of scales have been proposed and long debated in the literature (Laver et al., 2003; Slapin and Proksch, 2008; Diermeier et al., 2012; Lauderdale and Clark, 2014; Barberá, 2015). Some scales are based on deductive approaches rooted in political theory and philosophy (Jahn, 2011), while others are more inductive data-driven approaches (Gabel and Huber, 2000; Albright, 2010; Rheault and Cochrane, 2020). Whereas some researchers have for years focused on the left–right scale (Volkens et al., 2021), others argue that, in order to understand the political spectrum in a country more thoroughly, it is necessary to look into several ideological scales and have a multidimensional analysis of the parties (Bakker and Hobolt, 2013; Rovný, 2012b). Placing parties on a scale helps political scientists understand the political landscape more easily because of its low dimensionality (Heywood, 2021). The scaling of parties offers a fundamental framework for analyzing party competition and for establishing the connection between citizens and political parties more easily (Huber and Inglehart, 1995). Moreover, it allows researchers to monitor parties under the same set of policies and understand how their positioning changes across years – e.g. whether parties are moving more to the left or right. 2.2. Fine-grained scaling at low dimensionality Ideological scales are one way of analyzing parties. Another focus explores the fine-grained differences and similarities between parties, which are usually policy issue-specific.
This type of analysis is important to understand which issues explain the value retrieved from the scaling of parties. Figure 2.2 illustrates an example of the task of policy issue scaling. Figure 2.2.: Example of policy issue positions taken from the Chapel Hill Expert Survey (CHES) based on the German context in 2019. In this case, expert annotators from the Chapel Hill survey manually place the main German parties onto a scale regarding their positioning on the issues of “spend vs tax” (i.e., about the expenditure and collection of tax money) and “immigration policy”.1 Rovný (2012a) conducted a multidimensional analysis within different issues with information extracted from several surveys. The study discusses how radical right political parties strategically differentiate their views on secondary issues (unrelated to the economic domain) to boost support among a wider range of voters. Green-Pedersen (2007) argues that investigating party positions with respect to specific issues is increasingly relevant within the context of Western European politics to understand the reasons why some issues (e.g. refugees and immigrants) are more central in a given country and time, and how parties are strategically placing more attention on them to gain more support from voters. 1The plot was adapted from their visualization tool https://chesdata.shinyapps.io/Shiny-CHES/ Identifying party positions on specific issues also sheds light on the framework of saliency theory. It investigates how parties selectively emphasize issues, and it posits that certain parties only adopt clear stances on issues they regard as worthy of attention (Sio and Weber, 2014). Alternatively, it is also relevant for comparative studies across countries and regions of the globe.
For example, the aforementioned Chapel Hill Expert Survey (CHES) is a large-scale survey involving the manual effort of expert annotators from many countries who investigate party positions (Jolly et al., 2022). The survey contains party positions on ideological scales such as left–right and libertarian–authoritarian, and on specific policies such as deregulation, immigration policy, multiculturalism, urban–rural, environment, and European integration. Given that the annotations are standardized, it is, in theory, possible to compare parties across countries and time. Another great effort from the community to investigate parties across countries has been the development of the codebook and annotations in the framework of the Manifesto Research on Political Representation project (MARPOR) (Burst et al., 2021), formerly known as the Comparative Manifestos Project (CMP). I discuss more details about MARPOR later in § 2.5.1. 2.3. Modeling political opinions 2.3.1. Motivation Traditionally, the opinions of political actors have been investigated through a series of methods such as surveys (Rovný, 2012a; Jolly et al., 2022), the answers of parties to voting advice applications (VAAs) (König and Nyhuis, 2020), or by annotating large amounts of data from manifestos, as in MARPOR (Burst et al., 2021). These approaches demand significant resources in terms of trained personnel and funding. They require field experts who are familiar with the country’s political spectrum to carry out surveys and annotations. The difficulty of the annotation task is also a factor to take into account. In the case of MARPOR, studies also discuss the low reliability of coders (Mikhaylov et al., 2008) except in cases where annotators are well-trained (Lacewell and Werner, 2013).
Others contend that the task’s complexity is further accentuated by the intricacy of the MARPOR codebook, which is highly detailed and domain-specific (Gemenis, 2013). These studies indicate that scaling up this process manually within a single country is very challenging, and doing so across multiple countries is even more complex due to variations in political landscapes and languages. Time is also an important factor in this case. Consider the scenario in which political scientists would like to analyze manifestos immediately after they are published at the start of the election campaign to gain insights into the programs of the parties and what has changed from the previous elections in a short amount of time. This is not possible manually because annotations take a long time to be carried out and evaluated for quality in terms of inter-annotator agreement. A system capable of automatically identifying the opinions of political parties could prove highly beneficial. Taking these factors into account, automating the task of identifying political opinions can offer significant advantages. For example, funding and resources could be used for hiring more annotators to annotate a small set of manifestos rather than all manifestos. This small set would be considered high-quality data that is then used for training. This cycle would streamline the process and potentially ensure a more accurate analysis. Automation would handle large volumes of data, reduce the annotation time, and provide timely updates. This makes it a useful tool for political scientists who require reliable and up-to-date information on party positions for analysis. Ideally, citizens would also benefit from the automation of this task. For instance, the results of this analysis can be used in VAAs – applications that estimate the alignment of voters with parties or candidates running for government.
The retrieved information about party positions and candidates can be added to these applications so that they are not only reliant on the answers provided by the parties themselves. This could add to the trustworthiness of these applications, since the information would come directly from the manifestos written by the parties. At the same time, it places significant expectations on the faithfulness of the extraction methods. 2.3.2. Computational approaches for mining political opinions In data-driven computational social science (CSS), text becomes the primary source of information for analyzing and understanding phenomena in society (Zhang et al., 2020). In our case, it is a source from which to extract party positions automatically. Existing approaches for this purpose have been based on textual information made available by parties or their members, such as manifestos, parliamentary speeches, and social media posts, e.g. those posted on X (formerly Twitter). Earlier approaches for automatically mining political opinions are based on word counts. Laver et al. (2003) proposed the Wordscores approach. It consists of two parts: first, a probabilistic score is assigned to the words found in pre-defined reference texts. The reference texts represent the extremes of positioning, and they can represent many issues or a single issue. The word scores are then used to assign a weighted score to unseen documents. This approach compares the frequency of each word in the reference texts and contrasts these frequencies with the word counts from the texts under analysis. This approach has two main drawbacks. Firstly, it highly depends on reference texts as the “gold standard” of party positions.
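The two-step scoring just described can be sketched as follows; this is a minimal toy illustration of the Wordscores idea (Laver et al., 2003), not the authors' implementation, and the reference texts and positions are invented:

```python
from collections import Counter

def wordscores(ref_texts, ref_positions, virgin_text):
    """Score an unseen ('virgin') text given reference texts with known positions."""
    counts = [Counter(t.split()) for t in ref_texts]
    totals = [sum(c.values()) for c in counts]
    word_scores = {}
    for w in set().union(*counts):
        # relative frequency of w in each reference text
        freqs = [c[w] / n for c, n in zip(counts, totals)]
        z = sum(freqs)
        # expected position given that a text contains w:
        # reference positions weighted by P(reference | w)
        word_scores[w] = sum(p * f / z for p, f in zip(ref_positions, freqs))
    # virgin text score: frequency-weighted mean over the words with known scores
    vc = Counter(w for w in virgin_text.split() if w in word_scores)
    n = sum(vc.values())
    return sum(word_scores[w] * c / n for w, c in vc.items())

# hypothetical reference manifestos anchoring a right (+1) / left (-1) dimension
right, left = "tax cut free market", "welfare spend public welfare"
score = wordscores([right, left], [1.0, -1.0], "tax welfare welfare reform")
```

Words unseen in the reference texts (here “reform”) receive no score and are ignored, which is one concrete way the choice of reference texts constrains the method.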
Selecting reference texts requires expertise and agreement about what characterizes extreme policy positions, making the implementation of this method more challenging. Secondly, it presupposes that the political discourse remains relatively static over time because it does not take the relevance of words into account. To deal with these drawbacks, Slapin and Proksch (2008) proposed Wordfish. It employs a Poisson distribution to infer a unidimensional document scale based on the distribution of word frequencies. The words are considered proxies for ideological positions. A similar approach is implemented by Lauderdale and Herzog (2016) for party positions based on parliamentary speech instead of manifestos. Both aforementioned approaches, however, do not take semantic and syntactic relations into account because they are bag-of-words-based models. For example, even though the terms “foreigner” and “migrant” might be used interchangeably, this type of model does not consider the similarity between the terms because the token types are different. If the reference texts used the former term and the unseen texts the latter, the approach does not treat them as similar words for the scoring. To compensate for that, Glavaš et al. (2017) and Nanni et al. (2022) investigate positioning in parliamentary speeches in the European Union with the use of word embeddings – which take semantic relations, such as the similarity between “foreigner” and “migrant”, into account. They create a multilingual semantic space with speeches in different languages, then retrieve the positioning by aligning word embeddings according to the highest similarity of words between pairs of documents. They scale the alignment scores into single values per party with a graph-based algorithm. The results are evaluated against ground truths referring to left–right and European integration scores.
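A much-simplified sketch of embedding-based positioning: represent each party document as an embedding, compute pairwise cosine similarities, and reduce them to one value per party. This toy version projects the similarity matrix onto its first principal component; it is not the graph-based algorithm of Glavaš et al. (2017), and the embeddings are invented.

```python
import numpy as np

def position_from_embeddings(doc_embeddings):
    """One-dimensional party positions from document embeddings (toy sketch)."""
    X = np.asarray(doc_embeddings, dtype=float)
    # cosine similarity between every pair of party documents
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = X @ X.T
    # reduce the pairwise similarities to one dimension (first principal component)
    centered = sim - sim.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

# three hypothetical party embeddings: the first two similar, the third distant
positions = position_from_embeddings([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]])
```

The absolute values and the sign of the axis are arbitrary; only the relative distances between parties are meaningful, which is why such outputs are evaluated via correlation with external ground truths.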
Results show that the embedding-based method of Glavaš et al. (2017) surpasses the performance of Wordfish, the prevailing standard for political scaling until then. In a similar line of research, Rheault and Cochrane (2020) utilize party embeddings derived from word embeddings, which are enhanced with political metadata and fine-tuned on parliamentary corpora. They employ principal component analysis (PCA) to reduce the dimensions of these aggregated party embeddings to determine party positions. Their findings reveal that the positions extracted from parliamentary speeches align with manifesto positions on a left-to-right scale for parties in English-speaking countries, namely Great Britain, Canada, and the United States. 2.4. Tasks: Political scaling vs political positioning Although the existing literature does not address this issue directly, this thesis contends that the tasks of political scaling and positioning of political actors have some small but important divergences. They have the same objective, which is to reduce information into one dimension, but they differ in what they are modeling, leading to variations in evaluation and applicability. We discuss these points below. 2.4.1. Scaling The task of political scaling exclusively places political parties or actors onto a scale that reduces pre-defined policy issues into one dimension, which is usually ideologically motivated. For example, left–right is a dimension that arguably contrasts a more progressive and redistributive role for the state with a more conservative and market-oriented role (Budge et al., 2001). Another example of a dimension is libertarian–authoritarian, where the latter upholds traditional morality, law and order, and cultural uniformity, while the former supports cultural and ethnic diversity and advocates for individuals’ freedom (Duch and Strøm, 2004; Bakker and Hobolt, 2013).
As can be observed, these scales aggregate certain pre-defined policy issues which are relevant for that dimension, such as law and order and liberal society. In this way, the position of the political actor is reduced to one dimension according to these policy issues. The MARPOR project has extensively worked on this task of scaling parties on the left–right dimension with annotated data in what is known as the RILE score (Budge, 2013; Volkens et al., 2013). They define 24 categories from their codebook that belong to either the left or the right (Cf. Table 1 of Section 6.2 in Chapter 6) and compute the RILE score as the difference between the proportions with which these two sets of categories occur in the manifestos. The score gives a final value between -1 and 1 and tells how left or right a given party is according to its manifesto. This measure has been repeatedly criticized because it is inflexible and not easily comparable over time (Flentje et al., 2017). The Chapel Hill Expert Survey mentioned in Section 2.2 also places parties on a left–right scale. In their approach, expert annotators assign a score between 0 and 10, and the final scale is determined by averaging these scores across all annotators. Prior to this thesis, only one study had directly attempted to automate the task of political scaling. Subramanian et al. (2018) use a two-step approach with a hierarchical bi-LSTM to predict both fine- and coarse-grained positions, and then convert them into scaling scores with probabilistic soft logic. Besides them, we argue that the other aforementioned methods only perform the task of political positioning, as explained later in Section 2.4.3. 2.4.2. Scaling at a policy issue level Although not discussed in the political science literature, I argue in this thesis that we can also scale parties or political actors on specific policy issues.
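An issue-level scale can be computed with the same proportion-difference logic as a RILE-style score, restricted to statements about a single issue. The sketch below is only illustrative; the issue and stance labels, and the annotated statements, are hypothetical:

```python
def issue_scale(labeled_statements, issue):
    """Scale a party on one policy issue as (favor - against) / total,
    mirroring the RILE-style proportion difference at the issue level."""
    relevant = [s for s in labeled_statements if s["issue"] == issue]
    if not relevant:
        return None  # the issue is not addressed: no position can be derived
    favor = sum(s["stance"] == "favor" for s in relevant)
    against = sum(s["stance"] == "against" for s in relevant)
    return (favor - against) / len(relevant)

# hypothetical annotated manifesto statements of one party
statements = [
    {"issue": "migration", "stance": "favor"},
    {"issue": "migration", "stance": "against"},
    {"issue": "migration", "stance": "against"},
    {"issue": "economy", "stance": "favor"},
]
scale = issue_scale(statements, "migration")  # (1 - 2) / 3
```

The `None` case makes explicit that issue-level scaling presupposes segmenting or labeling the text by issue first, a point taken up below.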
The main reason for arguing in favor of this definition is that there is also a pre-defined scale on which we analyze parties. For example, we can place parties on a scale that indicates whether they are rather in favor of or against migration or government expenditure (Cf. the example in Figure 2.2). When examining a particular policy issue, we focus on a selected group of related policies. While this method is similar to ideological scaling, it differs in specificity: ideological scaling considers policies across different issues, whereas this approach concentrates on policies within a single, specific topic area. This type of analysis is more fine-grained because it allows us to understand along which dimensions parties align or diverge the most. It sheds light on the reasons why some parties are closer to others at an aggregated level of the analysis, adding a layer of interpretability to the global positioning of parties. Additionally, by segmenting the texts according to policy issues, it becomes feasible to incorporate the concept of salience into the analysis. The extent of the discussion on a particular issue can serve as an indicator of its relative importance to a party, compared to other issues and to the priorities of other parties (Epstein and Segal, 2000). The text segmentation is not part of the task of political scaling, but it is necessary to investigate the issues of interest. Studies that focused on policy issues in political positioning have so far not proposed a method to segment texts automatically. They either manually adapt the reference texts to texts corresponding to specific policy issues (Laver et al., 2003) or they manually identify spans of manifestos that discuss policy issues such as the economy (Slapin and Proksch, 2008).
Another strategy is to take the entire text into account and evaluate the correlation with ground truths that position parties on a European integration scale (Glavaš et al., 2017). This last work, however, does not provide insights into the specific issue of European integration; it only measures whether party positions at an aggregated level correlate with the positioning within the specific issue of European integration. 2.4.3. Political positioning The task of political positioning, on the other hand, is not dependent on a scale that encompasses a limited and pre-defined set of policy issues. Its objective is to identify party positions based on an undefined set of policy issues or policies. For example, when analyzing economic texts, we assess the extent of similarity or dissimilarity among political parties regarding this specific issue. Conversely, if the analysis encompasses the entire manifesto, we evaluate the extent to which parties align or diverge across the various issues addressed in their manifestos. The computational methods discussed in Section 2.3.2 all address political positioning. Wordscores attempts to do so by basing its left–right dimension on reference texts (Laver et al., 2003), while Wordfish computes party positions based on entire manifestos and compares the frequency of words between pairs of parties (Slapin and Proksch, 2008). Slapin and Proksch (2008) even state that “using the entire manifesto text as data, we expect this dimension to correspond to a left-right politics dimension, which we confirm by comparing the results to other estimates of left-right positions”, meaning that they expect the results to correlate with a certain scale (left–right), but it is not a direct scaling on this specific dimension.
I argue that it can only be considered scaling if their method first selects the parts of the text that belong to the left–right dimension, and then performs scaling on these spans of text. In countries that do not have a predominantly strong left–right positioning across policy issues, for example, the results of such an analysis would not correlate highly with the left–right scale. Indeed, their study is limited to the context of parties in Germany. In the same manner, the methods proposed by Rheault and Cochrane (2020) and Glavaš et al. (2017) also measure party positions rather than their scaling. They run their analysis on entire documents, not selecting the parts of the text relative to issues that are related to the left–right scale. They, however, evaluate their results against this scale. In the political science literature, a similar approach to the task of political positioning is the estimation of ideal points. The focus is not only on the aggregation of preferences from parties, but also from lawmakers. A lawmaker’s political stance can be represented by a numerical value called an ideal point, which distills their political preferences into a single quantitative measure. In the past, only roll call votes were utilized for this type of analysis, but later on textual information was added to make the analysis more contextual and meaningful. Vafa et al. (2020), for example, perform text-based ideal point estimation with tweets from US presidential candidates. Gerrish and Blei (2011) and Lauderdale and Clark (2014) instead explore legislative texts and opinion texts from the U.S. Supreme Court, respectively. These studies rely on bag-of-words representations combined with topic modeling methods.

Table 2.1.: Main differences between the tasks of positioning and scaling drawn from our findings. Scal. stands for scaling and Posit. for positioning.

Characteristic                          Scal.   Posit.
Needs a pre-defined set of policies     yes     no
Needs text segmentation                 yes     no
Interpretation of positions             easy    hard
Generalization across time & country    easy    hard
Inclusion of new topics                 hard    easy

2.4.4. (Dis)Advantages of scaling and positioning Both approaches to analyzing political actors – scaling and positioning – present distinct advantages and disadvantages. Table 2.1 summarizes the differences drawn from our findings while working on methods for both tasks. We discuss these aspects in detail in the subsequent part of this subsection. Firstly, scales are not consistently defensible across different countries or historical periods. Political science research has demonstrated the sensitivity of the left–right scale to variations in both geographical and temporal contexts (König et al., 2013; Flentje et al., 2017). Similarly, expert agreement on what constitutes one side of the scale and the other has proven to be debatable and controversial among political scientists (Slapin and Proksch, 2008; Flentje et al., 2017). On the other hand, having a scale with pre-defined policy issues can be an advantage because it can potentially create a basis for comparison of results across time and countries (provided that the policy issues that constitute a scale have been agreed on). When measuring a given scale – which encompasses the same issues – it is possible to compare the resulting values across time. A comparable analysis can be conducted across various countries to observe the ideological alignment of parties on an international scale. This approach can yield valuable insights, particularly when examining political blocs such as the European Union.
In the case of the positioning task, the results might not be reliably comparable across different countries because we cannot ensure that parties debate similar policies across nations (unless we classify them beforehand). Each country's political context and agenda can lead to significant variations in the issues being prioritized and discussed. Similarly, positioning cannot be easily transferred across different text genres, both because the issues and policies being addressed may differ and because additional factors related to stylistic differences in text and the inherent political characteristics of each country can pose challenges (Burnham, 2024). The latter point, however, is also a problem for the task of political scaling.

Another drawback of scaling is the challenge of incorporating newly identified policy issues. This was, for instance, the case with many COVID-19-related policies, which did not exist in the MARPOR categories. This is challenging both for annotators, who need to be retrained to annotate unseen data, and for computational models, which must generalize over new topics that have not yet been annotated. Political positioning, on the other hand, avoids this problem because it can work with any text at hand without additional annotations. For example, all parties mention COVID-19-related policies in their manifestos following the pandemic's start, given the issue's relevance to the political spectrum.

A disadvantage of political positioning is that it is harder to interpret because its dimensions are not readily defined. When all the information is taken into account, such as when entire manifestos are processed, it becomes challenging to explain what is causing the (dis)similarity between parties.
In the scaling approach, the assessment is better informed because the policy issues involved in the analysis are more clearly defined and fewer in number. This can, however, be mitigated in political positioning if the text for the analysis is separated into extracts that belong to defined policy issues (e.g., economy or migration).

Finally, given that scaling is based on a pre-defined set of policy issues, the text under analysis needs to contain only this set of policy issues. This requires segmenting the text into sections that exclusively address these policy issues, achievable through either classification or clustering. In contrast, the task of positioning does not require differentiating which parts of the text should be included in the analysis.

Challenges in policy issue scaling. The new challenge in this setup is limited data availability. In other words, there is usually not a substantial amount of data from the same time period that addresses a specific policy issue. The scarcity of data may lead to increased sensitivity due to the reduced number of data points in the pairwise similarity of sentences between parties. Another consequence is the difficulty of capturing nuanced shifts in party positions over time. When data is sparse, especially for less prominent issues, the model may struggle to accurately reflect subtle changes in positioning, leading to potential oversimplifications in the analysis. Still related to data, parties may have different data distributions depending on the policy issue. As saliency theory suggests, parties emphasize the topics most relevant to them by writing more about them; there may therefore be policy issues with extensive documentation for some parties compared to others, making the modeling of the positioning less reliable.
When the positioning analysis is done at an aggregated level, these differences cancel each other out, resulting in less variance in the results. Finally, evaluating positioning at a more fine-grained level is challenging due to the difficulty of finding ground truth data that precisely corresponds to the same policy issues. As a result, we might need to rely on a manual inspection of the results.

As previously noted, text segmentation is not inherently part of the positioning task, but it may be required depending on the context so that positioning can take place in each of the segments. Text segmentation can be done via classification or by clustering spans of the text. In the former case, a substantial volume of annotations is needed to train a classifier that can predict the policy issue a text span belongs to. The latter requires annotations for the evaluation of the resulting clusters. Alternatively, this process can be done manually by selecting the parts of a text that belong to certain policy issues.

Scaling or positioning? Whether to implement positioning or scaling depends on the scope of the application and the data available. One might be interested in specific scales and how parties shift within a predefined set of issues; scaling is more appropriate in this case. Positioning is more appropriate if the goal is to understand how close parties are to one another in general – without a reference scale. Alternatively, there could be a significant amount of text pertaining to a newly emerging issue that has not yet been included in a codebook, or there may not be enough annotated data to segment texts correctly. In both cases, positioning is better suited for implementation.

2.5. Annotation and text representation for mining political opinions

2.5.1.
Annotated Data

Party manifestos, or electoral programs, are among the most informative sources regarding parties' policies. They outline parties' views, intentions, and motives for the upcoming years. Since these texts aim not only to inform but also to persuade potential voters in a competitive environment (Budge et al., 2001), they offer valuable insights into the parties' positions on various policies due to their direct expression of party opinions. The emphasis given to issues in the manifestos can also hint at the policies that parties consider most relevant for their campaign, with more space dedicated to them according to the saliency theory framework (Budge, 2001; Dolezal et al., 2014). Consequently, they are widely used in political science research. Manifestos are examined to explore the similarities between parties on various policies (Budge, 2003), predict potential party coalitions (Druckman et al., 2005), and assess how well parties align with voters' worldviews (McGregor, 2013).

Despite being a great resource because of its detailed annotations, the MARPOR dataset is underexplored by the NLP community. It is a large dataset consisting of 5151 annotated manifestos from over 67 countries across several continents, making it the largest dataset in the political science domain. The codebook has 7 broad issue domains (cf. Table 1 of Section 4.3 in Chapter 4 for examples) and 143 fine-grained categories that belong to the broad domains (examples in Table 1 of Section 5.2 in Chapter 5). The categories are labeled based on policies and may include the stance on the policy. Within the domain of external relations, for example, there are two labels for Military – Military: Positive and Military: Negative – because parties may argue for more or for less military funding, while the category Peace has only one side because parties do not argue against it.
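Under the saliency framework described above, a category's salience is simply the share of a manifesto's labelled (quasi-)sentences that it accounts for. A minimal sketch (the labels below are made up for illustration):

```python
from collections import Counter

def issue_salience(labels):
    """Salience of each category as the share of a manifesto's
    labelled (quasi-)sentences that it accounts for."""
    counts = Counter(labels)
    total = len(labels)
    return {category: n / total for category, n in counts.items()}

# Hypothetical per-sentence annotations for a single manifesto:
labels = ["Military: Positive"] * 3 + ["Peace"]
print(issue_salience(labels))  # {'Military: Positive': 0.75, 'Peace': 0.25}
```

A party devoting three quarters of its manifesto to a category is, under this reading, signalling that the issue is central to its campaign.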
The detailed annotations allow researchers to understand the salience of the issues emphasized by parties and also their positioning towards certain policies (e.g., the positive and negative labels within the military policy issue). On the one hand, the annotated categories provide a straightforward way to analyze political positions, within categories that contain negative and positive stances, in terms of issue salience (Epstein and Segal, 2000). On the other hand, they can be analyzed under a low-dimensional ideological framework. The most prominent approach in this latter case is the RILE index (Laver and Budge, 1992; Budge, 2013; Volkens et al., 2013). The RILE index is calculated by taking the difference between the proportions of categories associated with left-wing and right-wing positions that occur in the manifestos (Table 1 of Section 6.2 in Chapter 6 illustrates the RILE categories). It has consistently been used in publications and continues to be a standard reference scale for party positioning, despite numerous proposals for improvement or replacement through both theory-based and data-driven approaches (Cochrane, 2015; Mölder, 2016; Flentje et al., 2017).

The annotations of MARPOR have been a valuable resource for answering our overarching research questions. We make use of the annotations from the lower to the higher level of granularity. We used the broad annotated domains for fine-tuning the models with in-domain data. We utilized the fine-grained annotations across countries for training and evaluating classifiers for the scaling task in a multilingual setup. Finally, we also used the labels for computing party positions and the RILE score as ground truths for our evaluation.

2.5.2.
More informed text representations

Both the tasks of positioning and scaling can be seen as a text representation problem, as we are dealing with the challenge of converting textual data into structured formats that capture the semantic and syntactic properties of the different political opinions, allowing us to measure the (dis)similarities between them.

Models based on static word embeddings (Glavaš et al., 2017; Rheault and Cochrane, 2020) already show a jump in performance in comparison with the bag-of-words models previously used in the task of identifying the positioning of political actors. Word embeddings, such as those produced by GloVe (Pennington et al., 2014) and Word2Vec (Mikolov et al., 2013), have numerous advantages: they capture semantic relationships between words better, incorporate contextual information, and provide an efficient, compact vector representation of words. Lastly, they are one of the first breakthroughs for transfer learning: they can be used without being trained from scratch. This allows leveraging knowledge from large in-domain corpora, enhancing performance especially when labeled data is limited.

Next, the NLP landscape was taken over by contextualized word embeddings based on the Transformer architecture, e.g., BERT, RoBERTa, or GPT-3 (Devlin et al., 2019; Liu et al., 2020; Brown et al., 2020). This type of representation has improved the performance of multiple NLP tasks by capturing corpus-specific word usage and allowing for fine-tuning that is relatively easy and low in computational resource demands in comparison with training models from scratch. This significantly enhances the quality of token representations. BERT's and RoBERTa's original architectures, for example, encode representations not only at the token level, but also at the sentence level, with the classification token (CLS).
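The two sentence-level readouts in play here can be illustrated with a toy example, where random vectors stand in for a transformer's token-level hidden states:

```python
import numpy as np

# Toy stand-in for a transformer's last hidden layer:
# 6 tokens, each a 768-dimensional vector (random, for illustration only).
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(6, 768))

cls_embedding = hidden_states[0]             # readout at the [CLS] position
mean_embedding = hidden_states.mean(axis=0)  # average over all token vectors
```

Both readouts yield a single vector per sentence; which one produces a usable sentence representation is the question taken up next.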
However, the CLS token representation has been shown to be ineffective for similarity tasks because it was originally trained for next-sentence prediction. One proposed solution was to average the representations of all tokens in a given sentence (May et al., 2019; Qiao et al., 2019), but even the simpler and computationally cheaper GloVe embeddings performed better at similarity tasks (Reimers and Gurevych, 2019a).

Studies suggest that a model such as Sentence-BERT (SBERT; Reimers and Gurevych, 2019a) is more suitable for similarity tasks – which are the basis of our methods for computing political positioning. SBERT is based on BERT (Devlin et al., 2019) or RoBERTa (Liu et al., 2020) representations, but it outperforms these models in such tasks because it is further trained to place similar sentences in proximity to one another in the semantic space, producing more semantically meaningful sentence representations. It uses a Siamese network architecture with the objective of minimizing the distance between similar pairs of sentences and pushing dissimilar pairs apart in the semantic space. This is optimized by the triplet loss function shown below:

max(∥S_a − S_p∥ − ∥S_a − S_n∥ + ϵ, 0)   (2.1)

where the triplet is composed of anchor (S_a), positive (S_p), and negative (S_n) sentences, such that S_a and S_p are more similar to each other than S_a and S_n. The margin ϵ guarantees that S_p is at least ϵ closer to S_a than S_n is.

Given that SBERT has significantly advanced the field of sentence encoding, this thesis aims at evaluating the potential of SBERT in the political domain. To our knowledge, this is the first study to apply and assess SBERT models within the political domain. Our findings indicate that SBERT is highly adaptable to domain-specific data.
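The objective in Eq. 2.1 can be written out directly. A minimal NumPy sketch (SBERT training operates on batched embeddings, but the arithmetic per triplet is the same):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Eq. 2.1: the positive must be at least `margin` (epsilon)
    closer to the anchor than the negative is; otherwise a loss
    proportional to the violation is incurred."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([0.0, 0.0])   # anchor sentence embedding
p = np.array([1.0, 0.0])   # positive: distance 1 from the anchor
n = np.array([0.0, 5.0])   # negative: distance 5 from the anchor
print(triplet_loss(a, p, n))  # 0.0 -- the margin is already satisfied
```

Swapping the roles of the positive and negative sentences produces a positive loss, which is what drives dissimilar pairs apart during training.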
Although the vanilla SBERT model performs effectively with non-English languages such as German, its performance is greatly enhanced through fine-tuning with domain-specific texts, like manifestos, thereby improving its usefulness for our task. In the experiments, the fine-tuning regime takes into account both meta-level information about the political documents (party ids) and the extensive annotations from MARPOR. This allowed us to keep a weakly supervised regime with in-domain data in order to assess what works best in the context of political positioning. Further details are given in Chapters 4 and 5.

In this thesis, we further explore the optimization of sentence representations with post-processing. Research has explored the extent of anisotropy in the distribution of representations within transformer language models and its potential influence on the performance of similarity tasks (Ethayarajh, 2019; Gao et al., 2019). Anisotropy causes the sentence embeddings to occupy a narrow cone in the representation space, so that even two random vectors appear highly similar to one another. Given this fact, we also experiment with a simple yet effective post-processing method proposed by Su et al. (2021) to mitigate this effect. We employ and evaluate the embeddings before and after the whitening transformation. Results show that the transformation yields higher performance on the task. More details are provided in Chapters 4 and 5.

2.6. Overview of contributions and publications

In the following, I describe our contributions and a summary of each publication that contributes to the automation of mining political opinions investigated during this thesis.

2.6.1. Contributions

In terms of political positioning, we develop novel methods for computing party positions based on text similarity. We propose two approaches that vary in the level of annotation required – contrasting scenarios with and without annotations.
We propose methods for fine-tuning state-of-the-art transformer-based sentence embedding models with in-domain data so that the representations are more informative for the domain of political texts.

No fully automated approaches had been proposed for identifying positioning in relation to policy issues. Therefore, we propose an end-to-end pipeline for this purpose. More specifically, we work on a scenario where newly published manifestos have no annotations – simulating the real-world case where manifestos need to be analyzed immediately after their release. The research gap concerns both the segmentation of texts according to the policy issues they belong to, and party positions within these dimensions. Our pipeline consists of two stages. The first stage involves a classifier that categorizes manifesto sentences based on their corresponding policy issue. The second stage is an unsupervised text similarity method for identifying party positions within these issues – which is inspired by the approach we developed for the positioning task, and includes a dimensionality reduction component.

Regarding scaling, we explore supervised methods for the task using state-of-the-art models. We evaluate how classifiers using transformer-based model representations that take short and long input perform in this task. Our approaches are designed to evaluate real-world scenarios, such as when annotations are not available for a country or a time period. Our objective is also to understand to what extent we can use existing annotations to perform political scaling at large scale across several languages, including low-resource ones.

2.6.2. Unsupervised methods for party positioning

Below are the key points discussed in the paper presented in Chapter 4 regarding party positions with unsupervised methods.
Objectives

Given the context introduced in Chapter 2, this paper aims at developing and evaluating unsupervised and weakly supervised methods that capture the positioning of political parties. Our investigation has three main objectives. The first lies in understanding to what extent we can reliably determine the positioning of political parties with unsupervised methods and what type of text representation best tackles this task. The second regards the annotations – to what extent we can forgo annotations in this task. And finally, the last objective pertains to evaluating the level of discourse structure that best captures the similarity between parties – whether positioning is best captured with only the sentences that contain claims or with all sentences.

Proposed methodology

We develop and compare two methods for measuring the distance between parties based on their manifestos, which differ in the amount of information included in the modelling of the positioning. In the first scenario, we assume that there is enough annotation regarding the policy domains that the sentences of the manifestos belong to; this information is thus included in the function that measures distances. We posit that language models may find it easier to determine the proximity between parties by comparing sentences from corresponding topics, or in our case, policy issues. Taking this into account, we propose a domain-based approach, which computes the distance between parties as the pair-wise distance between pairs of manifesto sentences that belong to the same domain. The final distance between a pair of parties is the average of all such distances. To contrast with that and evaluate the limits of capturing positioning without annotations, we develop a second approach to computing the distance between parties, called twin-matching.
In this approach, the distance between a pair of parties is calculated from the pair-wise similarities between all sentences from both parties, where only the highest-similarity pair enters the function. This is normalized by the highest pair-wise similarity between sentences from the same manifesto for each of the parties in the pair (refer to § 4.4.2 for more details). We hypothesize that this step offers a substitute for information about the policy issue in the absence of annotations.

In order to answer the question regarding discourse structure, we evaluate two setups. In the first setup, we take into account all sentences from the manifestos. We argue that this scenario is less informative because it does not discriminate between sentences that may or may not contain stances. The second setup is more informative because it only considers claims in the function. We posit that claims are already charged with a party's positioning towards a topic and that they contain the essential aspects of its proposed policies, given that political claims contain a demand (Koopmans and Statham, 1999). For that, we run a claim classifier that predicts which sentences in the manifestos contain a claim. Both discourse structures (all sentences and claims) are evaluated under the domain-based and twin-matching similarity computation approaches, leading to a total of four setups for comparison.

Besides that, we focused our evaluation on the text representations. We evaluated 6 word or sentence embedding models in the 4 setups – from the simple static word embeddings of fastText to SBERT, both vanilla and fine-tuned on in-domain data with manifestos from previous elections. The two fine-tuned models are SBERTparty, which uses relations extracted from the party ids of the manifestos for fine-tuning, and SBERTdomain, which uses the domain annotations from the manifestos.
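The two distance computations can be sketched as follows, using cosine similarity on made-up sentence vectors; this is a simplified sketch, and in particular the twin-matching version omits the within-party normalization described in § 4.4.2:

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def domain_based_distance(a_by_domain, b_by_domain):
    """Average pair-wise distance between the sentence vectors of two
    parties, computed within each shared policy domain and then
    averaged across domains (simplified sketch)."""
    per_domain = []
    for dom in a_by_domain.keys() & b_by_domain.keys():
        dists = [1 - cosine(u, v)
                 for u in a_by_domain[dom] for v in b_by_domain[dom]]
        per_domain.append(np.mean(dists))
    return float(np.mean(per_domain))

def twin_matching_distance(a_sents, b_sents):
    """For each sentence of party A, keep only its most similar 'twin'
    in party B; the party distance is the mean over these best matches
    (the within-party normalization of Section 4.4.2 is omitted here)."""
    best = [max(cosine(u, v) for v in b_sents) for u in a_sents]
    return float(1 - np.mean(best))

s1, s2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(twin_matching_distance([s1], [s1, s2]))  # 0.0 -- a perfect twin exists
```

The domain-based variant needs sentence-level policy labels to build `a_by_domain`/`b_by_domain`, whereas twin-matching runs on the raw sentence sets; that asymmetry is exactly the annotation trade-off the two methods are designed to probe.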
Among all models, there were multilingual and German monolingual models. Because the representations derived from transformers follow an anisotropic distribution (where two random representations have high similarity), we experimented with post-processing the representations with the whitening transformation, as suggested by Su et al. (2021). All setups are evaluated against party positions according to the parties' answers in the voting advice application (Wahl-o-Mat) from the same year as the analyzed manifestos.

Main findings

Firstly, the multilingual SBERT model is the best performing one (i.e., the one with the highest correlation to the ground truth), confirming the strong performance of the SBERT family of models on similarity tasks, as shown by Reimers and Gurevych (2019b). Then, we observe that nearly all representations were improved by post-processing, suggesting that transforming the space to an isotropic distribution improves performance on domain-specific tasks, such as in the case of political debate. The vanilla version of SBERT performs best in the more informative case with information from the domain (using the domain-based approach), while the fine-tuned SBERTparty correlates better with the ground truth in the absence of domain annotations, suggesting that the in-domain information embedded in the model helps in capturing similarity when the data to be modelled lacks domain specificity. Moreover, we observe no significant difference between using all sentences and claims only, suggesting that claims are not the only discourse structure reinforcing party positions – which is in line with the finding that justifications also matter in the analysis of party positions (Blokker et al., 2022).
Lastly and most strikingly, among the similarity computation approaches, the best results are obtained with the twin-matching approach (with the fine-tuned SBERTparty), reaching a correlation of 0.70. These findings validate the notion that NLP techniques can be employed to identify the (dis)similarity between parties based on their policy stances, using a combination of unstructured discourse and in-domain sentence representations.

2.6.3. Unsupervised methods for party positioning at a policy issue level

Below we highlight the main points of the paper presented in Chapter 5 on party positions at an aggregated level and within policy issues.

Objectives

Following the previous study on party positioning and its promising results at an aggregated level of information, we aim at understanding the extent to which positioning can be reliably carried out at the policy issue level. This requires working with the specific parts of the manifestos that discuss specific issues, e.g., migration, economy, and education. In this paper, the objective is to expand on the previously developed methods and evaluate their limits in terms of granularity. We contend that our approach adds interpretability to party positions by shedding light on the issues within the political spectrum on which parties exhibit agreement or disagreement. We propose a workflow that segments the manifestos based on clustering. We then classify unseen data into these newly created labels, which represent coherent policy issues. Then, we identify the positioning of political parties within each policy issue. Given our objectives, we evaluate each stage of the workflow under the condition that annotations are absent for a collection of manifestos. This evaluation aims to gauge the reliability of our approach for contexts where annotations for the manifestos of forthcoming elections may be unavailable.
Proposed methodology

In order to reach the objectives stated above, we propose a methodology aimed at estimating party similarity within policy issues while addressing the inherent constraints. This methodology comprises several stages: (a) defining appropriate policy issues, (b) automatically labeling domains if manual labels are unavailable, (c) computing similarities at the domain level and aggregating them globally, and (d) extracting understandable party positions on significant policy axes using multidimensional scaling.

In the first step of the workflow (a), we aim to define broad categories of policy issues that are not yet satisfied by the MARPOR annotations. We argue that the MARPOR annotations are either too broad (in the case of the 7 domains) or too fine-grained, given that many categories even contain a stance label. Therefore, our initial step involves breaking down the manifestos into coherent segments, which we define as policy issues. These domains need to be coherent and easily understandable within the context of policies to facilitate our goal. In addition, they must remain impartial in terms of stance: categories representing opposing viewpoints (such as positive and negative stances on a particular issue like immigration) should fall under the same policy issue. The granularity of these domains is crucial, as they should offer sufficient detail to provide meaningful insights on policy issues, but should not be so detailed that practical classification becomes unfeasible. In order to create a new level of labels that falls in between the broad domains and the fine-grained categories of MARPOR, we compute the pairwise distance between all pairs of sentences belonging to the fine-grained MARPOR categories from German manifestos.
This results in a distance matrix with the MARPOR categories as rows and columns. Then, we run agglomerative hierarchical clustering to group similar MARPOR categories into the same cluster. These clusters are manually named according to the categories that fall into them. This process can be seen as a third level of annotation for the manifestos. For instance, sentences that were annotated as Military: Negative, Peace, and Military: Positive are now within the policy issue of military and peace.

In the second step (b), we train a classifier (referred to as the policy issue labeller) with two different training data settings: DEtrain is trained with manifestos from Germany only and DACHtrain with manifestos from all German-speaking countries. We choose this setup to evaluate whether more data can improve the performance of the classifiers. Three classifiers with either SBERT or RoBERTa representations and a classification head on top are trained and evaluated under these two setups.

After having predicted the labels of the manifestos with the best performing policy issue labeller, we use a strategy similar to that of the previous study (the domain-based approach) to calculate the similarity of parties within domains, as proposed in the third step (c). The distance matrices of the policy domains are averaged in order to capture the positioning at an aggregated level. We correlate the distance between parties with the distance matrix derived from the MARPOR categories – considered our ground truth in this case.

In the last step (d), we run a dimensionality reduction strategy (principal component analysis) on the individual distance matrix of each policy issue. We visualize the values of the first principal component on a scale to inspect party positions within policy issues. This allows us to understand how closely related parties are on each topic.
Finally, we evaluate 4 different models for sentence representations in stages (c) and (d). As in the previous study, we post-process the representations with the whitening transformation, since it consistently boosts performance in comparison with no post-processing.

The evaluation of the pipeline varies for each step so that we can assess to what extent annotations can be forgone. Step (a) is inspected manually, given that we do not have ground truth for the newly mapped domains. In step (b), we evaluate the policy issue labeller against the mapped MARPOR annotations. In step (c), party positions are evaluated with political science knowledge about the stances and ideologies of parties within each domain in the German political spectrum. The predicted scenario for step (d) is only evaluated on the accuracy of the classifier, where we identify which domains are successfully classified and which ones the models struggle with the most. That is a proxy for which domains can be reliably used in an analysis without annotated data.

Main findings

Our manual inspection shows that the clustering strategy employed in step (a) resulted in 13 clusters that match the demands we initially pre-defined for solid domains. That is, all positive and negative categories belonging to the same topic fall into the same cluster, and the clusters themselves fit into well-known policy issues (Benoit and Laver, 2006; Jolly et al., 2022).

In the second step, we evaluated three transformer-based models for classifying the newly mapped policy domains with two different data regimes. RoBERTaxlm+MLP reached the best performance in both regimes, with 62.5% and 64.5% accuracy in DEtrain and DACHtrain, respectively.
The increase in the amount of data (from other German-speaking countries) helped the classifier by only 2 points, suggesting that classifying policy issues remains a hard task for models regardless of the amount of training data. Moreover, the 2-point improvement also suggests that annotated data from other countries can be used for this classification task, although the gains in performance are low.

The results of positioning at an aggregated level in the predicted scenario achieve a very high correlation against both ground truths – when comparing the first principal component against the RILE index, and the distance matrix of the similarity computation against the distance matrix computed with MARPOR categories. While in the annotated setting the best representations reach correlations as high as 0.94 and 0.84 against RILE and MARPOR respectively, the pipeline including the label classifier reaches 0.79 and 0.80. The best representations are again the fine-tuned SBERT models. This time, though, the best fine-tuning strategy is the one where the model is optimized to place sentences from the same MARPOR high-level domain close together, our model SBERTdomain. This suggests that even though the predictions are not extremely accurate, the in-domain knowledge embedded through fine-tuning helps the model estimate the domains during the similarity computation.

Lastly, the final step of the workflow is partially successful. With our dimensionality reduction technique, we show that parties indeed do not follow the same left–right scaling in all policy issues, as expected (Heywood, 2021). According to expert domain knowledge, the results reflect certain well-established aspects of German politics. For instance, in the domain of foreign relations, EU, and protectionism, which exhibits only a moderate correlation with the left–right spectrum, the AfD stands out compared to other parties.
This deviation can arguably be attributed to its opposition to EU membership and its differing stance on relations with Russia, setting it apart from other parties clustered within the same ideological position. Another instance is evident in the domain of education and technology, where the AfD and Die Linke, typically positioned at opposing ends of the left–right spectrum, surprisingly share significant common ground in advocating for expanded education and investment in technology and infrastructure. On the contrary, in domains such as military and peace and immigration and multiculturalism, party positions closely align with the broader left–right scale, with right-leaning parties exhibiting more militaristic tendencies and greater aversion to immigration. Finally, we check the performance of the policy issue labeller for each label, given that in a scenario without annotations, only the positioning within highly performant policy issue