Methods for Mining Political Opinions from Texts and Large Language Models

Von der Fakultät Informatik, Elektrotechnik und Informationstechnik der Universität Stuttgart zur Erlangung der Würde eines Doktors der Philosophie (Dr. phil.) genehmigte Abhandlung.

Vorgelegt von Tanise Pagnan Ceron aus Turvo (SC), Brasilien

Hauptberichter: Prof. Dr. Sebastian Padó
Mitberichter: Prof. Dr. Katherine A. Keith
Tag der mündlichen Prüfung: 14. Oktober 2024

Institut für Maschinelle Sprachverarbeitung der Universität Stuttgart, 2025

Erklärung (Statement of Authorship)

Hiermit erkläre ich, dass ich die vorliegende Arbeit selbständig verfasst habe und dabei keine andere als die angegebene Literatur verwendet habe. Alle Zitate und sinngemäßen Entlehnungen sind als solche unter genauer Angabe der Quelle gekennzeichnet.

I hereby declare that this text is the result of my own work and that I have not used sources without declaration in the text. Any thoughts from others or literal quotations are clearly marked.

(Tanise Pagnan Ceron)

Zusammenfassung

In demokratischen Gesellschaften ermöglicht die Meinungsvielfalt, dass Individuen ihre Meinung ausdrücken und sich mit unterschiedlichen Perspektiven auseinandersetzen können. Diese Arbeit untersucht politische Meinungen aus zwei Perspektiven: Texte und Modelle. Hierbei werden sowohl ideologische Positionen als auch Präferenzen für politische Themen untersucht. Während die ideologische Analyse gut etabliert ist, stellen die Präferenzen für politische Themen ein nuancierteres, wenig erforschtes Forschungsgebiet dar. Die Untersuchung der Meinungen politischer Parteien ist unerlässlich, um die Wahlentscheidungen der Wähler:innen, die politische Entscheidungsfindung und die Verschiebungen in den Parteiprogrammen im Laufe der Zeit zu verstehen. Im ersten Teil dieser Arbeit konzentriere ich mich auf Methoden zur Erkennung politischer Meinungen aus Parteiprogrammen.
Die Automatisierung der Identifizierung politischer Meinungen hilft bei der Verarbeitung großer Datensätze, minimiert die Annotationszeit und bietet zeitnahe Aktualisierungen zu neu veröffentlichten Informationen von Parteien. Ich untersuche, wie genau Parteipositionen aus Texten mit minimalen Annotationen identifiziert werden können und wie detailliert dieser Prozess ist. Wir untersuchen auch, inwieweit Parteipositionen in großem Umfang über verschiedene Sprachen und Länder hinweg identifiziert werden können. Die Ergebnisse zeigen, dass bei der Identifizierung von Parteipositionen zwischen den Aufgaben der politischen Skalierung und Positionierung unterschieden werden kann, die erhebliche Unterschiede in Bezug auf Bewertung und Anwendung aufweisen. Darüber hinaus deuten die Ergebnisse darauf hin, dass die Verbesserung der Textrepräsentationen durch domäneninternes Fine-tuning die Leistung erheblich verbessert, wenn Methoden von der Textähnlichkeit abhängen. Außerdem wird durch die sprachübergreifende Skalierung von Parteien mit mehrsprachigen Modellen eine hohe Leistung erzielt.

Sprachmodelle sind mit dem Aufkommen von LLMs zu meinem Forschungsgegenstand geworden und werfen neue Fragen hinsichtlich der in ihnen eingebetteten und reproduzierten Vorurteile auf. Angesichts der Wichtigkeit, politische Vorurteile in LLMs zu beleuchten, befasst sich der zweite Teil dieser Arbeit mit der Bewertung und Identifizierung politischer Vorurteile in LLMs. Unsere Forschungsfragen konzentrieren sich auf die robuste Bewertung von LLMs auf Vorurteile und die Identifizierung der politischen Vorurteile in Bezug auf Ideologie und Präferenzen für politische Themen. Diese Arbeit enthält Definitionen von politischer Voreingenommenheit und politischer Weltanschauung, die bei der Entwicklung von Methoden zu deren Bewertung helfen.
Darüber hinaus trägt sie dazu bei, einen Rahmen für eine robuste Bewertung von Voreingenommenheiten in LLMs und einen Datensatz zur Bewertung politischer Meinungen in LLMs zu entwickeln. Schließlich zeigen die Ergebnisse, dass Modelle mit wenigen Parametern keine zuverlässigen Antworten liefern und dass LLMs in Bezug auf einige politische Themen konsistente politische Weltanschauungen vertreten. Insgesamt unterstreichen sie die Notwendigkeit weiterer Forschung, um die Komplexität und die gesellschaftlichen Auswirkungen der Entwicklung von Modellen zu verstehen, die unterschiedliche politische Meinungen in KI-Systeme integrieren.

Abstract

In democratic societies, the diversity of opinions enables individuals to express their values and engage with differing perspectives. This thesis investigates political opinions through two lenses: texts and models, examining both ideological positions and policy issue preferences. While ideological analysis is well established, policy issue preferences represent a more nuanced, underexplored research area.

Investigating the political opinions of political parties is essential for understanding voter choices, policy decision-making, and the shifts in party agendas over time. In the first part of this thesis, I focus on methods for mining political opinions from party manifestos. Automating the identification of political opinions helps process large datasets, minimize annotation time, and offer timely updates on newly released information from parties. I investigate how accurately party positions can be identified from texts with minimal annotations and the level of detail achievable in this process. We also explore the extent to which party positions can be identified on a large scale across different languages and countries.
Results demonstrate that, in identifying party positions, a distinction can be drawn between the tasks of political scaling and positioning, which differ substantially in terms of evaluation and application. Additionally, findings indicate that improving text representations through in-domain fine-tuning significantly benefits performance when methods depend on text similarity. Finally, party scaling across languages achieves high performance with multilingual models.

Models have become my object of study with the advent of LLMs. They introduce new concerns regarding the types of biases embedded in and reproduced by them. Given the importance of shedding light on political biases in LLMs, the second part of this thesis addresses the evaluation and identification of political biases in LLMs. Our research questions center on robustly evaluating LLMs for biases and identifying the political biases regarding ideology and policy issue preferences. This thesis provides definitions of political bias and political worldview, which aid in designing methods for their evaluation. Moreover, it contributes a framework for a robust evaluation of biases in LLMs and a dataset for evaluating political opinions in LLMs. Finally, findings indicate that models with small parameter sizes are not reliable in their answers, and that LLMs do hold consistent political worldviews in relation to some policy issues. Overall, these findings highlight the necessity of continued research to understand the complexities and societal implications of developing models that integrate diverse political opinions into AI systems.

Acknowledgements

Writing and submitting this thesis marks the conclusion of a very important chapter in my life, one that has had a profound impact both on my professional career and my personal life. The learnings I have had and the fruits I have harvested would not have been possible without the support of many people around me.
Although I believe that words won’t be enough to express my gratitude for their support and inspiration, I still want to express my thanks with a few words.

First and foremost, I would like to thank my supervisor, Sebastian Padó. His guidance was always given in precisely the amount I needed throughout the entire PhD. He offered closer supervision when I needed it most and was more hands-off at the right time, when I was exploring new paths of research and career. I’m also really thankful for his support in my exploration of new opportunities that were not related to my PhD. This enabled me to grow as a researcher, broadening my research horizons and considering the impact of my work beyond academia. I am extremely grateful for his dedication to constantly providing me with valuable feedback and advice, which pushed me beyond my own limits. Thanks, Sebastian, for believing in me more than I did myself.

I would like to thank Neele for all the support during my PhD. Her close friendship was another gift from this PhD. Thanks for the walks in the forest in moments of high stress, the dancing before deadlines, the dinners, the games, the shoulder that you have offered for me to complain or cry on, and all the words of advice. I extend my gratitude to Amelie, who embarked with me on this crazy startup journey and has encouraged me to dream high. Thank you for your endless support and encouragement in all our endeavors; without them this idea would never have left the paper. Next, I would particularly like to thank Severin and Ale, my dear friends, who have been my big brothers in a foreign country. They were an important part of the reason I could call Stuttgart home after just a few months of living here. They have always offered a hand to help and have been supportive of impossible ideas. When I thought I was just sharing a silly idea, they would be more like: “It’s a fantastic idea! Yes, you can do it, Tanise.
When and how are you starting?”

Next, I would like to express my gratitude to IMS and its people. It has been an amazing experience to pursue my PhD in this environment. I am grateful for all the friends I have made in the department, the parties celebrated together, the barbecues, the laughter at lunch, the game nights, the KWT nights, pastéis de nata and canelés, the moments in which we shared our struggles and joy, and the moments of distraction that made my past three years lighter and more enjoyable. Thanks, Chris, for reading my thesis and correcting my English. And of course, thanks, Sabine Mohr, for making my bureaucratic life so much smoother at the department.

I would also like to extend my gratitude to Dmitry, my close collaborator, who has offered numerous insights during the research process of our collaborations. Additionally, I appreciate his efforts in reading and commenting on my thesis. Also, thanks for the thought-provoking discussions about world problems.

I would also like to extend my gratitude to Professor Katie Keith for kindly agreeing to be part of my committee and traveling all the way from the United States to Stuttgart for the defense. I truly appreciate it.

Lastly, I extend my heartfelt thanks to my parents who, despite the distance, have always been there for me. And who, despite coming from a simpler background, have made sure to show me the importance of education and have encouraged me to keep pursuing this path.

Table of Contents

I. Synopsis
1. Introduction
1.1. Opinions in the political arena
1.2. Political opinions in large language models
1.3. Thesis Outline
2. Political Opinions in Texts
2.1. Political opinions at low dimensionality
2.2. Fine-grained scaling at low dimensionality
2.3. Modeling political opinions
2.3.1. Motivation
2.3.2.
Computational approaches for mining political opinions
2.4. Tasks: Political scaling vs political positioning
2.4.1. Scaling
2.4.2. Scaling at a policy issue level
2.4.3. Political positioning
2.4.4. (Dis)Advantages of scaling and positioning
2.5. Annotation and text representation for mining political opinions
2.5.1. Annotated Data
2.5.2. More informed text representations
2.6. Overview of contributions and publications
2.6.1. Contributions
2.6.2. Unsupervised methods for party positioning
2.6.3. Unsupervised methods for party positioning at a policy issue level
2.6.4. Supervised methods for political scaling across countries and time
3. Political Opinions in Large Language Models
3.1. Language models
3.2. Biases in language models
3.3. Evaluation of biases in LLMs
3.4. Political biases in LLMs
3.4.1. Political bias vs political worldviews
3.4.2. Evaluation of political biases in LLMs
3.5. Overview of contributions and publication
3.5.1. Contributions
3.5.2. Evaluating political worldviews in large language models
II. Publications
4. Unsupervised Methods for Party Positioning
4.1. Introduction
4.2. Related Work
4.2.1. Party Characterization
4.2.2. Optimizing Text Representations for Similarity
4.3. Data
4.3.1. The Manifesto Dataset
4.3.2. Ground Truth: Wahl-o-Mat
4.4. Methods
4.4.1. Building Informative Text Representations
4.4.2. Four Models for Party Similarities
4.5. Experimental Setup
4.5.1. Datasets
4.5.2. Models
4.5.3. Evaluation
4.6. Results and Discussion
4.7. Conclusion
4.8. Appendix
5. Unsupervised Methods for Party Positioning at a Policy Issue Level
5.1. Introduction
5.2. Related Work
5.3. Methodology
5.3.1. Workflow
5.3.2. Policy Domain Grouping
5.3.3. Policy Domain Prediction
5.3.4. Computing Party (Dis)similarities
5.3.5. Multidimensional Scaling
5.4. Experimental Setup
5.4.1. Data
5.4.2. Policy Domain Grouping
5.4.3. Policy Domain Labelling
5.4.4. Party (dis)similarity – sentence encoders
5.4.5. Evaluation
5.5. Results and Discussion
5.5.1. Annotated Setup
5.5.2. Predicted Setup
5.6. Conclusion
5.7. Limitations
5.8. Appendix
6. Supervised Methods for Political Scaling Across Countries and Time
6.1. Introduction
6.2. MARPOR categories and political scales
6.3. Methods
6.3.1. Operationalization
6.3.2.
Problem settings
6.3.3. Dataset
6.3.4. Models
6.3.5. From regression to classification with LITs
6.3.6. Evaluation metrics
6.4. Results
6.4.1. Predicting MARPOR categories
6.4.2. Computing RILE scores
6.4.3. Error analysis
6.5. Discussion
6.6. Related Work
6.7. Conclusion
6.8. Limitations
6.9. Appendix
7. Evaluating Political Worldviews in Large Language Models
7.1. Introduction
7.2. Related Work
7.3. Reliability-Aware Bias Analysis
7.4. The ProbVAA Dataset
7.4.1. Sources
7.4.2. Policy-Domain Annotation
7.4.3. Robustness to Statement Variations
7.5. Experimental Setup
7.5.1. Models
7.5.2. Prompt Design
7.5.3. Mapping Responses onto Stances
7.5.4. Sampling-based Reliability Testing
7.6. Reliability of Model Answers
7.6.1. Experimental Setup
7.6.2. Results
7.7. Political Consistency of Model Answers
7.7.1. Experimental Setup
7.7.2. Results
7.8. Discussion
7.9. Conclusion
7.10. Limitations
7.11. Appendix
III.
Epilogue
8. Conclusion and Future Directions
8.1. Key findings and reflections
8.2. Limitations
8.3. Outlook
Bibliography

Part I. Synopsis

Publications and My Contributions

This thesis is based on four scientific publications that I co-authored together with my advisor Sebastian Padó and many excellent researchers and PhD fellows: Dmitry Nikolaev (University of Manchester), Neele Falk (University of Stuttgart), Ana Barić (University of Zagreb), Gabriella Lapesa (GESIS and University of Düsseldorf), Nico Blokker and Sebastian Haunss (University of Bremen), and all my colleagues from the MARDY project. I am grateful to all my co-authors for their substantial contributions to these pleasant and fruitful collaborations. Moreover, I thank all the other people who were not co-authors but who gave valuable feedback on my work. In the following, I detail my own contributions to each publication according to CRediT, the Contributor Roles Taxonomy.

Chapter 4 corresponds to the following publication:

Tanise Ceron, Nico Blokker, and Sebastian Padó. 2022. Optimizing text representations to capture (dis)similarity between political parties. In Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), pages 325–338, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.

In this paper, I contributed to the conceptualization of the study by developing the original research concepts, and I executed all experiments and assessments. Sebastian and I developed the methodology for calculating the distance between parties based on their manifestos. I worked on the data collection from the original source. I implemented the similarity computation and trained the classifier, followed by the evaluation at all stages.
My co-author Nico Blokker created the dataset used for evaluating the claim classifier on the test set. I authored the initial draft of the paper and carried out the majority of the revisions with the support of Sebastian. Throughout the process, I consulted with Nico, who provided valuable guidance on the political science aspects of the paper. I was responsible for creating the visualization to illustrate the method and for the project administration throughout the entire study, from conceptualization to the response to reviewers. This amounts to roughly 65% of the total work.

Chapter 5 corresponds to the following publication:

Tanise Ceron, Dmitry Nikolaev, and Sebastian Padó. 2023. Additive manifesto decomposition: A policy domain aware method for understanding party positioning. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7874–7890, Toronto, Canada. Association for Computational Linguistics.

I contributed to the conceptualization of this study by developing the overall research questions. I developed most of the methods used for answering the research questions, while my advisor Sebastian and co-author Dmitry assisted me in shaping them during the computational experiments. I executed most of the computational experiments and the analysis. Dmitry ran the models for classifying policy issues, while I was responsible for their evaluation. Moreover, I created the visualization for the methods implemented in the study. I wrote the first version of the paper and handled most of the subsequent revisions and the responses to the reviewers. My contribution to this paper amounts to approximately 60% of the total work.

Chapter 6 corresponds to the following publication:

Dmitry Nikolaev, Tanise Ceron, and Sebastian Padó. 2023. Multilingual estimation of political-party positioning: From label aggregation to long-input Transformers.
In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9497–9511, Singapore. Association for Computational Linguistics.

In this paper, I contributed to the conceptualization by formulating the research questions addressed during the study. I was very familiar with the data given my experience in the task of political positioning, so I contributed to data curation by accessing and collecting the annotated data and the ground truth used for the evaluation. I contributed to the methodology by developing the design for modeling the task of political scaling, considering different real-world use cases for a robust evaluation of the methods. I also assisted in developing the evaluation metric, given that it was not straightforward precision or accuracy as in other supervised learning models. Dmitry trained and evaluated the models. He also carried out the error analysis. Finally, I contributed to writing the first version of the text, as well as to revising and editing the second and final versions of the paper. I contributed approximately 40% of the total work for this paper.

Chapter 7 corresponds to the following publication:

Tanise Ceron, Neele Falk, Ana Barić, Dmitry Nikolaev and Sebastian Padó. 2024. Beyond prompt brittleness: Evaluating the reliability and consistency of political worldviews in LLMs. Accepted for publication at Transactions of the Association for Computational Linguistics (TACL). https://arxiv.org/html/2402.17649v2

Having developed certain knowledge of the political science field throughout my previous studies, I proposed the initial research questions to my co-authors. In this collaboration, we worked closely together on the development of the framework for reliability-aware bias analysis. Ana, Neele and I partially shared the data curation of the study by compiling the dataset together.
I was responsible for conducting the annotations pertaining to the policy issues and for collecting the human upper-bound annotations via a survey. I participated in the prompt selection and evaluation with Neele, Ana and Dmitry. Ana ran the models for generation given the prompts. I conducted all the analysis regarding policy issues and political leaning. Neele, on the other hand, analyzed the reliability of the models according to our framework. Sebastian contributed to the writing and to shaping the storyline. Finally, I contributed a substantial amount to the writing of the original draft and the revision of the paper. I led the project administration, which included the organization of meetings, keeping track of dates, the submission process, and the responses to the reviewers. Overall, my contributions amount to 40% of the work effort invested in this paper.

Throughout my doctoral studies, I had the privilege of collaborating with excellent researchers on topics both related and unrelated to this thesis. These collaborations could not fit in this thesis, but to ensure thoroughness, I include references to these papers below.

Tanise Ceron, Ana Barić, Andre Blessing, Sebastian Haunss, Jonas Kuhn, Gabriella Lapesa, Sebastian Padó, Sean Papay, and Patricia F. Zauchner. 2024. Automatic Analysis of Political Debates and Manifestos: Successes and Challenges. In Conference on Advances in Robust Argumentation Machines, pages 71–88. Cham: Springer Nature Switzerland.

Nico Blokker, Tanise Ceron, Andre Blessing, Erenay Dayanik, Sebastian Haunss, Jonas Kuhn, Gabriella Lapesa and Sebastian Padó. 2022. Why justifications of claims matter for understanding party positions. In Proceedings of the 2nd Workshop on Computational Linguistics for Political Text Analysis – KONVENS, Potsdam, Germany.

Maximilian Maurer, Tanise Ceron, Sebastian Padó, and Gabriella Lapesa. 2024.
Toeing the Party Line: Election Manifestos as a Key to Understand Political Discourse on Twitter. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 6115–6130, Miami, Florida, USA. Association for Computational Linguistics.

Tanise Ceron, Nhut Truong, and Aurelie Herbelot. 2022. Algorithmic Diversity and Tiny Models: Comparing Binary Networks and the Fruit Fly Algorithm on Document Representation Tasks. In Proceedings of The Third Workshop on Simple and Efficient Natural Language Processing (SustaiNLP), pages 17–28, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.

1. Introduction

“For your own sanity, you have to remember that not all problems can be solved. Not all problems can be solved, but all problems can be illuminated.”
Ursula Franklin, in The Real World of Technology, 1999

People are shaped by numerous factors, including their personal experiences, cultural backgrounds, education, social environments, access to information, and personality traits. These diverse influences lead to people holding various viewpoints and interpretations of the world. An environment where this diversity of opinions can thrive is not only crucial to accommodating a heterogeneous society, but it is also essential to ensure the effective functioning of democracy (Balkin, 1995, 2017). This type of environment allows individuals to articulate their values, engage with different perspectives, and learn from and share their own views with others.

Political opinions play a particularly significant role among the many types of opinions shaped by these factors. Given their impact on governance and societal norms, political opinions permeate multiple levels of our lives, including interpersonal social interactions, professional settings, and the political arena itself. In this thesis, I categorize political opinions into three levels.
At the most fine-grained level, they are stances (i.e., positions) taken by individuals regarding policies. As the example in Figure 1.1 shows, two citizens disagree on whether citizenship should be granted by birth or through long-term residence or parental ties. The second is the level of policy issues. It encompasses a set of policies related to a broader set of beliefs; Figure 1.1 illustrates the example of migration (Wlezien, 2005; Green-Pedersen and Krogstrup, 2008). This level requires some internal consistency, given that people, for example, would generally agree with policies that are either more in favor of open borders or more in favor of restrictive migration policies.

Figure 1.1.: Political opinions categorized into three levels of granularity. The colors represent the levels.

The third and broadest level is the ideological level. It refers to preferences towards sets of policy issues that belong to the ideology under analysis. One example is the left–right scale, which encompasses policy issues such as migration, the economy, and government expenditure. The overall political opinions of individuals or politicians can be captured by placing them on this type of ideological scale.

In this thesis, I work with political opinions mainly at the ideological and policy issue levels, both in the political arena and as potential political opinions reproduced by LLMs. Figure 1.2 illustrates political opinions in these two contexts. The colors represent the spectrum of existing political opinions. Each political party on the left side endorses some existing political opinions. They vary considerably because political parties act as citizens’ representatives, and in a multiparty system, they often mirror the diverse range of opinions held by the community at large. I explore methods for mining the political opinions endorsed by political entities in texts. I discuss more details in Section 1.1.
While in the first part of this thesis the object of study is texts, as Figure 1.2 illustrates, the object of study becomes large language models (LLMs) in the second part. The right side of the figure shows the political opinions that large language models tend to reproduce. Considering that AI systems are integrated into the daily lives of numerous citizens, it is increasingly relevant to analyze the types of biases embedded in these models. This understanding allows us to make well-informed decisions on designing and implementing applications for end users. Current models, for example, are one-size-fits-all models that may incorporate only a limited number of opinions, as shown by the colors in Figure 1.2. Therefore, this thesis develops methods for extracting political opinions from LLMs. These methods help us gauge the diversity of political opinions embedded in LLMs. I discuss this aspect of LLMs in Section 1.2.

Figure 1.2.: This diagram represents the political opinions of political parties and the ones manifested or reproduced by large language models (LLMs). The colors of the flags represent the distinct political opinions that the parties hold. The colors in the squares represent the spectrum of existing opinions. LLMs may run the risk of reproducing a limited number of opinions, as shown by the colors of the squares.

1.1. Opinions in the political arena

The political arena represents an environment where different political opinions are given space to flourish and compete with one another. Parties are formed by individuals who share similar political opinions. They then compete for the electorate’s attention to gain support from people who potentially share similar ideas. They articulate their opinions through various genres and modalities such as parliamentary speeches, public speeches, assemblies, forums, roll-call votes, manifestos, social media posts, and media coverage.
Having a space where parties express their preferences and ideologies and compete for the electorate’s attention is essential for democracies to thrive. Given its importance, this process has consistently attracted scholarly attention in political science, and the area has become known as party competition (Stokes, 1963). Understanding the dynamics of party competition is relevant because the results of these dynamics affect policy decisions, political engagement levels, and the quality of political representation (Baumann et al., 2021).

At the intersection of party competition and political opinions lies the line of research that investigates the positioning of political actors. This research focus is relevant for understanding the factors influencing voter choices in elections, the decision-making behaviors of political parties once they become representatives in certain political roles, and the matches and mismatches between the former and the latter (Benoit and Laver, 2006). Moreover, it is important to keep track of the extent to which parties change their agendas (and strategies) across time (König et al., 2013). Lastly, it can be employed to observe ideological shifts (McDonald et al., 2007) or to identify the political issues on which political parties are most strongly campaigning across different countries (Seeberg, 2017).

One way of analyzing the positioning of political actors is by extracting and characterizing their opinions via their ideologies and policy issue preferences. In this thesis, I investigate approaches for automatically mining political opinions from political texts, applied in the context of party positions. The focus is not on identifying individual policies and the stances of the parties towards single policies (the most fine-grained level shown in Figure 1.1). I also do not focus on detecting and categorizing the argumentation of parties.
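The intuition behind this kind of text-based positioning can be sketched in a few lines: represent each party by an aggregate of its sentence representations and compare parties by the similarity of these aggregates. The sketch below is purely illustrative and is not one of the methods evaluated in this thesis; the party names and the three-dimensional "embeddings" are invented stand-ins for real sentence-encoder output.

```python
import numpy as np

def party_similarity(sents_a: np.ndarray, sents_b: np.ndarray) -> float:
    """Toy positioning sketch: average per-sentence embeddings into one
    vector per party, then compare parties by cosine similarity."""
    a, b = sents_a.mean(axis=0), sents_b.mean(axis=0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 3-d "embeddings" standing in for real sentence-encoder output.
party_x = np.array([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]])  # hypothetical party X
party_y = np.array([[0.5, 0.1, 0.5], [0.6, 0.2, 0.4]])  # hypothetical party Y
party_z = np.array([[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]])  # hypothetical party Z

# On these toy vectors, X comes out closer to Y than to Z.
print(party_similarity(party_x, party_y))  # higher value
print(party_similarity(party_x, party_z))  # lower value
```

Pairwise (dis)similarities of this kind can then be projected onto a low-dimensional scale, for instance via multidimensional scaling, to obtain a party ordering; the actual pipelines are developed in Chapter 2 and Part II.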
My primary goal is to develop methods that extract party positions as an aggregation of stances. In other words, the results of these tasks provide insights into how close parties or political entities are to one another in relation to policy issues and ideologies. I develop and evaluate methods for extracting parties’ opinions from manifestos – electoral programs released by the parties themselves at the beginning of their election campaigns. This task has traditionally been called political positioning or political scaling in the political science and NLP literature (Laver et al., 2003; Benoit and Laver, 2006; Slapin and Proksch, 2008; Glavaš et al., 2017). However, prior research has not explored the potential of text representations fine-tuned for specific domains. Additionally, it has not focused on capturing the scaling of parties in texts that specifically contain information about those ideological scales. Lastly, prior to this thesis, no study has aimed at identifying party positions end-to-end at a more detailed level, such as within specific policy issues. The primary contributions of this thesis address the previously mentioned research gaps. They lie in the development and evaluation of supervised and unsupervised methods for the tasks of political positioning and political scaling. I develop new methods to build more powerful text representations for the political domain that enhance the performance of our tasks. I design an end-to-end pipeline to capture the scaling of parties at the level of policy issues. Finally, I propose methods to capture the scaling of parties in settings across several countries and languages. More detailed information on the tasks, methods, findings, and discussion can be found in Chapter 2. 1.2.
Political opinions in large language models The advances in the technology underlying large language models (LLMs) have made it possible for many people to interact with systems powered by these models. These applications have become easily accessible and widely used, given their benefits in productivity and creativity, becoming pervasive in the private and work lives of users (Wolf and Maier, 2024). This user-friendliness is achieved thanks to LLMs’ ability to produce text based on a free natural language prompt, resulting in “universal” models that are task-agnostic. It enables users to easily interact with applications by giving “human-like” written instructions to perform several tasks such as text generation, summarization, classification, and question-answering – all in one system. This growing interaction raises concerns regarding the manifestation of harmful biases embedded in them – which has drawn the attention of academic research and the public sphere.1 In the context of political opinions, I argue that harmful biases take place when the output of models reinforces a limited number of viewpoints which pertain to only a few groups in society. Aligned with the earlier discussion on fostering a democratic culture by creating a space for diverse opinions and beliefs to coexist (Balkin, 2017), the widespread presence of these systems underscores the importance of understanding the political opinions they reflect. These opinions manifest as biases encoded in the models, which may (or may not) influence the results of the aforementioned downstream tasks. Therefore, I argue that the first step is determining the types of political biases that are encoded in LLMs. In this thesis, I draw from the accumulated knowledge on building and evaluating methods for the tasks of political positioning and scaling. 1Cf. https://www.washingtonpost.com/technology/2023/08/16/chatgpt-ai-political-bias-research/ and https://www.forbes.com/sites/emmawoollacott/2023/08/17/chatgpt-has-liberal-bias-say-researchers/ The main research questions guiding our investigation in the second part of this thesis include how to robustly evaluate LLMs for biases and what political biases these models encode. Specifically, I investigate the extent to which the answers of chat-instructed LLMs are reliable when prompts are reformulated to control for prompt brittleness. Finally, I also evaluate whether LLMs reproduce consistent preferences towards left–right orientation and specific policy issues. The latter aspect, which is a more fine-grained analysis of political biases, has not been previously investigated. Among the main contributions, I formulate definitions for political opinions at three granularity levels: policy preference, policy issue preference, and ideological positioning (described in depth in Chapter 3). These definitions facilitate the design of methods for identifying political biases in LLMs. Next, we propose a framework for evaluating the reliability of the answers generated by LLMs. This framework can be implemented to evaluate other types of biases by taking prompt brittleness into account. In the study from this part of the thesis, we compile and annotate a dataset, ProbVAA, which is valuable for investigating political opinions in LLMs at the refined level of policy issues.
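The reliability idea can be illustrated with a toy consistency check: query a model with several paraphrases of the same policy statement and measure how often the answers agree. The function below is a hypothetical sketch of this measurement, not the actual ProbVAA evaluation framework.

```python
from collections import Counter

def answer_reliability(answers):
    """Share of answers that agree with the majority answer across
    paraphrased prompts; 1.0 means fully consistent, values near chance
    indicate prompt brittleness."""
    counts = Counter(answers)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(answers)

# hypothetical model answers to five paraphrases of one policy statement
label, reliability = answer_reliability(
    ["agree", "agree", "disagree", "agree", "agree"]
)  # majority answer "agree", agreement 4/5
```

A model whose answer flips under reformulation would score close to the chance level of the answer set, which is exactly the behavior the evaluation needs to control for.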
Finally, we analyze the types of political biases encoded in these models, both in terms of left–right scaling and in regard to specific policy issues. 1.3. Thesis Outline The manuscript is structured as described below. Following this Introduction Chapter, Chapter 2 delves deeper into mining political opinions from texts. I define the tasks of political positioning and political scaling more thoroughly. I describe previous work conducted by the political science and the natural language processing (NLP) communities. I address the research gaps and the data used throughout the experiments. Then, I discuss the advantages and disadvantages of positioning and scaling for analyzing political parties, drawn from the findings of our experiments. Finally, I conclude with the contributions made by this thesis in terms of methods and analysis for the tasks of political positioning and scaling. Chapter 3 focuses on mining and evaluating political opinions in LLMs. I discuss the related work concerning general biases in pre-trained language models and how the need to evaluate political biases has become more predominant. I discuss why LLMs lack reliability in their answers, what problems this causes for our evaluation, and the need to build a more robust bias evaluation. Then, I focus on defining political biases, previous studies in this area, and the research gaps. Finally, I highlight our contributions in relation to methods for bias evaluation and the analysis of political biases embedded in these models. Chapters 4, 5, 6 and 7 present the studies in the form of publications that contributed to this thesis. Finally, Chapter 8 summarizes the answers to the research questions mentioned in Sections 1.1 and 1.2. Next, I consider the future of research in positioning and scaling, outlining the next steps for enhancing models and increasing their interpretability.
Additionally, I highlight the necessity of further advancing research in evaluating LLMs for biases. Finally, I explore the societal implications that should be considered when implementing systems for downstream tasks and suggest how the NLP community can contribute to addressing these issues. 2. Political Opinions in Texts This chapter details the research program outlined in Section 1.1. I first describe the tasks from the political science perspective and then dive into them through the lens of computational social science (CSS). 2.1. Political opinions at low dimensionality As outlined in Section 1.1, diverging positioning of political parties creates an environment where a variety of political views can come together, fostering party competition. This environment provides a basis for individuals to choose the party that aligns most closely with their views. Figure 2.1.: Example of low-dimensional scaling based on Benoit and Laver (2006). This endorsement of different opinions has given space to the study of the positioning of political actors, which is crucial for a number of reasons. Firstly, it enables the understanding of parties within the context of party competition – how parties relate to one another and what the relevant topics of discussion are for them. In addition to that, studying this phenomenon is crucial for comprehending the motivations behind voters’ choices in elections, the decisions of political parties when they are in power (Benoit and Laver, 2006), and the strategies of parties to gain terrain during their campaigns (Meguid, 2005; Green and Hobolt, 2008). One of the approaches used for investigating party positions reduces the information regarding a given actor (politician or political party) into a low-dimensional scale that commonly represents ideologies.
This is illustrated in the example in Figure 2.1, following Benoit and Laver (2006, p. 46). In the example, the three parties (social democrat, conservative, and liberal) are placed onto a two-dimensional space representing their position within the left–right and libertarian–conservative ideologies. Besides the scales of the example, a wide variety of scales have been proposed and long debated in the literature (Laver et al., 2003; Slapin and Proksch, 2008; Diermeier et al., 2012; Lauderdale and Clark, 2014; Barberá, 2015). Some scales are based on deductive approaches rooted in political theory and philosophy (Jahn, 2011), while others are more inductive data-driven approaches (Gabel and Huber, 2000; Albright, 2010; Rheault and Cochrane, 2020). Whereas some researchers have for years focused on the left–right scale (Volkens et al., 2021), others argue that, in order to understand the political spectrum in a country more thoroughly, it is necessary to look into several ideological scales and have a multidimensional analysis of the parties (Bakker and Hobolt, 2013; Rovný, 2012b). Placing parties on a scale helps political scientists understand the political landscape more easily because of its low dimensionality (Heywood, 2021). The scaling of parties offers a fundamental framework for analyzing party competition and for establishing the connection between citizens and political parties more easily (Huber and Inglehart, 1995). Moreover, it allows researchers to monitor parties under the same set of policies and understand how their positioning changes across years – e.g. whether parties are moving more to the left or right. 2.2. Fine-grained scaling at low dimensionality Ideological scales are one way of analyzing parties. Another focus explores the fine-grained differences and similarities between parties, which are usually policy issue-specific.
This type of analysis is important to understand which issues explain the value retrieved from the scaling of parties. Figure 2.2 illustrates an example of the task of policy issue scaling. Figure 2.2.: Example of policy issue positions taken from the Chapel Hill Expert Survey (CHES) based on the German context in 2019. In this case, expert annotators from the Chapel Hill survey manually place the main German parties onto a scale regarding their positioning on the issues of “spend vs tax” (i.e., about the expenditure and collection of tax money) and “immigration policy”.1 Rovný (2012a) conducted a multidimensional analysis within different issues with information extracted from several surveys. The study discusses how radical right political parties strategically differentiate their views on secondary issues (unrelated to the economic domain) to boost support among a wider range of voters. Green-Pedersen (2007) argues that investigating party positions with respect to specific issues is increasingly relevant within the context of Western European politics to understand the reasons why some issues (e.g. refugees and immigrants) are more central in a given country and time, and how parties are strategically placing more attention on them to gain more support from voters. 1The plot was adapted from their visualization tool https://chesdata.shinyapps.io/Shiny-CHES/ Identifying party positions on specific issues also sheds light on the framework of saliency theory. It investigates how parties selectively emphasize issues, and it posits that certain parties only adopt clear stances on issues they regard as worthy of attention (Sio and Weber, 2014). Alternatively, it is also relevant for comparative studies across countries and regions of the globe.
For example, the aforementioned Chapel Hill Expert Survey (CHES) is a large-scale survey involving the manual effort of expert annotators from many countries who investigate party positions (Jolly et al., 2022). The survey contains party positions on ideological scales such as left–right and libertarian–authoritarian, and on specific policies such as deregulation, immigration policy, multiculturalism, urban–rural, environment, and European integration. Given that the annotations are standardized, it is, in theory, possible to compare parties across countries and time. Another great effort from the community to investigate parties across countries has been the development of the codebook and annotations in the framework of the Manifesto Research on Political Representation project (MARPOR) (Burst et al., 2021), formerly known as the Comparative Manifestos Project (CMP). I discuss more details about MARPOR later in § 2.5.1. 2.3. Modeling political opinions 2.3.1. Motivation Traditionally, the opinions of political actors have been investigated through a series of methods such as surveys (Rovný, 2012a; Jolly et al., 2022), the answers of parties to voting advice applications (VAAs) (König and Nyhuis, 2020), or by annotating large amounts of data from manifestos, as in MARPOR (Burst et al., 2021). These approaches demand significant resources in terms of trained personnel and funding. They require field experts who are familiar with the country’s political spectrum to carry out surveys and annotations. The difficulty of the annotation task is also a factor to take into account. In the case of MARPOR, studies also discuss the low reliability of coders (Mikhaylov et al., 2008) except in cases where annotators are well-trained (Lacewell and Werner, 2013).
Others contend that the task’s complexity is further accentuated by the intricacy of the MARPOR codebook, which is highly detailed and domain-specific (Gemenis, 2013). These studies indicate that scaling up this process manually within a single country is very challenging, and doing so across multiple countries is even more complex due to variations in political landscapes and languages. Time is also an important factor in this case. Consider the scenario in which political scientists would like to analyze manifestos immediately after they are published at the start of the election campaign to gain insights into the programs of the parties and what has changed from the previous elections in a short amount of time. This is not possible manually because annotations take a long time to be carried out and evaluated for quality in terms of inter-annotator agreement. A system capable of automatically identifying the opinions of political parties could prove highly beneficial. Taking these factors into account, automating the task of identifying political opinions can offer significant advantages. For example, funding and resources could be used for hiring more annotators to annotate a small set of manifestos rather than all manifestos. This small set would be considered high-quality data that is then used for training. This cycle would streamline the process and potentially ensure a more accurate analysis. Automation would handle large volumes of data, reduce the annotation time, and provide timely updates. This makes it a useful tool for political scientists who require reliable and up-to-date information on party positions for analysis. Ideally, citizens would also benefit from the automation of this task. For instance, the results of this analysis can be used in VAAs – applications that estimate the alignment of voters with parties or candidates running for government.
The retrieved information about party positions and candidates can be added to these applications so that they are not only reliant on the answers provided by the parties themselves. This could add to the trustworthiness of these applications, since the information would come directly from the manifestos written by the parties. At the same time, it places significant expectations on the faithfulness of the extraction methods. 2.3.2. Computational approaches for mining political opinions In data-driven computational social science (CSS), text becomes the primary source of information for analyzing and understanding phenomena in society (Zhang et al., 2020). In our case, it is a source from which to extract party positions automatically. Existing approaches for this purpose have been based on textual information made available by parties or their members, such as manifestos, parliamentary speeches, and social media posts, e.g. those posted on X (formerly Twitter). Earlier approaches for automatically mining political opinions are based on word counts. Laver et al. (2003) proposed the Wordscores approach. It consists of two parts: first, a probabilistic score is assigned to the words found in pre-defined reference texts. The reference texts represent the extremes of positioning, and they can represent many issues or a single issue. The word scores are then used to assign a weighted score to unseen documents. This approach compares the frequency of each word in the reference texts and contrasts these frequencies with the word counts from the texts under analysis. This approach has two main drawbacks. Firstly, it highly depends on reference texts as the “gold standard” of party positions.
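The two-step scoring just described can be sketched as follows; this is a minimal toy illustration of the Wordscores idea (Laver et al., 2003), not the authors' implementation, and the reference texts and positions are invented:

```python
from collections import Counter

def wordscores(ref_texts, ref_positions, virgin_text):
    """Score an unseen ('virgin') text given reference texts with known positions."""
    counts = [Counter(t.split()) for t in ref_texts]
    totals = [sum(c.values()) for c in counts]
    word_scores = {}
    for w in set().union(*counts):
        # relative frequency of w in each reference text
        freqs = [c[w] / n for c, n in zip(counts, totals)]
        z = sum(freqs)
        # expected position given that a text contains w:
        # reference positions weighted by P(reference | w)
        word_scores[w] = sum(p * f / z for p, f in zip(ref_positions, freqs))
    # virgin text score: frequency-weighted mean over the words with known scores
    vc = Counter(w for w in virgin_text.split() if w in word_scores)
    n = sum(vc.values())
    return sum(word_scores[w] * c / n for w, c in vc.items())

# hypothetical reference manifestos anchoring a right (+1) / left (-1) dimension
right, left = "tax cut free market", "welfare spend public welfare"
score = wordscores([right, left], [1.0, -1.0], "tax welfare welfare reform")
```

Words unseen in the reference texts (here “reform”) receive no score and are ignored, which is one concrete way the choice of reference texts constrains the method.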
Selecting reference texts requires expertise and agreement about what characterizes extreme policy positions, making the implementation of this method more challenging. Secondly, it presupposes that the political discourse remains relatively static over time because it does not take the relevance of words into account. To deal with these drawbacks, Slapin and Proksch (2008) proposed Wordfish. It employs a Poisson distribution to infer a unidimensional document scale based on the distribution of word frequencies. The words are considered proxies for ideological positions. A similar approach is implemented by Lauderdale and Herzog (2016) for party positions based on parliamentary speech instead of manifestos. Both aforementioned approaches, however, do not take semantic and syntactic relations into account because they are bag-of-words-based models. For example, even though the terms “foreigner” and “migrant” might be used interchangeably, this type of model does not consider the similarity between the terms because the token types are different. If the reference texts used the former term and the unseen texts the latter, the approach does not treat them as similar words for the scoring. To compensate for that, Glavaš et al. (2017) and Nanni et al. (2022) investigate positioning in parliamentary speeches in the European Union with the use of word embeddings – which take semantic relations, such as the similarity between “foreigner” and “migrant”, into account. They create a multilingual semantic space with speeches in different languages, then retrieve the positioning by aligning word embeddings according to the highest similarity of words between pairs of documents. They scale the alignment scores into single values per party with a graph-based algorithm. The results are evaluated against ground truths referring to left–right and European integration scores.
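A much-simplified sketch of embedding-based positioning: represent each party document as an embedding, compute pairwise cosine similarities, and reduce them to one value per party. This toy version projects the similarity matrix onto its first principal component; it is not the graph-based algorithm of Glavaš et al. (2017), and the embeddings are invented.

```python
import numpy as np

def position_from_embeddings(doc_embeddings):
    """One-dimensional party positions from document embeddings (toy sketch)."""
    X = np.asarray(doc_embeddings, dtype=float)
    # cosine similarity between every pair of party documents
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = X @ X.T
    # reduce the pairwise similarities to one dimension (first principal component)
    centered = sim - sim.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

# three hypothetical party embeddings: the first two similar, the third distant
positions = position_from_embeddings([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]])
```

The absolute values and the sign of the axis are arbitrary; only the relative distances between parties are meaningful, which is why such outputs are evaluated via correlation with external ground truths.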
Results show that the embedding-based method of Glavaš et al. (2017) surpasses the performance of Wordfish, the prevailing standard for political scaling until then. In a similar line of research, Rheault and Cochrane (2020) utilize party embeddings derived from word embeddings, which are enhanced with political metadata and fine-tuned on parliamentary corpora. They employ principal component analysis (PCA) to reduce the dimensions of these aggregated party embeddings to determine party positions. Their findings reveal that the positions extracted from parliamentary speeches align with manifesto positions on a left-to-right scale for parties in English-speaking countries, namely Great Britain, Canada, and the United States. 2.4. Tasks: Political scaling vs political positioning Although the existing literature does not address this issue directly, this thesis contends that the tasks of political scaling and positioning of political actors have some small but important divergences. They have the same objective, which is to reduce information into one dimension, but they differ in what they are modeling, leading to variations in evaluation and applicability. We discuss these points below. 2.4.1. Scaling The task of political scaling exclusively places political parties or actors onto a scale that reduces pre-defined policy issues into one dimension, which is usually ideologically motivated. For example, left–right is a dimension that arguably contrasts a more progressive and redistributive role for the state with a more conservative and market-oriented role (Budge et al., 2001). Another example of a dimension is libertarian–authoritarian, where the latter upholds traditional morality, law and order, and cultural uniformity, while the former supports cultural and ethnic diversity and advocates for individuals’ freedom (Duch and Strøm, 2004; Bakker and Hobolt, 2013).
As can be observed, these scales aggregate certain pre-defined policy issues which are relevant for that dimension, such as law and order and liberal society. In this way, the position of the political actor is reduced to one dimension according to these policy issues. The MARPOR project has extensively worked on this task of scaling parties on the left–right dimension with annotated data in what is known as the RILE score (Budge, 2013; Volkens et al., 2013). They define 24 categories from their codebook that belong to either the left or the right (Cf. Table 1 of Section 6.2 in Chapter 6) and compute the RILE score as the difference between the proportions with which these two sets of categories occur in the manifestos. The score gives a final value between -1 and 1 and tells how left or right a given party is according to its manifesto. This measure has been repeatedly criticized because it is inflexible and not easily comparable over time (Flentje et al., 2017). The Chapel Hill Expert Survey mentioned in Section 2.2 also places parties on a left–right scale. In their approach, expert annotators assign a score between 0 and 10, and the final scale is determined by averaging these scores across all annotators. Prior to this thesis, only one study had directly attempted to automate the task of political scaling. Subramanian et al. (2018) use a two-step approach with a hierarchical bi-LSTM to predict both fine- and coarse-grained positions, and then convert them into scaling scores with probabilistic soft logic. Besides them, we argue that the other aforementioned methods only perform the task of political positioning, as explained later in Section 2.4.3. 2.4.2. Scaling at a policy issue level Although not discussed in the political science literature, I argue in this thesis that we can also scale parties or political actors on specific policy issues.
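An issue-level scale can be computed with the same proportion-difference logic as a RILE-style score, restricted to statements about a single issue. The sketch below is only illustrative; the issue and stance labels, and the annotated statements, are hypothetical:

```python
def issue_scale(labeled_statements, issue):
    """Scale a party on one policy issue as (favor - against) / total,
    mirroring the RILE-style proportion difference at the issue level."""
    relevant = [s for s in labeled_statements if s["issue"] == issue]
    if not relevant:
        return None  # the issue is not addressed: no position can be derived
    favor = sum(s["stance"] == "favor" for s in relevant)
    against = sum(s["stance"] == "against" for s in relevant)
    return (favor - against) / len(relevant)

# hypothetical annotated manifesto statements of one party
statements = [
    {"issue": "migration", "stance": "favor"},
    {"issue": "migration", "stance": "against"},
    {"issue": "migration", "stance": "against"},
    {"issue": "economy", "stance": "favor"},
]
scale = issue_scale(statements, "migration")  # (1 - 2) / 3
```

The `None` case makes explicit that issue-level scaling presupposes segmenting or labeling the text by issue first, a point taken up below.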
The main reason for arguing in favor of this definition is that there is also a pre-defined scale on which we analyze parties. For example, we can place parties on a scale that indicates whether they are rather in favor of or against migration or government expenditure (Cf. the example in Figure 2.2). When examining a particular policy issue, we focus on a selected group of related policies. While this method is similar to ideological scaling, it differs in specificity: ideological scaling considers policies across different issues, whereas this approach concentrates on policies within a single, specific topic area. This type of analysis is more fine-grained because it allows us to understand along which dimensions parties align or diverge the most. It sheds light on the reasons why some parties are closer to others at an aggregated level of the analysis, adding a layer of interpretability to the global positioning of parties. Additionally, by segmenting the texts according to policy issues, it becomes feasible to incorporate the concept of salience into the analysis. The extent of the discussion on a particular issue can serve as an indicator of its relative importance to a party, compared to other issues and to the priorities of other parties (Epstein and Segal, 2000). The text segmentation is not part of the task of political scaling, but it is necessary to investigate the issues of interest. Studies that focused on policy issues in political positioning have so far not proposed a method to segment texts automatically. They either manually adapt the reference texts to texts corresponding to specific policy issues (Laver et al., 2003) or they manually identify spans of manifestos that discuss policy issues such as the economy (Slapin and Proksch, 2008).
Another strategy is to take the entire text into account and evaluate the correlation with ground truths that position parties on a European integration scale (Glavaš et al., 2017). This last work, however, does not provide insights into the specific issue of European integration; it only measures whether party positions at an aggregated level correlate with the positioning within the specific issue of European integration. 2.4.3. Political positioning The task of political positioning, on the other hand, is not dependent on a scale that encompasses a limited and pre-defined set of policy issues. Its objective is to identify party positions based on an undefined set of policy issues or policies. For example, when analyzing economic texts, we assess the extent of similarity or dissimilarity among political parties regarding this specific issue. Conversely, if the analysis encompasses the entire manifesto, we evaluate the extent to which parties align or diverge across the various issues addressed in their manifestos. The computational methods discussed in Section 2.3.2 all address political positioning. Wordscores attempts to do so by basing its left–right dimension on reference texts (Laver et al., 2003), while Wordfish computes party positions based on entire manifestos and compares the frequency of words between pairs of parties (Slapin and Proksch, 2008). Slapin and Proksch (2008) even state that “using the entire manifesto text as data, we expect this dimension to correspond to a left-right politics dimension, which we confirm by comparing the results to other estimates of left-right positions”, meaning that they expect the results to correlate with a certain scale (left–right), but it is not a direct scaling on this specific dimension.
I argue that it can only be considered scaling if their method first selects the parts of the text that belong to the left–right dimension, and then performs scaling on these spans of text. In countries that do not have a predominantly strong left–right positioning across policy issues, for example, the results of such an analysis would not correlate highly with the left–right scale. Indeed, their study is limited to the context of parties in Germany. In the same manner, the methods proposed by Rheault and Cochrane (2020) and Glavaš et al. (2017) also measure party positions rather than their scaling. They run their analysis on entire documents, not selecting the parts of the text relative to issues that are related to the left–right scale. They, however, evaluate their results against this scale. In the political science literature, a similar approach to the task of political positioning is the estimation of ideal points. The focus is not only on the aggregation of preferences from parties, but also from lawmakers. A lawmaker’s political stance can be represented by a numerical value called an ideal point, which distills their political preferences into a single quantitative measure. In the past, only roll call votes were utilized for this type of analysis, but later on textual information was added to make the analysis more contextual and meaningful. Vafa et al. (2020), for example, perform text-based ideal point estimation with tweets from US presidential candidates. Gerrish and Blei (2011) and Lauderdale and Clark (2014) instead explore legislative texts and opinion texts from the U.S. Supreme Court, respectively. These studies rely on bag-of-words representations combined with topic modeling methods.

Table 2.1.: Main differences between the tasks of positioning and scaling drawn from our findings. Scal. stands for scaling and Posit. for positioning.

Characteristic                          Scal.   Posit.
Needs a pre-defined set of policies     yes     no
Needs text segmentation                 yes     no
Interpretation of positions             easy    hard
Generalization across time & country    easy    hard
Inclusion of new topics                 hard    easy

2.4.4. (Dis)Advantages of scaling and positioning Both approaches to analyzing political actors – scaling and positioning – present distinct advantages and disadvantages. Table 2.1 summarizes the differences drawn from our findings while working on methods for both tasks. We discuss these aspects in detail in the subsequent part of this subsection. Firstly, scales are not consistently defensible across different countries or historical periods. Political science research has demonstrated the sensitivity of the left–right scale to variations in both geographical and temporal contexts (König et al., 2013; Flentje et al., 2017). Similarly, expert agreement on what constitutes one side of the scale and the other has proven to be debatable and controversial among political scientists (Slapin and Proksch, 2008; Flentje et al., 2017). On the other hand, having a scale with pre-defined policy issues can be an advantage because it can potentially create a basis for comparison of results across time and countries (provided that the policy issues that constitute a scale have been agreed on). When measuring a given scale – which encompasses the same issues – it is possible to compare the resulting values across time. A comparable analysis can be conducted across various countries to observe the ideological alignment of parties on an international scale. This approach can yield valuable insights, particularly when examining political blocs such as the European Union.
In the case of the positioning task, the results might not be reliably comparable across different countries because we cannot ensure that parties debate similar policies across nations (unless we classify them beforehand). Each country's political context and agenda can lead to significant variations in the issues being prioritized and discussed. Similarly, positioning cannot be easily transferred across different text genres, both because the issues and policies being addressed may differ and because additional factors related to stylistic differences in text and the inherent political characteristics of each country can pose challenges (Burnham, 2024). The latter point, however, is also a problem for the task of political scaling.

Another drawback of scaling is the challenge of incorporating newly identified policy issues. This was, for instance, the case with many COVID-19-related policies, which did not exist in the MARPOR categories. This is challenging both for annotators, who need to be retrained to annotate unseen data, and for computational models, which must generalize over new topics that have not yet been annotated. Political positioning, on the other hand, avoids this problem because it can work with any text at hand without additional annotations. For example, all parties mention COVID-19-related policies in their manifestos following the pandemic's start, given the issue's relevance to the political spectrum.

A disadvantage of political positioning is that it is harder to interpret because its dimensions are not readily defined. When all the information is taken into account, such as when entire manifestos are processed, it becomes challenging to explain what is causing the (dis)similarity between parties.
In the scaling approach, the assessment is better informed because the policy issues involved in the analysis are more clearly defined and fewer in number. This can, however, be mitigated in political positioning if the text for the analysis is separated into extracts that belong to defined policy issues (e.g., economy or migration).

Finally, given that scaling is based on a pre-defined set of policy issues, the text under analysis needs to contain only this set of policy issues. This requires segmenting the text into sections that exclusively address these policy issues, achievable through either classification or clustering. In contrast, the task of positioning does not require differentiating which parts of the text should be included in the analysis.

Challenges in policy issue scaling. The new challenge in this setup is limited data availability. In other words, there is usually not a substantial amount of data from the same time period that addresses a specific policy issue. The scarcity of data may lead to increased sensitivity due to the reduced number of data points in the pairwise similarity of sentences between parties. Another consequence is the difficulty of capturing nuanced shifts in party positions over time. When data is sparse, especially for less prominent issues, the model may struggle to accurately reflect subtle changes in positioning, leading to potential oversimplifications in the analysis. Still related to data, parties may have different data distributions depending on the policy issue. As saliency theory suggests, parties emphasize the topics most relevant to them by writing more about them; there may therefore be policy issues with extensive documentation for some parties compared to others, making the modeling of the positioning less reliable.
When the positioning analysis is done at an aggregated level, these differences cancel each other out, resulting in less variance in the results. Finally, evaluating positioning at a more fine-grained level is challenging due to the difficulty of finding ground truth data that precisely corresponds to the same policy issues. As a result, we might need to rely on a manual inspection of the results.

As previously noted, text segmentation is not inherently part of the positioning task, but it may be required depending on the context so that positioning can take place in each of the segments. Text segmentation can be done via classification or by clustering spans of the text. In the former case, a substantial volume of annotations is needed to train a classifier that can predict the policy issue a text span belongs to. The latter requires annotations for the evaluation of the resulting clusters. Alternatively, this process can be done manually by selecting the parts of a text that belong to certain policy issues.

Scaling or positioning? Whether to implement positioning or scaling depends on the scope of the application and the data available. One might be interested in specific scales and how parties shift within a predefined set of issues; scaling is more appropriate in this case. Positioning is more appropriate if the goal is to understand how close parties are to one another in general – without a reference scale. Alternatively, there could be a significant amount of text pertaining to a newly emerging issue that has not yet been included in a codebook, or there may not be enough annotated data to segment texts correctly. In both cases, positioning is better suited for implementation.

2.5. Annotation and text representation for mining political opinions

2.5.1.
Annotated Data

Party manifestos, or electoral programs, are among the most informative sources regarding parties' policies. They outline parties' views, intentions, and motives for the upcoming years. Since these texts aim not only to inform but also to persuade potential voters in a competitive environment (Budge et al., 2001), they offer valuable insights into the parties' positions on various policies due to their direct expression of party opinions. The emphasis given to issues in the manifestos can also hint at the policies that parties consider most relevant for their campaign, with more space dedicated to them according to the saliency theory framework (Budge, 2001; Dolezal et al., 2014). Consequently, they are widely used in political science research. Manifestos are examined to explore the similarities between parties on various policies (Budge, 2003), predict potential party coalitions (Druckman et al., 2005), and assess how well parties align with voters' worldviews (McGregor, 2013).

Despite being a great resource because of its detailed annotations, the MARPOR dataset is underexplored by the NLP community. It is a large dataset consisting of 5151 annotated manifestos from over 67 countries across several continents, making it the largest dataset in the political science domain. The codebook has 7 broad issue domains (cf. Table 1 of Section 4.3 in Chapter 4 for examples) and 143 fine-grained categories that belong to the broad domains (examples in Table 1 of Section 5.2 in Chapter 5). The categories are labeled based on policies and may include the stance on the policy. Within the domain of external relations, for example, there are two labels for Military – Military: Positive and Military: Negative – because parties may argue for more or for less military funding, while the category Peace has only one side because parties do not argue against it.
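Under the saliency framework described above, a category's salience is simply the share of a manifesto's labelled (quasi-)sentences that it accounts for. A minimal sketch (the labels below are made up for illustration):

```python
from collections import Counter

def issue_salience(labels):
    """Salience of each category as the share of a manifesto's
    labelled (quasi-)sentences that it accounts for."""
    counts = Counter(labels)
    total = len(labels)
    return {category: n / total for category, n in counts.items()}

# Hypothetical per-sentence annotations for a single manifesto:
labels = ["Military: Positive"] * 3 + ["Peace"]
print(issue_salience(labels))  # {'Military: Positive': 0.75, 'Peace': 0.25}
```

A party devoting three quarters of its manifesto to a category is, under this reading, signalling that the issue is central to its campaign.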
The detailed annotations allow researchers to understand the salience of the issues emphasized by parties and also their positioning towards certain policies (e.g., the positive and negative labels within the military policy issue). On the one hand, the annotated categories provide a straightforward way to analyze political positions, within categories that contain negative and positive stances, in terms of issue salience (Epstein and Segal, 2000). On the other hand, they can be analyzed under a low-dimensional ideological framework. The most prominent approach in this latter case is the RILE index (Laver and Budge, 1992; Budge, 2013; Volkens et al., 2013). The RILE index is calculated by taking the difference between the proportions of categories associated with left-wing and right-wing positions that occur in the manifestos (Table 1 of Section 6.2 in Chapter 6 illustrates the RILE categories). It has consistently been used in publications and continues to be a standard reference scale for party positioning, despite numerous proposals for improvement or replacement through both theory-based and data-driven approaches (Cochrane, 2015; Mölder, 2016; Flentje et al., 2017).

The annotations of MARPOR have been a valuable resource for answering our overarching research questions. We make use of the annotations from the lower to the higher level of granularity. We used the broad annotated domains for fine-tuning the models with in-domain data. We utilized the fine-grained annotations across countries for training and evaluating classifiers for the scaling task in a multilingual setup. Finally, we also used the labels for computing party positions and the RILE score as ground truths for our evaluation.

2.5.2.
More informed text representations

Both the tasks of positioning and scaling can be seen as a text representation problem, as we are dealing with the challenge of converting textual data into structured formats that capture the semantic and syntactic properties of the different political opinions, allowing us to measure the (dis)similarities between them.

Models based on static word embeddings (Glavaš et al., 2017; Rheault and Cochrane, 2020) already show a jump in performance in comparison with the bag-of-words models previously used in the task of identifying the positioning of political actors. Word embeddings, such as those produced by GloVe (Pennington et al., 2014) and Word2Vec (Mikolov et al., 2013), have numerous advantages: they capture semantic relationships between words better, incorporate contextual information, and provide an efficient, compact vector representation of words. Lastly, they are one of the first breakthroughs for transfer learning: they can be used without being trained from scratch. This allows leveraging knowledge from large in-domain corpora, enhancing performance especially when labeled data is limited.

Next, the NLP landscape was taken over by contextualized word embeddings based on the Transformer architecture, e.g., BERT, RoBERTa, or GPT-3 (Devlin et al., 2019; Liu et al., 2020; Brown et al., 2020). This type of representation has improved the performance of multiple NLP tasks by capturing corpus-specific word usage and allowing for fine-tuning that is relatively easy and low in computational resource demands in comparison with training models from scratch. This significantly enhances the quality of token representations. BERT's and RoBERTa's original architectures, for example, encode representations not only at the token level, but also at the sentence level, with the classification token (CLS).
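The two sentence-level readouts in play here can be illustrated with a toy example, where random vectors stand in for a transformer's token-level hidden states:

```python
import numpy as np

# Toy stand-in for a transformer's last hidden layer:
# 6 tokens, each a 768-dimensional vector (random, for illustration only).
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(6, 768))

cls_embedding = hidden_states[0]             # readout at the [CLS] position
mean_embedding = hidden_states.mean(axis=0)  # average over all token vectors
```

Both readouts yield a single vector per sentence; which one produces a usable sentence representation is the question taken up next.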
However, the CLS token representation has been shown to be ineffective for similarity tasks because it was originally trained for next-sentence prediction. One proposed solution was to average the representations of all tokens in a given sentence (May et al., 2019; Qiao et al., 2019), but even the simpler and computationally cheaper GloVe embeddings performed better at similarity tasks (Reimers and Gurevych, 2019a).

Studies suggest that a model such as Sentence-BERT (SBERT; Reimers and Gurevych, 2019a) is more suitable for similarity tasks – which are the basis of our methods for computing political positioning. SBERT is based on BERT (Devlin et al., 2019) or RoBERTa (Liu et al., 2020) representations, but it outperforms these models in such tasks because it is further trained to place similar sentences in proximity to one another in the semantic space, producing more semantically meaningful sentence representations. It uses a Siamese network architecture with the objective of minimizing the distance between similar pairs of sentences and pushing dissimilar pairs apart in the semantic space. This is optimized by the triplet loss function shown below:

max(∥S_a − S_p∥ − ∥S_a − S_n∥ + ϵ, 0)   (2.1)

where the triplet is composed of anchor (S_a), positive (S_p), and negative (S_n) sentences, such that S_a and S_p are more similar to each other than S_a and S_n. The margin ϵ guarantees that S_p is at least ϵ closer to S_a than S_n is.

Given that SBERT has significantly advanced the field of sentence encoding, this thesis aims at evaluating the potential of SBERT in the political domain. To our knowledge, this is the first study to apply and assess SBERT models within the political domain. Our findings indicate that SBERT is highly adaptable to domain-specific data.
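The objective in Eq. 2.1 can be written out directly. A minimal NumPy sketch (SBERT training operates on batched embeddings, but the arithmetic per triplet is the same):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Eq. 2.1: the positive must be at least `margin` (epsilon)
    closer to the anchor than the negative is; otherwise a loss
    proportional to the violation is incurred."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([0.0, 0.0])   # anchor sentence embedding
p = np.array([1.0, 0.0])   # positive: distance 1 from the anchor
n = np.array([0.0, 5.0])   # negative: distance 5 from the anchor
print(triplet_loss(a, p, n))  # 0.0 -- the margin is already satisfied
```

Swapping the roles of the positive and negative sentences produces a positive loss, which is what drives dissimilar pairs apart during training.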
Although the vanilla SBERT model performs effectively with non-English languages such as German, its performance is greatly enhanced through fine-tuning with domain-specific texts, like manifestos, thereby improving its usefulness for our task. In the experiments, the fine-tuning regime takes into account both meta-level information about the political documents (party ids) and the extensive annotations from MARPOR. This allowed us to keep a weakly supervised regime with in-domain data in order to assess what works best in the context of political positioning. Further details are given in Chapters 4 and 5.

In this thesis, we further explore the optimization of sentence representations with post-processing. Research has explored the extent of anisotropy in the distribution of representations within transformer language models and its potential influence on the performance of similarity tasks (Ethayarajh, 2019; Gao et al., 2019). Anisotropy causes the sentence embeddings to occupy a narrow cone in the representation space, so that even two random vectors appear highly similar to one another. Given this fact, we also experiment with a simple yet effective post-processing method proposed by Su et al. (2021) to mitigate this effect. We employ and evaluate the embeddings before and after the whitening transformation. Results show that the transformation yields higher performance on the task. More details are provided in Chapters 4 and 5.

2.6. Overview of contributions and publications

In the following, I describe our contributions and a summary of each publication that contributes to the automation of mining political opinions investigated during this thesis.

2.6.1. Contributions

In terms of political positioning, we develop novel methods for computing party positions based on text similarity. We propose two approaches that vary in the level of annotation required – contrasting scenarios with and without annotations.
We propose methods for fine-tuning state-of-the-art transformer-based sentence embedding models with in-domain data so that the representations are more informative for the domain of political texts.

No fully automated approaches had been proposed for identifying positioning in relation to policy issues. Therefore, we propose an end-to-end pipeline for this purpose. More specifically, we work on a scenario where newly published manifestos have no annotations – simulating the real-world case where manifestos need to be analyzed immediately after their release. The research gap concerns both the segmentation of texts according to the policy issues they belong to, and party positions within these dimensions. Our pipeline consists of two stages. The first stage involves a classifier that categorizes manifesto sentences based on their corresponding policy issue. The second stage is an unsupervised text similarity method for identifying party positions within these issues – which is inspired by the approach we developed for the positioning task, and includes a dimensionality reduction component.

Regarding scaling, we explore supervised methods for the task using state-of-the-art models. We evaluate how classifiers using transformer-based model representations that take short and long input perform in this task. Our approaches are designed to evaluate real-world scenarios, such as when annotations are not available for a country or a time period. Our objective is also to understand to what extent we can use existing annotations to perform political scaling at large scale across several languages, including low-resource ones.

2.6.2. Unsupervised methods for party positioning

Below are the key points discussed in the paper presented in Chapter 4 regarding party positions with unsupervised methods.
Objectives

Given the context introduced in Chapter 2, this paper aims at developing and evaluating unsupervised and weakly supervised methods that capture the positioning of political parties. Our investigation has three main objectives. The first lies in understanding to what extent we can reliably determine the positioning of political parties with unsupervised methods and what type of text representation best tackles this task. The second regards the annotations – to what extent we can forgo annotations in this task. And finally, the last objective pertains to evaluating the level of discourse structure that best captures the similarity between parties – whether positioning is best captured with only the sentences that contain claims or with all sentences.

Proposed methodology

We develop and compare two methods for measuring the distance between parties based on their manifestos, which differ in the amount of information included in the modelling of the positioning. In the first scenario, we assume that there is enough annotation regarding the policy domains that the sentences of the manifestos belong to; this information is thus included in the function that measures distances. We posit that language models may find it easier to determine the proximity between parties by comparing sentences from corresponding topics, or in our case, policy issues. Taking this into account, we propose a domain-based approach, which computes the distance between parties as the pair-wise distance between pairs of manifesto sentences that belong to the same domain. The final distance between a pair of parties is the average of all such distances. To contrast with that and evaluate the limits of capturing positioning without annotations, we develop a second approach to computing the distance between parties, called twin-matching.
In this approach, the distance between a pair of parties is calculated from the pair-wise similarities between all sentences from both parties, where only the highest-similarity pair enters the function. This is normalized by the highest pair-wise similarity between sentences from the same manifesto for each of the parties in the pair (refer to § 4.4.2 for more details). We hypothesize that this step offers a substitute for information about the policy issue in the absence of annotations.

In order to answer the question regarding discourse structure, we evaluate two setups. In the first setup, we take into account all sentences from the manifestos. We argue that this scenario is less informative because it does not discriminate between sentences that may or may not contain stances. The second setup is more informative because it only considers claims in the function. We posit that claims are already charged with a party's positioning towards a topic and that they contain the essential aspects of its proposed policies, given that political claims contain a demand (Koopmans and Statham, 1999). For that, we run a claim classifier that predicts which sentences in the manifestos contain a claim. Both discourse structures (all sentences and claims) are evaluated under the domain-based and twin-matching similarity computation approaches, leading to a total of four setups for comparison.

Besides that, we focused our evaluation on the text representations. We evaluated 6 word or sentence embedding models in the 4 setups – from the simple static word embeddings of fastText to SBERT, both vanilla and fine-tuned on in-domain data with manifestos from previous elections. The two fine-tuned models are SBERTparty, which uses relations extracted from the party ids of the manifestos for fine-tuning, and SBERTdomain, which uses the domain annotations from the manifestos.
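The two distance computations can be sketched as follows, using cosine similarity on made-up sentence vectors; this is a simplified sketch, and in particular the twin-matching version omits the within-party normalization described in § 4.4.2:

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def domain_based_distance(a_by_domain, b_by_domain):
    """Average pair-wise distance between the sentence vectors of two
    parties, computed within each shared policy domain and then
    averaged across domains (simplified sketch)."""
    per_domain = []
    for dom in a_by_domain.keys() & b_by_domain.keys():
        dists = [1 - cosine(u, v)
                 for u in a_by_domain[dom] for v in b_by_domain[dom]]
        per_domain.append(np.mean(dists))
    return float(np.mean(per_domain))

def twin_matching_distance(a_sents, b_sents):
    """For each sentence of party A, keep only its most similar 'twin'
    in party B; the party distance is the mean over these best matches
    (the within-party normalization of Section 4.4.2 is omitted here)."""
    best = [max(cosine(u, v) for v in b_sents) for u in a_sents]
    return float(1 - np.mean(best))

s1, s2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(twin_matching_distance([s1], [s1, s2]))  # 0.0 -- a perfect twin exists
```

The domain-based variant needs sentence-level policy labels to build `a_by_domain`/`b_by_domain`, whereas twin-matching runs on the raw sentence sets; that asymmetry is exactly the annotation trade-off the two methods are designed to probe.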
Among all models, there were multilingual and German monolingual models. Because the representations derived from transformers follow an anisotropic distribution (where two random representations have high similarity), we experimented with post-processing the representations with the whitening transformation, as suggested by Su et al. (2021). All setups are evaluated against party positions according to the parties' answers in the voting advice application (Wahl-o-Mat) from the same year as the analyzed manifestos.

Main findings

Firstly, the multilingual SBERT model is the best performing one (i.e., the one with the highest correlation to the ground truth), confirming the strong performance of the SBERT family of models on similarity tasks, as shown by Reimers and Gurevych (2019b). Then, we observe that nearly all representations were improved by post-processing, suggesting that transforming the space to an isotropic distribution improves performance on domain-specific tasks, such as in the case of political debate. The vanilla version of SBERT performs best in the more informative case with information from the domain (using the domain-based approach), while the fine-tuned SBERTparty correlates better with the ground truth in the absence of domain annotations, suggesting that the in-domain information embedded in the model helps in capturing similarity when the data to be modelled lacks domain specificity. Moreover, we observe no significant difference between using all sentences and claims only, suggesting that claims are not the only discourse structure reinforcing party positions – which is in line with the finding that justifications also matter in the analysis of party positions (Blokker et al., 2022).
Lastly and most strikingly, among the similarity computation approaches, the best results are obtained with the twin-matching approach (with the fine-tuned SBERTparty), reaching a correlation of 0.70. These findings validate the notion that NLP techniques can be employed to identify the (dis)similarity between parties based on their policy stances, using a combination of unstructured discourse and in-domain sentence representations.

2.6.3. Unsupervised methods for party positioning at a policy issue level

Below we highlight the main points of the paper presented in Chapter 5 on party positions at an aggregated level and within policy issues.

Objectives

Following the previous study on party positioning and its promising results at an aggregated level of information, we aim at understanding the extent to which positioning can be reliably carried out at the policy issue level. This requires working with the specific parts of the manifestos that discuss specific issues, e.g., migration, economy, and education. In this paper, the objective is to expand on the previously developed methods and evaluate their limits in terms of granularity. We contend that our approach adds interpretability to party positions by shedding light on the issues within the political spectrum on which parties exhibit agreement or disagreement. We propose a workflow that segments the manifestos based on clustering. We then classify unseen data into these newly created labels, which represent coherent policy issues. Then, we identify the positioning of political parties within each policy issue. Given our objectives, we evaluate each stage of the workflow under the condition that annotations are absent for a collection of manifestos. This evaluation aims to gauge the reliability of our approach for contexts where annotations for the manifestos of forthcoming elections may be unavailable.
Proposed methodology

In order to reach the objectives stated above, we propose a methodology aimed at estimating party similarity within policy issues while addressing the inherent constraints. This methodology comprises several stages: (a) defining appropriate policy issues, (b) automatically labeling domains if manual labels are unavailable, (c) computing similarities at the domain level and aggregating them globally, and (d) extracting understandable party positions on significant policy axes using multidimensional scaling.

In the first step of the workflow (a), we aim to define broad categories of policy issues that are not yet satisfied by the MARPOR annotations. We argue that the MARPOR annotations are either too broad (in the case of the 7 domains) or too fine-grained, given that many categories even contain a stance label. Therefore, our initial step involves breaking down the manifestos into coherent segments, which we define as policy issues. These domains need to be coherent and easily understandable within the context of policies to facilitate our goal. In addition, they must remain impartial in terms of stance: categories representing opposing viewpoints (such as positive and negative stances on a particular issue like immigration) should fall under the same policy issue. The granularity of these domains is crucial, as they should offer sufficient detail to provide meaningful insights on policy issues, but should not be so detailed that practical classification becomes unfeasible. In order to create a new level of labels that falls in between the broad domains and the fine-grained categories of MARPOR, we compute the pairwise distance between all pairs of sentences belonging to the fine-grained MARPOR categories from German manifestos.
This results in a distance matrix with the MARPOR categories as rows and columns. Then, we run agglomerative hierarchical clustering to group similar MARPOR categories into the same cluster. These clusters are manually named according to the categories that fall into them. This process can be seen as a third level of annotation for the manifestos. For instance, sentences that were annotated as Military: Negative, Peace, and Military: Positive are now within the policy issue of military and peace.

In the second step (b), we train a classifier (referred to as the policy issue labeller) with two different training data settings: DEtrain is trained with manifestos from Germany only and DACHtrain with manifestos from all German-speaking countries. We choose this setup to evaluate whether more data can improve the performance of the classifiers. Three classifiers with either SBERT or RoBERTa representations and a classification head on top are trained and evaluated under these two setups.

After having predicted the labels of the manifestos with the best performing policy issue labeller, we use a strategy similar to that of the previous study (the domain-based approach) to calculate the similarity of parties within domains, as proposed in the third step (c). The distance matrices of the policy domains are averaged in order to capture the positioning at an aggregated level. We correlate the distance between parties with the distance matrix derived from the MARPOR categories – considered our ground truth in this case.

In the last step (d), we run a dimensionality reduction strategy (principal component analysis) on the individual distance matrix of each policy issue. We visualize the values of the first principal component on a scale to inspect party positions within policy issues. This allows us to understand how closely related parties are on each topic.
Finally, we evaluate 4 different models for sentence representations in stages (c) and (d). As in the previous study, we post-process the representations with the whitening transformation, since it consistently boosts performance in comparison with no post-processing.

The evaluation of the pipeline varies for each step so that we can assess to what extent annotations can be forgone. Step (a) is inspected manually, given that we do not have ground truth for the newly mapped domains. In step (b), we evaluate the policy issue labeller against the mapped MARPOR annotations. In step (c), party positions are evaluated with political science knowledge about the stances and ideologies of parties within each domain in the German political spectrum. The predicted scenario for step (d) is only evaluated on the accuracy of the classifier, where we identify which domains are successfully classified and which ones the models struggle with the most. That is a proxy for which domains can be reliably used in an analysis without annotated data.

Main findings

Our manual inspection shows that the clustering strategy employed in step (a) resulted in 13 clusters that match the demands we initially pre-defined for solid domains. That is, all positive and negative categories belonging to the same topic fall into the same cluster, and the clusters themselves fit into well-known policy issues (Benoit and Laver, 2006; Jolly et al., 2022).

In the second step, we evaluated three transformer-based models for classifying the newly mapped policy domains with two different data regimes. RoBERTaxlm+MLP reached the best performance in both regimes, with 62.5% and 64.5% accuracy in DEtrain and DACHtrain, respectively.
The increase in the amount of data (from other German-speaking countries) helped the classifier by only 2 points, suggesting that classifying policy issues remains a hard task for models regardless of the amount of training data. Moreover, the 2-point improvement also suggests that annotated data from other countries can be used for this classification task, although the gains in performance are low.

The results of positioning at an aggregated level in the predicted scenario achieve a very high correlation against both ground truths – when comparing the first principal component against the RILE index, and the distance matrix of the similarity computation against the distance matrix computed with MARPOR categories. While in the annotated setting the best representations reach correlations as high as 0.94 and 0.84 against RILE and MARPOR respectively, the pipeline including the label classifier reaches 0.79 and 0.80. The best representations are again the fine-tuned SBERT models. This time, though, the best fine-tuning strategy is the one where the model is optimized to place sentences from the same MARPOR high-level domain close together, our model SBERTdomain. This suggests that even though the predictions are not extremely accurate, the in-domain knowledge embedded through fine-tuning helps the model estimate the domains during the similarity computation.

Lastly, the final step of the workflow is partially successful. With our dimensionality reduction technique, we show that parties indeed do not follow the same left–right scaling in all policy issues, as expected (Heywood, 2021). According to expert domain knowledge, the results reflect certain well-established aspects of German politics. For instance, in the domain of foreign relations, EU, and protectionism, which exhibits only a moderate correlation with the left–right spectrum, the AfD stands out compared to other parties.
This deviation can arguably be attributed to its opposition to EU membership and its differing stance on relations with Russia, setting it apart from other parties clustered within the same ideological position. Another instance is evident in the domain of education and technology, where the AfD and Die Linke, typically positioned at opposing ends of the left–right spectrum, surprisingly share significant common ground in advocating for expanded education and investment in technology and infrastructure. On the contrary, in domains such as military and peace and immigration and multiculturalism, party positions closely align with the broader left–right scale, with right-leaning parties exhibiting more militaristic tendencies and greater aversion to immigration. Finally, we check the performance of the policy issue labeller for each label, given that in a scenario without annotations, only the positioning within highly performant policy issue