Biochemical investigations of  
multivalent chromatin reading domains 

 
Dissertation approved by Faculty 3: Chemistry of the University of Stuttgart
in fulfilment of the requirements for the degree of

Doctor of Natural Sciences (Dr. rer. nat.)

Submitted by

Michel Choudalakis

from Athens, Greece

Main examiner: Prof. Dr. Albert Jeltsch

Co-examiner: Prof. Dr. Stephan Nußberger

Chair of the examination committee: Prof. Dr. Jens Brockmeyer

Date of oral examination: 04.06.2024

Institute of Biochemistry and Technical Biochemistry

University of Stuttgart

 
2024


 
Thesis – Antithesis – Synthesis 
 
 
Ὁ βίος βραχύς,
ἡ δὲ τέχνη μακρή,
ὁ δὲ καιρὸς ὀξύς,
ἡ δὲ πεῖρα σφαλερή,
ἡ δὲ κρίσις χαλεπή.

 
Ἱπποκράτης (Ἀφορισμοί) 

 
Life is short,
and the craft long,
time is brief,
experimentations perilous,
and judgment difficult.

 
Hippocrates (Aphorisms)  

 
Αἰέν ἀριστεύειν 
- Ἱππολόχoς ο Λύκιος προς Γλαύκοντα 

Όμηρος (Ιλιάδα, Ζ’: 208) 
 

Always excel 
- Hippolochus of Lycia to his son Glaucus 

Homer (Iliad, Book 6: 208)
 


Acknowledgements 

I would like to express my deepest gratitude to Prof. Dr. Albert Jeltsch for providing me with the opportunity to conduct my doctoral studies in the biochemistry laboratories, for his mentorship, and for his constructive critique during my thesis. His contribution to my development as a scientist is invaluable.

 
Additionally, I am extremely thankful to Prof. Dr. Stephan Nußberger and Prof. Dr. Johannes Kästner for reading and reviewing my thesis as part of the examination committee. A special thank you also goes to Prof. Dr. Jens Brockmeyer, who was able to cover a sick leave on short notice and participate in the committee on the examination day.

 
I am particularly grateful to Dr. Dr. Pavel Bashtrykov for his support throughout my time here. My studies would not have been the same without his assistance and friendship. His constructive commentary, his support, and his encouragement are of great value to me.

 
Many thanks go to my office mates throughout the years for the great atmosphere, their assistance, their input, and their friendship. I thank the gents: Dr. Julian Broche for instructing me in my first steps in bioinformatics and for the endless discussions about our projects, science and the world in general, and Stefan Kunert for the laughter and the jokes. And a special thanks to the ladies whose presence changed the office, especially in reducing the number of energy drink deposit cans (Pfanddosen)! To Nivethika Rajaram for being the good person and spirited conversationalist that she is, to Claudia Albrecht for her kind and happy self, and to our latest addition Franziska Dorscht for her dynamic energy and for greening up our office! Each added their own personality to the cordial atmosphere and never objected to my spontaneous exclamations, rhetorical questions, and impromptu statements. We are indeed the International Office of Awesome! I also extend my sincere thanks to all my colleagues who assisted me in experiments (especially Jannis, the man of the cell cycle FACS!), brainstormed solutions (TC, we might be ChIPing until retirement…), and joined me in the adventure of trying to decipher the mysteries of nature. Gizem, Philipp, Mina, Tabea, Alex, Franzi K., Micha, it was fun talking and doing science with you. I also thank my students Cassandra and Nico for the months we worked together trying to make nature do our bidding and for making me a better teacher.

 
My warmest thanks go to Dragica, Regina, and Branca. They gave me a “good morning” in all the languages we speak, sometimes a meal or a hug, some good advice, and always a friendly smile. They helped me in many ways and I cherish the fond memories. I would be remiss not to thank Elisabeth and Lea, who so often tackle mountains of bureaucracy for us and the institute. I’d like to acknowledge the rest of the group, especially all who brought cake. You made everybody’s day!

 
To my very dear friends Anastasia, Tomek, Svein Tore, Paul, Hege Jeanette and Robert, and everyone else in Norway, thank you for the amazing times we had together and for throwing a party every time we were allowed to meet in groups of more than three! To Arnhild and Stein Egil, a great big thank you for embracing me and making me part of the family, as well as to my sister Emmanouella for her love and support through the almost four decades of my existence. During this time, too many people to name here changed my life. To all my friends and teachers, thank you. Your efforts were and are greatly appreciated.

 
To Alexandra Elbakyan, thank you for democratising access to science. Your efforts make the world better every day.

 
Finally, it is impossible to convey the depth of gratitude I feel towards my wife, Ida Helene. She is my soulmate and my love, and she has made life beautiful. She has provided me with the unwavering support and encouragement that helped me throughout this challenging journey.

  
List of Publications 

Six peer-reviewed articles, one review book chapter, and one manuscript under review
 
1. Dukatz M, Holzer K, Choudalakis M, Emperle M, Lungu C, Bashtrykov P, Jeltsch A. 
H3K36me2/3 binding and DNA binding of the DNA methyltransferase DNMT3A PWWP 
domain both contribute to its chromatin interaction. Journal of Molecular Biology. 2019.
10.1016/j.jmb.2019.09.006 
 
2. Pinter S, Knodel F, Choudalakis M, Schnee P, Kroll C, Fuchs M, Broehm A, Weirich S, 
Roth M, Eisler SA, Zuber J, Jeltsch A, Rathert P. A functional LSD1 coregulator screen reveals 
a novel transcriptional regulatory cascade connecting R-loop homeostasis with epigenetic 
regulation. Nucleic Acids Research. 2021. 10.1093/nar/gkab180
 
3. Schnee P, Choudalakis M, Weirich S, Khella MS, Carvalho H, Pleiss J, Jeltsch A. 
Mechanistic basis of the increased methylation activity of the SETD2 protein lysine 
methyltransferase towards a designed super-substrate peptide. Communications Chemistry. 
2022. 10.1038/s42004-022-00753-w 
 
4. Kunert S, Linhard V, Weirich S, Choudalakis M, Osswald F, Krämer F, Köhler AR, Bröhm 
A, Wollenhaupt J, Schwalbe H, Jeltsch A. The MECP2-TRD domain interacts with the 
DNMT3A-ADD domain at the H3-tail binding site. Protein Science. 2023. 10.1002/pro.4542 
 
5. Jeltsch A, Choudalakis M, Dukatz M, Kunert S. Chapter 1: ADD domains – A regulatory 
hub in chromatin biology and disease. Book: Chromatin readers in Health and Disease - 
Histone mark readers. 2023, Elsevier. 10.1016/B978-0-12-823376-4.00002-1 
 
6. Choudalakis M, Kungulovski G, Mauser R, Bashtrykov P, Jeltsch A. Refined read-out: The 
hUHRF1 Tandem-Tudor domain prefers binding to histone H3 tails containing K4me1 in the 
context of H3K9me2/3. Protein Science. 2023. 10.1002/pro.4760 
 
7. Choudalakis M, Bashtrykov P, Jeltsch A. RepEnTools: An automated repeat enrichment 
analysis package for ChIP-seq data reveals hUHRF1 Tandem-Tudor domain enrichment in 
young repeats. Mobile DNA. 2024. 10.1186/s13100-024-00315-y 
 
Chandrasekaran TT, Choudalakis M, Bröhm A, Weirich S, Kouroukli AG, Ammerpohl O, 
Rathert P, Bashtrykov P, Jeltsch A. SETDB1 activity is globally directed by H3K14 acetylation 
via its Triple Tudor Domain. Under review. bioRxiv: 10.1101/2024.04.22.590554 
 
 
Declaration of Authorship 

 
I hereby certify that the dissertation entitled 

Biochemical investigations of multivalent chromatin reading domains 

is entirely my own work except where otherwise indicated. Passages and ideas from other sources have been clearly indicated.

 
Declaration of Independent Authorship of the Dissertation

I affirm that I wrote the present work entitled

Biochemical investigations of multivalent chromatin reading domains

independently and used no sources or aids other than those indicated; passages and ideas taken from external sources are marked as such.

 
Michel Choudalakis 

 
Stuttgart, 19.01.2024 

 
Contents 
 

Abbreviations
Abstract
Zusammenfassung
1. Introduction
1.1. Phenotype, epigenetics and nucleosomes
1.2. Genes, regulatory elements and epigenomic control
1.3. Chromatin architecture, organisation and regulation
1.4. Repeat elements, regulation and function
1.5. Epigenome signals, and the histone code
1.5.1 Histone lysine modifications
1.5.2 Writers and functional roles of H3K4me1 and H3K9me2/3
1.6. Chromatin readers of particular interest for this work
1.6.1 DNMT3A contains an H3K36me2/3 reader
1.6.2 DDX19A contains an H3K27me3 reader
1.6.3 UHRF1 contains an H3K9me2/3 reader
1.6.4 Principles of HiMID – histone substrate interactions
1.6.5 Thermodynamics of multivalent histone PTM read-out
1.7. Experimental methods to investigate multivalent interactions
1.7.1. Biochemical characterisation of HiMIDs
1.7.2. CIDOP and downstream analyses
1.8. Biological context of multivalent epigenomic marks
1.8.1. Overview of concepts in gene regulation: TFs, accessibility, and histone code theory
1.8.2. The histone code theory in practice
1.9. Aims of this study
2. Materials and Methods
2.1. Plasmids and mutagenesis
2.2. Bacterial strains
2.3. Expression of recombinant proteins
2.4. Purification of recombinant proteins
2.5. CelluSpots MODified™ histone peptide arrays
2.6. Equilibrium peptide binding
2.7 Investigation of bivalent histone enrichment (in vitro)
2.7.1 Chromatin extraction and bivalent histone H3 enrichment
2.7.2 Analysis of bivalent histone H3 modifications after enrichment
2.7.3 DNA analysis of locus-specific enrichment
2.7.3 DNA library generation for HTS after bivalent histone H3 enrichment
2.8 Investigation of bivalent histone enrichment (in silico)
2.8.1 HTS data preparation and correlation analysis
2.8.2 H3K9me2 genome-wide analysis
2.8.3 UHRF1-TTD CIDOP peak calling and fragmentation
2.8.4 Plotting of heatmaps and k-means clustering
2.8.5 Analyses of UHRF1-TTD CIDOP peaks from HepG2 cells
2.8.6 refTSS analyses
2.8.7 Heatmap of ARID5B peaks from HepG2 cells
2.8.8 Analyses of data from HCT116 cells
2.8.9 Murine ChIP scatter plots
2.8.10 Analyses on REs
2.9 Structural analysis
2.10 Statistics
3. Results and Discussion
3.1 Technical improvements
3.1.1 Improved site-directed mutagenesis protocol with Q5 polymerase
3.1.2 Improved autoinduction protocol for recombinant protein expression in E. coli
3.1.3 CIDOP and ChIP
3.2 DNMT3A-ADD and SETD2 spectroscopic assays
3.3 DNMT3A-PWWP binding to H3K36me2/3 and DNA
3.4 DDX19A binding to R-loops and to H3K27me3
3.5 UHRF1-TTD binding to H3K4me1-K9me2/3
3.5.1 UHRF1-TTD binds to H3K4me1-K9me2 peptides on arrays
3.5.2 UHRF1-TTD binds to H3K4me1-K9me2/3 peptides in equilibrium peptide binding assays
3.5.3 UHRF1-TTD mutants bind differentially to H3K9me3 vs H3K4me1-K9me3 peptides
3.5.4 UHRF1-TTD adopts discrete binding modes for different marks
3.5.5 UHRF1-TTD binds nucleosomes with H3K4me1 and H3K9me2/3
3.5.6 UHRF1-TTD prefers native H3 with both K4me1 and K9me2/3
3.5.7 UHRF1-TTD binds on promoters of cell type specific genes and down-regulated genes in HepG2
3.5.8 H3K4me1, H3K9me2 and UHRF1-TTD are found on enhancers of cell-type specific genes in HepG2
3.5.9 H3K4me1, H3K9me2 and UHRF1-TTD are found on the flanks of cell-type specific TFBS in HepG2
3.5.10 Murine UHRF1 correlates to H3K4me1 genome-wide
3.5.11 UHRF1-TTD down-regulates genes with H3K4me1-K9me2 enriched enhancers
3.5.12 Previous studies of UHRF1-TTD functions
3.6. RepEnTools: An automated repeat enrichment analysis package for ChIP-seq data
3.6.1. RepEnTools is fast and efficient on the chm13v2 assembly
3.6.2. RepEnTools is reproducible and accurate for repeat masker regions, excluding some Simple repeats
3.6.3. RepEnTools screening reveals hUHRF1 Tandem-Tudor Domain enrichment on REs
3.6.4. hUHRF1-TTD is enriched on REs with H3K4me1-K9me2 or H3K4me1-K9me3, two distinct double marks
3.6.5. mUHRF1 is also enriched on TEs with H3K4me1-K9me2/3
3.6.6. Epigenomics of TE regulation, UHRF1 and plausible biological roles
3.7 Future investigations of UHRF1 and H3K4me1-K9me2/3
3.7.1. Considerations regarding the biological context of the investigations
3.7.2. Improved approaches for UHRF1 depletion/removal
3.7.3. Future directions for bioinformatic analyses
3.8 Perceptions regarding the histone code theory
3.8.1. Controversy regarding the histone code theory
3.8.2. Current research on transcriptional regulation and the histone code
4. Conclusions
4.1. Overview of investigations into HiMID multivalent interactions
4.2. HiMID multivalent chromatin interactions in biological context
4.3. The future of multivalent HiMIDs and the histone code
References
Appendix

 
Abbreviations 

5(h)mC     5-(hydroxy-)methylcytosine
ABC        Association-By-Contact model
bp         Base pairs
ChIP       Chromatin immunoprecipitation
CIDOP      Chromatin Interacting Domain Precipitation
DNMT       DNA methyltransferase
DTT        Dithiothreitol
EDTA       Ethylenediaminetetraacetic acid
ESCs       Embryonic stem cells
F          Fluorescein
FA         Fluorescence anisotropy
FITC       Fluorescein isothiocyanate
GST        Glutathione S-transferase
HiMID      Histone modification interacting domains
HTS        High-throughput sequencing
IPTG       Isopropyl β-D-1-thiogalactopyranoside
LADs       Lamina associated domains
MPP8       Mitotic-phase phosphoprotein 8
PAGE       Polyacrylamide gel electrophoresis
PHD        Plant homeodomain
PRC        Polycomb Repressive Complex
PTM        Post-translational modifications
PWWP       Pro-Trp-Trp-Pro motif
(q)PCR     (quantitative) Polymerase Chain Reaction
RE         Repeat element
RING       Really Interesting New Gene
SDS        Sodium dodecyl sulphate
SQ         Starting quantity
STE        Saline/tris(hydroxymethyl)aminomethane/EDTA
TAF3       TATA-binding protein-associated factor 3
TBS-T      Tris-buffered saline with Tween 20
TE         Transposable element
TF         Transcription factor
TSS        Transcriptional start site
UHRF1      Ubiquitin-like, with PHD and RING finger domains 1
  

Abstract 

In eukaryotes, the negatively charged nuclear DNA wraps around cationic histone proteins to form nucleosomes and compact the genetic information. Histones carry several post-translational modifications (PTMs) that appear in combinatorial patterns. These marks are interpreted by non-covalent interactions with proteins containing histone modification interacting domains (HiMIDs), also known as “reader” domains. Thirty years ago, it was proposed that the histone marks act as signals in the regulation of transcription and other chromatin functions. With time, this concept has been refined to suggest that combinatorial patterns of marks represent context-specific signals, termed a “histone code”. This code functions as one of the epigenetic regulatory mechanisms, which control reversible and heritable changes in cellular phenotype. Intermolecular models demonstrate thermodynamic benefits from multivalent engagement of nucleosomes, suggesting that such interactions occur widely. However, only a few multivalent readers are known so far, and dissecting their function has been very challenging.
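
The thermodynamic benefit of multivalent engagement can be illustrated with a minimal sketch. It assumes the simplest textbook model, in which two binding sites act independently and the second site encounters its ligand at a hypothetical effective molarity once the first site is bound. All function names and numerical values are illustrative assumptions, not measurements or methods from this thesis.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy in kJ/mol (1 M standard state): dG = RT ln(Kd)."""
    return R_KJ * temp_k * math.log(kd_molar)


def bivalent_kd(kd_first, kd_second, effective_molarity):
    """Apparent Kd of a bivalent interaction, assuming two independent sites.

    Once the first site is bound, the second site sees its ligand at the
    assumed local concentration `effective_molarity` (mol/L).
    """
    return kd_first * kd_second / effective_molarity


# Hypothetical illustration values, not data from this work.
kd_mark_a = 10e-6    # reader pocket 1 with mark A alone (10 uM)
kd_mark_b = 100e-6   # reader pocket 2 with mark B alone (100 uM)
c_eff = 1e-3         # assumed effective molarity (1 mM)

kd_double = bivalent_kd(kd_mark_a, kd_mark_b, c_eff)
gain = binding_free_energy(kd_double) - binding_free_energy(kd_mark_a)

print(f"Apparent Kd for the double mark: {kd_double:.2e} M")
print(f"Free energy gain over mark A alone: {gain:.1f} kJ/mol")
```

With these illustrative inputs the apparent Kd for the doubly marked substrate is 1 µM, tenfold tighter than the stronger of the two single-mark interactions. Real readers can deviate from this idealised model through linker strain, cooperativity, or steric clashes.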

 
This thesis focuses on HiMIDs with complex roles that simultaneously interact with two histone PTMs or two different substrates. Introducing the theoretical foundation, I discuss the thermodynamic and biological basis of how multivalent interactions can guide effector protein complexes, targeting their functions to distinct regions and chromatin states. Then, I present data from the characterisation of the readers DNMT3A-PWWP, DDX19A, and UHRF1-TTD in the context of multivalent engagement of histone PTMs and biomolecules. Starting with DNMT3A-PWWP, I quantified the binding of the wild-type (WT) and a mutant domain to histone H3K36me2/3 peptides, showing negligible differences, while my colleagues showed that the mutant has drastically reduced binding to DNA and nucleosomal substrates. I then studied the R-loop helicase DDX19A and demonstrated very strong binding to H3K27me3 peptides in the nanomolar range, complementing the findings of a complex functional study. The latter showed that interaction with H3K27me3 is necessary for robust DDX19A-mediated R-loop resolution and LSD1-target gene silencing.

 
For UHRF1-TTD, I discovered and quantified its preferential binding to H3K4me1-K9me2/3 peptides over peptides carrying H3K9me2/3 alone. Engineered mutants with specific and differential binding changes led to the discovery of a novel Kme1 read-out mechanism, based on the interaction of R207 methylene groups with the H3K4me1 methyl group and on counting the H-bond capacity of H3K4. High-throughput sequencing (HTS) data revealed strong TTD binding at chromatin sites with H3K4me1 peaks and broad H3K9me2/3 signal, which are enriched on enhancers and promoters of cell-type specific genes at the flanks of cell-type specific transcription factor binding sites. Data from the full-length protein in mouse and human cells provided evidence for the physiological role of the H3K4me1-K9me2/3 double marks in TTD-mediated UHRF1 recruitment.

 
To further illustrate this point, I investigated UHRF1-dependent silencing of repeat elements (REs). To this end, I developed RepEnTools, which improves on the previously available programmes for RE enrichment analysis in chromatin pulldown studies by leveraging new tools with carefully chosen and validated settings, enhancing accessibility, and adding some key functions. RepEnTools analyses showed that chromatin binding of hUHRF1-TTD and full-length mUHRF1 was strongly enriched on different RE promoters with the H3K4me1-K9me3 double mark, where UHRF1 represses their expression. The data suggest a novel functional role for the H3K4me1-K9me3 signal of the histone code that is both sequence independent and conserved in two distinct mammals. Taken together, the work presented here is consistent with and supports the histone code theory, best illustrated by UHRF1-TTD, which binds a specific double mark that has a biological meaning going beyond the meaning of the individual marks.

 
In this thesis, I presented various mechanisms that influence epigenomic regulation, including chromatin 3D architecture, accessibility, transcription factor recruitment, and chromatin marks. Especially in the context of UHRF1-TTD functions, I discussed how DNA, RNA, histones, and covalent modifications thereof interweave to produce the signalling network necessary throughout the lifetime of the mammalian cell, during differentiation, development and every other phase of life. Thus, within the three-dimensional scaffold of chromatin structures, these biomolecules and their modifications collectively form the context-specific network of effectors and maintainers of the epigenomic modifications. The ways in which they influence transcription and translation are only now being unravelled. Hence, recent data suggest the existence of not just a histone code, but a 3D-chromatin modification code, which dictates how biomolecules and their modifications collectively implement epigenomic regulation by interactions along the chromatin and through 3D space. As shown in these projects, readers commonly use the mechanism of multivalent interactions to interpret such contextual signals and guide epigenomic effectors to their targets. The tools and workflows that were developed and applied in this work can be employed to reveal more instances of refined read-out among HiMIDs.

 
Additionally, I leveraged my experience with fluorescence spectroscopy and contributed to two further published studies. The first study demonstrated that the DNMT3A-ADD Zn-finger domain, which is a known H3K4me0 reader, also binds to a domain from the MECP2 protein. The association was quantified, and the specificity demonstrated with a binding-deficient triple mutant. This interaction offers complex additional regulation options to DNMT3A and MECP2, in interplay with the histone code. The second study focused on SETD2, an H3K36me3-depositing enzyme, and the mechanism of its preference for a designed “super substrate” peptide. By combining computational simulations and experimental data, the study demonstrated that an H3 peptide substrate predominantly exists in an extended conformation in solution, while the super substrate forms a hairpin conformation. Upon binding to the enzyme, the hairpin is opened and the super substrate adopts a conformation similar to that of the canonical substrate. These results highlighted the dynamic nature of peptide conformations in solution, their impact on protein-protein interactions, and the significance of dynamic conformational changes in interactions.

 
Zusammenfassung (Summary)

In the nucleus of eukaryotic cells, the polyanion DNA wraps around cationic histone proteins, forming nucleosomes that compact the genetic information. Histones carry a multitude of post-translational modifications (PTMs) that occur in combinatorial patterns. These marks are interpreted through non-covalent interactions with proteins that contain histone modification interacting domains (HiMIDs), so-called “reader” domains. Thirty years ago, it was proposed that histone PTMs act as signals in the regulation of transcription and other chromatin functions. Over time, this concept was extended to the idea that combinatorial patterns of marks represent context-specific signals, a so-called “histone code”. This is one of the epigenetic regulatory mechanisms that control reversible and heritable changes in cellular phenotype. Intermolecular models show that the multivalent binding of nucleosomes brings thermodynamic advantages, which suggests that this mechanism is widespread. So far, however, only a few multivalent reader domains are known, and deciphering their function is a major challenge.

 
This work focuses on HiMIDs with complex functions that interact with two histone PTMs or two different substrates. After an introduction to the theoretical foundations, I discuss the thermodynamic and biological principles that explain how multivalent interactions can guide effector protein complexes in order to target their functions to specific regions and chromatin states. I then present data from the characterisation of the reader domains DNMT3A-PWWP, DDX19A and UHRF1-TTD in the context of multivalent interaction with histone PTMs and biomolecules. Starting with DNMT3A-PWWP, I quantified the binding strength of the wild-type (WT) and a mutated domain to histone H3K36me2/3 peptides and found negligible differences, while my colleagues showed that the mutant exhibits drastically reduced binding to DNA and nucleosomal substrates. Next, I investigated the R-loop helicase DDX19A and demonstrated very strong binding to H3K27me3 peptides in the nanomolar range, thereby complementing the results of a complex functional study. The latter showed that interaction with H3K27me3 is necessary for robust DDX19A-mediated R-loop resolution and LSD1 target gene silencing.

 
For UHRF1-TTD, I discovered and quantified the preferential binding to H3K4me1-K9me2/3 peptides over peptides containing only H3K9me2/3, and generated mutants with specific and differential binding. This led to the discovery of a novel Kme1 read-out mechanism that is based on the interaction of methylene groups of R207 with the H3K4me1 methyl group and on counting the H-bond capacity of H3K4. High-throughput DNA sequencing (HTS) data showed strong TTD binding at chromatin regions with H3K4me1 peaks and a broad H3K9me2/3 signal, which are enriched at enhancers and promoters of cell-type-specific genes, at the flanks of cell-type-specific transcription factor binding sites. Data for the native UHRF1 protein from mouse and human cells confirmed the physiological role of the double mark in TTD-mediated UHRF1 recruitment.

 
To illustrate this further, UHRF1-dependent silencing of repeat elements (REs) was investigated. For this purpose, I developed RepEnTools, a programme for the analysis of REs in ChIP-seq data that builds on older programmes. It employs new bioinformatic tools with carefully chosen and validated settings, improves accessibility, and adds several key functions. RepEnTools analyses showed that chromatin binding of hUHRF1-TTD and mUHRF1 was strongly enriched at various RE promoters carrying H3K4me1-K9me2/3 double marks, where UHRF1 then represses their expression. The data point to a new functional role of the H3K4me1-K9me3 signal in the histone code that is both sequence-independent and conserved in two different mammalian species. Overall, the results presented here are consistent with and support the histone code theory, best illustrated by UHRF1-TTD, which binds specific double modifications whose biological meaning goes beyond the meanings of the individual modifications.

 
In this work I presented various mechanisms that influence epigenomic regulation, including the 3D
architecture of chromatin, accessibility, the recruitment of transcription factors, and chromatin
marks. In connection with the functions of UHRF1-TTD, I discussed how DNA, RNA, histones and
covalent modifications are interwoven to form the vital signalling network that is required
throughout the lifetime of the mammalian cell, during differentiation, development and every other
stage of life. Within the three-dimensional scaffold of chromatin structures, biomolecules and
their modifications form a context-specific network of effectors and keepers of the epigenomic
modifications. The ways in which they influence transcription and translation are only beginning to
be unravelled. Current data suggest that there is not only a histone code but also a code for 3D
chromatin modifications, which determines how modifications collectively implement epigenomic
regulation through interactions along chromatin and through 3D space. As shown in these projects,
reading domains largely use the mechanism of multivalent interactions to interpret such contextual
signals and bring epigenomic effectors to their targets. The tools and workflows developed and
applied in this work can be used to uncover further cases of refined read-out among HiMIDs.

 
In addition, I contributed my experience with fluorescence spectrometry to two further published
studies. The first study showed that the DNMT3A-ADD Zn-finger domain, a known H3K4me0 reader, also
binds to a domain of the MECP2 protein. The association was quantified, and its specificity was
demonstrated with a binding-deficient triple mutant. This interaction offers DNMT3A and MECP2
complex additional regulatory possibilities in interplay with the histone code. The second study
focused on SETD2, an enzyme that generates H3K36me3, and the mechanism of its strong preference for
a designed "super-substrate" peptide. Through an elegant combination of computer simulations and
experimental data, the study showed that an H3 peptide substrate in solution predominantly adopts
an extended conformation, whereas the super-substrate forms a hairpin conformation. Upon binding to
the enzyme, the hairpin opens and the super-substrate adopts a conformation similar to that of the
canonical substrate. These results highlighted the dynamic nature of the conformations of peptides
in solution, their effects on protein-protein interactions, and the importance of dynamic
conformational changes in interactions.



1. Introduction 

1.1. Phenotype, epigenetics and nucleosomes 

Eukaryotes maintain large amounts of genomic material, most of which is stored in the 

nucleus. Throughout the lifetime of a cell, numerous perceived stimuli will require an 

appropriate response, while cells of multicellular organisms must follow programmes of 

differentiation and development. The very large nuclear DNA molecules contain an enormous 

amount of information, necessary for 

transcription, replication and other 

DNA-templated processes. Access to 

the parts of information not actively 

used is not required, and therefore they 

can be compressed and stored in a more 

efficient manner. However, the resulting 

genomic architecture needs to be 

dynamic, adaptable and reversible 

(Luger, et al., 2012). It must also be 

heritable from parent to daughter cells. 

This control over the dissemination of 

genetic information is known as 

epigenetics, and controls the cellular 

phenotype (Berger, et al., 2009). During 

development, the epigenetic landscape 

is changing step-wise, reflecting the 

alterations in cellular phenotype. This 

plasticity is best illustrated by the 

Waddington landscape model: 

embryonic cells begin to differentiate, 

eventually taking a single fork on the 

path, while de-differentiation can also 

occur, e.g. in cancer (Figure 1A). 

Figure 1. A The epigenetic landscape of Waddington illustrates the plasticity of differentiation
paths. Figure adapted from Rajagopal et al., 2016 (Rajagopal and Stanger, 2016). B Schematic
chromatin architecture and mechanisms of epigenetic signalling. DNA is wrapped around nucleosomes,
which are packed in chromatin fibres. PTM, post-translational modification. Figure adapted from
Rosa et al., 2013 (Rosa and Shaw, 2013). C Histone H3 tail sequence demonstrating various
post-translational modifications and the residues on which they are known to occur.

DNA compaction steps have evolved on the basis of the molecule’s physical properties. The histones

H2A, H2B, H3, and H4 form an octamer of histone heterodimers. As DNA carries a negative charge from

the phosphate backbone, 147 bp

wrap 1.7 times counter-clockwise around the very positively charged histone octamer to form 

the canonical nucleosome (Figure 1B). The DNA can be covalently modified, as in 5-methyl 

cytosine (5mC) and its oxidation products. Also, the flexible N-terminal tails of each histone 

carry a large number of post-translational modifications (PTM), e.g. lysine and arginine 

methylation or lysine acetylation, especially on the H3 and H4 histones (Jenuwein and Allis, 

2001; Millan-Zambrano, et al., 2022; Strahl and Allis, 2000). Mass spectrometry has identified 

hundreds of different histone marks and shown that a single histone H3 tail typically carries 2-5

modified residues simultaneously (Janssen, et al., 2019; Lu, et al., 2021) (Figure 1C). 

Approximately 60 bp of linker DNA bridge successive nucleosomes, which form 3D

structures for higher-order arrangement. This is achieved with the help of additional proteins,

and the resulting assembly of DNA and proteins is known as chromatin.
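
As a rough worked example of the numbers above, the following sketch adds the 147 bp of nucleosomal DNA to the approximately 60 bp of linker DNA to obtain a repeat length, and uses it to estimate how many nucleosomes fit on a stretch of DNA. The function names and the 1 Mb example region are illustrative assumptions, and real linker lengths vary between cell types and chromatin states.

```python
# Back-of-the-envelope nucleosome arithmetic (illustrative sketch only).
CORE_DNA_BP = 147    # DNA wrapped around one histone octamer (value from the text)
LINKER_DNA_BP = 60   # approximate linker DNA between nucleosomes (value from the text)


def nucleosome_repeat_length(core_bp: int = CORE_DNA_BP,
                             linker_bp: int = LINKER_DNA_BP) -> int:
    """Return the DNA length occupied by one nucleosome plus its linker."""
    return core_bp + linker_bp


def estimate_nucleosome_count(region_bp: int) -> int:
    """Estimate how many complete nucleosome repeats fit into a region."""
    return region_bp // nucleosome_repeat_length()


if __name__ == "__main__":
    # Hypothetical 1 Mb region, chosen only for illustration.
    print(nucleosome_repeat_length())
    print(estimate_nucleosome_count(1_000_000))
```

With these assumed values the repeat length is 207 bp, so the estimate scales linearly with the length of the region considered.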

 
Chromatin can have different properties and states, varying between very compacted and 

fully accessible. Simply put, since the condensed parts can efficiently sequester unneeded genes 

from the transcription machinery, genes can transition from one chromatin state to another in a 

reversible manner, and additional information is used to finely tune the outcome. The 

epigenomic signalling and control mechanisms are used to regulate gene expression, as well as 

homeostasis, response to stimuli, cell cycle progress, and cellular differentiation processes. 

Evolution developed four principal epigenome signalling mechanisms based on the modulation 

of the physicochemical properties of chromatin, viz. DNA methylation, histone PTMs, histone 

variants, and non-coding RNA (Allis and Jenuwein, 2016). Together these mechanisms shape 

the epigenetic landscape in a reversible and heritable manner and regulate the cellular 

phenotype. With the occasional exception of the latter, the other three mechanisms provide in 

situ signals. While some have direct effects on nucleosome compaction and chromatin structure,

others are directly interpreted by “reader” protein domains that interact non-covalently, 

specifically and selectively. Many readers are known for PTMs on histone tails and 5mC. 

Accordingly, associated enzymes are called “writers” or “erasers” depending on their influence 

on the mark. For histone PTMs, the exact position, type, and degree of modification are 

important (Figure 1C), as this allows for specific and selective interactions. A very thorough 

review in Chemical Reviews analysed most aspects of lysine methylation,

including the mechanisms and physicochemical basis of methyllysine read-out by the readers

(Luo, 2018). Methylation changes the size of a lysine only minimally and leaves its charge unchanged,

suggesting the simultaneous use of multiple intricate mechanisms for

discrimination between Kme0/1/2/3. The author also proposed that readers can amplify the 



effect via multivalent interactions and multiple PTMs. The interactions between reader domains 

for histone PTMs, known as HiMIDs (Histone Modification Interacting Domains), and 

combinations of epigenome marks are the main focus of this work. Additional consideration 

will be given to the narrow biological meaning of these signals, as well as their biological 

context and the broader implications for our understanding of phenotype regulation and the 

associated mechanisms. 

 
“An epigenetic trait is a stably heritable phenotype resulting from changes in a 
chromosome without alterations in the DNA sequence.” (Berger, et al., 2009) 



1.2. Genes, regulatory elements and epigenomic control 

Metazoan genomes include protein-coding (~ 2 %) and various types of RNA genes, while 

49% of the mammalian genome is occupied by repeat elements (REs) (Stamidis and Żylicz, 

2023). The latter include transposable elements (TEs), i.e. integrated viral sequences, and 

simple tandem repeats that may have functional roles, e.g. at centromeres and telomeres. The 

most studied principal functional element in genomic research is the protein-coding gene, with 

one or more exons extending over the gene body. At the beginning of a gene we find the 

transcriptional start site (TSS), a nucleosome-free region in actively transcribed genes, while 

the region 500-1000 bp upstream of the TSS comprises the promoter. This is the gene-regulatory

region that is easiest to identify, and a collection of all promoters in the human genome was

only recently completed (Nurk, et al., 2022). However, transcriptional regulation starts at the 

enhancers.  

 
The enhancers are genomic regions where transcription factors (TF) and the protein 

complexes necessary to prepare for transcription are first recruited to bring together all the 

necessary participants and tightly modulate the process (Figure 2A). Enhancers are challenging 

to identify since an enhancer region can be proximal (1-10 kb from TSS) or distal (10- >100 kb 

from TSS), and they are often cell-type specific, therefore not detectable in all cells. The criteria 

to identify enhancer regions are non-trivial. Principal requirements are a nucleosome-free 

center, enhancer-specific epigenomic signals, enhancer-specific interactors, and bi-directional 

enhancer-RNA (eRNA) starting from the center (Carullo and Day, 2019). The accessible DNA 

typically contains sequence motifs preferentially bound by one or more TFs, called TF binding 

sites (TFBS). The least complex interaction model is the Activity-By-Contact (ABC) model

for cis acting enhancers proximal to the affected gene. However, association of a specific gene 

to a specific enhancer is another hurdle as there might be multiple interacting enhancers for one 

gene, resulting in cooperative or competitive regulation (Figure 2B), or even one enhancer 

regulating multiple genes. Therefore, multiple methods of high-throughput experiments and 

combinatorial computational analyses were developed to untangle this complex situation. 

Recently, the ChIP-Enrich database integrated all available data on human enhancers and based 

on improved analytical methods it provides the most comprehensive enhancer analysis 

programme to date (Qin, et al., 2022). Using these data, the ChIP-Enrich programme takes as 

input user provided ChIP-seq peaks (or similar) and analyses their location in regard to overlap 

of promoters, proximal enhancers, or distal enhancers. A number of different settings and 

options are available to fine-tune the next step, which is the assignment of the regions to specific 



genes. The “hybrid” setting of the algorithm uses information from the ABC model and distal 

enhancer assignments as appropriate. 
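
The distance ranges quoted above can be turned into a simple classification rule. The sketch below assigns a peak position to "promoter", "proximal enhancer" or "distal enhancer" purely from its distance to the nearest TSS. It is not the ChIP-Enrich algorithm: the function names, the 1 kb promoter cut-off applied symmetrically around the TSS, and the example coordinates are assumptions made for illustration, and strand and chromosome are ignored.

```python
from bisect import bisect_left

PROMOTER_MAX_BP = 1_000    # assumed cut-off: within 1 kb of a TSS counts as promoter
PROXIMAL_MAX_BP = 10_000   # proximal enhancers: 1-10 kb from the TSS (range from the text)


def distance_to_nearest_tss(position: int, sorted_tss: list[int]) -> int:
    """Return the absolute distance from a position to the closest TSS.

    sorted_tss must be a non-empty, ascending list of TSS coordinates
    on the same chromosome as the position.
    """
    if not sorted_tss:
        raise ValueError("at least one TSS coordinate is required")
    i = bisect_left(sorted_tss, position)
    # Only the TSS just before and just after the position can be the nearest.
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(position - tss) for tss in candidates)


def classify_peak(position: int, sorted_tss: list[int]) -> str:
    """Label a peak by its distance to the nearest TSS."""
    distance = distance_to_nearest_tss(position, sorted_tss)
    if distance <= PROMOTER_MAX_BP:
        return "promoter"
    if distance <= PROXIMAL_MAX_BP:
        return "proximal enhancer"
    return "distal enhancer"


if __name__ == "__main__":
    tss_positions = [10_000, 250_000]   # hypothetical TSS coordinates
    for peak in (10_400, 15_000, 130_000):
        print(peak, classify_peak(peak, tss_positions))
```

A distance rule of this kind only locates a peak relative to genes. Assigning a distal region to the gene it actually regulates requires the interaction data discussed above.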

 
The step-wise process of enhancer formation is typically described as starting with a 

pioneering TF opening the condensed chromatin, additional TFs being recruited, remodelling 

complexes establishing the nucleosome-free center of the enhancer region, the Mediator 

complex looping the enhancer(s) to a gene promoter, and RNA-Pol II expressing eRNA (Calo 

and Wysocka, 2013; Carullo and Day, 2019; Panigrahi and O'Malley, 2021) (Figure 2A). 

Enhancers typically also carry specific epigenome marks. Histone H3 lysine 4 

monomethylation (H3K4me1) is a mark found on primed and active enhancers, and is often

present regardless of enhancer activity (Calo and Wysocka, 2013; Rada-Iglesias, 2018).

Enhancers of actively transcribed genes carry acetylated H3K27 (H3K27ac) (Figure 2C), 

whereas primed enhancers have 5-hydroxymethyl cytosine (5hmC) (Figure 2D). This way, two 

different epigenome marks are combined to signal a change in the enhancer state. Interestingly, 

5-methylcytosine in CpG context was shown to inhibit or stimulate binding of different TFs, 

especially for developmentally important ones (Yin, et al., 2017). Despite great advances in the 

last decades, a lot regarding enhancers and their mechanism of function is still unclear 

(Panigrahi and O'Malley, 2021). 
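
The combination of marks described above can be read as a small decision rule. The sketch below encodes only the associations stated in this section (H3K4me1 on primed and active enhancers, H3K27ac on active enhancers, 5hmC on primed enhancers). The function name and the returned labels are illustrative assumptions, and real enhancer annotation relies on many more signals.

```python
def enhancer_state(marks) -> str:
    """Infer a simplified enhancer state from a collection of epigenome marks."""
    present = set(marks)
    if "H3K4me1" not in present:
        return "no enhancer signature"
    if "H3K27ac" in present:
        return "active"
    if "5hmC" in present:
        return "primed"
    # H3K4me1 alone is often found regardless of enhancer activity.
    return "undetermined"


if __name__ == "__main__":
    print(enhancer_state(["H3K4me1", "H3K27ac"]))
    print(enhancer_state(["H3K4me1", "5hmC"]))
    print(enhancer_state(["H3K4me1"]))
```

The point of the example is that neither mark alone fixes the outcome: the state follows from the combination.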

 
Figure 2. A Scheme of typical interactors at an active genomic enhancer and enhancer-promoter looping.           

B Alternative models for enhancer regulation at one gene. C Epigenome marks and interactors typically found at 

an active enhancer. D A primed enhancer represents a different enhancer stage, made clear by the different 

epigenomic marks. Panels A and B adapted from Carullo et al. 2019 (Carullo and Day, 2019), panels C and D 

adapted from Calo et al. 2013 (Calo and Wysocka, 2013). 



1.3. Chromatin architecture, organisation and regulation 

 
Figure 3. A Simple scheme of the various chromosome bands on a chromosome and their nuclear position.          

B Detailed scheme depicting the typical nuclear organisation of compartment A (active) and B, hetero- and 

euchromatin, lamina associated domains (LADs), and topologically associated domains (TADs). C Exemplary 

depiction of the TAD surrounding a single gene and its regulatory elements. The intensity of the red colour 

denotes the strength of pair-wise interactions between regions, demonstrating the regulatory neighbourhood 

formed by the loops. D Two-dimensional interpretation of the data from Panel C. Here, the positions of the 

ChromHMM enhancers are marked. Panel A adapted from Eberhart et al., 2013 (Eberhart, et al., 2013), panel B

adapted from Padeken et al., 2022 (Padeken, et al., 2022), panels C and D adapted from Diehl et al, 2020 (Diehl, 

et al., 2020).  

Moving from nucleosomes and individual genes to higher-order chromatin organisation, 

each chromosome contains large structures with specific physical properties and interactions. 

Stemming from the early chromosome-banding techniques, the condensed, intensely stained 

parts of chromatin are known as heterochromatin, whereas the rest of the genome is more 

accessible and lighter coloured, hence called euchromatin (Figure 3A).  

 
Heterochromatin is further divided into constitutive (C-band) and facultative (G or

Giemsa-band) heterochromatin; the latter is challenging to stain, more dynamic, and contains genes that

participate in development and differentiation (Figure 3A) (Eberhart, et al., 2013). On the other

hand, the constitutive heterochromatin comprises mostly tandem repeats, such as 

pericentromeric satellites. It corresponds to ~ 8% of the human genome and was previously 

considered to correlate with consistently strongly suppressed gene expression (Nurk, et al., 

2022). Of particular interest to this work, this is the only part of the genome robust enough to 

survive the harsh C-banding chemical treatment, while chromatin fragmentation experiments 

gave rise to the term sonication-resistant chromatin (Becker, et al., 2017). This great resistance 

of constitutive heterochromatin to mechanical and chemical attempts of solubilisation creates 

massive challenges to its analysis. Recently, extensive research has begun to unravel the mystery it

poses. We now have ample evidence that it harbours REs and genes that are being expressed to 

some degree (Altemose, et al., 2022; Hoyt, et al., 2022), as well as chromatin associated protein 

complexes to repress their expression (McCarthy, et al., 2021). 

 
Information from modern and far more complex experiments revealed the existence of an 

active compartment (A) and an inactive one (B), which mostly overlap with the euchromatic and

heterochromatic regions (Figure 3B). The regions close to the nuclear lamina constitute the 

lamina associated domains (LADs), while genome-wide chromatin is locally organised in 

topologically associated domains (TADs). Cohesin and CCCTC-binding factor (CTCF) 

cooperate to form and regulate dynamic boundaries, extruding these broad regions to form loops 

(Diehl, et al., 2020; Mach and Giorgetti, 2023). Within these, frequent interactions between 

otherwise distant genomic sequences can take place (Figure 3C-D), generating regulatory 

neighbourhoods that can bring together one gene and multiple enhancers. Such interactions are 

only possible through the 3D space, explaining distal interactions and complex regulatory 

networks (Figure 2) that extend beyond the simple ABC model of 1 gene - 1 enhancer 

interaction. It has been shown experimentally that these interaction networks and the 

mechanisms regulating them have direct effects on gene regulation (Padeken, et al., 2022). 

However, research into the compartmentalisation of the genome and the associated functions is 

still at an early stage with many questions yet to be answered.  

 
1.4. Repeat elements, regulation and function 

As mentioned previously, REs constitute about half of the mammalian genome 

(Stamidis and Żylicz, 2023). Aberrant expression of REs, with or without transposition, is

deleterious to mammalian cells. These points indicate how important regulation of REs is for 

the host. REs will be briefly introduced here, along with some information on their silencing 



and their functions. An excellent overview was published recently (Fueyo, et al., 2022), with 

additional works examining specific aspects of this broad and multifaceted subject (Grundy, et 

al., 2022; Protasova, et al., 2021; Stamidis and Żylicz, 2023). Other reviews focus more on the 

functions of REs (Gasparotto, et al., 2023; Gebrie, 2023; Geis and Goff, 2020; Senft and 

Macfarlan, 2021).   

 
Figure 4. Scheme of the various families of repeat elements and their typical structure. Interspersed repeats 

include transposable elements capable of autonomous retrotransposition and non-autonomous ones, while 

tandem repeats include satellites, simple and low complexity repeats. Figure modified from Hoyt et al, 2022 

(Hoyt, et al., 2022). 

Repeats are classified into tandem repeats, short sequences repeated almost identically, 

and transposable elements (TEs) (Figure 4). The former are typically satellite repeats found on 

(peri-) centromeres and telomeres, while TEs are further divided into DNA transposons (not 

shown), non-autonomous, and autonomous retrotransposons. The latter encode all the proteins 

required for their retrotransposition. Table 1 summarises the terms that describe the grouping 

hierarchy. Another characteristic is competence in retrotransposition (rc) or lack thereof (non-

rc), referring to whether the specific instance of a TE still contains the necessary functional 

elements or the sequence has been degraded with time. Autonomous TEs are found in the classes

of LTR (Long Terminal Repeat) elements (including endogenous retroviruses, ERVs) and Long

Interspersed Elements (LINEs). ERVs consist of two long terminal repeats (LTRs) flanking the 

open reading frames that encode the viral proteins. The deregulation of TEs typically results in 

dedifferentiation, cancer, genomic instability, and/or cell death (Fueyo, et al., 2022). The main

evolutionary response has been the development of robust mechanisms to silence their expression, co-transcriptionally or

post-transcriptionally. The principal co-transcriptional gene silencing (CTGS) pathways are 

based on a) DNA methylation, b) histone modifications, and c) germline-specific small RNA. 

Post-transcriptional mechanisms of RE silencing refer to pathways that destabilise their RNA 

transcripts and shorten their half-life, such as DICER mediated cleavage, and/or prevent their 

translation. As these processes are cytoplasmic, they will not be discussed further.  
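
The grouping hierarchy summarised in Table 1 can be sketched as a small data structure, with one record per RE instance and one field per level of the hierarchy. The class name, field names and genomic coordinates below are hypothetical and chosen for illustration only; the level names and the two example lineages are taken from Table 1.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class REInstance:
    """One repeat element copy at specific genomic coordinates."""
    re_class: str      # e.g. LTR, LINE
    superfamily: str   # e.g. ERV, L1
    family: str        # e.g. ERVK, L1PA
    subfamily: str     # e.g. LTR22, L1PA1
    chrom: str
    start: int
    end: int


def count_by_level(instances, level: str) -> Counter:
    """Count RE instances grouped by one level of the hierarchy."""
    return Counter(getattr(instance, level) for instance in instances)


# Hypothetical coordinates, for illustration only.
EXAMPLES = [
    REInstance("LTR", "ERV", "ERVK", "LTR22", "chr1", 1_000, 1_500),
    REInstance("LINE", "L1", "L1PA", "L1PA1", "chr1", 5_000, 11_000),
    REInstance("LINE", "L1", "L1PA", "L1PA1", "chr2", 20_000, 26_000),
]

if __name__ == "__main__":
    print(count_by_level(EXAMPLES, "re_class"))
    print(count_by_level(EXAMPLES, "subfamily"))
```

Keeping every level on each record allows the same set of instances to be summarised at class, family or subfamily resolution without re-annotating them.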



 
The major transposition competent ERVs in humans are young TEs such as HERVK, 

HERVH, and the SVAs that contain ERVK, and in mice they are the IAPEz elements (Fueyo, 

et al., 2022; Wang, et al., 2005). In both species, some LINE-1 (L1) elements are also active. 

The primary model of TE expression regulation in mammalian nuclei was developed from data 

on ERV silencing in pre-implantation embryonic cells. In most somatic cells, TE expression 

levels are very low, non-detectable with standard protocols (Hoyt, et al., 2022). There, silencing 

is achieved via promoter 5mC (Greenberg and Bourc'his, 2019; Kato, et al., 2018). During 

embryonic development and germline establishment, 5mC levels are drastically reduced to 

reprogram the cells’ epigenome, resulting in a need to re-enforce RE suppression. Of note, TE 

expression is indispensable for embryonic development and lineage-specific development, and 

it is tightly regulated (Fueyo, et al., 2022). 

 
In a simplified description, CTGS starts with binding of sequence specific KRAB-ZFPs 

(Krüppel-associated box domain zinc-finger protein) to the RE promoter, followed by 

recruitment of a co-repressor named TRIM28 or KAP1 followed by SETDB1, deposition of 

H3K9me3 and later 5mC (Greenberg and Bourc'his, 2019; Padeken, et al., 2022; Schultz, et al., 

2002; Yang, et al., 2022). SETDB1 is a H3K9me2/3 methyltransferase that participates in 

multiple TE silencing mechanisms (Bilodeau, et al., 2009; Karimi, et al., 2011; Kato, et al., 

2018; Matsui, et al., 2010; Seczynska, et al., 2022). DNA methylation is the more robust, long-

term silencing mechanism, that once established makes the ZFP path redundant, while 

H3K9me3 provides an added repressive mechanism (Kato, et al., 2018). This illustrates how 

the interplay of silencing effectors and the various epigenomic signals is leveraged for 

establishment and maintenance of persistent RE silencing. The model described here is 

conceptually simple and favoured by evolution to regulate a wide range of targets, but actual

implementation is non-trivial. To make matters even more complicated, ERVs evolve their 

sequences, especially in the promoter region, to escape suppression by their host. Over millions 

of years, the host genome adapts the KRAB-ZFP TFs to avoid escape, strongly indicating

co-evolution of the KRAB-ZFP TFs along with mammalian REs, in a constant arms race to suppress

the latter (Boissinot and Sookdeo, 2016; Jacobs, et al., 2014; Kato, et al., 2018; Pontis, et al.,

2019). Nevertheless, the most recently diverged TEs have yet to elicit the evolution of an

appropriate repressor TF, making transcriptional regulation of the youngest TEs a very

challenging issue, which indeed is a matter of life or death for the cell. In the end, additional

tools and alternative approaches are necessary for the survival of the cell, the organism and the

species. This is most evident in the co-option and domestication of REs into regulatory roles

that allow the further evolution of mammalian genomes and epigenomes (Gasparotto, et al.,

2023; Gebrie, 2023; Senft and Macfarlan, 2021).

Table 1. Overview of the RE grouping hierarchy terms as used by the RE database Dfam
and here (Storer, et al., 2021).

Term               Example 1   Example 2
Class              LTR         LINE
Superfamily        ERV         L1
Family             ERVK        L1PA
Subfamily or RE    LTR22       L1PA1
RE instance        LTR22 or L1PA1 at specific genomic coordinates

 
A special note must be made regarding L1s, which comprise ~21% of the human genome,

and can essentially be separated into non-transcribed, transcription competent but not intact, 

and transcription and transposition competent (intact) copies (Fueyo, et al., 2022; Seczynska, 

et al., 2022). Young L1s are strongly expressed and also transposition competent in ESC, 

producing intronless mRNAs (Percharde, et al., 2018). They are named L1P in Primates, with 

PA1/HS being the youngest and exclusive to Hominidae, and active PA2 elements are shared 

with Pan troglodytes, while young L1Md (Md_T, G, and A) are active in Mus musculus 

domesticus (Lee, et al., 2007; Sookdeo, et al., 2013). In the past, L1s were considered part of 

the “junk” DNA, not being expressed and having no function. However, their transcription is 

necessary in the early embryonic stages and in neuronal cells (Mangiavacchi, et al., 2021; 

Percharde, et al., 2018). The same is true for ERVs in committed stem cells (Enriquez-Gasca, 

et al., 2023; Fu, et al., 2021; Mohner, et al., 2023). To explain these phenomena, already in 

1969, Britten and Davidson proposed the “gene-battery” model, seeing REs as reservoirs of 

regulatory elements and drivers of gene regulatory evolution (Fueyo, et al., 2022). Pioneering 

studies showed that the L1 anti-sense promoter can enhance transcription of opposite strand 

genes or be transposed as a sense promoter and express silent genes with chimeric mRNA 

(Nigumann, et al., 2002; Speek, 2001). REs may contain promoter or enhancer sequences, and 

specific TFBS, e.g. for KLF TFs (Enriquez-Gasca, et al., 2023; Pontis, et al., 2019; Pontis, et 

al., 2022; Sanchez-Luque, et al., 2019; Xiang, et al., 2022). In recent studies, they were also 

shown to influence 3D chromatin architecture overall and in cell-type specific manners, 

forming insulators and rearranging TADs (Diehl, et al., 2020; Lu, et al., 2021). TEs can have 

additional effects and influence cells in health and in disease (Fueyo, et al., 2022; Gasparotto, 

et al., 2023; Grundy, et al., 2022; Kong, et al., 2019; Mohner, et al., 2023; Shah, et al., 2023; 

Zadran, et al., 2023). Interestingly, TE-based enhancers in mESC are repressed by H3K9me3 

deposited by SETDB1 and related interactors (Barral, et al., 2022; Rowe, et al., 2013).  



1.5. Epigenome signals, and the histone code 

1.5.1 Histone lysine modifications 

 
Figure 5. The different lysine (K) methylation states show a change of hydrogen bonding capacity and 

hydrophobicity, but not of charge. Lysine acetylation (ac) neutralizes the residue. 

As mentioned previously, epigenome control mechanisms are based on the modulation of 

the physicochemical properties of chromatin molecules, e.g. in lysine methylation (Figure 5). 

For histone PTMs, the exact position, type and degree of modification are important, as they 

allow for specific and selective interactions (Figure 1C). In the enhancer example, H3K4me1 

is a mark specifically found on primed and active enhancers, whereas H3K4me3 is found 

flanking the TSS of actively transcribed genes, and H3K27ac is found on both enhancers and 

promoters of active genes. Starting from the original idea that each mark has an individual 

message to convey (Turner, 1993), eventually simple but specific roles were assigned to the 

major histone PTMs, also known as instructive histone modification patterns (Allis and Jenuwein, 2016;

Millan-Zambrano, et al., 2022) (Table 2). Importantly, the initial concept attributed absolute, 

unambiguous roles to single histone PTMs on the basis of correlative studies. 
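
The physicochemical differences between the lysine states shown in Figure 5 can be made explicit with a short sketch. It assumes that the side-chain amino group is protonated at physiological pH, so that each added methyl group replaces one N-H hydrogen-bond donor while the positive charge is retained, and that acetylation yields a neutral amide with a single N-H. The class and function names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LysineState:
    """Side-chain properties of one lysine modification state."""
    name: str
    methyl_groups: int
    nh_donors: int   # N-H hydrogen-bond donors on the side-chain nitrogen
    charge: int      # formal side-chain charge, assuming physiological pH


def methyl_lysine(n_methyl: int) -> LysineState:
    """Return the properties of Kme0, Kme1, Kme2 or Kme3."""
    if not 0 <= n_methyl <= 3:
        raise ValueError("lysine carries between 0 and 3 methyl groups")
    # Each methyl group replaces one N-H; the positive charge is unchanged.
    return LysineState(f"Kme{n_methyl}", n_methyl, 3 - n_methyl, +1)


# Acetylation converts the amine into a neutral amide with one N-H.
ACETYL_LYSINE = LysineState("Kac", 0, 1, 0)

if __name__ == "__main__":
    for n in range(4):
        print(methyl_lysine(n))
    print(ACETYL_LYSINE)
```

Because the charge is identical for Kme0 to Kme3, a reader domain that discriminates between them has to rely on the remaining differences, such as the number of available hydrogen-bond donors and the hydrophobic surface added by each methyl group.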

 
Table 2. Exemplary instructive histone H3 lysine modification patterns, according to
current understanding. Eu. – euchromatin, Hetero. – heterochromatin.

H3 modification   Location                Association to gene transcription   Association to chromatin state
K4me1             Enhancers               Poised and active                   Eu.
K4me3             Promoters               Active                              Eu.
K9me2             Intergenic              Silent?                             LADs / Eu. / Hetero.
K9me3             Intergenic, Promoters   Silent                              Constitutive hetero.
K27me3            Intergenic, Promoters   Silent                              Facultative hetero.
K27ac             Enhancers, Promoters    Active                              Eu.
K36me2            Intergenic              Active                              Eu. / Hetero.
K36me3            Gene bodies             Active                              Eu.

 
Early mass-spectrometry studies found that each H3 tail typically carries 2-5 modifications 

on different residues (cis), and state-of-the-art studies documented ~ 600 combinations of 

double marks on single H3 tails (Lu, et al., 2021). Since each nucleosome contains two copies 



of each histone, different PTMs may also be on separate tails (trans). A known example of 

PTMs in cis is H3K9me3-S10ph (phosphorylation), a cell-cycle dependent methyl/phospho-

switch which inhibits read-out by most H3K9me3 readers (Bock, et al., 2011). A landmark 

finding was the co-occurrence of “competing” PTMs on neighbouring histones in trans, that 

brought to focus the intermediate chromatin states (Bernstein, et al., 2006; Rada-Iglesias, et al., 

2011). Other studies correlated histone marks to DNA (hydroxy-)methylation levels (Allis and 

Jenuwein, 2016; Millan-Zambrano, et al., 2022). In light of those findings, the concept of 

instructive PTMs was further expanded to propose the extended “histone code” theory, i.e. 

combinations of histone PTMs and DNA methylation states that work together to encode 

complex, specific information for the cell to interpret (Allis and Jenuwein, 2016; Jenuwein and 

Allis, 2001; Strahl and Allis, 2000), as illustrated with the example of primed and active 

enhancer states. Generally, double marks were shown to either signal for a synergistic effect by 

amplifying the individual ones, or indicate a new biological message. 

 
1.5.2 Writers and functional roles of H3K4me1 and H3K9me2/3 

In this section, I will provide additional information on three histone marks of particular 

relevance to this work, viz. H3K4me1, H3K9me2 and H3K9me3. The Shilatifard group showed 

in KO (gene knock-out) studies that H3K4me1 is deposited on enhancers by MLL3 and MLL4, 

and interpreted this as a functional role for this mark (Herz, et al., 2012; Hu, et al., 2013). 

However, mass spectrometry determined its relative abundance to be ~30%, indicating that it

is the most frequent modification on H3K4 (Janssen, et al., 2019; Lu, et al., 2021). Hence, 

H3K4me1 certainly cannot be restricted to enhancer regions, which represent only a tiny 

fraction of the genome. Later studies by the same and other groups used catalytic domain 

truncations or inactive mutants of these enzymes and concluded that the absence of the enzyme 

had a much greater effect than loss of the mark (Boileau, et al., 2023; Cao, et al., 2018; Dorighi, 

et al., 2017; Rickels, et al., 2017; Sze, et al., 2017). This might be related to the fact that the 

MLL enzymes are part of huge complexes with additional roles in the recruitment of 

downstream effectors. Commendably, the Shilatifard lab has widely publicised these findings 

(Morgan and Shilatifard, 2020; Morgan and Shilatifard, 2023; Rickels and Shilatifard, 2018). 

They have also zealously pursued their new hypothesis of catalysis-independent and

mark-independent effects of histone methyltransferases (Cao, et al., 2018; Douillet, et al., 2020).

More recent examinations of MLL3/MLL4-dependent H3K4me1 found that the modest effects

were more pronounced on enhancers with dynamic H3K4me1 levels during epiblast formation 

(mouse embryonic day 4.5 to 5.5) (Boileau, et al., 2023), as well as in the process of enhancer 



(re)activation during germline development (Bleckwehl, et al., 2021). A more extensive study 

showed that MLL3/MLL4 catalytic activity, while “largely dispensable for enhancer 

activation” from embryoblast to the three germ layers (mouse embryonic day 3.5 to 7.5), did 

result in notably aberrant lineage selection during further differentiation (Xie, et al., 2023). The 

underlying mechanism of this effect is still unclear.  

 
H3K9me3 is conserved from unicellular to multicellular organisms, and a multitude of 

studies have strongly associated it with constitutive heterochromatin. It is deposited by 

independent or combined efforts of six protein lysine methyltransferases (PKMTs) (Padeken, 

et al., 2022). The most important PKMTs for the present work are SETDB1 and G9a (Fukuda, 

et al., 2021). SETDB1 catalyses H3K9me2/3 in euchromatin and heterochromatin, and 

participates in silencing of lineage-inappropriate genes and REs via multiple mechanisms

(Becker, et al., 2016; Bilodeau, et al., 2009; Karimi, et al., 2011; Kato, et al., 2018; Li, et al., 

2006; Matsui, et al., 2010; Schultz, et al., 2002; Seczynska, et al., 2022). Two very interesting 

reviews provide an overview of SETDB1 functions (Fukuda and Shinkai, 2020; Zhu, et al., 

2020). In contrast to SETDB1, G9a can only methylate up to H3K9me2. In lineage-committed 

cells, H3K9me2 appears in LOCKs (Large Organised Chromatin K9 domains), patterns 

spanning many genes to make up megabase-long genome domains, and mass spectrometric 

analyses showed that it is the most abundant PTM on H3 exceeding 60% in relative abundance 

(Janssen, et al., 2019; Lu, et al., 2021). In mESC, H3K9me2 amounts are drastically lower and 

these cells lack LOCKs, indicating that LOCK formation is connected to lineage commitment 

and differentiation (Wen, et al., 2009). Data from multiple publications support that G9a and 

SETDB1 are indispensable for early lineage commitment and maintaining cellular identity 

(Becker, et al., 2016), with large-scale reorganisation of the H3K9me2/3 patterns required as 

differentiation progresses (Becker, et al., 2016; Padeken, et al., 2022). Mouse embryos lacking 

functional SETDB1 perish shortly after implantation (day 4.5-5.5), while G9a defects are lethal 

shortly after that (day 9.5). Other H3K9 PKMTs are non-essential when knocked out 

individually (Cho, et al., 2011). The major roles attributed to both H3K9me2 and H3K9me3

are the formation of constitutive heterochromatin and the silencing of genes and REs,

two functions that are interlinked (Padeken, et al., 2022). To this day, accurately dissecting the

functions of the two marks remains challenging.

 

1.6. Chromatin readers of particular interest for this work 

In this section, I will present the reader proteins and domains that will be discussed in this 

work. Next, I will introduce the principles of an accurate lysine methylation level read-out and 

the HiMID folds of the readers. Finally, I will discuss the principles that enable the efficient 

concurrent recognition of multiple chromatin marks. 

 
1.6.1 DNMT3A contains an H3K36me2/3 reader 

DNA-(cytosine-5)-methyltransferase 3A (DNMT3A) methylates cytosine at the C5 position

in double stranded DNA, without needing pre-existing methylation of the opposite strand (de 

novo methylation). It contains the reader domain PWWP that binds H3K36me2/3 (Bock, et al., 

2011; Dhayalan, et al., 2010; Dukatz, et al., 2019; Qiu, et al., 2002; Weinberg, et al., 2019), an 

ADD Zn-finger domain that binds H3K9me3 in the absence of H3K4me3/2 (Dhayalan, et al., 

2011; Otani, et al., 2009), and an S-adenosyl-L-methionine-dependent methyltransferase 

domain that contains the active site (Vire, et al., 2006) (Figure 6). 

 
Figure 6. The known domains of the 908 aa long murine DNA (cytosine-5)-methyltransferase 3A (DNMT3A) 

(UniProtKB code: O88508) protein from amino- to carboxy-terminal end. ADD, ATRX–DNMT3–DNMT3L 

domain; SAMmt, S-adenosyl-L-methionine-dependent methyltransferase domain. Figure created with 

‘MyDomains’ image creator (Hulo, et al., 2008) using information from (Bock, et al., 2011; Dhayalan, et al., 

2010; Dhayalan, et al., 2011; Vire, et al., 2006; Xu, et al., 2015). 

 
The function of DNMT3A is genome-wide de novo DNA methylation, which is required for proper

development of multicellular eukaryotes. The multitudinous paths and intertwined networks

that result in this are yet to be fully uncovered (Gowher and Jeltsch, 2018; Jeltsch and

Jurkowska, 2016; Jurkowska and Jeltsch, 2015).  

 
1.6.2 DDX19A contains an H3K27me3 reader 

The DEAD-box Helicase 19A (DDX19A) is an ATP-dependent RNA helicase that 

unwinds RNA:DNA hybrid structures (known as R-loops) that are formed during transcription

and upon DNA damage (Hodroj, et al., 2017; Pinter, et al., 2021). The study that generated the 

crystal structure of the 96% identical DDX19B proposed a numbered classification scheme for 

the two large protein domains (Collins, et al., 2009) (Figure 7). Domain 1/DEAD and domain 



2/CTD form one lobe each, and the N-terminal helix is inserted into the cleft between them. It 

can be displaced by binding of an ATP analogue and has an autoinhibitory function. The 

function of the intrinsically disordered region (IDR) is unclear, but all other domains are 

required for efficient ATP-dependent RNA helicase function. The additional annotation of 

DDX19A domains is based on homology and was retrieved from UniProtKB. 

 
Figure 7. The known domains of the 478 aa long human ATP-dependent RNA helicase DDX19A (UniProtKB 

code: Q9NUU7) protein. Figure created with ‘MyDomains’ image creator (Hulo, et al., 2008) using information 

from (Collins, et al., 2009). IDR, intrinsically disordered region.

 
1.6.3 UHRF1 contains an H3K9me2/3 reader 

The protein Ubiquitin-like with PHD and RING finger domains 1 (UHRF1) contains an N-

terminal ubiquitin-like domain that binds to DNMT1 (Li, et al., 2018), tandem Tudor domains 

that bind H3K9me2/3 in absence of H3K4me3/2 (Nady, et al., 2011), a Zn-finger PHD that 

binds to the unmodified H3R2me0-NTD tail (Rajakumara, et al., 2011; Xie, et al., 2012), a Set-

and RING-Associated (SRA) domain that binds hemi-methylated DNA (Avvakumov, et al., 

2008; Bashtrykov, et al., 2014), and a RING Zn-finger domain that provides a ubiquitin ligase 

function (Figure 8). The tandem Tudor domain has been reported in two different constructs to 

have H3R2me0/K9me3 as optimal substrate with reported KD values ranging from 2 to 22 μM 

(Nady, et al., 2011; Rothbart, et al., 2012). The first Tudor subdomain forms an aromatic cage 

to recognise H3K9me3, with H3K4 placed in a groove between the tandem subdomains

after conformational adjustment (Nady, et al., 2011). Moreover, UHRF1 occludes interaction

surfaces and pockets by flexible linkers that can adopt variable conformations, including two 

autoinhibitory linkers that can occupy the cleft between the two Tudor domains. In the presence 

of specific ligands, spatial rearrangements of UHRF1 domains can occur and the autoinhibitory 

peptides are displaced to allow binding of modified H3 or modified LIG1 peptide (Houliston, 

et al., 2017). 

 
Figure 8. The known domains of the 793 aa long human Ubiquitin-like, with PHD and RING finger domains 1 

(UHRF1) (UniProtKB code: Q96T88) protein. Ubl, ubiquitin-like domain; PHD, plant homeodomain; SRA, Set-

and RING-Associated domain; RING, Really Interesting New Gene ubiquitin ligase domain. Figure created with 



‘MyDomains’ image creator (Hulo, et al., 2008) using information from (Arita, et al., 2012; Jurkowska and 

Jeltsch, 2015; Rothbart, et al., 2013; Xu, et al., 2015). 

 
UHRF1 is an E3 ubiquitin-protein ligase with an expression peak in mid S-phase. 

Functioning as a hub for epigenetic processes, the protein is primarily known for its central role 

in maintaining DNA methylation and responding to DNA damage, where it recruits various 

chromatin-interacting proteins and coordinates their functions (Mancini, et al., 2021). Via its 

SRA domain it binds hemi-methylated DNA and recruits DNMT1 to replication foci (Alhosin, 

et al., 2016; Bashtrykov, et al., 2014; Hopfner, et al., 2000), thus acting as an important 

transcription and cell cycle regulator. Recent UHRF1 studies have progressed past its 

established roles in 5mC deposition and DNA damage response to focus on the regulation of 

cell fate during differentiation (Kim, et al., 2018; Obata, et al., 2014; Ramesh, et al., 2016; 

Sakai, et al., 2022; Yamashita, et al., 2017). UHRF1 expression is deregulated in most cancers, 

and these aberrations can be considered as a universal biomarker for cancer (Ashraf, et al., 

2017; Mancini, et al., 2021; Wang, et al., 2019). In the embryonic preimplantation nucleus, only 

small amounts of maternal UHRF1 are available (Maenohara, et al., 2017), and UHRF1-/- mice 

undergo developmental arrest after gastrulation (after embryonic day 7.5) (Sharif, et al., 2007). 

During more than two decades of research, UHRF1 studies have been hampered by the protein’s 

essentiality for cell cycle progression and embryonic survival, as well as its particularly wide-

ranging participation in multiple cellular processes (Fujimori, et al., 1998; Mancini, et al., 2021; 

Muto, et al., 2002).  

 
1.6.4 Principles of HiMID – histone substrate interactions 

The cell machinery interacts non-covalently with the histone PTMs via reader domains, 

aka HiMIDs. These are classified by their protein domain fold, as well as the specific amino 

acid residue and PTM that is bound (Taverna, et al., 2007). The lysine-methylation reader folds 

encompass Royal family modules, Zn–finger Plant homeodomain (PHD), and ankyrins (Zhou, 

2015). Typically, the reader has interactions with an extended epitope to recognise the lysine 

methylation context, e.g. the sequences flanking H3K4me vs H3K9me (Figure 1C), while the 

methylation state is identified by the preferential interactions that leverage the physicochemical 

properties (H-bond capacity, hydrophobicity) specific to that methylation state (Figure 5). 

HiMIDs typically follow the induced fit paradigm, being pre-folded, ready to receive the 

substrate in a suitable groove with minimal reorganisation and associated entropic costs. Using 

a well-understood mechanism, the di- or tri-methylammonium of lysine is recognised within a 

cage of aromatic residues via cation–π interactions and hydrophobic desolvation (Xu, et al., 



2015). The presence of a carboxylate group in the cage favours Kme2 over Kme3, due to 

hydrogen bonding and direct ion-pair interaction (Figure 5). Similarly, Kme1 is typically 

recognised by incomplete aromatic cages with two H-bonds formed between the Nε amino 

group and residues from the reader domain (Li, et al., 2007; Liu and Huang, 2018). 

 
This work will focus on a protein with a previously unknown histone reader function 

(DDX19A) and two modules from the Royal Family (DNMT3A-PWWP and UHRF1-Tandem 

Tudor). The HiMID within the former is yet to be identified. Royal Family folds, named after 

the unfortunate offspring of Henry VIII Tudor, have twisted β-barrels of five antiparallel

β-strands. PWWP domains are found in 20 human proteins, and are loosely conserved (Xu, et al.,

2015). The DNMT3A-PWWP was the first PWWP shown to selectively interact with 

H3K36me3 (Dhayalan, et al., 2010), and in addition to the five-stranded β-barrel, a bundle of 

α-helices is found at the C-terminus (Figure 9A). The Tandem Tudor Domain (TTD) comprises 

two typical Tudor β-barrels, each packed tightly with a linker joining them (Xu, et al., 2015) 

(Figure 9B). UHRF1-TTD was shown to bind H3K9me2 and H3K9me3 (Nady, et al., 2011; 

Rothbart, et al., 2012). It will be shown that the three constructs discussed in this work are 

multivalent, preferentially interacting with multiple substrates (molecules or histone PTMs). 



 
Figure 9. Structural basis of histone H3 PTM binding by reader domains of particular interest. A DNMT3A-

PWWP structure in grey ribbon representation, with annotation for the residues within the aromatic cage and the 

H3 peptide highlighted in green (PDB 3LLR) (Rondelet, et al., 2016). B UHRF1-TTD structure in tan and the 

H3 peptide in light red (PDB 2L3R) (Nady, et al., 2011). Panels A and B generated with Chimera (Pettersen, et 

al., 2004). 

 

1.6.5 Thermodynamics of multivalent histone PTM read-out 

The mechanisms typically employed by multivalent reader proteins range from the formation

of multimeric protein complexes, through cassettes of multiple reader domains within one

protein, to, more rarely, a single multivalent HiMID. The same strategies have been

reported for binding to multiple histone PTMs (Figure 10A) (Li, et al., 2015; Ruthenburg, et

al., 2007). In a model of two linked but discrete interaction modules, avidity is the result of 

thermodynamic benefits (Ruthenburg, et al., 2007). Explained simply, the read-out of two 

PTMs or two subunits of a complex (e.g. histone PTM and DNA as part of a nucleosome) takes 

place because the first binding event increases the probability of the second happening. This 

reciprocal benefit in association is pivotal, and stems from the link between binding sites on the 

readers, the link between the substrates, and the minimal requirements for reader reorganisation. 

In other words, each single binding event contributes to the enthalpy ΔH, while preorganisation

reduces the entropic cost −TΔS. This leads to stronger binding for the concurrent double

interaction, per the free energy equation ΔG = ΔH − TΔS (Figure 10B). It also explains how

reduction of the net entropic cost drives multivalent binding, even in weakly interacting systems.

From an evolutionary perspective, assembly of HiMIDs into complexes or cassettes to identify 

complex substrates represents a more-or-less straightforward modular toolbox approach, 

combining established mechanisms and leveraging clear entropic benefits. 
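
The thermodynamic argument above can be made concrete with a small numerical sketch. It is not taken from the cited model: it assumes the standard relation ΔG° = RT ln(KD) (1 M standard state) and an idealised effective-concentration picture, in which the second binding event is intramolecular so that the apparent KD of the linked interaction is approximately KD1 × KD2 / Ceff. The function names, the example KD values and the effective concentration Ceff are hypothetical, and strain or conformational costs are ignored.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kJ/mol) from a dissociation constant in M."""
    return R_KJ * temperature_k * math.log(kd_molar)


def bivalent_kd(kd1_molar, kd2_molar, c_eff_molar):
    """Apparent KD of two linked sites in an idealised model (assumption).

    c_eff_molar is the effective local concentration of the second mark
    once the first one is bound; strain and reorganisation are neglected.
    """
    return kd1_molar * kd2_molar / c_eff_molar


# Hypothetical values: two weak monovalent interactions and a 1 mM
# effective concentration imposed by the linkage
kd1, kd2, c_eff = 50e-6, 100e-6, 1e-3
kd_linked = bivalent_kd(kd1, kd2, c_eff)

for label, kd in (("site 1", kd1), ("site 2", kd2), ("linked", kd_linked)):
    print(f"{label}: KD = {kd:.1e} M, dG = {delta_g_from_kd(kd):.1f} kJ/mol")
```

In this toy case the linked interaction is tighter than either monovalent one, although each individual contact remains weak. The gain depends entirely on the assumed Ceff, which stands in for the entropic benefit of preorganisation.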

 
Figure 10. Theoretical basis of multivalent histone PTM engagement. A Multivalent engagement of epigenome 

marks can take place by multiple reading modules, assembled in protein complexes or found in single proteins, 

as well as by single reader domains. Panel A adapted from (Musselman, et al., 2012). B Thermodynamic model 

to explain the increased affinity during multivalent binding, in comparison to single binding events. Panel B 

adapted from (Ruthenburg, et al., 2007). 



The previous model assumes negligible costs for binding strain and/or conformational 

adjustment of reader and substrate. Experimental data from single HiMIDs engaging 

multivalent histone PTM substrates suggest that this model is unsuitable for such cases, as the

neglected terms are not negligible (Du, et al., 2021; Jurkowska, et al., 2017; Ramón-Maiques, et

al., 2007; Su, et al., 2014). In all these cases, the entropic costs are not insignificant as reader

and substrate are rearranged to optimise binding for the combinatorial substrate. This suggests

that the interaction is driven by strong enthalpic gains, e.g. increasing desolvation and contact

surfaces, maximising electrostatic interactions etc. This can give rise to difficult-to-predict

specificities (e.g. SPIN1 preference for H3K4me3-R8me2a or H3K4me3-K9me3) and 

unexpected novel interaction mechanisms. 

 
There is another very important distinction between a very strong monovalent and an 

equally strong multivalent interaction. In the multivalent case, each individual interaction is

weaker, and hence each can be broken individually and then blocked. Therefore, the multivalent

interaction is easier to regulate, as it is more flexible and suitable for reversible signalling, while

still providing strong binding.



1.7. Experimental methods to investigate multivalent interactions 

In light of the ubiquitous presence of the histone code for metazoan genome organisation 

and gene regulation, deciphering it is a major aim of the research community. However, the 

experimental techniques suited to this purpose are limited, and histone PTM-specific antibodies 

typically find extensive use in such investigations despite the challenges they pose 

(Kungulovski and Jeltsch, 2015; Kungulovski, et al., 2015; Ruthenburg, et al., 2007). Among 

protein-protein interaction (PPI) assays, peptide arrays permit high-throughput screening of 

multiple substrates, while equilibrium peptide binding assays can quantify binding constants 

allowing comparisons. CIDOP (Chromatin Interacting DOmain Precipitation) is a protocol 

similar to chromatin immunoprecipitation (ChIP), using native chromatin to selectively retain 

nucleosomes carrying the target PTMs. Afterwards, locus specific (qPCR) or whole genome 

(HTS) analyses of DNA are possible (Figure 11). For histone proteins, Western Blot with 

suitably validated antibodies permits probing a CIDOP sample to detect a specific PTM or the 

combination of multiple ones.  

 
Figure 11. Scheme of the study design utilised in the present work to characterise HiMIDs (Histone 

Modification Interacting Domains) and validate binding to optimal substrates. CIDOP - Chromatin Interacting 

DOmain Precipitation; HTS - High-Throughput Sequencing; WB - Western Blot 

 

1.7.1. Biochemical characterisation of HiMIDs 

One approach to investigate the biochemical properties of HiMIDs is to analyse their 

binding to modified peptides. To characterise the specificity in PTM recognition, binding

studies with as many peptides carrying different PTM patterns as possible are needed.

CelluSpots™ peptide arrays offer a particularly powerful approach for an initial screening of 

HiMIDs binding to modified histone tails. These arrays contain 384 synthesised peptide spots 

in duplicate, each of them a 20 aa long peptide of the core histone protein N-terminal tails which 

have up to four combinatorial PTMs, with particular emphasis on the H3 N-terminus (1-19 aa). 

Their development allowed the investigation of the effect of double marks, such as the 

H3K9me3-S10ph, on HiMID binding (Bock, et al., 2011). In the context of this study, the arrays 

allowed for the identification of PTM patterns among the optimal substrates for specific 

HiMIDs. Then, binding affinities to peptides carrying PTMs can be quantified in equilibrium 

binding assays, validating avidity by multiple PTMs in cis vs peptides with a single or no 

modification. In this study, fluorescence anisotropy was extensively used to determine 

equilibrium binding constants (KD) of (modified) peptides to HiMIDs. 

 
Fluorescence anisotropy/depolarisation (FA) and its theoretical framework were developed 

almost a century ago (Perrin, 1926). Incident polarized light is absorbed by fluorophores in the 

sample producing a population of excited fluorophores, with their transition dipole moments 

oriented along the vector of the polarized exciting light (Lakowicz, 2007). When the 

fluorophores return to their ground state, polarised photons are emitted. However, between the 

two processes there is a pause, the lifetime of the excited state, typically around 10 ns. As the 

fluorophores are covalently bound to small peptides, they are highly rotationally mobile and

affected by Brownian motion, with rotational correlation times (tumbling) in the range of 0.1 ns,

whereas a 25 kDa protein is expected to tumble with a correlation time of ~10 ns (Lakowicz, 2007;

Sosa, et al., 2010). Thus, the tumbling time of proteins is comparable to the decay time of many

fluorophores. For a population of the mobile fluorophores the ratio of emission parallel to 

excitation vs. perpendicular to it is not equal to 1, demonstrating depolarisation/anisotropy of 

the polarised fluorescent light which directly depends on the mobility (Figure 12A). The 

fluorimeter requires one polariser for excitation and one for emission. Near-simultaneous 

acquisitions record the parallel (I||) (with respect to polarized excitation) and the perpendicular 

(I⊥) emission, and the equilibrium FA (r) is calculated as r = (I|| − I⊥)/(I|| + 2I⊥) (Lakowicz, 

2007). This definition of anisotropy was originally introduced by A. Jabłoński (Jablonski,

1960). An older quantity, the polarisation, is defined as P = (I|| − I⊥)/(I|| + I⊥) (Jameson and Ross,



2010; Lakowicz, 2007). The two describe the same phenomenon and essentially contain the 

same information. However, in r the denominator corresponds to the total emission intensity, 

which can be useful for equation simplification.  
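
The two definitions above and their interconversion can be checked numerically with the short sketch below. The conversion r = 2P/(3 − P) follows algebraically from the two formulas given in the text. The Perrin relation r = r0/(1 + τ/θ) is added here as a standard textbook expression, not as a quotation from this work. The function names, the intensities and the limiting anisotropy r0 = 0.4 are illustrative assumptions, and instrument correction factors are omitted.

```python
def anisotropy(i_par, i_perp):
    """Anisotropy r from parallel and perpendicular emission intensities."""
    return (i_par - i_perp) / (i_par + 2.0 * i_perp)


def polarisation(i_par, i_perp):
    """Polarisation P from the same two intensities."""
    return (i_par - i_perp) / (i_par + i_perp)


def anisotropy_from_polarisation(p):
    """Convert P to r; follows directly from the two definitions."""
    return 2.0 * p / (3.0 - p)


def perrin_anisotropy(r0, lifetime_ns, correlation_time_ns):
    """Steady-state anisotropy of a freely rotating sphere (Perrin relation)."""
    return r0 / (1.0 + lifetime_ns / correlation_time_ns)


# Hypothetical intensities in arbitrary units
r = anisotropy(3.0, 1.0)
p = polarisation(3.0, 1.0)

# Time scales quoted in the text: 10 ns lifetime, 0.1 ns correlation time
# for a free peptide and ~10 ns for a 25 kDa protein (r0 = 0.4 is assumed)
r_free_peptide = perrin_anisotropy(0.4, 10.0, 0.1)
r_protein_bound = perrin_anisotropy(0.4, 10.0, 10.0)
print(r, p, r_free_peptide, r_protein_bound)
```

With these assumed numbers, the fast-tumbling free peptide retains almost no anisotropy, whereas a label that tumbles on the protein time scale retains about half of r0. This difference is what makes binding detectable.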

 
Figure 12. A Physical principle of fluorescence anisotropy (FA). Tumbling of the fluorophore increases 

depolarisation of the emission. B Experimental principle of FA titrations. Binding of the protein reduces

depolarisation and thus increases anisotropy.

 
In practical applications, the average change of polarisation between absorption and 

emission is the primary experimental information of interest, and that is dependent on the 

mobility of the fluorophore, which is dependent on the size of the peptide-bound fluorophore.

If the peptide is then bound by a HiMID, the fluorophore becomes attached to a large,

slow-moving complex, strongly reducing depolarisation (Figure 12B). Essentially, by this we detect

the change in effective molecular volume. Equilibrium peptide binding is investigated by FA 

in titrations with step-wise addition of the larger partner (HiMID) to the fluorescently labelled 

peptide (Rossi and Taylor, 2011).  
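
How an FA titration curve relates to KD can be sketched with a simple 1:1 binding model. The sketch assumes a single binding site, accounts for depletion of free protein through the quadratic solution of the binding equilibrium, and assumes that the fluorescence intensity of the label does not change on binding, so that the observed anisotropy is a linear mix of the free and bound values. All names and numbers are hypothetical; this is not presented as the fitting procedure used in this work.

```python
import math


def fraction_bound(protein_total, peptide_total, kd):
    """Fraction of labelled peptide in complex for a 1:1 equilibrium.

    All three arguments must share the same concentration unit.
    """
    s = protein_total + peptide_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * peptide_total)) / 2.0
    return complex_conc / peptide_total


def observed_anisotropy(protein_total, peptide_total, kd, r_free, r_bound):
    """Anisotropy as a weighted mix of free and bound peptide signals."""
    fb = fraction_bound(protein_total, peptide_total, kd)
    return r_free + (r_bound - r_free) * fb


# Hypothetical titration: 0.1 uM labelled peptide, KD = 5 uM,
# protein added stepwise (all concentrations in uM)
protein_steps = [0.0, 0.5, 1.0, 2.5, 5.0, 10.0, 25.0, 50.0, 100.0]
curve = [observed_anisotropy(p, 0.1, 5.0, 0.05, 0.20) for p in protein_steps]

for p, r in zip(protein_steps, curve):
    print(f"{p:6.1f} uM protein -> r = {r:.3f}")
```

In practice, KD, r_free and r_bound would be estimated by least-squares fitting of such a model to the measured anisotropies, for example with a routine such as `scipy.optimize.curve_fit`.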

 

1.7.2. CIDOP and downstream analyses 

Once the optimal substrate has been identified using 

synthetic peptides, placing the discovery in biological 

context is critical. CIDOP and ChIP experiments are 

powerful tools to this end (Kungulovski, et al., 2014). Using 

native chromatin as substrate and enrichment reagents 

(HiMIDs, antibodies) we can study the histone PTMs and 

the genomic sequence of the enriched nucleosomes. After 

validation of the existence of the multivalent substrate in 

native chromatin, the hypothesised interaction with the 

HiMID must be reproduced. Briefly, native chromatin is

fragmented to mononucleosome size by micrococcal

nuclease (MNase) and incubated with affinity reagents

that bind histone PTMs etc., followed by precipitation with

magnetic beads and isolation of DNA or histones (Figure

13). Downstream DNA analyses may be quantitative

polymerase chain reaction (qPCR) or High-Throughput 

Sequencing (HTS). Bioinformatic analysis of the binding 

profiles determined by HTS can be extensive and 

particularly informative. Moreover, the histone PTMs of 

the CIDOP samples can be detected by Western blot, 

making up the profile of histone PTM enrichment of this 

combination of HiMID and assay conditions. During this work, previous protocols were 

optimised and a modular approach applied to enable reagent-specific protocol optimisation.  
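
Enrichment at qPCR reporter regions is commonly expressed as percent of input. A minimal sketch of that calculation follows, assuming perfect amplification efficiency (a doubling per cycle) and that a known fraction of the chromatin was set aside as input. The function name, the Ct values and the input fraction are hypothetical, and this is not necessarily the normalisation applied in this work.

```python
import math


def percent_input(ct_input, ct_pulldown, input_fraction):
    """Recovered material as a percentage of the starting chromatin.

    input_fraction: share of the chromatin kept as input (0.01 means 1%).
    Assumes 100% PCR efficiency, i.e. one cycle equals a two-fold change.
    """
    # Shift the input Ct to what it would be for 100% of the chromatin
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_pulldown)


# Hypothetical Ct values for a target and a control region (1% input)
target = percent_input(ct_input=25.0, ct_pulldown=24.0, input_fraction=0.01)
control = percent_input(ct_input=25.0, ct_pulldown=28.0, input_fraction=0.01)
enrichment_over_control = target / control
print(target, control, enrichment_over_control)
```

Comparing the target region with a control region, as in the last line, gives a simple measure of whether a pulldown is specific enough to justify sequencing.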

 
Figure 13. Principle of CIDOP (Chromatin Interacting DOmain Precipitation) and downstream

analyses. Figure modified from Choudalakis et al., 2023 (Choudalakis, et al., 2023).

 
Samples with adequate enrichment in selected qPCR reporter regions can be investigated

genome-wide using HTS. Illumina “sequencing-by-synthesis” produces paired-end reads of up

to 150 bp length by isothermal PCR amplification, identifying each individually added

nucleotide via fluorescent dyes and terminators that can be removed in the next step (Heather

and Chain, 2016). Typically, the reads (sequences) are then quality controlled, mapped against

a reference genome assembly, filtered against known problem regions of the assembly, and then

quantified (Yohe and Thyagarajan, 2017). The major limitation of the method is found in

analysing constitutive heterochromatin, GC-rich regions and REs. The first is due to low

experimental solubility and therefore under-representation, the second is due to the PCR-like

conditions of sequencing, and the last compounds the previous two with additional

bioinformatical challenges in assigning matching regions during mapping. Downstream 

analyses can be very extensive and non-trivial. The ChIP-Enrich webserver (chip-

enrich.med.umich.edu) offers the most comprehensive method to analyse enrichments on 

enhancers (Qin, et al., 2022), and webservers that use a list of gene names (gene set), such as 

Enrichr (maayanlab.cloud/Enrichr)(Xie, et al., 2021), can perform Gene Set Enrichment 

Analysis (GSEA) to mine multiple databases for correlation. Since analysis of REs in ChIP-seq 

data is particularly challenging, numerous specialised tools employ a variety of approaches in 

attempts to address this (Goerner-Potvin and Bourque, 2018).  
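
The filtering and quantification steps of the read-processing workflow described above can be illustrated with a toy example. The sketch assumes that mapped reads and problem regions are given as (chromosome, start, end) tuples with half-open coordinates, and that quantification means counting reads per fixed-size bin. The data and function names are invented for illustration; real analyses rely on dedicated, optimised tools.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True if two half-open intervals [start, end) share at least one base."""
    return start_a < end_b and start_b < end_a


def filter_reads(reads, problem_regions):
    """Drop reads that overlap any problem region on the same chromosome."""
    kept = []
    for chrom, start, end in reads:
        flagged = any(
            chrom == p_chrom and overlaps(start, end, p_start, p_end)
            for p_chrom, p_start, p_end in problem_regions
        )
        if not flagged:
            kept.append((chrom, start, end))
    return kept


def count_per_bin(reads, bin_size):
    """Count reads per fixed-size genomic bin, assigned by read midpoint."""
    counts = {}
    for chrom, start, end in reads:
        key = (chrom, ((start + end) // 2) // bin_size)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Toy data, invented for illustration only
reads = [
    ("chr1", 100, 250),
    ("chr1", 900, 1050),
    ("chr1", 1200, 1350),
    ("chr2", 100, 250),
]
problem_regions = [("chr1", 1000, 2000)]

kept = filter_reads(reads, problem_regions)
counts = count_per_bin(kept, bin_size=500)
print(kept, counts)
```

The linear scan over all problem regions is acceptable only for toy inputs. Genome-scale data would need sorted intervals or an interval index.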

 
1.8. Biological context of multivalent epigenomic marks  

1.8.1. Overview of concepts in gene regulation: TFs, accessibility, and histone 

code theory 

Eukaryotes are characterised by the mainly nuclear localisation of their genome, its large 

size, and the consequent necessities of chromatin compaction and regulation of gene expression. 

Research into the latter truly started in 1969 with the milestone discoveries of RNA 

polymerases, followed by the first gene-specific DNA-binding TF (TFIIIA), necessary for 5S

rDNA transcription, and the description of the eukaryotic pre-initiation complex (PIC) (Roeder, 

2019). Additional breakthroughs came in 1994, when investigation of chromatinised DNA 

substrates showed that in eukaryotes chromatin remodelling is required for TF binding and 

transcriptional activation. Further experiments demonstrated the looping of enhancers to 

promoters via large protein complexes, with cooperating TFs and chromatin remodellers (Figure

2A). Research bringing together chromatin architecture and gene regulation was met with

resistance by the TF field, which advocated working with naked DNA in purified, reconstituted

systems to ensure biochemically well-defined assays (Kadonaga, 2019). Today, the effect of

chromatin accessibility on transcriptional regulation by TFs is widely accepted, although many

questions are yet to be answered (Mach and Giorgetti, 2023). 

 
Research into epigenetic mechanisms such as histone PTMs and DNA methylation started

in the 1970s, and these mechanisms were already linked to gene regulation by 1980. They were shown

to be ubiquitous among eukaryotes with principles that are largely conserved from yeast to 

humans (Allis and Jenuwein, 2016; Millan-Zambrano, et al., 2022). Correlative studies sparked 

the idea that each histone PTM conveys a simple, unambiguous individual message understood 



by reader proteins (Turner, 1993). Landmark articles in 2000-2001 drew insight from available 

data, and put forth the theory that colocalising histone PTMs form a “histone code” (Jenuwein 

and Allis, 2001; Strahl and Allis, 2000), where multiple marks “act in a combinatorial or 

sequential fashion on one or multiple histone tails, to specify unique downstream 

functions” (Strahl and Allis, 2000). Also, the interplay of 5mC and H4ac in gene regulation was

beautifully demonstrated (Grandjean, et al., 2001). The existence of histone phospho-switches 

highlighted the similarities to signal transduction networks, “suggesting that multiple 

modifications combine to confer bistability, robustness, and adaptability” (Schreiber and 

Bernstein, 2002). Later, cross-talk of marks with opposing “canonical instructions” revealed 

the existence of steady intermediates, aka poised states (Bernstein, et al., 2006; Rada-Iglesias, 

et al., 2011). The interplay of signals was nicely summarised in a contemporary review (Li, et 

al., 2015). Additional discoveries led to extension and refinements in the theory, showcasing 

the interplay between histone PTMs and 5(h)mC to encode the complex information necessary 

to sustain eukaryotic life (Allis and Jenuwein, 2016; Jenuwein and Allis, 2001; Li, et al., 2021; 

Ruthenburg, et al., 2007; Zhang, et al., 2015). Some TF researchers expressed strong opposition 

to the histone code concept (Ptashne, 2013). At that time, the debate was principally fuelled by 

the moderate strength and limited amount of supporting evidence. Nowadays, epigenome signals

and the extended histone code are understood as part of the basis for cellular phenotypical

heterogeneity in many high-profile publications (Carter and Zhao, 2021; Cavalli and Heard, 

2019; Greenberg and Bourc'his, 2019; Lappalainen and Greally, 2017; Millan-Zambrano, et al., 

2022), even extending into specific instances of transgenerational epigenetic inheritance (Fitz-

James and Cavalli, 2022; Skvortsova, et al., 2018). 

 
1.8.2. The histone code theory in practice 

Histone PTMs can act as effectors and maintainers of epigenomic regulation. They can be 

persistent across cell divisions, they are necessary and sufficient for downstream effects with

well documented causality, and they are able to exert their influence in a sequence-independent

manner (Millan-Zambrano, et al., 2022). Of course, the specifics vary for each of the large 

number of known histone PTMs, and so far only a few have been studied as extensively as the

major ones (Table 2). Studies have leveraged advanced experimental designs to document the

transmission of parental histone modifications to the daughter cells. After passage of the replication fork,

the two daughter DNA strands are wrapped around nucleosomes. These contain recycled old

histones and newly synthesised histones. Then, starting from S-phase, major histone PTMs

can be restored within one cell cycle (Alabert, et al., 2015), and accurately reproduced to 



maintain the epigenomic landscape (Reverón-Gómez, et al., 2018). The fastest mark to be 

restored is H3K4me3 with the process completed by G2 or within 6 h after replication. The 

repressive marks H3K27me3 and H3K9me3 are primarily restored in the next G1 phase. 

H3K9me3 is particularly interesting as it was clearly shown to be inherited independently of 

DNA sequence, DNA methylation, or RNA interference (Audergon, et al., 2015; Ragunathan, 

et al., 2015). Additional details on the process and a beautiful epigenetic circuit for sequence 

independent establishment were published recently (Li, et al., 2023; Yuan and Moazed, 2024). 

 
To address the causality of histone PTMs, some of the best evidence comes from studies 

of H3K9me3, constitutive heterochromatin, their connection, and their functions. Specifically, 

H3K9me3 is necessary for phase separation of the reader CBX5 that forms heterochromatin in 

complex with TRIM28 and SUV39 (Wang, et al., 2019). Loss of H3K9me3 leads to loss of 

heterochromatin (Fukuda, et al., 2021; Montavon, et al., 2021), upregulation of RE transcription 

(McCarthy, et al., 2021; Padeken, et al., 2022; Zhao, et al., 2023), loss of bookmarking TFs 

(Djeghloul, et al., 2023) or gain of TF binding to regulatory elements (Padeken, et al., 2022), 

and cellular senescence (Zhang, et al., 2021). A particularly elegant study demonstrated the 

physiological role of H3K9me3 and sonication-resistant heterochromatin in safeguarding either

pluripotency in uncommitted embryonic cells or lineage fidelity during

differentiation (Nicetto, et al., 2019). Other marks are deposited as a consequence of an event.

The very informative review “Histone post-translational modifications — cause and 

consequence of genome function” describes more examples (Millan-Zambrano, et al., 2022). 

Perturbations of histone marks by mutation of a writer, a reader, or a histone residue have

been identified as the cause of multiple diseases (Li, et al., 2021; Lutsik, et al., 2020). The clear

regulatory effects of histone PTMs on gene expression are successfully leveraged in many 

studies of epigenomic editing (Policarpi, et al., 2021).  

 
A note should also be made regarding the different meanings attached to identical terms 

by different authors. Some give causal marks a broad meaning of necessity for specific

interactions to take place (Allis and Jenuwein, 2016; Morgan and Shilatifard, 2020). This is the 

standard interpretation within the histone code theory, and the one used here. Others consider 

a causal histone PTM to have a direct effect, such as histone core acetylations that cause 

conformational changes to the nucleosome and actively decompact chromatin (Millan-

Zambrano, et al., 2022). This is also termed an “instructive” or causative role, and distinguished 

from histone tail PTMs that typically must be read and therefore have “indirect” effects.  

 
Depending on the mark’s nature and function, histone PTMs might be deposited in broad 

domains or sharp peaks, correlating strongly with specific 3D-chromatin structures and 

accessibility (Table 2). The read-out mechanism involves mono- or multivalent readers that

interact not only with histone marks, but also with additional “epitopes”, such as the DNA/RNA 

backbone, folding, sequence, or other biomolecules. The interactions of multiple readers in the 

nucleosomal context were recently thoroughly discussed (McGinty and Tan, 2021; Peng, et al., 

2021). The model of thermodynamic benefits discussed in chapter 1.5 explains how multiple 

recognition sites can result in specific and strong engagement and accounts for the generally low

strength of interactions observed in vitro with short monovalent peptides, the typical 

quantitation method. In this way, the combination of “low-affinity” anchor points can assist the 

reader to identify the intended target and bind more strongly, restrict orientations, or cause a 

conformational change (Li, et al., 2015; McGinty and Tan, 2021; Peng, et al., 2021). For 

example, this can be useful in the nucleosomal context to avoid promiscuous targeting of 

histone-like motifs on non-histones. However, chromatin states are unlikely to be discriminated 

solely in this manner. Therefore, on the next level of complexity, the theory proposes the 

coexistence of multiple epige