Biochemical investigations of multivalent chromatin reading domains

Dissertation approved by Faculty 3: Chemistry of the University of Stuttgart for the award of the degree of Doctor of Natural Sciences (Dr. rer. nat.)

Submitted by Michel Choudalakis from Athens, Greece

Main examiner: Prof. Dr. Albert Jeltsch
Co-examiner: Prof. Dr. Stephan Nußberger
Chair of the examination committee: Prof. Dr. Jens Brockmeyer
Date of the oral examination: 04.06.2024

Institute of Biochemistry and Technical Biochemistry of the University of Stuttgart
2024

Thesis – Antithesis – Synthesis

Ὁ βίος βραχύς, ἡ δὲ τέχνη μακρή, ὁ δὲ καιρὸς ὀξύς, ἡ δὲ πεῖρα σφαλερή, ἡ δὲ κρίσις χαλεπή.
Ἱπποκράτης (Ἀφορισμοί)
Life is short, and the craft long, time is brief, experimentations perilous, and judgment difficult.
Hippocrates (Aphorisms)

Αἰέν ἀριστεύειν - Ἱππολόχoς ο Λύκιος προς Γλαύκοντα
Όμηρος (Ιλιάδα, Ζ’: 208)
Always excel - Hippolochus of Lycia to his son Glaucus
Homer (Iliad)

Acknowledgements

I would like to express my deepest gratitude to Prof. Dr. Albert Jeltsch for providing me with the opportunity to conduct my doctoral studies in the biochemistry laboratories, for his mentorship, and for his constructive critique during my thesis. His contribution to my development as a scientist is invaluable. Additionally, I am extremely thankful to Prof. Dr. Stephan Nußberger and Prof. Dr. Johannes Kästner for reading and reviewing my thesis as part of the examination committee. A special thank you also goes to Prof. Dr. Jens Brockmeyer, who was able to cover a sick leave on short notice and participate in the committee on the examination day.

I am particularly grateful to Dr. Dr. Pavel Bashtrykov for his support throughout my time here. My studies would not have been the same without his assistance and friendship. His constructive commentary, his support, and his encouragement are of great value to me.
Many thanks go to my office mates throughout the years for the great atmosphere, their assistance, their input, and their friendship. I thank the gents: Dr. Julian Broche for instructing me in my first steps in bioinformatics and for the endless discussions about our projects, science and the world in general, and Stefan Kunert for the laughter and the jokes. And a special thanks to the ladies whose presence changed the office, especially in reducing the number of energy drink Pfanddosen! To Nivethika Rajaram for being the good person and spirited conversationalist that she is, to Claudia Albrecht for her kind and happy self, and to our latest addition Franziska Dorscht for her dynamic energy and greening up our office! Each added their own personality to the cordial atmosphere and never objected to my spontaneous exclamations, rhetorical questions, and impromptu statements. We are indeed the International Office of Awesome!

I also extend my sincere thanks to all my colleagues who assisted me in experiments (especially Jannis, the man of the cell cycle FACS!), brainstormed solutions (TC we might be ChIPing until retirement…), and joined me in the adventure of trying to decipher the mysteries of nature. Gizem, Philipp, Mina, Tabea, Alex, Franzi K., Micha, it was fun talking and doing science with you. I also thank my students Cassandra and Nico for the months we worked together trying to make nature do our bidding and for making me a better teacher.

My warmest thanks go to Dragica, Regina, and Branca. They gave me a “good morning” in all the languages we speak, sometimes a meal or a hug, some good advice, and always a friendly smile. They helped me in many ways and I cherish the fond memories. I would be remiss not to thank Elisabeth and Lea, who so often tackle mountains of bureaucracy for us and the institute. I’d like to acknowledge the rest of the group, especially all who brought cake. You made everybody’s day!
To my very dear friends Anastasia, Tomek, Svein Tore, Paul, Hege Jeanette and Robert, and everyone else in Norway, thank you for the amazing times we had together and for throwing a party every time we were allowed to meet in groups of more than 3! To Arnhild and Stein Egil a great big thank you for embracing me and making me part of the family, as well as to my sister Emmanouella for her love and support through the almost 4 decades of my existence.

During this time too many people to name here changed my life. To all my friends and teachers, thank you. Your efforts were and are greatly appreciated.

To Alexandra Elbakyan, thank you for democratising access to science. Your efforts make the world better every day.

Finally, it is impossible to convey the depth of gratitude I feel towards my wife, Ida Helene. She is my soulmate and my love, and she has made life beautiful. She has provided me with the unwavering support and encouragement that helped me throughout this challenging journey.

List of Publications

6 peer-reviewed articles, one review book chapter, one manuscript under review

1. Dukatz M, Holzer K, Choudalakis M, Emperle M, Lungu C, Bashtrykov P, Jeltsch A. H3K36me2/3 binding and DNA binding of the DNA methyltransferase DNMT3A PWWP domain both contribute to its chromatin interaction. Journal of molecular biology. 2019. 10.1016/j.jmb.2019.09.006
2. Pinter S, Knodel F, Choudalakis M, Schnee P, Kroll C, Fuchs M, Broehm A, Weirich S, Roth M, Eisler SA, Zuber J, Jeltsch A, Rathert P. A functional LSD1 coregulator screen reveals a novel transcriptional regulatory cascade connecting R-loop homeostasis with epigenetic regulation. Nucleic acids research. 2021. 10.1093/nar/gkab180
3. Schnee P, Choudalakis M, Weirich S, Khella MS, Carvalho H, Pleiss J, Jeltsch A. Mechanistic basis of the increased methylation activity of the SETD2 protein lysine methyltransferase towards a designed super-substrate peptide. Communications Chemistry. 2022. 10.1038/s42004-022-00753-w
4. Kunert S, Linhard V, Weirich S, Choudalakis M, Osswald F, Krämer F, Köhler AR, Bröhm A, Wollenhaupt J, Schwalbe H, Jeltsch A. The MECP2-TRD domain interacts with the DNMT3A-ADD domain at the H3-tail binding site. Protein Science. 2023. 10.1002/pro.4542
5. Jeltsch A, Choudalakis M, Dukatz M, Kunert S. Chapter 1: ADD domains – A regulatory hub in chromatin biology and disease. Book: Chromatin readers in Health and Disease - Histone mark readers. 2023, Elsevier. 10.1016/B978-0-12-823376-4.00002-1
6. Choudalakis M, Kungulovski G, Mauser R, Bashtrykov P, Jeltsch A. Refined read-out: The hUHRF1 Tandem-Tudor domain prefers binding to histone H3 tails containing K4me1 in the context of H3K9me2/3. Protein Science. 2023. 10.1002/pro.4760
7. Choudalakis M, Bashtrykov P, Jeltsch A. RepEnTools: An automated repeat enrichment analysis package for ChIP-seq data reveals hUHRF1 Tandem-Tudor domain enrichment in young repeats. Mobile DNA. 2024. 10.1186/s13100-024-00315-y

Chandrasekaran TT, Choudalakis M, Bröhm A, Weirich S, Kouroukli AG, Ammerpohl O, Rathert P, Bashtrykov P, Jeltsch A. SETDB1 activity is globally directed by H3K14 acetylation via its Triple Tudor Domain. Under review. bioRxiv: 10.1101/2024.04.22.590554

Declaration of Authorship

I hereby certify that the dissertation entitled Biochemical investigations of multivalent chromatin reading domains is entirely my own work except where otherwise indicated. Passages and ideas from other sources have been clearly indicated.

Declaration on the independent authorship of the dissertation

I affirm that I wrote the present work entitled Biochemical investigations of multivalent chromatin reading domains independently and used no sources or aids other than those indicated; passages and ideas taken from other sources are marked as such.
Michel Choudalakis
Stuttgart, 19.01.2024

Contents

Abbreviations
Abstract
Zusammenfassung
1. Introduction
  1.1. Phenotype, epigenetics and nucleosomes
  1.2. Genes, regulatory elements and epigenomic control
  1.3. Chromatin architecture, organisation and regulation
  1.4. Repeat elements, regulation and function
  1.5. Epigenome signals, and the histone code
    1.5.1 Histone lysine modifications
    1.5.2 Writers and functional roles of H3K4me1 and H3K9me2/3
  1.6. Chromatin readers of particular interest for this work
    1.6.1 DNMT3A contains an H3K36me2/3 reader
    1.6.2 DDX19A contains an H3K27me3 reader
    1.6.3 UHRF1 contains an H3K9me2/3 reader
    1.6.4 Principles of HiMID – histone substrate interactions
    1.6.5 Thermodynamics of multivalent histone PTM read-out
  1.7. Experimental methods to investigate multivalent interactions
    1.7.1. Biochemical characterisation of HiMIDs
    1.7.2. CIDOP and downstream analyses
  1.8. Biological context of multivalent epigenomic marks
    1.8.1. Overview of concepts in gene regulation: TFs, accessibility, and histone code theory
    1.8.2. The histone code theory in practice
  1.9. Aims of this study
2. Materials and Methods
  2.1. Plasmids and mutagenesis
  2.2. Bacterial strains
  2.3. Expression of recombinant proteins
  2.4. Purification of recombinant proteins
  2.5. CelluSpots MODified™ histone peptide arrays
  2.6. Equilibrium peptide binding
  2.7 Investigation of bivalent histone enrichment (in vitro)
    2.7.1 Chromatin extraction and bivalent histone H3 enrichment
    2.7.2 Analysis of bivalent histone H3 modifications after enrichment
    2.7.3 DNA analysis of locus-specific enrichment
    2.7.3 DNA library generation for HTS after bivalent histone H3 enrichment
  2.8 Investigation of bivalent histone enrichment (in silico)
    2.8.1 HTS data preparation and correlation analysis
    2.8.2 H3K9me2 genome-wide analysis
    2.8.3 UHRF1-TTD CIDOP peak calling and fragmentation
    2.8.4 Plotting of heatmaps and k-means clustering
    2.8.5 Analyses of UHRF1-TTD CIDOP peaks from HepG2 cells
    2.8.6 refTSS analyses
    2.8.7 Heatmap of ARID5B peaks from HepG2 cells
    2.8.8 Analyses of data from HCT116 cells
    2.8.9 Murine ChIP scatter plots
    2.8.10 Analyses on REs
  2.9 Structural analysis
  2.10 Statistics
3. Results and Discussion
  3.1 Technical improvements
    3.1.1 Improved site-directed mutagenesis protocol with Q5 polymerase
    3.1.2 Improved autoinduction protocol for recombinant protein expression in E. coli
    3.1.3 CIDOP and ChIP
  3.2 DNMT3A-ADD and SETD2 spectroscopic assays
  3.3 DNMT3A-PWWP binding to H3K36me2/3 and DNA
  3.4 DDX19A binding to R-loops and to H3K27me3
  3.5 UHRF1-TTD binding to H3K4me1-K9me2/3
    3.5.1 UHRF1-TTD binds to H3K4me1-K9me2 peptides on arrays
    3.5.2 UHRF1-TTD binds to H3K4me1-K9me2/3 peptides in equilibrium peptide binding assays
    3.5.3 UHRF1-TTD mutants bind differentially to H3K9me3 vs H3K4me1-K9me3 peptides
    3.5.4 UHRF1-TTD adopts discrete binding modes for different marks
    3.5.5 UHRF1-TTD binds nucleosomes with H3K4me1 and H3K9me2/3
    3.5.6 UHRF1-TTD prefers native H3 with both K4me1 and K9me2/3
    3.5.7 UHRF1-TTD binds on promoters of cell type specific genes and down-regulated genes in HepG2
    3.5.8 H3K4me1, H3K9me2 and UHRF1-TTD are found on enhancers of cell-type specific genes in HepG2
    3.5.9 H3K4me1, H3K9me2 and UHRF1-TTD are found on the flanks of cell-type specific TFBS in HepG2
    3.5.10 Murine UHRF1 correlates to H3K4me1 genome-wide
    3.5.11 UHRF1-TTD down-regulates genes with H3K4me1-K9me2 enriched enhancers
    3.5.12 Previous studies of UHRF1-TTD functions
  3.6. RepEnTools: An automated repeat enrichment analysis package for ChIP-seq data
    3.6.1. RepEnTools is fast and efficient on the chm13v2 assembly
    3.6.2. RepEnTools is reproducible and accurate for repeat masker regions, excluding some Simple repeats
    3.6.3. RepEnTools screening reveals hUHRF1 Tandem-Tudor Domain enrichment on REs
    3.6.4. hUHRF1-TTD is enriched on REs with H3K4me1-K9me2 or H3K4me1-K9me3, two distinct double marks
    3.6.5. mUHRF1 is also enriched on TEs with H3K4me1-K9me2/3
    3.6.6. Epigenomics of TE regulation, UHRF1 and plausible biological roles
  3.7 Future investigations of UHRF1 and H3K4me1-K9me2/3
    3.7.1. Considerations regarding the biological context of the investigations
    3.7.2. Improved approaches for UHRF1 depletion/removal
    3.7.3. Future directions for bioinformatic analyses
  3.8 Perceptions regarding the histone code theory
    3.8.1. Controversy regarding the histone code theory
    3.8.2. Current research on transcriptional regulation and the histone code
4. Conclusions
  4.1. Overview of investigations into HiMID multivalent interactions
  4.2. HiMID multivalent chromatin interactions in biological context
  4.3. The future of multivalent HiMIDs and the histone code
References
Appendix

Abbreviations

5(h)mC    5-(hydroxy-)methylcytosine
ABC       Association-By-Contact model
bp        Base pairs
ChIP      Chromatin immunoprecipitation
CIDOP     Chromatin Interacting Domain Precipitation
DNMT      DNA methyltransferase
DTT       Dithiothreitol
EDTA      Ethylenediaminetetraacetic acid
ESCs      Embryonic stem cells
F         Fluorescein
FA        Fluorescence anisotropy
FITC      Fluorescein isothiocyanate
GST       Glutathione S-transferase
HiMID     Histone modification interacting domains
HTS       High-throughput sequencing
IPTG      Isopropyl β-D-1-thiogalactopyranoside
LADs      Lamina associated domains
MPP8      Mitotic-phase phosphoprotein 8
PAGE      Polyacrylamide gel electrophoresis
PHD       Plant homeodomain
PRC       Polycomb Repressive Complex
PTM       Post-translational modifications
PWWP      Pro-Trp-Trp-Pro motif
(q)PCR    (quantitative) Polymerase Chain Reaction
RE        Repeat element
RING      Really Interesting New Gene
SDS       Sodium dodecyl sulphate
SQ        Starting quantity
STE       Saline/tris(hydroxymethyl)aminomethane/EDTA
TAF3      TATA-binding protein-associated factor-1
TBS-T     Tween 20-Tris Buffered Saline
TE        Transposable element
TF        Transcription factor
TSS       Transcriptional start site
UHRF1     Ubiquitin-like, with PHD and RING finger domains 1

Abstract

In eukaryotes, the negatively charged nuclear DNA wraps around cationic histone proteins to form nucleosomes and compact the genetic information. Histones carry several post-translational modifications (PTMs) that appear in combinatorial patterns.
These marks are interpreted by non-covalent interactions with proteins containing histone modification interacting domains (HiMIDs), also known as “reader” domains. Thirty years ago, it was proposed that the histone marks act as signals in the regulation of transcription and other chromatin functions. With time, this concept has been refined to suggest that combinatorial patterns of marks represent context-specific signals, termed a 'histone code'. It functions as one of the epigenetic regulatory mechanisms, which control reversible and heritable changes in cellular phenotype. Intermolecular models demonstrate thermodynamic benefits from multivalent engagement of nucleosomes, suggesting that such interactions are widespread. However, only a few multivalent readers are known so far, and dissecting their function has been very challenging.

This thesis focuses on HiMIDs with complex roles that simultaneously interact with two histone PTMs or two different substrates. Introducing the theoretical foundation, I discuss the thermodynamic and biological basis of how multivalent interactions can guide effector protein complexes, targeting their functions to distinct regions and chromatin states. Then, I present data from the characterisation of the readers DNMT3A-PWWP, DDX19A, and UHRF1-TTD in the context of multivalent engagement of histone PTMs and biomolecules.

Starting with DNMT3A-PWWP, I quantified the binding of the wild-type (WT) and a mutant domain to histone H3K36me2/3 peptides, showing negligible differences, while my colleagues showed that the mutant has drastically reduced binding to DNA and nucleosomal substrates. I then studied the R-loop helicase DDX19A and demonstrated very strong binding to H3K27me3 peptides in the nanomolar range, complementing the findings of a complex functional study. The latter showed that interaction with H3K27me3 is necessary for robust DDX19A-mediated R-loop resolution and LSD1-target gene silencing.
With UHRF1-TTD, I discovered and quantified its preferential binding to H3K4me1-K9me2/3 peptides over H3K9me2/3 alone, and engineered mutants with specific and differential binding changes. This led to the discovery of a novel Kme1 read-out mechanism, based on the interaction of R207 methylene groups with the H3K4me1 methyl group and on counting the H-bond capacity of H3K4. High-throughput sequencing (HTS) data revealed strong TTD binding at chromatin sites with H3K4me1 peaks and broad H3K9me2/3 signal, which are enriched on enhancers and promoters of cell-type specific genes at the flanks of cell-type specific transcription factor binding sites. Data from the full-length protein in mouse and human cells provided evidence for the physiological role of the H3K4me1-K9me2/3 double marks in TTD-mediated UHRF1 recruitment.

To further illustrate this point, I investigated UHRF1-dependent silencing of repeat elements (REs). To this end, I developed RepEnTools, which improves on the previously available programmes for RE enrichment analysis in chromatin pulldown studies by leveraging new tools with carefully chosen and validated settings, enhancing accessibility, and adding some key functions. RepEnTools analyses showed that chromatin binding of hUHRF1-TTD and full-length mUHRF1 was strongly enriched on different RE promoters with the H3K4me1-K9me3 double mark, where UHRF1 represses their expression. The data suggest a novel functional role for the H3K4me1-K9me3 signal of the histone code that is both sequence independent and conserved in two distinct mammals.

Taken together, the work presented here is consistent with and supports the histone code theory, best illustrated by UHRF1-TTD, which binds a specific double mark that has a biological meaning going beyond the meaning of the individual marks.
In this thesis, I presented various mechanisms that influence epigenomic regulation, including chromatin 3D-architecture, accessibility, transcription factor recruitment, and chromatin marks. Especially in the context of UHRF1-TTD functions, I discussed how DNA, RNA, histones, and covalent modifications thereof interweave to produce the signalling network necessary throughout the lifetime of the mammalian cell, during differentiation, development and every other phase of life. Thus, within the three-dimensional scaffold of chromatin structures, these biomolecules and their modifications collectively form the context-specific network of effectors and maintainers of the epigenomic modifications. The ways in which they influence transcription and translation are only now being unravelled. Hence, the recent data suggest the existence of not just a histone code, but a 3D-chromatin modification code, which dictates how biomolecules and their modifications collectively implement epigenomic regulation by interactions along the chromatin and through 3D space. As shown in these projects, readers commonly use the mechanism of multivalent interactions to interpret such contextual signals and guide epigenomic effectors to their targets. The tools and workflows that were developed and applied in this work can be employed to reveal more instances of refined read-out among HiMIDs.

Additionally, I leveraged my experience with fluorescence spectroscopy and made contributions to another two published studies. The first study demonstrated that the DNMT3A-ADD Zn-finger domain, which is a known H3K4me0 reader, also binds to a domain from the MECP2 protein. The association was quantified, and the specificity was demonstrated with a binding-deficient triple mutant. This interaction offers complex additional regulation options to DNMT3A and MECP2, in interplay with the histone code.
The second study focused on SETD2, a H3K36me3 depositing enzyme, and the mechanism of its preference for a designed “super substrate” peptide. By elegantly combining computational simulations and experimental data, the study demonstrated that an H3 peptide substrate predominantly exists in an extended conformation in solution, while the super substrate forms a hairpin conformation. Upon binding to the enzyme, the hairpin is opened and the super substrate adopts a similar conformation as the canonical substrate. These results highlighted the dynamic nature of solubilised peptides' conformations, their impact on protein-protein interactions, and the significance of dynamic conformational changes in interactions. Zusammenfassung  10 Zusammenfassung Im Kern eukaryotischer Zellen wickelt sich das Polyanion DNA um kationische Histonproteine, und bildet so Nukleosomen welche die genetische Information verdichten. Histone tragen eine Vielzahl von posttranslationalen Modifikationen (PTMs), die in kombinatorischen Mustern auftreten. Diese Markierungen werden durch nicht-kovalente Wechselwirkungen (WW) mit Proteinen interpretiert, die Histon Modification Interacting Domains (HiMIDs), so genannte "Lese"-Domänen, enthalten. Vor dreißig Jahren wurde vorgeschlagen, dass Histone PTMs als Signale bei der Regulation der Transkription und anderer Chromatin Funktionen fungieren. Dieses Konzept im Laufe der Zeit wurde dahingehend erweitert, dass kombinatorische Muster von Markierungen kontextspezifische Signale darstellen, einen sogenannten "Histon-Code". Dies ist einer der epigenetischen Regulationsmechanismen, die reversible und vererbbare Veränderungen des zellulären Phänotyps kontrollieren. Intermolekulare Modelle zeigen, dass die multivalente Bindung von Nukleosomen thermodynamische Vorteile mit sich bringt, was darauf hindeutet, dass dieser Mechanismus weit verbreitet ist. 
Es sind allerdings bislang nur wenige multivalente Lesedomänen bekannt, und ihre Funktion zu entschlüsseln ist eine große Herausforderung. Diese Arbeit konzentriert sich auf HiMIDs mit komplexen Funktionen, die mit zwei Histon-PTMs oder zwei unterschiedlichen Substraten interagieren. Nach einer Einführung in die theoretischen Grundlagen diskutiere ich die thermodynamischen und biologischen Prinzipien, die erklären wie multivalente WW Effektorproteinkomplexe lenken können, um ihre Funktionen auf bestimmte Regionen und Chromatinzustände auszurichten. Anschließend präsentiere ich Daten der Charakterisierung der Lesedomänen DNMT3A-PWWP, DDX19A und UHRF1-TTD im Kontext einer multivalenten WW mit Histon-PTMs und Biomolekülen. Beginnend mit DNMT3A-PWWP habe ich die Bindungsstärke der Wildtyp (WT) und einer mutierten Domäne an Histon-H3K36me2/3-Peptide quantifiziert und vernachlässigbare Unterschiede festgestellt, während meine Kollegen zeigten, dass die Mutante eine drastisch reduzierte Bindung an DNA und nukleosomale Substrate aufweist. Als nächstes habe ich die R-Loop-Helikase DDX19A untersucht, und eine sehr starke Bindung an H3K27me3-Peptide im nanomolaren Bereich nachgewiesen, und so die Ergebnisse einer komplexen funktionellen Studie ergänzt. Letztere zeigte, dass WW mit H3K27me3 für eine robuste DDX19A-vermittelte R-Loop-Auflösung und LSD1-Zielgen-Silencing notwendig ist. Zusammenfassung  11  Mit UHRF1-TTD habe ich die bevorzugte Bindung an H3K4me1-K9me2/3 Peptide vs. nur H3K9me2/3 enthaltende Peptide entdeckt und quantifiziert und Mutanten mit spezifischer und differentieller Bindung generiert, und so einen neuartigen Kme1 Auslesemechanismus entdeckt, der auf der WW von Methylengruppen aus R207 mit der H3K4me1-Methylgruppe und auf der Zählung der H-Bindungskapazität von H3K4 beruht. 
Hochdurchsatz-DNA Sequenzierungsdaten (HTS) zeigten eine starke TTD-Bindung an Chromatin-Regionen mit H3K4me1-Peaks mit breitem H3K9me2/3-Signal, die angereichert an Enhancern und Promotoren zelltypspezifischer Gene, an den Flanken zelltypspezifischer Transkriptionsfaktor- Bindungsstellen vorkommen. Daten des nativen UHRF1 Proteins aus Maus- und Humanzellen belegten die physiologische Rolle der Doppelmarkierung bei der TTD-vermittelten UHRF1- Rekrutierung. Zur weiteren Veranschaulichung wurde das UHRF1-abhängiges Repeat-Element- Silencing (RE) untersucht. Hierfür habe ich RepEnTools entwickelt, ein Programm zur ChIP- Seq-Datenanalyse von REs, dass auf älteren Programmen aufbaut.  Dabei werden neue bioinformatische Werkzeuge mit sorgfältig ausgewählten und validierten Einstellungen eingesetzt, die Zugänglichkeit verbessert und einige Schlüsselfunktionen hinzugefügt. RepEnTools Analysen zeigten, dass die Chromatinbindung von hUHRF1-TTD und mUHRF1 stark an verschiedenen RE-Promotoren mit H3K4me1-K9me2/3-Doppelmarkierungen angereichert war und UHRF1 dann deren Expression unterdrückt. Die Daten deuten auf eine neue funktionelle Rolle des H3K4me1-K9me3 Signals im Histon-Code hin, die sowohl sequenzunabhängig als auch in zwei verschiedenen Säugetierarten konserviert ist. Insgesamt stimmen die hier vorgestellten Ergebnisse mit der Histon-Code-Theorie überein und unterstützen sie, was am besten durch UHRF1-TTD veranschaulicht wird, die spezifische Doppelmodifikationen bindet, deren biologische Bedeutung über die Bedeutungen der einzelnen Modifikationen hinausgeht. In dieser Arbeit habe ich verschiedene Mechanismen vorgestellt, die die epigenomische Regulation beeinflussen, darunter die 3D-Architektur des Chromatins, die Zugänglichkeit, die Rekrutierung von Transkriptionsfaktoren, und die Chromatinmarkierungen. 
In the context of the functions of UHRF1-TTD, I discussed how DNA, RNA, histones, and covalent modifications are interwoven to form the vital signalling network that is required throughout the lifetime of the mammalian cell, during differentiation, development and every other stage of life. Within the three-dimensional scaffold of chromatin structures, biomolecules and their modifications form a context-specific network of effectors and keepers of the epigenomic modifications. The ways in which they influence transcription and translation are only beginning to be unravelled. Current data suggest that there is not only a histone code but also a code for 3D chromatin modifications, which determines how modifications collectively implement epigenomic regulation through interactions along the chromatin and through 3D space. As shown in these projects, reader domains largely use the mechanism of multivalent interactions to interpret such contextual signals and to bring epigenomic effectors to their targets. The tools and workflows developed and applied in this work can be used to uncover further cases of refined read-out among HiMIDs. In addition, I contributed my experience with fluorescence spectrometry to two further published studies. The first study showed that the DNMT3A-ADD Zn-finger domain, a known H3K4me0 reader, also binds to a domain of the MECP2 protein. The association was quantified, and its specificity was demonstrated with a binding-deficient triple mutant. In interplay with the histone code, this interaction offers DNMT3A and MECP2 complex additional regulatory options.
The second study focused on SETD2, an enzyme that generates H3K36me3, and on the mechanism of its strong preference for a designed "super-substrate" peptide. Through an elegant combination of computer simulations and experimental data, the study showed that an H3 peptide substrate in solution predominantly adopts an extended conformation, whereas the super-substrate forms a hairpin conformation. Upon binding to the enzyme, the hairpin opens and the super-substrate adopts a conformation similar to that of the canonical substrate. These results highlighted the dynamic nature of the conformations of peptides in solution, their effects on protein-protein interactions, and the importance of dynamic conformational changes in interactions.

1. Introduction

1.1. Phenotype, epigenetics and nucleosomes

Eukaryotes maintain large amounts of genomic material, most of which is stored in the nucleus. Throughout the lifetime of a cell, numerous perceived stimuli will require an appropriate response, while cells of multicellular organisms must follow programmes of differentiation and development. The very large nuclear DNA molecules contain an enormous amount of information, necessary for transcription, replication and other DNA-templated processes. Access to the parts of the information that are not actively used is not required, and therefore they can be compressed and stored in a more efficient manner. However, the resulting genomic architecture needs to be dynamic, adaptable and reversible (Luger, et al., 2012). It must also be heritable from parent to daughter cells. This control over the dissemination of genetic information is known as epigenetics, and it governs the cellular phenotype (Berger, et al., 2009). During development, the epigenetic landscape changes step-wise, reflecting the alterations in cellular phenotype.
This plasticity is best illustrated by the Waddington landscape model: embryonic cells begin to differentiate, eventually taking a single fork on the path, while de-differentiation can also occur, e.g. in cancer (Figure 1A).

Figure 1. A The epigenetic landscape of Waddington illustrates the plasticity of differentiation paths. Figure adapted from Rajagopal et al., 2016 (Rajagopal and Stanger, 2016). B Schematic chromatin architecture and mechanisms of epigenetic signalling. DNA is wrapped around nucleosomes, which are packed in chromatin fibres. PTM, post-translational modification. Figure adapted from Rosa et al., 2013 (Rosa and Shaw, 2013). C Histone H3 tail sequence demonstrating various post-translational modifications and the residues they are known to occur on.

DNA compaction steps have evolved on the basis of the molecule’s physical properties. The histones H2A, H2B, H3, and H4 form an octamer of histone heterodimers. As DNA carries a negative charge from the phosphate backbone, 147 bp wrap 1.7 times counter-clockwise around the very positively charged histone octamer to form the canonical nucleosome (Figure 1B). The DNA can be covalently modified, as in 5-methyl cytosine (5mC) and its oxidation products. Also, the flexible N-terminal tails of each histone carry a large number of post-translational modifications (PTMs), e.g. lysine and arginine methylation or lysine acetylation, especially on the H3 and H4 histones (Jenuwein and Allis, 2001; Millan-Zambrano, et al., 2022; Strahl and Allis, 2000). Mass spectrometry has identified hundreds of different histone marks and shown that a single histone H3 tail typically carries 2-5 modified residues simultaneously (Janssen, et al., 2019; Lu, et al., 2021) (Figure 1C). Approximately 60 bp of linker DNA bridge successive nucleosomes, and nucleosomes form 3D structures for higher-order arrangement. This is achieved with the help of additional proteins, and the resulting assembly is known as chromatin.
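As a rough worked example of the numbers quoted above, the following Python sketch adds the 147 bp of nucleosomal DNA to the approximately 60 bp of linker DNA to obtain a nucleosome repeat length, and estimates how many nucleosomes would fit on a given stretch of DNA. The function names and the 1 Mb example region are illustrative assumptions; real linker lengths vary between cell types and genomic regions, so the result is an order-of-magnitude estimate only.

```python
# Values taken from the text; the linker length is approximate.
CORE_DNA_BP = 147    # DNA wrapped around one histone octamer
LINKER_DNA_BP = 60   # linker DNA bridging successive nucleosomes


def repeat_length(core_bp: int = CORE_DNA_BP, linker_bp: int = LINKER_DNA_BP) -> int:
    """Return the nucleosome repeat length (core DNA plus linker DNA) in bp."""
    return core_bp + linker_bp


def nucleosome_count(region_bp: int,
                     core_bp: int = CORE_DNA_BP,
                     linker_bp: int = LINKER_DNA_BP) -> int:
    """Estimate how many complete nucleosome repeats fit into region_bp,
    assuming (hypothetically) uniform spacing along the whole region."""
    return region_bp // repeat_length(core_bp, linker_bp)


if __name__ == "__main__":
    print("repeat length (bp):", repeat_length())
    print("nucleosomes per 1 Mb:", nucleosome_count(1_000_000))
```

Under these assumptions, one megabase of DNA holds several thousand nucleosomes, which gives a sense of the scale at which histone PTMs can encode information.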
Chromatin can have different properties and states, varying between very compacted and fully accessible. Simply put, since the condensed parts can efficiently sequester unneeded genes from the transcription machinery, genes can transition from one chromatin state to another in a reversible manner, and additional information is used to finely tune the outcome. The epigenomic signalling and control mechanisms are used to regulate gene expression, as well as homeostasis, response to stimuli, cell cycle progress, and cellular differentiation processes. Evolution developed four principal epigenome signalling mechanisms based on the modulation of the physicochemical properties of chromatin, viz. DNA methylation, histone PTMs, histone variants, and non-coding RNA (Allis and Jenuwein, 2016). Together these mechanisms shape the epigenetic landscape in a reversible and heritable manner and regulate the cellular phenotype. With the occasional exception of non-coding RNA, these mechanisms provide in situ signals. While some have direct effects on nucleosome compaction and chromatin structure, others are directly interpreted by “reader” protein domains that interact non-covalently, specifically and selectively. Many readers are known for PTMs on histone tails and for 5mC. Accordingly, associated enzymes are called “writers” or “erasers” depending on their influence on the mark. For histone PTMs, the exact position, type, and degree of modification are important (Figure 1C), as this allows for specific and selective interactions. A very thorough ACS Chemical Review on lysine methylation analysed most aspects of this modification, including the mechanisms and physicochemical basis of methyllysine read-out by the readers (Luo, 2018). Lysine methylation causes only a minimal change in size and no change in charge compared to unmodified lysine, suggesting the simultaneous use of multiple intricate mechanisms for discrimination between Kme0/1/2/3.
The author also proposed that readers can amplify the effect via multivalent interactions and multiple PTMs. The interactions between reader domains for histone PTMs, known as HiMIDs (Histone Modification Interacting Domains), and combinations of epigenome marks are the main focus of this work. Additional consideration will be given to the narrow biological meaning of these signals, as well as their biological context and the broader implications for our understanding of phenotype regulation and the associated mechanisms.

“An epigenetic trait is a stably heritable phenotype resulting from changes in a chromosome without alterations in the DNA sequence.” (Berger, et al., 2009)

1.2. Genes, regulatory elements and epigenomic control

Metazoan genomes include protein-coding genes (~ 2 %) and various types of RNA genes, while 49% of the mammalian genome is occupied by repeat elements (REs) (Stamidis and Żylicz, 2023). The latter include transposable elements (TEs), i.e. integrated viral sequences, and simple tandem repeats that may have functional roles, e.g. at centromeres and telomeres. The most studied principal functional element in genomic research is the protein-coding gene, with one or more exons extending over the gene body. At the beginning of a gene we find the transcriptional start site (TSS), a nucleosome-free region in actively transcribed genes, while the region 500-1000 bp upstream of the TSS comprises the promoter. This is the gene-regulatory region that is easiest to identify, and a collection of all promoters in the human genome was only recently completed (Nurk, et al., 2022). However, transcriptional regulation starts at the enhancers. Enhancers are genomic regions where transcription factors (TFs) and the protein complexes necessary to prepare for transcription are first recruited, to bring together all the necessary participants and tightly modulate the process (Figure 2A).
Enhancers are challenging to identify since an enhancer region can be proximal (1-10 kb from the TSS) or distal (10 to >100 kb from the TSS), and they are often cell-type specific and therefore not detectable in all cells. The criteria to identify enhancer regions are non-trivial. Principal requirements are a nucleosome-free center, enhancer-specific epigenomic signals, enhancer-specific interactors, and bi-directional enhancer-RNA (eRNA) starting from the center (Carullo and Day, 2019). The accessible DNA typically contains sequence motifs preferentially bound by one or more TFs, called TF binding sites (TFBS). The least complex interaction model is the Activity-By-Contact (ABC) model for cis-acting enhancers proximal to the affected gene. However, the association of a specific gene with a specific enhancer is another hurdle, as there might be multiple interacting enhancers for one gene, resulting in cooperative or competitive regulation (Figure 2B), or even one enhancer regulating multiple genes. Therefore, multiple high-throughput experimental methods and combinatorial computational analyses were developed to untangle this complex situation. Recently, the ChIP-Enrich database integrated all available data on human enhancers, and based on improved analytical methods it provides the most comprehensive enhancer analysis programme to date (Qin, et al., 2022). Using these data, the ChIP-Enrich programme takes user-provided ChIP-seq peaks (or similar) as input and analyses their location with regard to their overlap with promoters, proximal enhancers, or distal enhancers. A number of different settings and options are available to fine-tune the next step, which is the assignment of the regions to specific genes. The “hybrid” setting of the algorithm uses information from the ABC model and distal enhancer assignments as appropriate.
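To make the idea of locating peaks relative to promoters and proximal or distal enhancers more concrete, the following Python sketch classifies peak positions purely by their distance to the nearest TSS, using the distance ranges quoted above (up to 1 kb for promoters, 1-10 kb for proximal and more than 10 kb for distal enhancers). This is a deliberately simplified illustration and not the ChIP-Enrich algorithm: the function names, the use of absolute distance without regard to strand, and the example coordinates are all assumptions made for this sketch.

```python
from bisect import bisect_left

# Distance thresholds in bp, taken from the ranges quoted in the text.
PROMOTER_MAX_BP = 1_000    # promoter: within 1 kb of the TSS
PROXIMAL_MAX_BP = 10_000   # proximal enhancer: 1-10 kb from the TSS


def nearest_tss_distance(position: int, sorted_tss: list[int]) -> int:
    """Return the absolute distance from position to the nearest TSS.

    sorted_tss must be a non-empty, ascending list of TSS coordinates
    on a single chromosome (hypothetical input format).
    """
    i = bisect_left(sorted_tss, position)
    # Only the TSS just before and just after the position can be nearest.
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(position - tss) for tss in candidates)


def classify_peak(position: int, sorted_tss: list[int]) -> str:
    """Assign a peak to a region type based on distance to the nearest TSS."""
    distance = nearest_tss_distance(position, sorted_tss)
    if distance <= PROMOTER_MAX_BP:
        return "promoter"
    if distance <= PROXIMAL_MAX_BP:
        return "proximal enhancer"
    return "distal enhancer"


if __name__ == "__main__":
    tss_positions = [10_000, 200_000]  # made-up TSS coordinates
    for peak in (10_500, 15_000, 100_000):
        print(peak, classify_peak(peak, tss_positions))
```

A distance-only rule like this cannot resolve cases where several enhancers act on one gene or one enhancer acts on several genes, which is why dedicated tools combine distance with interaction and activity data.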
The step-wise process of enhancer formation is typically described as starting with a pioneering TF opening the condensed chromatin, followed by additional TFs being recruited, remodelling complexes establishing the nucleosome-free center of the enhancer region, the Mediator complex looping the enhancer(s) to a gene promoter, and RNA-Pol II expressing eRNA (Calo and Wysocka, 2013; Carullo and Day, 2019; Panigrahi and O'Malley, 2021) (Figure 2A). Enhancers typically also carry specific epigenome marks. Histone H3 lysine 4 monomethylation (H3K4me1) is a mark found on primed and active enhancers, and it is often associated with enhancers regardless of their activity (Calo and Wysocka, 2013; Rada-Iglesias, 2018). Enhancers of actively transcribed genes carry acetylated H3K27 (H3K27ac) (Figure 2C), whereas primed enhancers have 5-hydroxymethyl cytosine (5hmC) (Figure 2D). This way, two different epigenome marks are combined to signal a change in the enhancer state. Interestingly, 5-methylcytosine in the CpG context was shown to inhibit or stimulate the binding of different TFs, especially developmentally important ones (Yin, et al., 2017). Despite great advances in the last decades, much about enhancers and their mechanism of function is still unclear (Panigrahi and O'Malley, 2021).

Figure 2. A Scheme of typical interactors at an active genomic enhancer and enhancer-promoter looping. B Alternative models for enhancer regulation at one gene. C Epigenome marks and interactors typically found at an active enhancer. D A primed enhancer represents a different enhancer stage, made clear by the different epigenomic marks. Panels A and B adapted from Carullo et al. 2019 (Carullo and Day, 2019), panels C and D adapted from Calo et al. 2013 (Calo and Wysocka, 2013).

1.3. Chromatin architecture, organisation and regulation

Figure 3. A Simple scheme of the various chromosome bands on a chromosome and their nuclear position.
B Detailed scheme depicting the typical nuclear organisation of compartment A (active) and B, hetero- and euchromatin, lamina associated domains (LADs), and topologically associated domains (TADs). C Exemplary depiction of the TAD surrounding a single gene and its regulatory elements. The intensity of the red colour denotes the strength of pair-wise interactions between regions, demonstrating the regulatory neighbourhood formed by the loops. D Two-dimensional interpretation of the data from Panel C. Here, the positions of the ChromHMM enhancers are marked. Panel A adapted from Eberhart et al., 2013 (Eberhart, et al., 2013), panel B adapted from Padeken et al., 2022 (Padeken, et al., 2022), panels C and D adapted from Diehl et al., 2020 (Diehl, et al., 2020).

Moving from nucleosomes and individual genes to higher-order chromatin organisation, each chromosome contains large structures with specific physical properties and interactions. Stemming from the early chromosome-banding techniques, the condensed, intensely stained parts of chromatin are known as heterochromatin, whereas the rest of the genome is more accessible and lighter coloured, hence called euchromatin (Figure 3A). Heterochromatin is further divided into constitutive (C-band) and facultative (G- or Giemsa-band) heterochromatin; the latter is challenging to stain, more dynamic, and contains genes that participate in development and differentiation (Figure 3A) (Eberhart, et al., 2013). On the other hand, the constitutive heterochromatin comprises mostly tandem repeats, such as pericentromeric satellites. It corresponds to ~ 8% of the human genome and was previously considered to correlate with consistently strongly suppressed gene expression (Nurk, et al., 2022).
Of particular interest to this work, this is the only part of the genome robust enough to survive the harsh C-banding chemical treatment, while chromatin fragmentation experiments gave rise to the term sonication-resistant chromatin (Becker, et al., 2017). This great resistance of constitutive heterochromatin to mechanical and chemical attempts at solubilisation creates massive challenges for its analysis. Recently, extensive research has begun to unravel the mystery it poses. We now have ample evidence that it harbours REs and genes that are expressed to some degree (Altemose, et al., 2022; Hoyt, et al., 2022), as well as chromatin-associated protein complexes that repress their expression (McCarthy, et al., 2021). Information from modern and far more complex experiments revealed the existence of an active compartment (A) and another one (B), which mostly overlap with the euchromatic and heterochromatic regions, respectively (Figure 3B). The regions close to the nuclear lamina constitute the lamina associated domains (LADs), while genome-wide chromatin is locally organised in topologically associated domains (TADs). Cohesin and CCCTC-binding factor (CTCF) cooperate to form and regulate dynamic boundaries, extruding these broad regions to form loops (Diehl, et al., 2020; Mach and Giorgetti, 2023). Within these, frequent interactions between otherwise distant genomic sequences can take place (Figure 3C-D), generating regulatory neighbourhoods that can bring together one gene and multiple enhancers. Such interactions are only possible through 3D space, explaining distal interactions and complex regulatory networks (Figure 2) that extend beyond the simple ABC model of 1 gene - 1 enhancer interaction. It has been shown experimentally that these interaction networks and the mechanisms regulating them have direct effects on gene regulation (Padeken, et al., 2022).
However, research into the compartmentalisation of the genome and the associated functions is still at an early stage, with many questions yet to be answered.

1.4. Repeat elements, regulation and function

As mentioned previously, REs constitute about half of the mammalian genome (Stamidis and Żylicz, 2023). Aberrant expression or expression and transposition of REs are deleterious to mammalian cells. These points indicate how important the regulation of REs is for the host. REs will be briefly introduced here, along with some information on their silencing and their functions. An excellent overview was published recently (Fueyo, et al., 2022), with additional works examining specific aspects of this broad and multifaceted subject (Grundy, et al., 2022; Protasova, et al., 2021; Stamidis and Żylicz, 2023). Other reviews focus more on the functions of REs (Gasparotto, et al., 2023; Gebrie, 2023; Geis and Goff, 2020; Senft and Macfarlan, 2021).

Figure 4. Scheme of the various families of repeat elements and their typical structure. Interspersed repeats include transposable elements capable of autonomous retrotransposition and non-autonomous ones, while tandem repeats include satellites, simple and low complexity repeats. Figure modified from Hoyt et al., 2022 (Hoyt, et al., 2022).

Repeats are classified into tandem repeats, i.e. short sequences repeated almost identically, and transposable elements (TEs) (Figure 4). The former are typically satellite repeats found on (peri-)centromeres and telomeres, while TEs are further divided into DNA transposons (not shown), non-autonomous retrotransposons, and autonomous retrotransposons. The latter encode all the proteins required for their retrotransposition. Table 1 summarises the terms that describe the grouping hierarchy.
Another characteristic is competence in retrotransposition (rc) or lack thereof (non-rc), referring to whether the specific instance of a TE still contains the necessary functional elements or whether the sequence has been degraded with time. Autonomous TEs are found in the classes of LTR (Long Terminal Repeat) elements (including endogenous retroviruses, ERVs) and Long Interspersed Elements (LINEs). ERVs consist of two long terminal repeats (LTRs) flanking the open reading frames that encode the viral proteins. The deregulation of TEs typically results in dedifferentiation, cancer, genomic instability, and/or cell death (Fueyo, et al., 2022). The main evolutionary response has been the development of robust mechanisms to silence their expression, co-transcriptionally or post-transcriptionally. The principal co-transcriptional gene silencing (CTGS) pathways are based on a) DNA methylation, b) histone modifications, and c) germline-specific small RNA. Post-transcriptional mechanisms of RE silencing refer to pathways that destabilise their RNA transcripts and shorten their half-life, such as DICER-mediated cleavage, and/or prevent their translation. As these processes are cytoplasmic, they will not be discussed further. The major transposition-competent ERVs in humans are young TEs such as HERVK, HERVH, and the SVAs that contain ERVK, and in mice they are the IAPEz elements (Fueyo, et al., 2022; Wang, et al., 2005). In both species, some LINE-1 (L1) elements are also active. The primary model of TE expression regulation in mammalian nuclei was developed from data on ERV silencing in pre-implantation embryonic cells. In most somatic cells, TE expression levels are very low and not detectable with standard protocols (Hoyt, et al., 2022). There, silencing is achieved via promoter 5mC (Greenberg and Bourc'his, 2019; Kato, et al., 2018).
During embryonic development and germline establishment, 5mC levels are drastically reduced to reprogram the cells’ epigenome, resulting in a need to re-enforce RE suppression. Of note, TE expression is indispensable for embryonic development and lineage-specific development, and it is tightly regulated (Fueyo, et al., 2022). In a simplified description, CTGS starts with the binding of sequence-specific KRAB-ZFPs (Krüppel-associated box domain zinc-finger proteins) to the RE promoter, followed by recruitment of the co-repressor TRIM28 (also known as KAP1) and then SETDB1, deposition of H3K9me3 and, later, 5mC (Greenberg and Bourc'his, 2019; Padeken, et al., 2022; Schultz, et al., 2002; Yang, et al., 2022). SETDB1 is an H3K9me2/3 methyltransferase that participates in multiple TE silencing mechanisms (Bilodeau, et al., 2009; Karimi, et al., 2011; Kato, et al., 2018; Matsui, et al., 2010; Seczynska, et al., 2022). DNA methylation is the more robust, long-term silencing mechanism that, once established, makes the ZFP pathway redundant, while H3K9me3 provides an added repressive mechanism (Kato, et al., 2018). This illustrates how the interplay of silencing effectors and the various epigenomic signals is leveraged for the establishment and maintenance of persistent RE silencing. The model described here is conceptually simple and favoured by evolution to regulate a wide range of targets, but its actual implementation is non-trivial. To make matters even more complicated, ERVs evolve their sequences, especially in the promoter region, to escape suppression by their host. Over millions of years, the host genome adapts the KRAB-ZFP TFs to avoid escape, strongly indicating co-evolution of the KRAB-ZFP TFs along with mammalian REs, in a constant arms race to suppress the latter (Boissinot and Sookdeo, 2016; Jacobs, et al., 2014; Kato, et al., 2018; Pontis, et al., 2019).

Table 1. Overview of the RE grouping hierarchy terms as used by the RE database Dfam and here (Storer, et al., 2021).

Term              Example 1    Example 2
Class             LTR          LINE
Superfamily       ERV          L1
Family            ERVK         L1PA
Subfamily or RE   LTR22        L1PA1
RE instance       LTR22 or L1PA1 at specific genomic coordinates

Nevertheless, the most recently diverged TEs have yet to elicit the evolution of an appropriate repressor TF, making transcriptional regulation of the youngest TEs a very challenging issue, which indeed is a matter of life or death for the cell. In the end, additional tools and alternative approaches are necessary for the survival of the cell, the organism and the species. This is most evident in the co-option and domestication of REs into regulatory roles that allow the further evolution of mammalian genomes and epigenomes (Gasparotto, et al., 2023; Gebrie, 2023; Senft and Macfarlan, 2021). A special note must be made regarding L1s, which comprise ~21% of the human genome and can essentially be separated into non-transcribed copies, transcription-competent but not intact copies, and transcription- and transposition-competent (intact) copies (Fueyo, et al., 2022; Seczynska, et al., 2022). Young L1s are strongly expressed and also transposition competent in ESC, producing intronless mRNAs (Percharde, et al., 2018). They are named L1P in Primates, with PA1/HS being the youngest and exclusive to Hominidae, while active PA2 elements are shared with Pan troglodytes, and young L1Md (Md_T, G, and A) are active in Mus musculus domesticus (Lee, et al., 2007; Sookdeo, et al., 2013). In the past, L1s were considered part of the “junk” DNA, not being expressed and having no function. However, their transcription is necessary in the early embryonic stages and in neuronal cells (Mangiavacchi, et al., 2021; Percharde, et al., 2018). The same is true for ERVs in committed stem cells (Enriquez-Gasca, et al., 2023; Fu, et al., 2021; Mohner, et al., 2023).
To explain these phenomena, Britten and Davidson proposed the “gene-battery” model as early as 1969, seeing REs as reservoirs of regulatory elements and drivers of gene regulatory evolution (Fueyo, et al., 2022). Pioneering studies showed that the L1 anti-sense promoter can enhance transcription of opposite strand genes or be transposed as a sense promoter and express silent genes with chimeric mRNA (Nigumann, et al., 2002; Speek, 2001). REs may contain promoter or enhancer sequences, and specific TFBS, e.g. for KLF TFs (Enriquez-Gasca, et al., 2023; Pontis, et al., 2019; Pontis, et al., 2022; Sanchez-Luque, et al., 2019; Xiang, et al., 2022). In recent studies, they were also shown to influence 3D chromatin architecture overall and in cell-type specific manners, forming insulators and rearranging TADs (Diehl, et al., 2020; Lu, et al., 2021). TEs can have additional effects and influence cells in health and in disease (Fueyo, et al., 2022; Gasparotto, et al., 2023; Grundy, et al., 2022; Kong, et al., 2019; Mohner, et al., 2023; Shah, et al., 2023; Zadran, et al., 2023). Interestingly, TE-based enhancers in mESC are repressed by H3K9me3 deposited by SETDB1 and related interactors (Barral, et al., 2022; Rowe, et al., 2013).

1.5. Epigenome signals, and the histone code

1.5.1 Histone lysine modifications

Figure 5. The different lysine (K) methylation states show a change of hydrogen bonding capacity and hydrophobicity, but not of charge. Lysine acetylation (ac) neutralizes the residue.

As mentioned previously, epigenome control mechanisms are based on the modulation of the physicochemical properties of chromatin molecules, e.g. in lysine methylation (Figure 5). For histone PTMs, the exact position, type and degree of modification are important, as they allow for specific and selective interactions (Figure 1C).
In the enhancer example, H3K4me1 is a mark specifically found on primed and active enhancers, whereas H3K4me3 is found flanking the TSS of actively transcribed genes, and H3K27ac is found on both enhancers and promoters of active genes. Starting from the original idea that each mark has an individual message to convey (Turner, 1993), simple but specific roles were eventually assigned to the major histone PTMs, also known as instructive histone modification patterns (Allis and Jenuwein, 2016; Millan-Zambrano, et al., 2022) (Table 2). Importantly, the initial concept attributed absolute, unambiguous roles to single histone PTMs on the basis of correlative studies.

Table 2. Exemplary instructive histone H3 lysine modification patterns, according to current understanding. Eu.: euchromatin; Hetero.: heterochromatin.

H3 Modification   Location                Association to gene transcription   Association to chromatin state
K4me1             Enhancers               Poised and active                   Eu.
K4me3             Promoters               Active                              Eu.
K9me2             Intergenic              Silent?                             LADs / Eu. / Hetero.
K9me3             Intergenic, Promoters   Silent                              Constitutive hetero.
K27me3            Intergenic, Promoters   Silent                              Facultative hetero.
K27ac             Enhancers, Promoters    Active                              Eu.
K36me2            Intergenic              Active                              Eu. / Hetero.
K36me3            Gene bodies             Active                              Eu.

Early mass-spectrometry studies found that each H3 tail typically carries 2-5 modifications on different residues (cis), and state-of-the-art studies documented ~ 600 combinations of double marks on single H3 tails (Lu, et al., 2021). Since each nucleosome contains two copies of each histone, different PTMs may also be on separate tails (trans). A known example of PTMs in cis is H3K9me3-S10ph (phosphorylation), a cell-cycle dependent methyl/phospho-switch which inhibits read-out by most H3K9me3 readers (Bock, et al., 2011). A landmark finding was the co-occurrence of “competing” PTMs on neighbouring histones in trans, which brought the intermediate chromatin states into focus (Bernstein, et al., 2006; Rada-Iglesias, et al., 2011).
Other studies correlated histone marks to DNA (hydroxy-)methylation levels (Allis and Jenuwein, 2016; Millan-Zambrano, et al., 2022). In light of those findings, the concept of instructive PTMs was further expanded to propose the extended “histone code” theory, i.e. combinations of histone PTMs and DNA methylation states that work together to encode complex, specific information for the cell to interpret (Allis and Jenuwein, 2016; Jenuwein and Allis, 2001; Strahl and Allis, 2000), as illustrated with the example of primed and active enhancer states. Generally, double marks were shown to either signal a synergistic effect by amplifying the individual ones, or indicate a new biological message.

1.5.2 Writers and functional roles of H3K4me1 and H3K9me2/3

In this section, I will provide additional information on three histone marks of particular relevance to this work, viz. H3K4me1, H3K9me2 and H3K9me3. The Shilatifard group showed in KO (gene knock-out) studies that H3K4me1 is deposited on enhancers by MLL3 and MLL4, and interpreted this as a functional role for this mark (Herz, et al., 2012; Hu, et al., 2013). However, mass spectrometry determined its relative abundance to be ~30%, indicating that it is the most frequent modification on H3K4 (Janssen, et al., 2019; Lu, et al., 2021). Hence, H3K4me1 certainly cannot be restricted to enhancer regions, which represent only a tiny fraction of the genome. Later studies by the same and other groups used catalytic domain truncations or inactive mutants of these enzymes and concluded that the absence of the enzyme had a much greater effect than loss of the mark (Boileau, et al., 2023; Cao, et al., 2018; Dorighi, et al., 2017; Rickels, et al., 2017; Sze, et al., 2017). This might be related to the fact that the MLL enzymes are part of huge complexes with additional roles in the recruitment of downstream effectors.
Commendably, the Shilatifard lab has widely publicised these findings (Morgan and Shilatifard, 2020; Morgan and Shilatifard, 2023; Rickels and Shilatifard, 2018). They have also zealously pursued their new hypothesis of catalytically independent and mark-independent effects of histone methyltransferases (Cao, et al., 2018; Douillet, et al., 2020). More recent examinations of MLL3/MLL4-dependent H3K4me1 found that the modest effects were more pronounced on enhancers with dynamic H3K4me1 levels during epiblast formation (mouse embryonic day 4.5 to 5.5) (Boileau, et al., 2023), as well as in the process of enhancer (re)activation during germline development (Bleckwehl, et al., 2021). A more extensive study showed that MLL3/MLL4 catalytic activity, while “largely dispensable for enhancer activation” from embryoblast to the three germ layers (mouse embryonic day 3.5 to 7.5), did result in notably aberrant lineage selection during further differentiation (Xie, et al., 2023). The underlying mechanism of this effect is still unclear. H3K9me3 is conserved from unicellular to multicellular organisms, and a multitude of studies have strongly associated it with constitutive heterochromatin. It is deposited by the independent or combined efforts of six protein lysine methyltransferases (PKMTs) (Padeken, et al., 2022). The most important PKMTs for the present work are SETDB1 and G9a (Fukuda, et al., 2021). SETDB1 catalyses H3K9me2/3 in euchromatin and heterochromatin, and participates in the silencing of lineage-inappropriate genes and REs via multiple mechanisms (Becker, et al., 2016; Bilodeau, et al., 2009; Karimi, et al., 2011; Kato, et al., 2018; Li, et al., 2006; Matsui, et al., 2010; Schultz, et al., 2002; Seczynska, et al., 2022). Two very interesting reviews provide an overview of SETDB1 functions (Fukuda and Shinkai, 2020; Zhu, et al., 2020). In contrast to SETDB1, G9a can only methylate up to H3K9me2.
In lineage-committed cells, H3K9me2 appears in LOCKs (Large Organised Chromatin K9 domains), patterns spanning many genes to make up megabase-long genome domains, and mass spectrometric analyses showed that it is the most abundant PTM on H3, exceeding 60% in relative abundance (Janssen, et al., 2019; Lu, et al., 2021). In mESC, H3K9me2 amounts are drastically lower and these cells lack LOCKs, indicating that LOCK formation is connected to lineage commitment and differentiation (Wen, et al., 2009). Data from multiple publications support that G9a and SETDB1 are indispensable for early lineage commitment and maintaining cellular identity (Becker, et al., 2016), with large-scale reorganisation of the H3K9me2/3 patterns required as differentiation progresses (Becker, et al., 2016; Padeken, et al., 2022). Mouse embryos lacking functional SETDB1 perish shortly after implantation (day 4.5-5.5), while G9a defects are lethal shortly after that (day 9.5). Other H3K9 PKMTs are non-essential when knocked out individually (Cho, et al., 2011). The major roles attributed to both H3K9me2 and H3K9me3 have been the formation of constitutive heterochromatin, as well as the silencing of genes and REs, which are linked here (Padeken, et al., 2022). To this day, accurately dissecting the functions of the two marks has been challenging.

1.6. Chromatin readers of particular interest for this work

In this section, I will present the reader proteins and domains that will be discussed in this work. Next, I will introduce the principles of an accurate lysine methylation level read-out and the HiMID folds of the readers. Finally, I will discuss the principles that enable the efficient concurrent recognition of multiple chromatin marks.

1.6.1 DNMT3A contains an H3K36me2/3 reader

DNA-(cytosine-5)-methyltransferase 3A (DNMT3A) methylates cytosine at the 5’ position in double-stranded DNA, without needing pre-existing methylation of the opposite strand (de novo methylation).
It contains the reader domain PWWP that binds H3K36me2/3 (Bock, et al., 2011; Dhayalan, et al., 2010; Dukatz, et al., 2019; Qiu, et al., 2002; Weinberg, et al., 2019), an ADD Zn-finger domain that binds H3K9me3 in the absence of H3K4me3/2 (Dhayalan, et al., 2011; Otani, et al., 2009), and an S-adenosyl-L-methionine-dependent methyltransferase domain that contains the active site (Vire, et al., 2006) (Figure 6).

Figure 6. The known domains of the 908 aa long murine DNA (cytosine-5)-methyltransferase 3A (DNMT3A) (UniProtKB code: O88508) protein from amino- to carboxy-terminal end. ADD, ATRX–DNMT3–DNMT3L domain; SAMmt, S-adenosyl-L-methionine-dependent methyltransferase domain. Figure created with ‘MyDomains’ image creator (Hulo, et al., 2008) using information from (Bock, et al., 2011; Dhayalan, et al., 2010; Dhayalan, et al., 2011; Vire, et al., 2006; Xu, et al., 2015).

The function of DNMT3A is genome-wide de novo methylation, which is required for proper development of multicellular eukaryotes. The multitudinous paths and intertwined networks that lead to this are yet to be fully uncovered (Gowher and Jeltsch, 2018; Jeltsch and Jurkowska, 2016; Jurkowska and Jeltsch, 2015).

1.6.2 DDX19A contains an H3K27me3 reader

The DEAD-box Helicase 19A (DDX19A) is an ATP-dependent RNA helicase that unwinds RNA:DNA hybrid structures (known as R-loops), which are formed during transcription and upon DNA damage (Hodroj, et al., 2017; Pinter, et al., 2021). The study that generated the crystal structure of the 96% identical DDX19B proposed a numbered classification scheme for the two large protein domains (Collins, et al., 2009) (Figure 7). Domain 1/DEAD and domain 2/CTD form one lobe each, and the N-terminal helix is inserted into the cleft between them. It can be displaced by binding of an ATP analogue and has an autoinhibitory function.
The function of the intrinsically disordered region (IDR) is unclear, but all other domains are required for efficient ATP-dependent RNA helicase function. The additional annotation of DDX19A domains is based on homology and was retrieved from UniProtKB.

Figure 7. The known domains of the 478 aa long human ATP-dependent RNA helicase DDX19A (UniProtKB code: Q9NUU7) protein. Figure created with ‘MyDomains’ image creator (Hulo, et al., 2008) using information from (Collins, et al., 2009). IDR, intrinsically disordered region.

1.6.3 UHRF1 contains an H3K9me2/3 reader

The protein Ubiquitin-like with PHD and RING finger domains 1 (UHRF1) contains an N-terminal ubiquitin-like domain that binds to DNMT1 (Li, et al., 2018), tandem Tudor domains that bind H3K9me2/3 in the absence of H3K4me3/2 (Nady, et al., 2011), a Zn-finger PHD that binds to the unmodified H3R2me0-NTD tail (Rajakumara, et al., 2011; Xie, et al., 2012), a Set- and RING-Associated (SRA) domain that binds hemi-methylated DNA (Avvakumov, et al., 2008; Bashtrykov, et al., 2014), and a RING Zn-finger domain that provides a ubiquitin ligase function (Figure 8). The tandem Tudor domain has been reported in two different constructs to have H3R2me0/K9me3 as optimal substrate, with reported KD values ranging from 2 to 22 μM (Nady, et al., 2011; Rothbart, et al., 2012). The first Tudor subdomain forms an aromatic cage to recognise H3K9me3, with the H3K4 placed in a groove between the tandem subdomains after conformational adjustment (Nady, et al., 2011). Moreover, UHRF1 occludes interaction surfaces and pockets by flexible linkers that can adopt variable conformations, including two autoinhibitory linkers that can occupy the cleft between the two Tudor domains. In the presence of specific ligands, spatial rearrangements of UHRF1 domains can occur and the autoinhibitory peptides are displaced to allow binding of modified H3 or modified LIG1 peptide (Houliston, et al., 2017).

Figure 8.
The known domains of the 793 aa long human Ubiquitin-like, with PHD and RING finger domains 1 (UHRF1) (UniProtKB code: Q96T88) protein. Ubl, ubiquitin-like domain; PHD, plant homeodomain; SRA, Set- and RING-Associated domain; RING, Really Interesting New Gene ubiquitin ligase domain. Figure created with ‘MyDomains’ image creator (Hulo, et al., 2008) using information from (Arita, et al., 2012; Jurkowska and Jeltsch, 2015; Rothbart, et al., 2013; Xu, et al., 2015).

UHRF1 is an E3 ubiquitin-protein ligase with an expression peak in mid S-phase. Functioning as a hub for epigenetic processes, the protein is primarily known for its central role in maintaining DNA methylation and responding to DNA damage, where it recruits various chromatin-interacting proteins and coordinates their functions (Mancini, et al., 2021). Via its SRA domain it binds hemi-methylated DNA and recruits DNMT1 to replication foci (Alhosin, et al., 2016; Bashtrykov, et al., 2014; Hopfner, et al., 2000), thus acting as an important transcription and cell cycle regulator. Recent UHRF1 studies have progressed past its established roles in 5mC deposition and DNA damage response to focus on the regulation of cell fate during differentiation (Kim, et al., 2018; Obata, et al., 2014; Ramesh, et al., 2016; Sakai, et al., 2022; Yamashita, et al., 2017). UHRF1 expression is deregulated in most cancers, and these aberrations can be considered as a universal biomarker for cancer (Ashraf, et al., 2017; Mancini, et al., 2021; Wang, et al., 2019). In the embryonic preimplantation nucleus, only small amounts of maternal UHRF1 are available (Maenohara, et al., 2017), and UHRF1-/- mice undergo developmental arrest after gastrulation (after embryonic day 7.5) (Sharif, et al., 2007).
During more than two decades of research, UHRF1 studies have been hampered by the protein’s essentiality for cell cycle progression and embryonic survival, as well as its particularly wide-ranging participation in multiple cellular processes (Fujimori, et al., 1998; Mancini, et al., 2021; Muto, et al., 2002).

1.6.4 Principles of HiMID – histone substrate interactions

The cell machinery interacts non-covalently with the histone PTMs via reader domains, aka HiMIDs. These are classified by their protein domain fold, as well as the specific amino acid residue and PTM that is bound (Taverna, et al., 2007). The lysine-methylation reader folds encompass Royal family modules, Zn–finger Plant homeodomain (PHD), and ankyrins (Zhou, 2015). Typically, the reader has interactions with an extended epitope to recognise the lysine methylation context, e.g. the sequences flanking H3K4me vs H3K9me (Figure 1C), while the methylation state is identified by the preferential interactions that leverage the physicochemical properties (H-bond capacity, hydrophobicity) specific to that methylation state (Figure 5). HiMIDs typically follow the induced fit paradigm, being pre-folded, ready to receive the substrate in a suitable groove with minimal reorganisation and associated entropic costs. Using a well-understood mechanism, the di- or tri-methylammonium of lysine is recognised within a cage of aromatic residues via cation–π interactions and hydrophobic desolvation (Xu, et al., 2015). The presence of a carboxylate group in the cage favours Kme2 over Kme3, due to hydrogen bonding and direct ion-pair interaction (Figure 5). Similarly, Kme1 is typically recognised by incomplete aromatic cages with two H-bonds formed between the Nε amino group and residues from the reader domain (Li, et al., 2007; Liu and Huang, 2018).
This work will focus on a protein with a previously unknown histone reader function (DDX19A) and two modules from the Royal Family (DNMT3A-PWWP and UHRF1-Tandem Tudor). The HiMID within the former is yet to be identified. Royal Family folds, named after the unfortunate offspring of Henry VIII Tudor, have twisted β-barrels of five antiparallel β-strands. PWWP domains are found in 20 human proteins, and are loosely conserved (Xu, et al., 2015). The DNMT3A-PWWP was the first PWWP shown to selectively interact with H3K36me3 (Dhayalan, et al., 2010), and in addition to the five-stranded β-barrel, a bundle of α-helices is found at the C-terminus (Figure 9A). The Tandem Tudor Domain (TTD) comprises two typical Tudor β-barrels, each packed tightly with a linker joining them (Xu, et al., 2015) (Figure 9B). UHRF1-TTD was shown to bind H3K9me2 and H3K9me3 (Nady, et al., 2011; Rothbart, et al., 2012). It will be shown that the three constructs discussed in this work are multivalent, preferentially interacting with multiple substrates (molecules or histone PTMs).

Figure 9. Structural basis of histone H3 PTM binding by reader domains of particular interest. A DNMT3A-PWWP structure in grey ribbon representation, with annotation for the residues within the aromatic cage and the H3 peptide highlighted in green (PDB 3LLR) (Rondelet, et al., 2016). B UHRF1-TTD structure in tan and the H3 peptide in light red (PDB 2L3R) (Nady, et al., 2011). Panels A and B generated with Chimera (Pettersen, et al., 2004).

1.6.5 Thermodynamics of multivalent histone PTM read-out

The mechanisms typically employed by multivalent reader proteins range from formation of multimeric complexes of proteins, to having cassettes of multiple reader domains in one protein, or, more rarely, a single multivalent HiMID. The same strategies have been reported for binding to multiple histone PTMs (Figure 10A) (Li, et al., 2015; Ruthenburg, et al., 2007).
In a model of two linked but discrete interaction modules, avidity is the result of thermodynamic benefits (Ruthenburg, et al., 2007). Explained simply, the read-out of two PTMs or two subunits of a complex (e.g. histone PTM and DNA as part of a nucleosome) takes place because the first binding event increases the probability of the second happening. This reciprocal benefit in association is pivotal, and stems from the link between binding sites on the readers, the link between the substrates, and the minimal requirements for reader reorganisation. In other words, each single binding event contributes to enthalpy ΔH while preorganisation reduces the entropic cost ΔS. This leads to stronger binding for the concurrent double interaction, per the free energy equation ΔG = ΔH − TΔS (Figure 10B). It also explains how reduction of the net entropy drives multivalent binding, even in weakly interacting systems. From an evolutionary perspective, assembly of HiMIDs into complexes or cassettes to identify complex substrates represents a more-or-less straightforward modular toolbox approach, combining established mechanisms and leveraging clear entropic benefits.

Figure 10. Theoretical basis of multivalent histone PTM engagement. A Multivalent engagement of epigenome marks can take place by multiple reading modules, assembled in protein complexes or found in single proteins, as well as by single reader domains. Panel A adapted from (Musselman, et al., 2012). B Thermodynamic model to explain the increased affinity during multivalent binding, in comparison to single binding events. Panel B adapted from (Ruthenburg, et al., 2007).

The previous model assumes negligible costs for binding strain and/or conformational adjustment of reader and substrate.
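The affinity gain predicted by this simple linked-module model can be illustrated with a short numerical sketch. It is a textbook idealisation and not the exact formulation of the cited work: the second binding event is treated as intramolecular binding at an effective local concentration (`c_eff`), and strain and conformational costs are ignored, as stated above. The function names, the 100 µM monovalent KD, the 1 mM effective concentration, and the temperature are all hypothetical values chosen for illustration.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)
T = 298.15             # temperature in K (assumed, 25 °C)

def delta_g(kd_molar):
    """Standard binding free energy in kJ/mol for a KD given in M (1 M standard state)."""
    return R_KJ * T * math.log(kd_molar)

def bivalent_kd(kd1, kd2, c_eff):
    """Idealised KD of two linked sites (all values in M).

    The second event is treated as intramolecular binding at an
    effective local concentration c_eff; strain is neglected.
    """
    return kd1 * kd2 / c_eff

# Hypothetical values: two weak monovalent contacts, 1 mM effective concentration
kd_single = 100e-6
kd_double = bivalent_kd(kd_single, kd_single, c_eff=1e-3)

print(f"monovalent: KD = {kd_single * 1e6:.0f} µM, dG = {delta_g(kd_single):.1f} kJ/mol")
print(f"bivalent:   KD = {kd_double * 1e6:.0f} µM, dG = {delta_g(kd_double):.1f} kJ/mol")
```

With these hypothetical inputs, the bivalent KD is 10 µM, a tenfold gain over either monovalent contact. In this idealisation the gain disappears once the effective concentration falls to the level of the weaker monovalent KD, which is one way to picture why the link between the binding sites matters.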
Experimental data from single HiMIDs engaging multivalent histone PTM substrates suggest that this model is unsuitable in such cases, as the terms it neglects are not negligible (Du, et al., 2021; Jurkowska, et al., 2017; Ramón-Maiques, et al., 2007; Su, et al., 2014). In all these cases, the entropic costs are significant, as reader and substrate are rearranged to optimise binding for the combinatorial substrate. This suggests that the interaction is driven by strong enthalpic gains, e.g. increasing desolvation and contact surfaces, maximising electrostatic interactions etc. This can give rise to difficult-to-predict specificities (e.g. SPIN1 preference for H3K4me3-R8me2a or H3K4me3-K9me3) and unexpected novel interaction mechanisms. There is another very important distinction between a very strong monovalent and an equally strong multivalent interaction. In the multivalent case, each individual interaction is weaker, and hence the interactions can be broken one at a time and then blocked. Therefore, the multivalent interaction is easier to regulate, as it is more flexible and suitable for reversible signalling, while still providing strong binding.

1.7. Experimental methods to investigate multivalent interactions

In light of the ubiquitous presence of the histone code for metazoan genome organisation and gene regulation, deciphering it is a major aim of the research community. However, the experimental techniques suited to this purpose are limited, and histone PTM-specific antibodies typically find extensive use in such investigations despite the challenges they pose (Kungulovski and Jeltsch, 2015; Kungulovski, et al., 2015; Ruthenburg, et al., 2007). Among protein-protein interaction (PPI) assays, peptide arrays permit high-throughput screening of multiple substrates, while equilibrium peptide binding assays can quantify binding constants, allowing comparisons.
CIDOP (Chromatin Interacting DOmain Precipitation) is a protocol similar to chromatin immunoprecipitation (ChIP), using native chromatin to selectively retain nucleosomes carrying the target PTMs. Afterwards, locus specific (qPCR) or whole genome (HTS) analyses of DNA are possible (Figure 11). For histone proteins, Western Blot with suitably validated antibodies permits probing a CIDOP sample to detect a specific PTM or the combination of multiple ones.

Figure 11. Scheme of the study design utilised in the present work to characterise HiMIDs (Histone Modification Interacting Domains) and validate binding to optimal substrates. CIDOP - Chromatin Interacting DOmain Precipitation; HTS - High-Throughput Sequencing; WB - Western Blot

1.7.1. Biochemical characterisation of HiMIDs

One approach to investigate the biochemical properties of HiMIDs is to analyse their binding to modified peptides. To characterise the specificity in PTM recognition, binding studies with as many different peptides carrying different PTM patterns as possible are needed. CelluSpots™ peptide arrays offer a particularly powerful approach for an initial screening of HiMIDs binding to modified histone tails. These arrays contain 384 synthesised peptide spots in duplicate, each of them a 20 aa long peptide of the core histone protein N-terminal tails which have up to four combinatorial PTMs, with particular emphasis on the H3 N-terminus (1-19 aa). Their development allowed the investigation of the effect of double marks, such as the H3K9me3-S10ph, on HiMID binding (Bock, et al., 2011). In the context of this study, the arrays allowed for the identification of PTM patterns among the optimal substrates for specific HiMIDs. Then, binding affinities to peptides carrying PTMs can be quantified in equilibrium binding assays, validating avidity by multiple PTMs in cis vs peptides with a single or no modification.
In this study, fluorescence anisotropy was extensively used to determine equilibrium binding constants (KD) of (modified) peptides to HiMIDs. Fluorescence anisotropy/depolarisation (FA) and its theoretical framework were developed almost a century ago (Perrin, 1926). Incident polarised light is absorbed by fluorophores in the sample, producing a population of excited fluorophores with their transition dipole moments oriented along the vector of the polarised exciting light (Lakowicz, 2007). When the fluorophores return to their ground state, polarised photons are emitted. However, between the two processes there is a pause, the lifetime of the excited state, typically around 10 ns. As the fluorophores are covalently bound to small peptides, they are highly rotationally mobile and affected by Brownian motion, with rotational correlation times (tumbling) in the range of 0.1 ns, whereas a 25 kDa protein is expected to tumble with correlation times of ~10 ns (Lakowicz, 2007; Sosa, et al., 2010). Thus, the tumbling time of proteins is comparable to the decay time of many fluorophores. For a population of the mobile fluorophores the ratio of emission parallel to excitation vs. perpendicular to it is not equal to 1, demonstrating depolarisation/anisotropy of the polarised fluorescent light which directly depends on the mobility (Figure 12A). The fluorimeter requires one polariser for excitation and one for emission. Near-simultaneous acquisitions record the parallel (I||) (with respect to polarised excitation) and the perpendicular (I⊥) emission, and the equilibrium FA (r) is calculated as r = (I|| − I⊥)/(I|| + 2I⊥) (Lakowicz, 2007). This definition of anisotropy was originally introduced by A. Jabłoński (Jablonski, 1960). An older term named polarisation is defined as P = (I|| − I⊥)/(I|| + I⊥) (Jameson and Ross, 2010; Lakowicz, 2007). The two describe the same phenomenon and essentially contain the same information.
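The two definitions above, and the fact that they can be converted into one another, can be checked with a minimal sketch. The function names and the intensity values are hypothetical, and the instrument-specific correction factor (often called the G-factor) that real measurements apply to the perpendicular channel is deliberately left out.

```python
def anisotropy(i_par, i_perp):
    """Anisotropy r = (I_par - I_perp) / (I_par + 2 * I_perp)."""
    return (i_par - i_perp) / (i_par + 2.0 * i_perp)

def polarisation(i_par, i_perp):
    """Polarisation P = (I_par - I_perp) / (I_par + I_perp)."""
    return (i_par - i_perp) / (i_par + i_perp)

def r_from_p(p):
    """Convert polarisation to anisotropy: r = 2P / (3 - P)."""
    return 2.0 * p / (3.0 - p)

# Hypothetical intensities in arbitrary units: (parallel, perpendicular)
free_peptide = (300.0, 200.0)   # fast tumbling, strongly depolarised emission
bound_peptide = (400.0, 150.0)  # slow tumbling, emission stays polarised

for label, (i_par, i_perp) in [("free", free_peptide), ("bound", bound_peptide)]:
    r = anisotropy(i_par, i_perp)
    p = polarisation(i_par, i_perp)
    print(f"{label}: r = {r:.3f}, P = {p:.3f}, r from P = {r_from_p(p):.3f}")
```

The conversion follows directly from the two definitions, so either quantity can be reported without loss of information.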
However, in r the denominator corresponds to the total emission intensity, which can be useful for equation simplification.

Figure 12. A Physical principle of fluorescence anisotropy (FA). Tumbling of the fluorophore increases depolarisation of the emission. B Experimental principle of FA titrations. Binding of the protein slows tumbling, which reduces depolarisation and increases the measured anisotropy.

In practical applications, the average change of polarisation between absorption and emission is the primary experimental information of interest. It depends on the mobility of the fluorophore, which in turn depends on the size of the peptide-bound fluorophore. If the peptide is then bound by a HiMID, the fluorophore becomes attached to a large, slow-moving complex, strongly reducing depolarisation (Figure 12B). Essentially, by this we detect the change in effective molecular volume. Equilibrium peptide binding is investigated by FA in titrations with step-wise addition of the larger partner (HiMID) to the fluorescently labelled peptide (Rossi and Taylor, 2011).

1.7.2. CIDOP and downstream analyses

Once the optimal substrate has been identified using synthetic peptides, placing the discovery in biological context is critical. CIDOP and ChIP experiments are powerful tools to this end (Kungulovski, et al., 2014). Using native chromatin as substrate and enrichment reagents (HiMIDs, antibodies), we can study the histone PTMs and the genomic sequence of the enriched nucleosomes. After validation of the existence of the multivalent substrate in native chromatin, the hypothesised interaction with the HiMID must be reproduced. Briefly, the method consists of native chromatin fragmented to mononucleosome size by micrococcal nuclease (MNase), interaction with affinity reagents to bind histone PTMs etc., precipitation with magnetic beads, and isolation of DNA or histones (Figure 13). DNA downstream analyses may be quantitative polymerase chain reaction (qPCR) or High-Throughput Sequencing (HTS).
Bioinformatic analysis of the binding profiles determined by HTS can be extensive and particularly informative. Moreover, the histone PTMs of the CIDOP samples can be detected by Western blot, making up the profile of histone PTM enrichment of this combination of HiMID and assay conditions. During this work, previous protocols were optimised and a modular approach applied to enable reagent-specific protocol optimisation. Samples with adequate enrichment in selected qPCR reporter regions can be investigated genome-wide using HTS. Illumina “sequencing-by-synthesis” produces paired-end reads of up to 150 bp length by isothermal PCR amplification, identifying each individually added nucleotide via fluorescent dyes and terminators that can be removed in the next step (Heather and Chain, 2016). Typically, the reads (sequences) are then quality controlled, mapped against a reference genome assembly, filtered against known problem regions of the assembly, and then quantified (Yohe and Thyagarajan, 2017). The major limitation of the method is found in analysing constitutive heterochromatin, GC-rich regions and other REs. The first is due to low experimental solubility and therefore under-representation, the second is due to the PCR-like conditions of sequencing, and the third compounds the previous two with additional bioinformatical challenges in assigning matching regions during mapping. Downstream analyses can be very extensive and non-trivial.

Figure 13. Principle of CIDOP (Chromatin Interacting DOmain Precipitation) and downstream analyses. Figure modified from (Choudalakis, et al., 2023).
The ChIP-Enrich webserver (chip-enrich.med.umich.edu) offers the most comprehensive method to analyse enrichments on enhancers (Qin, et al., 2022), and webservers that use a list of gene names (gene set), such as Enrichr (maayanlab.cloud/Enrichr) (Xie, et al., 2021), can perform Gene Set Enrichment Analysis (GSEA) to mine multiple databases for correlation. Since analysis of REs in ChIP-seq data is particularly challenging, numerous specialised tools employ a variety of approaches in attempts to address this (Goerner-Potvin and Bourque, 2018).

1.8. Biological context of multivalent epigenomic marks

1.8.1. Overview of concepts in gene regulation: TFs, accessibility, and histone code theory

Eukaryotes are characterised by the mainly nuclear localisation of their genome, its large size, and the consequent necessities of chromatin compaction and regulation of gene expression. Research into the latter truly started in 1969 with the milestone discoveries of RNA polymerases, followed by the first gene-specific DNA-binding TF (TFIIIA), necessary for 5S rDNA transcription, and the description of the eukaryotic pre-initiation complex (PIC) (Roeder, 2019). Additional breakthroughs came in 1994, when investigation of chromatinised DNA substrates showed that in eukaryotes chromatin remodelling is required for TF binding and transcriptional activation. Further experiments demonstrated the looping of enhancers to promoters via large protein complexes, with cooperating TFs and chromatin remodellers (Figure 2A). Research bringing together chromatin architecture and gene regulation was met with resistance by the TF field, which advocated working with naked DNA in purified, reconstituted systems to ensure biochemically well-defined assays (Kadonaga, 2019). Today, the effect of chromatin accessibility on transcriptional regulation by TFs is widely accepted, although many questions are yet to be answered (Mach and Giorgetti, 2023).
Research into epigenetic mechanisms such as histone PTMs and DNA methylation started in the 1970s, and these mechanisms were already linked to gene regulation by 1980. The mechanisms were shown to be ubiquitous among eukaryotes with principles that are largely conserved from yeast to humans (Allis and Jenuwein, 2016; Millan-Zambrano, et al., 2022). Correlative studies sparked the idea that each histone PTM conveys a simple, unambiguous individual message understood by reader proteins (Turner, 1993). Landmark articles in 2000-2001 drew insight from available data, and put forth the theory that colocalising histone PTMs form a “histone code” (Jenuwein and Allis, 2001; Strahl and Allis, 2000), where multiple marks “act in a combinatorial or sequential fashion on one or multiple histone tails, to specify unique downstream functions” (Strahl and Allis, 2000). Also, the interplay of 5mC and H4ac on gene regulation was beautifully demonstrated (Grandjean, et al., 2001). The existence of histone phospho-switches highlighted the similarities to signal transduction networks, “suggesting that multiple modifications combine to confer bistability, robustness, and adaptability” (Schreiber and Bernstein, 2002). Later, cross-talk of marks with opposing “canonical instructions” revealed the existence of steady intermediates, aka poised states (Bernstein, et al., 2006; Rada-Iglesias, et al., 2011). The interplay of signals was nicely summarised in a contemporary review (Li, et al., 2015). Additional discoveries led to extension and refinements in the theory, showcasing the interplay between histone PTMs and 5(h)mC to encode the complex information necessary to sustain eukaryotic life (Allis and Jenuwein, 2016; Jenuwein and Allis, 2001; Li, et al., 2021; Ruthenburg, et al., 2007; Zhang, et al., 2015). Some TF researchers expressed strong opposition to the histone code concept (Ptashne, 2013).
At that time, the debate was principally fuelled by the moderate strength and limited amount of supporting evidence. Nowadays, epigenome signals and the extended histone code are understood as part of the basis for cellular phenotypical heterogeneity in many high-profile publications (Carter and Zhao, 2021; Cavalli and Heard, 2019; Greenberg and Bourc'his, 2019; Lappalainen and Greally, 2017; Millan-Zambrano, et al., 2022), even extending into specific instances of transgenerational epigenetic inheritance (Fitz-James and Cavalli, 2022; Skvortsova, et al., 2018).

1.8.2. The histone code theory in practice

Histone PTMs can act as effectors and maintainers of epigenomic regulation. They can be persistent across cell divisions, they are necessary and sufficient for downstream effects with well documented causality, and they are able to exert their influence in a sequence independent manner (Millan-Zambrano, et al., 2022). Of course, the specifics vary for each of the large number of known histone PTMs and so far only few have been studied as extensively as the major ones (Table 2). Studies have leveraged advanced experimental designs to document the fate of parental histone modifications in the daughter cells. After passage of the replication fork, the two daughter DNA strands are wrapped around nucleosomes. These contain recycled old histones and newly synthesised histones. Then, starting from the S-phase, major histone PTMs can be restored within one cell cycle (Alabert, et al., 2015), and accurately reproduced to maintain the epigenomic landscape (Reverón-Gómez, et al., 2018). The fastest mark to be restored is H3K4me3, with the process completed by G2 or within 6 h after replication. The repressive marks H3K27me3 and H3K9me3 are primarily restored in the next G1 phase.
H3K9me3 is particularly interesting as it was clearly shown to be inherited independently of DNA sequence, DNA methylation, or RNA interference (Audergon, et al., 2015; Ragunathan, et al., 2015). Additional details on the process and a beautiful epigenetic circuit for sequence independent establishment were published recently (Li, et al., 2023; Yuan and Moazed, 2024). To address the causality of histone PTMs, some of the best evidence comes from studies of H3K9me3, constitutive heterochromatin, their connection, and their functions. Specifically, H3K9me3 is necessary for phase separation of the reader CBX5 that forms heterochromatin in complex with TRIM28 and SUV39 (Wang, et al., 2019). Loss of H3K9me3 leads to loss of heterochromatin (Fukuda, et al., 2021; Montavon, et al., 2021), upregulation of RE transcription (McCarthy, et al., 2021; Padeken, et al., 2022; Zhao, et al., 2023), loss of bookmarking TFs (Djeghloul, et al., 2023) or gain of TF binding to regulatory elements (Padeken, et al., 2022), and cellular senescence (Zhang, et al., 2021). A particularly elegant study demonstrated the physiological role of H3K9me3 and sonication-resistant heterochromatin by alternatively safeguarding pluripotency in uncommitted embryonic cells or lineage fidelity during differentiation (Nicetto, et al., 2019). Other marks are deposited as a consequence of an event. The very informative review “Histone post-translational modifications — cause and consequence of genome function” describes more examples (Millan-Zambrano, et al., 2022). Perturbations of the histone marks by mutation of a writer, a reader or a histone residue have been identified as the cause of multiple diseases (Li, et al., 2021; Lutsik, et al., 2020). The clear regulatory effects of histone PTMs on gene expression are successfully leveraged in many studies of epigenomic editing (Policarpi, et al., 2021).
A note should also be made regarding the different meanings attached to identical terms between various authors. Some give causal marks a broad meaning of necessity for specific interactions to take place (Allis and Jenuwein, 2016; Morgan and Shilatifard, 2020). This is the standard interpretation within the histone code theory, and the one used here. Others consider a causal histone PTM to have a direct effect, such as histone core acetylations that cause conformational changes to the nucleosome and actively decompact chromatin (Millan-Zambrano, et al., 2022). This is also termed an “instructive” or causative role, and distinguished from histone tail PTMs that typically must be read and therefore have “indirect” effects.

Depending on the mark’s nature and function, histone PTMs might be deposited in broad domains or sharp peaks, correlating strongly with specific 3D-chromatin structures and accessibility (Table 2). The read-out mechanism involves mono- or multivalent readers that interact not only with histone marks, but also with additional “epitopes”, such as the DNA/RNA backbone, folding, sequence, or other biomolecules. The interactions of multiple readers in the nucleosomal context were recently thoroughly discussed (McGinty and Tan, 2021; Peng, et al., 2021). The model of thermodynamic benefits discussed in chapter 1.5 explains how multiple recognition sites can result in specific and strong engagement, and accounts for the generally low strength of interactions observed in vitro with short monovalent peptides, the typical quantitation method. In this way, the combination of “low-affinity” anchor points can assist the reader to identify the intended target and bind more strongly, restrict orientations, or cause a conformational change (Li, et al., 2015; McGinty and Tan, 2021; Peng, et al., 2021). For example, this can be useful in the nucleosomal context to avoid promiscuous targeting of histone-like motifs on non-histones.
However, chromatin states are unlikely to be discriminated solely in this manner. Therefore, on the next level of complexity, the theory proposes the coexistence of multiple epige