Deep Enzymology studies on the mammalian DNA methyltransferases and methylcytosine dioxygenases

Dissertation approved by Faculty 3: Chemistry of the Universität Stuttgart in fulfilment of the requirements for the degree of Doctor of Natural Sciences (Dr. rer. nat.)

Submitted by Sabrina Adam
Born on 08.07.1994 in Backnang

Main referee: Prof. Dr. Albert Jeltsch
Co-referee: Prof. Dr. Jens Brockmeyer
Chair of the examination committee: Prof. Dr. Andreas Köhn
Date of the oral examination: 15.09.2022

Universität Stuttgart
Institut für Biochemie und Technische Biochemie
Abteilung Biochemie
2022

Erklärung über die Eigenständigkeit der Dissertation (Declaration of independent authorship)
I affirm that I have written the present thesis entitled “Deep Enzymology studies on the mammalian DNA methyltransferases and methylcytosine dioxygenases” independently and have used no sources or aids other than those indicated. Passages and ideas taken from other sources are marked as such.

Declaration of Authorship
I hereby certify that the dissertation entitled “Deep Enzymology studies on the mammalian DNA methyltransferases and methylcytosine dioxygenases” is entirely my own work except where otherwise indicated. Passages and ideas from other sources have been clearly indicated.

Name: Sabrina Adam
Signed: ________________________________
Date: 20.06.2022

Acknowledgements
First, I want to thank Prof. Dr. Albert Jeltsch for the opportunity to be part of his research group. It was exactly what I needed to further grow my love of biochemistry and to pursue this love in my future career. I also thank you for your guidance and repeated encouragement, for your essential assistance in all writing processes, and for always having an open ear for any problems. Furthermore, I would like to thank Prof. Dr. Brockmeyer for being the co-referee of my PhD thesis and Prof. Dr. Köhn for taking on the role of head of the committee. Next, a special thanks goes to Dr. Dr. Pavel Bashtrykov, who never stopped inspiring me during all the years I have known him. It was always a great pleasure to discuss not only scientific topics but also hiking routes and vacation spots with you. I will definitely remember you as a mentor who helped me grow into the scientist I am today, starting from my bachelor days. In addition, I would like to thank all of my colleagues in the lab, who always helped me whenever I had any issues and made my stay so enjoyable. I will never forget how pleasant the work atmosphere was and how many fun hours we spent together. I also thank my great students Lukas and Greta, who showed me how nice it is to spark scientific curiosity in someone and who were a great help during my research. Finally, I would like to thank my family and friends, who never understood why I liked biochemistry so much but never stopped trying to understand what it is all about (except for Mira, who, luckily, was the only one who got it). Last but not least, I want to thank my boyfriend Jonas, who always supported me throughout these years and encouraged me to pursue what I love.

List of publications
Adam, Sabrina*; Anteneh, Hiwot*; Hornisch, Maximilian; Wagner, Vincent; Lu, Jiuwei; Radde, Nicole E.; Bashtrykov, Pavel; Song, Jikui; Jeltsch, Albert (2020): DNA sequence-dependent activity and base flipping mechanisms of DNMT1 regulate genome-wide DNA methylation. In Nature Communications 11 (1), p. 3723. DOI: 10.1038/s41467-020-17531-8. * These authors contributed equally to the work.
Adam, Sabrina; Bräcker, Julia; Klingel, Viviane; Osteresch, Bernd; Radde, Nicole E.; Brockmeyer, Jens; Bashtrykov, Pavel; Jeltsch, Albert (2022): Flanking sequences influence the activity of TET1 and TET2 methylcytosine dioxygenases and affect genomic 5hmC patterns. In Communications Biology 5 (1), p. 92. DOI: 10.1038/s42003-022-03033-4.
Bröhm, Alexander; Schoch, Tabea; Dukatz, Michael; Graf, Nora; Dorscht, Franziska; Mantai, Evelin; Adam, Sabrina; Bashtrykov, Pavel; Jeltsch, Albert (2022): Methylation of recombinant mononucleosomes by DNMT3A demonstrates efficient linker DNA methylation and a role of H3K36me3. In Communications Biology 5 (1), p. 192. DOI: 10.1038/s42003-022-03119-z. Data not included in this thesis.
Dukatz, Michael; Adam, Sabrina; Biswal, Mahamaya; Song, Jikui; Bashtrykov, Pavel; Jeltsch, Albert (2020): Complex DNA sequence readout mechanisms of the DNMT3B DNA methyltransferase. In Nucleic Acids Research 48 (20), pp. 11495–11509. DOI: 10.1093/nar/gkaa938. Data not included in this thesis.
Emperle, Max*; Adam, Sabrina*; Kunert, Stefan; Dukatz, Michael; Baude, Annika; Plass, Christoph; Rathert, Phillip; Bashtrykov, Pavel; Jeltsch, Albert (2019): Mutations of R882 change flanking sequence preferences of the DNA methyltransferase DNMT3A and cellular methylation patterns. In Nucleic Acids Research 47 (21), pp. 11355–11367. DOI: 10.1093/nar/gkz911. * These authors contributed equally to the work.
Emperle, Max*; Bangalore, Disha M.*; Adam, Sabrina; Kunert, Stefan; Heil, Hannah S.; Heinze, Katrin G.; Bashtrykov, Pavel; Tessmer, Ingrid; Jeltsch, Albert (2021): Structural and biochemical insight into the mechanism of dual CpG site binding and methylation by the DNMT3A DNA methyltransferase. In Nucleic Acids Research 49 (14), pp. 8294–8308. DOI: 10.1093/nar/gkab600. * These authors contributed equally to the work.
Gao, Linfeng*; Emperle, Max*; Guo, Yiran; Grimm, Sara A.; Ren, Wendan; Adam, Sabrina; Uryu, Hidetaka; Zhang, Zhi-Min; Chen, Dongliang; Yin, Jiekai; Dukatz, Michael; Anteneh, Hiwot; Jurkowska, Renata Z.; Lu, Jiuwei; Wang, Yinsheng; Bashtrykov, Pavel; Wade, Paul A.; Wang, Gang Greg; Jeltsch, Albert; Song, Jikui (2020b): Comprehensive structure-function characterization of DNMT3B and DNMT3A reveals distinctive de novo DNA methylation mechanisms. In Nature Communications 11 (1), p. 3355. DOI: 10.1038/s41467-020-17109-4. * These authors contributed equally to the work.
Hofacker, Daniel*; Broche, Julian*; Laistner, Laura; Adam, Sabrina; Bashtrykov, Pavel; Jeltsch, Albert (2020): Engineering of Effector Domains for Targeted DNA Methylation with Reduced Off-Target Effects. In International Journal of Molecular Sciences 21 (2). DOI: 10.3390/ijms21020502. * These authors contributed equally to the work. Data not included in this thesis.
Jeltsch, Albert; Adam, Sabrina; Dukatz, Michael; Emperle, Max; Bashtrykov, Pavel (2021): Deep Enzymology Studies on DNA Methyltransferases Reveal Novel Connections between Flanking Sequences and Enzyme Activity. In Journal of Molecular Biology 433 (19), p. 167186. DOI: 10.1016/j.jmb.2021.167186. Review of all Deep Enzymology publications.
Mack, Alexandra; Emperle, Max; Schnee, Philipp; Adam, Sabrina; Pleiss, Jürgen; Bashtrykov, Pavel; Jeltsch, Albert (2022): Preferential Self-interaction of DNA Methyltransferase DNMT3A Subunits Containing the R882H Cancer Mutation Leads to Dominant Changes of Flanking Sequence Preferences. In Journal of Molecular Biology 434 (7), p. 167482. DOI: 10.1016/j.jmb.2022.167482.

Table of contents
Erklärung über die Eigenständigkeit der Dissertation (Declaration of independent authorship)
Declaration of Authorship
Acknowledgements
List of publications
List of figures
List of tables
List of abbreviations
Zusammenfassung
Abstract
1 Introduction
1.1 Key players of epigenetics
1.2 DNA methylation
1.2.1 DNA methylation as an epigenetic modification
1.2.2 Classical and extended models of cytosine DNA methylation
1.2.3 The family of DNA methyltransferases
1.2.3.1 De novo methyltransferases DNMT3A and DNMT3B
1.2.3.2 The maintenance methyltransferase DNMT1
1.2.4 Acute myeloid leukaemia and the role of the DNMT3A R882H mutation
1.3 DNA demethylation
1.3.1 Principles of DNA demethylation and roles of its products
1.3.2 Ten-eleven translocation enzymes
1.3.2.1 TET1
1.3.2.2 TET2
1.3.3 Passive DNA demethylation pathways
1.4 Flanking sequence preferences of DNMTs and TET enzymes
2 Aims of this study
3 Materials and methods
3.1 Cloning, overexpression and protein purification
3.1.1 DNMT1
3.1.2 TET1 and TET2 isoforms
3.2 Biotin-avidin microplate assay
3.3 HPLC-MS/MS analysis
3.4 Deep Enzymology reactions with randomized substrates
4 Results
4.1 Investigation of the flanking sequence preferences of DNMT1
4.2 Investigation of the DNA sequence readout mechanisms of DNMT3A and DNMT3B
4.2.1 General insights into the de novo methylation mechanisms
4.2.2 Dual CpG site methylation mechanism of DNMT3A
4.3 Investigation of the DNMT3A R882H mutation consequences
4.3.1 Flanking sequence preferences of DNMT3A R882H
4.3.2 Assembly of DNMT3A WT and R882H heterotetramers
4.4 Investigation of the flanking sequence preferences of TET1 and TET2
5 Discussion
5.1 Profound flanking sequence preferences of DNMT1
5.2 Different DNA sequence readout mechanisms of DNMT3A and DNMT3B
5.2.1 Distinct de novo methylation mechanisms and their biological connections
5.2.2 Mechanisms of co-methylation of two CpG sites by DNMT3A
5.3 Pathogenic mechanism of the DNMT3A R882H mutation
5.3.1 Altered flanking sequence preferences of DNMT3A R882H
5.3.2 Preferred self-assembly of DNMT3A R882H homodimeric interfaces
5.4 Distinct flanking sequence preferences of TET1 and TET2
5.5 Conclusion and perspectives for the Deep Enzymology approach
6 References
7 Author contributions
8 Appendix (not included in the published thesis)

List of figures
Figure 1: Schematic summary of key epigenetic players.
Figure 2: Classical model of DNA methylation setting and maintenance.
Figure 3: Schematic drawing of the domain arrangement in the DNMT family.
Figure 4: Catalytic mechanism of cytosine C5 DNA methyltransferases.
Figure 5: Schematic illustrations of DNMT3A and DNMT3A/3L complex structure and multimerization.
Figure 6: Structure of mouse DNMT1 co-crystallized with hemimethylated DNA.
Figure 7: Multistage process of AML development.
Figure 8: Mutations of DNMT3A occurring in AML.
Figure 9: Pathways of passive and active DNA demethylation.
Figure 10: Schematic drawing of the domain arrangement in the human TET family.
Figure 11: Catalytic cycle of TET enzymes.
Figure 12: Structure of human TET2 co-crystallized with hemimethylated DNA.
Figure 13: Schematic illustration of the biotin-avidin microplate assay.
Figure 14: Schematic illustration of the Deep Enzymology approach used to study the flanking sequence preferences of different DNA methyltransferases and methylcytosine dioxygenases.
Figure 15: Flanking sequence preferences of DNMT1.
Figure 16: Flanking sequence preferences of DNMT3A and DNMT3B.
Figure 17: Co-methylation at different distances catalysed by DNMT3A or DNMT3A/3L.
Figure 18: Flanking sequence preference analysis for the DNMT3A R882H mutation.
Figure 19: Compilation of the results from the flanking sequence preference analysis of homo- and heterotetrameric DNMT3A WT/R882H complexes.
Figure 20: Flanking sequence preference analysis of TET1 and TET2.
Figure 21: Schematic overview of the structural changes of DNMT1 resulting from binding to different flanking sequence contexts.
Figure 22: Structural basis for the different flanking sequence preferences of the DNMT3s.
Figure 23: Schematic illustrations of the DNMT3A tetramer structure and potential models to explain different types of co-methylation by the enzyme complex.
Figure 24: Orientation of residue 882 in DNMT3A WT and R882H crystal structures relative to the 3’ flank of the target cytosine.
Figure 25: Schematic illustration of all possible homo- and heterotetrameric DNMT3A complexes, including active and inactive WT and R882H subunits.
Figure 26: Comparison of three TET crystal structures determined with different flanking contexts of the target site.

List of tables
Table 1: Summary of the main randomized substrates used in the projects of this thesis, with the number of target sites and their respective sequence context and modification.
List of abbreviations
2OG: 2-oxoglutarate
30mer: DNA substrate with 30 nucleotides
ca5C: 5-carboxylcytosine
f5C: 5-formylcytosine
hm5C: 5-hydroxymethylcytosine
m5C: 5-methylcytosine
A: Adenine
ADD: ATRX-DNMT3A-DNMT3L domain
AdoHcy (SAH): S-adenosyl-L-homocysteine
AdoMet (SAM): S-adenosyl-L-methionine
AML: Acute myeloid leukaemia
BAH (1/2): Bromo-adjacent homology domain (1 or 2)
BC: Barcode
BER: Base excision repair
bps: Base pairs
C: Cytosine
CpA: Shorthand for 5'-cytosine-phosphate-adenine-3'
CpC: Shorthand for 5'-cytosine-phosphate-cytosine-3'
CpG: Shorthand for 5'-cytosine-phosphate-guanine-3'
CpH (non-CpG): Shorthand for 5'-cytosine-phosphate-adenine/cytosine/thymine-3'
CpN: Shorthand for 5'-cytosine-phosphate-any base-3'
CpT: Shorthand for 5'-cytosine-phosphate-thymine-3'
Cryo-EM: Cryogenic electron microscopy
CXXC: Cysteine-X-X-cysteine
DBSH: Double-stranded beta-helix
DMAB-seq: DNMT1 methylation activity-assisted bisulfite sequencing
DNA: Deoxyribonucleic acid
DNMT(s) (1/3/3A/3B/3C/3L): DNA methyltransferase(s) (1, 3, 3A, 3B, 3C or 3L)
DOT1L: DOT1-like histone lysine methyltransferase
dsDNA: Double-stranded DNA
E. coli: Escherichia coli
ESCs: Embryonic stem cells
Fe(II)/Fe(III)/Fe(IV): Iron in oxidation state +2, +3 or +4
FF interface: Interface formed by four stacked phenylalanine residues
G: Guanine
GFP: Green fluorescent protein
GK: Glycine-lysine
H3K9: Histone H3 lysine 9
H3R2: Histone H3 arginine 2
HCT116 cells: Human colon cancer cell line
HEK293 cells: Human embryonic kidney 293 cells
His: Histidine (as a tag, referring to a string of six histidines)
hmCpA: Shorthand for 5'-(5-hydroxymethylcytosine)-phosphate-adenine-3'
HPLC-MS/MS: High-performance liquid chromatography coupled with tandem mass spectrometry
HSPC: Haematopoietic stem and progenitor cells
ICF: Immunodeficiency, centromeric region instability, facial anomalies
iPSCs: Induced pluripotent stem cells
JBP (1/2): J-binding protein (1 or 2)
KO (1KO/DKO/TKO): Knockout (single, double or triple knockout)
LB: Luria-Bertani
MBD: Methyl-CpG-binding domain
MD simulation: Molecular dynamics simulation
MeCP2: Methyl-CpG-binding protein 2
MTase: Methyltransferase
NGS: Next-Generation Sequencing
NgTET1: TET1 from Naegleria gruberi
Nickel-NTA: Ni(II) ions coupled to nitrilotriacetic acid
NLS: Nuclear localization signal
Oct4: Octamer-binding transcription factor 4
p53: Tumour suppressor protein 53
PCNA: Proliferating cell nuclear antigen
PCR (1/2): Polymerase chain reaction (step 1 or 2 of library preparation)
PDBI: Protein database index
PGCs: Primordial germ cells
PIP box: PCNA-interacting protein element
Pre-LSC: Cells in a pre-leukaemic state
PWWP: Proline-tryptophan-tryptophan-proline domain
RD interface: Interface formed by arginine and aspartate residues
RFTS: Replication foci targeting sequence domain
RING: Really interesting new gene domain
RMSD: Root-mean-square deviation
RNA: Ribonucleic acid
SatII: Satellite 2
SDS-PAGE: Sodium dodecyl sulfate polyacrylamide gel electrophoresis
SEM: Standard error of the mean
Sf21 cells: Ovarian cells isolated from Spodoptera frugiperda
SFM: Scanning force microscopy
SRA: SET and RING-associated domain
ssDNA: Single-stranded DNA
T: Thymine
T2/T4: Enterobacteria phage T2 or T4
TDG: Thymine DNA glycosylase
TET (TET1/2/3): Ten-eleven translocation (enzyme 1, 2 or 3)
TpG: Shorthand for 5'-thymine-phosphate-guanine-3'
TRD: Target recognition domain
tRNA: Transfer ribonucleic acid
TSS: Transcription start site
U2OS cells: Human bone osteosarcoma epithelial cells
UHRF (1/2): Ubiquitin-like, containing PHD and RING finger domains (1 or 2)
UV: Ultraviolet
WT: Wild-type

Zusammenfassung (Summary)
The rapidly developing field of epigenetics investigates heritable changes in cellular phenotypes that are not accompanied by a change in the DNA sequence. In mammals, these changes are controlled by various epigenetic signals, such as histone modifications, incorporation of histone variants, non-coding RNAs and DNA modifications. DNA modifications are the focus of this work. Together, these modifications regulate gene transcription through changes in the chromatin state and in the accessibility of DNA to transcription factors. In mammals, cytosine bases are mostly methylated at the C5 position (m5C). The resulting cell-specific DNA methylation patterns mainly comprise methylation in the CpG context, of which 70% is methylated overall, although smaller amounts of CpH methylation are also present. This process of DNA methylation, which is carried out by the de novo DNMT3 DNA methyltransferases, is an essential step in mammalian development. The established DNA methylation patterns are subsequently preserved by the maintenance methyltransferase DNMT1 in a replication-dependent process. Beyond DNA methylation, active DNA demethylation is initiated by the TET methylcytosine dioxygenases, which catalyse the stepwise oxidation of 5-methylcytosine (m5C) via 5-hydroxymethylcytosine (hm5C) and 5-formylcytosine (f5C) to 5-carboxylcytosine (ca5C).
The discovery of this process raised new questions about the mechanistic interplay of all enzymes involved in jointly regulating the dynamic DNA methylation patterns in cells. Whereas previous studies focused mainly on the recruitment and biological function of these enzymes, the main goal of this work was to systematically investigate the fundamental molecular basis of DNA recognition by mammalian DNMTs and TETs. To this end, their activity was determined on different target sites in combination with all possible flanking sequences.

To obtain these detailed mechanistic insights, a new experimental approach named Deep Enzymology was developed and applied. This approach is based on the single-molecule analysis of enzyme activities. It combines reactions on randomized substrates with a subsequent analysis of the modification state of individual DNA molecules by bisulfite conversion and Next-Generation Sequencing. Using this method, it was possible to determine pronounced and previously unknown flanking sequence preferences of DNMT1, DNMT3A and DNMT3B, together with some of their mutants, as well as of TET1 and TET2. Moreover, in many cases the structural explanations and the biological or pathological consequences of these effects could be identified.

In the first part of this work, I found that DNMT1 shows previously unknown differences of more than 100-fold in the methylation of hemimethylated CpG sites with different flanking sequences. Using published and new DNMT1 crystal structures in complex with different DNA substrates, these findings could additionally be attributed to different DNA bending states and to conformational changes of the DNA and the enzyme during complex formation.
It was also shown that the preferences of DNMT1 for particular flanking sequences correlate strongly with cellular m5C profiles. This clearly indicates that the cellular DNA methylome is co-determined by these preferences.

In the second project, I contributed to the validation and refinement of the CpG and CpN preferences of the de novo enzymes DNMT3A and DNMT3B, which had previously been observed in small- to mid-scale studies. Using the Deep Enzymology approach, very pronounced flanking sequence preferences were observed for both enzymes. Mechanistically important key regions or residues (e.g. K777 and T775) could be identified in an available crystal structure of DNMT3A and in new structures of DNMT3B, respectively. On this basis, the observed differences in the CpG preferences of the DNMT3 enzymes, as well as the stronger methylation of CpH sites by DNMT3B, could be explained. The strong correlation of the experimental flanking preferences with cellular DNA methylation data further emphasized the biological relevance of the enzymatic preferences. In addition, I was involved in a sub-project that uncovered different DNA interaction modes of DNMT3A/3L heterotetramers during the methylation of two CpG sites at different distances.

In the context of DNMT3A, the third project of this work aimed to elucidate the pathogenic mechanism of the heterozygously expressed cancer mutation R882H, which is found particularly frequently in patients with acute myeloid leukaemia. Here, the biochemical investigations built on earlier observations in our laboratory. In small-scale studies, these had shown a change in the preference profiles of DNMT3A after mutation of R882 to histidine.
A systematic investigation of this effect revealed a more than 70-fold change in the flanking sequence preferences of DNMT3A caused by the mutation of this single amino acid. Based on this effect, it could further be shown that R882H subunits preferentially interact with each other. This provided a mechanistic interpretation of the observed dominant effect of the mutant over wild-type DNMT3A when both are expressed in the same cells. Taken together, these two sub-projects provided important insights into the carcinogenic effect of R882H.

Finally, the fourth project of this work investigated the influence of flanking sequences on the activity of TET1 and TET2. These enzymes use a cytosine-flipping mechanism similar to that of the methyltransferases, but based on available crystal structures it had so far been postulated that they lack such preferences. In fact, I was able to determine pronounced preferences for both enzymes, both on substrates with a CpG site and on substrates with a CpH site containing an m5C or hm5C modification. Moreover, the determined effects reflected local as well as genome-wide hm5C patterns. In the case of TET2, they could be explained by indirect DNA interactions of a specific arginine residue with the +1 flanking position, or by stabilization of the cytosine via the base pair at the -1 flanking position.

Abstract
The emerging field of epigenetics investigates heritable changes in cellular phenotypes that do not involve changes in the DNA sequence. In mammals, these changes are controlled by various epigenetic signals, such as histone modifications, incorporation of histone variants, non-coding RNAs and DNA modifications. DNA modifications are the focus of this work. Together, these modifications control the transcriptional states of cells by modulating chromatin states and the accessibility of DNA to transcription factors.
In mammals, methylation of cytosine bases at the C5 position (m5C) predominantly occurs at CpG sites. Approximately 70% of these sites are methylated in a cell type-specific pattern, although lower levels of cytosine methylation are also present in the CpH context. The m5C mark is introduced by the de novo DNMT3 DNA methyltransferases in a process that has been shown to be essential for mammalian development. Subsequently, the maintenance methyltransferase DNMT1 preserves the DNA methylation patterns in a replication-coupled manner. Beyond DNA methylation, active DNA demethylation is initiated by the TET methylcytosine dioxygenases through the stepwise oxidation of 5-methylcytosine (m5C) via 5-hydroxymethylcytosine (hm5C) and 5-formylcytosine (f5C) to 5-carboxylcytosine (ca5C). This raises new questions about the mechanistic interplay of all the enzymes that together regulate the dynamic DNA methylation landscape in cells.

Previous studies focused more on the targeting of the enzymes and their biological functions. In contrast, the ultimate purpose of this work was to systematically determine the fundamental molecular basis of DNA recognition by mammalian DNMTs and TETs. To this end, their activity was studied on different target sites embedded in all possible flanking sequence contexts. To gain such detailed mechanistic insights, a new experimental approach termed Deep Enzymology was developed and applied. It is based on the single-molecule analysis of enzyme activity. Enzymatic reactions on randomized substrates are coupled with hairpin-bisulfite conversion and Next-Generation Sequencing to read out the modification state of individual DNA molecules. Using this method, it was possible to discover distinct and previously unknown flanking sequence preferences of DNMT1, DNMT3A and DNMT3B, as well as of some of their mutants, together with TET1 and TET2.
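The following Python sketch illustrates how per-molecule readouts of this kind can, in principle, be turned into flanking sequence preferences. It is a minimal illustration, not the published analysis pipeline, and it rests on several assumptions. The input format is hypothetical and already processed: one record per sequenced molecule, holding the randomized flanking bases around a single target CpG and the base read at the target cytosine after bisulfite conversion. For a methyltransferase reaction, a C indicates a bisulfite-protected, methylated site and a T a converted, unmethylated one. The function names `methylation_by_flank`, `relative_preferences` and `fold_range` are illustrative only, and the reads in the example are invented. A real analysis would also need to deal with, for example, barcodes, the pairing of both strands of each hairpin and sequencing errors.

```python
from collections import defaultdict

VALID_BASES = set("ACGT")


def methylation_by_flank(reads, min_reads=1):
    """Fraction of methylated target sites per flanking sequence.

    reads: iterable of (flank, call) pairs (hypothetical input format).
      flank: randomized bases around the target CpG, written 5' flank
             followed by 3' flank (the CpG itself is left out).
      call:  base read at the target cytosine after bisulfite conversion,
             "C" = protected from conversion (methylated),
             "T" = converted (unmethylated).
    """
    counts = defaultdict(lambda: [0, 0])  # flank -> [methylated, total]
    for flank, call in reads:
        if call not in ("C", "T") or not set(flank) <= VALID_BASES:
            continue  # skip ambiguous target calls and uncalled flank bases
        counts[flank][1] += 1
        if call == "C":
            counts[flank][0] += 1
    return {f: m / n for f, (m, n) in counts.items() if n >= min_reads}


def relative_preferences(levels):
    """Scale each flank's methylation level to the mean over all flanks."""
    mean = sum(levels.values()) / len(levels)
    if mean == 0:
        raise ValueError("no methylation observed in any flank")
    return {f: v / mean for f, v in levels.items()}


def fold_range(levels):
    """Ratio between the most and the least preferred flank."""
    lowest, highest = min(levels.values()), max(levels.values())
    if lowest == 0:
        raise ValueError("least preferred flank has no activity; ratio undefined")
    return highest / lowest


# Toy example with made-up reads: two 3+3 flanks, four molecules each.
reads = [("AAATTT", "C")] * 3 + [("AAATTT", "T")] + [("GGGCCC", "C")] + [("GGGCCC", "T")] * 3
levels = methylation_by_flank(reads)
print(levels)                        # {'AAATTT': 0.75, 'GGGCCC': 0.25}
print(relative_preferences(levels))  # {'AAATTT': 1.5, 'GGGCCC': 0.5}
print(fold_range(levels))            # 3.0
```

Normalizing to the mean across all flanks is one simple way to make profiles comparable between enzymes or experiments with different overall activity. The ratio between the best and the worst flank is likewise only one possible way to express the size of the effect.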
Moreover, in many cases, the structural explanation and the biological or pathogenic consequences of these effects could be uncovered.

In the first part of my work, I discovered that DNMT1 shows previously unknown differences of up to 100-fold in the methylation of hemimethylated CpG sites with different flanking sequences. Using published as well as new DNMT1 crystal structures in complex with DNA molecules of different sequences, these findings could further be connected to different degrees of DNA bending and to conformational rearrangements of the DNA and the enzyme during complex formation. The flanking sequence preferences of DNMT1 were shown to be highly correlated with cellular m5C profiles, clearly indicating that DNMT1 preferences shape the cellular methylome.

In the second project, I contributed to the validation and refinement of the CpG and CpN flanking sequence preferences of the de novo enzymes DNMT3A and DNMT3B, which had previously been observed in small- to mid-scale studies. Using the Deep Enzymology approach, very strong and distinct flanking sequence preferences were observed for both enzymes. Mechanistic key regions or residues (e.g. K777 and T775) could be identified using available crystal structures of DNMT3A and newly provided structures of DNMT3B. These can explain the difference in CpG methylation preferences as well as the stronger CpH methylation by DNMT3B. Correlated methylation effects were also observed in cellular data, which emphasizes the biological relevance of the determined flanking preferences. Furthermore, I contributed to a sub-project that unravelled the different DNA interaction modes used by DNMT3A/3L heterotetramers to co-methylate CpG sites at different distances.

In the context of DNMT3A, the third project of this thesis aimed to elucidate the pathogenic mechanism of the heterozygous cancer mutation R882H, which has been shown to be enriched in patients with acute myeloid leukaemia.
Here, the biochemical investigations built on previous small-scale observations in our lab, which had demonstrated a change in the flanking sequence preference profiles upon mutation of R882 to histidine. A systematic investigation of this effect revealed a more than 70-fold change in the flanking sequence preferences of DNMT3A caused by this single amino acid exchange. Exploiting these effects, we could also demonstrate that R882H subunits preferentially interact with each other. This provides a mechanistic interpretation of the observed dominant effect of this mutant over wild-type DNMT3A when both are expressed in the same cell. Together, these two sub-projects offered important insights into the carcinogenic effect of R882H.

Lastly, the fourth project examined potential flanking sequence effects on the activity of TET1 and TET2. These enzymes use a base-flipping mechanism similar to that of the DNMTs, but based on the published crystal structures they were so far assumed to lack flanking effects. In fact, I was able to determine distinct preferences for both enzymes using substrates with CpG as well as non-CpG target sites containing m5C or hm5C modifications. Moreover, these findings recapitulated local and genome-wide hm5C patterns. They could also be connected to indirect interactions of a specific arginine residue in TET2 with the DNA at the +1 flanking position, and to stabilization of the target cytosine by the -1 base pair.

1 Introduction
1.1 Key players of epigenetics
Every human being is made up of trillions of cells, which makes humans highly complex organisms to study. During development, precursor cells originating from a single zygote differentiate into the roughly 200 different cell types that make up the human body (Moris et al., 2016). In this process, embryonic stem cells (ESCs) gradually transition into groups of more differentiated cells with more specialized morphology and functionality.
Despite the tremendous variety of cellular phenotypes, almost all of these differentiated cells contain the same genetic information encoded in their DNA sequence. Hence, cellular specialization must be realised by regulating gene expression through an additional layer of epigenetic information. The term “epigenetics” was first introduced by Conrad Waddington in 1942, who described it as “the study of the interactions between genes and their products which result in a particular phenotype” (Waddington, 1942). Over the last decades, this field of research has developed rapidly, so that the definition had to be revised to “the study of changes in gene function that are mitotically and/or meiotically heritable and do not entail a change in DNA sequence” (Wu and Morris, 2001). Today, it is known that epigenetic signals include non-coding RNAs, post-translational histone modifications such as acetylation, methylation or phosphorylation, the incorporation of histone variants, and chemical modifications of the DNA, e.g. DNA methylation. Together, these modifications lead to spatial and temporal modulations of the chromatin structure, which in turn regulate the gene expression state of differentiated cells (Allis and Jenuwein, 2016). One gene regulatory principle is that condensation of chromatin into heterochromatin results in transcriptional repression, whereas remodelling into a more open chromatin structure (euchromatin) enables gene expression and DNA repair (Bonasio et al., 2010). As depicted in Figure 1, stable inheritance of gene expression states, and therefore the maintenance of cellular identities, requires three main factors (Berger et al., 2009; Wallace et al., 2018). Firstly, an extracellular signal termed “Epigenator” (e.g. oxidative stress, lack of oxygen or other cellular conditions) activates the epigenetic signalling cascade in the cell.
Then, “Initiators” such as DNA-binding proteins or non-coding RNAs define the chromosomal regions to be modulated based on their locus and sequence specificity. Lastly, different “Maintainers” such as DNA methyltransferases or histone-modifying enzymes, recruited by the “Initiator”, are responsible for generating and maintaining the local changes in chromatin organization. DNA methylation set by such epigenetic “Maintainers”, as well as the different pathways of its removal, is the focus of this thesis and will be further discussed in the following sections.

Figure 1: Schematic summary of key epigenetic players. The interplay of three different factors is needed to stably maintain cellular identity. An extracellular “Epigenator” initiates a signalling cascade in the cell, in which an “Initiator” recruits different “Maintainers” in a locus- and sequence-specific manner to establish and maintain epigenetic modifications such as DNA methylation (Me). In turn, these modifications lead to changes in the chromatin organization and influence the gene expression state at this locus (adapted from Wallace et al., 2018).

1.2 DNA methylation

1.2.1 DNA methylation as an epigenetic modification

Methylation as a chemical modification of DNA bases was first discovered by Hotchkiss in 1948, who termed its product “epicytosine” (Mattei et al., 2022). As this name suggests, in mammals this epigenetic modification occurs mostly at the C5 position of cytosines in the context of palindromic CpG dinucleotides (Ambrosi et al., 2017; Schübeler, 2015), but in recent years asymmetric DNA methylation at non-CpG sites has also been observed (He and Ecker, 2015). In the human genome, which was sequenced during the Human Genome Project in 2000 (Piovesan et al., 2019), roughly 60-80% of all 56 million CpG sites are methylated, corresponding to 4-6% of all cytosines (Laurent et al., 2010; Lister and Ecker, 2009).
Given this high number, it is hardly surprising that DNA methylation was found to play a role in various fundamental developmental processes in mammals, such as inactivation of one female X chromosome during embryonic development, silencing of repetitive elements and retrotransposons to maintain genome integrity, and genomic imprinting to ensure parent-of-origin-specific expression of genes (Breiling and Lyko, 2015). In addition, aberrant DNA methylation has been connected to the emergence and progression of different cancer types (Baylin and Jones, 2011; Bergman and Cedar, 2013), and it plays a role in several other diseases such as psychiatric disorders or immune dysfunctions (Jin and Liu, 2018). Despite the importance of DNA methylation, its preferred sequence context, the CpG dinucleotide, has been depleted 5- to 10-fold from the human genome during evolution, except in specific regions called CpG islands (Gardiner-Garden and Frommer, 1987). The reason for this phenomenon is the mutagenic property of methylated CpG sites: hydrolytic deamination of methylated cytosines results in T:G mismatches, and the resulting CpG to TpG transitions account for approximately 35% of all point mutations (Cooper and Youssoufian, 1988). Such T:G mispairs are more difficult to repair than the U:G mismatches that arise from the roughly 4-fold slower deamination of unmethylated cytosine, because thymine is a regular DNA base whereas uracil does not normally occur in DNA (Shen et al., 1994).

Throughout the human genome, approximately 29,000 CpG islands were identified, defined as regions of 500-2,000 bp with a GC content of at least 50% and an observed-to-expected ratio of CpG dinucleotides above 0.6 (Gardiner-Garden and Frommer, 1987; Takai and Jones, 2002). These clusters of CpG sites occur mostly at gene promoters (at about 70% of all genes), including those of housekeeping and tissue-specific genes (Saxonov et al., 2006).
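The CpG island criteria quoted above can be expressed as a simple check on a candidate region. The sketch below computes the GC content and the CpG observed/expected ratio with the Gardiner-Garden and Frommer formula (number of CpG dinucleotides × length / (number of C × number of G)); the function names and toy sequences are illustrative assumptions, and real genome-wide island callers additionally scan sliding windows and merge neighbouring hits, which is not shown here.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio after Gardiner-Garden and Frommer (1987)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    # "CG" cannot overlap with itself, so str.count gives the dinucleotide count.
    n_cpg = seq.count("CG")
    return n_cpg * len(seq) / (n_c * n_g)


def looks_like_cpg_island(seq: str, min_len: int = 500, max_len: int = 2000,
                          min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Check one candidate region against the length, GC and o/e thresholds."""
    return (min_len <= len(seq) <= max_len
            and gc_content(seq) >= min_gc
            and cpg_obs_exp(seq) > min_oe)


# Hypothetical toy regions: a CpG-rich 600 bp repeat and a CpG-free 560 bp repeat.
island_like = "GCGCTACG" * 75
bulk_like = "ATGCATTGACCTGA" * 40
print(looks_like_cpg_island(island_like), looks_like_cpg_island(bulk_like))  # True False
```

The CpG-depleted bulk of the genome shows up as o/e ratios far below 1, whereas island-like regions approach or exceed it.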
Overall, the methylation state of the CpG sites at a specific locus determines the accessibility of the regional chromatin and therefore the transcription of the genes at that locus (Deaton and Bird, 2011). CpG-rich regions such as CpG islands are mostly hypomethylated, which enables the recruitment of transcription factors and other chromatin remodelling factors. In contrast, regions lacking these clusters are generally hypermethylated (with 60-80% of isolated CpG sites being methylated, depending on the cell type) to silence the expression of repetitive and transposable elements (Pehrsson et al., 2019) or to prevent the initiation of intragenic transcription (Neri et al., 2017). This is accomplished not only by blocking transcription-activating factors but also by recruiting proteins that specifically bind to methylated cytosine residues, leading to the formation of repressive complexes and of heterochromatin (Allis and Jenuwein, 2016).

1.2.2 Classical and extended models of cytosine DNA methylation

DNA methylation of cytosine nucleobases has been the subject of many studies over the last decades (Ambrosi et al., 2017; Schübeler, 2015), but the basic principle of how cell type-specific DNA methylation patterns are set and maintained was already proposed independently by two research groups in 1975 (Holliday and Pugh, 1975; Riggs, 1975). This classical model, shown in an updated form in Figure 2 (Jeltsch et al., 2018), comprises two groups of DNA methyltransferases (DNMTs): one setting the methylation marks and one restoring them after DNA replication. Later, it was discovered that the DNA methyltransferase 3 (DNMT3) family of enzymes introduces new DNA methylation at CpG dinucleotides. In this way, these de novo enzymes create patterns of fully methylated and fully unmethylated CpG sites (Okano et al., 1999).
During DNA replication, the newly synthesized daughter strands no longer carry this epigenetic information, which would lead to dilution of the signal over time. Therefore, the cytosine methylation must be restored, a process performed by DNA methyltransferase 1 (DNMT1). This maintenance methyltransferase specifically recognizes the hemimethylated CpG sites (one strand methylated, the other unmethylated) generated by DNA replication and methylates the daughter strand (Jeltsch, 2006). To carry out this process accurately, the enzyme has intrinsic properties that are discussed in more detail in section 1.2.3.2.

Figure 2: Classical model of DNA methylation setting and maintenance. Cell type-specific methylation patterns are established through de novo methylation of DNA by enzymes of the DNA methyltransferase 3 (DNMT3) family. This creates patterns of completely methylated (blue) or unmethylated (white) sites. During DNA replication, the information of the parental strand is transferred to the new strand by DNA methyltransferase 1 (DNMT1), which specifically recognizes the hemimethylated (one strand blue, one strand white) sites generated during DNA replication and maintains the methylation pattern. Without this process, DNA methylation could be passively lost. Furthermore, active DNA demethylation pathways involving the Ten-eleven translocation enzymes (TETs) and thymine DNA glycosylase (TDG) are known (taken from Jeltsch et al., 2018).

In addition to the passive loss of DNA methylation, which is prevented by DNMT1 (see section 1.3.3), an active demethylation pathway involving the family of Ten-eleven translocation enzymes (TETs) has been known since 2009 (Tahiliani et al., 2009).
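The active demethylation route introduced here can be pictured as a small state machine. The sketch below encodes the generally accepted order in which TET enzymes oxidize m5C step-wise to 5-hydroxymethylcytosine (hm5C), 5-formylcytosine (f5C) and 5-carboxylcytosine (ca5C), after which TDG excises f5C or ca5C and base excision repair restores an unmodified cytosine. The enum and function names are illustrative assumptions; enzyme kinetics, the passive dilution of oxidized bases and strand context are deliberately ignored.

```python
from enum import Enum
from typing import List


class Cyt(Enum):
    """Cytosine states along the TET/TDG demethylation route."""
    C = "C"        # unmodified cytosine
    M5C = "m5C"    # 5-methylcytosine
    HM5C = "hm5C"  # 5-hydroxymethylcytosine
    F5C = "f5C"    # 5-formylcytosine
    CA5C = "ca5C"  # 5-carboxylcytosine
    AP = "AP"      # abasic site left behind by glycosylase excision


# Each TET oxidation step converts one state into the next one.
TET_OXIDATION = {Cyt.M5C: Cyt.HM5C, Cyt.HM5C: Cyt.F5C, Cyt.F5C: Cyt.CA5C}

# TDG excises only the two most highly oxidized forms.
TDG_SUBSTRATES = {Cyt.F5C, Cyt.CA5C}


def tet_step(base: Cyt) -> Cyt:
    """One TET oxidation step; states without a successor are returned unchanged."""
    return TET_OXIDATION.get(base, base)


def demethylation_path(start: Cyt, tet_steps: int) -> List[Cyt]:
    """Apply up to tet_steps TET oxidations, then TDG excision and BER if possible."""
    path = [start]
    base = start
    for _ in range(tet_steps):
        nxt = tet_step(base)
        if nxt is base:
            break  # C and ca5C are not oxidized further
        base = nxt
        path.append(base)
    if base in TDG_SUBSTRATES:
        path.append(Cyt.AP)  # TDG removes the oxidized base
        path.append(Cyt.C)   # base excision repair inserts an unmodified C
    return path


print(" -> ".join(b.value for b in demethylation_path(Cyt.M5C, 3)))
# prints: m5C -> hm5C -> f5C -> ca5C -> AP -> C
```

Stopping after a single oxidation step leaves hm5C in place, which is not a TDG substrate in this model.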
Following a pathway of step-wise oxidation, which is the subject of section 1.3.1, TET enzymes initiate the removal of the methylation mark with the help of thymine DNA glycosylase (TDG) and the base excision repair machinery (He et al., 2011; Ito et al., 2011).

The classical model of DNA methylation based on the early studies from 1975 has been challenged over the past years by experimental data suggesting a more cooperative function of DNMT3 and DNMT1 (Jeltsch and Jurkowska, 2014). Evidence for this new model came from studies in mammalian cell lines and mice, in which deletion of DNMT3A and DNMT3B resulted in reduced DNA methylation at repetitive elements, even though DNMT1 was fully functional (Chen et al., 2003; Dodge et al., 2005). This observation indicates that both DNMT3 enzymes share the maintenance function with DNMT1, at least at some specific targets. Moreover, DNMT1 was shown to be essential for de novo methylation, since DNMT3 enzymes were observed to preferentially generate hemimethylated CpG sites, which are then further methylated by DNMT1 (Fatemi et al., 2002). The main cause of this is the strong flanking sequence dependency of the DNMT3 enzymes (Handa and Jeltsch, 2005; Lin et al., 2002): because the flanks of a CpG site read differently on the two strands, methylation is often preferred in one strand but disfavoured in the other. In addition, DNMT1 was shown to have direct de novo methyltransferase activity at specific transposable elements (Haggerty et al., 2021), which would enhance the long-term repression of these regions. Another argument against the old DNA methylation model arose from studies in human stem cells and differentiated neuronal progenitor cells, in which DNA methylation was also observed in asymmetric non-CpG (CpA, CpC, CpT) contexts, although to a lesser extent (Jang et al., 2017). The existence of non-CpG methylation was long suspected to be a methodological artefact (He and Ecker, 2015).
However, the presence of this epigenetic mark was later proven, first in mice (Ramsahoye et al., 2000) and then in the human genome (Lister et al., 2009). To date, the detection of methylation in CpH context is still difficult due to the high level of genomic CpG methylation and requires high-resolution sequencing such as Illumina next-generation sequencing (NGS) (Metzker, 2010). Whereas somatic cells contain only about 0.02% of their total cytosine methylation in CpH context, this fraction increases to up to 25% in human ESCs (Jang et al., 2017). In human induced pluripotent stem cells (iPSCs), which carry roughly 68% methylation at CpG sites, methylation levels of more than 8% were observed in CpA context, followed by 2% in CpT and 1% in CpC context (Ziller et al., 2011). Furthermore, in non-dividing cells such as neurons, gene bodies and transposons were found to be enriched in non-CpG methylation, at >2% of all cytosines (Lister et al., 2013). Overall, non-CpG methylation was shown to correlate with repressed transcription (Guo et al., 2014b) and with states of pluripotency (Butcher et al., 2016). These asymmetric sites are not targets of DNMT1 because the template strand carries no methylation information, leaving DNMT3A and DNMT3B as the only methyltransferases available for their maintenance. Indeed, biochemical experiments showed that DNMT3A and DNMT3B can introduce this epigenetic mark due to their less stringent specificity, and the expression levels of both enzymes were shown to correlate with non-CpG methylation levels in ESCs and mammalian oocytes (Arand et al., 2012; Shirane et al., 2013). Furthermore, active demethylation through the TET-TDG pathway can take place on both the upper and the lower strand, leading to fully unmethylated CpG sites that are no longer targets of DNMT1.
At such sites, the de novo methyltransferases would need to reintroduce the DNA methylation marks to maintain the methylation pattern (Jeltsch and Jurkowska, 2014). The specific interplay of DNMT-mediated methylation and TET oxidation has recently been shown to be critical for mammalian stem cells during their exit from pluripotency (Parry et al., 2021). In contrast to the global DNA methylation and demethylation waves that occur during other stages of embryonic development (Messerschmidt et al., 2014), which are generally attributed to upregulation of either TET or DNMT3A/DNMT3B expression, both enzyme families are co-expressed in this special transition stage towards differentiation (Parry et al., 2021). Strikingly, ESCs lacking DNMTs or TET enzymes were still pluripotent but could no longer differentiate (Dawlaty et al., 2014; Tsumura et al., 2006). This effect was shown to be based on the combination of the two highly active machineries with passive DNA demethylation, which leads to a fast and continuous turnover of the individual DNA methylation states, especially at distal regulatory elements with low to intermediate CpG content (Parry et al., 2021). Together, these experimental findings forced the epigenetics field to replace the old static concept with a more dynamic one. DNA methylation is now seen as a continuous process, determined at each CpG site and genomic region by the local concentrations and catalytic activities of DNA methyltransferases with partly overlapping functions, as well as of TET enzymes, and by DNA replication rates (Jeltsch and Jurkowska, 2014). This revised model can explain why cells originating from the same tissue can have different methylation patterns, as shown by bisulfite sequencing (Zhang et al., 2009): stochastic enzymatic activities preserve only average methylation density profiles rather than exact patterns.
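The dynamic view described above can be illustrated with a toy stochastic simulation. In the sketch below, each CpG site is a pair of strand states; at every division one daughter cell is followed, which inherits either the parental upper or the parental lower strand of the whole DNA molecule, paired with an unmethylated new strand. Afterwards, DNMT1 restores hemimethylated sites with probability `p_maint`, DNMT3 enzymes methylate each unmethylated strand with probability `p_denovo`, and TET activity removes each mark with probability `p_tet`. All parameter names and values are hypothetical choices for illustration, not measured rates. Setting all three probabilities to zero shows purely passive loss (half of the CpG strands remain methylated after one division), while with non-zero values two cells with identical parameters end up with different patterns at similar average levels.

```python
import random
from typing import List, Tuple

Site = Tuple[bool, bool]  # (upper strand methylated, lower strand methylated)


def divide(sites: List[Site], rng: random.Random) -> List[Site]:
    """Semi-conservative replication, following one daughter cell.

    The daughter inherits either the parental upper or the parental lower strand
    of the whole molecule; the newly synthesized partner strand is unmethylated.
    """
    keep_upper = rng.random() < 0.5
    return [(up, False) if keep_upper else (False, low) for up, low in sites]


def act_enzymes(sites: List[Site], rng: random.Random,
                p_maint: float, p_denovo: float, p_tet: float) -> List[Site]:
    """Apply hypothetical per-division probabilities of DNMT1, DNMT3 and TET action."""
    out = []
    for up, low in sites:
        # DNMT1 acts only on hemimethylated sites and restores full methylation.
        if up != low and rng.random() < p_maint:
            up = low = True
        # DNMT3A/3B methylate each unmethylated strand independently.
        if not up and rng.random() < p_denovo:
            up = True
        if not low and rng.random() < p_denovo:
            low = True
        # TET-initiated demethylation removes each mark independently.
        if up and rng.random() < p_tet:
            up = False
        if low and rng.random() < p_tet:
            low = False
        out.append((up, low))
    return out


def simulate(n_sites: int = 1000, divisions: int = 50, p_maint: float = 0.95,
             p_denovo: float = 0.05, p_tet: float = 0.02, seed: int = 0) -> List[Site]:
    """Start fully methylated and follow one cell lineage through several divisions."""
    rng = random.Random(seed)
    sites: List[Site] = [(True, True)] * n_sites
    for _ in range(divisions):
        sites = act_enzymes(divide(sites, rng), rng, p_maint, p_denovo, p_tet)
    return sites


def methylation_level(sites: List[Site]) -> float:
    """Fraction of methylated CpG strands (two per site)."""
    return sum(int(up) + int(low) for up, low in sites) / (2 * len(sites))


# Two "cells" with identical parameters but different random histories.
cell_a, cell_b = simulate(seed=1), simulate(seed=2)
print(methylation_level(cell_a), methylation_level(cell_b))
print(sum(a != b for a, b in zip(cell_a, cell_b)), "of", len(cell_a), "sites differ")
```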
Following this model, methylation mistakes can be corrected by various feedback loops based on the crosstalk between DNA methylation and a complex network of chromatin marks. This includes either further recruitment of DNMTs to regions with repressive marks or reduction of methylation at regions with activating chromatin marks (Jeltsch and Jurkowska, 2014).

1.2.3 The family of DNA methyltransferases

In mammals, the family of DNA methyltransferases comprises four active enzymes and one catalytically inactive regulatory protein (Figure 3). The de novo methyltransferases DNMT3A and DNMT3B set the cell-specific methylation patterns during early embryogenesis and gametogenesis, which are then maintained by the maintenance methyltransferase DNMT1 (Jurkowska et al., 2011d). Furthermore, a methyltransferase termed DNMT3C, which evolved from DNMT3B, was recently discovered in rodents and shown to exclusively methylate retrotransposons in male germ cells (Barau et al., 2016). In addition to the active enzymes, the catalytically inactive DNMT3L was shown to act as a regulatory factor during de novo methylation (Jurkowska et al., 2011d). As shown in Figure 3, all DNMTs share the same structural composition of two main parts connected by a linker, such as the glycine-lysine repeat region in the case of DNMT1. The smaller C-terminal part of each enzyme contains the highly conserved active centre as well as the cofactor binding site. The larger N-terminal part varies between the enzymes and contains different regulatory domains responsible for the nuclear localization of the enzymes and for their interaction with DNA, specific histone post-translational modifications or other proteins (Jeltsch and Jurkowska, 2014).
Unlike DNMT1, which was shown to lose its function after removal of the N-terminal domains (Fatemi et al., 2001; Zimmermann et al., 1997), DNMT3A and DNMT3B were observed to be active also in their isolated C-terminal form (Gowher and Jeltsch, 2002).

Figure 3: Schematic drawing of the domain arrangement in the DNMT family. All enzymes consist of a C-terminal part containing the ten conserved amino acid motifs responsible for the methyltransferase activity (except for the inactive DNMT3L) and a regulatory N-terminal part with several domains for interaction with DNA, chromatin or other proteins. PCNA, binding domain for proliferating cell nuclear antigen; NLS, nuclear localization signal; CXXC, cysteine-X-X-cysteine domain; BAH, bromo-adjacent homology domains; GK, glycine-lysine repeats; PWWP, proline-tryptophan-tryptophan-proline domain; ADD, ATRX-DNMT3-DNMT3L domain (adapted from Ravichandran et al., 2018).

All prokaryotic and eukaryotic cytosine C5 DNA methyltransferases contain ten evolutionarily conserved amino acid motifs that are essential for the methyltransferase activity (Gowher and Jeltsch, 2018). Of these, motif IV (also known as the PCQ motif; PCN in the case of the DNMT3s), motif VI (also known as the ENV motif) and motif IX are essential for catalysis. Cofactor binding is mediated by motifs I and X, while DNA binding and the specificity of the DNMTs depend on the non-conserved region between motifs VIII and IX. Overall, the domain has a typical MTase fold built around a mixed seven-stranded β-sheet of six parallel β-strands and one antiparallel strand inserted between strands 5 and 6. Mechanistically, all DNMTs use the common base-flipping mechanism shown for the bacterial methyltransferases HhaI (Klimasauskas et al., 1994) and HaeIII (Reinisch et al., 1995), in which the target cytosine is flipped out of the DNA double helix and inserted into the hydrophobic pocket of the active site.
The transfer of the methyl group from the cofactor S-adenosyl-L-methionine (AdoMet) to position 5 of the cytosine nucleobase then follows the catalytic mechanism depicted in Figure 4 (Lyko, 2018). First, a nucleophilic attack of the catalytic cysteine residue from motif IV on the C6 position covalently links the enzyme to the flipped-out cytosine. This is accompanied by transient protonation of N3 by the catalytic glutamate residue from motif VI. The next step involves the deprotonation of N3 by the glutamate residue, which activates the C5 position and enables its nucleophilic attack on the methyl group of the cofactor AdoMet. Finally, abstraction of the H5 proton by a base that has not yet been identified, but could be a water molecule, triggers β-elimination and leads to the release of the methylated DNA and S-adenosyl-L-homocysteine (AdoHcy) from the enzyme. The fact that the initial nucleophilic attack of the cysteine and the attack on the methyl group must occur on opposite faces of the ring system is one of the reasons why the base-flipping mechanism described above is needed.

Figure 4: Catalytic mechanism of cytosine C5 DNA methyltransferases. Nucleophilic attack of the catalytic cysteine from motif IV (PCQ motif, green), together with the catalytic glutamate from motif VI (ENV motif), leads to the transfer of the methyl group (red) from the cofactor AdoMet (violet) to the C5 position of cytosine, followed by deprotonation by an unknown base (taken from Lyko, 2018).

Methylation-sensitive DNA-interacting proteins (Du et al., 2015) are then able to detect the incorporated methyl group in the major groove. Although this modification does not alter Watson-Crick base pairing, it changes the contact profile of the major groove. Moreover, the hydrophobic character of the methyl group leads to bending of the DNA, together with changes in flexibility, thermostability and solvation state (Rausch et al., 2021).
1.2.3.1 De novo methyltransferases DNMT3A and DNMT3B

The establishment of CpG methylation patterns, as well as the preservation of non-CpG methylation, is mainly the role of the DNMT3 enzymes DNMT3A and DNMT3B, assisted by DNMT3L. Although the two active methyltransferases show high sequence similarity in their MTase domains (Okano et al., 1998), they were shown to have distinct temporal expression patterns and functions. While DNMT3A is mainly expressed in oocytes and early preimplantation embryos, where it is responsible for allele-specific imprinting control during gametogenesis, DNMT3B is predominantly expressed in the blastocyst stage and during post-implantation development (Kaneda et al., 2004; Kato et al., 2007; Watanabe et al., 2002). Furthermore, both enzymes are involved in preserving the methylation states of various repetitive elements, for example human Satellite II (SatII) repeats in the case of DNMT3B and major satellite repeats in the case of DNMT3A (Chen et al., 2003). Indeed, several knockout studies emphasized the critical roles of all three DNMT3 proteins: mice were only partly viable after DNMT3A knockout, the absence of DNMT3B was lethal (Okano et al., 1999), and loss of DNMT3L led to embryonic lethality or infertility (Bourc'his et al., 2001). The latter is an indirect effect on de novo methylation, since DNMT3L lacks catalytic activity itself but enhances the activity of DNMT3A and DNMT3B and regulates their structural organization and nuclear localization (Jurkowska et al., 2011b). Mechanistic insights into the interaction of DNMT3A and DNMT3L were later obtained from several crystal structures solved without (Jia et al., 2007) or with co-crystallized DNA (Zhang et al., 2018).
These structures revealed that DNMT3A can dimerize and form linear heterotetramers with DNMT3L in the specific arrangement 3L-3A-3A-3L, which was also shown biochemically by ultracentrifugation and size exclusion chromatography (Jurkowska et al., 2008; Jurkowska et al., 2011b; Nguyen et al., 2019). Accordingly, the two active subunits located in the centre of the tetramer mediate the 3A-3A interaction at the so-called RD interface, while the two inactive DNMT3L subunits at the outer positions form two peripheral 3A-3L or FF interfaces (Figure 5A) (Jeltsch and Jurkowska, 2016). The names are based on the amino acids that predominantly make up these interfaces: arginine and aspartate residues forming a contact network in the polar RD interface, and four stacked phenylalanine residues forming the hydrophobic FF interface. In this arrangement, the central subunits form the DNA binding site of the complex, while the outer subunits are not in contact with the DNA but instead interact with residues that are involved in catalysis or cofactor binding. The contacts through the FF interface result in the increased methyltransferase activity of DNMT3A in the presence of DNMT3L, which cannot form RD interfaces itself because its MTase domain lacks essential motifs.

Figure 5: Schematic illustrations of DNMT3A and DNMT3A/3L complex structure and multimerization. Heterotetramers of two DNMT3A and two DNMT3L subunits (shown in A in blue and orange, respectively), as well as homotetramers, can form and bind to one or several DNA molecules (pink) in parallel orientation, with different potential for multimerization (shown in B and D, respectively).
In this way, co-methylation can occur on both strands of CpG sites (shown in C), and a combination of both multimerization processes can lead to the formation of 2D methylation networks (shown in E) (taken from Jeltsch and Jurkowska, 2016).

In contrast to DNMT3L, DNMT3A can also form homotetramers, which can further multimerize not only in a horizontal (Figure 5B) but also in a vertical direction (Figure 5D), with several DNA binding sites located on one or more DNA molecules in parallel arrangement (Figure 5C). This can lead to the formation of DNMT3A filaments or even DNMT3A networks (Figure 5E), which was also experimentally verified by scanning force microscopy and Förster resonance energy transfer assays (Jurkowska et al., 2008; Jurkowska et al., 2011b; Rajavelu et al., 2012).

Overall, DNA binding of multiple homo- or heterotetramer complexes occurs non-specifically (Rajavelu et al., 2012) but cooperatively, meaning that new complexes preferentially bind to a DNA molecule next to already bound ones. This property of DNMT3A has been studied intensively (Emperle et al., 2014; Jia et al., 2007; Jurkowska et al., 2008; Rajavelu et al., 2012); nevertheless, contradictory results can also be found in the literature (Holz-Schietinger and Reich, 2010). In addition to the subunit organization and formation of the essential interfaces, the crystal structure published in 2007 already provided hints about how CpG sites are methylated, even though no DNA was co-crystallized in this complex (Jia et al., 2007). Interestingly, structural constraints on the two central active subunits prevent the simultaneous methylation of both strands of a CpG target site despite its palindromic nature. Instead, parallel co-methylation of two CpG sites in opposite strands was suggested, whereby a CpG distance of roughly 10 bp would fit the ~40 Å separation of the inner subunits.
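The co-methylation hypothesis can be made concrete by listing, for a given DNA sequence, the pairs of CpG sites whose spacing would allow one central DNMT3A subunit to methylate a site in the upper strand while the other methylates a site in the lower strand. The sketch below is a hypothetical helper: it defines the spacing simply as the distance between the CpG start positions and accepts a set of allowed spacings so that alternatives to 10 bp can be compared. It ignores flanking sequence preferences, helix geometry and the orientation of the real complex on the DNA.

```python
from typing import Iterable, List, Tuple


def cpg_positions(seq: str) -> List[int]:
    """0-based positions of the C of every CpG dinucleotide in the upper strand."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def comethylation_pairs(seq: str, spacings: Iterable[int] = (10,)) -> List[Tuple[int, int]]:
    """Candidate site pairs (i, j) for parallel co-methylation.

    Site i would be methylated in the upper strand (C at position i) and site j in
    the lower strand (its C pairs with the upper-strand G at position j + 1).
    The spacing is simply j - i, an illustrative convention.
    """
    allowed = sorted(set(spacings))
    sites = set(cpg_positions(seq))
    return [(i, i + d) for i in sorted(sites) for d in allowed if i + d in sites]


# Two hypothetical toy sequences with CpG sites 10 and 12 bp apart.
s10 = "ACG" + "T" * 8 + "CGA"
s12 = "ACG" + "T" * 10 + "CGA"
print(comethylation_pairs(s10))                     # [(1, 11)]
print(comethylation_pairs(s12, spacings=(10, 12)))  # [(1, 13)]
```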
Consistent with this hypothesis, biochemical as well as cellular data from different mammalian studies showed the same 10 bp periodicity for CpG (Jia et al., 2007; Jurkowska et al., 2008) and even non-CpG methylation (Lee et al., 2017), creating characteristic DNA methylation profiles. Nevertheless, adjacent DNMT3 complexes in an oligomeric structure would still be able to methylate the upper and the lower strand of one CpG target site, respectively, giving some flexibility to the DNA methylation process. New insights into the complex mechanism of DNMT3A/DNMT3L were obtained when a crystal structure was solved in 2018 with either one co-crystallized DNA molecule containing two CpG target sites or two short DNA molecules containing one site each (Zhang et al., 2018). To obtain stable structures, the target cytosines were replaced with the nucleotide analogue zebularine. This cytosine analogue has the advantage that a stable covalent complex forms between the cysteine in the active centre of the enzyme and the target base, because the β-elimination step of the catalytic mechanism cannot take place (Osterman et al., 1988). The new crystal structure was in good agreement with the DNA-free structure published more than ten years earlier (Jia et al., 2007), except for a part of the target recognition domain (TRD), which appeared to be disordered in the absence of DNA but became ordered upon DNA binding. In addition to the TRD loop, the catalytic loop and the RD interface were identified as the essential DNA-interacting regions. Furthermore, a distance of 12 bp instead of 10 bp between the two target sites in one DNA molecule was found to be preferred for co-methylation, a discrepancy which remained to be resolved. Surprisingly, the two different structures determined in this study also differed in the bending of the DNA upon binding to the active centre.
One structure showed the DNMT3A/3L heterotetramer in contact with two short individual DNA molecules, which were both unbent and each bound to one inner subunit of the complex. In contrast, the longer DNA with two target CpG sites showed compression of the major groove due to bending by approximately 40° in the middle of the substrate. Lastly, several DNMT3A residues were observed to make specific contacts with the bound DNA, leading to distinct flanking sequence preferences, as further discussed in section 1.4. Despite the tremendous knowledge obtained on DNMT3A or DNMT3A/3L methylation, little was known about the mechanism or organization of DNMT3B complexes at the start of this thesis, owing to the absence of a crystal structure. Several biochemical studies have shown that DNMT3B methylates DNA in a processive manner (Norvil et al., 2018), instead of the cooperative mechanism of DNMT3A (Jia et al., 2007; Jurkowska et al., 2008; Rajavelu et al., 2012), and that DNMT3B also has intrinsic preferences for specific flanking sequences, although these differ from those of DNMT3A (see section 1.4). Moreover, the sequence similarity of both DNMT3 enzymes, especially in the conserved interfaces (Okano et al., 1998), would suggest a potential self-oligomerization of DNMT3B, but this could not be proven so far. Therefore, further insights into the mechanism of de novo methylation by DNMT3B needed to be obtained in future experiments, and for DNMT3A, the described discrepancies had to be clarified.

1.2.3.2 The maintenance methyltransferase DNMT1

DNMT1 was the first mammalian cytosine C5 DNA methyltransferase to be cloned and sequenced (Bestor et al., 1988). The protein is a large polypeptide of around 180 kDa comprising 1616 or 1620 amino acids in the human and mouse enzyme, respectively. Its amino acid sequence was later shown to be highly conserved between different species (Jurkowski and Jeltsch, 2011).
Since DNMT1 is the major enzyme responsible for the maintenance of DNA methylation, it is not surprising that knockout of the enzyme or loss of its catalytic function was found to be lethal in various studies. Disruption of the DNMT1 gene in mice led to delayed embryonic development and death of the embryos shortly after gastrulation (Li et al., 1992). A similar phenotype was observed in mice carrying a mutation of DNMT1 on both alleles that renders the enzyme catalytically inactive (Takebayashi et al., 2007). In addition, deletion of DNMT1 was shown to be lethal in all proliferating somatic cells (Fan et al., 2001; Sen et al., 2010; Trowbridge et al., 2009), and a strong global reduction of the DNA methylation level was observed after disruption of the gene in ESCs (Liao et al., 2015). Overall, these results support the major role of DNMT1 during all embryonic developmental stages. Expression levels of DNMT1 depend strongly on the cell type and the cell cycle stage. Non-dividing cells were found to express only low levels of the enzyme, in contrast to the high expression levels in all mitotic cells (Kishikawa et al., 2003; Lee et al., 1996; Robertson et al., 2000). Furthermore, the levels of DNMT1 expression, as well as its localization in the nucleus, are strongly regulated during cell proliferation. The protein shows a dynamic subnuclear localization, with distinct punctate patterns in early, middle and late S-phase that correspond to sites of active DNA replication in specific chromatin states. These structures develop from small DNMT1 foci at sites of ongoing replication in euchromatin to larger toroidal DNMT1 structures at the less abundant replication sites in centromeric heterochromatin. This specific localization pattern is lost during other stages of the cell cycle, in which the enzyme is diffusely distributed throughout the nucleus (Schneider et al., 2013).
In all stages of S-phase, the punctate pattern of DNMT1 results from its association with replication forks and with the hemimethylated CpG sites that arise after DNA replication. The methylation state of these sites has to be restored. For this maintenance function, DNMT1 displays a strong preference for hemimethylated over unmethylated CpG sites, which depends on the experimental conditions. It ranges from 15-fold in in vitro experiments using a long 634 bp substrate with multiple hemimethylated target sites to 30-40-fold on short 30mer oligonucleotides containing one hemimethylated CpG site (Fatemi et al., 2001; Goyal et al., 2006; Pradhan et al., 1999). In contrast, DNMT3A was shown not to discriminate between unmethylated and hemimethylated target sites (Gowher and Jeltsch, 2001). As a maintenance methyltransferase, DNMT1 was also shown to be highly processive: it can consecutively methylate up to 30 CpG sites without dissociating from the DNA (Goyal et al., 2006; Hermann et al., 2004; Vilkaitis et al., 2005), with a very low skipping rate of under 0.3% (Goyal et al., 2006). These studies showed that processive methylation occurs on only one DNA strand, without switching the target strand. DNMT1 was therefore proposed to slide along the daughter strand during DNA replication. Despite all the enzymatic properties unravelled so far, faithful maintenance methylation of all 56 million CpG sites in the human genome by DNMT1 alone seems impossible within S-phase. Therefore, additional mechanisms involving other proteins and cofactors present in the cell are required to further enhance the activity and specificity of the methyltransferase. During the last 20 years, this complex network around DNMT1 has been studied intensively and several key players have been identified (Petryk et al., 2021).
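The two quantitative properties of DNMT1 described above can be put into perspective with a back-of-the-envelope calculation. This is a minimal sketch under assumptions that are not taken from the cited studies. First, it assumes that skipping events at successive CpG sites are independent and all occur with the upper-bound rate of 0.3%, so an uninterrupted 30-site run has probability (1 - 0.003)^30. Second, it assumes that methylation rates scale linearly with the preference factor p, so with equal numbers of hemimethylated and unmethylated sites a fraction p/(p + 1) of the events occurs at hemimethylated sites. The function names are hypothetical.

```python
def processive_run_probability(n_sites: int, skip_rate: float) -> float:
    """Probability of methylating n consecutive CpG sites without a single skip.

    Hypothetical model: skipping happens independently at each site with the
    same probability `skip_rate`.
    """
    if not 0.0 <= skip_rate <= 1.0:
        raise ValueError("skip_rate must lie between 0 and 1")
    return (1.0 - skip_rate) ** n_sites


def hemimethylated_share(preference: float) -> float:
    """Fraction of methylation events that occur at hemimethylated sites.

    Hypothetical model: equal numbers of hemimethylated and unmethylated sites,
    and rates proportional to the preference factor (hemi : unmethylated).
    """
    if preference <= 0:
        raise ValueError("preference must be positive")
    return preference / (preference + 1.0)


if __name__ == "__main__":
    # Upper-bound skipping rate of 0.3 % over a processive run of 30 sites
    print(f"P(30 sites, no skip): {processive_run_probability(30, 0.003):.3f}")
    # Reported preference range: 15-fold (long substrate) to 30-40-fold (30mer)
    for fold in (15, 30, 40):
        print(f"{fold}-fold preference -> {hemimethylated_share(fold):.3f}")
```

Under these assumptions, the probability of an uninterrupted 30-site run is roughly 0.91. Preference factors of 15 to 40 correspond to roughly 94-98% of methylation events occurring at hemimethylated sites.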
Ubiquitin-like, containing PHD and RING finger domains 1 (UHRF1), an E3 ubiquitin ligase, was shown to preferentially recognize hemimethylated CpG sites through its SET- and RING-associated (SRA) domain. This leads to allosteric activation of UHRF1, which then ubiquitinates histone H3 and/or itself (Fang et al., 2016). Both modifications are recognized by the replication-focus-targeting-sequence (RFTS) domain of DNMT1, which guides the methyltransferase to the hemimethylated sites (Li et al., 2018). Binding also activates the enzyme allosterically by releasing the auto-inhibitory RFTS domain from the catalytic centre (Jeltsch and Jurkowska, 2016). UHRF1 also recognizes repressive chromatin marks such as H3K9 trimethylation (Nady et al., 2011) together with unmodified H3R2 (Hu et al., 2011; Wang et al., 2011). It does so through a combinatorial readout involving its tandem Tudor domain and plant homeodomain (PHD), which was demonstrated to be essential for H3 ubiquitination and subsequent DNA methylation (Qin et al., 2015). In addition, proliferating cell nuclear antigen (PCNA), a ring-shaped DNA clamp that interacts with DNA polymerase δ, was shown to colocalize with DNMT1 at replication foci (Chuang et al., 1997; Easwaran et al., 2004). The presence of PCNA also enhances the binding of the methyltransferase to DNA (Iida et al., 2002). DNMT1 interacts with PCNA through its PCNA-binding domain in the N-terminal part of the protein. The detailed mechanism of this interaction remained unknown until the crystal structure of PCNA in complex with the interacting motif of DNMT1 (PIP box) was solved (Jimenji et al., 2019). Mutation of this binding motif was shown to reduce maintenance methylation twofold compared with wild-type DNMT1 (Egger et al., 2006; Spada et al., 2007).
Overall, disruption of the complex network around DNMT1, UHRF1 and PCNA led to global DNA hypomethylation in brain, lung, breast and mesothelial cells, which is an oncogenic event in human tumorigenesis (Pacaud et al., 2014). Regarding the specificity of DNMT1, several crystal structures published in 2011 and 2012 provided mechanistic insights into substrate recognition by the methyltransferase (Song et al., 2011; Song et al., 2012). The 2012 study reported the structure of a truncated mouse DNMT1 variant (amino acids 731-1602, PDB ID: 4DA4) co-crystallized with a 12 bp oligonucleotide containing a single hemimethylated CpG site. In this site, the target cytosine was replaced by 5-fluorocytosine. As depicted in Figure 6, the DNA is bound in the catalytic cleft, with the target 5-fluorocytosine flipped out of the DNA double helix and placed in the active site. In addition, recognition of the hemimethylated CpG site was shown to be mediated by a hydrophobic surface of the TRD that surrounds the methyl group of the parental 5-methylcytosine in the major groove. DNA contacts were formed by two TRD loops in the major groove and by one catalytic loop penetrating the minor groove. Notably, in this structure the space vacated by the flipped-out cytosine was occupied by two DNMT1 residues, and the DNA around the target site underwent several structural rearrangements.

Figure 6: Structure of mouse DNMT1 co-crystallized with hemimethylated DNA.
Bromo-adjacent homology domains 1 and 2 (BAH1 and BAH2) are coloured light pink and orange, the catalytic loop and the target recognition domain (TRD) loops 1 and 2 are coloured green and the methyltransferase domain is shown in cyan; black dashed lines indicate the disordered (GK)n linker; the DNA is coloured barley, with the flipped-out 5-fluorocytosine and the parental cytosine shown in purple and blue, respectively; coordinated zinc ions are shown in purple and the bound cofactor S-adenosyl-L-homocysteine (AdoHcy) is shown in space-filling representation (adapted from Song et al., 2012; PDB ID: 4DA4).

1.2.4 Acute myeloid leukaemia and the role of the DNMT3A R882H mutation

The importance of faithfully establishing and maintaining epigenetic marks has become increasingly clear over the last decades, as the development and progression of many diseases and disorders have been linked to aberrant epigenetic processes (Robertson, 2005). In the case of DNA methylation, reduced global methylation levels in human cancer cells compared with normal cells had already been observed in 1983 (Feinberg and Vogelstein, 1983). This hallmark was later connected to the hypomethylation of large numbers of repetitive elements (Weisenberger et al., 2005), which leads to genomic instability due to inefficient silencing of transposable elements. Generally, two types of carcinogenic alterations have to be distinguished. “Driver” alterations enhance the proliferation of cancer cells and therefore promote the development of the disease. “Passenger” alterations do not have this influence but accumulate by chance during clonal selection as the cancer progresses (Pon and Marra, 2015; Roy et al., 2014). Although the exact classification of the different epigenetic changes is difficult, aberrant global DNA methylation levels were shown to occur frequently and at an early stage of cancer development (Sonnet et al., 2014).
A frequent cause of these changes is somatic mutation of DNMT3A. Such mutations were shown to be prevalent in different types of malignancies, such as lung cancer (Gao et al., 2011) and haematological cancers (Yang et al., 2015), and DNMT3A mutations also occur in developmental disorders (Tatton-Brown et al., 2014). Strikingly, DNMT3A mutations are found in 20-40% of patients with acute myeloid leukaemia (AML) (Ley et al., 2010; Yamashita et al., 2010; Yan et al., 2011). AML is a haematological cancer that originates from myeloid precursors of granulocytes or monocytes in the bone marrow and causes unregulated proliferation and subsequent accumulation of malignant, immature white blood cells (Ferrara and Schiffer, 2013). The pathogenesis of AML is a multistage process, as depicted in Figure 7. Mutations in epigenetic regulators such as DNMT3A tend to occur early and lead to a pre-leukaemic state (Li et al., 2016a; Papaemmanuil et al., 2016; Sato et al., 2016), which develops into heterogeneous populations of cancer cells as further mutations accumulate. In this way, DNMT3A mutations drive carcinogenic progression and are associated with a poor prognosis regarding treatment response, remission time and survival rate (Brunetti et al., 2017).

Figure 7: Multistage process of AML development. A pre-leukaemic state (Pre-LSC) is first induced by primary mutations in epigenetic regulators such as DNMT3A and TET2 occurring in haematopoietic stem or committed progenitor cells (HSPC). In this state, the cells are still capable of self-renewal and contribute to normal haematopoiesis. However, further mutations drive pathogenesis, leading to different subpopulations of AML cells (adapted from Sato et al., 2016).

A more detailed look at the DNMT3A mutations frequently observed in AML patients revealed that most of them cluster in the MTase domain as well as in the proline-tryptophan-tryptophan-proline (PWWP) and ATRX-DNMT3A-DNMT3L (ADD) domains (Figure 8).
Generally, missense mutations are strongly enriched and account for 73% of the mutations. Most of them occur heterozygously, alongside an intact wild-type allele that still expresses unmutated DNMT3A (Brunetti et al., 2017). The most frequently mutated residue is R882, which accounts for roughly 60% of the missense mutations. It is predominantly converted to histidine (two-thirds of cases) or cysteine (one-third) and only rarely to serine or proline. The location of R882 in the RD interface of DNMT3A complexes and its involvement in DNA binding through contacts to the DNA backbone (Zhang et al., 2018) point towards a specific molecular effect of this mutation. For this reason, and because of its high prevalence, several groups have investigated the molecular and pathogenic mechanisms of the R882H mutation, with partly contradictory results (Brunetti et al., 2017; Marcucci et al., 2012). First, the catalytic activity of the mutant was shown to be reduced by 30-50% in in vitro experiments compared with the wild-type enzyme (Emperle et al., 2018a; Emperle et al., 2018b; Holz-Schietinger et al., 2012; Yan et al., 2011). The observed effects were even more striking in vivo. However, conflicting models of the interaction with the co-expressed wild-type enzyme were proposed, including a dominant-negative effect (Kim et al., 2013; Russler-Germain et al., 2014). Similarly, the influence of the mutation on the oligomerization of DNMT3A is still under debate. One group presented data supporting impaired dimerization of R882H subunits and a resulting decrease in the processivity of the enzyme (Holz-Schietinger et al., 2012). Other groups instead reported increased multimerization of the mutant DNMT3A (Nguyen et al., 2019), changes in its enzymatic activity under different pH conditions (Holz-Schietinger and Reich, 2015) or altered cooperative binding to DNA (Norvil et al., 2018).
Furthermore, R882H was not only linked to hypomethylation of CpG islands and shores in AML (Qu et al., 2014) but also associated with hypermethylated promoters (Yan et al., 2011). Finally, the location of R882 in the RD interface was shown to influence the recognition of CpG target sites: the R882H mutant shows strong flanking sequence preferences that are distinct from those of wild-type DNMT3A (Emperle et al., 2018b), as further discussed in section 1.4. In summary, several consequences of the DNMT3A R882H mutation are still not fully understood, and further experiments are needed to gain insights into this highly relevant AML mutation.

Figure 8: Mutations of DNMT3A occurring in AML. Nonsynonymous mutations are depicted on the schematic protein domain arrangement and presented as lollipops, with colours referring to the type of mutation and size correlating with the mutation count (based on Brunetti et al., 2017).

1.3 DNA Demethylation

1.3.1 Principles of DNA demethylation and roles of its products

DNA methylation was long thought to be a relatively stable epigenetic modification because of the chemically inert character of the C-C bond. Removal of this mark was therefore assumed to take place through replication-dependent loss of methylation in the absence of proper DNMT1 function (see section 1.3.3). However, this mechanism appears insufficient to account for the two main demethylation events during embryonic development. These are the demethylation of the paternal genome after fertilization but before the first DNA replication, and the genome-wide demethylation during germ cell specification (Messerschmidt et al., 2014).
The existence of an active demethylation pathway was proven in 2000, when two independent groups observed a replication-independent global loss of DNA methylation in mouse zygotes (Mayer et al., 2000; Oswald et al., 2000). The mechanism, however, remained unclear until the Ten-eleven translocation (TET) enzyme TET1 was discovered in 2009. This enzyme was shown to oxidize 5-methylcytosine (m5C) to 5-hydroxymethylcytosine (hm5C) in vitro and in vivo (Tahiliani et al., 2009). It was identified in a computational screen as the mammalian homologue of the trypanosome dioxygenases JBP1 and JBP2, which oxidize thymine to 5-hydroxymethyluracil using the cofactors Fe(II) and 2-oxoglutarate (2OG) (Cliffe et al., 2009; Yu et al., 2007). Later on, the same oxidation activity was confirmed for the other two members of the TET family, TET2 and TET3 (Ito et al., 2010). 5-Hydroxymethylcytosine had already been discovered as a modified base in 1952 in the genomes of the T-even bacteriophages T2 and T4, as part of their restriction-modification system (Wyatt and Cohen, 1952). Because of conflicting reports on the abundance of hm5C in mammals (Kothari and Shankar, 1976; Penn et al., 1972), this modified base was suspected to result from oxidative damage until the breakthrough in 2009. In mouse ESCs, the oxidation product hm5C was shown to account for 0.03% of all nucleotide bases genome-wide, with m5C being 14-fold more abundant (Tahiliani et al., 2009). The hm5C content was even higher in mouse granule cells and Purkinje neurons, where it constitutes 0.2% and 0.6% of all nucleotides, respectively (Kriaucionis and Heintz, 2009). Lastly, hm5C was found to be most abundant in brain cells and ESCs (0.7% and 0.4% of dG, respectively), but other mouse tissues such as lung, kidney, heart and muscle were also shown to contain this modification (Globisch et al., 2010).
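The relative abundances reported for mouse ESCs can be cross-checked with simple arithmetic. If hm5C makes up 0.03% of all bases and m5C is 14-fold more abundant, then m5C accounts for about 0.42% of all bases, and hm5C represents one in fifteen of the modified (m5C + hm5C) cytosines. The snippet only restates the figures given above. The tissue values of Globisch et al. (2010) are expressed per dG rather than per total nucleotides. They are therefore not directly comparable without additional assumptions about base composition, which are deliberately not made here.

```python
# Figures reported for mouse ESCs (Tahiliani et al., 2009), as fractions of all bases
HM5C_FRACTION = 0.03 / 100   # hm5C: 0.03 % of all nucleotide bases
M5C_OVER_HM5C = 14           # m5C is about 14-fold more abundant than hm5C

# Fraction of all bases that are m5C
m5c_fraction = HM5C_FRACTION * M5C_OVER_HM5C
# Share of hm5C among all modified cytosines (m5C + hm5C), i.e. 1 / (1 + 14)
hm5c_share_of_modified = HM5C_FRACTION / (HM5C_FRACTION + m5c_fraction)

print(f"m5C:  {m5c_fraction:.2%} of all bases")
print(f"hm5C: {hm5c_share_of_modified:.1%} of m5C + hm5C")
```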
Overall, hm5C levels varied strongly between the different tissue types, whereas m5C levels were relatively constant. Throughout the genome, hm5C sites were shown not to overlap with m5C sites, and the modification was especially enriched in euchromatic regions. In brain tissue and ESCs, hm5C was mostly observed at transcription start sites (TSS), at promoters with moderate to low CpG content, or in gene bodies (Shi et al., 2017), where it correlated positively with gene expression (Ficz et al., 2011). Specific reader proteins such as UHRF2 have been identified for hm5C (Spruijt et al., 2013; Zhou et al., 2014). This oxidized cytosine species is therefore nowadays also considered a stable epigenetic mark in its own right, although its specific biological role is still unclear. In addition, the loss of this epigenetic modification is discussed as another hallmark of cancer (Ficz and Gribben, 2014). The full pathway of active demethylation (shown schematically in Figure 9) was unravelled two years later. Two groups (He et al., 2011; Ito et al., 2011) independently showed that the TET enzymes can further oxidize hm5C to 5-formylcytosine (f5C) and 5-carboxylcytosine (ca5C), similar to the stepwise oxidation of thymine by thymine-7-hydroxylase (Liu et al., 1973; Neidigh et al., 2009). Thymine DNA glycosylase (TDG) then recognizes these higher oxidized bases and hydrolyses the N-glycosidic bond between the base and the deoxyribose. This creates an abasic site in the DNA, which is filled with an unmodified cytosine by the base excision repair (BER) machinery (He et al., 2011). The role of TDG was confirmed by the finding that knockdown of the enzyme led to a 10-fold increase in the levels of f5C and ca5C in mouse ESCs (Cortellino et al., 2011; Raiber et al., 2012). However, TDG expression levels are low in the zygote and loss of TDG did not change zygotic demethylation (Guo et al., 2014a), so TDG cannot be the only DNA glycosylase involved.
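The stepwise pathway described above can be illustrated with a toy kinetic model. In this model, m5C is oxidized to hm5C, f5C and ca5C, and TDG excision followed by BER returns f5C and ca5C to unmodified cytosine. This is a sketch under explicit assumptions, not a model from the cited work. All steps are treated as irreversible first-order reactions on a population of sites that starts fully methylated. TDG excision and BER are lumped into a single step with the same rate constant for f5C and ca5C. All rate constants are arbitrary illustrative values. The hm5C and f5C oxidation steps are set about 7-fold slower than m5C oxidation, in line with the 5-10-fold lower activities reported for NgTET1 and TET2 (Hashimoto et al., 2015; Hu et al., 2013). The equations are integrated with the simple explicit Euler method.

```python
SPECIES = ("m5C", "hm5C", "f5C", "ca5C", "C")


def simulate_demethylation(
    k_m5c: float = 1.0,    # m5C -> hm5C oxidation (illustrative units of 1/time)
    k_hm5c: float = 0.15,  # hm5C -> f5C, ~7-fold slower (assumed value)
    k_f5c: float = 0.15,   # f5C -> ca5C, ~7-fold slower (assumed value)
    k_tdg: float = 0.5,    # lumped TDG excision + BER of f5C and ca5C (assumed)
    t_end: float = 10.0,
    dt: float = 1e-3,
) -> dict:
    """Return the fraction of sites in each state after time t_end.

    All sites start as m5C; the first-order rate equations are integrated
    with the explicit Euler method, so the fractions always sum to one.
    """
    m5c, hm5c, f5c, ca5c, c = 1.0, 0.0, 0.0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        ox1 = k_m5c * m5c      # flux m5C -> hm5C
        ox2 = k_hm5c * hm5c    # flux hm5C -> f5C
        ox3 = k_f5c * f5c      # flux f5C -> ca5C
        rep_f = k_tdg * f5c    # f5C excised and replaced by C
        rep_ca = k_tdg * ca5c  # ca5C excised and replaced by C
        m5c -= ox1 * dt
        hm5c += (ox1 - ox2) * dt
        f5c += (ox2 - ox3 - rep_f) * dt
        ca5c += (ox3 - rep_ca) * dt
        c += (rep_f + rep_ca) * dt
    return dict(zip(SPECIES, (m5c, hm5c, f5c, ca5c, c)))


if __name__ == "__main__":
    for t in (1.0, 5.0, 30.0):
        state = simulate_demethylation(t_end=t)
        print(t, {name: round(value, 3) for name, value in state.items()})
```

With these illustrative constants, hm5C transiently accumulates while f5C and ca5C stay well below it. This is qualitatively consistent with the much lower levels of the higher oxidized bases described in the text. The absolute numbers carry no biological meaning.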
Levels of f5C and ca5C were found to be 100-1,000-fold lower than those of hm5C (Carell et al., 2018). Evidence for their biological relevance has nevertheless been reported (Lu et al., 2015; Song et al., 2013), so these less abundant bases could still act as epigenetic marks.

Figure 9: Pathways of passive and active DNA demethylation. DNA methylation is set by DNA methyltransferases (DNMT) at the C5 position of cytosine, creating 5-methylcytosine (m5C). Using Fe(II), 2-oxoglutarate (2OG) and molecular oxygen as cofactors, the Ten-eleven translocation (TET) enzymes sequentially oxidize the methyl group, yielding 5-hydroxymethylcytosine (hm5C) followed by 5-formylcytosine (f5C) and 5-carboxylcytosine (ca5C). The last two modified bases are targets of thymine DNA glycosylase (TDG), which excises them; the resulting abasic site is then filled with an unmodified cytosine by the base excision repair (BER) machinery. This active demethylation pathway occurs in parallel to passive demethylation, which results from inhibition of DNMT1 by hemi-modified CpG sites containing oxidized m5C species (based on An et al., 2017).

1.3.2 Ten-eleven translocation enzymes

In mammals, there are three members of the TET family, namely TET1, TET2 and TET3, which have both overlapping and distinct functions depending on the cell type. TET1 and TET2 were shown to be highly expressed in ESCs and the inner cell mass, whereas TET3 expression was highest in the oocyte and zygote. TET3 is therefore the main candidate for the major demethylation event during early embryonic development. After differentiation, the levels of all TET enzymes generally decrease (Rasmussen and Helin, 2016). All members of the TET family consist of a highly conserved C-terminal part and a less conserved N-terminal part, as shown in Figure 10 for the human TETs (Rasmussen and Helin, 2016).
The core catalytic domain contains a double-stranded β-helix (DSBH) domain with binding sites for Fe(II) and 2OG, which adopts the characteristic fold of dioxygenases that depend on these cofactors (Iyer et al., 2009). This domain also harbours a low-complexity region that increases in size from TET1 to TET3. Its function remains unknown, but its deletion was shown not to affect enzymatic activity (Hu et al., 2013). As shown for the DNMT3 enzymes, the isolated C-terminal domain of the TET enzymes can localize to the nucleus on its own, where it converts m5C to hm5C (Ito et al., 2010; Tahiliani et al., 2009). In recent years, a large number of isoforms generated by alternative splicing or differential promoter usage have been identified (Melamed et al., 2018).

Figure 10: Schematic drawing of the domain arrangement in the human TET family. The C-terminal core catalytic domain consists of a cysteine-rich region (Cys, shown in red) and the double-stranded β-helix domain (DSBH, shown in violet), which contains the binding sites for the cofactors Fe(II) and 2-oxoglutarate (2OG) (shown in black and green, respectively) and a low-complexity region (shown in grey). The N-terminal part of TET1 and TET3 contains a zinc finger cysteine-X-X-cysteine (CXXC, shown in blue) domain that was lost from TET2 during evolution (taken from Rasmussen and Helin, 2016).

Similar to the DNMTs, the TET enzymes use a base-flipping mechanism, in which the target cytosine is flipped out of the DNA double helix and inserted into the active site, bending the DNA by about 40° (Hu et al., 2013). Generally, the catalytic mechanism of the TET enzymes consists of two main parts: the activation of dioxygen, followed by the oxidation of the respective substrate, as depicted in Figure 11 (Parker et al., 2019). First, Fe(II) and water molecules are coordinated in the active centre by the HXD motif (X being any amino acid residue), which together with a distal histidine forms the characteristic facial triad of Fe(II)/2OG-dependent dioxygenases.
The catalytic cycle then starts with the binding of 2OG. This results in an active form of the enzyme, which in turn binds the DNA substrate. After a coordinated water molecule is displaced, molecular dioxygen binds to Fe(II) via one of its oxygen atoms. The dioxygen is activated when the unbound oxygen atom attacks 2OG, which is oxidatively decarboxylated to succinate and CO2, with concomitant cleavage of the bond between the two oxygen atoms. In this way, a highly reactive Fe(IV)-oxo intermediate is formed (Krebs et al., 2007; Valegard et al., 2004), which abstracts a hydrogen atom from the (modified) methyl group of the DNA substrate. The resulting substrate radical then reacts with the hydroxide ligand of the Fe(III)-hydroxide complex, releasing the oxidized substrate and reducing the iron back to Fe(II) (Hoffart et al., 2006; Price et al., 2003). After succinate dissociates, a new oxidation cycle can start with the binding of 2OG and oxygen and follows the same mechanism. Since the +2 oxidation state of the iron is essential for TET activity, reducing agents such as ascorbic acid can enhance the oxidation reaction (Blaschke et al., 2013; Minor et al., 2013; Yin et al., 2013).

Figure 11: Catalytic cycle of TET enzymes. The oxidation follows a radical mechanism that is initiated by the binding of the cofactors Fe(II) and 2-oxoglutarate to the active site of the TET enzyme (taken from Parker et al., 2019).

1.3.2.1 TET1

The first member of the TET enzyme family, TET1, is the largest enzyme of the group, with a length of 2136 amino acids. It contains the largest N-terminal part but the shortest C-terminal part, owing to the small size of the unstructured region in its DSBH domain (see Figure 10 for the domain arrangement of human TET1). Therefore, the catalytic domain of TET1 was the easiest to purify, and TET1 was the first TET enzyme for which oxidation of m5C to hm5C was demonstrated (Tahiliani et al., 2009).
Analyses of the hm5C distribution across various tissues and cell types revealed high TET1 expression in mouse ESCs and primordial germ cells (PGCs), which correlated with hm5C enrichment (Yamaguchi et al., 2012). TET1 levels decrease during differentiation. In ESCs, TET1 was shown to be involved in stem cell maintenance by suppressing the expression of differentiation-related factors (Dawlaty et al., 2011; Williams et al., 2011; Wu et al., 2011). In PGCs, TET1 is highly expressed and contributes to the massive genome-wide demethylation, thereby regulating the methylation of imprinted genes (Hackett et al., 2013; Yamaguchi et al., 2013). In contrast, no or only low expression levels were observed in the oocyte and zygote (Wossidlo et al., 2011). TET1 was found at high levels in iPSCs derived from mouse embryonic fibroblasts, and deletion of all TET enzymes was shown to prevent iPSC formation (Gao et al., 2013). Finally, overexpression of the enzyme was shown to increase hm5C levels in the central nervous system (Guo et al., 2011). Immunostaining of HEK293 cells showed that TET1 localizes to the nucleus (Tahiliani et al., 2009). Regarding the genomic distribution of TET1 binding sites, several groups demonstrated colocalization with hm5C in euchromatin, with particular enrichment at hypomethylated promoters with high CpG content (Williams et al., 2011; Wu et al., 2011). Knockout of TET1 in ESCs reduced hm5C levels, and knockdown or knockout of the enzyme increased m5C levels, especially at CpG-rich promoters. Nevertheless, TET1-deficient mice were shown to be viable and fertile with normal brain development. They are, however, smaller in size (Dawlaty et al., 2011) and show poor learning and memory function due to impaired hippocampal neurogenesis (Gao et al., 2013).
Despite its early characterisation compared with the other TET enzymes, no crystal structure of mouse or human TET1 is available to date. Nevertheless, several inferences about the reactivity of the enzyme can be drawn from the available structures of TET2 and Naegleria gruberi TET1 (NgTET1), owing to their partly conserved catalytic domains (Hashimoto et al., 2014). NgTET1 was shown to oxidize m5C by the same mechanism, with an overall structure and mode of DNA recognition similar to those observed for TET2 (Hashimoto et al., 2014). DNA is bound on the basic surface of NgTET1 and contacted by hairpin loop L1 (equivalent to loop L2 in human TET2). The methylated cytosine is flipped out of the DNA helix and inserted into the active-site cavity, which bends the DNA by 65°. The size of this hydrophobic pocket was shown to control the enzyme's ability to carry out further oxidation: introducing a point mutation that reduces the pocket size decreased the capacity for higher oxidation (Hashimoto et al., 2015). Overall, 5-10-fold lower activities were observed on hm5C- and f5C-containing substrates than on m5C-containing substrates (Hashimoto et al., 2015). In terms of substrate specificity, NgTET1 was shown to prefer CpG sites over non-CpG sites (Hashimoto et al., 2014). Nevertheless, the enzyme is also suspected to oxidize target sites in CpH context (H = A, C or T) to a lower extent. This would fit the abundance of hm5C in non-CpG (especially CpA) context observed in neurons (Mellén et al., 2017) and the role of TET1 in demethylation in neurons (Guo et al., 2011).

1.3.2.2 TET2

Compared with TET1, the N-terminal part of TET2 is smaller and lacks the cysteine-X-X-cysteine (CXXC) domain, which was lost through gene duplication and inversion during evolution. It is now known that TET2 binds the inhibition of the Dvl and Axin complex (IDAX) protein, which contains a CXXC domain that presumably replaces the lost endogenous one (Iyer et al., 2009; Pastor et al., 2013).
The C-terminal part of TET2 contains a larger unstructured region in its DSBH domain (see Figure 10 for the domain arrangement of human TET2). This region was shown not to influence activity but to hinder expression and purification of the protein, and it was therefore deleted in some structural and biochemical studies (Hu et al., 2013; Hu et al., 2015). In different cell types, TET2 was shown to be expressed at levels similar to those of TET1 (Ficz et al., 2011; Ito et al., 2010; Wossidlo et al., 2011), with additional abundance in haematopoietic cells (Ko et al., 2010). The enzyme was also found to localize to the nucleus, as shown, among others, in U2OS and HEK293 cells (Ito et al., 2010). TET2 is the family member most frequently mutated and misregulated in AML and other myeloid cancers, with an occurrence of 10-30% (Jiang, 2020; Langemeijer et al., 2009; Tefferi et al., 2009). The observed alterations cause loss of function and decreased levels of hm5C. Accordingly, mice harbouring TET2 mutations developed myeloid malignancies, and TET2 knockout mice showed myeloid as well as lymphoid disorders (Ito et al., 2019). Furthermore, double knockout (DKO) of TET1 and TET2 led to a 40% reduced birth rate (Dawlaty et al., 2011; Dawlaty et al., 2013). TET1-3 triple-knockout (TKO) ESCs showed altered capacity for differentiation and embryonic development (Dawlaty et al., 2014). Several crystal structures of human TET2 in complex with DNA containing different modified cytosines are available (Hu et al., 2013; Hu et al., 2015), allowing a detailed characterisation of protein-DNA interactions. As depicted in Figure 12, a crystal structure with methylated dsDNA shows that the substrate is bound above the core of the DSBH domain, which is enriched in basic and hydrophobic amino acids. The DNA is stabilized by the loops L1 and L2 from the cysteine-rich region.
The target cytosine is flipped out of the DNA helix, and the vacated space is occupied by a hydrophobic loop containing a highly conserved tyrosine residue for stabilization. The methyl group of the flipped base is oriented towards Fe(II) and 2OG, which enables catalytic turnover. As for NgTET1 (Hashimoto et al., 2015), the catalytic cavity of TET2 was shown to be large enough to accommodate both hm5C and f5C. These bases are recognized in the same way as m5C and adopt an almost identical conformation of the flipped-out base in the active centre (Hu et al., 2013; Hu et al., 2015). The 5-10-fold lower conversion rates determined experimentally for hm5C- and f5C-containing DNA substrates (Hu et al., 2013; Hu et al., 2015; Ito et al., 2011) are explained by the structure of the catalytic pocket. In the presence of these oxidized species, additional hydrogen bonds form that hinder free rotation around the C-C bond between cytosine C5 and the modified methyl group. The resulting restrained conformations of the oxidized groups prevent fast hydrogen abstraction during the catalytic mechanism and thereby slow down the overall turnover. It is still under debate whether the enzyme acts processively or distributively, both in the stepwise oxidation of a single site and in the oxidation of different sites on the same DNA substrate (Crawford et al., 2016; Tamanaha et al., 2016).

Figure 12: Structure of human TET2 co-crystallized with hemimethylated DNA. TET2 is shown in cyan, with the co-crystallized DNA shown in grey and the flipped-out 5-methylcytosine coloured in orange; coordinated Fe(II) and 2-oxoglutarate are coloured in dark blue and light green, respectively (based on Hu et al., 2013; PDB ID: 4NM6).

Overall, TET2 was shown to prefer CpG sites over target sites in non-CpG context, as demonstrated for NgTET1 (Hashimoto et al., 2014; Hu et al., 2013).
Since most cytosine methylation occurs in CpG context, this preference of the TET enzymes matches that of the DNMTs. Nevertheless, methylation in non-CpG context has also been found, with the highest levels in embryonic stem cells and brain cells (Ku et al., 2011), accompanied by hm5C in CpH context (Ficz et al., 2011; Lister et al., 2013). In line with this observation, TET2 was shown to have moderate activity on CpH substrates, with reported preferences for either CpA (DeNizio et al., 2021) or CpC (Hu et al., 2013).
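The distinction between CpG and CpH targets (H = A, C or T) recurs throughout this section. A small helper that classifies every cytosine on one DNA strand by its 3′ neighbour may make the terminology concrete. This is an illustrative sketch with hypothetical function names. It inspects only the given strand, does not count a cytosine at the 3′ end (whose neighbour is unknown), and skips cytosines followed by ambiguous bases such as N.

```python
from collections import Counter


def cytosine_contexts(seq: str) -> Counter:
    """Count cytosines on the given strand by dinucleotide context.

    Keys are "CpG" and the CpH contexts "CpA", "CpC" and "CpT". A cytosine at
    the 3' end, or one followed by an ambiguous base, is not counted.
    """
    seq = seq.upper()
    counts: Counter = Counter()
    # Pair every base with its 3' neighbour on the same strand
    for base, following in zip(seq, seq[1:]):
        if base != "C":
            continue
        if following == "G":
            counts["CpG"] += 1
        elif following in ("A", "C", "T"):
            counts["Cp" + following] += 1
    return counts


def cph_total(counts: Counter) -> int:
    """Sum of all non-CpG (CpH) cytosine contexts."""
    return counts["CpA"] + counts["CpC"] + counts["CpT"]


if __name__ == "__main__":
    contexts = cytosine_contexts("ACGTCAGCCGTC")
    print(contexts, "CpH total:", cph_total(contexts))
```

For the example sequence, the helper finds two CpG cytosines and two CpH cytosines (one CpA, one CpC). The terminal C is ignored because its neighbour is not part of the sequence.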