Molecular Dynamics Simulations of the Substrate- and Product Specificity and Mechanism of DNA- and Protein Lysine Methyltransferases

A thesis accepted at the Faculty 4: Energy-, Process- and Bio-Engineering of the University of Stuttgart and the Stuttgart Center for Simulation Science in partial fulfillment of the requirements for the degree of Doktor der Naturwissenschaften/PhD in Natural Science (Dr. rer. nat.)

by Philipp Schnee
born 27.10.1995 in Stuttgart, Germany

Main Referee: Prof. Dr. Albert Jeltsch (University of Stuttgart)
Co-Referee: Prof. Dr. Martin Zacharias (Technical University Munich)
Committee Chair: Prof. Dr. Markus Morrison (formerly Rehm) (University of Stuttgart)
Date of Defense: 17.04.2024

University of Stuttgart
Institute of Biochemistry and Technical Biochemistry
2024

Eidesstattliche Erklärung (Affidavit)
I hereby declare that I wrote the present thesis entitled "Molecular Dynamics Simulations of the Substrate- and Product Specificity and Mechanism of DNA- and Protein Lysine Methyltransferases" independently and used no sources or aids other than those indicated.

Declaration of Authorship
I hereby certify that the dissertation entitled "Molecular Dynamics Simulations of the Substrate- and Product Specificity and Mechanism of DNA- and Protein Lysine Methyltransferases" is entirely my own work except where otherwise indicated. Passages and ideas from other sources have been clearly indicated.

Name: Philipp Schnee
Signature: …………………………………………………..
Date: 08.01.2024

Table of contents
Abstract
Zusammenfassung
Acknowledgements
List of publications
Author contributions
List of figures
List of abbreviations
1. Introduction
1.1. Epigenetics
1.2. Chromatin structure and its regulation
1.3. DNA methylation
1.3.1. DNA methyltransferases
1.4. Lysine methylation
1.4.1. Histone lysine methylation
1.4.2. Non-histone protein lysine methylation
1.5. Protein lysine methyltransferases
1.5.1. Structure of SET domain PKMTs
1.5.2. Different structural arrangements of SET domain PKMTs
1.5.3. Autoinhibition of SET domain PKMTs
1.5.4. Placeholder residues
1.5.5. Target lysine deprotonation
1.5.6. Reaction mechanism of SET domain PKMTs
1.5.7. Substrate specificity of SET domain PKMTs
1.5.8. Discovery of PKMT super-substrates
1.5.9. Product Specificity of SET domain PKMTs
1.6. Histone lysine 36 methylation
1.6.1. SETD2
1.6.2. NSD2
1.7. Lysine demethylation
1.8. Molecular Dynamics Simulation
1.8.1. Modelling
1.8.2. Structure of MD Simulations
1.8.3. Steered molecular dynamics simulations
2. Aims of this work
2.1. MD Simulation of the somatic cancer mutation R882H of DNMT3A
2.2. MD Simulation of the somatic cancer mutation T1150A of NSD2
2.3. Mechanistic basis of super-substrate peptides
3. Material and Methods
3.1. MD simulations and representation
3.1.1. MD simulations of the DNMT3A/L hetero tetramer complexed with DNA
3.1.2. sMD simulations of the peptide association process into the NSD2 active site
3.1.3. MD simulations of NSD2
3.1.4. MD simulations of peptides in solution
3.1.5. sMD simulations of the peptide association process into the SETD2 active site
3.1.6. MD simulations of SETD2
3.2. Trajectory analysis
3.2.1. Contact maps analysis
3.2.2. NSD2 active site volume calculation
3.2.3. Clustering of peptide conformations in solution
3.3. SETD2 purification and peptide hairpin validation using FRET
3.4. In vitro methylation assay to test peptide inhibitors
4. Results
4.1. MD Simulation of the somatic cancer mutation R882H of DNMT3A
4.1.1. DNMT3A R882H establishes more inter subunit contacts than the DNMT3A WT
4.2. MD simulation of the T1150A cancer mutant of NSD2
4.2.1. NSD2 T1150A can accommodate a H3K36me2 peptide and SAM in sMD simulations
4.2.2. NSD2 T1150A loses contacts responsible for restricting the active site volume
4.3. Mechanistic basis of the enhanced methylation activity towards super-substrates
4.3.1. The SETD2 super-substrate peptide prefers a hairpin conformation in solution
4.3.2. Experimental investigation of conformational preferences
4.3.3. A hairpin conformation has easier access into the SETD2 active site
4.3.4. Hairpin structures unfold into an extended conformation upon binding to SETD2
4.3.5. Hairpin conformation in peptides lead to a faster methylation by SETD2
4.3.6. ssK36 establishes more and different TS-like conformation than H3K36
4.4. Peptides can function as competitive inhibitors for PKMTs
4.5. MD Simulation of NSD2 in complex with a new NSD2-specific super-substrate
5. Discussion
5.1. MD Simulation of the somatic cancer mutation R882H of DNMT3A
5.2. MD Simulation of the T1150A cancer mutant of NSD2
5.3. Mechanistic basis of the SETD2 super-substrate peptide
5.3.1. Hairpins in histone tails might facilitate the binding towards PKMTs
5.3.2. Peptide hairpin conformations unfold upon binding to PKMTs and establish distinct contacts
5.4. Super-substrate peptides function as PKMT inhibitors
6. References
7. Appendix
7.1. Appendix I (not included in the published thesis)
7.2. Appendix II
7.3. Appendix III

Abstract

While the genetic information within each cell is encoded as the base pair sequence of the DNA, cellular differentiation and adaptation to environmental signals are dictated by variations in gene expression. Epigenetics describes these often stable, yet reversible, changes in gene expression patterns, which do not involve alterations in the DNA sequence. Major epigenetic signals are DNA and histone lysine methylation. These modifications are deposited by DNA methyltransferases (DNMTs) and protein lysine methyltransferases (PKMTs), which transfer a methyl group from S-adenosyl-L-methionine (SAM) to the respective target. Subsequently, the deposited modifications are read by chromatin remodeling enzymes, which alter the accessibility of genes depending on the modifications present. Hence, DNMTs and PKMTs function as key players in the regulation of genome stability, gene expression, DNA repair and cellular differentiation. Their activity is controlled by factors such as substrate and product specificity, autoinhibition, and conformational changes upon interaction with substrates. Cancer mutations in DNMTs and PKMTs disturb these regulatory mechanisms, which makes understanding them a main goal of modern epigenetic research.

In this work, molecular dynamics (MD) simulations in combination with biochemical experiments were used to investigate the catalytic mechanism of these enzymes in detail. In pursuit of this goal, two approaches were applied. In the first approach, cancer mutants of DNMTs and PKMTs were simulated and the results were compared with simulations of the wild type enzyme (WT) to identify differences between mutant and WT. This was achieved for the somatic cancer mutant DNMT3A R882H, which frequently occurs in blood cancers. In earlier methylation experiments, a change in flanking sequence preference was determined for R882H. The mechanism behind this observation was revealed in this work by MD simulations of the DNMT3A/L heterotetramer (3L-3A-3A-3L).
The simulations showed that the mutated R882H residue had reduced contact with a guanine three base pairs away from the methylation site and instead interacted with the adjacent 3A subunit. The lost contact is directly connected to a different affinity for certain DNA substrates, explaining the change in flanking sequence preference. Moreover, R882H was found to have a dominant effect even in a heterozygous state. Extended contact analysis of the MD simulation data showed that R882H not only interacted with the adjacent subunit but also led to a rearrangement of a small contact network, increasing the overall binding affinity of DNMT3A R882H dimers compared to WT. Since the flanking sequence preference of DNMT3A tetrameric complexes is determined in the 3A-3A interface, which is preferentially formed by R882H subunits, the MD simulation results rationalize the dominant effect of R882H in R882H/WT mixed complexes.

Biochemical characterization of the PKMT NSD2 and its somatic cancer mutation T1150A revealed an altered product specificity and a change from a dimethyltransferase to a trimethyltransferase. Changes in the methylation state of histone 3 lysine 36 (H3K36), a methylation target of NSD2, are known to be associated with diverse biological outcomes, as dimethylated H3K36 (H3K36me2) and trimethylated H3K36 (H3K36me3) exhibit distinct downstream effects on gene transcription and chromatin structure. Therefore, a combination of MD and steered MD (sMD) simulation techniques was used in this work to investigate the reason for the altered product specificity of NSD2 T1150A. The analysis showed that, in contrast to NSD2 WT, NSD2 T1150A was able to accommodate the H3K36me2 peptide and SAM simultaneously in a productive conformation in the active site. Volume calculations of the active site revealed that larger volumes occurred more often for T1150A than for WT, enabling the productive accommodation of the higher methylation state.
The reason for this was found in a subsequent contact analysis. In NSD2 WT, T1150 was engaged in contacts with Y1092 and L1120, which oriented these residues and thereby reduced the volume of the active site. The T1150 side chain hydroxyl group interacted with the Y1092 backbone nitrogen, and the side chain methyl group formed a hydrophobic contact with the L1120 side chain. These two contacts were lost in NSD2 T1150A. As a consequence, the orientation of Y1092 and L1120 was more flexible and the active site volume increased. The presented results explain the molecular mechanism behind the altered product specificity observed in biochemical experiments with NSD2 T1150A.

The substrate specificity of the PKMT SETD2 regarding its natural H3K36 target sequence has previously been mapped using Celluspots peptide array methylation. This revealed that the canonical H3 amino acids were not ideal at many positions. Based on this, an artificial peptide substrate was designed that contained the most favorable amino acid at each position. Methylation experiments showed that the 15-amino-acid-long super-substrate peptide (ssK36), which differed from the original H3K36 at four positions, was methylated more than 100-fold faster than the canonical H3K36 peptide. The crystal structure of SETD2 with bound ssK36 peptide was solved, but did not entirely explain the highly increased methylation activity of SETD2. The second approach in this work focuses on the mechanistic reasons behind this massive increase in reaction rate, using a combination of in vitro methylation and FRET experiments with MD and sMD simulation techniques to cover multiple steps of the catalytic process. MD simulations of the free peptides in solution showed that ssK36 preferred a hairpin conformation, whereas H3K36 preferred an extended conformation. This preference was based on the four introduced mutations.
Moreover, sMD simulations demonstrated that hairpin-shaped peptides had easier access to the active site of SETD2 than peptides in extended conformations. Methylation experiments confirmed that chemically induced hairpins increased the methylation of peptides by SETD2. Additionally, MD simulations of the ssK36-SETD2 complex showed that the four mutations established a unique contact profile with SETD2, leading to more and different transition state-like conformations compared to the H3K36-SETD2 complex. The transferability of this approach was demonstrated by a new super-substrate peptide designed specifically for NSD2. The molecular mechanism behind its increased methylation rate was again investigated by the MD simulations presented in this work. Remarkably, SETD2 and NSD2 were shown to be specific for their respective super-substrate and did not show increased activity for the other. The optimized enzyme interactions of the super-substrate peptides were then used as a starting point to establish a PKMT-specific inhibition assay, in which ssK36 was demonstrated to function as a substrate-competitive SETD2-specific inhibitor.

In conclusion, the MD simulations conducted in this work revealed previously unknown reasons for the dominant effect of DNMT3A R882H in heterozygous states and explained the altered product specificity of NSD2 T1150A. Moreover, the features of the artificially designed super-substrate peptides, which caused a ~100-fold activity increase of the PKMTs SETD2 and NSD2, were described in various MD simulation approaches and validated by wet-lab experiments. The molecular mechanisms found in this work explain biochemical results for DNMTs and PKMTs at atomistic resolution and suggest novel strategies for the design of a new class of substrate-competitive PKMT inhibitors.

Zusammenfassung (Summary)

The genetic information of every cell is encoded in the form of the base pair sequence of the DNA.
However, cell differentiation and adaptation to environmental signals are brought about by changes in gene expression. Epigenetics describes these mostly stable but reversible changes in gene expression, which are not based on changes in the DNA sequence. DNA and histone lysine methylation are among the most important epigenetic signals. These modifications are set by DNA methyltransferases (DNMTs) and protein lysine methyltransferases (PKMTs) through the transfer of a methyl group from S-adenosyl-L-methionine (SAM) to the respective target molecule. The methylated DNA or proteins are subsequently recognized by enzymes that alter the accessibility of genes depending on the modification. DNMTs and PKMTs therefore play a key role in the regulation of genome stability, gene expression, DNA repair and cell differentiation. Their activity is controlled by factors such as substrate and product specificity, autoinhibition, and conformational changes during the interaction with substrates. Cancer mutations in DNMTs and PKMTs disturb these regulatory mechanisms, which has made their investigation a major goal of modern epigenetic research.

In this work, molecular dynamics (MD) simulations were carried out in combination with biochemical experiments to investigate the catalytic mechanisms of these enzymes in detail. To achieve this goal, two approaches were applied. By simulating cancer mutations of DNMTs and PKMTs and comparing them with simulations of the respective wild type enzyme (WT), differences between mutants and WT can be found. This approach was applied to the somatic cancer mutation DNMT3A R882H, which frequently occurs in various forms of leukemia. In earlier methylation experiments, an altered preference of R882H for the flanking sequence of CpG methylation sites was observed.
The mechanism behind this observation was elucidated in this work by MD simulations of the DNMT3A/L heterotetramer (3L-3A-3A-3L). The simulations showed that the mutated R882H residue had reduced contact with a guanine three base pairs away from the CpG dinucleotide to be methylated. Instead, it interacted with the adjacent 3A subunit. This reduced contact intensity is directly related to an altered affinity for certain DNA substrates, which explained the change in flanking sequence preference. In addition, R882H has been described to exert a dominant effect even in the heterozygous state. An extended contact analysis of the MD simulation data showed that R882H not only interacted with the adjacent subunit, but that this also led to a restructuring of a small contact network, which increased the binding affinity of DNMT3A R882H dimers compared to WT dimers. Since the flanking sequence preference of DNMT3A tetramers is determined in the 3A-3A interface, which is preferentially formed by R882H subunits, the dominant effect of R882H in mixed WT/R882H complexes was explained by the results of the MD simulations.

The biochemical characterization of the PKMT NSD2 and its somatic cancer mutation T1150A showed an altered product specificity and a change from a dimethyltransferase to a trimethyltransferase. Changes in the methylation level of histone 3 lysine 36 (H3K36), a methylation target of NSD2, are associated with different biological effects. Dimethylated H3K36 (H3K36me2) and trimethylated H3K36 (H3K36me3) have different effects on gene expression and chromatin structure.
To investigate the reasons for the altered product specificity of NSD2 T1150A, a combination of MD and steered MD (sMD) simulation methods was applied in this work. The analysis showed that, in contrast to NSD2 WT, NSD2 T1150A was able to bind an H3K36me2 peptide and SAM simultaneously in a productive conformation in the active site. Volume calculations of the active site showed that larger volumes occurred more frequently for T1150A than for WT, which enabled the simultaneous binding. The reason for the larger volumes was found in a subsequent contact analysis. In NSD2 WT, T1150 interacted with Y1092 and L1120, which oriented these residues and reduced the volume of the active site. The hydroxyl group of the T1150 side chain interacted with the backbone nitrogen of Y1092. In addition, the methyl group of the side chain was in hydrophobic contact with the side chain of L1120. These two contacts were lost in NSD2 T1150A due to the mutation. As a result, the orientation of Y1092 and L1120 was less structured, and the volume of the active site increased. The presented results precisely explain the molecular mechanism behind the experimentally observed altered product specificity of NSD2 T1150A.

Recently, the substrate specificity of the PKMT SETD2 was mapped with respect to the natural H3K36 target sequence using Celluspots peptide array methylation. This showed that the H3 amino acid sequence was not ideal at some positions. Based on this, an artificial peptide substrate was designed that contained the optimal amino acid at each position. Methylation experiments showed that the 15-amino-acid-long super-substrate peptide (ssK36), which differs from the natural H3 sequence at four positions, was methylated more than 100 times faster than the H3K36 peptide.
However, the crystal structure of SETD2 with bound ssK36 peptide did not fully explain the strongly increased methylation activity. The second approach of this work focuses on the mechanistic reasons behind this massive increase in reaction rate. To this end, a combination of in vitro methylation and FRET experiments as well as MD and sMD simulation methods was applied to cover multiple steps of the catalytic process. MD simulations of the free peptides in solution showed that ssK36 preferentially formed a hairpin conformation, whereas H3K36 was rather extended. This preference was based on the four introduced mutations. Furthermore, sMD simulations showed that peptides in hairpin conformations had better access to the active site of SETD2 than extended structures. Methylation experiments demonstrated that chemically induced hairpin structures in peptides increased the methylation activity of SETD2. In addition, MD simulations of the ssK36-SETD2 complex showed that the four mutations established a unique contact profile with SETD2. As a result, both different and a higher number of transition state-like conformations were formed compared to the H3K36-SETD2 complex. The transferability of this approach was demonstrated by a new super-substrate peptide designed specifically for NSD2. The molecular mechanism behind the increased methylation rate was again investigated by MD simulation. Remarkably, SETD2 and NSD2 each showed increased activity only for their own super-substrate and not for the other super-substrate.
The optimized interactions of the super-substrate peptides were then used as a starting point to establish a PKMT-specific inhibition assay, in which ssK36 functioned as a substrate-competitive SETD2-specific inhibitor.

In summary, the MD simulations carried out in this work revealed previously unknown reasons for the dominant effect of DNMT3A R882H in the heterozygous state and explained the altered product specificity of NSD2 T1150A. Furthermore, the features of the artificially designed super-substrate peptide were described by various MD simulation approaches and validated by biochemical experiments. The molecular mechanisms for DNMTs and PKMTs found in this work explain biochemical results at atomic resolution and point to new strategies for the design of a new class of substrate-competitive PKMT inhibitors.

Acknowledgements

My deepest appreciation goes to my supervisor Prof. Dr. Albert Jeltsch for more reasons than one could list here. I am immensely grateful for the guidance and mentorship throughout my academic journey. From the very first lecture of my bachelor's studies, to the opportunity to conduct my PhD in his institute, to collaborative efforts on publishing papers, patents and reviews, Professor Jeltsch's support, professionalism, and advice have been instrumental in shaping my understanding of science and my research skills. His commitment to fostering academic growth and excellence has left an indelible mark on my academic pursuits, and I am fortunate to have had the privilege of working alongside such a dedicated mentor.

I extend my genuine gratitude to Prof. Dr. Jürgen Pleiss for his persistent support, innovative ideas, and key role in building a previously unseen GPU infrastructure tailored for Molecular Dynamics simulation. His support greatly facilitated my academic journey. From the beginning of our collaboration, Prof.
Pleiss has consistently demonstrated a commitment to fostering an environment of academic quality.

I am thankful to Prof. Dr. Martin Zacharias from the Technical University of Munich for serving as my second examiner. Special thanks go to Prof. Dr. Markus Morrison for his kind acceptance to participate in my examination committee.

Special thanks to Dr. Sara Weirich for her guidance in the lab, which made me more confident and independent. Additionally, I want to highlight Dr. Mina Saad Khella and Michael Choudalakis, who are not only co-authors of publications but also showed incredible support in developing the techniques and approaches presented in this work. Moreover, thanks to the rest of the PKMT group for the teamwork atmosphere, valuable discussions and continuous help. Additionally, I am thankful to Dr. Philipp Rathert for valuable comments in seminars and meetings.

I would like to thank the people behind the scenes for making the work smooth and perfectly organized: Priv. Doz. Dr. Hans Rudolph, Elisabeth Tosta and Lea Irsigler. Very special thanks to Regina, Dragica and Branka for all their helping hands, nice talks and laughs.

I would like to express my sincere gratitude to SimTech EXC 2075 390740016 for providing the financial support that allowed me to pursue my PhD. I especially thank my project network PN2-5 for the insightful discussions.

My appreciation goes to Dr. Sven Benson for his pivotal role in sharpening my understanding of challenges outside of the university. His mentorship not only strengthened my scientific courage but also exposed me to the significance of navigating challenges within “yesterday” timelines.

I am grateful to Dr. Peter Stockinger, my steadfast companion from the moment we first met until the present. He has been more than a study friend; he has been my partner in crime. His enduring support and camaraderie have made the academic journey into something more, turning challenges into shared triumphs.

I am deeply thankful for the committed support of my parents throughout my academic journey. Their encouragement and sacrifices have been the bedrock of my success, with my father's profound wisdom and guidance serving as a beacon, steering me through challenges and enriching my personal and scholarly growth.

List of publications

Schnee P, Choudalakis M, Weirich S, Khella M. S., Carvalho H, Pleiss J, Jeltsch A* (2022) Mechanistic basis of the increased methylation activity of the SETD2 protein lysine methyltransferase towards a designed super-substrate peptide. Commun. Chem. 5, 139, doi.org/10.1038/s42004-022-00753-w

Schnee P, Weirich S, Jeltsch A* (2023) Charakterisierung der Substratspezifität von Protein Methyltransferasen – Methoden und Anwendungen. BIOspektrum 29, 249-251, doi.org/10.1007/s12268-023-1930-y

Schnee P, Pleiss J, Jeltsch A* (2024) Approaching the catalytic mechanism of protein lysine methyltransferases by biochemical and simulations techniques. Critical Reviews In Biochemistry & Molecular Biology 7, 1-49, doi.org/10.1080/10409238.2024.2318547

Schnee P, Jeltsch A, Weirich S (2023) Artifizielles Peptid mit PKMT-inhibitorischer Wirkung - Universität Stuttgart PCT-Patentanmeldung

Khella M. S., Schnee P, Weirich S, Bui T, Bröhm A, Bashtrykov P, Pleiss J, Jeltsch A* (2023) The T1150A cancer mutant of the protein lysine dimethyltransferase NSD2 can introduce H3K36 trimethylation. J Biol Chem 104796, doi.org/10.1016/j.jbc.2023.104796

Weirich S, Kusevic D, Schnee P, Reiter J, Pleiss J, Jeltsch A* (2024) Discovery of new NSD2 non-histone substrates and design of a super-substrate. Communications Biology, manuscript submitted for publication.

Mack A, Emperle M, Schnee P, Adam S, Pleiss J, Bashtrykov P, Jeltsch A* (2022) Preferential Self-interaction of DNA Methyltransferase DNMT3A Subunits Containing the R882H Cancer Mutation Leads to Dominant Changes of Flanking Sequence Preferences. J Mol Biol 15;434(7):167482, doi.org/10.1016/j.jmb.2022.167482
Author contributions
Schnee et al. 2022, Mechanistic basis of the increased methylation activity of the SETD2 protein lysine methyltransferase towards a designed super-substrate peptide: P.S. conducted the MD simulation and fluorescence spectroscopy experiments. P.S. and A.J. did the data analysis and interpretation of the data. P.S. and A.J. prepared the manuscript and figures.
Schnee et al. 2023, Charakterisierung der Substratspezifität von Protein Methyltransferasen Methoden und Anwendungen: P.S. created the first draft of the manuscript.
Schnee et al. 2024, Approaching the catalytic mechanism of protein lysine methyltransferases by biochemical and simulations techniques: P.S. and A.J. prepared the draft manuscript. P.S. prepared the figures.
Khella et al. 2023, The T1150A cancer mutant of the protein lysine dimethyltransferase NSD2 can introduce H3K36 trimethylation: P.S. designed and conducted the MD simulation experiments and data analysis thereof. P.S. prepared the draft and figures for the MD simulation part of the manuscript.
Weirich et al. 2024, Discovery of new NSD2 non-histone substrates and design of a super-substrate: S.W. and A.J. devised the study. D.K. and S.W. conducted the biochemical experiments. P.S. conducted the MD simulations and the data analysis thereof. P.S. prepared the draft and figures for the MD simulation part of the manuscript.
Mack et al. 2022, Preferential Self-interaction of DNA Methyltransferase DNMT3A Subunits Containing the R882H Cancer Mutation Leads to Dominant Changes of Flanking Sequence Preferences: P.S. performed and analyzed the MD simulations. P.S. prepared the draft and figures for the MD simulation part of the manuscript.
List of figures
Figure 1: DNA and protein methylation influence gene transcription by recruiting chromatin remodeling enzymes
Figure 2: DNA Methyltransferase (DNMT) 3A and its DNMT3L cofactor form a heterotetramer
Figure 3: Protein Lysine Methyltransferases (PKMTs) transfer up to three methyl groups to specific lysine residues in proteins
Figure 4: Binding mode of cofactor SAM and protein substrate for SET and non-SET domain-containing PKMTs.
Figure 5: Cartoon representation of multiple SET domain PKMT architectures
Figure 6: The autoinhibitory loop (AL) and placeholder residue need conformational changes to overcome autoinhibition
Figure 7: PKMTs deprotonate the target lysine prior to the methyl group transfer
Figure 8: Geometric criteria for a bimolecular nucleophilic substitution (SN2) mechanism
Figure 9: Substrate specificity profile of SETD2 led to the super-substrate peptide (ssK36)
Figure 10: Proposed control mechanism for the product specificity of PKMTs
Figure 11: Reaction mechanisms of lysine demethylation.
Figure 12: Binding mode of PKMTs to nucleosomal substrates.
Figure 13: Molecular dynamics simulation use bonded and non-bonded forces to model the interactions between atoms
Figure 14: DNMT3A R882H establishes more and different contacts in the RD interface and different interactions with the DNA.
Figure 15: Product specificity change of NSD2 T1150A and NSD1 T2029A compared to WT enzymes on H3.1 protein and nucleosomes in vitro.
Figure 16: sMD simulation of SAM association into the complex of either NSD2 WT or T1150A with a H3K36me1 or me2 peptide substrate
Figure 17: Measurement of the active site pocket volume in NSD2 WT and T1150A complexes
Figure 18: The natural H3K36 peptide differs at four positions from the artificially designed super-substrate peptide (ssK36)
Figure 19: Clustering of the H3K36 and ssK36 peptide conformations in solution observed in MD simulations reveals a hairpin conformation preference for ssK36
Figure 20: H3K36 and ssK36 peptides show different conformational preferences in solution
Figure 21: Hairpin conformations facilitate the access of peptides into the binding cleft of SETD2
Figure 22: H3K36 and ssK36 peptides unfold upon binding to SETD2
Figure 23: Hairpin formation and resolution upon binding increase SETD2 methylation activity.
Figure 24: The complex of SETD2-ssK36 established significantly more TS-like conformations than SETD2-H3K36
Figure 25: Contact profiles of the H3K36 and ssK36 peptides bound to SETD2 observed in the MD simulations
Figure 26: The enhanced methylation activity of SETD2 towards ssK36 can be summarized as four steps.
Figure 27: The ssK36 peptide functions as a substrate-competitive inhibitor for SETD2
Figure 28: The complex of NSD2 and ssK36(NSD2) establishes more TS-like conformations than the NSD2-H3K36 complex
Figure 29: Contact profile analysis reveals different contact maps for H3K36 and ssK36 in complex with NSD2
Figure 30: The H3K36 and ssK36(NSD2) peptide establish different contacts with NSD2
Figure 31: Possible mechanism for the binding of a PKMT towards a nucleosome and the recognition of the target lysine
List of abbreviations
Ade: Adenine
AL: Autoinhibitory Loop
AML: Acute Myeloid Leukemia
AOL: Amine Oxidase Like
ASH1L: Absent, Small, or Homeotic Discs 1-Like
ASH2L: Absent, Small, or Homeotic Discs 2-Like
AWS: Associated With SET
CpG: Cytosine-Guanine
CTD: C-terminal Domain
Cyt: Cytosine
DNA: Deoxyribonucleic Acid
DNMT1: DNA Methyltransferase 1
DNMT3A: DNA Methyltransferase 3A
DNMT3B: DNA Methyltransferase 3B
DNMT3L: DNMT3-like
DNMTs: DNA Methyltransferases
DTT: Dithiothreitol
DOT1L: Disruptor of Telomeric silencing 1-like
EDTA: Ethylenediaminetetraacetic Acid
EM: Electron Microscopy
EZH2: Enhancer of Zeste Homolog 2
FAD: Flavin Adenine Dinucleotide
FRET: Förster Resonance Energy Transfer
GLP: G9a Like Protein
GPU: Graphics Processing Unit
GST: Glutathione-S-Transferase
Gua: Guanine
G9a: aka EHMT2 (Euchromatic histone-lysine N-methyltransferase 2)
H1: Histone H1
H2A: Histone H2A
H2B: Histone H2B
H3: Histone H3
H3K27me: Histone H3 lysine 27 methylation
H3K36me: Histone H3 lysine 36 methylation
H3K4me: Histone H3 lysine 4 methylation
H3K9me: Histone H3 lysine 9 methylation
H4: Histone H4
HDACs: Histone Deacetylase Complexes
HP1: Heterochromatin Protein-1
KDM: Lysine Demethylase
Kme1: Monomethyl lysine
Kme2: Dimethyl lysine
Kme3: Trimethyl lysine
LSD: Lysine Specific Demethylase
MALDI: Matrix-assisted Laser Desorption/Ionization
MBD: Methylated DNA Binding Domain
MD: Molecular Dynamics
MLL: Mixed Lineage Leukemia
MYND: Myeloid Translocation Protein 8, Nervy, and DEAF-1
NSD1/2/3: Nuclear Receptor-Binding SET Domain 1/2/3
PDB: Protein Data Bank
PMT: Protein Methyltransferase
PKMT: Protein Lysine Methyltransferase
PRDM9: PR Domain Containing 9
PTM: Post-Translational Modification
PWWP: Proline-Tryptophan-Tryptophan-Proline domain
QM/MM: Quantum Mechanics/Molecular Mechanics
RNA: Ribonucleic Acid
RNAP II: RNA Polymerase II
SAH: S-Adenosyl-L-Homocysteine
SAM: S-Adenosyl-L-Methionine
SET: Suppressor of Variegation 3-9, Enhancer of Zeste, Trithorax
SETD2: SET Domain-containing protein 2
SET-I: SET Insertion domain
sMD: Steered Molecular Dynamics
SMYD: SET and MYND Domain-containing protein
SN2: Bimolecular Nucleophilic Substitution
SRI: Set2–Rpb1 Interaction Domain
ssK36: Super-substrate around lysine 36
STAT1/3: Signal Transducer and Activator of Transcription 1/3
SUV39H1/2: Suppressor of Variegation 3-9 Homolog 1/2
TETs: Ten Eleven Translocation enzymes
Thy: Thymine
TS: Transition State
WHS: Wolf-Hirschhorn Syndrome
WRAD: WDR5, RbBP5, Ash2L and Dpy30 complex
5caC: 5-carboxy cytosine
5fC: 5-formyl cytosine
5hmC: 5-hydroxymethyl cytosine
5mC: 5-methyl cytosine
1. Introduction
1.1. Epigenetics
Every cell has its genetic information encoded as the base pair sequence in the DNA. However, cellular differentiation is driven by differences in the expression of genes. Epigenetics describes the mechanisms of these often stable, but still reversible, changes in gene expression patterns that do not involve alterations in the DNA sequence (Allis & Jenuwein, 2016). How the expression of genes is regulated under certain circumstances is one of the key questions in epigenetics. The highly regulated and reversible changes add a dynamic layer of complexity beyond the static genetic code. A strictly unidirectional flow of information from DNA to RNA to protein is therefore no longer a tenable picture, since proteins themselves regulate gene expression and react to environmental changes.
1.2. Chromatin structure and its regulation
Chromatin is the complex of proteins and DNA within the nucleus of eukaryotic cells. The smallest structural units of chromatin are nucleosomes, which consist of a stretch of about 146 base pairs of DNA wrapped around an octamer of histone proteins. The octamer contains two copies of each of the core histones, namely H2A, H2B, H3, and H4. Nucleosomes are connected by linker DNA segments of varying length, forming a “beads-on-a-string” structure. This linear arrangement of nucleosomes can be further compacted to form higher-order chromatin structures, such as 30 nm chromatin fibers and highly condensed chromosomes.
This leads to a remarkable size reduction, as the genetic information of a human cell lined up on a string would stretch over 2 meters. The compaction enables the DNA to fit into the eukaryotic nucleus with a diameter of 5-16 µm. Besides space optimization, chromatin compaction plays a pivotal role in the regulation of gene expression. In condensed chromatin regions, nucleosomes are packed tightly, restricting the access of other proteins responsible for, e.g., transcription, DNA replication and repair. This repressed state of chromatin is referred to as heterochromatin. Chromatin regions that are less condensed and accessible for gene transcription are referred to as euchromatin. The conversion of heterochromatin to euchromatin, or vice versa, is regulated by modifications at the DNA and protein level (Fig. 1).
Figure 1: DNA and protein methylation influence gene transcription by recruiting chromatin remodeling enzymes. Chromosomes consist of smaller subunits called nucleosomes, which themselves consist of a protein octamer with DNA wrapped around it. DNA methylation at cytosines and lysine methylation of histone proteins are epigenetic modifications, which are read by reader enzymes that recruit chromatin remodeling enzymes. Depending on the actual modifications, the chromatin structure is tightened or loosened, directly influencing gene transcription.
1.3. DNA methylation
DNA methylation occurs at the fifth carbon (C5) in the pyrimidine base cytosine. The methylation reaction is catalyzed by DNA methyltransferases (DNMTs) using the cofactor S-adenosyl-L-methionine (SAM) as a methyl group donor, which is converted to S-adenosyl-L-homocysteine (SAH). DNA methylation in gene promoters is correlated with silenced gene expression. In general, it contributes to the formation of heterochromatin and is a more stable modification compared to protein methylation.
Moreover, DNA methylation is a crucial signal in many biological processes including development and gametogenesis, parental imprinting, X chromosome inactivation, as well as maintenance of genome integrity (Jurkowska, Jurkowski, et al., 2011). Cytosine-guanine (CpG) islands are genomic regions with a high frequency of cytosine-guanine dinucleotides, often associated with gene promoters and commonly unmethylated. At promoters, DNA methylation serves as a repressive signal, hindering the interaction of transcriptional activators and facilitating the recruitment of transcriptional repressors that incorporate methylated DNA binding domains (MBDs) (Razin & Riggs, 1980; Tate & Bird, 1993; Yin et al., 2017). Repressed promoters with methylated CpG islands are found in regions where silencing is desired, like centromeric heterochromatin, imprinted genes or transposons (Howard et al., 2008; Jurkowska, Jurkowski, et al., 2011).
1.3.1. DNA methyltransferases
In humans, three DNMTs catalyze the methylation of cytosine (Gowher & Jeltsch, 2018). Whereas DNA methyltransferase 1 (DNMT1) maintains the DNA methylation after DNA replication and works preferentially on hemimethylated CpG dinucleotide sites (Bestor et al., 1988; Fatemi et al., 2001; Goyal et al., 2006), DNA methyltransferase 3A and 3B (DNMT3A and 3B) are responsible for de novo DNA methylation. De novo methylation is important during early development, germ cell differentiation as well as imprinting, a process in which specific genes are marked with methyl groups based on their parental origin (Gowher & Jeltsch, 2001; Okano et al., 1998). DNMT3A and 3B methylate CpG and non-CpG sites and have no preference for hemimethylated sequences, distinguishing them from DNMT1 (Gowher & Jeltsch, 2001).
The DNMT3-like (DNMT3L) protein lacks catalytic activity, but serves as a scaffold protein for DNMT3A and DNMT3B, enhancing their de novo DNA methylation activity (Bourc'his et al., 2001). The catalytically active C-terminal domain (CTD) of DNMT3A in complex with the CTD of DNMT3L forms a linear heterotetrameric complex with the two DNMT3A subunits in the center and the DNMT3L subunits at the edges (3L-3A-3A-3L, Fig. 2A) (Jia et al., 2007). The different subunits are connected by two types of interfaces: the DNMT3A/3L interface, called the FF interface, and the central interface between the DNMT3A subunits, denoted as the RD interface. The binding of DNMT3L at the FF interface helps to organize the active site and SAM binding pocket of DNMT3A, which explains the stimulation of DNMT3A activity (Jia et al., 2007). Crystal structures of the DNMT3A/3L complex bound to DNA showed that the DNMT3L subunits are not in contact with DNA, whereas the two DNMT3A subunits of the tetramer interact with two CpG sites of the substrate DNA, which involves the flipping of the target bases (Fig. 2B) (Zhang et al., 2018). The FF interface also supports DNMT3A/3A interactions, allowing the replacement of DNMT3L subunits in the DNMT3A/3L heterotetramer by two DNMT3A subunits, yielding a DNMT3A homotetramer (Jurkowska et al., 2008; Jurkowska, Rajavelu, et al., 2011). DNA methylation is a dynamic process and can be reversed either passively during DNA replication or actively by DNA demethylases called ten-eleven translocation enzymes (TETs). TET enzymes oxidize 5-methyl cytosine (5mC) in progressive oxidation reactions resulting in 5-hydroxymethyl cytosine (5hmC), 5-formyl cytosine (5fC) and 5-carboxy cytosine (5caC) (Ito et al., 2011; Tahiliani et al., 2009).
Figure 2: DNA Methyltransferase (DNMT) 3A and its DNMT3L cofactor form a heterotetramer. A| Schematic representation of the tetrameric DNMT3A (blue) – DNMT3L (cyan) (DNMT3A/L) complex.
B| Cartoon representation of the DNMT3A/3L complex with SAM (yellow), bound DNA and flipped out cytosine (orange) (Protein Data Bank (PDB) 6W8B). The positions of the two FF interfaces between DNMT3A and DNMT3L subunits and the RD interface between the central DNMT3A subunits are indicated as black, dashed lines. Figure taken and modified from (Mack et al., 2022).
1.4. Lysine methylation
SAM-dependent methylation occurs not only in DNA but also in proteins or peptides and can be found at side chains of lysine (K), arginine (R), aspartate (D), glutamate (E), histidine (H), asparagine (N), glutamine (Q), and cysteine (C) (Clarke, 2013). Due to the lone-pair electrons present in the ε-amine of lysine, as well as its preference for localization on the protein surface, lysine residues are a favorable target for posttranslational modifications (PTMs) (Luo, 2018). Protein lysine methylation stands apart from other types of modifications like acetylation, ubiquitination or SUMOylation for three reasons. Firstly, the addition of methyl groups to lysine does not affect the overall charge of the residue at physiological pH, unlike acylation modifications that convert the positively charged ε-amine into a neutral amide. Secondly, lysine methylation represents the smallest PTM, resulting in only minor changes in the size of the side chain compared to other types of lysine modifications (Luo, 2018). Thirdly, up to three methyl groups can be transferred to a target lysine, creating monomethyl lysine (Kme1), dimethyl lysine (Kme2) and trimethyl lysine (Kme3). Methyl lysine recognition is challenging, since lysine methylation only subtly alters the physicochemical properties of the residue.
Still, the ability of methylated lysines to engage in cation−π interactions (through increased dispersion of the positive charge over the neighboring hydrocarbons), together with their progressively reduced ability to form hydrogen bonds as a donor and acceptor, provides some options for discrimination and readout (Luo, 2018). A common characteristic of methyl lysine-specific reader proteins is to recognize Kme2 and Kme3 methyl lysine groups through a hydrophobic pocket containing aromatic residues (e.g., F, Y, and W). Given that K, Kme1, Kme2, and Kme3 all carry an overall +1 formal charge at physiological pH, the aromatic pocket serves as a binding site for cation−π interactions (Luo, 2018).
1.4.1. Histone lysine methylation
Histone tails are the flexible ends of the histone proteins extending from the histone octamer and are a key target for lysine methylation (Fig. 1). The methylated lysine residues serve as a platform for the recruitment of proteins and protein complexes that interpret and regulate this modification. Such effector proteins contain specific domains which recognize the position of the lysine residue in the histone tail sequence and its methylation state (Cornett et al., 2019; Hyun et al., 2017; Yun et al., 2011). Eventually, a signal cascade of downstream effects is triggered, influencing the activity of chromatin remodelers, which alter the accessibility of DNA and thus gene transcription (Fig. 1). In the case of histone H3 lysine 9 (H3K9) methylation, found in constitutive heterochromatin, a cooperative mechanism involving other histone modifications and DNA methylation is found to trigger gene silencing. Protein lysine methyltransferases like SUV39H1 and SUV39H2 deposit H3K9 methylation, which is subsequently recognized by heterochromatin protein 1 (HP1) via an aromatic cage in its chromo domain (Kumar & Kono, 2020).
HP1 binds H3K9me2/3 and acts as a transcriptional repressor by preventing the association of transcription factors and RNA polymerase (Schoelz & Riddle, 2022). Beyond the steric effect, HP1 further recruits DNMTs, which methylate CpG sites adjacent to the methylated lysine. The methylated DNA then acts as a foundation for the binding of MBD proteins, which in turn recruit Histone Deacetylases (HDACs) (Jones et al., 1998). HDACs remove histone acetylation, thereby increasing the histone’s positive charge (Bannister & Kouzarides, 2011). This strengthens the interaction with the negatively charged DNA sugar-phosphate backbone, leading to chromatin compaction (Bannister & Kouzarides, 2011).
1.4.2. Non-histone protein lysine methylation
In addition to lysine methylation at histone tails, this modification has been found on non-histone proteins like p53 (West & Gozani, 2011), E2F1 (Couture et al., 2006), STAT3 (Jinbo Yang et al., 2010) and the androgen receptor (Gaughan et al., 2011; Ko et al., 2011). Lysine methylation of non-histone proteins influences their functionality in multiple ways (Hamamoto et al., 2015). The methylation can serve as a signal to deploy other PTMs including ubiquitination, thereby affecting, e.g., protein stability. Proteins binding to methylated residues or to other deployed PTMs can (i) stimulate or inhibit the target protein, (ii) regulate protein-protein interactions, or (iii) affect the subcellular localization.
1.5. Protein lysine methyltransferases
The transfer of methyl groups from SAM to proteins is catalyzed by enzymes called protein methyltransferases (PMTs). If the methylated amino acid is a lysine residue, they are denoted as protein lysine methyltransferases (PKMTs) (Fig. 3).
Over 60 characterized PKMTs are encoded in the human genome and can be categorized into two classes: SET domain-containing PKMTs (class V methyltransferases, called SET due to the discovery of the domain in the Drosophila enzymes named Suppressor of variegation 3-9, Enhancer of zeste, and Trithorax) and non-SET domain PKMTs (class I methyltransferases, also called 7-beta strand MTases) (Copeland et al., 2009; Falnes et al., 2016; Luo, 2012; Richon et al., 2011). Notably, more than 90% of PKMTs belong to the SET domain family (Luo, 2018). Still, non-SET domain-containing PKMTs, and especially disruptor of telomeric silencing 1-like (DOT1L) as the most-studied representative member of this family, were shown to have major roles in cellular processes and disease development (McLean et al., 2014; Nguyen & Zhang, 2011; Sarno et al., 2020). PKMTs generally display a high specificity, targeting only defined lysine residues in one or few substrate proteins. Remarkably, histone lysine methylation is redundant, meaning that one lysine can be methylated by more than one PKMT. This redundancy has advantages: (i) different enzymes can be differentially regulated, leading to a dramatic increase in the complexity of the regulatory network; (ii) PKMTs with a redundant substrate specificity can be recruited to different genomic loci like enhancers, promoters or gene bodies; (iii) PKMTs with redundant substrate specificities can transfer varying numbers of methyl groups. For example, the PKMTs NSD1 (aka KMT3B), NSD2 (aka MMSET, WHSC1), NSD3 (aka WHSC1L1) and ASH1L transfer up to two methyl groups to histone H3 lysine 36 (H3K36), while SETD2 (aka KMT3A, HYPB, SET2) catalyzes trimethylation at the same lysine residue (Edmunds et al., 2008; Gregory et al., 2007; Li et al., 2009). PKMTs transfer up to three methyl groups to a target lysine. They can do so in two different ways.
In a distributive mechanism, each round of catalysis results in product dissociation, and rebinding of a fresh substrate is needed for a second turnover. Hence, each methylation event is independent, leading to the stochastic generation of Kme1, Kme2 and Kme3, depending on the product specificity of the PKMT. In contrast, in a processive reaction mechanism, multiple rounds of catalysis proceed on the same substrate before dissociation of the product (Gowher & Jeltsch, 2001; van Dongen et al., 2014).
Figure 3: Protein Lysine Methyltransferases (PKMTs) transfer up to three methyl groups to specific lysine residues in proteins. The cofactor S-adenosyl-L-methionine (SAM) provides the methyl group. It is released after the transfer as S-adenosyl-L-homocysteine (SAH). Figure taken from (Schnee et al., 2023).
1.5.1. Structure of SET domain PKMTs
The SET domain of PKMTs is responsible for the methylation activity of this class of enzymes. It consists of approximately 130 amino acids, is often flanked by a pre-SET and post-SET domain (Qian & Zhou, 2006) and sometimes contains the domain insertion SET-I. SET domain-containing PKMTs bind the protein substrate and the methyl group donating cofactor SAM at opposing binding faces (Cheng et al., 2005). This is contrary to non-SET domain-containing PKMTs, like DOT1L, where the protein substrate and SAM are accommodated within a single, extended binding cleft (Min et al., 2003). In the SET domain of PKMTs, the target lysine is brought in close proximity to the SAM methyl group through a hydrophobic tunnel. Here, the lysine hydrocarbon side chain interacts with tyrosine, phenylalanine and tryptophan residues via hydrophobic interactions (Qian & Zhou, 2006; Trievel et al., 2003). The positively charged ε-amine group interacts with these residues through cation-π interactions (Luo, 2018). After insertion, the ε-amine group is oriented by multiple tyrosine residues and primed for the methyl group transfer.
Meanwhile, SAM binds at the opposing site via contacts with its nucleobase and sugar moiety. The methyl group is then inserted into the active site and transferred to the deprotonated lysine ε-amine group (Fig. 4). The detailed mechanistic features of different SET domain architectures, autoinhibition, lysine deprotonation and methyl group transfer are described in the following chapters.
Figure 4: Binding mode of cofactor SAM and protein substrate for SET and non-SET domain-containing PKMTs. A| In SET domain PKMTs, the protein substrate (cyan) and SAM (orange, methyl group is colored black) bind at opposing sites. The target lysine (pink) is inserted into a narrow tunnel, where it undergoes deprotonation and is oriented for the methyl group transfer (image created using simulation results of PDB 6VDB). B| The non-SET domain-containing PKMT DOT1L binds the target lysine (pink) and cofactor SAM (orange, methyl group is colored black) in the same pocket (PDB 1NW3). The architecture consists of a DOT1L specific region (yellow), a 7-beta strand Rossmann fold (white) and a ubiquitin interaction region (forest green).
1.5.2. Different structural arrangements of SET domain PKMTs
Phylogenetic analysis of SET domain sequences revealed that human SET domain PKMTs can be classified into subfamilies, each characterized by unique architectures (Wu et al., 2010). G9a (aka EHMT2, KMT1C), SUV39H1 (aka KMT1A) and SUV39H2 (aka KMT1B) belong to the classical PKMT subfamily, where their SET domains catalyze the methyl group transfer without prior conformational changes (Fig. 5A) (Schnee et al., 2023; Tachibana et al., 2001). In contrast, NSD1, NSD2, NSD3 and SETD2 are part of the PKMT subfamily with an autoinhibitory loop (AL) (Fig. 5B) (An et al., 2011; Bennett et al., 2017; Yang et al., 2016). In this subfamily, the apo form of the SET domain is expected to have a highly reduced activity and needs to undergo conformational changes for substrate binding and enzyme activity.
Other PKMTs act in complexes with additional proteins or contain specific domains to: (i) bind to certain structures like nucleosomes; (ii) recognize specific modifications on the substrate; and/or (iii) regulate their own activity. Examples of this are the Mixed lineage leukemia (MLL) SET domains, which are inactive on their own but become catalytically active in the presence of binding partners such as WDR5, RbBP5, ASH2L, and DPY30, collectively referred to as WRAD (Fig. 5C) (Borkin et al., 2015; Cao et al., 2014; Grebien et al., 2015). The SET domain of PKMTs can feature insertions like the MYND domain (Myeloid translocation protein 8, Nervy and DEAF-1). PKMTs with a MYND domain insertion represent the “SET and MYND Domain-containing protein” (SMYD) subfamily. The MYND domain is responsible for protein-protein interactions, possibly recruiting the enzymes to specific substrate proteins. Additionally, SMYD enzymes are characterized by a bilobal architecture with the protein substrate in the middle (Fig. 5D) (Ferguson et al., 2011; Mazur et al., 2014; Mzoughi et al., 2016; Saddic et al., 2010; Sirinupong et al., 2011; Sirinupong et al., 2010).
Figure 5: Cartoon representation of multiple SET domain PKMT architectures. A| SET domain-containing PKMT G9a complexed with the 9 amino acid long H3K36 peptide (cyan) with the target lysine (pink), and cofactor SAM (orange, PDB 5JIY). SET domain-containing PKMTs incorporate zinc ions for structural stability in their “associated with SET” (AWS) domain (magenta), post-SET (yellow) or MYND (rose) domain depending on the enzyme (Dillon et al., 2005; Wu et al., 2011). However, they are not involved in catalysis or conformational changes. For simplicity, zinc ions are therefore not shown in protein structures presented in this work. B| SETD2 complexed with the 14 amino acid long H3K36 peptide, and cofactor SAM (PDB 5JLB). The autoinhibitory loop (rose) is in an open position to accommodate the protein substrate.
C| MLL1 SET domain (white) associated with WDR5 (green), RbBP5 (light blue), ASH2L (rose) and DPY30 (cyan) bound to a nucleosome core particle (PDB 6PWV). MLL1 SET domain complexed with the 8 amino acid long H3K4 peptide (PDB 6UH5). D| SMYD2 complexed with the 10 amino acid long peptide ERα (PDB 4O6F). Distinct features are the bilobal or clamshell-like structure and the MYND domain. Figure taken from (Schnee et al., 2023).
1.5.3. Autoinhibition of SET domain PKMTs
NSD1 was one of the first PKMTs for which an AL was described. Crystal structure analysis and Molecular Dynamics (MD) simulations of NSD1 with bound cofactor SAM, but without bound substrate, showed that a loop of approx. 14 amino acids is placed on top of the substrate binding cleft, effectively blocking the entrance of a target peptide (Fig. 5B) (Trievel et al., 2003; Xiao et al., 2003; Zhang et al., 2003). The AL is positioned between the SET and Post-SET domain. Multiple PKMTs were shown to have an AL in their structure, but the AL sequences are not conserved (Couture et al., 2005; Xiao et al., 2005). Studies on ASH1L demonstrated that stabilizing the closed position of the AL, achieved by enforcing hydrophobic interactions between AL and enzyme through mutations, decreased the ASH1L methylation activity (Rogawski et al., 2015). Adding to this, the AL was speculated to regulate the product specificity of PKMTs. Mutations in the ASH1L AL turned the enzyme from a dimethyltransferase into a trimethyltransferase (An et al., 2011; Rogawski et al., 2015). This result may provide an explanation for different product specificities among PKMTs with high sequence similarity in the active site, but not in the AL (Schnee et al., 2023).
This was postulated for the PKMTs NSD1, DIM-5 and SETD2, which share a high active site sequence similarity but possess different AL residues and exhibit differing product specificities (SETD2 and DIM-5 are trimethyltransferases, NSD1 is a dimethyltransferase) (Qiao et al., 2011). Together, these findings suggested a regulatory role of the AL. However, the MD simulation experiments and crystal structure analyses which led to this conclusion were conducted with peptides as substrates. The mechanistic principles for the interaction with larger substrates like nucleosomes remain to be described.
1.5.4. Placeholder residues
The AL plays a pivotal role in regulating the substrate binding of SET domain PKMTs by sterically blocking the binding cleft. In addition to this steric hindrance, the AL was also found to position residues directly in the active site, at the position of the target lysine. Notably, specific residues function as “placeholder” residues in this context, stabilizing the AL in its closed conformation. For instance, NSD1 employs C2062 as a placeholder residue (Morishita & di Luccio, 2011), NSD2 uses C1183 (Jaffe et al., 2013), SETD2 relies on R1670 (Yang et al., 2016), and ASH1L uses S2259 (Yang et al., 2016). Mutational studies on ASH1L demonstrated the impact of such interactions: the placeholder residue serine was exchanged for methionine, which might establish stronger interactions with the hydrophobic lysine binding tunnel. This strengthened the closed conformation of the AL and led to a heavily decreased methylation activity (Rogawski et al., 2015). Remarkably, methionine was not found to be a placeholder residue in any SET domain PKMT, indicating that its binding to the active site might be too strong (Schnee et al., 2023). In the case of SETD2, the placeholder residue R1670 can adopt multiple conformations (Yang et al., 2016). In the AL closed state, R1670 was observed to occupy the lysine binding tunnel (Fig. 6A).
Crystal structures depicting the AL in a half-open position showed R1670 slightly flipped outwards, away from the active center (Fig. 6B). Furthermore, structures with the AL in a fully open conformation, and substrate bound, showed R1670 completely flipped outwards and exposed to the solvent. Moreover, the usually unresolved Post-SET loop (Q1676-K1703 for SETD2) was captured in front of the bound peptide substrate, engaging in hydrophobic interactions with the core enzyme (Fig. 6C). This loop is not resolved in crystal structures without bound peptide, indicating its high flexibility in this state.

Figure 6: The autoinhibitory loop (AL) and placeholder residue need conformational changes to overcome autoinhibition. A| In the binary PKMT-SAM state, the placeholder residue occupies the target lysine channel. In SETD2 the placeholder residue R1670 (rose) can adopt multiple conformations. If no peptide is bound, the AL is in a closed position and R1670 occupies the target lysine channel (PDB 4H12). B| In a half-opened position, the AL starts to lift, and R1670 turns outwards (PDB 5JLE). C| When a peptide substrate (cyan, target lysine in pink) is bound, the AL is in an open position and R1670 becomes solvent exposed. The Post-SET loop (yellow) is closed on top of the bound peptide (PDB 5JLB). Figure taken and modified from (Schnee et al., 2023).

1.5.5. Target lysine deprotonation

The side chains of K, Kme1 and Kme2 possess lone-pair electrons on their ε-amine groups, making them targets for methylation. However, due to their high pKa values (10.2−10.7), K, Kme1, and Kme2 predominantly exist in a protonated state under physiological conditions (pH 7.4), in which they are unreactive as nucleophiles. Therefore, one critical requirement for lysine methylation catalyzed by PKMTs is the deprotonation of the target lysine (Fig. 7A) (Trievel et al., 2002).
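To put the quoted pKa values into perspective, the fraction of lysine present in the deprotonated, nucleophilic form can be estimated with the Henderson–Hasselbalch relation. The following is a minimal sketch, assuming ideal behavior of an isolated amine in bulk solution; the function name `deprotonated_fraction` is hypothetical, and any influence of the enzyme environment on the pKa is deliberately ignored.

```python
def deprotonated_fraction(pka: float, ph: float) -> float:
    """Fraction of an amine in its neutral (deprotonated) form.

    Derived from Henderson-Hasselbalch: pH = pKa + log10([B] / [BH+]),
    so [B] / ([B] + [BH+]) = 1 / (1 + 10**(pKa - pH)).
    Assumption: ideal solution, no active-site effects.
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    physiological_ph = 7.4
    # pKa range quoted in the text for K, Kme1 and Kme2
    for pka in (10.2, 10.7):
        fraction = deprotonated_fraction(pka, physiological_ph)
        print(f"pKa {pka}: {fraction:.3%} deprotonated at pH {physiological_ph}")
```

Under these simplifying assumptions, well below 1% of the lysine side chains are deprotonated at pH 7.4, which illustrates why an enzyme-assisted deprotonation step is required.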
To explain the deprotonation mechanism, MD and hybrid Quantum Mechanics/Molecular Mechanics (QM/MM) simulations have been employed, showcasing a conserved mechanism for the deprotonation of the target lysine in multiple SET domain-containing PKMTs (Zhang & Bruice, 2007b). In this mechanism, the deprotonation of the side chain nitrogen (Nε) occurs through the transient formation of dynamic water channels in the enzyme’s active site (Hu & Zhang, 2006; X. Zhang & T. Bruice, 2008a, 2008b). A water molecule, which was frequently observed in the crystal structures of SET domain-containing PKMTs, was suggested to transfer the proton through a chain of water molecules into the aqueous solvent and finally to a buffer molecule (Fig. 7B-D). Additionally, electrostatic interactions between the positive charges of the SAM sulfonium moiety and the protonated Nε atom decrease the pKa of the latter from 10.9 to 8.2 (Zhang & Bruice, 2007b). This could explain the necessity of a basic reaction buffer for PKMTs and the weak in vitro methylation activity in acidic and even neutral buffers, as the deprotonation of the target lysine is impeded (Wilson et al., 2002; Zhang et al., 2002). Based on this, Bruice and Zhang suggested a stepwise process, in which: (i) the water channel appears; (ii) the target lysine is deprotonated; (iii) the target lysine is methylated using the cofactor SAM; (iv) the proton is transferred into the solvent (Zhang & Bruice, 2007a, 2007b, 2007c; X. Zhang & T. Bruice, 2008a, 2008b, 2008c). This model is applicable to multiple SET domain-containing PKMTs but not to non-SET containing PKMTs like DOT1L (Fig. 4B). In the class of non-SET domain PKMTs, a water channel was not observed, and the amino acids located at the target lysine channel appear incapable of facilitating a direct deprotonation.
It was speculated that their more hydrophobic active site could reduce the pKa of the target lysine and that the carboxylate of SAM could help in the subsequent deprotonation process (Cheng et al., 2005; Cortopassi et al., 2016; Min et al., 2003).

Figure 7: PKMTs deprotonate the target lysine prior to the methyl group transfer. A| Schematic depiction illustrating the obligatory target lysine deprotonation prior to the PKMT-catalyzed methyl group transfer. B| The protonated target lysine (pink) is oriented by e.g., PKMT SET7/9 Y335 (white, sticks), while the water channel (red spheres) is already present (prepared using PDB 1XQH). C| The lysine proton is transferred to a nearby water molecule. D| After lysine deprotonation, the SAM methyl group is rapidly transferred to the deprotonated target lysine thereby preventing reprotonation. The excess proton is transferred into the bulk solvent. B-D| Figure taken and modified from (Schnee et al., 2023).

1.5.6. Reaction mechanism of SET domain PKMTs

After a successful deprotonation, target lysine and SAM must be oriented in a conformation that facilitates the subsequent bimolecular nucleophilic substitution reaction (SN2), leading to the methyl group transfer. QM/MM simulations of SET7/9 (aka SETD7, SET7, SET9, KMT7) were first to describe the details of the SN2 mechanism (Hu & Zhang, 2006). In this mechanism, the deprotonated lysine Nε acts as the nucleophile, whereas the SAM sulphonium cation (S+) serves as the leaving group. The free electron pair of Nε is present in an sp3 orbital at a 109° angle. The SN2 reaction occurs at an aliphatic sp3 carbon center (the C-atom of the transferred methyl group), with the electronegative sulphonium leaving group attached to it. The nucleophile attacks the carbon at a minimal distance of approximately 4.4–4.6 Å (Chen et al., 2019).
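The geometric prerequisites for the nucleophilic attack can be evaluated directly from atomic coordinates, for example from individual MD snapshots. The sketch below assumes that an SN2-compatible conformation is approximated by a short Nε–C distance combined with a near-linear Nε–C–S arrangement (in-line attack). The function names and the default cutoffs (4.0 Å and 150°) are illustrative assumptions and are not taken from this work.

```python
import math
from typing import Sequence

Vec3 = Sequence[float]


def angle_deg(a: Vec3, b: Vec3, c: Vec3) -> float:
    """Angle a-b-c in degrees, with b as the vertex."""
    v1 = [ai - bi for ai, bi in zip(a, b)]
    v2 = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    # Clamp to [-1, 1] to guard against floating-point round-off
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def is_sn2_like(
    n_eps: Vec3,
    c_methyl: Vec3,
    s_sam: Vec3,
    max_distance: float = 4.0,  # Angstrom; hypothetical cutoff
    min_angle: float = 150.0,   # degrees; hypothetical cutoff
) -> bool:
    """Return True if lysine N-epsilon, SAM methyl C and SAM S adopt an
    approximately in-line geometry at short N-C distance."""
    n_c_distance = math.dist(n_eps, c_methyl)
    n_c_s_angle = angle_deg(n_eps, c_methyl, s_sam)
    return n_c_distance <= max_distance and n_c_s_angle >= min_angle


if __name__ == "__main__":
    # Idealized collinear arrangement: N ... C - S along the x axis
    print(is_sn2_like((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (5.0, 0.0, 0.0)))
```

Applied frame by frame to a trajectory, such a check would give the fraction of snapshots in which a methyl transfer is geometrically plausible; the cutoffs would need to be chosen and justified for the system at hand.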
A combination of computational modeling, QM/MM and kinetic isotope effect studies has demonstrated that PKMTs can stabilize two distinct transition states (TS) when methylating substrates (Chen et al., 2019; Linscott et al., 2016; Poulin et al., 2016). The SET8 enzyme (aka Pr-SET7, SETD8, KMT5A) exhibits an early SN2 TS (with a C−S distance of 2.0 Å and a C−Nε distance of 2.4 Å), while a late SN2 TS was observed for NSD2 (with a C−S distance of 2.5 Å and a C−Nε distance of 2.1 Å) (Fig. 8). Breaking of the C–S bond and formation of the new bond between C and the nucleophile occur in a concerted manner through a trigonal bipyramidal TS in which the carbon atom is sp2 hybridized. The nucleophile attacks the carbon at a 180° angle to the leaving group, optimizing the overlap between the nucleophile's lone pair and the C–S antibonding orbital. Subsequently, the leaving group is pushed off at the opposite side, the TS structure collapses, and the methyl group covalently binds to the nitrogen atom, while SAH is released as a product (Copeland et al., 2009). It is important to note that, owing to the high group transfer potential of SAM, the methyl group is transferred rapidly to the target lysine once a geometry favorable for the reaction has been achieved, where it then prevents reprotonation.

Figure 8: Geometric criteria for a bimolecular nucleophilic substitution (SN2) mechanism. A transition state (TS)-like conformation can be approximated by using the depicted metrics.

1.5.7. Substrate specificity of SET domain PKMTs

PKMTs are highly regulated enzymes and aberrant methylation of proteins could result in misregulation of chromatin states or protein activity. A specific recognition of the protein substrate by PKMTs is therefore indispensable. Celluspot peptide arrays are a suitable technique to decipher the substrate specificity of PKMTs (Bock et al., 2011; Sara Weirich & Albert Jeltsch, 2022).
In this method, peptides are synthesized on a cellulose membrane using solid-phase peptide synthesis. A large variety of different peptides can be synthesized on a single membrane, increasing the screening capacity. Each spot on the membrane represents an individual peptide sequence (Fig. 9A). The membrane is then incubated with the PKMT of interest and radioactively labeled SAM in buffer, allowing the detection of methylation through autoradiography. The signal intensity of the different peptide spots directly indicates which peptide sequences are preferred by the PKMT. By creating peptide arrays containing all possible single amino acid substitutions of an original substrate sequence, a PKMT-specific substrate specificity profile can be generated (Fig. 9B) (Dhayalan et al., 2011; Kudithipudi, Kusevic, et al., 2014; Kudithipudi, Lungu, et al., 2014; Kusevic et al., 2017; Rathert, Dhayalan, Ma, et al., 2008; Schuhmacher et al., 2015; Weirich et al., 2016). With the obtained specificity profile as a consensus sequence, novel protein substrate candidates have been identified (Dhayalan et al., 2011; Rathert, Dhayalan, Murakami, et al., 2008; Schuhmacher et al., 2020; Weirich et al., 2020). Certain PKMTs displayed a strict substrate specificity, limiting the range of substrates available for methylation (Kudithipudi et al., 2012; Schuhmacher et al., 2015). In contrast, other PKMTs showed a relaxed substrate specificity, allowing them to methylate a broad spectrum of substrates (Rathert, Dhayalan, Murakami, et al., 2008). A strict substrate specificity of PKMTs could be explained by precise interaction between enzyme and substrate, ensuring an accurate readout of the substrate sequence. On the other hand, explaining the promiscuity of certain PKMTs is more challenging. Recognizing multiple lysine residues necessitates a complex network of specific interactions, while still avoiding off-target effects to prevent aberrant methylation profiles.
A hollow active site with loose contacts therefore appears to be too simplistic an explanation. Various models have been suggested to clarify the promiscuity of PKMTs. One hypothesis suggests that the structural flexibility of both the enzyme's active site and the substrate allows for the adoption of numerous dynamic conformations, enabling PKMTs to recognize multiple substrates (Luo, 2018). An alternative model proposes that certain PKMTs identify their substrates based on their backbone atoms rather than their side chains (Al Temimi et al., 2019; Luo, 2018). The SET-I domain splits the SET domain and is speculated to be one of the key factors in determining substrate specificity, since it is the least conserved region among SET domain-containing PKMTs and heavily interacts with the substrate (Fig. 5) (Ronen Marmorstein, 2003). However, the SET-I hypothesis is challenged by the observation that PKMTs with similar substrate specificity, such as SETDB1 and SUV39H1, both methylating H3K9, exhibit large differences in their SET-I sequence (Ronen Marmorstein, 2003; Qian & Zhou, 2006).

1.5.8. Discovery of PKMT super-substrates

One striking example of the complex mechanism behind the substrate specificity of PKMTs has recently been discovered for SETD2. Here, the canonical substrate, H3 residues A29-P43, was found to be suboptimal for SETD2 (Schuhmacher et al., 2020). Surprisingly, multiple single amino acid exchanges caused a higher methylation of the corresponding peptide substrates. By combination of the preferred amino acids, a novel, non-natural peptide sequence, referred to as “super-substrate K36” (ssK36), was created. The ssK36 peptide differed at four positions from the canonical H3 sequence and was methylated about 100-fold more efficiently (Fig. 9C). Methylation differences between H3K36 and ssK36 were even larger in a protein context (Schuhmacher et al., 2020).
The crystal structure of the ssK36 peptide complexed to the SET domain of SETD2 was resolved, and subtle differences were observed compared to the H3K36-SETD2 structure (Schuhmacher et al., 2020). Three of the four amino acids altered in ssK36 established distinct contacts: ssK36-R31 forms an H-bond/salt bridge with SETD2-E1674, ssK36-F32 is bound into a pocket formed by SETD2-E1674 and SETD2-Q1676, and ssK36-R37 interacts with the backbone of SETD2-A1700. Despite these alterations, the overall structures of the ssK36-SETD2 and H3K36-SETD2 complexes remained very similar. Consequently, the crystal structures could not fully explain the substantial enhancement in the methylation rate of ssK36.

Figure 9: Substrate specificity profile of SETD2 led to the super-substrate peptide (ssK36). A| Celluspot peptide array with the 15-residue long H3K36 peptide sequence as the starting sequence, incubated with SETD2 and radioactively labeled SAM. Positions were individually mutated to any other amino acid except tryptophan and cysteine. At several positions amino acids are preferred which differ from the original H3 sequence. B| Quantification of the peptide array methylation data generates a PKMT-specific specificity profile, highlighting the preference for each position. C| Combination of preferred residues led to the super-substrate peptide (ssK36) (black) sequence, differing at 4 positions (orange) from the canonical H3K36 peptide (cyan) sequence. SETD2 was demonstrated to have a strongly enhanced methylation efficiency towards ssK36. Figure taken from (Philipp Schnee et al., 2022; Schuhmacher et al., 2020).

1.5.9. Product Specificity of SET domain PKMTs

Product specificity in PKMTs refers to their capability to transfer a precise number of methyl groups to their lysine residue target: one, two or three methyl groups, creating Kme1, Kme2 or Kme3, respectively.
Despite structural similarities in the SET domain, PKMTs exhibit distinct substrate and product specificities, requiring unique mechanisms to control the number of methylation steps. One potential mechanism to control the number of transferred methyl groups is speculated to be the target lysine deprotonation. As described earlier, the deprotonation of the lysine Nε is facilitated via a chain of water molecules (Zhang & Bruice, 2007b). In the context of product specificity, the presence or absence of the water channel could define the outcome. MD simulations of the monomethyltransferase SET7/9 revealed the presence of a water channel only in the SAM-bound state complexed with an unmethylated K4 peptide, suggesting a role in lysine deprotonation and regulating mono-methylation. In the presence of SAH, or K4me1, the water channel was absent, preventing further methylation after monomethylation (Fig. 10A). Mechanistically, the methyl group of the monomethylated peptide takes the position of the proton that would be removed through the water channel. Deprotonation and further methylation of Kme1 are therefore impossible (X. Zhang & T. Bruice, 2008b). Another possible regulation mechanism refers to the SN2 reaction mechanism used by PKMTs. If methyl group and lysine Nε are too distant, the transfer is unlikely. In MD and QM/MM simulations of SET7/9 complexed with the K4 or K4me1 peptide, the distance between the SAM sulfur group and lysine Nε was greater for K4me1 (6.1 Å) than for K4 (5.7 Å) (Zhang & Bruice, 2007b). This difference in distance may be attributed to the active site potentially being too narrow. After the first methylation, a reorientation of Kme1 is not possible due to steric constraints. Consequently, a productive state, in which monomethylated lysine Nε and SAM methyl group come in close proximity, cannot form (Fig. 10B).
Adding to the described mechanisms, multiple sequence alignments revealed that PKMTs possessing a tyrosine at the so-called “F/Y-switch” position are limited to catalyzing mono- or dimethylation. In contrast, enzymes with a phenylalanine or another hydrophobic residue at this position display di- or trimethyltransferase activity (Collins et al., 2005). This phenomenon was observed for several SET domain-containing PKMTs and it could even be used to manipulate the product specificity. For instance, the trimethyltransferase DIM-5 could be converted into a mono-/dimethyltransferase by the F281Y mutation (Zhang et al., 2003) and the monomethyltransferase SET7/9 could be changed to a dimethyltransferase through the Y305F mutation (Del Rizzo et al., 2010; Zhang et al., 2003). The mechanistic basis of the F/Y-switch solely relies on the presence of a single hydroxyl group. The missing hydroxyl group in the Y-to-F mutants creates additional space in the active site, facilitating the accommodation of water molecules and proper reorientation of already transferred methyl groups. In contrast, F-to-Y mutations, which turn trimethyltransferases into mono- or dimethyltransferases, could be based on steric effects caused by the additional hydroxyl group making the active site too narrow to accommodate multiple methyl groups at the lysine Nε (Fig. 10C) (Chu et al., 2012; Hu & Zhang, 2006). The concept of the active site volume as a regulator for product specificity may also provide insights into somatic cancer mutations altering product specificity as later shown in this work.

Figure 10: Proposed control mechanisms for the product specificity of PKMTs. PKMTs catalyze the transfer of a distinct number of methyl groups to their lysine target (pink). Multiple mechanisms have been proposed regarding the control of this process.
A| Restricted second methylation caused by a disrupted water channel (red, spheres) and blocked lysine deprotonation of monomethylated target lysine (green, PDB 1XQH). B| The SN2 geometry cannot be adopted in the presence of a monomethyl substrate. C| The F/Y-switch position controls the product specificity of certain PKMTs. Phenylalanine (white) at this position creates additional space in the active site, allowing accommodation of a dimethyl product. In contrast, a tyrosine with its additional hydroxyl group causes clashes, preventing the formation of the dimethylated product. Figure taken and modified from (Schnee et al., 2023).

1.6. Histone lysine 36 methylation

Methylation of lysine 36 of histone H3 (H3K36) and especially the di- and trimethylation (H3K36me2 and me3) are important histone modifications affecting many cellular processes (Eric J. Wagner & Phillip B. Carpenter, 2012). NSD1, NSD2, NSD3, ASH1L, SETD2, SMYD5 and PRDM9 are the PKMTs responsible for H3K36 methylation in human cells. While NSD1, NSD2, NSD3 and ASH1L can only introduce mono- and dimethylation of H3K36 in vitro and in vivo (Eric J. Wagner & Phillip B. Carpenter, 2012), SETD2 and SETD5 introduce up to trimethylation at H3K36 in gene bodies, whereas the SET and MYND domain-containing 5 (SMYD5) and PR/SET 9 (PRDM9) do so at promoter regions (Edmunds et al., 2008; Gregory et al., 2007; Li et al., 2009; Powers et al., 2016; Sessa et al., 2019; Zhang et al., 2022). H3K36me2 is enriched at intergenic regions and promoters while H3K36me3 is enriched at gene bodies of active genes (Lam et al., 2022). H3K36me3 levels can be controlled in several ways, such as demethylation by the eraser protein KDM4A (Klose et al., 2006) or through the stability of SETD2. Homeostatic SETD2 protein levels in mammalian cells are low, as it is readily degraded by the ubiquitin–proteasome system (Zhu et al., 2017).
SETD2 is also negatively regulated at the transcriptional level by the microRNA miR-106b-5p. Overexpression of miR-106b-5p was found to reduce SETD2 expression (Xiang et al., 2015). The biological functions of H3K36 methylation encompass the regulation of gene expression, DNA repair, recombination and gene splicing (Eric J. Wagner & Phillip B. Carpenter, 2012). The diverse effects arise through the physical interaction of H3K36 PKMTs, predominantly SETD2, with RNA polymerase II (RNAP II), RNA-binding proteins, and transcriptional elongation factors (Li et al., 2019). H3K36 methylation is associated with both active gene transcription marks and gene repression. Its impact on gene transcription is controlled by adjacent histone modifications and their respective reader proteins (Eric J. Wagner & Phillip B. Carpenter, 2012). As a repressive modification, H3K36 methylation functions to suppress the aberrant initiation of transcription within coding regions of gene bodies, in particular during active gene expression. This repression is facilitated by recruiting deacetylase complexes and the DNA methylation machinery (Lam et al., 2022; Li et al., 2019; Eric J. Wagner & Phillip B. Carpenter, 2012). The connection between H3K36 methylation and DNA methylation is established through the PWWP domains of DNMT3A and DNMT3B, which preferentially bind to H3K36me2 and H3K36me3, respectively (Dukatz et al., 2019).

1.6.1. SETD2

The SET domain-containing protein 2 (SETD2) has a size of 230 kDa, which corresponds to 2564 amino acids. This enzyme is a major writer of H3K36me3 in mammals, depositing the modification primarily at gene bodies of actively transcribed genes. SETD2 was thought to be the sole protein responsible for H3K36me3, but a recent study has indicated that SETD5 can also deposit H3K36me3 at active gene bodies in vivo (Sessa et al., 2019).
Furthermore, PKMTs SMYD5 and PRDM9 have been shown to deposit H3K36me3 at promoter regions and during meiosis, respectively (Powers et al., 2016; Zhang et al., 2022). Human SETD2 contains several functional domains. These include a SET domain flanked by an “associated with SET” (AWS) domain and a post-SET domain, which together are responsible for the methyltransferase activity (Sun et al., 2005). SETD2 also contains a Set2–Rpb1 interaction (SRI) domain, which interacts with RNA polymerase II (RNAPII) (Li et al., 2005). The largest subunit of RNAPII contains a CTD that is hyperphosphorylated during active transcription. SETD2 interacts specifically with this phosphorylated form of the RNAPII CTD through its SRI domain (Li et al., 2005; Sun et al., 2005). This allows SETD2 to selectively associate with actively transcribed regions of the genome. As a result, H3K36me3 is generally deposited at the 3′ end of actively transcribed gene bodies and it is associated with euchromatin (Bannister et al., 2005; Mitchell et al., 2023). The SETD2 deposited H3K36me3 modification interacts with a large group of H3K36me3-binding proteins to participate in numerous cellular processes. One such process is de novo DNA methylation at genomic sites enriched with H3K36me3. This is performed by the reader protein DNMT3B (Baubec et al., 2015). DNA methylation aids in repressing cryptic transcription, which is the false initiation of transcription from intragenic sites of protein-coding genes (Neri et al., 2017). SETD2 is additionally involved in the regulation of cell size. MicroRNA-mediated SETD2 knockdown in human cells caused an increase in cell size and total protein content accompanied by an increased protein synthesis rate in vitro (Molenaar et al., 2022). It is speculated that SETD2 might indirectly regulate cell size by influencing cell cycle dynamics or by directly controlling protein synthesis rates (Molenaar et al., 2022).
Overexpression of the oncohistone H3.3K36M, which diminishes H3K36me3, also causes an increase in cell volume (Molenaar et al., 2022). Crystal structures revealed that the H3K36M mutation inhibits the catalytic activity of SETD2, as the introduced methionine binds into the lysine binding channel of the active site and functions as a competitive enzyme inhibitor (Yang et al., 2016). Despite H3K36me3 loss through H3K36M overexpression, the cell size increase was smaller than for SETD2 knockdown (Molenaar & van Leeuwen, 2022). This indicates that SETD2-mediated regulation of cell size is not entirely dependent on H3K36me3 and highlights that the biological role of SETD2 is not limited to H3K36me3 deposition. Besides histone protein targets, recent studies found that SETD2 methylation occurs at non-histone substrates. Among these, SETD2 has been observed to monomethylate K525 of the signal transducer and activator of transcription 1-alpha/beta (STAT1) (Chen et al., 2017). This methylation promotes STAT1 phosphorylation and activation, connecting SETD2 with the amplification of IFNα-dependent antiviral immunity signaling pathways (Chen et al., 2017). Other SETD2 targets are K735 of EZH2, K40 of α-tubulin and K68 of actin (Park et al., 2016; Seervai et al.; Yuan et al., 2021). Numerous somatic mutations of the SETD2 gene were found in cancer tissues, especially in pediatric high-grade gliomas (Fontebasso et al., 2013). Additionally, frameshift, non-sense and missense mutations in SETD2 are driver mutations in clear cell renal cell carcinoma (cRCC), indicating a loss-of-function mechanism. This is supported by the reduced amount of H3K36me3 in cRCC and an unaffected amount of H3K36me2 (Kudithipudi, 2014).

1.6.2. NSD2

The nuclear receptor SET domain-containing 2 (NSD2) catalyzes up to dimethylation of H3K36 and non-histone proteins.
Down-regulation of NSD2 significantly decreases the methylation of H4K20, leading to the increased accumulation of 53BP1 (Pei et al., 2011). NSD2 interacts with phosphatase and tensin homolog deleted on chromosome 10 (PTEN) via its CTD and stimulates the dimethylation of PTEN in cells (Zhang et al., 2019). The latter is recognized by the specific domain of 53BP1 to recruit PTEN into sites of DNA damage. This is suspected to represent one pathway to regulate the sensitivity of cells to DNA damage (Chen et al., 2020). NSD2 dysfunction is linked to many diseases ranging from developmental disorders to cancers (Lam et al., 2022; Li et al., 2019; Eric J. Wagner & Phillip B. Carpenter, 2012). Heterozygous loss of NSD2 is responsible for the developmental disease called Wolf-Hirschhorn syndrome (WHS) (Bergemann et al., 2005). Moreover, missense mutations in NSD2 were observed in various types of cancers like lung cancers (Sengupta et al., 2021a; Yuan et al., 2021), hematological cancers (Jaffe et al., 2013) and head and neck squamous cell carcinomas (Cancer Genome Atlas, 2015). Epithelial–mesenchymal transformation (EMT) is a crucial process in cancer development, in which epithelial cells acquire characteristics of mesenchymal cells during tumorigenesis, development and progression (Brabletz et al., 2005; Yang et al., 2004). Recently, it was found that the overexpression of NSD2 occurs in 15% of patients with t(4;14)-positive multiple myeloma and that Twist-1 participates in driving the expression of EMT-related genes and contributes to tumor migration (Cheong et al., 2020). NSD2 interacts with Twist-1, which leads to an increase in H3K36me2 and promotion of EMT (Ezponda et al., 2013). Contrary to the straightforward impact of gene deletions causing loss-of-function changes, understanding the biological effects of single point mutations is a more complex task.
Many somatic missense mutations in PKMTs have been detected in diverse cancer types, and have been shown to alter the enzyme’s activity, substrate specificity, product specificity, or other enzymatic properties (Brohm et al., 2019; Oyer et al., 2014; Weirich et al., 2017; Weirich et al., 2015). A frequent NSD2 missense single point mutation is E1099K, which was detected in leukemic patients. This mutant was comprehensively characterized and shown to be hyperactive (Jaffe et al., 2013; Oyer et al., 2014; Pierro et al., 2020; Swaroop et al., 2019). E1099K was demonstrated to firstly enhance the binding towards nucleosomal substrates by interacting with the negatively charged DNA, and secondly destabilize the AL of NSD2 through breaking the salt bridge mediated by E1099. Eventually, this led to an enhanced nucleosome association and higher activity (Li et al., 2019; Sato et al., 2021). However, the effects of other frequent missense cancer mutants like T1150A in NSD2 are still unknown.

1.7. Lysine demethylation

Histone lysine methylation is a dynamic modification and can be removed by a group of enzymes called lysine demethylases (KDMs) (Hyun et al., 2017). The chemically inert nature of methyl lysine restricts the potential mechanisms of enzymes to remove a methyl group from the ε-amine of a protein lysine residue. Two mechanisms of enzymatic lysine demethylation were characterized and involve amino oxidation and hydroxylation (Luo, 2018). Lysine-specific demethylases (LSDs) are flavin adenine dinucleotide (FAD) dependent and use the amine oxidase-like (AOL) domain for the amino oxidation to remove lysine methylation (Fig. 11A-B). The hydroxylation reaction to demethylate lysine residues is carried out by KDMs bearing characteristic JmjC domains (Fig. 11C-D). JmjC-domain-containing KDMs can remove methyl groups from Kme1/2/3 whereas LSD enzymes can only act on Kme1/2 as substrates (Cole, 2008; Nowak et al., 2016).
Similar to PKMTs, KDMs have been shown to demethylate methyl lysine in non-histone proteins such as ERα (Zhang et al., 2013), E2F1 (Kontaki & Talianidis, 2010), DNMT1 (Nicholson & Chen, 2009) and STAT3 (J. Yang et al., 2010). Many non-histone targets are substrates of LSD1, even though a well-defined sequence motif is missing (Luo, 2018). This raises questions as LSD1 was shown to bind its H3K4me2 substrate in a highly sequence-specific manner (Luo, 2018). The sequence promiscuity and the simultaneous specific recognition of LSD1 substrates appear contradictory, and further work will be needed to clarify this issue. Two factors could potentially be responsible for this interesting effect: (i) the substrate-binding pocket of LSD1 is flexible and adopts multiple conformations to accommodate different substrates; (ii) the substrate specificity of LSD1 is altered by recruitment of regulatory partners like the androgen receptor, which interacts with the LSD1 SWIRM domain, changing the preference from H3K4me2 to H3K9me2 (Luo, 2018; Wu et al., 2012).

Figure 11: Reaction mechanisms of lysine demethylation. A| Chemical mechanism of the demethylation reaction catalyzed by LSDs. B| Hydrophobic interactions of monomethylated lysine (pink) with its surrounding residues (green) in the catalytic chamber with FAD (orange, PDB 6VYP). C| Chemical mechanism of demethylation reaction catalyzed by JmjC-domain-containing KDMs (Luo, 2018). D| Representative structure and catalytic site of JmjC-domain-containing KDMs. KDM4A is shown as an example (PDB 2OQ6). Residues H188, E180, H276 (yellow), a water molecule (red) and an α-ketoglutarate analogue (rose) coordinate the iron (orange). Figure taken and modified from (Kong et al., 2011; Luo, 2018).

1.8. Molecular Dynamics Simulation

Molecular Dynamics Simulations (MD) are computational methods used to study the dynamic behavior of molecules and atoms in space over time.
Empirically derived physical principles are applied to model the interactions and movements of atoms in the simulated system. By numerically solving the equation of motion for each atom, MD simulatio