Molecular Dynamics Simulations  

of the Substrate- and Product Specificity and 

Mechanism of DNA- and Protein Lysine 

Methyltransferases 

 
A thesis accepted at the Faculty 4: Energy-, Process- and Bio-Engineering of the University of 

Stuttgart and the Stuttgart Center for Simulation Science in partial fulfillment of the requirements for 

the degree of  

Doktor der Naturwissenschaften/PhD in Natural Science (Dr. rer. nat.) 

 
by 

Philipp Schnee 

born 27.10.1995 in Stuttgart, Germany 

 
Main Referee:   Prof. Dr. Albert Jeltsch (University of Stuttgart) 

Co-Referee:   Prof. Dr. Martin Zacharias (Technical University Munich) 

Committee Chair:  Prof. Dr. Markus Morrison (formerly Rehm) (University of Stuttgart) 

 
Date of Defense:  17.04.2024 

 
University of Stuttgart 

Institute of Biochemistry and Technical Biochemistry 

2024 



Eidesstattliche Erklärung 

Hiermit versichere ich, dass ich die vorliegende Arbeit mit dem Titel 

„Molecular Dynamics Simulations of the Substrate- and Product Specificity and Mechanism of DNA- 

and Protein Lysine Methyltransferases” 

selbstständig verfasst habe und dabei keine anderen als die angegebenen Quellen und Hilfsmittel 

verwendet habe. 

 
Declaration of Authorship 

I hereby certify that the dissertation entitled 

“Molecular Dynamics Simulations of the Substrate- and Product Specificity and Mechanism of DNA- 

and Protein Lysine Methyltransferases” 

is entirely my own work except where otherwise indicated. Passages and ideas from other sources 

have been clearly indicated. 

 
Name:    Philipp Schnee 

Unterschrift/Signature:  ………………………………………………….. 

Datum/Date:   08.01.2024 

  

Table of contents 

Abstract
Zusammenfassung
Acknowledgements
List of publications
Author contributions
List of figures
List of abbreviations
1. Introduction
1.1. Epigenetics
1.2. Chromatin structure and its regulation
1.3. DNA methylation
1.3.1. DNA methyltransferases
1.4. Lysine methylation
1.4.1. Histone lysine methylation
1.4.2. Non-histone protein lysine methylation
1.5. Protein lysine methyltransferases
1.5.1. Structure of SET domain PKMTs
1.5.2. Different structural arrangements of SET domain PKMTs
1.5.3. Autoinhibition of SET domain PKMTs
1.5.4. Placeholder residues
1.5.5. Target lysine deprotonation
1.5.6. Reaction mechanism of SET domain PKMTs
1.5.7. Substrate specificity of SET domain PKMTs
1.5.8. Discovery of PKMT super-substrates
1.5.9. Product Specificity of SET domain PKMTs
1.6. Histone lysine 36 methylation
1.6.1. SETD2
1.6.2. NSD2
1.7. Lysine demethylation
1.8. Molecular Dynamics Simulation
1.8.1. Modelling
1.8.2. Structure of MD Simulations
1.8.3. Steered molecular dynamics simulations
2. Aims of this work
2.1. MD Simulation of the somatic cancer mutation R882H of DNMT3A
2.2. MD Simulation of the somatic cancer mutation T1150A of NSD2
2.3. Mechanistic basis of super-substrate peptides
3. Material and Methods
3.1. MD simulations and representation
3.1.1. MD simulations of the DNMT3A/L hetero tetramer complexed with DNA
3.1.2. sMD simulations of the peptide association process into the NSD2 active site
3.1.3. MD simulations of NSD2
3.1.4. MD simulations of peptides in solution
3.1.5. sMD simulations of the peptide association process into the SETD2 active site
3.1.6. MD simulations of SETD2
3.2. Trajectory analysis
3.2.1. Contact maps analysis
3.2.2. NSD2 active site volume calculation
3.2.3. Clustering of peptide conformations in solution
3.3. SETD2 purification and peptide hairpin validation using FRET
3.4. In vitro methylation assay to test peptide inhibitors
4. Results
4.1. MD Simulation of the somatic cancer mutation R882H of DNMT3A
4.1.1. DNMT3A R882H establishes more inter subunit contacts than the DNMT3A WT
4.2. MD simulation of the T1150A cancer mutant of NSD2
4.2.1. NSD2 T1150A can accommodate a H3K36me2 peptide and SAM in sMD simulations
4.2.2. NSD2 T1150A loses contacts responsible for restricting the active site volume
4.3. Mechanistic basis of the enhanced methylation activity towards super-substrates
4.3.1. The SETD2 super-substrate peptide prefers a hairpin conformation in solution
4.3.2. Experimental investigation of conformational preferences
4.3.3. A hairpin conformation has easier access into the SETD2 active site
4.3.4. Hairpin structures unfold into an extended conformation upon binding to SETD2
4.3.5. Hairpin conformation in peptides lead to a faster methylation by SETD2
4.3.6. ssK36 establishes more and different TS-like conformation than H3K36
4.4. Peptides can function as competitive inhibitors for PKMTs
4.5. MD Simulation of NSD2 in complex with a new NSD2-specific super-substrate
5. Discussion
5.1. MD Simulation of the somatic cancer mutation R882H of DNMT3A
5.2. MD Simulation of the T1150A cancer mutant of NSD2
5.3. Mechanistic basis of the SETD2 super-substrate peptide
5.3.1. Hairpins in histone tails might facilitate the binding towards PKMTs
5.3.2. Peptide hairpin conformations unfold upon binding to PKMTs and establish distinct contacts
5.4. Super-substrate peptides function as PKMT inhibitors
6. References
7. Appendix
7.1. Appendix I (not included in the published thesis)
7.2. Appendix II
7.3. Appendix III

 

Abstract 

While the genetic information within each cell is encoded as the base pair sequence in the DNA, cellular 

differentiation and adaptation to environmental signals are dictated by variations in gene expression. 

Epigenetics describes these often stable, yet reversible, changes in gene expression patterns, which do 

not involve alterations in the DNA sequence. Major epigenetic signals are DNA and histone lysine 

methylation. These modifications are deposited by DNA methyltransferases (DNMTs) and protein 

lysine methyltransferases (PKMTs) by transferring a methyl group from S-adenosyl-L-methionine 

(SAM) to the respective target. Subsequently, the deposited modifications are read by chromatin remodeling 

enzymes, altering the accessibility of genes depending on the actual modifications. Hence, DNMTs and 

PKMTs function as key players in the regulation of genome stability, gene expression, DNA repair and 

cellular differentiation. Their activity is controlled by factors like substrate- and product specificity, 

autoinhibition, and conformational changes upon interaction with substrates. Cancer mutations in 

DNMTs and PKMTs disturb these regulatory mechanisms, which makes their understanding a main 

target in modern epigenetic research.  

In this work, molecular dynamics (MD) simulations in combination with biochemical experiments were 

used to investigate the catalytic mechanism of these enzymes in detail. In pursuit of this goal, two 

approaches were applied. In the first approach, cancer mutants of DNMTs and PKMTs were simulated and the 

results were compared to simulations of the wild type enzyme (WT) to identify differences between 

mutant and WT. This was achieved for the somatic cancer mutant DNMT3A R882H, which 

frequently occurs in blood cancers. In earlier methylation experiments a change in flanking sequence 

preference was determined for R882H. The mechanism behind this observation was revealed by MD 

simulations of the DNMT3A/L-heterotetramer (3L-3A-3A-3L) in this work. The simulations 

showed that the mutated R882H residue had reduced contact with a guanine three base pairs away 

from the methylation site, and instead interacted with the adjacent 3A subunit. The lost contact is 

directly connected to a different affinity for certain DNA substrates, explaining the change in flanking 

sequence preference. Moreover, R882H was found to have a dominant effect even in a heterozygous 

state. Extended contact analysis of the MD simulation data showed that R882H not only interacted 

with the adjacent subunit but also led to a rearrangement of a small contact network, increasing the overall 

binding affinity of DNMT3A R882H dimers compared to WT. Since the flanking sequence preference of 

DNMT3A tetrameric complexes is determined in the 3A-3A interface, which is preferentially formed by 

R882H, the dominant effect of R882H in R882H/WT mixed complexes was rationalized by the MD 

simulation results. 

Biochemical characterization of the PKMT NSD2 and its somatic cancer mutation T1150A revealed an 

altered product specificity and a change from a dimethyltransferase to a trimethyltransferase. Changes 



in the methylation state of histone 3 lysine 36 (H3K36), a methylation target of NSD2, are known to be 

associated with diverse biological outcomes, as dimethylated H3K36 (H3K36me2) and trimethylated 

H3K36 (H3K36me3) exhibit distinct downstream effects on gene transcription and chromatin 

structure. Therefore, a combination of MD and steered MD (sMD) simulation techniques was used in 

this work to investigate the reason for the altered product specificity of NSD2 T1150A. The analysis 

showed that, in contrast to NSD2 WT, NSD2 T1150A was able to accommodate the H3K36me2 peptide 

and SAM simultaneously in a productive conformation in the active site. Volume calculations of the 

active site revealed that larger volumes occurred more often for T1150A compared to WT, enabling 

the productive accommodation of the higher methylation state. The reason for this was found in a 

subsequent contact analysis. In NSD2 WT, T1150 was engaged in contacts with Y1092 and L1120, which 

oriented these residues, effectively reducing the volume of the active site. The T1150 side chain 

hydroxyl group interacted with the Y1092 backbone nitrogen. The side chain methyl group 

hydrophobically interacted with the L1120 side chain. These two contacts were lost in NSD2 T1150A. 

As a consequence, the orientation of Y1092 and L1120 was more flexible and the active site volume 

increased. The presented results precisely explain the molecular mechanism behind the altered 

product specificity observed in biochemical experiments with NSD2 T1150A. 

The substrate specificity of the PKMT SETD2 regarding its natural H3K36 target sequence has been 

previously mapped using Celluspots peptide array methylation. This revealed that the canonical H3 

amino acids were not ideal at many positions. Based on this, an artificial peptide substrate was 

designed that contained the most favorable amino acid at each position. Methylation experiments 

showed that the 15-amino-acid-long super-substrate peptide (ssK36), which differed at four positions 

from the original H3K36, was methylated more than 100-fold faster than the canonical H3K36 peptide. 

The crystal structure of SETD2 with bound ssK36 peptide was resolved, but did not entirely explain the 

highly increased methylation activity of SETD2. The second approach in this work focuses on the 

mechanistic reasons behind this massive increase in reaction rate, using a combination of in vitro 

methylation and FRET experiments, MD and sMD simulation techniques to cover multiple steps of the 

catalytic process. MD simulations of the free peptides in solution showed that ssK36 preferentially 

formed a hairpin conformation, whereas H3K36 preferred an extended conformation. This preference was 

based on the four introduced mutations. Moreover, it was demonstrated in sMD simulations that 

hairpin-shaped peptides had easier access into the active site of SETD2, compared to extended 

conformations. In fact, methylation experiments confirmed that chemically induced hairpins increased 

the methylation of peptides by SETD2. Additionally, in MD simulations of the ssK36-SETD2 

complex it was observed that the four mutations established a unique contact profile with SETD2, 

leading to more and different transition state-like conformations compared to the H3K36-SETD2 

complex. The transferability of this approach was demonstrated as a new super-substrate peptide was 

specifically designed for NSD2. The molecular mechanism behind the increased methylation rate was 



again investigated by the MD simulations presented in this work. Remarkably, SETD2 and NSD2 were shown 

to be specific for their respective super-substrate and did not show increased activity for the other. 

The optimized enzyme interactions of the super-substrate peptides were then used as a starting point 

to establish a PKMT-specific inhibition assay in which ssK36 was demonstrated to function as a 

substrate-competitive SETD2-specific inhibitor. 

In conclusion, the MD simulations conducted in this work revealed previously unknown reasons for the 

dominant effect of DNMT3A R882H in heterozygous states and explained the altered product 

specificity for NSD2 T1150A. Moreover, the features of the artificially designed super-substrate 

peptides, which caused a ~100-fold activity increase of the PKMTs SETD2 and NSD2, were precisely 

described in various MD simulation approaches and validated by wet-lab experiments. The molecular 

mechanisms found in this work explain biochemical results of DNMTs and PKMTs at an atomistic 

resolution and suggest novel strategies for the design of a new class of substrate-competitive PKMT 

inhibitors. 

 

Zusammenfassung 

Die genetische Information einer jeden Zelle ist in Form der Basenpaarsequenz der DNA codiert. 

Zelldifferenzierung und Anpassung an Umweltsignale werden jedoch durch Änderungen der 

Genexpression bewerkstelligt. Epigenetik beschreibt diese meist stabilen, aber reversiblen 

Veränderungen der Genexpression, denen keine Änderungen der DNA-Sequenz zugrunde liegen. Zu den 

wichtigsten epigenetischen Signalen gehören die DNA- und Histonlysinmethylierung. Diese 

Modifikationen werden durch DNA-Methyltransferasen (DNMTs) und Proteinlysinmethyltransferasen 

(PKMTs), durch Übertragung einer Methylgruppe von S-Adenosyl-L-Methionin (SAM) auf das jeweilige 

Zielmolekül, gesetzt. Die methylierte DNA oder Proteine werden anschließend von Enzymen erkannt, 

welche die Zugänglichkeit von Genen, je nach Modifikation, verändern. DNMTs und PKMTs haben 

deshalb eine Schlüsselrolle bei der Regulation von Genomstabilität, Genexpression, DNA-Reparatur 

und Zelldifferenzierung. Ihre Aktivität wird von Faktoren wie der Substrat- und Produktspezifität, 

Autoinhibition sowie Konformationsänderungen während der Interaktion mit Substraten gesteuert. 

Krebsmutationen in DNMTs und PKMTs stören diese Regulationsmechanismen, wodurch ihre 

Erforschung zu einem Hauptziel in der modernen epigenetischen Forschung geworden ist. 

In dieser Arbeit wurden Molekulardynamik (MD) Simulationen in Kombination mit biochemischen 

Experimenten durchgeführt, um die katalytischen Mechanismen dieser Enzyme im Detail zu 

untersuchen. Um dieses Ziel zu erreichen, wurden zwei Ansätze angewendet. Durch Simulationen von 

Krebsmutationen von DNMTs und PKMTs und Vergleich mit Simulationen des jeweiligen Wildtyp 

Enzyms (WT) können Unterschiede zwischen Mutanten und WT gefunden werden. Dieser Ansatz 

wurde auf die somatische Krebsmutation DNMT3A R882H angewandt, die häufig in verschiedenen 

Leukämieformen auftritt. In früheren Methylierungsexperimenten wurde eine veränderte Präferenz 

von R882H für die Flankierungssequenz von CpG Methylierungsstellen festgestellt. Der Mechanismus 

hinter dieser Beobachtung wurde durch MD Simulationen des DNMT3A/L-Heterotetramers (3L-3A-3A-

3L) in dieser Arbeit aufgeklärt. Die durchgeführten Simulationen zeigten, dass die mutierte R882H-

Aminosäure einen verringerten Kontakt zu einem Guanin, drei Basenpaare vom zu methylierenden 

CpG Dinukleotid entfernt, aufwies. Stattdessen interagierte sie mit der benachbarten 3A-Untereinheit. 

Diese verringerte Kontaktintensität steht in direktem Zusammenhang mit einer veränderten Affinität 

für bestimmte DNA-Substrate, wodurch die Änderung der Flankierungssequenzpräferenz erklärt 

werden konnte. Darüber hinaus wurde für R882H beschrieben, dass selbst im heterozygoten Zustand 

ein dominanter Effekt zu beobachten ist. Eine erweiterte Kontaktanalyse der MD Simulationsdaten 

zeigte, dass R882H nicht nur mit der benachbarten Untereinheit interagierte, sondern dies auch zu 

einer Restrukturierung eines kleinen Kontaktnetzwerks führte, welches die Bindungsaffinität von 

DNMT3A R882H-Dimeren im Vergleich zu WT-Dimeren erhöhte. Da die Flankierungssequenzpräferenz 

von DNMT3A Tetrameren in der 3A-3A-Schnittstelle bestimmt wird, welche vorzugsweise von R882H 



Untereinheiten gebildet wird, wurde der dominante Effekt von R882H in gemischten WT/R882H 

Komplexen durch die Ergebnisse der MD Simulationen erklärt. 

Die biochemische Charakterisierung der PKMT NSD2 und ihrer somatischen Krebsmutation T1150A 

wies eine veränderte Produktspezifität und einen Wechsel von einer Dimethyltransferase zu einer 

Trimethyltransferase auf. Veränderungen im Methylierungsgrad von Histon 3 Lysin 36 (H3K36), ein 

Methylierungsziel von NSD2, sind mit unterschiedlichen biologischen Effekten verbunden. 

Dimethyliertes H3K36 (H3K36me2) und trimethyliertes H3K36 (H3K36me3) weisen unterschiedliche 

Auswirkungen auf die Genexpression und Chromatinstruktur auf. Um die Gründe für die veränderte 

Produktspezifität von NSD2 T1150A zu untersuchen, wurde in dieser Arbeit eine Kombination von MD 

und steered MD (sMD) Simulationsmethoden angewendet. Die Analyse zeigte, dass NSD2 T1150A im 

Gegensatz zu NSD2 WT ein H3K36me2-Peptid und SAM gleichzeitig in einer produktiven Konformation 

im aktiven Zentrum binden konnte. Volumenberechnungen des aktiven Zentrums zeigten, dass 

größere Volumina bei T1150A im Vergleich zu WT häufiger vorkamen und die gemeinsame Bindung 

ermöglicht wurde. Der Grund für die größeren Volumina konnte in einer nachfolgenden 

Kontaktanalyse gefunden werden. In NSD2 WT interagierte T1150 mit Y1092 und L1120, wodurch 

diese Reste ausgerichtet, und das Volumen des aktiven Zentrums reduziert wurde. Die Hydroxylgruppe 

der Seitenkette von T1150 interagierte dabei mit dem Stickstoff der Backbone-Atome von Y1092. 

Zusätzlich war die Methylgruppe der Seitenkette in einem hydrophoben Kontakt mit der Seitenkette 

von L1120. Diese beiden Kontakte gingen in NSD2 T1150A aufgrund der Mutation verloren. Die 

Orientierung von Y1092 und L1120 war dadurch weniger strukturiert, und das Volumen des aktiven 

Zentrums vergrößerte sich. Die vorgestellten Ergebnisse erklären präzise den molekularen 

Mechanismus hinter der experimentell beobachteten veränderten Produktspezifität für NSD2 T1150A. 

Jüngst wurde die Substratspezifität von der PKMT SETD2 mit Hilfe von Celluspots-Peptid-Array-

Methylierung hinsichtlich der natürlichen H3K36-Zielsequenz kartiert. Dabei wurde festgestellt, dass 

die H3-Aminosäuresequenz an einigen Positionen nicht ideal war. Basierend darauf, wurde ein 

artifizielles Peptidsubstrat entworfen, welches an jeder Stelle die optimale Aminosäure enthielt. 

Methylierungsexperimente zeigten, dass das 15 Aminosäuren lange Super-Substratpeptid (ssK36), 

welches an vier Positionen von der natürlichen H3-Sequenz abweicht, mehr als 100-mal schneller 

methyliert wurde als das H3K36-Peptid. Die Kristallstruktur von SETD2 mit gebundenem ssK36-Peptid 

erklärte die stark erhöhte Methylierungsaktivität jedoch nicht vollständig. Der zweite Ansatz dieser 

Arbeit fokussiert sich auf die mechanistischen Gründe hinter diesem massiven Anstieg der 

Reaktionsrate. Um dies zu erreichen, wurde eine Kombination von in vitro Methylierungs- und FRET-

Experimenten sowie MD- und sMD Simulationsmethoden angewendet, um mehrere Schritte des 

katalytischen Prozesses abzubilden. MD Simulationen der freien Peptide in Lösung zeigten, dass ssK36 

präferentiell eine Haarnadelkonformation ausbildete, während H3K36 eher gestreckt vorlag. Diese 



Präferenz basierte auf den vier eingeführten Mutationen. Darüber hinaus wurde in sMD Simulationen 

gezeigt, dass Peptide in Haarnadelkonformationen einen besseren Zugang in das aktive Zentrum von 

SETD2, im Vergleich zu gestreckten Strukturen hatten. Methylierungsexperimente belegten, dass 

durch chemisch induzierte Haarnadelstrukturen in Peptiden, die Methylierungsaktivität von SETD2 

erhöht werden konnte. Zusätzlich wurde in MD Simulationen des ssK36-SETD2-Komplexes beobachtet, 

dass die vier Mutationen ein einzigartiges Kontaktprofil mit SETD2 ausbildeten. Dadurch wurden 

einerseits unterschiedliche aber auch eine höhere Anzahl an Übergangszustands-ähnlichen 

Konformationen im Vergleich zum H3K36-SETD2-Komplex ausgebildet.  

Die Übertragbarkeit dieses Ansatzes wurde durch ein neues Super-Substrat-Peptid, welches speziell 

für NSD2 entworfen wurde, demonstriert. Der molekulare Mechanismus hinter der erhöhten 

Methylierungsrate wurde erneut durch MD Simulationen untersucht. Bemerkenswerterweise zeigten 

SETD2 und NSD2 spezifisch nur für ihr eigenes Super-Substrat eine erhöhte Aktivität und nicht für das 

jeweils andere Super-Substrat. 

Die optimierten Interaktionen der Super-Substratpeptide wurden anschließend als Ausgangspunkt 

verwendet, um einen PKMT-spezifischen Inhibitionstest zu etablieren, bei welchem ssK36 als 

substratkompetitiver SETD2-spezifischer Inhibitor fungierte. 

Zusammenfassend zeigten die in dieser Arbeit durchgeführten MD Simulationen bisher unbekannte 

Gründe für den dominanten Effekt von DNMT3A R882H im heterozygoten Zustand und erklärten die 

veränderte Produktspezifität für NSD2 T1150A. Darüber hinaus wurden die Merkmale des artifiziell 

entworfenen Super-Substratpeptides durch verschiedene MD Simulationsansätze beschrieben und 

durch biochemische Experimente validiert. Die in dieser Arbeit gefundenen molekularen 

Mechanismen für DNMTs und PKMTs erklären biochemische Ergebnisse in atomarer Auflösung und 

zeigen neue Strategien für die Gestaltung einer neuen Klasse von substratkompetitiven PKMT-

Inhibitoren auf. 

  

Acknowledgements 

My deepest appreciation goes to my supervisor Prof. Dr. Albert Jeltsch for more reasons than one could 

list here. I am immensely grateful for the guidance and mentorship throughout my academic journey. 

From the very first lecture in my bachelor's studies to the opportunity to conduct my PhD in his institute 

and our collaborative efforts on publishing papers, patents, and reviews, Professor Jeltsch's support, 

professionalism, and advice have been instrumental in shaping my understanding of science and 

research skills. His commitment to fostering academic growth and excellence has left an indelible mark 

on my academic pursuits, and I am fortunate to have had the privilege of working alongside such a 

dedicated mentor. 

I extend my genuine gratitude to Prof. Dr. Jürgen Pleiss for his persistent support, innovative ideas, 

and key role in building a yet unseen GPU infrastructure tailored for Molecular Dynamics simulation. 

His support greatly facilitated my academic journey. From the beginning of our collaboration, Prof. 

Pleiss has consistently demonstrated a commitment to raising an environment of academic quality.  

I am thankful to Prof. Dr. Martin Zacharias from the Technical University of Munich for serving as my second 

examiner. Special thanks go to Prof. Dr. Markus Morrison for his kind acceptance to participate in my 

examination committee. 

Special thanks to Dr. Sara Weirich for her guidance in the lab, which made me more confident and 

independent. Additionally, I want to highlight Dr. Mina Saad Khella and Michael Choudalakis, who are 

not only co-authors in publications, but also showed incredible support in developing techniques and 

approaches presented in this work. Moreover, thanks to the rest of the PKMT-group for the teamwork 

atmosphere, valuable discussions and continuous help. Additionally, I am thankful to Dr. Philipp 

Rathert for valuable comments in seminars and meetings. I would like to thank the people behind the 

scenes for making the work smooth and perfectly organized: Priv. Doz. Dr. Hans Rudolph, Elisabeth 

Tosta and Lea Irsigler. Very special thanks to Regina, Dragica and Branka for all their helping hands, nice 

talks and laughs. 

I would like to express my sincere gratitude to SimTech EXC 2075 390740016 for providing the financial 

support that enabled me to pursue my PhD. I especially thank my project network PN2-5 for the insightful 

discussions. 

My appreciation goes to Dr. Sven Benson for his pivotal role in sharpening my understanding of 

challenges outside of the university. His mentorship not only strengthened my scientific courage but 

also exposed me to the significance of navigating challenges within “yesterday” timelines. 

I am grateful for Dr. Peter Stockinger, my steadfast companion, from the moment we first met until 

the present. He has been more than a study friend – he has been my partner in crime. His enduring 



support and camaraderie have made the academic journey into something more, turning challenges into 

shared triumphs. 

I am deeply thankful for the committed support of my parents throughout my academic journey. Their 

encouragement and sacrifices have been the bedrock of my success, with my father's profound 

wisdom and guidance serving as a beacon, steering me through challenges and enriching my personal 

and scholarly growth. 

 

List of publications 

Schnee P, Choudalakis M, Weirich S, Khella M. S, Carvalho H, Pleiss J, Jeltsch A*, (2022) Mechanistic  

  basis of the increased methylation activity of the SETD2 protein lysine methyltransferase  

  towards a designed super-substrate peptide. Commun. Chem. 5, 139,  

  doi.org/10.1038/s42004-022-00753-w 

 
Schnee P, Weirich S, Jeltsch A*, (2023) Charakterisierung der Substratspezifität von Protein  

  Methyltransferasen – Methoden und Anwendungen. BIOspektrum 29, 249-251,  

  doi.org/10.1007/s12268-023-1930-y 

 
Schnee P, Pleiss J, Jeltsch A*, (2024) Approaching the catalytic mechanism of protein lysine  

  methyltransferases by biochemical and simulations techniques. Critical Reviews In  

  Biochemistry & Molecular Biology 7, 1-49, doi.org/10.1080/10409238.2024.2318547 

 
Schnee P, Jeltsch A, Weirich S (2023) Artifizielles Peptid mit PKMT-inhibitorischer Wirkung - Universität  

  Stuttgart PCT-Patentanmeldung 

 
Khella M. S., Schnee P, Weirich S, Bui T, Bröhm A, Bashtrykov P, Pleiss J, Jeltsch A*, (2023) The T1150A  

  cancer mutant of the protein lysine dimethyltransferase NSD2 can introduce H3K36  

  trimethylation. J Biol Chem 104796, doi.org/10.1016/j.jbc.2023.104796 

 
Weirich S, Kusevic D, Schnee P, Reiter J, Pleiss J, Jeltsch A*, (2024) Discovery of new NSD2 non-histone  

  substrates and design of a super-substrate – Communications Biology, Manuscript submitted  

  for publication. 

 
Mack A, Emperle M, Schnee P, Adam S, Pleiss J, Bashtrykov P, Jeltsch A*, (2022) Preferential Self- 

  interaction of DNA Methyltransferase DNMT3A Subunits Containing the R882H Cancer  

  Mutation Leads to Dominant Changes of Flanking Sequence Preferences. J Mol Biol  

  15;434(7):167482, doi.org/10.1016/j.jmb.2022.167482. 

  

Author contributions 

Schnee et al. 2022, Mechanistic basis of the increased methylation activity of the SETD2 protein lysine 

methyltransferase towards a designed super-substrate peptide: P.S. conducted the MD simulation and 

fluorescence spectroscopy experiments. P.S. and A.J. did the data analysis and interpretation of the 

data. P.S. and A.J. prepared the manuscript and figures. 

 
Schnee et al. 2023, Charakterisierung der Substratspezifität von Protein Methyltransferasen Methoden 

und Anwendungen: P.S. created the first draft of the manuscript. 

 
Schnee et al. 2024, Approaching the catalytic mechanism of protein lysine methyltransferases by 

biochemical and simulations techniques: P.S. and A.J. prepared the draft manuscript. P.S. prepared 

the figures. 

 
Khella et al. 2023, The T1150A cancer mutant of the protein lysine dimethyltransferase NSD2 can 

introduce H3K36 trimethylation: P.S. designed and conducted the MD simulation experiments and

data analysis thereof. P.S. prepared the draft and figures for the MD simulation part of the manuscript. 

 
Weirich et al. 2024, Discovery of new NSD2 non-histone substrates and design of a super-substrate: 

S.W. and A.J. devised the study. D.K. and S.W. conducted the biochemical experiments. P.S. conducted 

the MD simulations and the data analysis thereof. P.S. prepared the draft and figures for the MD 

simulation part of the manuscript. 

 
Mack et al. 2022, Preferential Self-interaction of DNA Methyltransferase DNMT3A Subunits Containing 

the R882H Cancer Mutation Leads to Dominant Changes of Flanking Sequence Preferences: P.S. 

performed and analyzed the MD simulations. P.S. prepared the draft and figures for the MD simulation 

part of the manuscript. 

 

List of figures 

Figure 1: DNA and protein methylation influence gene transcription by recruiting chromatin remodeling enzymes

Figure 2: DNA Methyltransferase (DNMT) 3A and its DNMT3L cofactor form a heterotetramer

Figure 3: Protein Lysine Methyltransferases (PKMTs) transfer up to three methyl groups to specific lysine residues in proteins

Figure 4: Binding mode of cofactor SAM and protein substrate for SET and non-SET domain-containing PKMTs

Figure 5: Cartoon representation of multiple SET domain PKMT architectures

Figure 6: The autoinhibitory loop (AL) and placeholder residue need conformational changes to overcome autoinhibition

Figure 7: PKMTs deprotonate the target lysine prior to the methyl group transfer

Figure 8: Geometric criteria for a bimolecular nucleophilic substitution (SN2) mechanism

Figure 9: Substrate specificity profile of SETD2 led to the super-substrate peptide (ssK36)

Figure 10: Proposed control mechanism for the product specificity of PKMTs

Figure 11: Reaction mechanisms of lysine demethylation

Figure 12: Binding mode of PKMTs to nucleosomal substrates

Figure 13: Molecular dynamics simulations use bonded and non-bonded forces to model the interactions between atoms

Figure 14: DNMT3A R882H establishes more and different contacts in the RD interface and different interactions with the DNA

Figure 15: Product specificity change of NSD2 T1150A and NSD1 T2029A compared to WT enzymes on H3.1 protein and nucleosomes in vitro

Figure 16: sMD simulation of SAM association into the complex of either NSD2 WT or T1150A with a H3K36me1 or me2 peptide substrate

Figure 17: Measurement of the active site pocket volume in NSD2 WT and T1150A complexes

Figure 18: The natural H3K36 peptide differs at four positions from the artificially designed super-substrate peptide (ssK36)

Figure 19: Clustering of the H3K36 and ssK36 peptide conformations in solution observed in MD simulations reveals a hairpin conformation preference for ssK36

Figure 20: H3K36 and ssK36 peptides show different conformational preferences in solution

Figure 21: Hairpin conformations facilitate the access of peptides into the binding cleft of SETD2

Figure 22: H3K36 and ssK36 peptides unfold upon binding to SETD2

Figure 23: Hairpin formation and resolution upon binding increase SETD2 methylation activity

Figure 24: The complex of SETD2-ssK36 established significantly more TS-like conformations than SETD2-H3K36

Figure 25: Contact profiles of the H3K36 and ssK36 peptides bound to SETD2 observed in the MD simulations

Figure 26: The enhanced methylation activity of SETD2 towards ssK36 can be summarized as four steps

Figure 27: The ssK36 peptide functions as a substrate-competitive inhibitor for SETD2

Figure 28: The complex of NSD2 and ssK36(NSD2) establishes more TS-like conformations than the NSD2-H3K36 complex

Figure 29: Contact profile analysis reveals different contact maps for H3K36 and ssK36 in complex with NSD2

Figure 30: The H3K36 and ssK36(NSD2) peptide establish different contacts with NSD2

Figure 31: Possible mechanism for the binding of a PKMT towards a nucleosome and the recognition of the target lysine

 

List of abbreviations 

Ade   Adenine 

AL    Autoinhibitory Loop 

AML   Acute Myeloid Leukemia 

AOL   Amine Oxidase Like 

ASH1L   Absent, Small, or Homeotic Discs 1-Like 

ASH2L   Absent, Small, or Homeotic Discs 2-Like 

AWS    Associated With SET 

CpG   Cytosine-Guanine 

CTD   C-terminal Domain 

Cyt   Cytosine 

DNA   Deoxyribonucleic Acid

DNMT1   DNA Methyltransferase 1 

DNMT3A   DNA Methyltransferase 3A 

DNMT3B   DNA Methyltransferase 3B 

DNMT3L  DNMT3-like 

DNMTs    DNA Methyltransferases 

DTT   Dithiothreitol 

DOT1L   Disruptor of Telomeric silencing 1-like 

EDTA   Ethylenediaminetetraacetic Acid 

EM   Electron Microscopy 

EZH2   Enhancer of Zeste Homolog 2 

FAD   Flavin Adenine Dinucleotide

FRET    Förster Resonance Energy Transfer 

GLP   G9a Like Protein 

GPU   Graphics Processing Unit

GST   Glutathione-S-Transferase 

Gua   Guanine 

G9a   aka EHMT2 (Euchromatic histone-lysine N-methyltransferase 2) 

H1   Histone H1 

H2A   Histone H2A 

H2B   Histone H2B 

H3   Histone H3 



H3K27me  Histone H3 lysine 27 methylation 

H3K36me   Histone H3 lysine 36 methylation

H3K4me  Histone H3 lysine 4 methylation

H3K9me   Histone H3 lysine 9 methylation 

H4    Histone H4 

HDACs    Histone Deacetylases

HP1   Heterochromatin Protein-1 

KDM   Lysine Demethylase 

Kme1   Monomethyl lysine 

Kme2   Dimethyl lysine 

Kme3   Trimethyl lysine 

LSD   Lysine Specific Demethylase 

MALDI    Matrix-assisted Laser Desorption/Ionization 

MBD   Methylated DNA Binding Domain 

MD   Molecular Dynamics 

MLL   Mixed Lineage Leukemia 

MYND   Myeloid Translocation Protein 8, Nervy, and DEAF-1 

NSD1/2/3  Nuclear Receptor-Binding SET Domain Protein 1/2/3

PDB   Protein Data Bank 

PMT   Protein Methyltransferase 

PKMT   Protein Lysine Methyltransferase 

PRDM9   PR Domain Containing 9 

PTM   Post Translational Modification 

PWWP    Proline-Tryptophan-Tryptophan-Proline domain 

QM/MM  Quantum Mechanics/Molecular Mechanics 

RNA   Ribonucleic Acid

RNAP II   RNA Polymerase II

SAH    S-Adenosyl-L-Homocysteine 

SAM    S-Adenosyl-L-Methionine 

SET   Suppressor of Variegation 3-9, Enhancer of Zeste, Trithorax 

SETD2   SET Domain-containing protein 2 

SET-I   SET Insertion domain 

sMD   Steered Molecular Dynamics 

SMYD   SET and MYND Domain-containing protein 



SN2   Bimolecular Nucleophilic Substitution

SRI   Set2–Rpb1 Interaction Domain 

ssK36   Super-substrate around lysine 36 

STAT1/3  Signal Transducer and Activator of Transcription 1/3 

SUV39H1/2  Suppressor of Variegation 3-9 Homolog 1/2 

TETs   Ten-Eleven Translocation enzymes

Thy   Thymine 

TS   Transition State 

WHS   Wolf-Hirschhorn Syndrome 

WRAD    WDR5, RbBP5, Ash2L and Dpy30 complex 

5caC   5-carboxy cytosine 

5fC   5-formyl cytosine 

5hmC   5-hydroxymethyl cytosine 

5mC    5-methyl cytosine 

 

1. Introduction  

1.1. Epigenetics 

Every cell has its genetic information encoded as the base pair sequence in the DNA. However, cellular 

differentiation is driven by differences in the expression of genes. Epigenetics describes the 

mechanisms of these often stable, but still reversible, changes in gene expression patterns that do not 

involve alterations in the DNA sequence (Allis & Jenuwein, 2016). How the expression of genes is 

regulated under certain circumstances is one of the key questions in epigenetics. The highly regulated

and reversible changes add a dynamic layer of complexity beyond the static genetic code. A strictly

unidirectional flow of information from DNA to RNA to protein is therefore no longer an adequate

description, since proteins themselves regulate gene expression and react to environmental changes.

 
1.2. Chromatin structure and its regulation 

Chromatin is the complex of proteins and DNA within the nucleus of eukaryotic cells. The smallest 

structural units of chromatin are nucleosomes, which consist of a stretch of about 146 base pairs of 

DNA wrapped around an octamer of histone proteins. The octamer contains two copies of each of the

core histones H2A, H2B, H3, and H4. Nucleosomes are connected by linker DNA segments varying

in length, forming a “beads-on-a-string” structure. This linear arrangement of nucleosomes can be 

further compacted to form higher-order chromatin structures, such as 30 nm chromatin fibers and 

highly condensed chromosomes. This leads to a remarkable size reduction, as the DNA of a single

human cell would stretch over 2 meters if lined up end to end. The compaction enables the DNA to

fit into the eukaryotic nucleus with a diameter of 5-16 µm. Besides space optimization, chromatin

compaction plays a pivotal role in the regulation of gene expression. In condensed chromatin regions, 

nucleosomes are packed tightly, restricting the access of other proteins responsible for e.g. 

transcription, DNA replication and repair. This repressed state of chromatin is referred to as 

heterochromatin. Chromatin regions that are less condensed and accessible for gene transcription are 

referred to as euchromatin. The conversion of heterochromatin to euchromatin, or vice versa, is 

regulated by modifications at the DNA and protein level (Fig. 1).
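The scale of this compaction can be checked with a short back-of-the-envelope calculation. The sketch below assumes a diploid genome of roughly 6.4 x 10^9 base pairs and the canonical B-DNA rise of 0.34 nm per base pair. Both values are textbook approximations that are not stated in this work, and all names are illustrative.

```python
# Back-of-the-envelope estimate of DNA compaction in a human nucleus.
# Assumed textbook values (not taken from this thesis):
BASE_PAIRS_DIPLOID = 6.4e9   # approx. base pairs per diploid human cell
RISE_PER_BP_NM = 0.34        # axial rise per base pair in B-DNA, in nm


def dna_contour_length_m(base_pairs, rise_nm=RISE_PER_BP_NM):
    """Return the stretched-out (contour) length of the DNA in meters."""
    return base_pairs * rise_nm * 1e-9


def linear_compaction(length_m, nucleus_diameter_um):
    """Return the ratio of DNA contour length to nuclear diameter."""
    return length_m / (nucleus_diameter_um * 1e-6)


length = dna_contour_length_m(BASE_PAIRS_DIPLOID)
print(f"DNA contour length: {length:.2f} m")
for diameter_um in (5.0, 16.0):  # nuclear diameters quoted in the text
    ratio = linear_compaction(length, diameter_um)
    print(f"nucleus {diameter_um:4.1f} um: length/diameter = {ratio:.1e}")
```

Under these assumptions the DNA is about five orders of magnitude longer than the nucleus is wide, which illustrates why several hierarchical levels of compaction are required.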



 
Figure 1: DNA and protein methylation influence gene transcription by recruiting chromatin remodeling 

enzymes. Chromosomes consist of smaller subunits called nucleosomes, which themselves consist of a protein 

octamer with DNA wrapped around it. DNA methylation at cytosines, and lysine methylation of histone proteins 

are epigenetic modifications that are read by reader proteins, which recruit chromatin remodeling enzymes.

Depending on the actual modifications, the chromatin structure is tightened or loosened, directly

influencing gene transcription.

 
1.3. DNA methylation 

DNA methylation occurs at the fifth carbon (C5) in the pyrimidine base cytosine. The methylation 

reaction is catalyzed by DNA methyltransferases (DNMTs) using the cofactor S-adenosyl-L-methionine 

(SAM) as a methyl group donor, which is converted to S-adenosyl-L-homocysteine (SAH). DNA 

methylation in gene promoters is correlated with silenced gene expression. In general, it contributes

to the formation of heterochromatin and is a more stable modification compared to protein methylation.

Moreover, DNA methylation is a crucial signal in many biological processes including development and 

gametogenesis, parental imprinting, X chromosome inactivation, as well as maintenance of genome 

integrity (Jurkowska, Jurkowski, et al., 2011). Cytosine-guanine (CpG) islands are genomic regions with 

a high frequency of cytosine-guanine dinucleotides, often associated with gene promoters and 

commonly unmethylated. At promoters, DNA methylation serves as a repressive signal, hindering the 

interaction of transcriptional activators and facilitating the recruitment of transcriptional repressors 

that incorporate methylated DNA binding domains (MBDs) (Razin & Riggs, 1980; Tate & Bird, 1993; Yin 

et al., 2017). Repressed promoters with methylated CpG islands are found in regions where silencing


is desired, like centromeric heterochromatin, imprinted genes or transposons (Howard et al., 2008; 

Jurkowska, Jurkowski, et al., 2011). 
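Whether a sequence window has a "high frequency" of CpG dinucleotides is commonly quantified by the observed-to-expected CpG ratio. The sketch below uses the widely cited Gardiner-Garden and Frommer criteria (GC content above 50 %, observed/expected ratio above 0.6, length of at least 200 bp). These thresholds and the function names are assumptions chosen for illustration and are not taken from this work.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were distributed independently: c * g / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, GC content and obs/exp thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


print(cpg_stats("CG" * 100))                # CpG-rich toy sequence
print(looks_like_cpg_island("CAGT" * 50))   # no CpG dinucleotides
```

Real CpG island annotation slides such a window along the genome and merges overlapping hits. The sketch only evaluates a single window.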

 
1.3.1. DNA methyltransferases 

In humans, three DNMTs catalyze the methylation of cytosine (Gowher & Jeltsch, 2018). Whereas DNA 

methyltransferase 1 (DNMT1) maintains the DNA methylation after DNA replication and works 

preferentially on hemimethylated CpG dinucleotide sites (Bestor et al., 1988; Fatemi et al., 2001; Goyal 

et al., 2006), DNA methyltransferase 3A and 3B (DNMT3A and 3B) are responsible for de novo DNA 

methylation. De novo methylation is important during early development, germ cell differentiation as 

well as imprinting, a process in which specific genes are marked with methyl groups based on their 

parental origin (Gowher & Jeltsch, 2001; Okano et al., 1998). DNMT3A and 3B methylate CpG and

non-CpG sites and have no preference for hemimethylated sequences, distinguishing them from DNMT1

(Gowher & Jeltsch, 2001). The DNMT3-like (DNMT3L) protein lacks catalytic activity, but serves as a 

scaffold protein for DNMT3A and DNMT3B, enhancing their de novo DNA methylation activity 

(Bourc'his et al., 2001). The catalytically active C-terminal domain (CTD) of DNMT3A in complex with 

the CTD of DNMT3L forms a linear heterotetrameric complex with the two DNMT3A subunits in the 

center and the DNMT3L at the edges (3L-3A-3A-3L, Fig. 2A) (Jia et al., 2007). The different subunits are 

connected by two types of interfaces: the DNMT3A/3L interface, called the FF interface, and the central

interface between the DNMT3A subunits, denoted as the RD interface. The binding of DNMT3L at the FF interface

helps to organize the active site and SAM binding pocket of DNMT3A, which explains the stimulation 

of DNMT3A activity (Jia et al., 2007). Crystal structures of the DNMT3A/3L complex bound to DNA 

showed that the DNMT3L subunits are not in contact with DNA, whereas the two DNMT3A subunits of 

the tetramer interact with two CpG sites of the substrate DNA, which involves the flipping of the target 

bases (Fig. 2B) (Zhang et al., 2018). The FF interface also supports DNMT3A/3A interactions, allowing 

the replacement of DNMT3L subunits in the DNMT3A/3L heterotetramer by two DNMT3A subunits 

yielding a DNMT3A homotetramer (Jurkowska et al., 2008; Jurkowska, Rajavelu, et al., 2011). 

DNA methylation is a dynamic process and can be reversed either passively during DNA replication or 

actively by DNA-demethylating enzymes called ten-eleven translocation enzymes (TETs). TET enzymes

oxidize 5-methyl cytosine (5mC) in progressive oxidation reactions resulting in 5-hydroxymethyl 

cytosine (5hmC), 5-formyl cytosine (5fC) and 5-carboxy cytosine (5caC) (Ito et al., 2011; Tahiliani et al., 

2009). 



 
Figure 2: DNA Methyltransferase (DNMT) 3A and its DNMT3L cofactor form a heterotetramer. A| Schematic 

representation of the tetrameric DNMT3A (blue) – DNMT3L (cyan) (DNMT3A/3L) complex. B| Cartoon

representation of the DNMT3A/3L complex with SAM (yellow), bound DNA and flipped out cytosine (orange) 

(Protein Data Bank (PDB) 6W8B). The positions of the two FF interfaces between DNMT3A and DNMT3L subunits 

and the RD interface between the central DNMT3A subunits are indicated as black, dashed lines. Figure taken 

and modified from (Mack et al., 2022). 

 
1.4. Lysine methylation 

SAM-dependent methylation occurs not only in DNA but also in proteins or peptides and can be found 

at side chains of lysine (K), arginine (R), aspartate (D), glutamate (E), histidine (H), asparagine (N), 

glutamine (Q), and cysteine (C) (Clarke, 2013). Due to the lone-pair electrons present in the ε-amine of 

lysine, as well as its preference for localization on the protein surface, lysine residues are a favorable 

target for posttranslational modifications (PTMs) (Luo, 2018). Protein lysine methylation stands apart

from other types of modifications like acetylation, ubiquitination or SUMOylation for three reasons. 

Firstly, the addition of methyl groups to lysine does not affect the overall charge of the residue at 

physiological pH, unlike acylation modifications that convert the positively charged ε-amine into a

neutral amide. Secondly, lysine methylation represents the smallest PTM, resulting in only minor 

changes in the size of the side chain compared to other types of lysine modifications (Luo, 2018). 

Thirdly, up to three methyl groups can be transferred to a target lysine creating monomethyl lysine 

(Kme1), dimethyl lysine (Kme2) and trimethyl lysine (Kme3). 
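The first two points can be made concrete by comparing the formal charge and the mass shift of lysine methylation states with acetylation. The sketch below uses standard monoisotopic mass shifts (+14.01565 Da per methyl group, +42.01057 Da for acetylation). These are general reference values rather than results of this work, and the dictionary layout is purely illustrative.

```python
# Net monoisotopic mass gain per modification (reference values, assumed here)
METHYL_SHIFT_DA = 14.01565   # CH2: one H replaced by CH3
ACETYL_SHIFT_DA = 42.01057   # C2H2O: one H replaced by COCH3

# state -> (mass shift in Da, formal side-chain charge at physiological pH)
LYSINE_STATES = {
    "K":    (0.0, +1),
    "Kme1": (1 * METHYL_SHIFT_DA, +1),
    "Kme2": (2 * METHYL_SHIFT_DA, +1),
    "Kme3": (3 * METHYL_SHIFT_DA, +1),
    "Kac":  (ACETYL_SHIFT_DA, 0),   # amide formation neutralizes the amine
}

for name, (shift, charge) in LYSINE_STATES.items():
    print(f"{name:5s} mass shift = {shift:8.5f} Da, charge = {charge:+d}")
```

The table also shows that trimethylation and acetylation differ by less than 0.04 Da, so the two modifications are distinguished by their charge and chemistry rather than by their size.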

Methyl lysine recognition is challenging, since lysine methylation only subtly alters the physicochemical

properties. Still, the methylated lysines’ ability to engage in cation−π interactions (increased dispersion 



of the positive charge around neighboring hydrocarbons) and the ability to form hydrogen bonds as a 

donor and acceptor is reduced with progressive methylation providing some options for discrimination 

and readout (Luo, 2018). A common characteristic of methyl lysine-specific reader proteins is the

recognition of Kme2 and Kme3 groups through a hydrophobic pocket containing aromatic

residues (e.g., F, Y, and W). Given that K, Kme1, Kme2, and Kme3 all carry an overall +1 formal charge 

at physiological pH, the aromatic pocket serves as a binding site for cation−π interactions (Luo, 2018). 

 
1.4.1. Histone lysine methylation 

Histone tails are the flexible ends of the histone proteins extending from the histone octamer and are 

a key target for lysine methylation (Fig. 1). The methylated lysine residues serve as a platform for the 

recruitment of proteins and protein complexes that interpret and regulate this modification. Such 

effector proteins contain specific domains which recognize the position of the lysine residue in the 

histone tail sequence and its methylation state (Cornett et al., 2019; Hyun et al., 2017; Yun et al., 2011). 

Eventually, a signal cascade of downstream effects is triggered, influencing the activity of chromatin 

remodelers, which alter the accessibility of DNA and thus gene transcription (Fig. 1). In the case of 

histone 3 lysine 9 (H3K9) methylation, found in constitutive heterochromatin, a cooperative 

mechanism involving other histone modifications and DNA methylation is found to trigger gene 

silencing. Protein lysine methyltransferases like SUV39H1 and SUV39H2 deposit H3K9 methylation,

subsequently recognized by heterochromatin protein 1 (HP1) via an aromatic cage in its chromo 

domain (Kumar & Kono, 2020). HP1 binds H3K9me2/3 and acts as a transcriptional repressor by 

preventing the association of transcription factors and RNA polymerase (Schoelz & Riddle, 2022). 

Beyond the steric effect, HP1 further recruits DNMTs, which methylate CpG sites adjacent to the 

methylated lysine. The methylated DNA then acts as a foundation for the binding of MBD proteins, which in turn

recruit histone deacetylases (HDACs) (Jones et al., 1998). HDACs remove histone acetylation, thereby

increasing the histone’s positive charge (Bannister & Kouzarides, 2011). This strengthens the 

interaction with the negatively charged DNA sugar-phosphate backbone, leading to chromatin 

compaction (Bannister & Kouzarides, 2011). 

 
1.4.2. Non-histone protein lysine methylation 

In addition to lysine methylation at histone tails, this modification has also been found on non-histone proteins

like p53 (West & Gozani, 2011), E2F1 (Couture et al., 2006), STAT3 (Jinbo Yang et al., 2010) and the 

androgen receptor (Gaughan et al., 2011; Ko et al., 2011). Lysine methylation of non-histone proteins 

influences their functionality in multiple ways (Hamamoto et al., 2015). The methylation can serve as 

a signal to deploy other PTMs including ubiquitination, thereby affecting e.g. protein stability. Proteins 



binding to methylated residues or other deployed PTMs can (i) stimulate or inhibit the target protein, (ii)

regulate protein-protein interactions, or (iii) affect the subcellular localization.

 
1.5. Protein lysine methyltransferases 

The transfer of methyl groups from SAM to proteins is catalyzed by enzymes called protein 

methyltransferases (PMTs). If the methylated amino acid is a lysine residue, they are denoted as 

protein lysine methyltransferases (PKMTs) (Fig. 3). Over 60 characterized PKMTs are encoded in the 

human genome and can be categorized into two classes: SET domain-containing PKMTs (class V 

methyltransferases, named SET after the Drosophila proteins Suppressor of

variegation 3-9, Enhancer of zeste, and Trithorax, in which the domain was discovered) and non-SET domain PKMTs (class I

methyltransferases, also called 7-beta-strand MTases) (Copeland et al., 2009; Falnes et al., 2016; Luo,

2012; Richon et al., 2011). Notably, more than 90% of PKMTs belong to the SET domain family (Luo, 

2018). Still, non-SET domain-containing PKMTs, and especially disruptor of telomeric silencing 1-like

(DOT1L) as the most-studied member of this family, were shown to have major roles in

cellular processes and disease development (McLean et al., 2014; Nguyen & Zhang, 2011; Sarno et al.,

2020). 

PKMTs generally display a high specificity, targeting only defined lysine residues in one or few substrate 

proteins. Remarkably, histone lysine methylation is redundant, meaning that one lysine can be

methylated by more than one PKMT. This redundancy has advantages: (i) different enzymes can be 

differentially regulated leading to a dramatic increase in the complexity of the regulatory network; (ii) 

PKMTs with a redundant substrate specificity can be recruited to different genomic loci like enhancers, 

promoters or gene bodies; (iii) PKMTs with redundant substrate specificities can transfer varying 

numbers of methyl groups. For example, PKMTs NSD1 (aka KMT3B), NSD2 (aka MMSET, WHSC1), NSD3 

(aka WHSC1L1) and ASH1L transfer up to two methyl groups to histone H3 lysine 36 (H3K36) while 

SETD2 (aka KMT3A, HYPB, SET2) catalyzes trimethylation at the same lysine residue (Edmunds et al., 

2008; Gregory et al., 2007; Li et al., 2009).  

PKMTs transfer up to three methyl groups to a target lysine. They can do so in two different ways. In a 

distributive mechanism, each round of catalysis results in product dissociation and rebinding of a fresh 

substrate is needed for a second turnover. Hence, each methylation event is independent leading to 

the stochastic generation of Kme1, Kme2 and Kme3, depending on the product specificity of the PKMT. 

In contrast, in a processive reaction mechanism, multiple rounds of catalysis proceed on the same 

substrate before dissociation of the product (Gowher & Jeltsch, 2001; van Dongen et al., 2014).  
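The difference between the two mechanisms can be illustrated with a toy stochastic model. In the sketch below, each enzyme-substrate encounter adds exactly one methyl group in the distributive case, whereas in the processive case methyl groups are added up to the maximal product state before the product dissociates. The encounter model, the function name and all parameters are hypothetical simplifications and do not represent the kinetics of any specific PKMT.

```python
import random
from collections import Counter


def simulate(n_substrates, n_encounters, max_methyl, processive, seed=0):
    """Return a Counter mapping methylation state (0..max_methyl) to counts."""
    rng = random.Random(seed)
    states = [0] * n_substrates            # all lysines start unmethylated
    for _ in range(n_encounters):
        i = rng.randrange(n_substrates)    # enzyme binds a random substrate
        if states[i] >= max_methyl:
            continue                       # product specificity limit reached
        if processive:
            states[i] = max_methyl         # several transfers per binding event
        else:
            states[i] += 1                 # one transfer, then dissociation
    return Counter(states)


dist = simulate(1000, 1500, max_methyl=3, processive=False)
proc = simulate(1000, 1500, max_methyl=3, processive=True)
print("distributive:", dict(sorted(dist.items())))
print("processive:  ", dict(sorted(proc.items())))
```

In this toy model the distributive enzyme produces a mixture of Kme1, Kme2 and Kme3, while the processive enzyme releases only unmodified and fully methylated lysines. Note that an encounter in the processive case stands for several catalytic turnovers, so the total number of transferred methyl groups differs between the two runs.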

 

 
Figure 3: Protein Lysine Methyltransferases (PKMTs) transfer up to three methyl groups to specific lysine 

residues in proteins. The cofactor S-adenosyl-L-methionine (SAM) provides the methyl group. It is released after 

the transfer as S-adenosyl-L-homocysteine (SAH). Figure taken from (Schnee et al., 2023). 

 
1.5.1. Structure of SET domain PKMTs 

The SET domain of PKMTs is responsible for the methylation activity of this class of enzymes. It consists 

of approximately 130 amino acids, is often flanked by a pre-SET and post-SET domain (Qian & Zhou, 

2006) and sometimes contains the domain insertion SET-I. SET domain-containing PKMTs bind the 

protein substrate and the methyl group-donating cofactor SAM at opposing binding faces (Cheng et

al., 2005). This contrasts with non-SET domain-containing PKMTs, like DOT1L, where the protein

substrate and SAM are accommodated within a single, extended binding cleft (Min et al., 2003).  

In the SET domain of PKMTs, the target lysine is brought in close proximity to the SAM methyl group 

through a hydrophobic tunnel. Here, the lysine hydrocarbon side chain interacts with tyrosine, 

phenylalanine and tryptophan residues via hydrophobic interactions (Qian & Zhou, 2006; Trievel et al., 

2003). The positively charged ε-amine group interacts with these residues through cation-π 

interactions (Luo, 2018). After insertion, the ε-amine group is oriented by multiple tyrosine residues 

and primed for the methyl group transfer. Meanwhile, SAM binds at the opposing site via contacts 

with its nucleobase and sugar moiety. The methyl group is then inserted into the active site and 

transferred to the deprotonated lysine ε-amine group (Fig. 4). The detailed mechanistic features of 

different SET domain architectures, autoinhibition, lysine deprotonation and methyl group transfer are 

described in the following chapters. 



 
Figure 4: Binding mode of cofactor SAM and protein substrate for SET and non-SET domain-containing PKMTs. 

A| In SET domain PKMTs, the protein substrate (cyan) and SAM (orange, methyl group is colored black) bind at 

opposing sites. The target lysine (pink) is inserted into a narrow tunnel, where it undergoes deprotonation and 

is oriented for the methyl group transfer (image created using simulation results of PDB 6VDB). B| The non-SET 

domain-containing PKMT DOT1L binds the target lysine (pink) and cofactor SAM (orange, methyl group is colored 

black) in the same pocket (PDB 1NW3). The architecture consists of a DOT1L-specific region (yellow), a seven-stranded

beta-sheet Rossmann fold (white) and a ubiquitin interaction region (forest green).

 

1.5.2. Different structural arrangements of SET domain PKMTs 

Phylogenetic analysis of SET domain sequences revealed that human SET domain PKMTs can be 

classified into subfamilies, each characterized by unique architectures (Wu et al., 2010). G9a (aka 

EHMT2, KMT1C), SUV39H1 (aka KMT1A) and SUV39H2 (aka KMT1B) belong to the classical PKMT 

subfamily, where their SET domains catalyze the methyl group transfer without prior conformational 

changes (Fig. 5A) (Schnee et al., 2023; Tachibana et al., 2001). In contrast, NSD1, NSD2, NSD3 and 

SETD2 are part of the PKMT subfamily with an autoinhibitory loop (AL) (Fig. 5B) (An et al., 2011; 

Bennett et al., 2017; Yang et al., 2016). In this subfamily, the apo form of the SET domain is expected 

to have a highly reduced activity and needs to undergo conformational changes for substrate binding 

and enzyme activity. Other PKMTs act in complexes with additional proteins or contain specific 

domains to: (i) bind to certain structures like nucleosomes; (ii) recognize specific modifications on the 

substrate; and/or (iii) regulate their own activity. Examples of this are the mixed lineage leukemia (MLL)

SET domains, which are inactive on their own but become catalytically active in the presence of binding 

partners such as WDR5, RbBP5, ASH2L, and DPY30, collectively referred to as WRAD (Fig. 5C) (Borkin 

et al., 2015; Cao et al., 2014; Grebien et al., 2015).  

The SET domain of PKMTs can feature insertions like the MYND domain (Myeloid translocation protein 

8, Nervy and DEAF-1). PKMTs with a MYND domain insertion represent the “SET and MYND Domain-

containing protein” (SMYD) subfamily. The MYND domain is responsible for protein-protein 

interactions possibly recruiting the enzymes to specific substrate proteins. Additionally, SMYD 

enzymes are characterized by a bilobal architecture with the protein substrate in the middle (Fig. 5D) 

(Ferguson et al., 2011; Mazur et al., 2014; Mzoughi et al., 2016; Saddic et al., 2010; Sirinupong et al., 

2011; Sirinupong et al., 2010).  

 

 
Figure 5: Cartoon representation of multiple SET domain PKMT architectures. A| SET domain-containing PKMT 

G9a complexed with the 9 amino acid long H3K36 peptide (cyan) with the target lysine (pink), and cofactor SAM 

(orange, PDB 5JIY). SET domain-containing PKMTs incorporate zinc ions for structural stability in their “associated 

with SET” (AWS) domain (magenta), post-SET (yellow) or MYND (rose) domain depending on the enzyme (Dillon 

et al., 2005; Wu et al., 2011). However, they are not involved in catalysis or conformational changes. For 

simplicity, zinc ions are therefore not shown in protein structures presented in this work. B|SETD2 complexed 

with the 14 amino acid long H3K36 peptide, and cofactor SAM (PDB 5JLB). The autoinhibitory loop (rose) is in an 

open position to accommodate the protein substrate. C| MLL1 SET domain (white) associated with WDR5 

(green), RbBP5 (light blue), ASH2L (rose) and DPY30 (cyan) bound to a nucleosome core particle (PDB 6PWV). 

MLL1 SET domain complexed with the 8 amino acid long H3K4 peptide (PDB 6UH5). D| SMYD2 complexed with 

the 10 amino acid long ERα peptide (PDB 4O6F). Distinct features are the bilobal or clamshell-like structure and 

the MYND domain. Figure taken from (Schnee et al., 2023). 



1.5.3. Autoinhibition of SET domain PKMTs 

NSD1 was one of the first PKMTs for which an AL was described. Crystal structure analysis and 

Molecular Dynamics (MD) simulations of NSD1 with bound cofactor SAM, but without bound 

substrate, showed that a loop of approx. 14 amino acids is placed on top of the substrate binding cleft, 

effectively blocking the entrance of a target peptide (Fig. 5B) (Trievel et al., 2003; Xiao et al., 2003; 

Zhang et al., 2003). The AL is positioned between the SET and Post-SET domain. Multiple PKMTs were 

shown to have an AL in their structure, but their sequences are not conserved (Couture et al., 2005; 

Xiao et al., 2005). Studies on ASH1L demonstrated that stabilizing the closed position of the AL, 

achieved by enforcing hydrophobic interactions between AL and enzyme through mutations, 

decreased the ASH1L methylation activity (Rogawski et al., 2015). Adding to this, the AL was speculated 

to regulate the product specificity of PKMTs. Mutational studies of the ASH1L AL turned the enzyme 

from a dimethyltransferase to a trimethyltransferase (An et al., 2011; Rogawski et al., 2015). This result 

may provide an explanation for different product specificities among PKMTs with high sequence 

similarity in the active site, but not in the AL (Schnee et al., 2023). This was postulated for PKMTs NSD1, 

DIM-5 and SETD2, which share a high active site sequence similarity but possess different AL residues 

and exhibit differing product specificities (SETD2 and DIM-5 are trimethyltransferases, NSD1 is a 

dimethyltransferase) (Qiao et al., 2011). Together, these findings suggested a regulatory role of the AL. 

However, the MD simulation experiments and crystal structure analyses that led to this conclusion 

were conducted with peptides as substrates. The mechanistic principles for the interaction with larger 

substrates like nucleosomes remain to be described. 

 
1.5.4. Placeholder residues 

The AL plays a pivotal role in regulating the substrate binding of SET domain PKMTs by sterically 

blocking the binding cleft. In addition to its steric hindrance, the AL was also found to position residues 

directly in the active site, at the position of the target lysine. Notably, specific residues function as 

“placeholder” residues in this context, stabilizing the AL in its closed conformation. For instance, NSD1 

employs C2062 as a placeholder residue (Morishita & di Luccio, 2011), NSD2 uses C1183 (Jaffe et al., 

2013), SETD2 relies on R1670 (Yang et al., 2016), and ASH1L uses S2259 (Yang et al., 2016). Mutational 

studies on ASH1L demonstrated the impact of such interactions, where the placeholder residue serine 

was exchanged for methionine, which might establish stronger interactions with the hydrophobic 

lysine binding tunnel. This strengthened the closed conformation of the AL and led to a heavily 

decreased methylation activity (Rogawski et al., 2015). Remarkably, methionine was not found to be a 

placeholder residue in any SET domain PKMT, indicating that its binding into the active site 

might be too strong (Schnee et al., 2023). In the case of SETD2, the placeholder residue R1670 can 

adopt multiple conformations (Yang et al., 2016). In the AL closed state, R1670 was observed to occupy 



the lysine binding tunnel (Fig. 6A). Crystal structures depicting the AL in a half-open position showed 

R1670 slightly flipped outwards, away from the active center (Fig. 6B). Furthermore, structures with 

the AL in a fully open conformation, and substrate bound, showed R1670 completely flipped outwards 

and exposed to the solvent. Moreover, the usually unresolved Post-SET loop (Q1676-K1703 for SETD2) 

was captured in front of the bound peptide substrate, engaging in hydrophobic interactions with the 

core enzyme (Fig. 6C). This loop is not resolved in crystal structures without bound peptide, indicating 

its high flexibility in this state. 

 
Figure 6: The autoinhibitory loop (AL) and placeholder residue need conformational changes to overcome 

autoinhibition. A| In the binary PKMT-SAM state, the placeholder residue occupies the target lysine channel. In 

SETD2 the placeholder residue R1670 (rose) can adopt multiple conformations. If no peptide is bound, the AL is 

in a closed position and R1670 occupies the target lysine channel (PDB 4H12). B| In a half-opened position, the 

AL starts to lift, and R1670 turns outwards (PDB 5JLE). C| When a peptide substrate (cyan, target lysine in pink) 

is bound, the AL is in an open position and R1670 becomes solvent exposed. The Post-SET loop (yellow) is closed 

on top of the bound peptide (PDB 5JLB). Figure taken and modified from (Schnee et al., 2023). 

 
1.5.5. Target lysine deprotonation 

The side chains of K, Kme1 and Kme2 possess lone-pair electrons on their ε-amine groups, making 

them targets for methylation. However, due to their high pKa values (10.2−10.7), K, Kme1, and Kme2 

predominantly exist in a protonated state under physiological conditions (pH 7.4), where they are 

unreactive as a nucleophile. Therefore, one critical requirement for lysine methylation catalyzed by 



PKMTs is the deprotonation of the target lysine (Fig. 7A) (Trievel et al., 2002). To explain the 

deprotonation mechanism, MD and hybrid Quantum Mechanics/Molecular Mechanics 

(QM/MM) simulations have been employed, showcasing a conserved mechanism for the 

deprotonation of the target lysine in multiple SET domain-containing PKMTs (Zhang & Bruice, 2007b). 

In this mechanism, the deprotonation of the side chain nitrogen (Nε) occurs through the transient 

formation of dynamic water channels in the enzyme’s active site (Hu & Zhang, 2006; X. Zhang & T. 

Bruice, 2008a, 2008b). A water molecule, which was frequently observed in the crystal structures of 

SET domain-containing PKMTs, was suggested to transfer the proton through a chain of water 

molecules into the aqueous solvent and finally to a buffer molecule (Fig. 7B-D). Additionally, 

electrostatic interactions between the positive charges of the SAM sulfonium moiety and the 

protonated N atom decrease the pKa of the latter from 10.9 to 8.2 (Zhang & Bruice, 2007b). This could 

explain the necessity of a basic reaction buffer for PKMTs and the weak in vitro methylation activity in 

acidic and even neutral buffers, as the deprotonation of the target lysine is impeded (Wilson et al., 

2002; Zhang et al., 2002). Based on this, Bruice and Zhang suggested a stepwise process, in which: (i) 

the water channel appears; (ii) the target lysine is deprotonated; (iii) the target lysine is methylated 

using the cofactor SAM; (iv) the proton is transferred into the solvent (Zhang & Bruice, 2007a, 2007b, 

2007c; X. Zhang & T. Bruice, 2008a, 2008b, 2008c). This model is applicable to multiple SET domain-

containing PKMTs but not to non-SET containing PKMTs like DOT1L (Fig. 4B). In the class of non-SET 

domain PKMTs, a water channel was not observed, and the amino acids located at the target lysine 

channel appear incapable of facilitating a direct deprotonation. It was speculated that their more 

hydrophobic active site could reduce the pKa of the target lysine and that the carboxylate of SAM could 

help in the subsequent deprotonation process (Cheng et al., 2005; Cortopassi et al., 2016; Min et al., 

2003). 
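
The protonation argument in this section can be made quantitative with the Henderson-Hasselbalch relation. The following minimal sketch computes the fraction of lysine side chains present in the deprotonated, nucleophilic form at a given pH, using the pKa values quoted above. The function name is illustrative, and the calculation assumes ideal single-site titration behavior, which is a simplification for a residue buried in an enzyme active site.

```python
def deprotonated_fraction(pka: float, ph: float) -> float:
    """Fraction of an amine in the neutral (deprotonated) form.

    Henderson-Hasselbalch: [B] / ([B] + [BH+]) = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    physiological_ph = 7.4
    # pKa values taken from the text; the labels are descriptive only.
    cases = [
        ("lysine, lower pKa bound", 10.2),
        ("lysine, upper pKa bound", 10.7),
        ("target lysine without SAM effect", 10.9),
        ("target lysine next to SAM sulfonium", 8.2),
    ]
    for label, pka in cases:
        fraction = deprotonated_fraction(pka, physiological_ph)
        print(f"{label}: pKa {pka:.1f} -> {100 * fraction:.3f} % deprotonated")
```

Under these assumptions, lowering the pKa from 10.9 to 8.2 raises the deprotonated fraction at pH 7.4 from roughly 0.03 % to roughly 14 %, which is consistent with the strong pH dependence of in vitro methylation described above.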



 
Figure 7: PKMTs deprotonate the target lysine prior to the methyl group transfer. A| Schematic depiction 

illustrating the obligatory target lysine deprotonation prior to the PKMT-catalyzed methyl group transfer. B| The 

protonated target lysine (pink) is oriented by e.g., PKMT SET7/9 Y335 (white, sticks), while the water channel 

(red spheres) is already present (prepared using PDB 1XQH). C| The lysine proton is transferred to a nearby water 

molecule. D| After lysine deprotonation, the SAM methyl group is rapidly transferred to the deprotonated target 

lysine thereby preventing reprotonation. The excess proton is transferred into the bulk solvent. B-D| Figure taken 

and modified from (Schnee et al., 2023). 
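
One simple way to probe the water channel model in a single MD snapshot is to test whether a chain of closely spaced water molecules connects the lysine Nε to the bulk solvent. The sketch below is an illustration only and not the analysis used in the cited studies: the 3.5 Å cutoff, the function name, the restriction to water oxygen positions, and the idea of marking bulk waters by their list index are all assumptions made for this example.

```python
import math
from collections import deque


def water_chain_exists(n_eps, waters, bulk_ids, cutoff=3.5):
    """Return True if a chain of waters links N-epsilon to a bulk water.

    n_eps    -- (x, y, z) of the lysine N-epsilon atom in Angstrom
    waters   -- list of (x, y, z) water oxygen positions in Angstrom
    bulk_ids -- set of indices into `waters` treated as bulk solvent (assumption)
    cutoff   -- maximum heavy-atom distance counted as a contact (assumption)
    """
    # Seed the search with waters in direct contact with N-epsilon.
    queue = deque(i for i, w in enumerate(waters) if math.dist(n_eps, w) <= cutoff)
    seen = set(queue)
    # Breadth-first search over water-water contacts.
    while queue:
        i = queue.popleft()
        if i in bulk_ids:
            return True
        for j, w in enumerate(waters):
            if j not in seen and math.dist(waters[i], w) <= cutoff:
                seen.add(j)
                queue.append(j)
    return False
```

Applied frame by frame to a trajectory, such a test would give the fraction of frames in which a channel is present, which is one possible way to compare different enzyme, cofactor and substrate states.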

 
1.5.6.  Reaction mechanism of SET domain PKMTs 

After a successful deprotonation, target lysine and SAM must be oriented in a conformation that 

facilitates the subsequent bimolecular nucleophilic substitution reaction (SN2), leading to the methyl 

group transfer. QM/MM simulations of SET7/9 (aka SETD7, SET7, SET9, KMT7) were the first to describe 

the details of the SN2 mechanism (Hu & Zhang, 2006). In this mechanism, the deprotonated lysine Nε 

acts as the nucleophile, whereas the SAM sulfonium cation (S+) acts as the leaving group. The free 



electron pair of N is present in a sp3 orbital at an 109° angle. The SN2 reaction occurs at an aliphatic 

sp3 carbon center (the C-atom of the transferred methyl group), with the electronegative sulphonium 

leaving group attached to it. The nucleophile attacks the carbon at a minimal distance of approximately 

4.4–4.6 Å (Chen et al., 2019). A combination of computational modeling, QM/MM and kinetic isotope 

effect studies has demonstrated that PKMTs can stabilize two distinct transition states (TS) when 

methylating substrates (Chen et al., 2019; Linscott et al., 2016; Poulin et al., 2016). The SET8 enzyme 

(aka Pr-SET7, SETD8, KMT5A) exhibits an early SN2 TS (with a C−S distance of 2.0 Å and a C−Nε distance 

of 2.4 Å), while a late SN2 TS was observed for NSD2 (with a C−S distance of 2.5 Å and a C−Nε distance 

of 2.1 Å) (Fig. 8). Breaking of the C–S bond and the formation of the new bond between C and the 

nucleophile occur in a concerted manner through a trigonal bipyramidal TS in which the carbon atom is sp2 

hybridized. The nucleophile attacks the carbon at a 180° angle to the leaving group, optimizing the 

overlap between the nucleophile's lone pair and the C–S antibonding orbital. Subsequently, the leaving 

group is pushed off at the opposite side, the TS structure collapses, and the methyl group covalently 

binds to the nitrogen atom, while SAH is released as a product (Copeland et al., 2009). Importantly, owing 

to the high group transfer potential of SAM, the methyl group is transferred rapidly to the target lysine 

once a geometry favorable for the reaction has been achieved, and the methylation then 

prevents reprotonation. 

 
Figure 8: Geometric criteria for a bimolecular nucleophilic substitution (SN2) mechanism. A transition state (TS)-like 

conformation can be approximated by using the depicted metrics.  
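
The geometric criteria for an SN2 reaction can be evaluated per MD frame from three atom positions: the lysine Nε, the SAM methyl carbon, and the SAM sulfur. The sketch below is a minimal illustration. The function names and the default thresholds (`max_dist`, `angle_tol`) are placeholder assumptions and are not values taken from this work; they should be replaced by the criteria actually applied in a given analysis.

```python
import math


def sn2_geometry(n_eps, c_methyl, s_sam):
    """Return (N-C distance in Angstrom, N-C-S angle in degrees)."""
    distance = math.dist(n_eps, c_methyl)
    # Vectors pointing from the methyl carbon to the nucleophile and leaving group.
    to_n = [a - b for a, b in zip(n_eps, c_methyl)]
    to_s = [a - b for a, b in zip(s_sam, c_methyl)]
    cos_angle = sum(a * b for a, b in zip(to_n, to_s)) / (
        math.hypot(*to_n) * math.hypot(*to_s)
    )
    # Clamp against rounding errors before taking the arc cosine.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return distance, math.degrees(math.acos(cos_angle))


def is_ts_like(n_eps, c_methyl, s_sam, max_dist=4.0, angle_tol=30.0):
    """Check an in-line attack geometry; both thresholds are assumptions."""
    distance, angle = sn2_geometry(n_eps, c_methyl, s_sam)
    return distance <= max_dist and abs(180.0 - angle) <= angle_tol


if __name__ == "__main__":
    # Idealized linear arrangement using the early-TS distances quoted for SET8
    # (C-N 2.4 Angstrom, C-S 2.0 Angstrom); the coordinates are constructed, not simulated.
    print(sn2_geometry((0.0, 0.0, 0.0), (2.4, 0.0, 0.0), (4.4, 0.0, 0.0)))
```

Counting the frames that satisfy `is_ts_like` gives a simple occupancy measure for TS-like conformations that can be compared between substrates or enzyme variants.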

 
1.5.7.  Substrate specificity of SET domain PKMTs 

PKMTs are highly regulated enzymes and aberrant methylation of proteins could result in 

misregulation of chromatin states or protein activity. A specific recognition of the protein substrate by 

PKMTs is therefore indispensable. A suitable technique to decipher the substrate specificity of PKMTs 

is the Celluspot peptide array (Bock et al., 2011; Sara Weirich & Albert Jeltsch, 2022). In this method, 

peptides are synthesized on a cellulose membrane using solid-phase peptide synthesis. A large variety 

of different peptides can be synthesized on a single membrane, increasing the screening capacity. Each 



spot on the membrane represents an individual peptide sequence (Fig. 9A). The membrane is then 

incubated with the PKMT of interest and radioactively labeled SAM in buffer, allowing the detection 

of methylation through autoradiography. The signal intensity of the different peptide spots directly 

indicates which peptide sequences are preferred by the PKMT. By creating peptide arrays containing 

all possible single amino acid substitutions of an original substrate sequence, a PKMT-specific substrate 

specificity profile can be generated (Fig. 9B) (Dhayalan et al., 2011; Kudithipudi, Kusevic, et al., 2014; 

Kudithipudi, Lungu, et al., 2014; Kusevic et al., 2017; Rathert, Dhayalan, Ma, et al., 2008; Schuhmacher 

et al., 2015; Weirich et al., 2016). With the obtained specificity profile as a consensus sequence, novel 

protein substrate candidates have been identified (Dhayalan et al., 2011; Rathert, Dhayalan, 

Murakami, et al., 2008; Schuhmacher et al., 2020; Weirich et al., 2020). Certain PKMTs displayed a 

strict substrate specificity, limiting the range of substrates available for methylation (Kudithipudi et al., 

2012; Schuhmacher et al., 2015). In contrast, other PKMTs showed a relaxed substrate specificity, 

allowing them to methylate a broad spectrum of substrates (Rathert, Dhayalan, Murakami, et al., 

2008).  
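
The design of such a substitution array can be enumerated in a few lines of code. The sketch below generates every single-residue variant of a starting peptide, excluding tryptophan and cysteine as in the SETD2 array shown in Figure 9. The peptide string is H3 residues 29-43 written out from the canonical human H3 sequence, which is an assumption of this example rather than a sequence quoted in the text, and the function name is illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 proteinogenic amino acids


def single_substitutions(peptide, excluded="WC"):
    """List all single amino acid exchanges of `peptide`.

    Returns tuples (index, original residue, new residue, variant sequence).
    Residues in `excluded` are never introduced.
    """
    allowed = [aa for aa in AMINO_ACIDS if aa not in excluded]
    variants = []
    for index, original in enumerate(peptide):
        for new in allowed:
            if new != original:
                variant = peptide[:index] + new + peptide[index + 1:]
                variants.append((index, original, new, variant))
    return variants


if __name__ == "__main__":
    h3_29_43 = "APATGGVKKPHRYRP"  # assumed H3 A29-P43 sequence, 15 residues
    array_design = single_substitutions(h3_29_43)
    first_residue_number = 29
    for index, original, new, variant in array_design[:3]:
        print(f"{original}{first_residue_number + index}{new}: {variant}")
    print(len(array_design), "variant spots in addition to the wild-type peptide")
```

Each variant corresponds to one spot on the membrane, so the measured spot intensities can be mapped back onto a position-by-residue matrix to obtain the specificity profile.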

A strict substrate specificity of PKMTs could be explained by precise interaction between enzyme and 

substrate, ensuring an accurate readout of the substrate sequence. On the other hand, explaining the 

promiscuity of certain PKMTs is more challenging. Recognizing multiple lysine residues necessitates a 

complex network of specific interactions, while still avoiding off-target effects to prevent aberrant 

methylation profiles. A hollow active site with loose contacts therefore appears to be too simplistic an 

explanation. Various models have been suggested to clarify the promiscuity of PKMTs. One hypothesis 

suggests that the structural flexibility of both the enzyme's active site and the substrate allows for the 

adoption of numerous dynamic conformations, enabling PKMTs to recognize multiple substrates (Luo, 

2018). An alternative model proposes that certain PKMTs identify their substrates based on their 

backbone atoms rather than their side chains (Al Temimi et al., 2019; Luo, 2018). The SET-I domain 

splits the SET domain and is speculated to be one of the key factors in determining substrate specificity, 

since it is the least conserved region among SET domain-containing PKMTs and heavily interacts with 

the substrate (Fig. 5) (Ronen Marmorstein, 2003). However, the SET-I hypothesis is challenged by the 

observation that PKMTs with similar substrate specificity, such as SETDB1 and SUV39H1, both 

methylating H3K9, exhibit large differences in their SET-I sequence (Ronen Marmorstein, 2003; Qian & 

Zhou, 2006). 

 
1.5.8.  Discovery of PKMT super-substrates 

One striking example of the complex mechanism behind the substrate specificity of PKMTs has recently 

been discovered for SETD2. Here, the canonical substrate, H3 residues A29-P43, was deemed 

suboptimal for SETD2 (Schuhmacher et al., 2020). Surprisingly, multiple single amino acid exchanges 



caused a higher methylation of the corresponding peptide substrates. By combination of the preferred 

amino acids, a novel, non-natural peptide sequence, referred to as “super-substrate K36” (ssK36), was 

created. The ssK36 peptide differed at four positions from the canonical H3 sequence and was 

methylated about 100-fold more efficiently (Fig. 9C). Methylation differences between H3K36 and 

ssK36 were even larger in a protein context (Schuhmacher et al., 2020). The crystal structure of the 

ssK36 peptide complexed to the SET domain of SETD2 was resolved, and subtle differences were 

observed compared to the H3K36-SETD2 structure (Schuhmacher et al., 2020). Three of the four amino 

acids altered in ssK36 established distinct contacts: ssK36-R31 forms an H-bond/salt bridge with 

SETD2-E1674, ssK36-F32 is bound into a pocket formed by SETD2-E1674 and SETD2-Q1676, and ssK36-

R37 interacts with the backbone of SETD2-A1700. Despite these alterations, the overall structures of 

the ssK36-SETD2 and H3K36-SETD2 complexes remained very similar. Consequently, the crystal 

structures could not fully explain the substantial enhancement in the methylation rate of ssK36. 

 
Figure 9: Substrate specificity profile of SETD2 led to the super-substrate peptide (ssK36). A| Celluspot peptide 

array with the 15-residue long H3K36 peptide sequence as the starting sequence, incubated with SETD2 and 

radioactively labeled SAM. Positions were individually mutated to any other amino acid except tryptophan and 

cysteine. At several positions amino acids are preferred which differ from the original H3 sequence. B| 

Quantification of the peptide array methylation data generates a PKMT specific specificity profile, highlighting 

the preference for each position. C| Combination of preferred residues led to the super-substrate peptide (ssK36) 

(black) sequence, differing at 4 positions (orange) from the canonical H3K36 peptide (cyan) sequence. SETD2 was 

demonstrated to have a strongly enhanced methylation efficiency towards ssK36. Figure taken from (Philipp 

Schnee et al., 2022; Schuhmacher et al., 2020). 

 

1.5.9. Product Specificity of SET domain PKMTs 

Product specificity in PKMTs refers to their capability to transfer a precise number of methyl groups to 

their lysine residue target: one, two or three methyl groups, creating Kme1, Kme2 or Kme3, respectively. 

Despite structural similarities in the SET domain, PKMTs exhibit distinct substrate and product 

specificities, requiring unique mechanisms to control the number of methylation steps.  

One potential mechanism to control the number of transferred methyl groups is speculated to be the 

target lysine deprotonation. As described earlier, the deprotonation of the lysine Nε is facilitated via a 

chain of water molecules (Zhang & Bruice, 2007b). In the context of product specificity, the presence 

or absence of the water channel could define the outcome. MD simulations of the 

monomethyltransferase SET7/9 revealed a water channel’s presence only in the SAM-bound state 

complexed with an unmethylated K4 peptide, suggesting a role in lysine deprotonation and regulating 

mono-methylation. In the presence of SAH, or K4me1, the water channel was absent, preventing 

further methylation after monomethylation (Fig. 10A). Mechanistically, the methyl group of the 

monomethylated peptide takes the position of the proton that would be removed through the water 

channel. Deprotonation and further methylation of Kme1 is therefore impossible (X. Zhang & T. Bruice, 

2008b).  

Another possible regulation mechanism refers to the SN2 reaction mechanism used by PKMTs. If methyl 

group and lysine Nε are too distant, the transfer is unlikely. In MD and QM/MM simulations of SET7/9 

complexed with the K4 or K4me1 peptide, the distance between the SAM sulfur group and lysine Nε 

was greater for K4me1 (6.1 Å) than for K4 (5.7 Å) (Zhang & Bruice, 2007b). This difference in distance 

may be attributed to the active site potentially being too narrow. After the first methylation, a 

reorientation of Kme1 is not possible due to steric constraints. Consequently, a productive state, in 

which monomethylated lysine Nε and SAM methyl group come in close proximity, cannot form (Fig. 

10B). In addition to the described mechanisms, multiple sequence alignments revealed 

that PKMTs possessing a tyrosine at the so-called “F/Y-switch” position are limited to catalyzing mono- 

or dimethylation. In contrast, enzymes with a phenylalanine or another hydrophobic residue at this 

position display di- or trimethyltransferase activity (Collins et al., 2005). This phenomenon was 

observed for several SET domain-containing PKMTs and it could even be used to manipulate the 

product specificity. For instance, the trimethyltransferase DIM-5 could be converted into a 

mono-/dimethyltransferase by the F281Y mutation (Zhang et al., 2003) and the monomethyltransferase SET7/9 

could be changed to a dimethyltransferase through the Y305F mutation (Del Rizzo et al., 2010; Zhang et al., 

2003). The mechanistic basis of the F/Y-switch solely relies on the presence of a single hydroxyl group. 

The missing hydroxyl group in the Y to F mutants creates additional space in the active site, facilitating 

the accommodation of water molecules and proper reorientation of already transferred methyl 

groups. In contrast, F-to-Y mutations, which turn trimethyltransferases into mono- or 



dimethyltransferases, could be based on steric effects caused by the additional hydroxyl group making 

the active site too narrow to accommodate multiple methyl groups at the lysine Nε (Fig. 10C) (Chu et 

al., 2012; Hu & Zhang, 2006). The concept of the active site volume as a regulator for product specificity 

may also provide insights into somatic cancer mutations altering product specificity as later shown in 

this work.  

 
Figure 10: Proposed control mechanisms for the product specificity of PKMTs. PKMTs catalyze the transfer of a 

distinct number of methyl groups to their lysine target (pink). Multiple mechanisms have been proposed, 

regarding the control of this process. A| Restricted second methylation caused by a disrupted water channel 

(red, spheres) and blocked lysine deprotonation of monomethylated target lysine (green, PDB 1XQH). B| The SN2 

geometry cannot be adopted in the presence of a monomethyl substrate. C| The F/Y-switch position controls 

the product specificity of certain PKMTs. Phenylalanine (white) at this position creates additional space in the 

active site, allowing accommodation of a dimethyl product. In contrast, a tyrosine with its additional hydroxyl 

group causes clashes, preventing the formation of the dimethylated product. Figure taken and modified from 

(Schnee et al., 2023). 

 
1.6. Histone lysine 36 methylation 

Methylation of lysine 36 of histone H3 (H3K36) and especially the di- and trimethylation (H3K36me2 

and me3) are important histone modifications affecting many cellular processes (Eric J. Wagner & 

Phillip B. Carpenter, 2012). NSD1, NSD2, NSD3, ASH1L, SETD2, SETD5, SMYD5 and PRDM9 are the PKMTs 

responsible for H3K36 methylation in human cells. While NSD1, NSD2, NSD3 and ASH1L can only 

introduce mono- and dimethylation of H3K36 in vitro and in vivo (Eric J. Wagner & Phillip B. Carpenter, 

2012), SETD2 and SETD5 introduce up to trimethylation at H3K36 in gene bodies, whereas 

the SET and MYND domain-containing protein 5 (SMYD5) and PR/SET domain 9 (PRDM9) do so at promoter regions 

(Edmunds et al., 2008; Gregory et al., 2007; Li et al., 2009; Powers et al., 2016; Sessa et al., 2019; Zhang 



et al., 2022). H3K36me2 is enriched at intergenic regions and promoters while H3K36me3 is enriched 

at gene bodies of active genes (Lam et al., 2022). H3K36me3 levels can be controlled in several ways, 

such as demethylation by the eraser protein KDM4A (Klose et al., 2006) or through the stability of 

SETD2. Homeostatic SETD2 protein levels in mammalian cells are low, as it is readily degraded by the 

ubiquitin–proteasome system (Zhu et al., 2017). SETD2 is also negatively regulated at the 

transcriptional level by the microRNA miR-106b-5p. Overexpression of miR-106b-5p was found to 

reduce SETD2 expression (Xiang et al., 2015). 

The biological functions of H3K36 methylation encompass the regulation of gene expression, DNA 

repair, recombination and gene splicing (Eric J. Wagner & Phillip B. Carpenter, 2012). The diverse 

effects arise through the physical interaction of H3K36 PKMTs, predominantly SETD2, with RNA 

polymerase II (RNAP II), RNA-binding proteins, and transcriptional elongation factors (Li et al., 2019). 

H3K36 methylation is associated with both active gene transcription marks and gene repression. Its 

impact on gene transcription is controlled by adjacent histone modifications and their respective 

reader proteins (Eric J. Wagner & Phillip B. Carpenter, 2012). As a repressive modification, H3K36 

methylation functions to suppress the aberrant initiation of transcription within coding regions of gene 

bodies in particular during active gene expression. This repression is facilitated by recruiting 

deacetylase complexes and the DNA methylation machinery (Lam et al., 2022; Li et al., 2019; Eric J. 

Wagner & Phillip B. Carpenter, 2012). The connection between H3K36 methylation and DNA 

methylation is established through the PWWP domains of DNMT3A and DNMT3B, which preferentially 

bind to H3K36me2 and H3K36me3, respectively (Dukatz et al., 2019). 

 
1.6.1. SETD2 

The SET domain-containing protein 2 (SETD2) has a size of 230 kDa, which corresponds to 2564 amino 

acids. This enzyme is a major writer of H3K36me3 in mammals, depositing the modification primarily 

at gene bodies of actively transcribed genes. SETD2 was thought to be the sole protein responsible for 

H3K36me3, but a recent study has indicated that SETD5 can also deposit H3K36me3 at active gene 

bodies in vivo (Sessa et al., 2019). Furthermore, PKMTs SMYD5 and PRDM9 have been shown to 

deposit H3K36me3 at promoter regions and during meiosis, respectively (Powers et al., 2016; Zhang 

et al., 2022). 

Human SETD2 contains several functional domains. These include a SET domain flanked by an 

“associated with SET” (AWS) domain and a post-SET domain, which together are responsible for the 

methyltransferase activity (Sun et al., 2005). SETD2 also contains a Set2–Rpb1 interaction (SRI) domain, 

which interacts with RNA polymerase II (RNAPII) (Li et al., 2005). The largest subunit of RNAPII contains 

a C-terminal domain (CTD) that is hyperphosphorylated during active transcription. SETD2 interacts specifically with this 



phosphorylated form of the RNAPII CTD through its SRI domain (Li et al., 2005; Sun et al., 2005). This 

allows SETD2 to selectively associate with actively transcribed regions of the genome. As a result, 

H3K36me3 is generally deposited at the 3′ end of actively transcribed gene bodies and it is associated 

with euchromatin (Bannister et al., 2005; Mitchell et al., 2023). The SETD2 deposited H3K36me3 

modification interacts with a large group of H3K36me3-binding proteins to participate in numerous 

cellular processes. One such process is de novo DNA methylation at genomic sites enriched with 

H3K36me3. This is performed by the reader protein DNMT3B (Baubec et al., 2015). DNA methylation 

aids in repressing cryptic transcription, which is the false initiation of transcription from intragenic sites 

of protein-coding genes (Neri et al., 2017).  

SETD2 is additionally involved in the regulation of cell size. MicroRNA-mediated SETD2 knockdown in 

human cells caused an increase in cell size and total protein content accompanied by an increased 

protein synthesis rate in vitro (Molenaar et al., 2022). It is speculated that SETD2 might indirectly 

regulate cell size by influencing cell cycle dynamics or by directly controlling protein synthesis rates 

(Molenaar et al., 2022). Overexpression of the oncohistone H3.3K36M, which diminishes H3K36me3, 

also causes an increase in cell volume (Molenaar et al., 2022). Crystal structures revealed that the 

H3K36M mutation inhibits the catalytic activity of SETD2, as the introduced methionine mutation binds 

into the lysine binding channel of the active site and functions as a competitive enzyme inhibitor (Yang 

et al., 2016). Despite H3K36me3 loss through H3K36M overexpression, the cell size increase was 

smaller than for SETD2 knockdown (Molenaar & van Leeuwen, 2022). This indicates that SETD2-

mediated regulation of cell size is not entirely dependent on H3K36me3 and highlights that the 

biological role of SETD2 is not limited to H3K36me3 deposition. 

Besides histone protein targets, recent studies found that SETD2 methylation occurs at non-histone 

substrates. Among these, SETD2 has been observed to monomethylate K525 of the signal transducer 

and activator of transcription 1-alpha/beta (STAT1) (Chen et al., 2017). This methylation promotes 

STAT1 phosphorylation and activation, connecting SETD2 with the amplification of IFNα-dependent 

antiviral immunity signaling pathways (Chen et al., 2017). Other SETD2 targets are K735 of EZH2, K40 

of α-tubulin and K68 of actin (Park et al., 2016; Seervai et al.; Yuan et al., 2021).  

Numerous somatic mutations of the SETD2 gene were found in cancer tissues, especially in pediatric 

high-grade gliomas (Fontebasso et al., 2013). Additionally, frameshift, non-sense and missense 

mutations in SETD2 are driver mutations in clear cell renal cell carcinoma (ccRCC), indicating a loss-of-function 

mechanism. This is supported by the reduced amount of H3K36me3 in ccRCC and an unaffected 

amount of H3K36me2 (Kudithipudi, 2014).  

 

1.6.2. NSD2 

The nuclear receptor SET domain-containing 2 (NSD2) catalyzes up to dimethylation of H3K36 and non-

histone proteins. Down-regulation of NSD2 significantly decreases the methylation of H4K20, leading 

to a reduced accumulation of 53BP1 at sites of DNA damage (Pei et al., 2011). NSD2 interacts with phosphatase and tensin 

homolog deleted on chromosome 10 (PTEN) via its CTD and stimulates the dimethylation of PTEN in 

cells (Zhang et al., 2019). The latter is recognized by the specific domain of 53BP1 to recruit PTEN into 

sites of DNA damage. This is suspected to represent one pathway to regulate the sensitivity of cells to 

DNA damage (Chen et al., 2020).  

NSD2 dysfunction is linked to many diseases ranging from developmental disorders to cancers (Lam et 

al., 2022; Li et al., 2019; Eric J. Wagner & Phillip B. Carpenter, 2012). Heterozygous loss of NSD2 is 

responsible for the developmental disease called Wolf-Hirschhorn syndrome (WHS) (Bergemann et al., 

2005). Moreover, missense mutations in NSD2 were observed in various types of cancers like lung 

cancers (Sengupta et al., 2021a; Yuan et al., 2021), hematological cancers (Jaffe et al., 2013) and head 

and neck squamous cell carcinomas (Cancer Genome Atlas, 2015). Epithelial–mesenchymal 

transformation (EMT) is a crucial process in cancer development, in which epithelial cells acquire 

characteristics of mesenchymal cells during tumorigenesis, development and progression (Brabletz et 

al., 2005; Yang et al., 2004). Recently, it was found that the overexpression of NSD2 occurs in 15% of 

patients with t(4;14)-positive multiple myeloma and that Twist-1 participates in driving the expression 

of EMT-related genes and contributes to tumor migration (Cheong et al., 2020). NSD2 interacts with 

Twist-1, which leads to an increase in H3K36me2 and promotion of EMT (Ezponda et al., 2013). 

Contrary to the straightforward impact of gene deletions causing loss-of-function changes, 

understanding the biological effects of single point mutations is a more complex task. Many somatic 

missense mutations in PKMTs have been detected in diverse cancer types, and have been shown to 

alter the enzyme’s activity, substrate specificity, product specificity, or other enzymatic properties 

(Brohm et al., 2019; Oyer et al., 2014; Weirich et al., 2017; Weirich et al., 2015). A frequent NSD2 

missense single point mutation is E1099K, which was detected in leukemic patients. This mutant was 

comprehensively characterized and shown to be hyperactive (Jaffe et al., 2013; Oyer et al., 2014; Pierro 

et al., 2020; Swaroop et al., 2019). E1099K was demonstrated, firstly, to enhance binding to 

nucleosomal substrates by interacting with the negatively charged DNA and, secondly, to destabilize the 

AL of NSD2 by breaking the salt bridge mediated by E1099. Ultimately, this led to enhanced 

nucleosome association and higher activity (Li et al., 2019; Sato et al., 2021). However, the effects of 

other frequent missense cancer mutants like T1150A in NSD2 are still unknown. 

 

1.7. Lysine demethylation 

Histone lysine methylation is a dynamic modification and can be removed by a group of enzymes called 

lysine demethylases (KDMs) (Hyun et al., 2017). The chemically inert nature of methyl lysine restricts 

the potential mechanisms of enzymes to remove a methyl group from the ε-amine of a protein lysine 

residue. Two mechanisms of enzymatic lysine demethylation have been characterized: amine 

oxidation and hydroxylation (Luo, 2018). Lysine-specific demethylases (LSDs) are flavin adenine 

dinucleotide (FAD)-dependent and use their amine oxidase-like (AOL) domain for the amine oxidation 

to remove lysine methylation (Fig. 11A-B). The hydroxylation reaction to demethylate lysine residues 

is carried out by KDMs bearing characteristic JmjC domains (Fig. 11C-D). JmjC-domain-containing KDMs 

can remove methyl groups from Kme1/2/3 whereas LSD enzymes can only act on Kme1/2 as substrates 

(Cole, 2008; Nowak et al., 2016). 
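
The two mechanisms can be summarized as net reaction schemes. The following is a simplified textbook-level sketch and is not reproduced from Fig. 11: R denotes the lysine side chain up to the ε-nitrogen, protonation states are simplified (the imine intermediate is usually drawn as its protonated iminium form), and 2-oxoglutarate is used as a synonym for α-ketoglutarate.

```latex
% Simplified net reactions; R = lysine side chain up to the epsilon-nitrogen.
% Protonation states are omitted for clarity (assumption of this sketch).
\begin{align*}
\intertext{FAD-dependent amine oxidation (LSDs):}
\mathrm{R{-}NH{-}CH_3 + FAD} &\longrightarrow \mathrm{R{-}N{=}CH_2 + FADH_2} \\
\mathrm{R{-}N{=}CH_2 + H_2O} &\longrightarrow \mathrm{R{-}NH_2 + CH_2O} \\
\mathrm{FADH_2 + O_2} &\longrightarrow \mathrm{FAD + H_2O_2} \\
\intertext{Fe(II)/2-oxoglutarate-dependent hydroxylation (JmjC-domain-containing KDMs):}
\mathrm{R{-}NH{-}CH_3 + 2\text{-}oxoglutarate + O_2} &\longrightarrow \mathrm{R{-}NH{-}CH_2OH + succinate + CO_2} \\
\mathrm{R{-}NH{-}CH_2OH} &\longrightarrow \mathrm{R{-}NH_2 + CH_2O}
\end{align*}
```

The scheme also illustrates the difference in substrate scope: the amine oxidation route proceeds via an imine at the ε-nitrogen, which a quaternary trimethylammonium group cannot form, whereas hydroxylation acts on a methyl C-H bond and is therefore also possible for Kme3.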

Similar to PKMTs, KDMs have been shown to demethylate methyl lysine in non-histone proteins such 

as ERα (Zhang et al., 2013), E2F1 (Kontaki & Talianidis, 2010), DNMT1 (Nicholson & Chen, 2009) and 

STAT3 (J. Yang et al., 2010). Many non-histone targets are substrates of LSD1, even though a well-

defined sequence motif is missing (Luo, 2018). This raises questions, as LSD1 was shown to bind its 

H3K4me2 substrate in a highly sequence-specific manner (Luo, 2018). Broad substrate promiscuity and 

sequence-specific substrate recognition by LSD1 appear contradictory, and further work will be 

needed to clarify this issue. Two factors could potentially be responsible for this effect: (i) 

the substrate-binding pocket of LSD1 is flexible and adopts multiple conformations to accommodate 

different substrates; (ii) the substrate specificity of LSD1 is altered by recruitment of regulatory 

partners like the androgen receptor, which interacts with the LSD1 SWIRM domain, changing the 

preference from H3K4me2 to H3K9me2 (Luo, 2018; Wu et al., 2012). 



 
Figure 11: Reaction mechanisms of lysine demethylation. A| Chemical mechanism of the demethylation 

reaction catalyzed by LSDs. B| Hydrophobic interactions of monomethylated lysine (pink) with its surrounding 

residues (green) in the catalytic chamber with FAD (orange, PDB 6VYP). C| Chemical mechanism of the demethylation 

reaction catalyzed by JmjC-domain-containing KDMs (Luo, 2018). D| Representative structure and catalytic site 

of JmjC-domain-containing KDMs. KDM4A is shown as an example (PDB 2OQ6). Residues H188, E180, H276 

(yellow), a water molecule (red) and an α-ketoglutarate analogue (rose) coordinate the iron (orange). Figure 

taken and modified from (Kong et al., 2011; Luo, 2018). 

 

1.8. Molecular Dynamics Simulation 

Molecular dynamics (MD) simulations are computational methods used to study the dynamic behavior 

of molecules and atoms in space over time. Empirically derived physical principles are applied to model 

the interactions and movements of atoms in the simulated system. By numerically solving the equation 

of motion for each atom, MD simulatio