Local Correlation Methods in Classical and
Quantum Mechanics Hybrid Schemes
Von der Fakultät Chemie der Universität Stuttgart
zur Erlangung der Würde eines Doktors der
Naturwissenschaften (Dr. rer. nat.) genehmigte Abhandlung
Vorgelegt von
Ricardo André Fernandes da Mata
aus Lissabon
Hauptberichter: Prof. Dr. H.-J. Werner, Universität Stuttgart
Mitberichter: Prof. Dr. G. Rauhut, Universität Stuttgart
Tag der mündlichen Prüfung: 8 November 2007
Institut für Theoretische Chemie der Universität Stuttgart
2007

Contents
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Citations to Published Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1 Introduction 13
2 Theoretical Background 19
2.1 Quantum Mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.1 Hartree Fock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.1.2 Møller-Plesset Perturbation Theory . . . . . . . . . . . . . . . . . 25
2.1.3 Coupled Cluster Theory . . . . . . . . . . . . . . . . . . . . . . . 28
2.1.4 Local Correlation Methods . . . . . . . . . . . . . . . . . . . . . . 31
2.1.5 Density Functional Theory . . . . . . . . . . . . . . . . . . . . . . 36
2.1.6 Semiempirical Methods . . . . . . . . . . . . . . . . . . . . . . . 38
2.2 Molecular Mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.3 Quantum Mechanics/Molecular Mechanics . . . . . . . . . . . . . . . . . 42
2.4 Reaction Rate Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4.1 Transition State Theory . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4.2 Michaelis-Menten kinetics . . . . . . . . . . . . . . . . . . . . . . 47
3 Computing Potential Energy Surfaces using Local Correlation Methods 49
3.1 The Domain Discontinuity Problem . . . . . . . . . . . . . . . . . . . . . 51
3.2 Domain Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2.1 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.3 Test Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.3.1 Ketene and propadienone bond dissociation . . . . . . . . . . . . . 54
3.3.2 SN2 reaction of hydrochlorocarbons with chlorine . . . . . . . . . 58
3.3.3 Hydrogen fluoride addition to double bonds . . . . . . . . . . . . . 64
3
4 Contents
4 Natural Localized Molecular Orbitals for Local Correlation Schemes 69
4.1 Critical Assessment of the Boughton-Pulay Criteria . . . . . . . . . . . . . 71
4.2 Natural Localized Molecular Orbitals . . . . . . . . . . . . . . . . . . . . 72
4.3 Natural Population Domain Criterion . . . . . . . . . . . . . . . . . . . . . 76
4.3.1 Orbitals Population . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.3.2 NPA-based Domain Criterion . . . . . . . . . . . . . . . . . . . . 78
4.4 Comparison to Boughton-Pulay . . . . . . . . . . . . . . . . . . . . . . . 79
4.4.1 Domain Convergence with respect to Basis Set . . . . . . . . . . . 79
4.4.2 Correlation Energies . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.4.3 Local Gradients . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5 Local Quantum Mechanical Hybrid Scheme 87
5.1 Localized Orbitals as Molecular Subspaces . . . . . . . . . . . . . . . . . 89
5.2 Local Regions Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.2.1 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.2.2 Preliminary Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.2.3 Scaling of the Method . . . . . . . . . . . . . . . . . . . . . . . . 97
5.3 Test Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.3.1 Proton Transfer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.3.2 Hydroxylation Reaction . . . . . . . . . . . . . . . . . . . . . . . 103
5.4 Comparison to other partitioning methods . . . . . . . . . . . . . . . . . . 105
5.4.1 Chlorohydrocarbon SN2 reactions . . . . . . . . . . . . . . . . . . 107
5.4.2 Aminoacid-water complexes . . . . . . . . . . . . . . . . . . . . . 107
6 Computation of Activation Barriers in Enzymes 111
6.1 Local Correlation Methods - Tools for Computational Biochemistry . . . . 113
6.2 The p-Hydroxybenzoate Hydroxylase enzyme . . . . . . . . . . . . . . . . 115
6.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.2.2 Model Setup and Simulation . . . . . . . . . . . . . . . . . . . . . 118
6.2.3 The Hydroxylation Activation Barrier . . . . . . . . . . . . . . . . 119
6.3 The Chorismate Mutase enzyme . . . . . . . . . . . . . . . . . . . . . . . 128
6.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.3.2 Model Setup and Simulation . . . . . . . . . . . . . . . . . . . . . 130
6.3.3 The Claisen Rearrangement Barrier . . . . . . . . . . . . . . . . . 131
7 Summary 139
Contents 5
8 Zusammenfassung 145
A Natural Localized Molecular Orbitals 153
A.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
A.2 General Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
A.3 NAO Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
A.4 NBO Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
A.4.1 Core and Valence lone pair NBOs . . . . . . . . . . . . . . . . . . 157
A.4.2 Two-center Bond NBOs . . . . . . . . . . . . . . . . . . . . . . . 158
A.4.3 Rydberg NBOs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
A.4.4 Orthogonalization of the NHOs . . . . . . . . . . . . . . . . . . . 159
A.4.5 Antibonding NBOs . . . . . . . . . . . . . . . . . . . . . . . . . . 159
A.5 NLMO Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
A.5.1 Exclusion of core orbitals . . . . . . . . . . . . . . . . . . . . . . 160
B Domain Merging - Quick Guide 162
B.1 General Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
B.2 A step-by-step example: ketene . . . . . . . . . . . . . . . . . . . . . . . . 163
C LMOMO - Quick Guide 167
C.1 General Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
C.2 A step-by-step example: SN2 reaction . . . . . . . . . . . . . . . . . . . . 169
D Electrostatic embedding - the polarized QM Hamiltonian 174
E Optimized stationary points structures 176
E.1 SN2 Reactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
E.2 Hydrogen fluoride addition to double bonds . . . . . . . . . . . . . . . . . 180
Bibliography 188
Acknowledgments
To my parents for their support ...
... to all (former and present) coworkers from the Stuttgart group, especially Andreas Nick-
lass, Robert Polly, Alexander Mitrushchenkov and Klaus Pflüger for their patience in an-
swering all my nagging questions, and also to those who helped me by proof-reading this
manuscript: Thomas Adler (who had it worst), Erich Goll and Christoph Köppl ...
... to Prof. Stoll and Prof. Rauhut, for sharing their research and knowledge ...
... to my friends in Stuttgart, for four years I will never forget ...
... to my brother Tiago and Christine, for always being there for me ...
... and to Prof. Werner, who so wholeheartedly accepted me in his group and guided me
through this work (and let us not forget, taught me how to put the back of an envelope to
good use) ...
... I owe my thanks, and am forever in your debt.
6
Abstract
The computation of reaction barriers in molecular systems has been since its birth one of the
major challenges in Theoretical Chemistry. It is of vital importance in understanding and
predicting catalysis phenomena, and in rationalizing our knowledge of chemical reactivity
in general using Transition State Theory (TST).
The Hartree-Fock (HF) approximation offers a "mean field" approach to the solution of
the Schrödinger Equation, and is the starting point for most of quantum chemical methods.
It does however not account for electron correlation effects, i.e., the instantaneous Coulomb
repulsion between electrons. This effect is of prime importance in describing chemical re-
activity, due to the changes in electron interaction during bond breaking/formation. The
HF method normally has errors in the range of 100-500% for reaction barriers. The ap-
proximate treatment of electron correlation through Density Functional Theory (DFT) is
an inexpensive way to include some of these effects in the energy estimate. However, its
results depend strongly on the parametrization made and a functional which consistently
delivers good results for all chemical systems has still not been found. The post-HF ab
initio family of methods offers a systematic way to approach a converged result. How-
ever, the high scaling of computational cost with molecular size only allows quantitative
calculations for small-sized systems (up to 15 atoms).
Local correlation methods avoid the steep scaling of conventional canonical methods
by using local spaces to describe occupied and virtual orbitals. The excitations are limited
by distance criteria, and the correlation of electron pairs is approximated in an hierarchical
manner, with higher levels used for neighboring orbitals, and neglecting very distant pair
contributions.
In this PhD work several advances have been made in the application of local corre-
lation methods to the computation of reaction paths and barriers. A new procedure was
implemented in the Molpro program package to compensate for the geometry dependence
of excitation domains. This dependence can lead to noncontinuous potential energy sur-
faces and used to be a drawback in the use of local methods for tracing reaction paths. The
main focus however, was in the implementation and use of QuantumMechanical/Molecular
Mechanics (QM/MM), as well as a combined Quantum Mechanics/Quantum Mechanics
(QM/QM) approaches for the computation of reaction barriers. Although the local meth-
ods approach asymptotically the linear scaling regime, they can only be routinely applied
to systems of up to 100-150 atoms. Enzymatic systems include well above 1000 atoms
and further approximations are needed. In the QM/MM case, the environment is treated by
MM force fields, and the active site by regular quantum mechanical methods. The use of
7
this approach together with local methods provided reaction enthalpies of high accuracy for
two enzymatic systems (Chorismate Mutase and p-Hydroxybenzoate Hydroxylase). The
second coupling (QM/QM) is made by classifying orbital pairs according to regions of
different chemical interest (normally separating the active site from the environment) and
applying different correlation schemes. This has shown promising results for medium to
large sized systems.
8
Citations to Published Work
Most of Chapter 3 has been published as
"Computation of smooth Potential Surfaces using Local Correlation Methods",
R. A. Mata, H.-J. Werner, J. Chem. Phys. 125, 184110 (2006)
Chapter 4 is also partly featured in
"Local Correlation Methods with a Natural Localized Molecular Orbital Basis",
R. A. Mata, H.-J. Werner, in press
Chapter 5 is featured in
"Correlation regions within a localized molecular orbital approach",
R. A. Mata, M. Schütz, and H.-J. Werner, to be submitted
Large portions of Chapter 6 have been or will be published in the following papers:
"High-Accuracy Computation of Reaction Barriers in Enzymes",
F. Clayessens, J. N. Harvey, F. R. Manby, R. A. Mata, A. J. Mulholland, K. E.
Ranaghan, M. Schütz, S. Thiel, W. Thiel, and H.-J. Werner, Angew. Chemie
118, 7010-7013 (2006)
"Towards accurate barriers for enzymatic reactions: QM/MM case study on p-
hydroxybenzoate hydroxylase",
R. A. Mata, S. Thiel, W. Thiel, and H.-J. Werner, to be submitted
9
Abbreviations
3,4-DOHB 3,4-dihydroxybenzoate
AM1 Austin Model 1
BP Boughton-Pulay
BSSE Basis Set Superposition Error
CBS Complete Basis Set
CC Coupled Cluster
CCSD Singles and Doubles Coupled Cluster
CCSD(T) Singles and Doubles Coupled Cluster with perturbative triples
CM Chorismate Mutase
CP Counterpoise
DF Density Fitting
DFT Density Functional Theory
FAD Flavin adenine
FCI Full Configuration Interaction
HF Hartree-Fock
HOMO Highest Occupied Molecular Orbital
IMOMO Integrated MO/MO
KS Kohn-Sham
LCCSD Local CCSD
LCCSD(T) Local CCSD(T)
LCCSD(T0) Local CCSD with non-iterative triples
LMO Localized Molecular Orbital
LMP2 Local MP2
LUMO Lowest Unoccupied Molecular Orbital
MD Molecular Dynamics
MM Molecular Mechanics
10
MNDO Modified Neglect of Diatomic Overlap
MO Molecular Orbital
MP Møller-Plesset
MP2 Second Order Møller-Plesset Perturbation Theory
NAC Near Attack Conformation
NAO Natural Atomic Orbital
NBO Natural Bond Orbital
NDDO Neglect of Diatomic Differential Overlap
NHO Natural Hybrid Orbital
NLMO Natural Localized Molecular Orbital
NMB Natural Minimal Basis
NPA Natural Population Analysis
NRB Natural Rydberg Basis
PAO Projected Atomic Orbital
PES Potential Energy Surface
PHBH para-Hydroxybenzoate Hydroxylase
PM Pipek-Mezey
PM3 Parametric Method Number 3
pOHB para-Hydroxybenzoate
QM Quantum Mechanics
QM/MM Quantum Mechanics/Molecular Mechanics
SCF Self-Consistent Field
SCS Spin Component Scaled
TS Transition State
TST Transition State Theory
WT Wild Type
ZDO Zero Differential Overlap
ZPVE Zero Point Vibrational Energy
11

Chapter 1
Introduction
13

15
We are perhaps not far removed from the time when we shall be able to submit
the bulk of chemical phenomena to calculation.
Joseph Louis Gay-Lussac,
Memoires de la Société D’Arcueil, 2, 207 (1808)
Theoretical Chemistry is a vibrant and expanding research area. It covers virtually
all aspects of Chemistry, from calorimetry to spectroscopy, biochemistry and solid state,
organic and inorganic materials. Fulfilling the wishes of Gay-Lussac, even if with some
delay, one is currently able to predict and understand phenomena in ever larger time or
size scales, almost exclusively on the basis of calculations. But even considering all the
advances made in the last decades, size still remains a central concern for the theoretical
chemist.
If we consider the different methods and applications of Theoretical Chemistry as a
function of the system size, we will find a broad spectrum, changing not only in subject,
but also in character. On one end, we would find the study of very small systems in the gas
phase, typically up to 10 atoms. The theoretical chemist on this side of the spectrum will
be concentrated on questions of "quantity". His calculations can reach an accuracy below 1
kcal mol−1 (or even sub kJ mol−1), and therefore rival with experiments. These studies give
a valuable support to the lab-chemist, often identifying errors in tabulated experimental data
or providing assistance in the interpretation of various spectra. At the other end, stands the
study of condensed/biological systems. These involve the study of thousands to hundreds
of thousands of atoms. In this case, one mostly hopes for a qualitative description, like
information on conformational stability or preferential reactivity.
The reasons behind these differences are manifold. The questions posed on an enzy-
matic system are of course not the same when dealing with a two-atom molecule. But there
are still many common interests. The ability to make quantitative predictions is always
desirable, whether one is dealing with a large or small amount of atoms. There is however
a barrier in the way. The computational cost of accurate quantum chemical methods scales
exponentially with system size. This means that beyond a given number of atoms, one will
face an exponential wall. The cost increases too steeply and it is impossible to add any
element to the calculation without exceeding the available resources. Since computer tech-
nology evolves at best linearly, little progress is to be expected just by waiting for the new
computers to come along. Therefore, improving the scaling of quantum chemical methods
should be one of our top priorities.
If the exponential wall is to be overcome, the scaling of the computational cost with
molecular size must be made at least linear. In this case, when one doubles the system
16 Chapter 1. Introduction
size, the CPU memory and/or disk space requirements will only double. If the task can be
parallelized the resources can be distributed over several machines, the quality of the cal-
culation can be preserved and the requirements per CPU kept constant. Several progresses
have been made in this direction over the last few years. Many of them even took place
here in Stuttgart. Based on the ideas first put forward by Peter Pulay, linear scaling corre-
lated methods have been successfully programmed and tested. They are referred to as local
correlation methods, due to the use of approximations based on the locality of electron
correlation. This family of methods critically decreases the cost of conventional methods,
and are today’s top reference in the field of linear scaling quantum chemical algorithms.
But we still may have a long wait ahead before we will be able to perform a full quantum
mechanical calculation on a protein with the same accuracy as in a diatomic. The prefac-
tors involved are still too large, and one would be extremely limited in performing these
applications at such a large scale (need for supercomputers, small number of calculations,
limiting the search of the conformational space). On the other hand, one could ask whether
this is actually necessary. Does one need a full calculation to extract information from the
system?
Fortunately, when discussing chemical phenomena the answer is in many cases "no".
Most of the effects of interest in reactivity are local in nature. In a bond breaking/formation,
only the near-lying groups will have a significant influence on the process. The surrounding
environment can be included approximately or even neglected. The ideas detailed above
denote the fundaments for my work. It should be possible to treat large molecular systems
with unprecedented accuracy by coupling local correlation methods with lower level ap-
proaches. Such approaches are already in use today. The innovation lies in the use of local
methods. They allow for larger active sites and higher accuracies than their conventional
counterparts. Also, since one avoids the exponential scaling, it is possible to increase the
active site in the calculation without great increase in the computational cost. This can be
used to test the approximations involved or simply to improve the overall accuracy of the
calculation. Hybrid schemes involving both coupling of quantum mechanical and molecu-
lar mechanical methods (QM/MM) as well as quantummechanics with quantummechanics
(QM/QM) have been implemented and/or used in this work, building up the centerpiece of
this Thesis.
The structure of this Thesis is as follows. In the first two sections of Chapter 2 a short re-
view of the various methods used in this work is given. The large amount of information led
me to cut some parts short, but I believe the connection between the different methodolo-
gies has been evidenced. The level of detail and the wide spectrum of techniques featured
make the text an advisable pre-graduate reading material. The informed reader may how-
17
ever skip most of it without great loss. The main emphasis lies in the description of local
correlation methods and the quantum mechanics/molecular mechanics coupling schemes.
This is required for the following Chapters 5 and 6. In the last section, the general theory
on rate constants is presented, in support of the discussion featured in Chapter 6.
In Chapter 3, the computation of potential energy surfaces using local correlation meth-
ods is discussed. These methods are known to generate non-continuous potential energy
surfaces in cases of bond breaking or significant geometric displacements. This is due to
the use of geometry-dependent excitation spaces. This problem has been investigated and
a simple procedure is presented in order to generate smooth surfaces in such cases. The
procedure is also found to improve the description of some reaction energetics.
Chapter 4 discusses the use of Natural Localized Molecular Orbitals as a new occupied
space for local methods in general. A new single-parameter domain criterion is also pre-
sented. The combination of this proposed occupied space with the new selection proves
to be remarkably more stable than previous implementations, and is a promising devel-
opment for establishing local correlation methods as well defined computational models.
First results for absolute energies, together with a thorough comparison between the new
procedure and the previous methods are shown.
Chapter 5 describes the hybrid QM/QM implementation at the heart of this work. It
presents a novel approach to the problem of separating the system into constituent parts
of different accuracy. It is also the second method to date which allows a combined use
of quantum mechanical methods. Test calculations are presented for biological systems,
and comparison is made with other proposed QM/QM coupling schemes. The method is
shown to deliver similar or better accuracy, but with significant advantages relative to other
models.
The use of QM/MM coupling schemes for local correlation methods did not involve
any new theoretical developments. Therefore, only the applications will be discussed. The
general theory is given in Chapter 2 and in Appendix D. The applications are discussed
in Chapter 6. These involved extensive collaboration with other groups in Bristol and at
the Max-Planck Institute in Mülheim. References to their work will accordingly be made
throughout the Chapter.

Chapter 2
Theoretical Background
19

2.1. Quantum Mechanics 21
2.1 Quantum Mechanics
In quantum mechanics, the state of a system is defined by its wave functionΨ. In this work
only time-independent functions will be considered, so that Ψ=Ψ(x), where x is a vector
representing the systems generalized coordinates (spatial and spin). Since a state is fully
described by its wave function, both terms will be used interchangeably.
The Schrödinger equation1
HˆΨ= EΨ, (2.1)
determines which states Ψ are allowed in a system described by the Hamiltonian opera-
tor Hˆ. The wave function must be an eigenvector of the operator, with the energy E as
the corresponding eigenvalue. For a system composed of N electron and M nuclei, the
Hamiltonian is written (in atomic units) as
Hˆ =−1
2
N
∑
i=1
∇2i −
1
2
M
∑
m=1
1
Mm
∇2m−
M
∑
m=1
N
∑
i=1
Zm
rim
+∑
i< j
1
ri j
+ ∑
m<n
ZmZn
rmn
, (2.2)
where the indices i and j refer to electron indices, m and n to nuclei indices. The masses
and charges of the nuclei are represented by Mm and Zm, respectively, and the distances
between two particles as rxy. The first two sums in Eq. (2.2) account for the electron
and nuclear kinetic energy, respectively, while the remaining terms describe Coulombic
interactions between the particles.
Due to the high-dimensionality of the problem, this equation is exactly soluble just
for very simple cases. Especially troublesome are the Coulombic terms which couple the
movement of all particles in the system. Therefore, instead of analytically solving the
equation, one is forced to use approximate representations of the wave function2 and/or the
Hamiltonian. A fundamental assumption made in all of the methods to be later discussed
is the Born-Oppenheimer approximation. Since the electrons travel much faster than the
nuclei, the movement of both can be decoupled. The problem is then divided into two parts.
The solution of an electronic Hamiltonian for fixed nuclear coordinates, and of a nuclear
Hamiltonian for an effective electron potential. The concept of a Potential Energy Surface
(PES), so often discussed in Chemistry, is thereof derived. The nuclei move subject to
forces derived from the solution of the electronic Hamiltonian and the Coulomb nuclear
repulsion. This approximation is generally valid, except when discussing phenomena such
as conic intersections, or if one is interested in high accuracy calculations. The energy of a
1The Schrödinger equation featured is also time-independent.
2As long as they comply to the Pauli principle for fermions, which states that the exchange of two electrons
leads to a change of the sign in the wave function - antisymmetry.
22 Chapter 2. Theoretical Background
given electronic state is obtained by solving Eq. (2.1), with Hˆ substituted by an electronic
Hamiltonian of the form
Hˆel =−12
N
∑
i=1
∇2i −
M
∑
m=1
N
∑
i=1
Zm
rim
+∑
i< j
1
ri j
=
N
∑
i=1
hˆ(i)+∑
i< j
1
ri j
(2.3)
The nuclear repulsion potential is added a posteriori. In the following sections, allusions
to the Hamiltonian operator will be implicit references to Eq. (2.3).
2.1.1 Hartree Fock
In the Hartree-Fock (HF) approximation, the wave function is represented by a Slater de-
terminant
ΨHF =
1√
N!
∣∣∣∣∣∣∣∣∣∣∣∣∣∣
ψ1(x1) ψ2(x1) . . . ψN(x1)
ψ1(x2)
. . . ...
... . . .
...
ψ1(xN) . . . . . . ψN(xN)
∣∣∣∣∣∣∣∣∣∣∣∣∣∣
= |ψ1(x1)ψ2(x2) . . .ψN(xN)> . (2.4)
The electron coordinates are given as a vector xi = {ri,si}, for spatial and spin coordinates
respectively. The total wave function is an anti-symmetrized product of molecular spin-
orbitals {ψi}. In cases where the number of α and β spin electrons are the same, one
may use the same spatial orbitals {φi} for both sets. This is referred to as a closed shell
representation. The closed-shell energy expectation value is of the form (according to the
Slater Condon rules[14])
<ΨHF|Hˆ|ΨHF >=
N/2
∑
i
2< i|hˆ|i>+
N/2
∑
i j
[2(ii| j j)− (i j| ji)] , (2.5)
where use has been made of the following notation for one and two-electron integrals
< i|hˆ| j >=
∫
φ∗i (r1)hˆ(r1)φ j(r1)dr1 (2.6)
(i j|kl) =
∫
φ∗i (r1)φ j(r1)r
−1
12 φ
∗
k (r2)φl(r2)dr1dr2. (2.7)
Integration over the spin coordinates has already been carried out, so that the theory is from
now on spin-free.
The Hartree-Fock method is by construction a variational method, which means that
2.1. Quantum Mechanics 23
the energy obtained with any test wave function will always be an upper bound to the exact
energy of the system3. This can be used as a criterion to optimize the form of our wave
function. The lower the energy expectation value, the closer one should be to the best
possible description of the system. The condition for an optimized HF wave function is
that the derivative of Eq. (2.5) with respect to orbital changes is equal to zero (minimum
point), under the orthonormality restriction. For this reason, one does not minimize the
energy expression, but instead the Lagrangian function
L =<ΨHF|Hˆ|ΨHF >−2
N/2
∑
i j
ε ji
[
< i| j >−δi j
]
, (2.8)
where ε ji are the lagrangian multipliers and < i| j > is an overlap integral. Setting the
derivative of the Lagrangian with respect to each orbital to zero, one obtains the HF equa-
tions
fˆ |i>=
N/2
∑
j
ε ji| j > (2.9)
with the Fock operator fˆ defined as
fˆ (i) = hˆ(i)+∑
j
[
2Jˆ j(i)− Kˆ j(i)
]
= hˆ(i)+ gˆ(i). (2.10)
The two operators Jˆ j(i) and Kˆ j(i), referred to as Coulomb and exchange operators, respec-
tively, are defined by their effect when operating on a spatial orbital
Jˆ j(1)φi(r1) =
∫
φ∗j (r2)
1
r12
φ j(r2)φi(r1)dr2, (2.11)
Kˆ j(1)φi(r1) =
∫
φ∗j (r2)
1
r12
φ j(r1)φi(r2)dr2. (2.12)
Comparing Eq. (2.3) and Eq. (2.10), it is easy to identify the approximation made in HF
theory. The electron-electron interaction operator r−1i j is replaced by a mean-field electron
repulsion in the form of gˆ(i). Each electron only "feels" an averaged interaction with the
remaining electrons.
Since the energy is invariant with respect to unitary transformations among the occupied
orbitals, a simpler form for Eq. (2.9) is possible by using a basis where the fock operator
3The proof for the variational theorem is relatively trivial, and can be found in almost any Quantum
Chemistry book. I would however advise looking into Ref. [15]
24 Chapter 2. Theoretical Background
is diagonal, the canonical HF equations
fˆ |i>= εi|i> . (2.13)
Such a basis is referred to as a canonical orbital basis. The diagonal elements of the Fock
matrix are the orbital energies εi. Notice that the HF total energy is not equal to the sum of
the occupied orbital energies. This would lead to double-counting of the electron-electron
interaction.
Up to this point, the theory has been derived without any consideration over the form
of the spatial orbitals {φi} which are used. If one would use an infinite basis - also referred
to as complete basis set (CBS) - to expand the orbital space, the variational method would
find the "right" solution under the HF mean-field approximation. This is of course not pos-
sible, since we always have to restrict ourselves to a finite-sized expansion. In most of the
molecular structure programs in use today, these orbitals are built as a linear combination
of atom-like gaussian functions4 - the Atomic Orbital (AO) set {χµ}. Much effort has
been put in to achieve values close to the CBS limit using the smallest number of functions
possible. The spatial orbitals are defined in the Linear Combination of Atomic Orbitals
(LCAO) approximation as
φi(r) =∑
µ
Cµiχµ(r). (2.14)
Introduction of Eq. (2.14) into (2.5) gives the Hartree-Fock energy in dependence of the
atomic orbital integrals
EHF =∑
µν
Dµν
{
hµν +
1
2∑ρσ
Dρσ
[
(µν |ρσ)− 1
2
(µσ |ρν)
]}
=
1
2∑µν
Dµν
(
hµν + fµν
)
. (2.15)
Both matrices hµν and fµν are integrals over the operators hˆ and fˆ in the AO basis. The
matrix D is the one-electron density matrix, with
Dµν = 2
N/2
∑
i
CµiC∗ν i (2.16)
The sum runs only over the occupied orbital indices. Therefore, the HF energy is only
dependent of the AO basis and the first N/2 molecular orbitals (MOs) coefficients. The re-
mainingNAO−N/2 orbitals (NAO is the number of AOs used in the expansion of Eq. (2.14))
4The basis functions, however, do not have to be necessarily gaussian functions.
2.1. Quantum Mechanics 25
are referred to as virtual orbitals. They have no special significance in HF theory, except
for the Koopman’s Theorem5, but are of vital importance for post-HF treatments. Although
the HF energy commonly gives about 99% of the total energy, the electron-electron inter-
action is still approximated through the use of a mean-field. The remaining 1% describing
the instantaneous correlation between the electrons as they move is often important for de-
scribing chemical phenomena. This is called the correlation energy, and methods which
include this contribution are referred to as correlated methods. To obtain the full energy
of the system for a given AO basis, the wavefunction would have to be built out of a lin-
ear combination of all possible Slater determinants, each with different occupations for the
NAO molecular orbitals. This is normally referred to as a Full Configuration Interaction
(FCI) method. Such an approach is however too costly, and only feasible for very small
systems and AO expansions. Other methods have been developed which scale significantly
better while providing good estimates for the correlation contribution. These methods are
the subject for the next few Sections.
2.1.2 Møller-Plesset Perturbation Theory
One of the simplest approaches to the correlation problem is to consider the HF solution a
sufficiently good approximation to the total energy of the system, and to obtain the missing
contributions through a perturbation expansion. The Hamiltonian is split into a reference
Hˆ(0) and a perturbation Hˆ(1)
(Hˆ(0)+λ Hˆ(1))|Ψ>= E|Ψ>, (2.17)
with
Hˆ(0) =
N
∑
i
fˆ (i) =
N
∑
i
[
hˆ(i)+ gˆ(i)
]
(2.18)
Hˆ(1) = Hˆ− Hˆ(0) (2.19)
5According to Koopman’s Theorem, the electron affinity of a molecular system will correspond to the
orbital energy of the lowest lying virtual orbital. This definition, however, rarely provides reliable results, as
the virtual orbital energies do not converge to defined values upon increasing the basis sets, and the physical
meaning of them is in fact dubious.
26 Chapter 2. Theoretical Background
This choice of Hamiltonian is referred to as Møller-Plesset (MP) Perturbation Theory. By
expanding the energy and the wave function in a Taylor series
E = ∑
k=0
λ kE(k) , |Ψ>= ∑
k=0
λ k|Ψ(k) > (2.20)
and inserting them into Eq. (2.17), one obtains
(Hˆ(0)−E(0))|Ψ(0) >+λ
[
(Hˆ(0)−E(0))|Ψ(1) >+(Hˆ(1)−E(1))|Ψ(0) >
]
+ · · ·= 0,
(2.21)
Since Eq. (2.21) must hold for any value of λ , there are in fact n+1 equations to be solved,
where n is the expansion limit. For each power of λ , the associated terms must equal zero.
Such an expansion contains energy corrections up to order n+1, although the wavefunction
only has to be known up to order n.
The reference wave functionΨ(0) will be the HF wave function, which is an eigenfunc-
tion of Hˆ(0). It is easy to show that the energies of order n= 0,1
E(0) =<ΨHF|Hˆ(0)|ΨHF >=<ΨHF|
N
∑
i
fˆi|ΨHF >=
N
∑
i
εi (2.22)
E(1) =<ΨHF|Hˆ(1)|ΨHF >=−1
2∑i j
[2(ii| j j)− (i j| ji)] (2.23)
summed together will give the HF energy (compare the above result to Eq.(2.5)). The first
correction is therefore contained in E(2). This energy term already involves the first order
wave function Ψ(1), which must be given. According to the Brillouins Theorem, singly
excited configurations do not interact with the reference and, therefore, do not contribute
in first order to the energy. The first order wave function is built as a combination of the
doubly excited configurations
|Ψ(1) >= 1
2∑i j ∑ab
T i jab|Φabi j > . (2.24)
The |Φabi j > functions are defined as
|Φabi j >= EˆaiEˆb j|ΨHF >, (2.25)
where Eˆai is a spin-adapted operator which excites an electron from an occupied orbital i
to a virtual orbital a. However, since theΦabi j configurations are not orthogonal nor normal-
2.1. Quantum Mechanics 27
ized, it is convenient to make use of contravariant configurations and amplitudes
Φ˜abi j =
1
6
(2Φabi j +Φ
ab
ji ) , T˜
i j
ab = 2T
i j
ab−T jiab, (2.26)
which have the following properties:
< Φ˜abi j |Φcdkl > = δacδbdδikδ jl +δadδbcδilδ jk, (2.27)
< Φ˜abi j |Ψ(1) > = T i jab, (2.28)
< Φ˜abi j |Hˆ|ΨHF > = Ki jab. (2.29)
The use of contravariant configurations and amplitudes greatly simplify the end formulae
of matrix elements involving excited configurations. The MP2 correlation energy gives the
first correction to the HF value, and is easily computed as
∆EMP2 = E(2) =<ΨHF|Hˆ|Ψ(1) >=∑
i j
∑
ab
<ΨHF|Hˆ|Φ˜abi j > T˜ i jab
=∑
i j
∑
ab
Ki jabT˜
i j
ab. (2.30)
The new term which has been introduced is an exchange integral Ki jab = (ia| jb).
The amplitudes are calculated by taking the second term of Eq. (2.21), and multiplying
from the left by a contravariant configuration. As stated before, the expression should equal
zero for the converged solution
Ri jab =< Φ˜
ab
i j |Hˆ(0)−E(0)|Ψ(1) >+< Φ˜abi j |Hˆ|Ψ(0) >= 0, (2.31)
where Ri jab is referred to as the doubles residual for the given electron pair excitation. This
expression can be evaluated with help of second quantization to
Ri jab = K
i j
ab+∑
c
( facT
i j
cb +T
i j
ac fcb)−∑
k
( fikT
k j
ab +T
ik
ab fk j), (2.32)
where frs are elements of the Fock matrix. In the case of canonical orbitals, the matrix is
diagonal and the expression reduces to
Ri jab = K
i j
ab+(εa+ εb− εi− ε j)T i jab. (2.33)
The amplitudes can be calculated directly as long as the Ki jab integrals have been computed.
28 Chapter 2. Theoretical Background
Substitution into Eq. (2.30) gives the canonical MP2 energy
∆EMP2 =∑
i j
∑
ab
Ki jab(2K
i j
ab−K jiab)
εi+ ε j− εa− εb . (2.34)
Higher order perturbations can also be used, leading to the MPn series. There is however no
given proof that the series necessarily converges, and it is often the case that the correlation
energy oscillates or even diverges when including higher excitations. It is also difficult to
judge the quality of the MP2 estimate. In molecular systems with a small HOMO-LUMO
gap, the denominator in Eq. (2.34) will be small and might lead to large errors in the energy.
Also, in cases where the HF reference gives a bad energy estimate, MP2 is well known to
overestimate the correlation contribution (some empirical corrections have however been
proposed with some success[16]).
Nonetheless, it is one of the most commonly used post-HFmethods. It is size-consistent
(although not variational) and has a very low cost compared to other correlation methods.
Formally, the computational cost of canonical MP2 scales with O(N 5), whereN stands
for the size of the system. This is due to the transformation of the two-electron AO integrals
to build the matrices Ki j. The scaling can be reduced by integral screening or by reducing
the number and/or size of these matrices. Some of these approximations will be later
discussed in the text.
2.1.3 Coupled Cluster Theory
Coupled Cluster (CC) Theory is (as in the MP2 case) a non-variational size consistent
method. It has gained great popularity in the last years, mostly due to the latter property,
and to the fact that it converges towards the FCI limit in going to higher order excitations.
The CC wave function has the form
|ΨCC >= eTˆ |ΨHF > . (2.35)
2.1. Quantum Mechanics 29
The cluster operator Tˆ includes excitation operators up to a given order as Tˆ = Tˆ1+ Tˆ2+
Tˆ3+ . . . , where the excitation operators are defined as
Tˆ1 =∑
i
∑
a
Eˆait ia (2.36)
Tˆ2 =
1
2∑i j ∑ab
EˆaiEˆb jT
i j
ab. (2.37)
...
The exponential function shown in Eq. (2.35) may be expanded into a Taylor series
eTˆ = 1+ Tˆ +
Tˆ 2
2!
+
Tˆ 3
3!
+ . . . (2.38)
Including excitation operators up to Nth order (with N the number of electrons in the sys-
tem) would give the FCI result. However, the Tˆ operators can be truncated to include only
the lowest excitations, since the higher orders should have a smaller contribution. This
leads to the hierarchy of coupled cluster models available today. One of the most com-
mon truncations is up to double excitations, and the method is termed as Coupled Cluster
Singles and Doubles (CCSD). The expansion in (2.38) can in this case be rewritten as
eTˆ1+Tˆ2 = 1+ Tˆ1+
(
Tˆ2+
Tˆ 21
2
)
+
(
Tˆ2Tˆ1+
Tˆ 31
6
)
+ . . . (2.39)
As seen above, this includes higher order excitations by products of single and double
operators. This is the key for the size-extensivity of the method.
The CCSD correlation energy is obtained by applying the Hamiltonian operator to the
CCSD wave function, and projecting from the left with the reference function
∆ECCSD =<ΨHF|Hˆ|ΨCCSD >
=<ΨHF|Hˆ(Tˆ1+ Tˆ2+ 12 Tˆ
2
1 )|ΨHF >
=<ΨHF|HˆTˆ1|ΨHF >+<ΨHF|Hˆ(Tˆ2+ 12 Tˆ
2
1 )|ΨHF >
=∑
ai
<ΨHF|Hˆ|Φai > t ia+∑
i j
∑
ab
<ΨHF|Hˆ|Φabi j >
(
T i jab+
1
2
t iat
j
b
)
(2.40)
Again, with help of second quantization, it is possible to transform the equation into a more
30 Chapter 2. Theoretical Background
practical form
∆ECCSD =∑
ai
2 fait ia+∑
abi j
[
2Ki j−K ji]ab(T i jab+ 12t iat jb
)
. (2.41)
The singles and doubles residuals are obtained in a similar way, by projecting from the left
with the corresponding contravariant configurations
ria =< Φ˜
a
i |Hˆ−ECCSD|ΨCCSD > (2.42)
Ri jab =< Φ˜
ab
i j |Hˆ−ECCSD|ΨCCSD > (2.43)
The form of the residuals is somewhat more involved than the MP2 case and, for sim-
plicity, I will only give the doubles residual equation leaving out the singles terms. This
corresponds to the CCD residual, and is given by
Ri jab = K
i j
ab+
[
K(Ti j)
]
ab+∑
kl
[
Kkli j + tr(T
i jKlk)
]
T klab +G
i j
ab+G
ji
ab. (2.44)
The matrices hold the same meaning as before, the only new elements introduced are the
matrices K(Ti j) and Gi j. The former is an external exchange operator
[
K(Ti j)
]
ab =∑
cd
T i jcd(ac|db), (2.45)
the transformed 4-external integrals are contracted with the amplitudes. The G matrix
accounts for the contribution of other two-electron integrals, and will only be discussed
later in the context of local correlation approximations. It is however given in Refs. [17,
18]. The full residuals for CCSD are featured also in Ref. [19].
Although the coupled cluster family of methods converges relatively quickly to the FCI
result with inclusion of higher-order excitations6, it is regularly the case that triple excita-
tions still give an important contribution to the correlation estimate. However, including
the full triples is computationally demanding, scaling with N 8. An alternative is the use
of perturbation theory to include the triples effect, the CCSD(T) method.[20, 21, 22] This
model has become quite popular over the last years, becoming a standard method for high
accuracy single reference calculations.
In the CCSD(T) method, the correlation energy is given by the CCSD contribution of
6CC methods have a faster convergence pattern in comparison to configuration interaction methods.
2.1. Quantum Mechanics 31
Eq. (2.41), and a perturbative correction of the form
∆E(T) =<ΨHF|(Tˆ1+ Tˆ2)†Vˆ Tˆ3|ΨHF > (2.46)
where Vˆ denotes the perturbation operator. In the canonical case, there is no coupling
between individual amplitudes and the correction can be obtained non-iteratively. The
computational cost still scales with N 7, but the CCSD iterations are done independently,
which leads to a much more cost effective approach than using the full triples. It is also
found that the CCSD(T) model is generally more accurate than CCSDT.
2.1.4 Local Correlation Methods
The problem of correlation has been connected for a long time to local representations of
orbitals. As Kutzelnigg once remarked, "the clearest pictorial description of correlation is
(...) the one based on localized orbitals".[23] It is known that dynamic electron correla-
tion is a short range effect, decaying with ≈ r−6 as the dispersion energy. Conventional
correlation methods, however, can make no use of the electron locality since they employ
delocalized canonical orbitals. This leads to a quadratic increase of the number of am-
plitudes needed to correlate each electron pair, and a quartic increase in the number of
parameters to be computed.
In order to avoid a steep scaling of computational cost with the molecular size, sev-
eral methods using a local description of electron correlation have been presented over the
years.[24, 25, 26, 27, 28, 29] One of the most successful to date has been the one first pro-
posed by Pulay.[1] His suggestion was to transform the occupied space into a local orbital
basis
|φ loci >=∑
µ
|χµ > Lµi
=∑
k
|φ cani >Uki (2.47)
by means of an unitary transformation U, with L=CU. The transformation can be chosen
from any of the several localization algorithms proposed.[11, 30, 31] The Pipek and Mezey
scheme[11] is often preferred, since it keeps the separation between σ and pi orbitals.
The virtual orbitals are obtained by projecting out the occupied space from the AO basis
|χ˜r >=
(
1−∑
i
|φ loci >< φ loci |
)
|χr >=∑
µ
|χµ > Pµr, (2.48)
32 Chapter 2. Theoretical Background
and are commonly called Projected Atomic Orbitals (PAOs). The projection matrix P is
computed as
P= 1−LL†S. (2.49)
This particular selection of occupied and virtual spaces has the following properties:
(1) the occupied orbitals are kept orthogonal among themselves and the virtual space.
(2) the virtual orbitals are however no longer orthogonal, with overlap
< χ˜r|χ˜s >= (P†SP)rs = S˜rs (2.50)
(3) there are linear dependencies in the virtual space. The number of PAOs is equal to
the size of the AO basis (NAO), although it spans an NAO− nocc dimensional space.
This is further discussed later in the text.
(4) both occupied and virtual space are inherently local.
The non-orthogonality of the PAOs leads to somewhat more complicated working equa-
tions, but the advantages of using a local orbital space greatly compensate this disadvan-
tage. The methods based on these approximations are referred to as local methods, and
named after the canonical counterpart with an "L" prefix added. For second order Møller-
Plesset theory this will be LMP2, for CCSD the corresponding local method is denoted as
LCCSD, and so on. The approximations involved are now detailed.
Orbital domains
The fundamental approximation in local correlation methods is to restrict excitations from
an occupied orbital |φ loci > to virtual orbitals in its vicinity. For this purpose, the PAOs are
grouped together according to the centers of the original AOs. A group of atoms can then
be selected for each LMO based on a locality criterion. The respective group of PAOs build
up the orbital domain [i]. Which criterion should be used is, however, not straightforward.
Small domains increase the errors in the correlation energy estimate, large domains will
slow down the calculation.
The selection is usually done as first suggested by Boughton and Pulay.[9] The current
Molpro version [8] contains only minor modifications to this procedure. One starts by
ordering the atoms according to decreasing Löwdin charges
liA = 2 ∑
µ∈A
[S1/2L]µi (2.51)
2.1. Quantum Mechanics 33
All atoms with charges above a given threshold are automatically added to the domain list.
Further centers may be added according to the overlap criterion of Boughton and Pulay.
An approximate LMO |φˆ loci > is built using the AO basis from the selected centers
|φˆ loci >= ∑
A∈[i]
∑
µ∈A
|χµ > Lˆµi. (2.52)
The coefficients Lˆµi are determined by maximizing the overlap with the original LMO. The
Boughton-Pulay criterion is of the form
Bi = 1−
∫
|(φi− φˆi)2|dτ > TBP. (2.53)
Atoms are added to the domain list until Bi is above the threshold TBP. The value is normally
varied as a function of the basis set, with TBP = 0.980 and TBP = 0.985 recommended for
double and triple-zeta basis sets, respectively.
Only excitations i→ r,r ∈ [i] will be allowed. For double excitations, pair domains are
built as the union of two single domains [i j] = [i]∪ [ j], with similar restrictions imposed
i j→ rs,rs ∈ [i j]. The error introduced by truncation of the virtual space has been proven
to be small, more than 98% of the correlation energy is usually recovered. This fraction
even increases with the basis size, with part of the loss being linked to a reduction in the
basis set superposition errors (BSSE).[32, 33, 34, 35, 36, 37] This is however not the only
reason for the difference. As detailed in previous studies,[34] ionic excitations are also left
out, and they should account for the error near the CBS limit.
The linear dependencies in the domains, which have already been mentioned, are in
the current Molpro implementation by default removed individually in each pair domain.
The PAO overlap matrix S˜[i j] is built for each domain [i j] separately and diagonalized.
The eigenvectors which correspond to the smallest eigenvalues are then deleted, or the
individual basis functions with the largest coefficients in these eigenvectors.
The weak pair approximation
The domain approximation alone does not lead to linear scaling with respect to the molecu-
lar size. The number of orbital pairs rises quadratically, and without any further approxima-
tion this would be the minimal scaling regime. However, since correlation is a short-range
effect, orbital pairs located far apart from each other should give small contributions.
It is therefore reasonable to neglect contributions from pairs with large separations. For
any molecular system, it is trivial to prove that within a given distance (smaller than the sys-
tem diameter) the number of neighbors is a constant number. Only outside this sphere does
34 Chapter 2. Theoretical Background
Table 2.1: Pair types in local coupled cluster calculations. The default values for Rc, Rw, Rd and Rvd are 1, 3,
8 and 15 Bohr respectively. The distance between the orbital centers is given by rp.
strong Rc > rp treated at the CCSD level and in the triples (see text).
close Rc ≤ rp < Rw treated at the MP2 level, and included in the triples (see
text).
weak Rw ≤ rp < Rd treated at the MP2 level.
distant Rd ≤ rp < Rvd treated at the MP2 level through an approximate multipole
method (not used in this work).
very distant rp ≥ Rvd neglected.
the number of neighbors scale quadratically. By defining a cutoff distance, one can keep
the number of pairs in a linear scaling regime. The neglected pairs are referred to as very
distant. Further approximations can be implemented by defining other distance parameters.
At medium distances, where the contributions are relatively small but not negligible, lower
level correlation methods may be used. In a LCCSD calculation, these pair energies can
be estimated by LMP2 theory. Table 2.1 gives a summary of the distance parameters and
different pair classifications used in the Molpro LCCSD(T) implementation. However, it
should be noted that the only condition for asymptotically linear scaling is the neglect of
very distant pairs. All other approximations only affect prefactors and its onset.
The effect of pair approximations has been thoroughly tested. It has been found that
correlating orbital pairs which share one center (strong pairs) at the LCCSD level, and the
remaining ones with LMP2 (weak pairs), reaction energies and general molecular proper-
ties are well reproduced. The triples are somewhat more sensible to this cutoff and only
including strong pairs the percentage of correlation energy recovered was found to be be-
low 70% . An extra class - close pairs - was introduced to correct for this problem. The
triples will be computed for orbitals (i jk) under the condition that one of the pairs (i j), (ik)
or ( jk) is strong; the two other pairs can either be strong or close. This has been found to
bring the percentage up to 90% .[6]
The default values should be seen as a compromise between computational cost and
accuracy, and the effect of these approximations should be carefully monitored7.
7Many benchmarking studies have been made in the last years. K. Pflüger has done extensive work in
reaction energies and polarizabilities,[38] G. Rauhut and T. Hrenar on frequencies,[39] so that for these type
of calculations the approximations have been tested. Exceptions can however still be found when calculating,
for example, activation barriers. This will be further discussed in Chapter 6.
2.1. Quantum Mechanics 35
The local equations and the linear scaling regime
To calculate the LMP2 energy correction, the residual in Eq. (2.32) has to be rewritten in
the local basis. This involves transforming from the canonical virtual basis to the (non-
orthogonal) PAO basis. The steps are detailed elsewhere,[18] involving relatively simple
transformations of each element, and resulting in the expression
Ri jrs =K
i j
rs + ∑
tu∈[i j]
frtT
i j
tu S˜us+ ∑
tu∈[i j]
S˜rtT
i j
tu fus
−∑
k
[
∑
tu∈[k j]
S˜rt fikT
k j
tu S˜us+ ∑
tu∈[ki]
S˜rt fk jT iktu S˜us
]
. (2.54)
Although a bit more intricate, it still bears a great resemblance to Eq. (2.32). The major
difference lies in the extra matrix multiplications with the PAO overlap matrix from Eq.
(2.50). Since this is done with matrix blocks instead of a full matrix, the extra effort is
relatively small. The size of these blocks are determined by the size of the orbital domains,
which should be more or less independent of the molecular size.
The LMP2 energy can be written as
E(2) =∑
i j
∑
rs∈[i j]
Ki jrs(2T
i j
rs −T i jsr ) (2.55)
with the amplitudes in the PAO basis being optimized by solving Eq. (2.54).[18] In local
MP2 calculations, due to the choice of the occupied and virtual spaces, the equations have
to be solved iteratively. This is actually a minor disadvantage in comparison to the canon-
ical counterpart. In about 7-8 iterations the amplitudes are converged. The advantages on
the other hand are manifold. As pointed out above, all summations are carried only over the
domains. The matrices are of reduced dimensions and can be kept in memory (avoiding the
slowdown caused by I/O operations on disk). Also the occupied indices are constrained.
Only energies (and residuals) for i j pairs in the pair list have to be computed. As such,
the number of terms will scale linearly with the molecular size if very distant pairs are
neglected. The sum over k is also restricted according to the pair list.
Local Coupled Cluster theory can be derived essentially in the same way as in the MP2
case. The residuals have to be transformed from the virtual MO to the PAO basis, and the
final equations will resemble the canonical result, except for the extra multiplications with
the PAO overlap matrix. In Chapter 5, some comments will be made about some of the
residual terms and the restrictions made to orbital pairs and domains. The linear scaling
properties are however guaranteed in a similar fashion as in the LMP2 case.
36 Chapter 2. Theoretical Background
2.1.5 Density Functional Theory
In all quantum mechanical methods discussed up to this point, the wave function has been
the main quantity of interest. Reaching a converged solution for the orbital indices or am-
plitudes is just another way of saying that one has been able to determine a wave function
representation of the system. For N-electrons, this corresponds to dealing with 3N coor-
dinates, a high-dimensional problem. It would be desirable to find an alternative function
from which the energy could be retrieved with a reduced number of variables.
In the seminal work by Hohenberg and Kohn,[40] proof was given of an univocal rela-
tion between the energy and the electron density. This made it possible to establish an en-
tirely new approach to the Schrödinger equation problem. Instead of looking for an eigen-
vector of the Hamiltonian, one could theoretically use a functional connecting a density to
an energy value. This leads to a drastic reduction in the dimensionality of the problem,
since the electron density only depends on three coordinates. The only problem remaining
is to find the right functional for the system we wish to describe.
The general form for a (spin independent) density functional can be given as
E[ρ(r)] =− 1
2∑i
< φi|∇2|φi >+
∫
ν(r)ρ(r)dr
+
1
2
∫ ∫ ρ(r)ρ(r′)
|r− r′| drdr
′+Exc[ρ(r)]. (2.56)
In a conventional molecular system, the external potential ν(r) is given by the potential
of the nuclei, so that the second term is the electron-nuclei Coulombic interaction energy.
The third term is the electron-electron classical interaction (Coulombic repulsion of two
electron clouds). The factor 1/2 avoids double-counting. The last term is an exchange-
correlation functional, which distinguishes the various density functionals among each
other.
A further comment should be made about the first term. It is clearly the kinetic energy
contribution, but instead of using the density as a function, molecular orbitals are explicitly
used to represent the density. In the beginning, density functionals were also used for the
kinetic energy. This would correspond to an orbital-free theory, since one could always
build the system density with any type of functions. However, this leads to large errors
in the kinetic energy and to the nonbinding problem8. Kohn and Sham were the first to
8The Nonbinding Theorem presented by Lieb(1973) and Simon(1977) proves that under the Thomas-
Fermi-Dirac model - which uses a density functional form for the kinetic energy - no molecular system
would be stable relative to dissociation into constituent fragments. In short: "Goodbye World!".
2.1. Quantum Mechanics 37
propose the use of orbitals, by solving self-consistently the equation[
−1
2
∇2+ν(r)+
∫ ρ(r′)
|r− r′| +
δExc[ρ(r)]
δρ(r)
]
|φi >= εi|φi > (2.57)
for each orbital, and defining the electron density as
ρ(r) = 2
N
∑
i=1
φ∗i φi. (2.58)
In short, although the main function of interest is the density, one needs to define orbitals
and to solve the problem self-consistently. The meaning of these Kohn-Sham (KS) orbitals
and respective energies is however not clear. There is no proven one-to-one correspondence
between a density and the orbitals, and the Koopman’s Theorem also does not apply9.
As stated above, the form of the exchange-correlation term defines the density func-
tional, and several groups have proposed different solutions to the energy correspondence
problem. There is an endless list of functionals in the literature, and sometimes little sup-
port for making a decision on the method that can better describe the system under study.
The most commonly used density functional today is the B3LYP functional,[41] which
has widespread applications in almost all fields of chemistry, from organic compounds to
metals. The exchange correlation term is given as
EB3LYPxc = 0.2E
HF
x +0.72E
B88
x +0.08E
S
x +0.81E
LYP
c +0.19E
VWN80
c . (2.59)
It includes the exact Hartree-Fock exchange EHFx , obtained as in Eq. (2.12), mixed
with exchange from both the gradient corrected B88 functional[42] and the Slater-Dirac
exchange[43]. The correlation is also built from both gradient corrected and local ap-
proximations, the LYP[44] and the VWN80[45] functionals, respectively. The numerical
parameters were fitted to reproduce atomization energies, proton affinities and ionization
potentials of the G1 molecule test set.
Several criticisms have been directed at Density Functional Theory over the years.
Although the theory promises an universally valid functional, this has still to be found,
and most density functionals will only work sensibly in systems for which they were
parametrized. Secondly, there is no systematic way to improve on a DFT estimate. Con-
trary to wave function-based theory, there is no hierarchy of methods which approaches a
converged result in the n-particle space. Another common problem with DFT functionals is
9Nevertheless, the KS orbital energies have been used over the years for comparison with photoelectron
spectra. These values are, however, often contested, even if in good agreement with experiment.
38 Chapter 2. Theoretical Background
the difficulty in describing van derWaals (vdW) interactions. These interactions result from
non-overlapping electron densities, and therefore cannot be captured by a simple density
based functional form.
2.1.6 Semiempirical Methods
The main bottleneck in wave function methods is the calculation and transformation of two-
electron integrals. Conventional implementations of these methods scale formally with the
fourth power of the number of basis functions. Semiempirical methods offer a way to
overcome this problem by reducing the number of integrals. The valence electrons are rep-
resented with a minimal basis set and the core electrons replaced by functions to represent
the combined nuclei and the inner shells. The one-electron operator for a valence electron
i can then be rewritten as
h(i) =−1
2
∇2i −
M
∑
m=1
Z′m
rim
(2.60)
where Z′m is the reduced nuclear charge due to the core electrons. This leads to a drastic
reduction in the number of integrals, since the number of explicit electrons is kept at a
minimum.
It is also common in the semiempirical family of methods to assume that the product of
two functions lying on different centers will be zero. This is referred to as Zero Differential
Overlap (ZDO) approximation, although sometimes also under the name of Neglect of
Diatomic Differential Overlap (NDDO). Since one uses a minimal basis set, the quantum
numbers will always be different for same center functions and the overlap matrix is unitary.
The most popular semiempirical methods to date are modified NDDO models,
parametrized with atomistic or molecular data (such as enthalpies of formation, atomic
spectra...). They are commonly referred to under the term Modified Neglect of Diatomic
Overlap (MNDO).[46] The minimal basis set is built up of Slater type orbitals up to p-
type functions. Extension to d-orbitals have also been made[47], allowing for the treatment
of heavier atoms and polarization effects in outer-valence electrons. However, the most
popular semiempirical methods include only lower angular momentum functions, and the
discussion will be restricted to this case.
In order to simplify the notation, a different orbital labeling scheme will be used in this
Section. Since there is only one type of orbitals in semiempirical methods (in the other
methods, one discusses AO and MO orbitals), and the basis set is minimal (a set of s and
p orbitals on each atom) an orbital will be generally be named µA, where µ represents the
angular momentum of the function (which can only vary between s and p-type), and A
2.1. Quantum Mechanics 39
stands for the atom at which the function is centered. This is of relative importance due to
the ZDO approximation.
Adjustable parameters are included in the one and two-electron integrals, and in the
core-core repulsion (the Coulombic interaction of the reduced nuclei charges). The one-
electron integrals are given in the form
hµν =< µA|hˆ|νA >= δµνUµ −
M
∑
m6=A
Z′A < µAsm|νAsm >, (2.61)
where Uµ corresponds to the energy of a single electron experiencing the full nuclear
charge. The values for this quantity are parametrized for each atom. The second term
is the potential due to all the other nuclei in the system and is parametrized in terms of
reduced nuclear charges and a two-electron integral. This involves the valence orbitals as
well as extra nuclei-centered s-type functions which model the removed core electrons.
The two-center one-electron integrals are approximated as
< µA|h|νB >= Sµν 12(βµ +βν). (2.62)
The β values are referred to as "resonance" parameters, and the µ and ν labels, as before,
are either s or p-type functions. Contrary to the NDDO approximation, these modified
models calculate Sµν explicitly (this is actually the reason why these methods are called
"modified" NDDO models).
The two-electron integrals are modelled as interactions between multipoles, being sep-
arated into Coulomb terms or exchange. There are relatively few of them, since they are
combinations of only s and p-type orbitals.
The core-core repulsion is the main difference between the various semiempirical mod-
els. In the MNDO methods, one uses the form
VMNDO(A,B) = Z′AZ
′
B < sAsB|sAsB >
(
1+ e−αArAB + e−αBrAB
)
(2.63)
for any general two atoms A and B, except for O-H and N-H bonds, where the expression
is slightly modified. The fitting parameters are the α exponents. The reason why the
repulsion is computed in this way and not by a Coulomb interaction expression, is that
a simple Coulomb force, due to the MNDO approximations, is not canceled by the long
distance electron interactions. These expressions guarantee the correct limiting behavior.
The Austin Model 1 (AM1) method [48] uses a diferent set of parameters, with the
two-electron integrals fitted to atomic spectra. The core-core repulsion is also somewhat
40 Chapter 2. Theoretical Background
different
VAM1(A,B) =VMNDO(A,B)+
Z′AZ
′
B
rAB
×
(
∑
k
akAe−bkA(rAB−ckA)
2
+∑
k
akBe−bkB(rAB−ckB)
2
)
, (2.64)
with constants a, b and c being introduced as new parameters. These have been fitted to
molecular data.
At last, the Parametric Method Number 3 (PM3) [49] is a reparametrization, not by
hand, as in the AM1 and MNDO models, but fitting all parameters simultaneously with the
help of an error function. The core-core repulsion term is the same as in AM1, with the
difference that only two Gaussians were assigned to each atom and allowed to vary during
the fitting.
Although at heart the semiempirical family of methods is an approximation to HF,
the two approaches are not to be mistaken. Due to the parametrization to experimental
values, correlation is partly included in the semiempirical Hamiltonian. It is therefore
relatively difficult to judge a priori how both approaches will compare to each other. In a
few cases semiempirical methods may give better agreement to experiment (or higher-level
estimates) than HF. The physics should however not be ignored, and caution should be
taken when making use of these methodologies. There are many documented cases where
these approaches bluntly fail.
2.2 Molecular Mechanics
The methods discussed up to this point offer approximations to the solution of the electronic
Schrödinger equation. However, for system sizes ranging above one thousand atoms, such
an approach is technically impossible by todays standards. Even if the methods computa-
tional cost does scale linearly with the system size, the prefactors or the onset of the scaling
regime do not allow for such calculations to be performed routinely on biological systems.
An alternative way to solve this problem is to define the energy as a parametric function of
the nuclear coordinates. The parameters can be chosen to fit experimental data or accurate
quantum chemistry results in small systems. The computational cost is strongly reduced,
since one needs only to compute some simple analytic functions. At the same time, the ac-
curacy can remain mostly unchanged, as long as the system is found in similar conditions
to the ones used for parametrization, and that the reference data is chosen accordingly.
There are several potential functions available, separated into two major types. The
2.2. Molecular Mechanics 41
all-atom force fields treat explicitly all atoms contained in the structure. In the united-atom
approach, some of the non-polar hydrogen atoms are joined together with their bonded
atom into a single pseudo-atom. The force field is parametrized in such a way that the
atom describes the moiety as a whole. The latter force fields are computationally somewhat
simpler, since the number of parameters is reduced, but are in most cases also less accurate.
Independent of its type, the force field takes the general form
V = Estr+Ebend+Etors+EvdW+Eel+Ecross. (2.65)
In other words, the total energy is a sum of various terms, each describing a different phys-
ical contribution to the potential. Bond stretches (Estr), bond angles (Ebend) and torsions
(Etors) are included, as well as van der Waals (EvdW) and electrostatic terms (Eel). The last
term Ecross is called a cross term and introduces a coupling between other components (e.g.,
between an angle and a distance).
Discussion will from now on be restrained to the all-atom case, although most of it is
extensive to the other approach. One of the most commonly used force fields of this type is
CHARMM.[50] It will be used as an example to ilustrate the form of the various potential
energy contributions. The total energy for this force field is written as
VCHARMM = ∑
bonds
kb(b−b0)2+ ∑
angles
kθ (θ −θ0)2+ ∑
dihedrals
kφ [1+ cos(nφ −δ )]
+ ∑
impropers
kω(ω−ω0)2+ ∑
Urey−Bradley
ku(u−u0)2
+ ∑
nonbonded
ε
[(
Rmini j
ri j
)12
−
(
Rmini j
ri j
)6]
+
qiq j
εri j
. (2.66)
Both bond stretches Estr and angles Ebend (first two sums) are parametrized through a har-
monic potential, and each needs two parameters to be fitted - the force constant (kb, kθ ) and
the equilibrium value (b0, θ0). Both parameters will vary depending on the bonded atom
types. These functions perform best for values of b or θ close to the minima. Bond break-
ing phenomena cannot be described by such an expression, since it fails to describe any
dissociative behavior10 The torsion potential Etors is described by two sums, one running
over regular torsion angles, the other over improper torsions, or out-of-plane bendings. The
dihedrals are parametrized through the use of a periodic function, with kφ being the dihe-
dral force constant, n the multiplicity of the function, φ the dihedral angle and δ the phase
shift. Since most dihedral potentials are in fact close to a sine form, the potential will work
10The harmonic potential would have to be replaced by a Morse potential or some other function which for
b→+∞ would go to zero.
42 Chapter 2. Theoretical Background
reasonably well for a wide-range of φ . The out-of-plane bendings are fitted to a harmonic
potential to describe the natural plane rigidity. The out-of-plane angle is represented by
ω . The cross-term Ecross corresponds to the sum of Urey-Bradley components, which are
non-bonding interactions between 1,3 neighbors. The variable u is defined as the distance
between the 1,3 atoms in the harmonic potential. This introduces in an approximate way
a coupling between the atom distances and the 1,3 angle11. The van der Waals term EvdW
corresponds to the sum over nonbonded atoms i and j, and is calculated with a standard
12-6 Lennard-Jones potential. The Rmini j term is not the minimum of the potential, but
rather where the Lennard-Jones potential is zero. The last term corresponds to Eel, a simple
Coulombic interaction with permittivity ε included.
The gradient and Hessian matrices can be computed with little effort (compared to the
previously discussed methods) since they also have a very simple analytic form. Molecular
Mechanics force fields can be routinely applied to systems spanning thousands of atoms.
They are quite reliable for conformational studies, describing well (due to their highly
parametrized form) otherwise challenging interactions, such as pi-stacking, general vdW
interactions or hydrogen bonding.
2.3 Quantum Mechanics/Molecular Mechanics
The study of reactivity in biological systems is a major challenge for computational chem-
istry methods. Whether discussing a system in solution or an enzymatically catalyzed
process, the effects of the environment have to be included in the calculation. Although the
reaction normally takes place in a relatively confined space, referred to as active site, the
surrounding system can influence the reaction rate by many orders of magnitude. However,
most systems of interest are too large to be treated by Quantum Mechanics methods, and
Molecular Mechanics, although inexpensive, cannot treat bond breaking/formation phe-
nomena. Hybrid methods have therefore been developed which combine both approaches.
The Quantum Mechanics / Molecular Mechanics (QM/MM) method [51] was the first
to combine two different levels of theory in a single calculation. It can in general be used
with any electronic structure method and force field. The system is partitioned into two
sections - the QM and MM parts. The former will normally correspond to the region
where the reaction takes place and will be treated at the higher quantum mechanical level.
This allows for the study of reactivity without the parametrization problem of Molecular
11In the case of water, if only bond and angle potentials would be used, the O-H distance would be inde-
pendent of the H-O-H angle. By introducing a harmonic potential for the two hydrogens (1,3), all the energy
terms will be effectively coupled.
2.3. Quantum Mechanics/Molecular Mechanics 43
Mechanics. The second part, modeled by a force field, ensures that the environmental
effects are incorporated into the reaction.
Let us consider a system separated into an active site and the environment. For the time
being, we will neglect the existence of bonds between the two sections. There areM atoms
treated at the QM level, X atoms in the MM region, and N electrons in the QM region. The
total Hamiltonian for this system will be
Hˆtot = HˆQM+ HˆMM+ HˆQM/MM. (2.67)
The first term corresponds to the Hamiltonian of the QM particles (nuclei and electrons) in
vacuo, as given by Eq. (2.3). The second term describes the interaction between the MM
atoms, and is simply given by the force field energy. The coupling between the two regions
Figure 2.1: Depiction of an enzyme-substrate complex. In a QM/MM calculation the environment is treated
with the help of force fields and the active site at higher levels of theory.
44 Chapter 2. Theoretical Background
is contained in HˆQM/MM, with
HˆQM/MM =−
X
∑
x=1
N
∑
i=1
qx
rix
+
M
∑
m=1
X
∑
x=1
qxZm
rmx
+EvdWQM/MM, (2.68)
where qx stands for the charge of the xth MM atom, rix the distance between an electron
and a MM atom and rmx between two atoms, one in the QM, the other in the MM region.
The first two terms represent the electrostatic interaction between the two sections, and are
computed by the QM program. The last term stands for the van der Waals interactions.
Instead of using an operator, this term is commonly an energy contribution computed by
the MM program. It is a simple Lennard-Jones potential as in Eq. (2.66), where standard
parameters for the QM atoms are used. The above Hamiltonian accounts for polarization
of the QM region due to the MM charges (through the first term of (2.68) and the SCF
procedure). This is referred to as electrostatic embedding. Backward polarization of the
MM system due to the QM part is in this case neglected. One could introduce such terms,
but only in conjuction with a polarizable force field, and a self consistent cycle for the
QM/MM coupling. Such force fields are, however, rarely used. The polarization effects are
already included in an averaged way in the parametrization of the force field so that these
effects are generally assumed to be small.
The coupling between the two regions described up till now is referred to as additive
scheme. The name is due to the addition of the coupling terms present in HˆQM/MM. Only
two calculations take place, one for the QM region, another for the total system using MM
methods. Other approaches have, however, been presented, which are commonly referred
to as subtractive schemes. Let us consider the system again divided into two regions, host
and cluster12, whereby the former denotes the whole system and the latter the cutout to be
computed at the QM level. The following formula is used
EQM/MM = EMM(host)+EQM(cluster)−EMM(cluster)︸ ︷︷ ︸
∆Esub
(2.69)
The supercripts indicate the level of theory and the names in parenthesis the geometries
used. The total energy for the system is computed at a lower level and a correction is
introduced by calculating the difference between the higher and lower levels at the cluster
region (∆Esub). If both regions would be coincident, this would of course correspond to the
exact high level estimate. The size of the cluster should be chosen so that the most relevant
energy contributions are contained therein.
12The nomenclature is the same as used in the QMPOT program.[52]
2.4. Reaction Rate Theory 45
Both approaches have advantages and disadvantages. The subtractive scheme is
methodologically more general. Observing Eq. (2.69) there is no reason why one should
restrict the levels of theory to QM and MM. Since the calculations are decoupled, any
level of theory is possible. One may use DFT as a lower level of theory, and combine it
with CCSD(T). Also, the number of regions is not restricted to two. One could imagine a
sequence of corrections in an onion-like approach, where ∆Esub would include the constri-
butions from the inner to the outer region. This is commonly referred to as the ONIOM
approach.[53] The same method has also been popularized by Sauer and coworkers [52] in
the study of zeolites.
There are however some disadvantages relative to the additive scheme. The correction
∆Esub is obtained in the vacuum (i.e., without any information about the environment).
The effect of the surrounding molecule(s) is only contained in the EMM(host) term and is
therefore exclusively treated at the lower level. There is no polarization. Recent work has
compensated for this fault,[54] but including polarization corrections is not trivial when
coupling quantum mechanical methods, since the polarization interaction is in this case not
reproducible by including point charges.
2.4 Reaction Rate Theory
2.4.1 Transition State Theory
Transition State Theory (TST)13 postulates that a reaction proceeds from one energy min-
imum to another via a maximum on the potential hypersurface. This maximum is referred
to as the transition state (TS). A reaction can therefore be understood in terms of a path
connecting reactant and product state, with a hill separating the two minima, as depicted in
Fig. 2.2. The TS will correspond to the structure where the potential is maximal.
In classical mechanics, the probability with which the system will progress from reac-
tants to the products should depend on the barrier height, or in other words, on the energy
difference between reactant and TS. If one considers a simple Boltzmann distribution14,
then this probability will be proportional to e−∆‡G/kBT . ∆‡G is the free energy difference
between the reactant and transition states (see Fig. 2.2)15. This however neglects three
effects
13Sometimes also referred to as Activated Complex Theory.
14In a Boltzmann distribution, the probability of finding the system on any given state along the path is
given by e−∆E/kBT .
15The free energy of activation includes not only the energy barrier necessary for a reactant state to be
activated (the so called critical energy), but also the partition functions of the species involved.
46 Chapter 2. Theoretical Background
Figure 2.2: Diagram ilustrating the concept of a transition state (TS) intermediate. The potential curve
connects the reactant (RS) to the product state (PS), with ∆‡G representing the energy barrier separating the
two states, and ∆GR the total reaction energy.
(1) the movement over the TS may be coupled to other movements from the activated
complex,
(2) once the TS is reached the system does not necessarily fall into the products and
(3) quantum mechanical tunneling. There should be a small quantum "leaking" to the
product side, without necessarily passing over the TS.
The first point would lead to a breakdown of the TST, removing the dependence of
the reaction rate on the TS barrier. The other two factors lead to some minor corrections
in the formulae. When the energy difference between reactants and products is small,
the probability that the system might fall back into the reactants increases, and the speed
of the reaction will decrease. On the other hand, quantum tunneling allows the system
to travel directly to the product side, increasing the reaction rate. Both factors are taken
into consideration by introducing a transmission coefficient κ . For a simple unimolecular
2.4. Reaction Rate Theory 47
reaction, leading from the reactant A to the product B, the rate of formation is given by
d[B]
dt
= k(T )[A], (2.70)
where [ ] stands for the concentration of each species, and k(T ) is called the macroscopic
rate constant, which is temperature dependent16. This explicit dependence will be from
now on dropped. The rate constant is given by
k = κ
kBT
h
e−∆
‡G/RT . (2.71)
Within the limits of the Boltzmann distribution this formulation will hold. It is of general
use for liquid or gas state reactions.
The picture of a continuous structural change connecting the reactant to the product
states is rather universally accepted. The various PES which have been computed in the
last dozens of years, and the high quantitative predictions which have been thereof extracted
are undeniable evidence for the existence of these intermediate structures. The most severe
criticisms are actually directed at the various assumptions detailed above, and the range
of quantitative application which can be given to TST. Particularly in Biochemistry, there
is an active debate on the possibility of explaining enzymatic catalysis through the use of
such a simplified model.
2.4.2 Michaelis-Menten kinetics
Since in this work enzymatically catalized reactions will also be discussed, it is worth at this
point to shortly adress Michaelis-Menten kinetics. For a substrate (S) which is transformed
into a product (P) through the mediation of an enzyme (E), the following scheme should
be valid
E+S
k1−−⇀↽−
k−1
ES
k2−→ E+P. (2.72)
The first step corresponds to the substrate binding to the enzyme, the enzyme-complex (ES)
then reacts, forming free enzyme and product. The second step includes product formation
and release. Which one of these processes controls the effective kinetic constant, depends
on the reaction studied.
The scheme only holds as long as the enzymatic reaction is irreversible, and the prod-
uct does not rebind to the enzyme. Using Eq. (2.70), and considering the quasi-steady
16For a more general reaction, with more than one reactant, one just has to multiply the concentrations in
the right side, to the power of each order.
48 Chapter 2. Theoretical Background
state approximation[55] that the concentration of enzyme-substrate complex is constant,
the relation between the system concentrations is given by
[ES] =
[E][S]
Km
(2.73)
where
Km =
k−1+ k2
k1
(2.74)
is the Michaelis-Menten constant. The rate of product formation can be written as
d[P]
dt
= k2([E]+ [ES])
[S]
Km+[S]
=Vmax
[S]
Km+[S]
. (2.75)
If the concentration of substrate is large compared to the value of Km, the system is said to
be saturated, and a maximum velocity of k2([E]+ [ES]) is reached.
Enzymes are often characterized by their Km value. In cases where product formation is
rate limiting (large k2), it represents the dissociation constant of the enzyme-substrate com-
plex. Low values indicate a large complex stability and ES will rarely dissociate without
the substrate first reacting to form the product.
Chapter 3
Computing Potential Energy Surfaces
using Local Correlation Methods
49

3.1. The Domain Discontinuity Problem 51
3.1 The Domain Discontinuity Problem
As detailed before, local correlation methods employ local orbital spaces to restrict the
number of excited configurations in the wave function. The use of a truncated virtual space
is an essential approximation for achieving linear scaling. By the use of domains, the
number of excitations per electron pair becomes independent of the molecular size. The
domain lists are also used for defining the orbitals distances, which in turn are necessary
for truncating the orbital pair list (neglect of very distant pairs).
These approximations work very well when near equilibrium properties like equilib-
rium geometries[56, 57, 58], harmonic vibrational frequencies[59, 60, 39], or other proper-
ties like dipole moments[38], dipole polarizabilities[38, 61], or NMR chemical shifts[62]
are computed. The smoothness of the potentials in geometry optimizations or frequency
calculations can be ensured by freezing the domains; in geometry optimizations this is done
once the geometry stepsize is smaller than a certain threshold. This is similar to density
functional theory, where the definition of the grid must also be fixed at a certain stage.
A more complicated situation arises if activation or reaction energies are considered,
and the electronic structure of the system strongly changes along the reaction coordinate.
In such cases the results of local correlation calculations can be affected in several ways:
when following a reaction path, steps may appear on the potential energy surface (PES)
due to discontinuous changes of the localized orbitals or domains. This has been pointed
out by Russ and Crawford[63], who used three model systems to investigate this prob-
lem. In the homolytic bond breaking of CH3-F, no discontinuity was observed, since in a
spin-restricted framework the localized bond orbital stays delocalized over both fragments.
However, heterolytic bond dissociation of ketene (CH2CO) and propadienone (CH2CCO)
with carbon monoxide as a product revealed in both cases small steps on the PES, caused
by changes of the domains. The potential energy surface for ketene, using MP2 and CCSD
is depicted in Fig. 3.1, for both local and canonical methods. The difference between both
sets ∆Eloc(r) = ELCCSD(r)−ECCSD(r) (or MP2) is given in Fig. 3.2 on a smaller scale.
The discontinuities are difficult to recognize in the full path. The steps are only a few mili-
hartree in magnitude, and only by plotting the difference between the two calculations is it
possible to clearly observe the effect of domain changes. The computational details will be
later discussed in Section 3.3.
Russ and Crawford have pointed out that the discontinuities are of the same order of
magnitude as the localization error (due to the truncation of the virtual space) and therefore
not negligible. It is desirable to establish a procedure which can produce a continuous PES
without affecting the linear scaling behavior of the local methods. Not only due to the
52 Chapter 3. Computing Potential Energy Surfaces using Local Correlation Methods
1.2 1.4 1.6 1.8 2.0 2.2 2.4
r(C-C) / A
-152.16
-152.12
-152.08
-152.04
E n
e r
g y
 /  
E h
local
canonical
MP2
CCSD
Figure 3.1: (L)MP2 and (L)CCSD potential energy surfaces for the ketene dissociation.
discontinuities, relative energies may also be affected. If the reactants and products have
different electronic structures, the domain sizes may change and this can lead to unbalanced
results. In particular, it may happen that at the transition state the electronic structure
is more delocalized than for reactants and products, and the resulting larger domains at
the transition state can then lead to a significant underestimation of computed activation
energies.
This subject is of particular concern in the context of this work. Many reactions of
biological interest involve complex aromatic structures, and the domain approximation can
lead to large errors in relative energies. Due to the system sizes involved, it is difficult
to access these effects by comparison to results from canonical methods. An automated
procedure is needed to reduce the geometry dependence of the local methods errors, while
preserving their low computational cost.
3.2 Domain Merging
3.2.1 Method
Let us consider the energy E(x) of a molecular system, where x is a vector representation
of the spatial coordinates. If the domain for a given LMO φi at position x is given by [i]x,
3.2. Domain Merging 53
1.2 1.4 1.6 1.8 2.0 2.2 2.4
r(C-C) / A
1
2
3
 
∆ E
l o
c 
/  m
E h
LMP2
LCCSD
Figure 3.2: Energy difference between local and canonical counterparts, which reveals the discontinuities in
the profile.
the domain list is defined as the group of domains
D(x) = { [i]x | i ∈ occupied }. (3.1)
In the case of a reaction, where the system progresses from an initial configuration, defined
by xi, to a final configuration xf, the energy should remain continuous for any value in the
interval, and as a consequence, the domain list constant.
An obvious approach to the problem would be to use a domain definition which would
include all of the domains along the reaction path. This would, however, be costly, since
it would involve determining domains at all the points along the path. Alternatively, one
could also pick the most relevant points on the surface, and only merge the domains for a
few selected geometries. This would be enough even in case one uses two points as long as
the other domain lists are contained in those of the two selected configurations. The most
straightforward solution is to use the initial and final domain lists
Dmerge = { [i]xi ∪ [i]xf | i ∈ occupied }. (3.2)
Building the union of two domain lists is however not straightforward. In order to make use
of Eq. (3.2) the orbitals should be in a 1-to-1 correspondence between the two sets D(xi)
and D(xf). This involves finding the orbitals with the largest resemblance at both geome-
54 Chapter 3. Computing Potential Energy Surfaces using Local Correlation Methods
tries. A possible criterion for this would be the overlap between the two sets of occupied
orbitals. However, for large displacements, the value will be close to 0. Computing the
overlap with use of the AO overlap matrix from one of the geometries solves the problem
for bond elongations, but might fail in the case of rotations. For example, if a methyl group
rotates by about 120◦, the overlap matrix will indicate a false ordering for the 3 C-H bond
orbitals.
Another possibility to pair the orbitals of one geometry to a second set is to compare
the orbital domains and find the pairing for which the orbital domains coincide best. This
removes the geometry dependence in the comparison, but can in some cases lead to an arbi-
trary pairing (in cases where more than one orbital have the same domains). Our procedure
uses both criteria, overlap and domain comparison. In this way, the stronger criteria has
priority (overlap) and the domain comparison can correct for problems due to rotations.
Once the orbitals in each set are paired, only a small subset of the domain lists should
not agree. For each of these orbitals the center lists are merged [i]xi ∪ [i]xf . If there are
multiple bonds, it can happen that several orbital domains have the same center lists. In
this case the merged domains are used for all of these orbitals.
3.3 Test Applications
3.3.1 Ketene and propadienone bond dissociation
The first tests for the procedure were made for the same systems as studied by Russ and
Crawford,[63] namely the ketene and propadienone bond dissociation. The reaction paths
were optimized as described in the aforementioned paper: first the equilibrium structures
were determined at the CCSD/cc-pVDZ level, and then the C-C bonds were increased while
relaxing all other geometry parameters. The resulting structures were used in single point
calculations using LMP2 and LCCSD and different basis sets.
For the dissociation reaction of ketene (CH2CO) into singlet methylene and carbon
monoxide, the C-C distance was varied in the range from 1.2 Å to 2.5 Å. This region
includes the minimum and the four discontinuities observed by Russ and Crawford.[63]
At the equilibrium structure ketene has C2v-symmetry (C-C distance 1.33 Å), but due to
out-of-plane bending the symmetry reduces to Cs at a C-C bond distance of about 1.47
Å.[64, 65] The MP2, LMP2, CCSD and LCCSD energy profiles along the reaction path are
shown in Fig. 3.3. The middle panel of Fig. 3.3 shows the difference ∆Eloc(r) between the
local and canonical calculations on a much smaller scale. This plot clearly reveals the four
discontinuities at r = 1.30,1.47,1.99,2.32 Å. Near the equilibrium structure, the oxygen
3.3. Test Applications 55
out-of-plane lone pair (b1 symmetry) mixes with the C-C pi bonding orbital in the same
symmetry, and this leads to a domain that extends over the 3 heavy atoms. However, the
contribution at the methylene C-atom is very small. At shorter (r < 1.3 Å) or longer (r >
1.47 Å) distances this atom happens to be not included in the domain when the Boughton-
Pulay method is used with a completeness criterion of 0.98. This leads to the lowering of
the LMP2 and LCCSD energies in the range 1.3 Å < r < 1.47 Å. At a distance of 1.99 Å
one of the C-C bond domains becomes a lone pair (sp3 hybrid) on the CH2 fragment, and
at 2.32 Å the second C-C bond domain becomes a lone pair on CO.
The discontinuities disappear when the domain merging procedure is used. In a first
test, the first and last points from the path were used to define the domains (xi=1.2 Å and
xf=2.5 Å). In this case all orbital domains at 2.5 Å are contained in the ones at 1.2 Å (the
two C-C bonds are broken and at 2.5 Å one lone pair on each fragment is formed). Thus, the
larger domains determined at 1.2 Å are automatically used for all structures. This leaves the
energy at short distances unchanged and lowers the energy at long distances. In a second
test the domains determined at the equilibrium distance (xi=1.33 Å) were merged as well.
As mentioned above, at this point one of the orbital domains extends over three atoms and
consequently these extended domains are used for all three orbitals which involve the CO
bond. As expected, the deviations from the canonical CCSD results are somewhat smaller
in the latter case. At large distances, the contribution of the basis functions at the methylene
C-atom to the description of the CO fragment becomes very small, which explains why the
deviations become the same for both test cases.
The flattening of the curve is a positive feature, since it indicates a more systematic
deviation from the reference canonical calculation. However, it is observed that the differ-
ences between CCSD and LCCSD increase with decreasing C-C distance. This is almost
certainly due to the increasing BSSE in the canonical calculation, which is absent (or at
least reduced) in the local case[32, 33, 34, 35, 36, 37, 66]. An additional feature is that the
increase of this effect is not monotonic, but exhibits a flattening in the region between 1.6
and 1.45 Å. In this region the molecule becomes planar, and this leads to a reduction of the
BSSE. In fact, near 1.45 Å there is a valley-ridge inflection point[67, 68], i.e., the second
energy derivative with respect to the out-of-plane angle(s) changes from being positive at
shorter distances to negative at longer distances. This leads to a very sudden change in the
optimized angle as a function of the C-C bond distance (see lower panel of Fig. 3.3).
The same procedure was applied to the lowest singlet state of propadienone. The C-
C distance between CO and vinylidene was varied in the same range as ketene. At the
equilibrium structure propadienone has a ’kinked’ structure (Cs symmetry),[69] but at small
C-C distances (1.26 Å) it ’snaps’ to C2v. Again, as in the case of ketene and shown in the
56 Chapter 3. Computing Potential Energy Surfaces using Local Correlation Methods
1
2
3
∆E
lo
c 
/ m
E h
LCCSD
LCCSD (merge at 1.20 A)
LCCSD (merge at 1.33 A)
-152.16
-152.12
-152.08
-152.04
En
er
gy
 / 
E h
local
canonical
1.2 1.4 1.6 1.8 2.0 2.2 2.4
r(C-C) / A
90
120
150
180
a
n
gl
e 
/ o
α
β
MP2
CCSD
Figure 3.3: Upper panel: (L)MP2 and (L)CCSD potential energy curves for ketene (in a.u.) as a function of
the C-C distances. For each C-C distance, all other geometry parameters were optimized at the CCSD/cc-
pVDZ level. Middle panel: Difference between the LCCSD and CCSD energies in millihartree. Lower panel:
Optimized out-of-plane angles. α is the angle between the CH2 plane and the C-C-bond and β is the C-C-O
angle.
3.3. Test Applications 57
2
3
4
∆E
lo
c 
/ m
E h
LCCSD
LCCSD (merge at 1.20 A)
-190.12
-190.10
-190.08
-190.06
-190.04
En
er
gy
 / 
E h
local
canonical
1.2 1.4 1.6 1.8 2.0 2.2 2.4
r(C-C) / A
120
150
180
a
n
gl
e 
/ o φ
ρ
MP2
CCSD
Figure 3.4: Upper panel: (L)MP2 and (L)CCSD potential energy curves for propadienone (in a.u.) as a
function of the C-C distance. For each C-C distance, all other geometry parameters were optimized at the
CCSD/cc-pVDZ level. Middle panel: Difference between the LCCSD and CCSD energies in millihartree.
Lower panel: Optimized out-of-plane angles. φ is the C-C-O angle and ρ the C-C-C angle.
58 Chapter 3. Computing Potential Energy Surfaces using Local Correlation Methods
lower panel of Fig. 3.4, there is a valley-ridge inflection point leading to a very sudden
change of the bending angle as a function of the C-C bond distance.
Domain changes occur exclusively for LMOs located at the carbons. At small distances,
there are two σ C-C bond orbitals and two pi orbitals delocalized over the three centers.
After bond breaking, two lone pairs are left (one on the CO, the other on vinylidene) as
well as one σ and one pi orbital on vinylidene. In principle, merging of the two lone
pairs would be sufficient to describe the dissociation qualitatively correct. However, at
short distances the pi (a′′) bond of the vinylidene fragment extends to the third carbon
atom. Since the procedure does not consider the symmetry of the orbitals, three merged
domains extending over the three C atoms are generated. The profile for the calculation
with geometry dependent domains shown in the middle panel of Fig. 3.4 resembles the one
obtained by Russ and Crawford at small distances, but lacks some of the discontinuities
found by these authors in the bond-breaking region. This may be due to differences in the
details of the domain selection procedure.
The use of merged domains produces a smooth curve. Again, the differences between
the local and non-local calculations increases with decreasing distance, and at the valley-
ridge inflection point the curvature suddenly changes. As already discussed for ketene, the
increase of the difference ∆Eloc(r) between local and canonical energies with decreasing
C-C distance is attributed to increasing BSSE in the non-local case.
The effect of the domain approximation for LMP2, LCCSD, and LCCSD(T) was also
studied for the propadienone. The effect is seen to be very similar in all cases, which is in
agreement with previous studies.[10] Based on this fact, further tests were performed only
at the LMP2 level (the computationally cheapest method).
3.3.2 SN2 reaction of hydrochlorocarbons with chlorine
The nucleophilic attack of chloride on ethylchloride is a well known textbook example of
a SN2 reaction. It is well described by single-reference correlation methods, and various
studies at the MP2 level have been published.[70, 71, 72, 73, 74] It is a further example in
which lone pairs turn into bonds (and vice versa) during the reaction, and therefore domain
changes can be expected. Furthermore, the analogous reactions of 1-propylchloride and 1-
butylchloride have been considered in order to investigate the effect of local approximations
on computed barrier heights and well depths of the complexes in the entrance channels.
For the C2H5Cl + Cl− reaction, the energy profile was computed along an assumed
reaction coordinate R2−R1, where R1 and R2 are the two C-Cl bond distances (see Fig.
3.6). The difference was taken since if only one distance was fixed the other one varied very
3.3. Test Applications 59
quickly in some regions, leading to an unexpected shape of the energy profile. The reaction
coordinate was varied in steps of 0.04 Å in the region between -2.0 to 2.0 Å. All other
internal coordinates were optimized at the MP2 level. The cc-pVTZ(d/p) basis was used
for C and H, and the aug-cc-pVTZ(d) basis for Cl (d-functions on H and f -functions on
the other atoms were omitted). In the following, this basis is denoted [aug]-cc-pVTZ(d/p).
Fig. 3.5 shows the MP2 and LMP2 energy profiles along the reaction coordinate. The
upper panel shows the absolute energies. The MP2 and LMP2 curves are closely parallel,
but the LMP2 energies are about 10 mH higher than the MP2 ones. In agreement with
previous studies[70, 74] it is found that the transition state is unsymmetric with respect
to R1 and R2, with a σ ′ plane containing the carbons and chlorines. Correspondingly,
the reaction coordinate is unsymmetric; in the exit channel, the CH3 group is rotated by
60 degrees about the C-C axis relative to the minimum structure. Thus, in this region
the optimized structures correspond to local minima, which are separated from the global
minima by rotational barriers.
The middle panel of Fig. 3.5 shows the difference ∆Eloc = ELMP2−EMP2 on a smaller
scale. The dashed curve corresponds to the calculation with standard domains. Two discon-
tinuities are found in this case on the LMP2 potential, one at positive and one at negative
values of R2−R1. These correspond to the changes of lone pairs into C-Cl bonds for the
entering and exiting Cl− anions, respectively. Between these discontinuities two domains
extend over the whole Cl-C-Cl unit, leading to an energy lowering of about 1 mH. With the
automatic domain merging procedure, applied to the reactant, transition state, and prod-
uct structures, all Cl lone pair and C-Cl domains are united, leading to 8 identical merged
domains. In this case a smooth potential function is obtained. Interestingly, ∆Eloc has a
minimum for R2−R1 ≈ 0 and maxima around R2−R1 ≈ ±0.8 Å. As can be seen in the
lower panel of Fig. 3.5, showing R1+R2, the average distance between the two chlorine
atoms and the ethyl group is minimal in this region; in addition, there is also the closest
proximity between the hydrogen atom on the σ ′ plane at the neighboring CH3 group and
the incoming Cl−. Therefore, the maxima in ∆Eloc should be again due to a maximum of
the BSSE in the canonical calculation, i.e., the bumps are not caused by the LMP2 but by
artificial lowering of the canonical MP2 energy by basis set superposition effects.
We now turn to the question about the effect of the local approximations on the
computed well depths and barrier heights for the reaction of Cl− with ethylchloride, 1-
propylchloride and 1-butylchloride. Two stationary points were optimized in each case:
the weak complex in the entrance channel, which is stabilized by electrostatic and van der
Waals forces, and the transition state, where the electrophilic carbon is partly bonded to
the entering ion. These stationary points were optimized with no symmetry constraints
60 Chapter 3. Computing Potential Energy Surfaces using Local Correlation Methods
8
9
10
∆E
lo
c 
/ m
E h
LMP2
LMP2 (merge)
-998.45
-998.44
-998.43
-998.42
-998.41
En
er
gy
 / 
E h
local MP2
canonical MP2
-2 -1 0 1 2
R2 - R1 / A
4.7
5.0
5.3
5.6
R 2
 
+
 R
1 
/ A
Figure 3.5: MP2 and LMP2 Energy profiles for the C2H5Cl + Cl− reaction. The [aug]-cc-pVTZ(d/p) basis
has been used. Upper panel: absolute energies; middle panel: energy difference between MP2 and LMP2
calculations; Lower panel: R1+R2 as a function of the reaction coordinate R1−R2.
3.3. Test Applications 61
Figure 3.6: Schematic representation of the SN2 reactions of Cl− with ethylchloride (X=H), 1-propylchloride
(X=CH3), and 1-butylchloride (X=CH2CH3). D1 is the dihedral angle between the entering chlorine, the two
carbons and the X fragment.
at the MP2 level. All optimizations were carried out with the [aug]-cc-pVTZ(d/p) basis.
The values of the optimized coordinates are summarized in Table 3.1 (see Fig. 3.6 for the
definition).
For ethylchloride + Cl− the structures are in good agreement with those obtained in
previous studies[70, 71, 72, 73, 74]; small differences are due to the larger basis set. How-
ever, for the propylchloride complex, the structure differs from previous ones[70, 71] in the
orientation of the added methyl group. The values for D1 given in the aforementioned ref-
erences are also presented in Table 3.1. Jensen[70] placed the alkane chain in the direction
of the entering chlorine (D1 ≈ 0o), similar to the ethyl reaction. In our calculations, it is
instead pointing sideways (D1 ≈ 90o), as in the transition state. No true PES minimum was
found for the other structure. Using the same level of theory (MP2/6-31G*) it is, however,
possible to locate a saddle point with similar conformational features as the ones published
in the previous works.
For butylchloride no previous calculations of the geometries were found. In the opti-
mizations the start geometries were taken from the propylchloride structures, replacing the
hydrogen that is farthest from the electrophilic carbon by a methyl group. However, no fur-
ther extensive search was performed and there is no guarantee that the optimized structures
62 Chapter 3. Computing Potential Energy Surfaces using Local Correlation Methods
Table 3.1: Optimized geometry parameters for the stationary points of the SN2 reactions. The structures were
optimized at the MP2/[aug]-cc-pVTZ(d/p) level (see text). Distances are in Å and angles in degrees (see Fig.
3.6 for the definition of the coordinates).
R1 R2 A1 A2 D1a D1b D1 c
Complex
ethylchloride 1.834 3.326 110.7 84.2 0.0 0.0 0.0
1-propylchloride 1.833 3.369 111.5 80.6 -85.7 0.0 0.0
1-butylchloride 1.831 3.424 111.6 76.8 -81.1 — —
Transition state
ethylchloride 2.355 2.330 94.9 100.0 2.3 0.0 2.9
1-propylchloride 2.341 2.342 97.2 97.1 -87.8 -90.0 -88.2
1-butylchloride 2.341 2.340 97.1 97.0 -87.9 — —
aThis work.
bRef. [70]
cRef. [71]
correspond to the global minima.
Single point MP2 and LMP2 calculations were carried out on the optimized struc-
tures, using the aug-cc-pVXZ basis sets (X=D,T,Q).[75, 76] The basis will be refered
to as AVXZ for short. Due to the large number of basis functions, density fitting (DF)
approximations[77] were used throughout. The results are presented in Tables 3.2 and 3.3.
In order to study the effect of the local approximations on the relative energies, the LMP2
calculations were performed with three different choices of domains. In the first ”stan-
dard” case, the domains were determined by the method of Boughton and Pulay[9] at the
individual structures. In the second case (denoted merged(A)), the standard domains deter-
mined for the reactants and at the transition state were merged. This affects only the lone
pairs of the incoming Cl− ion, yielding four equivalent domains. In the third case (denoted
merged(B)) also the products were included in the merge procedure, yielding the same 8
equivalent domains as used in Fig. 3.5. This is relevant to see the effect of using different
points for the domain merging.
Table 3.2 shows that in this case the effect of the merging on the computed barrier
heights is small for all three systems. The standard and merged results differ by at most
0.2 kcal/mol. For the AVTZ and AVQZ basis sets the agreement with the canonical MP2
3.3. Test Applications 63
Table 3.2: MP2 and LMP2 barrier heights (in kcal mol−1) for the SN2 reactions.
DF-LMP2
Reactant Basisa Standard Merged(A)b Merged(B)c DF-MP2
C2H5Cl AVDZ 18.2 18.1 18.3 17.6
AVTZ 18.5 18.5 18.5 18.5
AVQZ 18.8 18.8 18.9 18.9
C3H7Cl AVDZ 17.3 17.2 17.3 16.6
AVTZ 17.9 17.8 17.9 17.8
AVQZ 18.2 18.1 18.2 18.2
C4H9Cl AVDZ 17.6 17.4 17.4 16.9
AVTZ 18.1 18.1 18.1 18.1
AVQZ 18.4 18.4 18.5 18.6
aAVXZ denotes the aug-cc-pVXZ basis sets[76]
bCase (A): the domains of reactant and TS are merged.
cCase (B): the domains of reactant, TS, and product are merged.
results is excellent. However, in the case with the smaller AVDZ basis set the barrier
height is lower in the canonical calculation. Very likely, this effect is due to an artificial
lowering of the canonical MP2 barrier by BSSE effects. This is supported by the fact that
the convergence with increasing basis set size is faster for LMP2 than for MP2.
The effect of the BSSE can be directly shown for the complex by applying the counter-
poise correction (CP).[78] Table 3.3 shows the CP-corrected binding energies. As demon-
strated previously for many other cases[32, 33, 34, 35, 36, 37, 66], the CP corrections (in
parenthesis) are much smaller for LMP2 than for MP2, and the CP-uncorrected LMP2
values (obtained by subtracting the CP correction) are in excellent agreement with the CP-
corrected MP2 values. The binding energies obtained with standard domains are always
slightly too small, consistent with previous experience[34, 66]. This is due to the missing
class of ionic excitations, which are neglected by construction in the local calculations.
When the domains are extended by the merging procedure, the most important contribu-
tions of these excitations are included, and the CP corrections slightly increase. This leads
to even better agreement between the CP-uncorrected LMP2 and the CP-corrected MP2
binding energies. Due to the remaining BSSE, the CP-uncorrected LMP2 binding energies
64 Chapter 3. Computing Potential Energy Surfaces using Local Correlation Methods
Table 3.3: MP2 and LMP2 binding energies (in kcal mol−1) of the complexes in the entrance channels of the
SN2 reactions. All values are counterpoise corrected, and the BSSE correction is given in parenthesis.
DF-LMP2
Reactant Basisa Standard Merged(A)b Merged(B)c DF-MP2
C2H5Cl AVDZ -11.2 (+0.3) -11.2 (+0.5) -11.6 (+0.6) -11.5 (+1.2)
AVTZ -11.6 (+0.0) -11.6 (+0.2) -11.7 (+0.2) -11.7 (+0.7)
AVQZ -11.8 (+0.0) -11.8 (+0.1) -11.9 (+0.1) -11.9 (+0.4)
C3H7Cl AVDZ -11.6 (+0.3) -11.7 (+0.5) -11.9 (+0.7) -12.1 (+1.4)
AVTZ -12.2 (+0.1) -12.3 (+0.2) -12.3 (+0.3) -12.5 (+0.8)
AVQZ -12.5 (+0.0) -12.5 (+0.1) -12.6 (+0.1) -12.8 (+0.6)
C4H9Cl AVDZ -12.2 (+0.3) -12.3 (+0.5) -12.4 (+0.7) -12.7 (+1.6)
AVTZ -12.8 (+0.1) -12.9 (+0.2) -13.0 (+0.3) -13.3 (+0.9)
AVQZ -13.2 (+0.0) -13.2 (+0.1) -13.3 (+0.1) -13.4 (+0.5)
aAVXZ denotes the aug-cc-pVXZ basis sets[76]
bCase (A): the domains of reactant and TS are merged.
cCase (B): the domains of reactant, TS, and product are merged.
are in some cases slightly larger than the CP corrected MP2 ones.
3.3.3 Hydrogen fluoride addition to double bonds
The addition of hydrogen fluoride (FH)1 to ethene is a frequently used model system for
addition reactions involving unsaturated hydrocarbons. It has been subject of several the-
oretical works, some of the references are contained in Ref. [79]. The addition follows
a concerted mechanism in the gas phase as depicted in Fig. 3.7. The reaction proceeds
through the formation of a weak van der Waals complex (RC), leading to a four centered
transition state (TS1). Fluoroethane is then formed, but at the eclipsed conformation (TS2)
which by rotation leads to the final product (P).
The reaction is a good test candidate for the proposed merge procedure. The system
evolves from a double bond to a 4-center bond between FH and ethene, finally leading to
a saturated system. Also of interest is the last step, where a methyl rotation takes place.
As discussed in Section 3.2.1, the merging procedure depends on a 1-to-1 correspondence
1The abbreviation FH will be used throughout the text to avoid possible confusion with Hartree-Fock.
3.3. Test Applications 65
Figure 3.7: Schematic representation of four stationary points of the hydrogen fluoride addition reactions
- ethene (X1=X2=H), 1-propene (X1=CH3,X2=H) and 2-butene (X1=X2=CH3). RC stands for the van der
Waals reactant complex, TS1 and TS2 the first and second transition states respectively, and the product state
P. On the bottom, a representation of TS1 with a description of the geometric parameters of Table 3.4. D1
stands for the dihedral angle between the entering hydrogen, the carbons and the fluorine.
between orbitals in different geometries. For this purpose, both overlap and domain criteria
were used in building the pairing list. The overlap criterion fails in this case and it will be
interesting to see how well the pairing algorithm works.
Besides ethene, also the reactions with 1-propene and 2-butene were considered. Just
as in the SN2 case, these are systems of increasing size built by substitution of a hydrogen
by a methyl group, this time in the vicinity of the double bond. The same mechanism
was adopted for the three systems. For ethene and 2-buthene there can only be one final
product. For propene there are two possibilities: 1-fluoropropyl and 2-fluoropropyl. The
latter product was considered. In all three cases the four stationary points in the PES were
found, and the same naming procedure is used as in the work of Cremer et al. (see Fig. 3.7
and Table 3.4).[79]
No reaction paths were calculated, only relative energies between the minima/maxima.
The reactant domains (computed at large distances) were merged with the TS1 domains.
Due to the large domains found for the transition state structures, taking the product do-
mains as reference would be insufficient. The merging procedures leads to relatively small
changes. One of the fluorine lone pair domains is augmented to include one of the carbons
PAOs, and the entering hydrogen PAOs are added to one of the double bonding carbon
domains. Single point calculations were also carried out for MP2 and regular LMP2 cal-
culations, using the cc-pVXZ basis sets (X=D, T or Q)[75]. Full results are displayed in
66 Chapter 3. Computing Potential Energy Surfaces using Local Correlation Methods
Table 3.5.
The results agree with previous findings. The merged values lie between the local and
canonical results, or at least quite close. All values converge with increasing basis set size,
and the increasing molecular size does not affect this trend. The largest error is found
for the ∆ETS1 value, which lies around 50 kcal/mol (and as such, the error is only of the
order of 1% in the relative energy). One last notice should be made about the cc-pVQZ
value for the butene ∆ETS1, for which the local result still lies about 0.8 kcal/mol above
the canonical one. A slow basis set convergence for the correlation energy is observed in
this case, probably due to BSSE. Counterpoise corrections carried on the complex structure
gave a BSSE of 0.89 kcal/mol, confirming the assumption.
The values for ∆ER are all in good agreement, which provides evidence for the robust-
ness of the pairing algorithm. Although the methyl group is strongly rotated, the program
still identifies the correct orbital. Ordering the orbitals based on the overlap criterion would
lead to an incorrect pairing, and to a large error in the correlation energy.
3.3. Test Applications 67
Table 3.4: Relevant geometric parameters for the stationary points found for the hydrogen fluoride addition to
a double bond reactions. Structures were optimized at the MP2/[aug]-cc-pVTZ(d/p) level (diffuse functions
added to fluorine). Distances are in Å and angles in degrees (see Fig 3.7 for support).
Complex
R1 R2 R3
ethene + HF 1.337 0.932 2.228
propene + HF 1.340 0.934 2.164
butene + HF 1.342 0.936 2.145
Transition state 1
R1 R2 R3 R4 A1 A2 D1
ethene + HF 1.399 1.309 1.317 1.904 73.1 93.8 0.0
propene + HF 1.403 1.329 1.292 1.958 74.7 91.7 1.4
butene + HF 1.404 1.363 1.277 1.971 73.6 92.7 3.9
Transition state 2
R1 R3 R4 A1 A2 D1
ethene + HF 1.526 1.086 1.404 108.6 110.0 0.0
propene + HF 1.528 1.086 1.413 108.9 108.6 2.9
butene + HF 1.531 1.088 1.413 107.3 109.0 4.8
Product
R1 R3 R4 A1 A2 D1
ethene + HF 1.509 1.088 1.402 109.6 109.6 180.0
propene + HF 1.512 1.088 1.411 110.0 107.9 -177.4
butene + HF 1.515 1.090 1.412 107.8 108.2 177.5
68 Chapter 3. Computing Potential Energy Surfaces using Local Correlation Methods
Ta
bl
e
3.
5:
E
ne
rg
ie
s
of
th
e
FH
ad
di
tio
n
re
ac
tio
n
st
at
io
na
ry
po
in
ts
re
la
tiv
e
to
th
e
re
ac
ta
nt
co
m
pl
ex
R
C
(i
n
kc
al
m
ol
−1
).
T
he
se
re
su
lts
ar
e
ob
ta
in
ed
fr
om
si
ng
le
po
in
ts
on
th
e
M
P2
/[
au
g]
-c
c-
pV
T
Z
(d
/p
)(
ad
de
d
di
ff
us
e
fu
nc
tio
ns
to
th
e
flu
or
in
e)
op
tim
iz
ed
st
ru
ct
ur
es
.
∆E
T
S1
∆E
T
S2
∆E
P
D
F-
L
M
P2
D
F-
L
M
P2
D
F-
L
M
P2
R
ea
ct
an
t
B
as
is
St
an
da
rd
M
er
ge
d
D
F-
M
P2
St
an
da
rd
M
er
ge
d
D
F-
M
P2
St
an
da
rd
M
er
ge
d
D
F-
M
P2
et
he
ne
cc
-p
V
D
Z
54
.3
54
.4
53
.5
-8
.7
-9
.4
-1
0.
0
-1
2.
3
-1
3.
1
-1
3.
8
cc
-p
V
T
Z
53
.3
53
.4
52
.9
-8
.0
-8
.5
-8
.5
-1
1.
3
-1
1.
9
-1
2.
0
cc
-p
V
Q
Z
52
.0
52
.1
51
.9
-7
.8
-8
.0
-8
.0
-1
1.
1
-1
1.
4
-1
1.
4
pr
op
en
e
cc
-p
V
D
Z
51
.8
51
.9
50
.9
-9
.4
-1
0.
1
-1
0.
8
-1
2.
7
-1
3.
5
-1
4.
3
cc
-p
V
T
Z
50
.9
51
.0
50
.3
-8
.5
-9
.0
-9
.1
-1
1.
7
-1
2.
3
-1
2.
5
cc
-p
V
Q
Z
49
.4
49
.5
49
.2
-8
.4
-8
.6
-8
.6
-1
1.
6
-1
1.
9
-1
1.
9
bu
te
ne
cc
-p
V
D
Z
55
.3
55
.8
54
.5
-6
.2
-6
.2
-7
.4
-1
0.
0
-1
0.
1
-1
1.
4
cc
-p
V
T
Z
54
.1
54
.3
53
.7
-5
.2
-5
.7
-5
.7
-8
.8
-9
.2
-9
.4
cc
-p
V
Q
Z
53
.2
53
.0
52
.4
-5
.1
-5
.3
-5
.2
-8
.6
-8
.8
-8
.8
Chapter 4
Natural Localized Molecular Orbitals
for Local Correlation Schemes
69

4.1. Critical Assessment of the Boughton-Pulay Criteria 71
4.1 Critical Assessment of the Boughton-Pulay Criteria
Since the seminal works by Saebø and Pulay,[1, 28] the fundaments of local correlation
methods have been kept practically unchanged. There has been significant investment in
expanding the range and efficiency of local methods available, and in the pair approxi-
mations used, but little on the localization procedure on the basis of the procedure. Also
the domain selection criterion, proposed in 1993 by Boughton and Pulay,[9] has remained
untouched.
The choice of method for localizing the occupied space is usually of little importance.
Pipek-Mezey (PM) localization[11] is preferred in most cases, since it keeps the pi-σ sep-
aration in planar molecules, but any set of local orbitals could be used. It has been found
that the correlation energy is rather insensitive to the localization method. However, the
PM procedure is known to be sensible to the use of diffuse basis sets. Since it is based on
the AO overlap matrix, near linear dependencies may lead to artificially large LMO coeffi-
cients at some atoms. This frequently happens, for instance, in aromatic compounds with
basis sets such as aug-cc-pVTZ or aug-cc-pVQZ. Using the Boughton-Pulay (BP) method
one then finds unphysical large domains for the pi-orbitals, which include the neighboring
H-atoms. One solution to this problem is to remove the most diffuse basis function of each
angular momentum type for each atom in the localization criterion. However, this solution
is also not straightforward, since again it depends on the basis set used. For larger basis
sets or with even more diffuse functions it might be necessary to remove extra functions.
Other problems have been pointed out at the domain selection criterion (detailed in Sec-
tion 2.1.4). In the BP procedure one computes the overlap of the LMO with a trial function
built as linear combination of AOs belonging to the domain centers, adding centers to the
list till this value exceeds the BP criterion TBP (see Eq. (2.53)). The value to which this
parameter should be set is, however, unclear, as it also suffers from basis set dependence.
Usually, the criterion is more easily fullfilled for larger basis sets, and therefore for a fixed
value of the threshold TBP the domains will become smaller with increasing basis set. To
compensate for this, one can use different thresholds for different basis sets, e.g. 0.98 for
double zeta, 0.985 for triple zeta, and 0.99 for quadruple zeta. But apart from the fact that
this is not a well defined and user-friendly model, the domains often still differ for different
basis sets.
Another critical point of the BP criterion is the use of Löwdin populations for ordering
the atoms. This population analysis does not converge when going to larger basis sets.
In some cases physically unreasonable results can be expected. For example, the partial
atomic charge in the methane carbon is predicted to be positive when using the aug-cc-
72 Chapter 4. Natural Localized Molecular Orbitals for Local Correlation Schemes
pVTZ basis set (+0.05 a.u.).[80] The variations can even be quite large when going from a
double zeta to a triple zeta basis, changing as much as 0.5 a.u. (see again Ref. [80]). This
of course affects the reliability of the whole procedure.
All these problems are an obstacle to the generalized use of local methods, since the
user is forced to control too many aspects of the calculation. Besides the BP criterion,
many other parameters are used to give stable domains. Limits on the minimum charge
population of each atom (depending on whether it is a hydrogen or a heavy atom), as
well as parameters for automatically adding atoms with charges above a certain value are
today in use. Much of these defaults can be safely used, but exceptions have also been
identified. In fact, it is generally advisable to check the orbital domains before carrying out
a local correlation calculation. There is an obvious need for a truly automated procedure
for localization and domain selection.
4.2 Natural Localized Molecular Orbitals
The Natural Atomic Orbitals (NAOs) and related population analysis were introduced by
Weinholdt and coworkers in 1985.[13] They are defined as the eigenfunctions of the density
(see Eq. (2.58)). The matrix representation of the density is defined as
Γρσ =< χρ |ρ|χσ >=∑
µν
< χρ |χµ > Dµν < χν |χσ >
= [SDS]ρσ . (4.1)
The procedure is started by dividing the matrix Γ into one-center blocks
Γ(AA) Γ(AB) Γ(AC) . . .
Γ(BA) Γ(BB) Γ(BC) . . .
Γ(CA) Γ(CB) Γ(CC) . . .
...
...
... . . .

,
where for Γ(AB)µν , µ ∈ {A} and ν ∈ {B}. The AO overlap matrix S is partitioned in the same
way. For each diagonal sub-matrix the generalized eigenvalue problem
Γ(AA)X= S(AA)XW, (4.2)
4.2. Natural Localized Molecular Orbitals 73
whereW is a diagonal matrix holding the eigenvalues, is solved. The eigenvectorsX are the
pre-NAOs, and their occupancy is given by the respective eigenvalues W. These orbitals
are divided into two sets: the Natural Minimal Basis (NMB) and the Natural Rydberg Basis
(NRB). The division is made by taking the first N orbitals into the NMB set, where N is the
number of valence orbitals needed for the representative ground state configuration of the
neutral atoms. The orbitals are then orthogonalized among each other, while maintaining
the Γ matrix block diagonalization. The steps are as follows:
(1) The NRB orbitals are Schmidt-orthogonalized relative to the NMB set,
(2) the eigenvalue problem of Eq. (4.2) is again solved, this time for the density and
overlap matrices in the NRB basis,
(3) the NMB and NRB orbitals of the different centers are orthogonalized using an
occupancy-weighted orthogonalization scheme (see Ref. [13]),
(4) the eigenvalue problem is again solved, this time in the basis of the whole orthogonal
orbital set.
The diagonalization and orthogonalization matrices multiplied together give the TNAO ma-
trix. Transformation of Γ into the NAO basis
D˜= (TNAO)†ΓTNAO, (4.3)
gives a block-diagonalized matrix, and its diagonal elements D˜rr are the final occupation
numbers for the NAO orbital with index r. In this way, one can divide the charge among
the atoms as
PA = ∑
r∈{A}
D˜rr. (4.4)
This is the so called Natural Population Analysis (NPA).[13]
The Natural Bond Orbitals (NBOs) are built by diagonalizing one and two-center
blocks of the NAO density matrix. The procedure is as follows:
(1) All NAOs with eigenvalues above a given threshold are added to the NBO list as
lone-pairs and all lone pair contributions to the density matrix are removed.
(2) The two-center blocks of the NAO density matrix are diagonalized. Again, all or-
bitals with eigenvalues above the threshold are added to the NBO list. These NBOs
are refered to as 2-center bond orbitals.
74 Chapter 4. Natural Localized Molecular Orbitals for Local Correlation Schemes
(3) If the number of NBOs found in this way is equal to the number of electron pairs,
the search is stopped and one may proceed to the next step. If not, the threshold is
decremented and step 2 is repeated. It would also be possible to expand the search
to 3-center bonds, but we found that for the systems included in this study (and in
general organic compounds) this was not necessary.
(4) The remaining orbital space (of low occupation) is divided into Rydberg and anti-
bonding orbitals. More details can be found in Ref. [81], or in Appendix A.
The NBO orbitals are by construction orthogonal, but should not be used directly in
post-SCF calculations since the occupied NBO orbitals do not span the SCF occupied space
exactly. Therefore, a final TNLMO transformation is performed which rotates the NBO
orbitals so that the orbitals with highest occupations (the so called Lewis set) span the
SCF valence space. This is done by 2x2 Jacobi rotations which zero the density matrix
elements between the Lewis and non-Lewis spaces. For closed-shell SCF wave functions
this makes all diagonal elements Dii = 2 and all other elements zero. Since the diagonal
elements of the NBO density matrix of the Lewis space are already quite close to 2 one
only needs a limited number of 2x2 Jacobi rotations and the orbital space stays localized.
The procedure is further detailed in Appendix A. In summary, the final Natural Localized
Molecular Orbitals (NLMO) coefficients are obtained as
L= TNAOTNBOTNLMO = TNAOV. (4.5)
which corresponds to a connected series of transformations starting from the AO basis
AO T
NAO−→ NAO TNBO−→ NBO TNLMO−→ NLMO. (4.6)
All of these sets have been used in the past for analyzing self-consistent wave functions.
The NAOs are the basis for the above mentioned NPA analysis, which is free from many
of the deficiencies found in Mulliken of Löwdin populations. Of special interest is its
stability with respect to the choice of basis set. This can be easily explained by reviewing
the way the NAOs are built. After solving Eq. (4.2) for Γ, the orbitals are divided into the
NMB and NRB sets. All subsequent operations strive to maintain the form of the NMB set
(where the majority of the electron population is kept) at the cost of the NRB orbitals. By
increasing the basis set, low-populated diffuse functions will be tagged to the latter set and
will therefore have little effect on the NPA populations. The NPA derived atomic charges
have been shown to converge with respect to basis set size, and also to deliver results which
agree well with experimental evidence and/or chemical sense.
4.2. Natural Localized Molecular Orbitals 75
HOOC CH3
NH2
NH2 O
OH
O
O
H2N COOH
SHOO H2N NH2
H2O
CH4
S
S
S
O
HOOC COOH
H2N NH2
O
NH2
N
NH
C2H6
COOH
COOH
alanine (01)
aniline (02)
benzene (03)
benzoquinone (04)
butadiene (05)
butane (06)
cyclohexane (07)
dimethylether (08)
ethane (09)
ethanol (10)
formaldehyde (11)
furan (12)
glycine (13)
hexane (14)
hexatriene (15)
hydrazine (16)
imidazole (17)
isobutene (18)
maleic acid (19)
methane (20)
methylamine (21)
oxalic acid (22)
oxirane (23)
pentane (24)
propane (25)
thianthrene (26)
thiophene (27)
thiophenol (28)
urea (29)
water (30)
Figure 4.1: Test set of 30 molecules used in this work. All geometries were pre-optimized with B3LYP/cc-
pVTZ(d/p) (including up to d functions for the second and third row elements, and up to p functions for the
hydrogens). The numbering shown is the same used in some of the diagrams.
The NBO orbitals are an extension to the NAO procedure, whereby also 2-center blocks
are diagonalized, giving a Lewis-like description of bonding. Several analysis procedures
have been proposed over the years, but this will not be discussed in the text. A review can
be found in Ref. [82].
The NLMO orbitals, the last ones in the set, can be used in local correlation treatments.
They have in the past compared favorably to Boys[30] and Edmiston-Ruedenberg[31].
Measures of LMO charge centroid distances relative to the atomic nuclei, as well as di-
rect comparison of the orbital coefficients showed similar results for all three localization
methods. However, their biggest advantage is the possibility of using the NPA analysis
to calculate the charge of each NLMO into charges of centers. If stable enough, a single
parameter could be used to control the domain sizes, replacing the somewhat intricate use
76 Chapter 4. Natural Localized Molecular Orbitals for Local Correlation Schemes
of the default PM/BP procedure. For comparing the use of NLMO and PM orbitals in local
correlation treatments, a test set comprising 30 molecules was chosen. They are depicted
in Fig. 4.1. Included in this list are typical small organic molecules, medium-sized satu-
rated and unsaturated hydrocarbons as well as aromatic systems. Two sets of calculations
were run. In both sets, a DF-LMP2/cc-pVTZ calculation was carried out using the BP
criterion with a value of TBP = 0.985. In the first set PM localized orbitals were used and
in the second NLMOs. The results are shown in Fig. 4.2, where both the percentage of
correlation energy recovered (left scale) as well as the difference of the average domain
sizes LNLMO−LPM (right scale) is plotted. As can be seen in the diagram, this difference is
always positive, except for the benzoquinone molecule, meaning that the NLMO domains
are slightly larger. There are, however, no major differences between the two. The correla-
tion energy recovered with both orbital sets is very similar, and the differences are mainly
due to the different domain sizes. In the cases where the domains are the same, the energies
are almost identical, which supports the conclusion that the localization method has very
little effect on the energy. These results indicate that replacing Pipek-Mezey by NLMO
orbitals should have little effect on the accuracy of local correlation methods, as seen with
other localization procedures.[9]
The next Section discusses different ways of partitioning the orbital charge through the
centers, making use of the NPA population, and a new domain criterion is proposed.
4.3 Natural Population Domain Criterion
4.3.1 Orbitals Population
No unique way has been described so far to divide the charge of each NLMO into charges
of centers. Löwdin or Mulliken could be used but, as discussed before, they are unreli-
able. The alternative is the use of NPA charges and/or the coefficients of the only natural
orbital set which is uniquely tagged to the atoms - the NAOs. The NBOs contain 2-center
orbitals, which do not differentiate between the two centers involved (one could divide
the orbital contribution or consider the electronegativity of the atoms involved, but this
approach would be somewhat empirical).
In order to determine where the NLMOs are located, one makes use of the V transfor-
mation from NAO to NLMOs and the NAO Density Matrix D˜. The charge of NLMO i at
the center A will be refered to as PAi. There are various possible ways to determine this
value:
4.3. Natural Population Domain Criterion 77
4 8 12 16 20 24 28
molecule
97.5
98
98.5
99
99.5
100
p e
r c
e n
t  c
o r
r e
l a
t i o
n  
e n
e r
g y
NLMO
Pipek-Mezey
-10
0
10
20
30
40
 
∆  
a
v e
r a
g e
 d
o m
a i
n  
s i z
e
Figure 4.2: Percentage of correlation energy recovered using NLMO and Pipek-Mezey orbitals in LMP2
calculations, and with TBP = 0.985. Also shown (in bars) is the average domain size difference between the
two sets. The basis set used was cc-pVTZ. The molecule numbers refer to the ones used in Fig. 4.1.
(1) Use only the transformation coefficients, as in the NBO program[83]
PAi = 2 ∑
r∈[A]
V 2ri, (4.7)
since the NLMOs are normalized, 0< PAi < 2 is valid.
(2) Use the transformation coefficients, with the NAO Density Matrix elements as
weighting coefficients
PAi = 2
∑r∈[A]V 2riD˜rr
∑rV 2riD˜rr
. (4.8)
The sum in the denominator runs through all NAOs.
(3) Use the transformation coefficients to divide the charge through the occupied orbitals
PAi = ∑
r∈{A}
(
V 2ri
∑ jV 2r j
)
D˜rr (4.9)
78 Chapter 4. Natural Localized Molecular Orbitals for Local Correlation Schemes
O
0.827
0.060
0.112 0.827
0.112
0.060
0.112
0.827
0.0600.0600.827
0.112
0.060
0.112
0.827
0.112
0.060
0.827
0.990
0.028
0.004
0.951
0.030
0.001
0.030
0.942
0.023
0.023
0.942
0.030
0.001
0.030
0.951
0.004
0.028
0.990
1.752
0.007
0.007
0.066
0.909
0.047 0.058
0.991
0.043
0.058
0.043
0.991
0.066
0.047
0.909
Figure 4.3: NPA calculated pi-orbitals charges for NLMO orbitals. The basis set used was aug-cc-pVQZ. The
values are given for each heavy atom, with the orbitals ordered vertically.
The first option is in fact a percentage, giving the weight of each NAO. No use is made
of the NPA charges, so that low-occupancy NAOs have the same impact as the ones with
higher-occupancy. This choice would lead to an overestimate of the NLMO population
on nearby atoms which were included due to the orthogonalization tails, and this effect is
undesirable. Option (2) already includes the NPA information, but is not consistent with
the individual atomic populations. Consider for example a CH4 molecule. There will be
four valence NLMOs, one for each C-H bond. They will be almost perfectly localized, so
that each hydrogen atom population could in fact be extracted either by the NPA analysis,
or by using the electron charge fraction indicated by the NLMO population. The two values
will not coincide, since some of the charge is given to the non-occupied NLMOs. Only by
use of Eq. (4.9) are both analyses in agreement. An example for the charge partitioning
can be seen in Fig. 4.3. The charges are given for the pi-orbitals of benzene, hexatriene and
furan.
4.3.2 NPA-based Domain Criterion
It would be desirable to replace the BP procedure by a stable criterion based on the charges
obtained using the NPA procedure and Eq. (4.9). There are two approaches for such
a criterion. One may define a minimum charge for the LMO domain to be considered
"filled". Centers would be added to the domain list in the order of decreasing charges, until
∑A∈[i]PAi exceeds the threshold. Another possibility is to add all centers to a domain with
charge above a given limit. Both approaches should lead to similar results, but the latter
alternative was found to be more stable with respect to basis set size, especially in aromatic
systems. The new parameter will be refered to as TNPA. For a given LMO φi, all atoms for
which PAi > TNPA are added to the domain list [i].
4.4. Comparison to Boughton-Pulay 79
The value of TNPA is the only parameter needed for determining the domains. Which
value it should take remains, however, an open question. Observing Fig. 4.3, it is clear that
it shouldn’t be much above 0.10 a.u., otherwise the pi-orbitals of benzene would be double-
centric bonds. A too low value is also not advisable, since it would make the selection
unstable (for low values the populations start to form a continuum). Values below 0.01 a.u.
should be avoided. In order to define the best TNPA value, further tests were made, namely
on reaction energies. In this text, only absolute energies and the qualitative features of the
domain selection will be discussed.
I would like to end this Section by adding a further comment on the procedure. The new
criterion is extremely reliable in defining domains for pi orbitals in aromatic systems. By
adjusting the value of TNPA, merged pi domains can be obtained. Values between 0.03-0.01
a.u. are advisable, and could replace the use of merging procedures.[10]
4.4 Comparison to Boughton-Pulay
A series of tests were performed to compare the performance of the new domain selec-
tion scheme. The combination of NLMO orbitals as ocuppied space and the NPA-based
criterion will be from now on denoted as NLMO/NPA. In the tests, the 30 molecule set
depicted in Fig. 4.1 was used. All calculations were carried out with the density-fitting
variants of HF, MP2 and LMP2.[77, 84] The correlation consistent basis sets of Dunning
and co-workers, cc-pVXZ [75] and aug-cc-pVXZ [76] (with X=D, T and Q) were used.
4.4.1 Domain Convergence with respect to Basis Set
The variation of the domains with basis set is measured by a parameter ∆=∑i∆i, where ∆i
is the number of non-coinciding atoms in the orbital domain i, relative to the domains ob-
tained with the cc-pVDZ basis set. For example, if for two different basis sets the domains
of a particular orbital are C1, C2, H1 and C1, C2, ∆i = 1, while for C1, C2, H1 and C1, C2,
H2 ∆i = 2. Fig. 4.4 shows the variation of ∆ for the largest molecule in the set, thianthrene.
Two sets of calculations were carried out with the Pipek-Mezey orbitals. In the first case,
fixed parameters were used for the domain selection (TBP = 0.985) and localization for all
basis sets. In a second series of calculations, the parameters were changed as a function
of the basis set. The BP criterion was set to 0.980 for the double-zeta basis sets, 0.985 for
triple and 0.990 for quadruple-zeta. For the augmented basis sets, the contribution of the
most diffuse basis function of each angular momentum type for each atom was eliminated
in the localization criterion. In the case of aug-cc-pVQZ, the last two were eliminated.
80 Chapter 4. Natural Localized Molecular Orbitals for Local Correlation Schemes
cc
-pV
DZ
au
g-c
c-p
VD
Z
cc
-pV
TZ
au
g-c
c-p
VT
Z
cc
-pV
QZ
au
g-c
c-p
VQ
Z
0
40
80
120
|  ∆
 
d o
m
a i
n s
 |
Pipek-Mezey (TBP=0.985)
Pipek-Mezey (variable TBP)
NLMO (TNPA=0.05)
Figure 4.4: Sum of the domain variations (absolute) for the thianthrene molecule with different localization
procedures and domain selection test. For the variable TBP case, different parameters for localization and
domain selection were used (see text for more information).
These values are refered to as variable TBP in Fig. 4.4. This procedure should decrease the
basis set dependence. For the NLMO/NPA method, a single value TNPA = 0.05 was used.
The results in Fig. 4.4 show large fluctuations in the domains when using a fixed TBP
value. The use of different parameters for different basis sets helps to decrease these differ-
ences, but the changes in the domain lists when using diffuse functions is still measurable.
Only in the case of NLMO/NPA is ∆ = 0 for all six basis sets, i.e., there is not a single
domain change.
Similar tests were done for all 30 molecules using TNPA values of 0.025, 0.05, and 0.10.
The PM/BP domains change significantly as a function of the basis set, not only in the case
of aromatic rings, but also in smaller molecules like dimethylether or oxirane. The use of
diffuse functions generally leads to a steep increase in the domain sizes. The results are
shown in Figs 4.6 and 4.5.
Contrary to the BP defined domains, the NPA-based criterion is extremely robust. For
the 30 molecules depicted in Fig. 4.1 and using TNPA = 0.05, all domains were kept. For
TNPA = 0.025, there is a difference between the double-zeta domains of the oxalic acid, and
4.4. Comparison to Boughton-Pulay 81
the remaining sets. One of the carboxylic pi-orbitals extend to a neighboring carbon for the
larger basis sets. Since there are two carboxylic groups, ∆ = 2. The population change in
the neighboring carbon is rather small (it changed from 0.023 to 0.026 a.u., cc-pVDZ and
cc-pVTZ respectively), but enough to go above the threshold. For TNPA = 0.100, there is
also a single change. For the aug-cc-pVTZ basis set, one of the oxygen lone pairs in glycine
turns into a double bond, again due to population fluctuations in the order of ±0.001 a.u.
Even with these exceptions, there is an enormous gain in the use of the NPA criterion.
4.4.2 Correlation Energies
Fig. 4.7 shows the fraction of correlation energy recovered relative to canonical MP2 us-
ing the PM/BP and NLMO/NPA methods. In the latter case, two different thresholds were
used for comparison. The results are rather similar in all cases and differ mainly for the
aromatic molecules due to the different sizes of the pi-orbital domains. The largest fraction
of correlation energy is recovered for the very small molecules water and formaldehyde, the
smallest one for alkanes like pentane or cyclohexane. The surprising fact that these most
saturated and well localized systems are most strongly affected by the domain approxi-
mation has been discussed before[10]. Most likely, this is related to the intramolecular
basis set superposition error, which is expected to be largest for molecules in which many
atoms have a tetrahedral environment. In the local methods, the BSSE is minimized by
construction. Clearly, these variations can have a significant effect on reaction energies.
One extreme case, the hydration of benzene to cyclohexane, has been studied in Ref. [10].
4.4.3 Local Gradients
An important disadvantage in the use of NLMOs as occupied space is that no single min-
imization criterion is available. This is relevant for the calculation of analytical gradients.
The geometry dependence of the localized orbitals coefficient matrix L with respect to a
nuclear displacement λ is given by
L(λ ) = C∆C(λ )U∆U(λ ). (4.10)
The matrices C and U have already been defined in Eqs. (2.14) and (2.47). In this case,
they refer respectively to the canonical orbital coefficients and localization matrices at the
reference geometry (λ = 0). The geometry dependence is given by ∆C(λ ) and ∆U(λ ).
The contribution to the gradient from ∆C(λ ) is computed with help from the Coupled-
perturbed Hartree-Fock equations, and bears no relation to the localization method used.
82 Chapter 4. Natural Localized Molecular Orbitals for Local Correlation Schemes
4 8 12 16 20 24 28
molecule
0
4
8
12
44
48
|  ∆
 
d o
m
a i
n s
 |
Pipek-Mezey/BP
NLMO/NPA=0.100
NLMO/NPA=0.050
NLMO/NPA=0.025
aug-cc-pVDZ
~
~ ~
~
4 8 12 16 20 24 28
molecule
0
4
8
12
36
40
|  ∆
 
d o
m
a i
n s
 |
Pipek-Mezey/BP
NLMO/NPA=0.100
NLMO/NPA=0.050
NLMO/NPA=0.025
aug-cc-pVTZ
~
~
~
~
4 8 12 16 20 24 28
molecule
0
4
8
12
16
20
|  ∆
 
d o
m
a i
n s
 |
Pipek-Mezey/BP
NLMO/NPA=0.100
NLMO/NPA=0.050
NLMO/NPA=0.025
aug-cc-pVQZ
Figure 4.5: Domain changes ∆ for the aug-cc-pVDZ, aug-cc-pVTZ and aug-cc-pVQZ basis sets in compari-
son to the cc-pVDZ basis. See text for a definition of ∆.
4.4. Comparison to Boughton-Pulay 83
4 8 12 16 20 24 28
molecule
0
2
4
6
8
10
|  ∆
 
d o
m
a i
n s
 |
Pipek-Mezey/BP
NLMO/NPA=0.100
NLMO/NPA=0.050
NLMO/NPA=0.025
cc-pVTZ
4 8 12 16 20 24 28
molecule
0
2
4
6
8
10
|  ∆
 
d o
m
a i
n s
 |
Pipek-Mezey/BP
NLMO/NPA=0.100
NLMO/NPA=0.050
NLMO/NPA=0.025
cc-pVQZ
Figure 4.6: Domain changes ∆ for the cc-pVTZ and cc-pVQZ basis sets in comparison to the cc-pVDZ basis.
See text for a definition of ∆.
The dependence of the localization procedure on the geometry must however be defined.
In the Pipek-Mezey localization procedure, one minimizes the number of atoms where the
orbital is localized, as measured by the Mulliken population. For orbital φi, the Mulliken-
charge at atom A is given by
QAi = ∑
µ∈A
∑
ν
LµiSµνLν i
=∑
jk
U jiUki ∑
µ∈A
∑
ν
Cµ jSµνCνk. (4.11)
84 Chapter 4. Natural Localized Molecular Orbitals for Local Correlation Schemes
4 8 12 16 20 24 28
molecule
98
98.5
99
99.5
100
p e
r c
e n
t  c
o r
r e
l a
t i o
n  
e n
e r
g y
NLMO (TNPA=0.025)
NLMO (TNPA=0.050)
Pipek-Mezey (TBP=0.985)
Figure 4.7: Comparison of the fraction of correlation energy relative to canonical MP2 (in percent) using the
NLMO/NPA and PM/BP methods for localization and domain selection. The cc-pVTZ basis set was used.
The Pipek-Mezey localization consists in minimizing the value
qi =
[
M
∑
m=1
(Qmi)2
]−1
(4.12)
for each occupied orbital, which is equivalent to maximizing the function
F =∑
i
∑
m
(Qmi)
2 . (4.13)
The localization functional must be stationary with respect to infinitesimal changes in the
geometry, subject to the orthonormalization constraint. As discussed in Ref. [56], these
conditions are fulfilled when
M
∑
m
[Smll −Smkk]Smkl = 0, for all k > l, (4.14)
4.4. Comparison to Boughton-Pulay 85
with
SAkl = ∑
µ∈A
∑
ν
[
LµkSµνLν l +LµlSµνLνk
]
. (4.15)
These correspond to the Coupled-perturbed localization equations. Since NLMO orbitals
have no unique minimization criterion, there is no such set of equations in this case. A pos-
sible solution to this problem is to use the NPA domain criterion together with Pipek-Mezey
orbitals (PM/NPA). The matrix V in Eq. (4.9) is then substituted by the transformation ma-
trix from NAOs to PM LMOs
V′ =
(
TNAO
)−1LPM. (4.16)
The NLMOs can also be used as a starting guess for the PM localization, in order to keep
the PM orbitals as similar as possible to the NLMOs in cases in which PM localization is
not unique.
The PM/NPA combination was tested for all 30 molecules and 6 basis sets. Using the
standard PM method, the domains were found to still vary considerably, although less than
in the PM/BP case. However, significant improvements could be achieved by removing
some functions from the localization criterion. For the cc-pVXZ basis sets, the most dif-
fuse basis function of each angular momentum type for each atom was removed. For the
augmented basis sets the two most diffuse functions were removed (as already mentioned,
this can be done by zeroing the corresponding rows and columns of the overlap matrix used
in the PM procedure). With these changes the NPA-based center charges are almost as sta-
ble as as those of the NLMO/NPA combination. For the recommended value of TNPA = 0.05
only two domain changes were observed for the whole test set. In the benzoquinone and
formaldehyde molecules one of the carbonyl oxygen lone pairs changed to a CO bond for
some basis sets.

Chapter 5
Local Quantum Mechanical Hybrid
Scheme
87

5.1. Localized Orbitals as Molecular Subspaces 89
5.1 Localized Orbitals as Molecular Subspaces
As discussed in Chapter 2, hybrid QM/MM schemes are nowadays fundamental tools not
only in Computational Biochemistry, but also in general solvation and solid state problems.
Performing a search in the Web of Science with the term "QM/MM", gives a total of
1,051 results for the period of 2000-20061. This is a clear evidence for their wide range of
use and their importance in Chemistry today.
From the two schemes available, the subtractive approach is the most flexible in the
combination of methods. It allows not only to couple quantum and molecular mechanics,
it also supports coupling of different quantum chemical methods. Applications are not
limited to large biological molecules. Even in small to medium-sized systems savings
can be made by defining a small group of atoms where a higher level approach should be
used. However, this involves cutting out a model system from the whole (what has been
previously refered to as cluster in Section 2.3). If this system has a covalent bonding to the
rest of the molecule, the dangling bonds must be saturated. In the ONIOM scheme,[71]
this is done by adding a hydrogen atom along the broken covalent bond with a predefined
distance. By performing a high and a low-level calculation on the model, and calculating
the ∆Esub correction (see Eq. (2.69)), the errors due to this link atom are mostly cancelled
out. Nonetheless, this procedure has several limitations
(1) The link atom should be able to mimic the properties of the deleted moiety. The use
of a hydrogen atom to cap a dangling C-C bond is usually sufficient, but double or
highly polar bonds should be avoided. Cutting through aromatic systems is also not
possible.
(2) Three calculations are needed to obtain a single energy value, independently of the
method used.
(3) Polarization effects are not included in the higher level calculation. The higher level
correction for the model is computed in vacuo, without the effect of the environment.
This is true for IMOMO, but electrostatic embedding has been recently implemented
in the ONIOM QM/MM coupling (also sometimes referred to as IMOMM).[54]
All of these problems are connected to the use of a different Hamiltonian for the high-level
correction. Could it be possible to design another hybrid QM/QM approach which would
not suffer from the same faults?
1This is actually a modest estimate, since the search is only performed on the title, abstract and keywords.
Also, the subtractive schemes usually avoid the QM/MM designation.
90 Chapter 5. Local Quantum Mechanical Hybrid Scheme
The first thing to correct is the way the model system is defined. If atoms are the
basis for this definition, it is inevitable to cut through bonds. Although the nuclei are well
defined in space and easy to group together, the electrons, on the other hand, are spread
out through many atomic centers. It is evident why atoms are not the best choice for
defining model systems. In fact, since one is interested in solving an electronic problem,
the most straightforward solution would be to split the electrons into different groups. In
a converged RHF solution, each electron pair occupies a distinct molecular orbital. These
molecular orbitals define subspaces and in theory one could use them to define the model.
However, canonical orbitals usually span most of the molecular space, and it would be
unreasonable to select a group of orbitals to describe local reaction energetics. There is no
criteria to select individual contributions. On the other hand, localized orbitals are perfect
candidates since they are well located in a region of the molecule and chemical sense could
guide us in the choice of the most significant orbitals for the problem in question.
Another issue is that in correlated post-HF calculations the space spanned by an electron
is not only given by the occupied orbital, but also by the virtual orbitals into which it can be
excited. In local correlation methods, as discussed in Chapters 2 and 3, the excitation space
is also local since domains are used to restrict the virtual space. Each orbital has its own
domain and therefore, even in correlated methods, the space spanned by each individual
electron is well defined.
The discussion above hints at a new way to couple different quantum chemical methods
into a single calculation. Each LMO or group of LMOs, together with their associated
domains, can be viewed as a subsection of the system and can be individually treated at
a specific level of theory. Similar to IMOMO, a lower level method, e.g., LMP2, can be
applied to a large part of the molecule or the whole system, and a higher level method, e.g.
LCCSD(T), to a smaller subset of LMOs. As in IMOMO, Eq. (2.69) is effectively used to
compute the final energy, but no artificial splitting of the molecule is needed, and only one
calculation needs to be performed. Moreover, the same Hartree-Fock orbitals are used in
the low-level and high-level calculations, and optionally a coupling of the high-level region
to the environment can also be introduced at the correlated level.
5.2. Local Regions Approach 91
5.2 Local Regions Approach
5.2.1 Method
The new hybrid QM/QM coupling scheme should allow, in a single calculation, for the
treatment of molecular regions at different levels of accuracy through the use of HF and
local correlation methods. Several local methods have been implemented in the Molpro
program package over the last years, including Configuration Interaction methods, Møller-
Plesset perturbation theory and Coupled Cluster up to perturbative triples (for a list of
references, please consult Chapter 2). This discussion will be restricted to the use of LMP2,
LCCSD and LCCSD(T), since these are the most commonly used quantum mechanical
electronic structure methods to date.
The principle behind the method is to divide the orbitals into regions, each with a dif-
ferent correlation level. The same nomenclature is used as in the ONIOM approach. If two
regions are defined, with the high-level treatment being LMP2, and the low-level HF, this
will be referred to as a LMP2:HF calculation. If another region is added, using LCCSD(T),
the name given will be LCCSD(T):LMP2:HF. Due to the similarity to the IMOMO ap-
proach, and the use of local methods, the scheme will be referred to as Local Molecular
Orbital : Molecular Orbital (or LMOMO for short).
In the LMOMO scheme, one starts by performing a HF calculation for the whole sys-
tem. After localization, a list of centers is assigned to each LMO as described in Section
2.1.4, and then each LMO is assigned to a region. In the current implementation, a max-
imum of three regions is allowed: a high-level region to be treated by, e.g., LCCSD(T),
a low-level region, to be treated by LMP2, and the remainder, which is not correlated. It
would be straightforward to extend this concept further.
The assignment of the LMOs to the regions is done as follows:
(1) A list of atoms and the corresponding correlation method for a region is provided as
input.
(2) All LMOs which contain at least one of these centers in their domain lists are as-
signed to the region.
(3) The second region may be provided, and in this case steps 1 and 2 are repeated. The
higher level regions should be assigned at the end (e.g., when coupling LCCSD and
LMP2, the LMP2 region should be given first, and the LCCSD one afterwards).
(4) If no further region is provided, all remaining orbitals are treated by a default method;
normally, this is HF (i.e. not correlated), but another method, e.g., LMP2, may be
92 Chapter 5. Local Quantum Mechanical Hybrid Scheme
specified as well.
The LMOMO procedure can be implemented as an extension to the pair approximations
(discussed in Section 2.1.4). Consider a LCCSD(T):LMP2:HF hybrid calculation. The
orbital pairs made up of LCCSD(T) orbitals will be classified as strong pairs, the LMP2
orbitals will build weak pairs, and pairs with HF orbitals are simply neglected (very distant
pairs). Mixed pairs will belong to the lowest level regions.
The low-level correlation calculation, usually LMP2, is performed with all orbital pairs
that can be formed from the orbitals in the high-level and low-level regions. Thus, the
coupling between the different regions is fully included in the LMP2. Optionally, very
distant pairs can be removed in order to achieve linear scaling. The final correlation energy
is computed as a sum of pair energies; for all strong pairs the LCCSD pair energy is taken,
for all other pairs the LMP2 pair energy. Finally, the triples correction is added. This is
exactly as in any LCCSD calculation without regions.
The restriction of the pair list automatically leads to a restriction of the number of
transformed integrals needed in the calculation. For LMP2, there is a one-to-one corre-
spondence between the amplitudes T i jrs and the required electron repulsion integrals (ri|s j),
and therefore the list of integrals is directly determined by the pair list and the associated
pair domains. In the LCCSD(T) method further and larger integral classes are needed[4].
However, the occupied integral labels are always related to pair labels, and the virtual ones
to domains of correlated pairs. Thus, the number of required integrals is determined by
the list of high-level pairs. If the size of the high-level region is fixed, the number of
transformed integrals becomes independent of the size of the molecule. This leads to an
asymptotic O(1) scaling.
5.2.2 Preliminary Tests
Peptide Bond Formation Energies
The first system under study was a condensation reaction between polyglycine residues.
Specifically, the formation of (gly)8 from two (gly)4 chains with water as a by-product was
considered. Since the chains were assumed to be linear and the structures were not fully
optimized, this is just a convenient model to study a number of different approximations at
relatively low cost. It is also a system commonly used for evaluating the scaling properties
of local correlation methods. The results are shown in Fig. 5.1, where the reaction energies
are plotted as a function of the size of the high-level region.
Four sets of calculations were performed: In the first case, the high-level method was
LMP2, and the remainder was not correlated (LMP2:HF). In the second case, similar cal-
5.2. Local Regions Approach 93
N H
H
C
CN
CC
NC
H H
OH
HHO
CN
CC
H
H
O
O HHH
O
H H
N H
H
C
CN
CC
NC
H H
OH
HHO
CN
CC
H
H
O
O HHH
O
H H
A
N H
H
C
CN
CC
NC
H H
OH
HHO
CN
CC
H
H
O
O HHH
O
H H
N H
H
C
CN
CC
NC
H H
OH
HHO
CN
CC
H
H
O
O HHH
O
H H
B
N H
H
C
CN
CC
NC
H H
OH
HHO
CN
CC
H
H
O
O HHH
O
H H
N H
H
C
CN
CC
NC
H H
OH
HHO
CN
CC
H
H
O
O HHH
O
H H
C
N H
H
C
CN
CC
NC
H H
OH
HHO
CN
CC
H
H
O
O HHH
O
H H
N H
H
C
CN
CC
NC
H H
OH
HHO
CN
CC
H
H
O
O HHH
O
H H
D
+
+
+
+
10 20 30 40
number of orbitals in the high-level region
-8
-7
-6
-5
-4
-3
-2
∆ E
R
 
/  k
c a
l  m
o l
-
1
LMP2:HF
LCCSD(T0):HF
LCCSD(T0):LMP2
LCCSD(T0):LMP2:HF
A
B
C D
HF
LMP2
LCCSD(T0)
Figure 5.1: Reaction energies ∆ER (in kcal mol−1) for the peptide bond formation between two gly4 chains
computed at various levels of theory with the cc-pVTZ basis set. The horizontal lines represent reference
values, according to the legend on the right hand side. The high-level selection is depicted in the Lewis
diagrams shown above and further explained in the text. The atoms included in the high-level region are
shown in black, the remaining molecule in light gray.
94 Chapter 5. Local Quantum Mechanical Hybrid Scheme
culations were performed with LCCSD(T0) as high-level method, i.e., LCCSD(T0):HF.
Third, the high-level method was again LCCSD(T0), but all remaining orbitals were
correlated using LMP2 (LCCSD(T0):LMP2). And finally, the low-level LMP2 re-
gion was restricted to be region D, and the rest of the molecule was left uncorrelated
(LCCSD(T0):LMP2:HF). In each of these cases, the size of the high-level region was varied
as shown in Fig. 5.1 (regions A,B,C,D). In addition, full LMP2 and LCCSD(T0) calcula-
tions were performed for comparison. The abscissa of Fig. 5.1 corresponds to the number
of orbitals in the high-level region. Zero means the result of just the low-level calculation
(HF or LMP2), while on the right hand side of the scale the results of the full LMP2 and
LCCSD(T0) calculations are indicated.
The figure shows that HF strongly underestimates the reaction energy, while LMP2
overestimates it. The convergence of the LMP2:HF result with respect to the size of the
high-level region is rather slow. Case A, which just correlates the terminal NH2 and COOH
groups, is clearly insufficient. The error is reduced to about 1 kcal/mol if all atoms up to
the second neighboring bonds (relative to the atoms involved in the reaction) are included
in the high-level region (case B). However, a satisfactory value of the reaction energy is
only obtained by including a larger section of the system (cases C or D). For region D, the
error amounts to at most 0.1 kcal/mol.
The same trend is visible in the LCCSD(T0):HF case. Only with region C is the value
close to the full result. The convergence can however be greatly improved by the use of
LMP2 in the lower level regions. Both LCCSD(T0):LMP2 and LCCSD(T0):LMP2:HF
results are quite similar. The inclusion of the first-neighboring atoms is already sufficient
to obtain near LCCSD(T0) values. It should be noted that the cost of LMP2 is small
when compared to the coupled cluster calculation, and therefore comparable timings are
obtained for region B with LCCSD(T0):HF or LCCSD(T0):LMP2 (for further information
on timings see Section. 5.2.3).
Including Environment Effects in the High-Level Region
In Section 2.3, the concept of electronic embedding was introduced in the context of
QM/MM coupling schemes. It is defined as the (approximate) inclusion of polarization
effects due to the surrounding environment. If the lower level region is represented by a
force field, the embedding is performed by including point charges in the QM-Hamiltonian,
which represent the environment atoms. If the lower level region is itself treated at the QM
level (QM/QM), these effects are left out. A relatively straightforward solution to the prob-
lem could be the use of RESP charges, computed from a lower level run of the host, and
including the charges on the other two calculations (see Eq. 2.69) just as in the QM/MM
5.2. Local Regions Approach 95
case. However, no work has been done in this direction. In the LMOMO scheme, as pre-
viously discussed, electronic embedding is implicitly included. This has an influence not
only on the HF value, but also on the correlated calculation, since the reference function is
changed.
Another type of effect yet to be considered, is what could be referred to as correlation
embedding. Let us consider a LCCSD(T):LMP2 calculation. In the LMOMO scheme, the
Ri j MP2 residuals will be computed for each orbital pair. On top of this calculation, a
group of orbitals will be selected, the doubles and singles CCSD residual equations will be
solved and the perturbative triples calculated. For the LCCSD(T) step, only orbital labels
belonging to the high-level region will be considered, and the LCCSD(T) contribution will
be therefore independent of the LMP2 amplitudes outside the region.
Shown below is the LCCD residual2 in matrix form
Ri j =Ki j+K(Ti j)+∑
kl
[
Kkli j + tr(T
i jKlk)−δ jlβki−δkiβl j
]
S˜TklS˜+Gi j+G ji, (5.1)
with
βi j = Fi j+∑
k
tr
([
2Kik−Kki
]
Tk j
)
. (5.2)
The Gi j matrices include many more couplings which need not to be discussed at this time
(see Refs. [33, 4]). It is possible to include the LMP2 computed amplitudes of the sur-
rounding environment in the sum running over kl in Eq. (5.1). In the LMOMO scheme, as
defined up till now, this sum will only run over pairs of orbitals in the high-level region. By
adding these extra terms, one is effectively including correlation effects of the neighboring
region into the high-level calculation. As before, this can be implemented through the use
of the local pair approximations. Neighboring orbitals in the regions border (one orbital
located in the high-level region, the other in the low-level) may be classified as close pairs.
The program can then be instructed to include the LMP2 amplitudes into the CC residuals.
This requires only a small extra computational cost. The sum will be larger, and the nec-
essary K operators have to be computed. This is , however, for most purposes insignificant
when compared to the total cost of the LCCSD(T) calculation. The most expensive term to
compute is the K(Ti j) term, which is diagonal in the pair index.
Calculations have been performed in order to access the effect of this coupling between
regions. One should keep in mind that only orbital pairs close to the region border are added
(the ones which in a regular LCCSD(T) run would be classified either as strong or as close).
2The singles are not discussed to keep the formulae simple and compact. Including the terms arising from
the singles would not change the following discussion.
96 Chapter 5. Local Quantum Mechanical Hybrid Scheme
10 20 30 40
number of orbitals in the high-level region
-1
-0.8
-0.6
-0.4
-0.2
0
∆ ∆
E  
/  k
c a
l  m
o l
-
1
without embedding
with embedding
A
B C
D
Figure 5.2: Reaction energy error ∆∆E (in kcal mol−1) for LCCSD(T0):LMP2 calculations on the conden-
sation reaction of glycine tetrapeptides. The high-level regions are described in Fig. 5.1. In one set of
calculations, orbital pairs at the boundary region were classified as close, and their amplitudes were included
in the CC residuals (with embedding). For the other set, only strong pairs at the high-level region were
included (without embedding).
The test system was again the condensation reaction of two gly4 residues, and the method in
test was LCCSD(T0):LMP2. The results are shown in Fig. 5.2, with the same regions as in
Fig. 5.1. Two sets of calculations were carried out, one without including amplitudes from
the low-level region, other including close pairs connecting both regions and introducing
the LMP2 amplitudes in Eq. (5.1). The figure plots the error of the LMOMO calculation
relative to the full LCCSD(T0) result
∆∆E= ∆ER(LCCSD(T0) : LMP2)−∆ER(LCCSD(T0)). (5.3)
It is preferable to plot ∆∆E instead of ∆ER in comparing both calculations, since the
close pairs within the LCCSD(T0) region are also included in the CC residual, and the
LCCSD(T0) result converges to a different value. The effect is, however, small. The dis-
cussion is also out of the scope of this work.
It is observed in Fig. 5.2 that the effect on the energies is rather small, except for selec-
5.2. Local Regions Approach 97
tion A. This effect is obviously due to the small region size. The number of orbitals in the
high-level region is so small that the CC amplitudes are not well converged (in comparison
to the ones in the full local CC treatment). Comparing the result with embedding for region
A, and for region B without, it is clear to see that the improvement on going from selection
A to B (in the non-embedding case) is mostly due to a better description of correlation in
the smaller region, an effect of the added neighboring orbitals in the residuals. It is also
interesting to see an almost linear error for the case with embedding.
Calculations were carried out for two other systems, later discussed in Sections 5.3.1
and 5.3.2. The effect was found to be similar, slightly improving the energies for smaller
regions, but never more than 1 kcal mol−1. Since the effect of including the close pairs
in the CC iterations (for the full calculation) is of the same magnitude, it is questionable
whether this option should actually be used.
5.2.3 Scaling of the Method
The use of correlation regions should lead asymptotically to the scaling regime of the lower
level calculation. As long as the high-level region is fixed, extending the molecular system
should not influence its cost. If HF would be the method of choice for the low-level, this
would mean an O(1) scaling3. The present implementation, although not optimized for the
integral transformation, already shows optimal scaling for the iteration steps and the triples
calculation. To evaluate the scaling and computational cost of this approach the peptide
bond reaction energy was computed
2 glyn −→ gly2n + H2O ,
where n stands for the number of glycine residues in the polypeptide.
The LMP2 case
In a LMOMO calculation, the number of pairs only depends on the correlated region size.
In the LMP2:HF case, as previously discussed, all quantities are directly connected to the
pair list, and the O(1) scaling should be directly observable.
LMP2:HF/cc-pVDZ calculations were performed for reactions n= 2,6, with the same
atom selection C as in Fig. 5.1. The iteration timings are compared to the full LMP2
counterparts in Fig. 5.3. The LMP2 curve shows the expected linear scaling, while the
3This discussion is only aimed at the correlation calculation.
98 Chapter 5. Local Quantum Mechanical Hybrid Scheme
2 3 4 5 6
number of glycine residues  n
0
50
100
150
200
250
C P
U  
T i
m
e  
/ s
LMP2
LMP2:HF
Figure 5.3: Timings (in s) for LMP2/cc-pVDZ and LMP2:HF/cc-pVDZ MP2 iterations on the protein bond
reaction. All calculations were performed on an AMD-Opteron MP2400+ dual processor machine. Density
fitting approximations were used.
LMP2:HF curve is flat, with O(1) scaling. The computational cost for the transformation
steps is still not optimal (linear scaling) due to a dependence on the basis set size. However,
even without the proper asymptotical scaling behavior, significant savings are made.
The LCCSD case
In the case of LCCSD, obtaining the target O(1) scaling behavior involves some further
approximations, due to the slow decay of some 1- and 3-external integrals. These arise in
terms from the interaction between singles and doubles. The first term in question is found
in the pair residuals
Ri jrs = · · ·−∑
kl
∑
tu
S˜rtT
l j
tu S˜us∑
v
tkv [2(vk|li)− (vl|ki)]. (5.4)
The slow decay is due to 1-external integrals (vk|ll) which decay only with the inverse
square of the distance between orbitals k and l. The other term is connected to 3-external
5.2. Local Regions Approach 99
10 20 30 40
number of orbitals in the high-level region
-6
-5.8
-5.6
-5.4
-5.2
-5
∆ E
R
 
/  k
c a
l  m
o l
-
1
with long-range terms
without long-range termsA
B
C
D
Figure 5.4: Reaction energies ∆ER (in kcal mol−1) for the peptide bond formation between two gly4 chains
computed at the LCCSD(T0):LMP2:HF/cc-pVTZ level of theory. The high-level selection is depicted in the
Lewis diagrams shown in Fig. 5.1 and further explained in the text.
integrals
Gi jrs = · · ·+∑
tu
∑
k
∑
v
S˜rtT
i j
tu t
k
v [2(vk|us)− (uk|vs)]. (5.5)
The integral (vk|us) has the same dependence as the previously discussed 1-external in-
tegral. In the current LCCSD program version, the term in Eq. (5.5) is fully computed,
with the cost scaling quadratically. However, the prefactor is small and is only expected
to become a bottleneck for very large calculations. The 1-external term is truncated using
additional distance criteria, just as discussed in Ref. [5]. In the case of a LMOMO cal-
culation, keeping such terms has a strong effect on the scaling, and leads to a breakdown
of the O(1) asymptotic limit. However, since both terms have opposite signs, and they
should cancel out at large distances (to restore the asymptotic r−6 distance dependence),
one can in fact neglect them, introducing only small errors in the calculation. LMOMO
LCCSD(T0):LMP2:HF calculations for the gly4 condensation reaction were repeated, both
including the two contributions and removing them. The results are shown in Fig. 5.4.
The maximal error made by neglecting the aforementioned integrals is 0.02 kcal mol−1.
100 Chapter 5. Local Quantum Mechanical Hybrid Scheme
2 3 4 5 6
number of glycine residues  n
0
200
400
600
800
C P
U  
T i
m
e  
/  i
t e
r a
t i o
n  
x  
1 0
 s
LCCSD
LCCSD:HF
Figure 5.5: Timings (in s) for LCCSD/cc-pVDZ and LCCSD:HF/cc-pVDZ CC iterations on the protein bond
reaction. The time is computed as the average of each iteration, and multiplied by a factor 10 (the regular
number of cycles needed to converge the CCSD solution). All calculations were performed on an AMD-
Opteron MP2400+ dual processor machine. Density fitting approximations were used.
Such an error is much lower than those introduced by the local approximations, and of
the same order of magnitude as the density fitting errors. Also of interest is the fact that
the error remains constant with increasing CC region size. Therefore, the approximation
seems to be well-founded, and can be used to significantly reduce the cost of LMOMO
calculations.
At the time of this work, the integral transformation costs still scaled linearly, but the
iterations showed the expected O(1) scaling. The timings for LMOMO LCCSD:HF and
LCCSD calculations are shown in Fig. 5.5. The iteration time is defined as the average of
all iteration steps multiplied by a factor 10. This is due to the fact that the regular LCCSD
calculation takes 11 iterations (or more) to converge, while the LCCSD:HF only takes 10.
In order to effectively compare the iteration times, the number of steps should be the same.
The reduced number of iterations is due to the reduced number and size of the residuals to
compute, which is a further advantage for the hybrid scheme.
The LCCSD timings only show linear scaling behavior beyond n = 4, somewhat later
5.2. Local Regions Approach 101
2 3 4 5 6
number of glycine residues  n
0
500
1000
1500
C P
U  
T i
m
e  
/ s
(T0)
(T0):HF
Figure 5.6: Timings (in s) for LCCSD(T0)/cc-pVDZ and LCCSD(T0):HF/cc-pVDZ triples calculations on
the protein bond reaction. All calculations were performed on an AMD-Opteron MP2400+ dual processor
machine. Density fitting approximations were used.
than in the LMP2 case, certainly due to the larger CC operator lists4. The O(1) scaling
behavior of LCCSD:HF is already visible beyond n = 2, a rather early onset. However, it
is not totally independent of the molecular size, and there are still some steps which slowly
grow in weight, although not visible in the graph. These steps may be an outcome from the
use of regular sized-matrices or some of the loop structures, but are nevertheless too small
in size to become relevant (for any application size where HF can still be computed).
The (T0) perturbative triples correction
In the triples calculation only excitations from those orbital triples (i jk) are included for
which i, j, and k belong to the high-level region. Again, this list is further reduced by the
condition that one of the pairs (i j), (ik), or ( jk) is a strong pair; the two other pairs can
4In the case of local Coupled Cluster, further distance criteria are used for the operator lists. The {K}
operator list, for example, will include all pairs which are within 8 Bohr from each other. This means that
only for molecules with a diameter significantly above 16 Bohr will the linear scaling regime be visible. The
glycine tripeptide measures about 22 Bohr.
102 Chapter 5. Local Quantum Mechanical Hybrid Scheme
either be strong or close[5, 6, 7]. In order to obtain an accurate triples energy, it is necessary
to include the close-pair amplitudes in the triples calculation. Only pairs with two orbitals
in the high-level region are treated as close pairs; the close-pair amplitudes are determined
in the low-level calculation.
Calculations were performed for LCCSD(T0):HF, and the timings for the triples calcu-
lation is given in Fig. 5.6. The desired scaling is obtained already for n= 2, and consider-
able savings are made with the use of LMOMO.
5.3 Test Applications
5.3.1 Proton Transfer
In this section, a first application of the LMOMO scheme is discussed. The reaction is
depicted in Fig. 5.7, a proton transfer step for an intermediate involved in the reaction be-
tween bis(1,3,4-thiadiazole)-1,3,5-triazinium halides and nitrogen-based nucleophiles.[85]
The active site, although well localized (the proton ’jumps’ from one nitrogen to a neigh-
bouring one), is difficult to model since the atoms involved are located in a ring structure.
For these cases, when using QM/MM or ONIOM-related methods, one would be forced to
include the aromatic rings into the model system. This is due to the fact that ’link atoms’
cannot accurately replace the aromaticity effect. In the LMOMO scheme, however, cutting
through the rings is possible, since most of the aromaticity effect is contained in the SCF,
which is always performed for the whole system. The structures have been preoptimized at
the B3LYP/cc-pVDZ level.
The reaction energy barrier was computed with different region sizes and methods. The
cc-pVTZ basis set[75] was used. The results are shown in Fig. 5.8, plotting the error of the
barrier height relative to the pure high-level result as a function of the orbitals included in
the region. The first plausible choice for the high-level region are the three atoms involved
C
N
C
N
C
N
S
C
NN
C
S
CH3H
CH3
N
H H
CH3
H3C
C
N
C
N
C
N
S
C
NN
C
S
CH3H
CH3
N
H H
CH3
H3C
Figure 5.7: Proton transfer reaction.
5.3. Test Applications 103
in this reaction, two nitrogens and the migrating proton. In Fig. 5.8 this selection is denoted
as region A. For the combination of HF with LMP2 or LCCSD(T0), it is seen that already
3/4 of the correlation correction to the HF value is obtained. The error amounts, however,
still to 3-4 kcal mol−1. By adding neighboring atoms to the region the results converge
only slowly. Only with the selection C, which includes most of the atoms in the rings, the
target value is approached to within 1 kcal mol−1. This shows the physical significance of
including all contributions from the aromatic system into the correlation treatment. How-
ever, combining LCCSD(T0) with LMP2 for the low-level region (LCCSD(T0):LMP2),
one obtains a much more stable result. In this case already region A yields a value that
is in very close agreement with the full LCCSD(T0) calculation, a very satisfying result.
LCCSD(T0):LMP2:HF calculations, with selection D as the LMP2 region (leaving only
the methyl groups uncorrelated) give the same results as LCCSD(T0):LMP2 to within 0.02
kcal/mol. These values were left out of Fig. 5.8, as they would simply superimpose with
the other set.
The effect of removing the contributions shown in Eqs. (5.4) and (5.5) from the CC
residuals was tested also for the LCCSD(T0):LMP2:HF case. The effect was found to be
as small as previously discussed in Section 5.2.3 for the poliglycine condensation reaction.
5.3.2 Hydroxylation Reaction
The second example concerns the hydroxylation step in the p-Hydroxybenzoate Hydrox-
ylase (PHBH) enzyme catalytic cycle. The cofactor FADOOH hydroxylates the substrate
(para- hydroxybenzoate) after being oxidized by molecular oxygen. The peroxide moiety
is broken and an OH group is moved onto the substract ring. The reaction is represented
in Fig. 6.2. QM/MM calculations have been carried out with the cofactor and substrate
treated at the QM level, and the remaining enzyme and solvent at the MM level. The MM
environment is accounted for by point charges in the QM Hamiltonian. Further details on
the reaction and the QM/MM modeling are given in Section 6.2.
The hydroxylation occurs between two aromatic systems, and therefore breaking bonds
close to the reaction should lead to large errors. This system is another example where the
LMOMO approach allows to divide the system, even with significant electron delocaliza-
tion. As before, the smallest region included the atoms directly involved in the process (the
peroxide moiety and the carbon to which the OH group moves). The first, second, and third
neighboring atoms were then added to the list. In the coupled cluster region, the distance
criteria for weak pairs was set to 7 Bohr, in order to include triples in the interaction of the
substrate and cofactor with the migrating OH group. Further details of these calculations
104 Chapter 5. Local Quantum Mechanical Hybrid Scheme
C
N
C
N
C
N
S
C
NN
C
S
CH3H
CH3
N
H H
CH3
H3C
C
N
C
N
C
N
S
C
NN
C
S
CH3H
CH3
N
H H
CH3
H3C
C
N
C
N
C
N
S
C
NN
C
S
CH3H
CH3
N
H H
CH3
H3C
C
N
C
N
C
N
S
C
NN
C
S
CH3H
CH3
N
H H
CH3
H3C
A B
C D
0 10 20 30 40
number of orbitals in the high-level region
-5
0
5
10
15
∆ ∆
E  
/  k
c a
l  m
o l
-
1
LMP2:HF
LCCSD(T0):HF
LCCSD(T0):LMP2
A
B
C D
Figure 5.8: Regions calculation error ∆∆E= ∆E(high:low)-∆E(high) (in kcal mol−1) for the proton transfer
reaction. The error is given as a function of the orbitals included in the high-level region. The first point
is a low-level calculation, the second a regions calculation including the two nitrogens and the transfered
proton in the high-level region. The following points are obtained by adding the next neighbouring atoms
(see ilustration above).
5.4. Comparison to other partitioning methods 105
are given in Section 6.2.
As shown in Fig. 5.9, the region calculation converges to an accuracy of 1 kcal mol−1
rather quickly. In contrast to the previous example, quantitative results are already obtained
when the first neighbors are included in the high-level region of the LCCSD(T0):HF calcu-
lations (region B). Again, when the low-level region is treated by LMP2, the convergence
with region size is further improved, and as in the previous example it is sufficient to use
the smallest region (A) for the high-level calculation. Even though our program is not yet
fully optimized, this leads to dramatic savings (approximately a factor of 10) as compared
to the full LCCSD(T0) calculation. The cost of the correlation calculation is even compa-
rable to the HF calculation. One LCCSD(T0):HF B single point calculation takes about
1142 min, compared to 447 min for the HF. The LCCSD(T0):LMP2 A takes 690 min, less
than double the time. For even larger system sizes the HF calculation should become the
bottleneck. It should also be noted that the algorithm is still not optimized, and that further
savings should be possible.
5.4 Comparison to other partitioning methods
In this Section, a comparison is made between the LMOMO and IMOMO schemes.[71] In
IMOMO, a model system is built and the difference between a low and a high-level method
is used to extrapolate the total energy, using the low-level estimate for the real system. This
allows for the combination of any two (or more) quantum mechanical methods. The local
approach can only combine ab initio local methods (e.g., DFT is not supported). Also,
IMOMO allows for the use of different basis sets. In LMOMO, dual- or multiple basis set
approaches could be used. For instance, it may be sufficient to use a smaller basis set for
regions of the system that are sufficiently far apart from the correlated region. Even though
unphysical polarization artifacts may occur for neighboring atoms with different basis set
sizes, it should at least be possible to reduce the number of polarization functions without
significantly affecting the accuracy of the method. This will, however, not be discussed in
the context of this work.
For the results shown in this Section, a simple Perl program was written and interfaced
to the Molpro program package in order to perform IMOMO calculations. Upon receiving
a geometry the program would generate the model system (with parametrized distances for
the link atoms, see below) and run the three calculations displayed in Eq. 2.69.
106 Chapter 5. Local Quantum Mechanical Hybrid Scheme
C
C
C
C
C
C
H
H
C
H
H
O
OO
C
N
C C
N
C
N
CN
H
C
O
O
OO
H
HC
C C
CH
H3C CH3
H R
C
C
C
C
C
C
H
H
C
H
H
O
OO
C
N
C C
N
C
N
CN
H
C
O
O
OO
H
HC
C C
CH
H3C CH3
H R
C
C
C
C
C
C
H
H
C
H
H
O
OO
C
N
C C
N
C
N
CN
H
C
O
O
OO
H
HC
C C
CH
H3C CH3
H R CC
C
C
C
C
H
H
C
H
H
O
OO
C
N
C C
N
C
N
CN
H
C
O
O
OO
H
HC
C C
CH
H3C CH3
H R
A B
C D
0 10 20 30 40 50 60
number of orbitals in the high-level region
-5
0
5
10
15
20
25
30
∆ ∆
E  
/  k
c a
l  m
o l
-
1
LMP2:HF
LCCSD(T0):HF
LCCSD(T0):LMP2
A
B C D
Figure 5.9: Regions calculation error ∆∆E= ∆E(high:low)-∆E(high) (in kcal/mol) for the hydroxilation re-
action. The error is given as a function of the orbitals included in the high-level region. The first point is
a low-level calculation, the second a regions calculation including the peroxide moiety and the hydroxilated
carbon in the high-level region. The following points are obtained by adding the next neighboring atoms (see
ilustration above).
5.4. Comparison to other partitioning methods 107
5.4.1 Chlorohydrocarbon SN2 reactions
To compare both methods, calculations were performed on a series of chlorohydrocarbons
reactions, a system previously studied with IMOMO by Re et al.[86] The SN2 reactions
with OH− are described by the following general formula
CH(4−n)Cln + OH− → CH(4−n)Cl(n−1)OH + Cl−,
with n= 2,3 or 4.
For this set of calculations, LCCSD(T0) and LMP2 were used as high and low-level
methods. Hartree-Fock is not discussed since it gives large errors, even when used in
combination with the other methods. This had already been mentioned in the above cited
work.[86]
The regions used are the same as in the Re and Morokuma work. The carbon, the en-
tering OH group and the leaving Cl are the high-level region, the lower region corresponds
to the spectating atoms. The only orbitals to be included at the lower level are therefore the
spectating chlorine lone pairs. This is the closest comparison possible, since in any other
case the central carbon would be almost excluded from the high-level region (only one C
orbital would be included). This choice leads to small differences between full LCCSD(T0)
and the regions calculation in the n=2 case, but is already a 50/50 partition for the larger
system.
The results are shown in Table 5.1. Although for the smaller systems the comparison
is quite good (0.2 kcal/mol difference), for the n=4 case there is a small increase in the
error. This should be connected to the fact that this is the largest approximation made,
but it is also interesting to notice that the IMOMO results follow the same trend. The two
procedures compare well with each other, and the errors involved are of the samemagnitude
as expected, from the observations in previous Sections.
5.4.2 Aminoacid-water complexes
For a second series of tests, hydrogen bonded systems were considered. A previous
IMOMO study of Anderson et. al. [87] compared the performance of different high and
low-level methods for the prediction of dissociation energies of aminoacids-water com-
plexes. It was found that the combination MP2:HF gave very small errors. Using the
same HF/6-31+G(d) optimized structures, single point calculations were carried out with
the cc-pVTZ basis set, and a combination of LMP2 and HF (with the IMOMO and local
approaches). The aminoacids included in the study were Asparagine (ARG), Glutamine
(GLN), Serine (SER) and Threonine (THR) (the water complexes are depicted in Fig.
108 Chapter 5. Local Quantum Mechanical Hybrid Scheme
Table 5.1: SN2 reaction energies (in kcal/mol) for the CH(4−n)Cln series, with n= 2,3,4. The basis set used
was aug-cc-pVTZ throughout.
LMP2 LCCSD(T0):LMP2 LCCSD(T0)
IMOMO local
n= 2
reactant complex -22.4 -22.7 (-0.3) -23.0 (-0.6) -22.4
transition state -12.2 -14.0 (-0.5) -13.6 (-0.1) -13.5
product -56.5 -59.1 ( 0.2) -59.2 ( 0.1) -59.3
average absolute error ( 0.3) ( 0.3)
n= 3
transition state -8.5 -10.7 (-1.3) -9.9 (-0.5) -9.4
product -61.5 -64.2 ( 0.5) -64.3 ( 0.4) -64.7
average absolute error ( 0.9) ( 0.5)
n= 4
reactant complex -7.8 -8.1 (-0.5) -8.0 (-0.4) -7.6
transition state 0.9 -0.7 (-0.9) -0.3 (-0.5) 0.2
product -62.9 -65.6 ( 0.9) -65.7 ( 0.8) -66.5
average absolute error ( 0.8) ( 0.6)
5.10). The dissociation energy for the aminoacid + water system was computed, as well
as for the tripeptides (built by extending the respective aminoacid with two Glycine (GLY)
side chains). The results are shown in Table 5.2.
The difference between the two approaches is minimal. The largest error is seen in the
local calculation for the ASN water complex, but even this is below 0.2 kcal mol−1. Also,
the errors seem to be independent of the system size, with comparable accuracy between
the aminoacid and the tripeptides results. The local hybrid scheme could therefore be a
useful tool for the study of weak interactions in biological systems. Possible applications
are enzyme docking and the study of specific DNA strains interactions.
5.4. Comparison to other partitioning methods 109
C
H
N
C H
O
H
CH
H
O
O
H
H
H
O H
C
H
N
C H
C
H
CH
H
O
O
H
H
O H
C
H H
O N H
H
C
H
N
C O
C
H
CH
H
O
O
H
H
H
H
H
H
O H
C
H
N
C HH
CH
H
O
O
H
H
O H
C
O N H
H
ASPARAGINE (ASN) GLUTAMINE (GLN)
SERINE (SER) THREONINE (THR)
Figure 5.10: Aminoacid-water complexes. The tripeptides are built by expanding the aminoacid from both
sides with glycine chains.
110 Chapter 5. Local Quantum Mechanical Hybrid Scheme
Table 5.2: Dissociation energies ∆Ed (in kcal mol−1) computed at the HF/cc-pVTZ and LMP2/cc-pVTZ level,
and errors relative to the LMP2 estimate ∆∆Ed (in kcal mol−1) for the IMOMO and local hybrid schemes.
The same scaling factors as in the Anderson et. al. study (0.786011 for N-C bonds and 0.723866 for C-C
bonds) were used in the IMOMO calculations.
∆Ed ∆∆Ed (LMP2:HF)
HF LMP2 IMOMO local
ASN 5.86 7.27 -0.07 -0.17
GLN 4.83 6.27 -0.01 -0.01
SER 5.00 6.15 0.05 0.00
THR 3.60 6.30 0.09 0.02
GLY-ASN-GLY 6.91 8.71 -0.08 0.04
GLY-GLN-GLY 7.52 9.44 -0.08 -0.03
GLY-SER-GLY 6.14 7.39 0.12 0.01
GLY-THR-GLY 5.41 6.76 0.00 -0.09
average absolute error 0.06 0.05
Chapter 6
Computation of Activation Barriers in
Enzymes
111

6.1. Local Correlation Methods - Tools for Computational Biochemistry 113
6.1 Local Correlation Methods - Tools for Computational
Biochemistry
The accurate prediction of enzyme kinetics from first principles is one of the central goals
of Computational Biochemistry. Possible applications include the development of mutant
analogues of natural ocurring enzymes, with improved reactivity and/or selectivity. Cur-
rently there is considerable debate about the applicability of Transition State Theory (TST)
to compute rate constants of enzyme-catalyzed reactions. Classical TST is known to fail
for some cases, but corrections can be added to include the effects of dynamical recross-
ing and quantum mechanical tunneling, as discussed in Section 2.4.1. Nevertheless, the
framework of TST has been heavily disputed, particularly on the possible role of protein
dynamics and conformational effects on the enzyme activity. The following is taken from
a Feature Article in the Journal of Physical Chemistry B by Prof. Martin Karplus:[88]
"Simple behavior [is] defined by two (related) aspects of reactions. The first is
that a simple phenomenological rate law with an exponential time dependence
for the rate applies and the second is that the temperature dependence of the
rate follows the Arrhenius equation. We have seen that although simple behav-
ior is found in some protein reactions, significant deviations from both types of
simplicity have been documented and interpreted theoretically."
Although some emphasis is given to the "complex" cases, the point is made that Arrhenius
dependence can be seen even in such elaborate systems as enzymes. In the same article,
three requirements are given for this "simple" behavior:
(1) it should be possible to define a reaction coordinate (or other progress variable),
(2) a well-defined barrier with a free energy several times kT separating the reactant and
product states should exist along the reaction coordinate,
(3) and the rate of the reaction (as defined by the reaction coordinate) should be slow
compared to the elementary collisional events that lead to equilibration of the other
degrees of freedom.
All three requirements are interconnected, since (2) and (3) will need the definition of
a reaction coordinate, and (3) would be hard to fulfill without a high activation barrier.
However, it is still quite hard to make a clear distinction between one and the other case,
since all of the points are difficult to prove either theoretically or experimentally, and many
arguments arise over the weight of complex behavior on the activated reactions.
114 Chapter 6. Computation of Activation Barriers in Enzymes
The use of computational methods could help to establish whether TST is adequate for
the quantitative treatment of enzymatic reactions. Comparison of the experimental Gibbs
free energy of activation (∆‡G), assuming simple behavior, with the computed activation
free energy should show the importance of other effects. However, the methods used to
date for estimating activation energies are in general unable to give quantitative predic-
tions. Quantum mechanical/molecular mechanical methods are normally used, but the size
of the required QM region is often too large for accurate ab initio treatment of the active
site. These regions include normally from 20 to 100 atoms, depending on the reaction un-
der study. Semiempirical methods, though applicable to large systems, are generally not
accurate enough because computed free energies of activation may have an error of ten
or more kcal mol−1. DFT offers improved accuracy but still lacks key physical interac-
tions (e.g., dispersion). Often, DFT underestimates barrier heights by several kcal mol−1,
which cannot be systematically improved. Thus, when theoretical barriers do not agree
with those from experiment, it is not clear whether the discrepancy arises from deficiencies
in the electronic structure theory, in the experimental observations, or in the underlying
theoretical framework of QM/MM and TST.
With the development of linear scaling local correlated methods, the amenable system
sizes has grown significantly. One can now routinely treat up to 50 atoms with the density
fitted LCCSD(T0) algorithm as implemented in Molpro. LMP2 calculations have also been
reported for system sizes well above 100 atoms. Density-fitting approximations as well as
explicit correlation terms can be used together with these methods in order to speed up the
calculations with respect to basis set size, or to minimize basis errors in the barrier energies.
The use of local methods could improve the quantum mechanical treatment of the active
site, approaching a 1-2 kcal mol−1 accuracy, as is nowadays routinely possible for small
molecular systems. This would allow for a quantitative comparison between theory and
experiment, checking the validity of TST.
In this Chapter, applications of local correlation methods for the computation of ac-
tivation barriers in enzymes are presented. The two systems chosen seem to fulfill the
requirements for simple behavior. The reaction coordinate is well defined and decoupled
from other movements in the enzyme. The estimated free energy activation barriers are
also estimated to be in the range of 13-15 kcal mol−1, which fulfills requirement (2).
The same procedure will be followed as in the study of small molecular systems, with
the exception of the conformational sampling which is later discussed. In estimating the
free energy activation barrier at a given temperature T , we consider the different contribu-
6.2. The p-Hydroxybenzoate Hydroxylase enzyme 115
tions individually
∆‡G(T ) = ∆‡E0+∆‡EZPVE+∆‡H(0→ T )+T∆‡S. (6.1)
The first term is the electronic activation energy, which is equal to the electronic energy
difference between the reactant and the transition state. The ∆‡EZPVE term is the Zero-
Point Vibrational Energy (ZPVE) correction, while ∆‡H(0→ T ) corrects for temperature
effects1. The last term is the entropic contribution2. For a reaction where a single bond is
broken and a new one formed, both ∆‡EZPVE and ∆‡H(0→ T ) should be small, and of the
order of 1-2 kcal mol−1. If the reaction does not involve large changes to the solvation shell
and the conformational liberty of the substrate is kept, the entropic correction should also
be relatively small. The largest contribution in this case would be given by ∆‡E0, perhaps
even an order of magnitude larger than the other terms. As such, most of the effort should
be focused in reducing the error in the electronic energy. However, one should keep in mind
that there are several documented cases where the reaction bottleneck is due to a diffusion
barrier or an entropic effect. In these cases the discussion above will not apply.
Due to the flexible structure of the enzymatic environment, sampling of the confor-
mational space is needed. This is done by computing a number of reaction trajectories,
starting from different initial structures. These structures are obtained from molecular dy-
namics simulations, and will be referred to as snapshots. The activation barriers can then
be averaged to obtain the final result. No weighting is used since the conformational space
available will be small and this would lead to a strong bias for smaller barriers. Also, the
only quantity to be sampled is ∆‡E0, since the other quantities should not vary much and,
as discussed above, they will have smaller contributions3.
6.2 The p-Hydroxybenzoate Hydroxylase enzyme
6.2.1 Overview
The p-Hydroxybenzoate Hydroxylase (PHBH) enzyme is a flavoprotein classified as
monooxygenase which catalyzes the transformation of p-hydroxybenzoate (pOHB) to 3,4-
dihydroxybenzoate (3,4-DOHB). It plays a major role in the oxidative degradation of aro-
matic compounds, being 3,4-DOHB the substrate for subsequent cathecol ring-cleavage
1This is calculated as ∆‡H(0→ T ) = ∆‡H(TK)−∆‡H(0K).
2This term is normally computed as the difference between the enthalpy and the Gibbs energy at a given
temperature.
3The entropic estimate is taken from dynamic runs, so it does include conformational sampling, but not
in the same way as ∆‡E0, since this effect is only included through the ∆‡G term.
116 Chapter 6. Computation of Activation Barriers in Enzymes
Figure 6.1: Catalytic cycle for the PHBH enzyme.
reactions. The catalytic cycle is shown in Fig. 6.1. It has also been proposed as a biocata-
lyst for the hydroxylation of fluorinated and chlorinated pOHB derivatives.
The enzyme activity is highest in the pH range 7.5-8.5,[89] where the rate-limiting step
is the one depicted in Fig. 6.1 with rate constant k4, and in further detail in Fig. 6.2. The
flavin-adenine (FAD) cofactor is at this stage in its hydroperoxide form (FADHOOH), and
serves as the active hydroxylation agent. pOHB is hydroxylated at the meta position, result-
ing in FADHO and a hydroxycyclohexadienone intermediate. It is known that the reaction
follows the aromatic electrophilic substitution mechanism, with the FADHOOH cofactor
acting as a formal ”OH+” donor. The substrate is believed to be in its dianionic (phenolate)
form during the process, and both the phenolate as well as the resulting oxidoflavin should
be significantly stabilized by a positive electrostatic potential in the enzyme pocket.
The activation enthalpy for this step has been estimated to be around 12 kcal mol−1.
This value was taken from temperature-dependent measurements of the overall rate be-
tween 277 and 298 K, at pH 8.[89] The authors mention that the Arrhenius plot4 yielded
a straight line, but give no further information on the diagram (or error bar). Other mea-
4In an Arrhenius diagram the ln(k) is plotted against 1/t. According to the Arrhenius equation, the slope
of the line should be equal to −∆‡H/R.
6.2. The p-Hydroxybenzoate Hydroxylase enzyme 117
Figure 6.2: Hydroxylation step in
the PHBH catalytic cycle, with
pOHB as substrate. In the up-
per picture only the QM region
is shown. The "-R" group in the
cofactor represents the QM/MM
crossing, and is substituted in the
QM calculations by a link hydro-
gen atom. The remaining FAD-
HOOH structure is depicted below.
surements provided rates of hydroxylation as well as turnover rates, which, converted to
activation free energies ∆‡G, give a rough estimate of 14-15 kcal mol−1 for the hydrox-
ylation step5. Both the enthalpy values as well as the Gibbs free energies will be later
discussed.
Several theoretical works have also been dedicated to the study of the hydroxylation
activation barrier. Already in 1992, a correlation was found between the HOMO en-
ergies of fluorosubstituted pOHB derivatives and the experimental turnover rates for the
compounds.[90] These calculations were done in the gas phase, and only at the semiem-
pirical AM1 level, but already supported the idea that the hydroxylation step could be rate-
determining. The first QM/MM calculation of the system was made by Ridder et al.,[91]
employing a combination of AM1 for the QM region and the CHARMM force field for
the description of the enzymatic environment. A good correspondence was found with the
earlier results of Vervoort et al.[90] However, the computed values for the activation bar-
rier vary considerably according to the QM level used. The first predictions made at the
AM1/CHARMM level gave a value of about 17 kcal mol−1 for the enthalpy.[91] The HF
results drastically overshooted to 30 kcal mol−1, while the higher level corrections at the
B3LYP and LMP2 levels gave 11-12 kcal mol−1 [92] which compared well to the exper-
5According to Eq. (2.71), and setting the transmission coefficient to 1, the Gibbs energy can be estimated
from the rate constant k4 as ∆‡G= RTln
(
kBT
hk4
)
.
118 Chapter 6. Computation of Activation Barriers in Enzymes
imental estimate. This is however in disagreement with findings in the gas phase, which
show that B3LYP and MP2 should underestimate the barrier.[93] In the same work, an
AM1 prediction is given with extraordinary agreement to experiment, which is rather puz-
zling. Only in a recent paper by Senn et al. has this value been revised, and the converged
AM1 QM/MM estimate was found to lie between 22-26 kcal mol−1.[94] These variations
are all linked to the size of the QM system treated, how solvation effects are included and
the TS modeling approach. As such, there are some doubts about many of the theoretical
results previously presented. Our proposal was to combine a well controled approach for
the QM/MM treatment of the system with a converged ab initio result in order to provide a
reliable theoretical estimate on the limit of todays computational tools.
6.2.2 Model Setup and Simulation
QM/MMModel
The QM region chosen is depicted in Fig. 6.2. It includes the pOHB molecule and the
isoalloxazine part of the cofactor up to the first methylene unit of the side chain. The
dangling bond was saturated in the QM region by an hydrogen link atom. This gives a total
of 49 QM atoms. Previous theoretical works on the same system with larger QM regions
showed only small deviations below 1 kcal mol−1. The total system size is of about 23 000
atoms. This includes enzyme, substrate and cofactor (FADHOOH) as well as the aqueous
solvent layer. Electrostatic embedding was used between the two regions. A charge-shift
scheme was applied at the QM/MM boundary, meaning that the atom connecting the two
regions will have its charge shifted to the nearest neighbors.[95] Further charges are also
added between the atoms in order to correct the dipole moment6.
Reaction Path Modeling
The QM/MM modeling was based on the X-ray structure of a PHBH-substrate complex
obtained by Gatti et. al. (2.0 Å resolution).[96] Details on system preparation are given in
Ref. [93]. The MM forcefield used throughout was GROMOS96,[97] a unified-atom force
field.
After system preparation (minimization and MD runs protocols are given in the above
cited reference) and equilibration, a MD run of 200 ps was performed in a cubic box of
water molecules, from which six snapshots were taken in 40 ps intervals. Another four
6Charge neutrality is obtained by shifting the charge, but this leads to an artificial dipole. This can be
compensated by distributing the charge in the neighboring region. Higher-order moments are not corrected,
since they are considered to have a negligible effect.
6.2. The p-Hydroxybenzoate Hydroxylase enzyme 119
snapshots were taken from AM1/MM MD simulations of the snapshots at 120 and 200
ps that were carried out in the course of free energy calculations using thermodynamic
integration.[98] The snapshots will be numbered from 1-10 (i.e., 1-6 from MM MD after
0, 40, 80, 120, 160, 200 ps, and 7-8 from QM/MMMD at 120 ps, and 9-10 from QM/MM
MD at 200 ps, respectively). The resulting structures were re-optimized at the B3LYP/MM
level, including the QM region and all surrounding residues within a distance of 5 Å. The
basis set used was TZVP.[99]
In order to ensure a continuous reaction path connecting TS and reactant structures, a
reaction coordinate was defined as the difference between the breaking O-O and forming
C-O bond lengths. This coordinate was then moved from the TS value up to a reactant
minimum, relaxing all other geometry parameters (in the region detailed before) along the
path. This part of the project was carried out by the group of Prof. Walter Thiel at the
Max-Planck Institute in Mülheim.
6.2.3 The Hydroxylation Activation Barrier
The two main sources of error in an ab initio estimate of the activation barrier will be both
the truncation of the n-particle and of the one-particle expansions. Relativistic (scalar and
spin-orbit) effects should be negligible, since all atoms involved are at most second row
elements. From small molecule calculations, it is known that the CCSD(T) method can
give reaction barrier predictions to within 1 kcal mol−1 of the FCI result. With this error
estimate in mind, the CCSD(T) method with a complete basis set (CCSD(T)/CBS) will be
from now on taken as a reference value. By approaching this result with the use of a local
density fitting coupled cluster algorithm, the sources of error to be considered are
(1) the basis set error. The LCCSD(T0) estimate for a certain basis set should be cor-
rected for the finite expansion used. This may involve extrapolation procedures or
the use of explicitly correlated methods.
(2) the local approximations. There are several involved, each of these are to be later
discussed:
(a) domain approximations
(b) pair and triples list approximations
(c) the non-iterative triples (T0) approximation
(3) the density fitting approximation. This error is estimated to be in the order of 0.1 kcal
mol−1, and it will therefore be neglected.
120 Chapter 6. Computation of Activation Barriers in Enzymes
Table 6.1: HF and MP2 computed barrier heights (in kcal mol−1) for the 0 ps reaction pathway. All reported
values include the effect of the MM environment, and were calculated as the energy difference between the
B3LYP/def-TZVP pre-optimized reactant and transition state structures.
∆‡E0
Basis Set HF MP2
cc-pVTZ 35.6 12.1
[aug]-cc-pVTZa 35.8 11.8
aug-cc-pVTZ 36.0 11.7
aug-cc-pVQZ 36.0 12.1
DF-MP2-F12b 12.2
a) diffuse functions only on O atoms
b) MP2-F12/2∗A(loc)/aug-cc-pVTZ correction (see Ref. [100])
The following Sections discuss the magnitude of the above mentioned error sources.
All of the following calculations were performed with density fitting approximations, and
the "DF" prescript will be dropped.
Basis set Error
To estimate the effect of basis set truncations on the reaction barriers, MP2 calculations
were performed for one of the reaction paths. The effect is expected to be similar for all the
paths, as the major differences between them lie in the conformational space scanned (to
which the basis set should be more or less insensitive). The MP2 values should be enough
to give a good estimate of the basis set effect at the coupled cluster level, since CCSD(T) is
known to have a similar basis set dependence (or even lower) as Møller-Plesset perturbation
theory. The results are shown in Table 6.1.
The convergence of the barrier heights with the basis set size seems to be relatively fast.
The cc-pVTZ result is remarkably close to the best estimates (within 0.1 kcal mol−1). It
seems to be an error compensation effect, due to the use of a small basis and the lack of
diffuse functions. The inclusion of diffuse basis functions is, however, recommended due
to the anionic nature of the substrate, since they help to describe the diffuse electron cloud
of the oxygens. The cc-pVTZ basis with addition of diffuse functions on the oxygens
will be denoted as [aug]-cc-pVTZ, and this was the choice for the local coupled cluster
calculations. The basis set error is therefore estimated to be around 0.4-0.5 kcal mol−1.
Significant improvements would only be possible by increasing the basis up to quadruple-
zeta quality.
6.2. The p-Hydroxybenzoate Hydroxylase enzyme 121
1 2 3 4 5 6 7 8 9 10
paths
10
12
14
16
∆ E
0 
/  k
c a
l  m
o l
-
1
MP2
LMP2
Figure 6.3: LMP2/[aug]-cc-pVTZ and MP2/[aug]-cc-pVTZ computed barrier heights (in kcal mol−1). All
values are taken from QM/MM single point calculations.
Local approximations
As already discussed in Section 2.1.4, several approximations are used in the local coupled
cluster program. The first source of error to be inspected is the truncation of the virtual
space through the use of domains. For this purpose, MP2 and LMP2 calculations were
performed for all reaction paths. As already discussed in Chapter 3, the effect should be
similar for LMP2 and LCCSD(T0). The results are shown in Fig. 6.3.
The local values underestimate on the average (compared to the canonical counterparts)
the reaction barrier by about 0.3 kcal/mol. This is within the expected accuracy for a triple
zeta basis set. Test calculations for LMP2 with merged domains (reactant and transition
state) give a barrier average of 0.6 kcal mol−1 above the canonical estimate. Since there
is no domain unbalance in this case, it seems that there is a furtuitous error compensation
in the LMP2 result. By merging the domains, the transition state has its energy no longer
artificially lowered by its larger domain list. This effect (+0.9 kcal mol−1) appears to partly
cancel with BSSE. The best local result is the one using the original domains, and this was
122 Chapter 6. Computation of Activation Barriers in Enzymes
3 5 7 9
R
w
 / Bohr
14
15
16
17
18
∆E
0 
/ k
ca
l m
ol
-
1
R
c
 = 3 Bohr
R
c
 = 1 Bohr
Figure 6.4: Activation barrier ener-
gies (in kcal mol−1) computed at the
LCCSD(T0)/[aug]-cc-pVDZ level as a
function of the local distance criteria for
weak and close pairs. The domains used
were calculated with the [aug]-cc-pVTZ
basis set.
chosen also for the LCCSD(T0) calculations.
Regarding the pair approximations in the local coupled cluster program, test calcu-
lations were performed using different distance criteria for the pair classification. Two
parameters were under study, Rc and Rw. As previously discussed, the former defines the
maximum distance between orbitals centers for which a pair can still be classified as strong.
The orbital pairs with distances between the two parameters are classified as close and will
influence the triples list. Furthermore, close pairs are included in the triples calculation by
the use of LMP2 amplitudes. Results are shown in Fig. 6.4.
The first fact one can notice in the graph is that for a given value of Rw, the coupled
cluster estimate is almost independent of the choice of Rc. Increasing the value of Rc from
1 to 5 Bohr (not shown in Fig. 6.4) only changes the activation energy value in about 0.3-
0.4 kcal mol−1. This indicates that the approximation of computing pair contributions with
LMP2, instead of LCCSD, is of little importance. However, the triples pair list seems to
be a determining factor in the accuracy. The difference between Rw = 3 Bohr and Rw = 5
6.2. The p-Hydroxybenzoate Hydroxylase enzyme 123
Bohr is quite large. This can be easily explained by considering the reactant and transition
state structures. The OH which is transferred in the course of the reaction is found in the
reactant complex still bounded to the FADHOOH. In the transition state, it is about 4 Bohr
apart from both substrate and cofactor. If the values for Rw are below this distance, the
interaction of the OH moiety with the rest of the system will not be treated at the triples
level, and the correlation estimate wil be unbalanced. Since the effect can be captured by
solely changing Rw, this parameter was set to 7 Bohr, while the default value of Rc = 1
Bohr was kept 7.
The non-iterative triples (T0) approximation was tested by calculating for one reaction
path the converged triples solution. The difference was found to be below 0.1 kcal mol−1
and therefore negligible.
In conclusion, the errors associated with the use of the DF-LCCSD(T0)/[aug]-cc-pVTZ
level of theory in comparison to CCSD(T)/CBS are individually lower than 0.5 kcal mol−1
and should not exceed more than 1.0 kcal mol−1 in total. However, one should remember
that this estimate is given only for the QM treatment. The MM modeling and QM/MM
setup will lead to further errors which have not been accounted for. The reaction path
optimization has also been performed at a lower level of theory (B3LYP/TZVP), which
should, however, be adequate for the reaction in study.
Results
The hydroxylation barrier height was computed as the average of ten reaction paths. The
∆‡E0 values, averages and root mean square deviations are given in Table 6.2. All values
shown are QM/MM results (including MM relaxation terms).
7At the time, due to convergence problems in the local coupled cluster program, it was not possible to
compute all of the values featured in Fig. 6.4. Only values for Rw = 3,5 and 7 were available. The series
seemed to indicate an almost logarithmic convergence, and the highest value was taken. Now it would seem
that a choice of Rw = 5 would be more adequate, but both values are equally close to our best estimate.
124 Chapter 6. Computation of Activation Barriers in Enzymes
Ta
bl
e
6.
2:
A
ct
iv
at
io
n
ba
rr
ie
re
ne
rg
ie
s
(i
n
kc
al
m
ol
−1
)c
al
cu
la
te
d
at
di
ff
er
en
tl
ev
el
s
of
th
eo
ry
an
d
w
ith
th
e
[a
ug
]-
cc
-p
V
T
Z
ba
si
s
se
t.
T
he
re
su
lts
do
no
ti
nc
lu
de
Z
PE
co
rr
ec
tio
n.
T
he
L
C
C
SD
+
(T
0)
/D
Z
va
lu
es
co
rr
es
po
nd
to
th
e
L
C
C
SD
/[
au
g]
-c
c-
pV
T
Z
re
su
lts
w
ith
ad
de
d
tr
ip
le
sc
or
re
ct
io
n
ca
lc
ul
at
ed
w
ith
th
e
sm
al
le
r[
au
g]
-c
c-
pV
D
Z
ba
si
s
(b
ut
w
ith
th
e
sa
m
e
do
m
ai
ns
as
in
th
e
la
rg
er
ba
si
s)
.
H
F
B
3L
Y
P
M
P2
L
M
P2
SC
S-
L
M
P2
L
C
C
SD
L
C
C
SD
+
(T
0)
/D
Z
L
C
C
SD
(T
0)
1
35
.8
8.
6
11
.8
12
.5
14
.8
20
.4
14
.1
13
.8
2
41
.1
11
.6
14
.1
13
.6
16
.4
24
.0
16
.6
16
.1
3
38
.6
10
.1
13
.5
12
.6
15
.2
23
.2
16
.0
16
.2
4
39
.2
10
.3
13
.1
12
.5
15
.2
22
.7
15
.4
15
.9
5
35
.8
8.
4
11
.5
11
.2
13
.7
19
.9
13
.4
13
.4
6
38
.7
10
.1
12
.0
11
.6
14
.4
21
.5
14
.5
14
.4
7
32
.1
6.
9
9.
3
9.
0
11
.4
16
.8
11
.1
10
.9
8
37
.7
9.
1
11
.7
11
.6
14
.3
20
.8
14
.0
14
.0
9
39
.8
10
.7
12
.5
12
.0
14
.8
22
.3
15
.1
14
.5
10
41
.2
11
.2
13
.5
13
.3
16
.1
23
.6
16
.3
16
.4
av
er
ag
e
38
.0
9.
7
12
.3
12
.0
14
.6
21
.5
14
.6
14
.6
R
M
S
2.
6
1.
4
1.
3
1.
2
1.
3
2.
1
1.
5
1.
6
6.2. The p-Hydroxybenzoate Hydroxylase enzyme 125
Taking the LCCSD(T0) values as reference, HF clearly overestimates the activation
barriers, as previously discussed. On the other hand, MP2 (local and canonical) results
underestimate the barrier, just as B3LYP. These values show in fact a common tendency
observed in small molecular systems and confirm the observations made in Ref. [93].
The MP2 method introduces a correlation correction to the HF estimate which due to its
perturbative nature overcorrects. The Coupled Cluster methods, LCCSD and LCCSD(T0),
show a more univocal type of convergence. The former method still overestimates the
barrier height by about 7 kcal mol−1. A further comment should be made on the two
remaining columns of Table 6.2.
The spin component scaled LMP2 method (SCS-LMP2) is an empirically corrected
MP2 approach first introduced by Grimme.[16] Since in HF theory the movement of par-
allel spins is "correlated" due to the Pauli Principle (the two electrons are not allowed
to occupy the same orbital), the correlation energy will be different for parallel and anti-
parallel spins. By introducing two empirical factors which scale both contributions it is
possible to obtain more balanced MP2 results. This is reflected in the values in the Table,
with the SCS-LMP2 and LCCSD(T0) averages in agreement to within 0.1 kcal mol−1. The
SCS-LMP2 calculation, however, is performed with only a fraction of the cost of the CC
calculation. In this case, the computational time is about an order of magnitude lower.
Another approximation tested in the course of this work was the use of a smaller basis
set to obtain the triples correction (T0). These calculations were performed with the cc-
pVDZ basis set, with diffuse functions added to the oxygens. It will be refered to as [aug]-
cc-pVDZ. In order to minimize the effect of domain approximation errors, the domains
calculated with the [aug]-cc-pVTZ basis were used. LCCSD(T0)/[aug]-cc-pVDZ calcu-
lations were performed, and the triples correction added to the LCCSD/[aug]-cc-pVTZ
values. These results are depicted in Table 6.2 with the denomination LCCSD + (T0)/DZ.
The individual values are somewhat different, with a maximum absolute deviation of 0.5
kcal mol−1, but agree on the average with the triples result using the larger basis set.
The activation enthalpies and free energies are obtained by adding the missing contri-
butions listed in Eq. (6.1). The simulations carried out in Mülheim provided a zero point
energy correction of−1.1 kcal mol−1 and an enthalpic temperature correction of−0.2 kcal
mol−1. Entropic corrections were only computed using AM1 for a gas phase model and
amount to 0.4 kcal mol−1. Some selected computed values are shown in Table 6.3, together
with the experimental estimates.
The results show how important the treatment of correlation is for the activation
barrier value. The only result in agreement with the experimental estimates is the
LCCSD(T0)/[aug]-cc-pVTZ, taking into account the error bars of Table 6.2. In fact, the
126 Chapter 6. Computation of Activation Barriers in Enzymes
Table 6.3: Activation barrier enthalpies and free energies computed at 300 K, and collected experimental
estimates. The values in parentheses are the root mean square deviations for the 10 computed paths.
∆‡H ∆‡G
HF 36.7 (2.6)
B3LYP 8.4 (1.4)
LMP2 10.7 (1.2)
LCCSD(T0) 13.3 (1.5) 13.7 (1.5)
experiment 12.0 14-15
root-mean square deviation of the local coupled cluster results is larger than the error es-
timates for the QM treatment discussed in the previous subsections. If one would be in-
terested in increasing the precision of the result, a broader sampling should be used. Only
then it would be reasonable to improve the QM method.
The remaining thermodynamic correction terms may become a determining factor of
the accuracy, if both the sampling and the QM method would be improved. Even if com-
putationally much more demanding than determining the static activation energies, there
is still some room for improvement. The enthalpic and ZPVE corrections need two opti-
mized structures (reactant and TS) and the respective Hessians. By todays computational
standards it would be impossible to perform these calculations at the coupled cluster level.
Since no analytical gradients are available for the local CC algorithm, this would involve
49 ∗ 3 ∗ 2 = 258 single points to obtain a numerical gradient, needed at each optimization
step. A parallel code running on a large cluster could perform this task, but the Hessian
calculation would be an impossible challenge. Another possibility would be to use the SCS-
LMP2 method, which has readily available analytic gradients. The optimizations could be
performed on a single computer, and the Hessian run could be split into parallel calcula-
tions. Considering the results in Table. 6.2, this should be a good approximation to the CC
result. The entropy correction would have to be tackled in a different way. Since this cor-
rection is obtained by computing the difference between the Gibbs energy and the enthalpy,
one would need to do an Umbrella Sampling[101] or Thermodynamic Integration[98] with
the higher level methods in the QM region. However, this involves thousands of single
point calculations, and even using HF it would already be at an enormous cost. But again
there are ways by which the cost could be reduced, delivering still something close to the
higher level result. One could compute some single points along the reaction path at the
higher level and take the difference relative to a lower level (e.g., semiempirical). A contin-
6.2. The p-Hydroxybenzoate Hydroxylase enzyme 127
uous function can then be obtained by interpolating these differences, and added to the total
energy at each point of the run.[102, 103] This will simulate the higher level run, as long
as a sufficient number of points have been used and the low level to high level difference is
kept relatively stable.
128 Chapter 6. Computation of Activation Barriers in Enzymes
6.3 The Chorismate Mutase enzyme
6.3.1 Overview
The Chorismate Mutase (CM) enzyme catalyzes the Claisen rearrangement of chorismate
to prephenate, a key step in the shikimic acid pathway that produces aromatic amino acids.
It has been the object of extensive experimental and computational research, in part due to
its biological significance, but also for being a rare example of an enzymatically catalyzed
reaction which keeps the same mechanism in various solvents as well as in the enzyme
environment. The chemical step is believed to be largely rate-limiting in Bacillus subtilis
CM (BsCM), and catalysis proceeds without covalent binding of the substrate to the en-
zyme. This makes BsCM a particularly convenient target for QM/MM studies, which have
focused on aspects such as the structure of the enzyme-substrate complex, reaction path-
ways, and the role of active-site residues in TS stabilization. However, almost all of the
previous reaction modeling has been carried out by using semiempirical or DFT methods,
which do not predict barrier heights with chemical accuracy.
This particular CM species has sparked for some years a vivid debate in the community.
The 2x106-fold reaction rate enhancement over the uncatalyzed process in water was early
on connected to reactant deformation.[104] The enzyme pocket would trap the chorismate
in a reactive conformation, in analogy to CMs found in other bacteria and plants. However,
Hilvert and coworkers[105] found a strong enthalpic effect of about 8 kcal mol−1, and an
almost marginal entropic contribution. The Gibbs free energy of activation was estimated to
be around 15.4 kcal mol−1, and the enthalpy 12.7±0.4 kcal mol−1. These values are to be
compared to the ones in solution, 24.5 and 20.7±0.4 kcal mol−1 respectively. Electrostatic
stabilization of the TS, together with substrate conformational effects were put forward as
an explanation for the strong catalytic effect.
Figure 6.5: The chorismate to prephenate Claisen rearragement catalyzed by BsCM.
6.3. The Chorismate Mutase enzyme 129
Evidence for TS stabilization has been given by several QM/MM studies. Lyne et.
al.[106] recorded the effect of deleting neighboring amino acid residues on an approximate
activation barrier. The level of theory used (AM1) was insufficient to deliver quantita-
tive results on the total effect. However, significant contributions from positively charged
residues were observed, in agreement with the TS stabilization proposal. The active site of
the wild-type (WT) enzyme is represented in Fig. 6.6. Several positively charged residues
are found in the pocket. The residue Arg90 is of special importance. It has close con-
tacts to the breaking C-O bond, and a number of experimental and theoretical works have
been dedicated to the analysis of this particular residue in the enzymatic activity. Kien-
höffer and coworkers synthesized and studied a BsCM mutant with the important cationic
residue Arg90 replaced with the non-coded amino acid citrulline (Cit).[107] The system
has potential for a largely focussed alteration of TS effects, as the arginine is mutated to
a neutral but isosteric analogue, proposed to form a similar but less stabilizing hydrogen
bonding pattern with the substrate. The mutation resulted in a 104-fold reduction in kcat,
or 5.9 kcal mol−1 increase in the overall free-energy barrier. The Km registered a 2.7-fold
increase. This small dissociation constant difference was taken as an evidence of relatively
minor complex conformational distortion. The reduced efficiency is therefore interpreted
as arising from unfavorable TS stabilization due to loss of the cationic nature of the sta-
bilizing hydrogen bond donor. These results have been supported by theoretical QM/MM
investigations of the mutant.[108]
On the other hand, Worthington et. al.[109] have argued that cationic stabilization oc-
curs to an equal extent on the transition and reactant states. Another explanation was put
forward by Hur et. al.[110]. The catalytic effect of BsCMwould be due to the enzyme abil-
ity to preferentially bind near attack conformations (NACs) of the substrate. This theory
is based on the premise that specific conditions are needed for the reaction to take place.
The atoms must come together at a given distance and angle, and the enzyme would favor
such conformations in comparison to solution. Free energies of NAC formation correlated
well with the experimental ∆∆‡G, but these results were only based on mole fractions.
Mulholland and coworkers[111] later disputed these results, and calculated the NAC con-
tribution with help of a free energy perturbation method. The effect was found to be two
times smaller. This result came in support of the TS stabilization theory, since the NACs
were insufficient to explain the 9.1 kcal mol−1 difference in the Gibbs free energy. Also,
the same effects that lead to preferential stabilization of the TS should also lead to a higher
NAC population. A related work can be found in the same year, with new estimates for the
stabilization effect of residues on the activation barrier.[112] The study of a related CM, as
well as of a group of mutants shortly followed by the Bruice group[113], keeping a lively
130 Chapter 6. Computation of Activation Barriers in Enzymes
Figure 6.6: Active site of BsCM with substrate, and relevant neighboring residues. The QM region is repre-
sented as ball-and-sticks, MM part as sticks, with only the side chains of the residues shown.
discussion for more than 3 years.
In the study of the WT enzyme, my work was based on the modeling studies car-
ried out at the Theoretical Chemistry Group in Bristol. In previous accounts, they had
already presented a theoretical barrier height prediction on the basis of QM/MM DFT
calculations.[114, 112] A study on the convergence of the QM treatment could solve many
of the questions in debate. By calculating an activation barrier at higher levels of theory,
the determining factor for catalysis could be identified.
6.3.2 Model Setup and Simulation
QM/MMModel
The CM enzyme system was taken from previous studies carried out by the Bristol group.
The structure is derived from an X-ray structure of an enzyme-transition state analogue
complex8 (PDB code 2CHT).[115]
The QM region consists solely of the chorismate. There is no bonding between the
substrate and the environment, and therefore no need to include link atoms. This choice
8A transition state analogue is a chemical species with structural properties close to the idealized TS. In
this case, an endo-oxabicyclic inhibitor was used.
6.3. The Chorismate Mutase enzyme 131
corresponds to a relatively small region, only 24 QM atoms, but it should be noted that
the barrier height has been found to be insensitive to the size of the QM region.[116] The
system is made of 7057 atoms in total. It comprises a sphere of radius 25 Å centered on
the substrate, containing 4192 protein atoms, 947 water molecules and the substrate. The
QM/MM interaction included electrostatic embedding.
For the Arg90Cit mutant, extra parameters were needed for the non-coded aminoacid
Cit. Intramolecular interaction and vdW parameters were generated by analogy with those
for standard CHARMM residues. Partial atomic charges were taken from the Guimarães
et al. study,[108] and checked with QM treatment of the amino acid residue. The results
compared well, with deviations lying below 1.0 kcal mol−1 (which can also be caused by
conformational effects).
Reaction Path Modeling
Again, a reaction coordinate was chosen as the difference of distances between the breaking
and forming bond, evidenced in Fig. 6.5. From previous studies it was known that the TS
is found at a a value of r= δ (C-C) - δ (C-O) ≈ −0.6 Å. QM/MM molecular dynamic runs
were performed with the BsCM complex, applying a restraint to the coordinate. The AM1
and PM3 semiempirical methods were used for the QM part, the molecular force field used
was CHARMM27.[50] Snapshots were taken from 5 to 30 ps for each trajectory, in a total
of 16 structures. This modelling work was performed by the Theoretical Chemistry Group
in Bristol.[114]
The QM/MM interface was provided by in-house routines from the Bristol group[117]
linked to the Jaguar quantum program[118] and the MM program package TINKER.[119]
The point charge information was later translated to Molpro format. Electrostatic embed-
ding was used throughout.
6.3.3 The Claisen Rearrangement Barrier
The same procedure was followed as in the PHBH case. To obtain a converged value for
the Gibbs free energy of activation, a high-level result for the ∆‡E0 value was combined
with lower level corrections. The same tests for basis set error and local approximation
effects were carried out, and are detailed in the following sections.
Basis Set Error
Calculations were carried out at the MP2 level of theory for a series of basis sets. The
results are shown in Table 6.4.
132 Chapter 6. Computation of Activation Barriers in Enzymes
Table 6.4: HF and MP2 computed barrier heights (in kcal mol−1) for one reaction pathway. All reported
values include the effect of the MM environment, and were calculated as the energy difference between the
B3LYP/6-31G* pre-optimized reactant and transition state structures.
∆‡E0
Basis Set HF MP2
aug-cc-pVDZ 29.9 10.2
[aug]-cc-pVTZa 31.1 12.3
aug-cc-pVTZ 30.9 11.8
aug-cc-pVQZ 31.0 12.2
aug-cc-pV5Z 31.0 12.3
DF-MP2-F12b 12.3
a) diffuse functions only on O atoms
b) MP2-F12/2∗A(loc)/aug-cc-pVTZ correction (see Ref. [100])
Except for the aug-cc-pVDZ basis results, all other values are relatively close to our best
results (aug-cc-pV5Z or the explicit correlated calculation). At the HF level the near CBS
limit is already reached with the triple-zeta basis, the correlation correction is as usual more
basis-set dependent. However, the [aug]-cc-pVTZ basis, which was also previously used in
the PHBH case actually gives the near CBS result. This is probably an error compensation
effect, due to some polarization near the oxygens (due to the diffuse functions added) and
the basis set incompleteness. This was also the basis later used in our calculations.
Local Approximations
The same series of tests were conducted in the CM case as for the PHBH hydroxylation
step. The first effect to be under examination was the domain approximation. MP2 and
LMP2 calculations with the [aug]-cc-pVTZ basis set were carried out. In the LMP2 runs
both regular domains (recomputed at each structure), and merged domains were used (see
Chapter 3). The results are plotted in Fig. 6.7 for 10 trajectories9.
For this reaction the electronic structure changes significantly along the reaction coor-
dinate. The orbitals are more delocalized at the TS and therefore the standard Boughton-
Pulay procedure yields larger domains at the TS than for the reactants. This leads to an
underestimation of the barrier height if standard domains are used. Fig. 6.7 shows the de-
viation between canonical and local barrier heights for 10 paths. The merged domains have
9The snapshots used for Fig. 6.7 were taken at 10, 12, 16, 18 and 20 ps, and are ordered accordingly. The
odd numbers stand for AM1 paths, and the even numbers for the PM3 paths.
6.3. The Chorismate Mutase enzyme 133
1 2 3 4 5 6 7 8 9 10
paths
6
8
10
12
14
∆ E
0 
/  k
c a
l  m
o l
-
1
MP2
LMP2 with merge
LMP2 without merge
Figure 6.7: LMP2/[aug]-cc-pVTZ and MP2/[aug]-cc-pVTZ computed barrier heights (in kcal mol−1). All
values are taken from QM/MM single point calculations.
been determined separately for each path, using the reactant and transition state domains
as reference. The resulting merged domains are appropriate to describe the whole reaction
paths. The effect of the domain merging procedure is found to be very similar for all snap-
shots. When standard domains are used, the LMP2 barrier heights are about 1.4 kcal/mol
lower than the canonical MP2 ones. On the other hand, the LMP2 barrier heights obtained
with merged domains are about 0.5 kcal/mol higher than the canonical ones. Probably, this
is again at least partly a BSSE effect, which artificially lowers the canonical MP2 barrier
heights, as already discussed for the SN2 reactions in Chapter 3.
The pair approximations in the coupled cluster program were again tested. A reaction
pathway was chosen (the snapshot at 38 ps, from the AM1/MM dynamics run) and sev-
eral combinations of parameters were used with LCCSD(T0)/[aug]-cc-pVTZ. However,
contrary to the PHBH case, no convergence was observed when increasing the Rc and Rw
parameters in the 1-7 Bohr range. In order to effectively choose a combination which would
reduce the pair approximation, the calculations were repeated with the [aug]-cc-pVDZ ba-
sis. The smaller basis allowed to increase the parameters further and even to completely
134 Chapter 6. Computation of Activation Barriers in Enzymes
3 5 7 9
R
w
 / Bohr
12
13
14
15
∆E
0 
/ k
ca
l m
ol
-
1
full LCCSD(T0)
R
c
 = 1 Bohr
R
c
 = 3 Bohr
R
c
 = 5 Bohr
Figure 6.8: Activation barrier ener-
gies (in kcal mol−1) computed at the
LCCSD(T0)/[aug]-cc-pVDZ level as a
function of the local distance criteria for
weak and close pairs. The domains used
were calculated with the [aug]-cc-pVTZ
basis set.
turn off the pair approximation. The same procedure as in the PHBH case was applied. The
triple-zeta domains were used for the double-zeta local calculations. The results are shown
in Fig. 6.8.
The line tagged as full LCCSD(T0) represents the local coupled cluster result without
any pair approximations. All pairs are treated at the higher level, and the triples list is full.
There are two effects visible in the diagram. For a given Rc, increasing the value of Rw
leads to a lower value for the activation barrier. Fixing Rw and incrementing Rc leads to
the opposite effect. The variations are much smaller than in the PHBH case. However,
the fluctuations can be still as large as 1 kcal mol−1. In Table 6.5, the LCCSD correla-
tion contributions as well as the triples are shown for several different choices of distance
criteria. The reasons behind the strange behavior of Fig. 6.8 become clear. Both values
converge rather slowly, but from opposite directions. The LCCSD energy contribution con-
verges from below, since treating pairs at the LMP2 level leads to an underestimation of
the barrier. The (T0) contribution converges from above, since the triples will correct the
6.3. The Chorismate Mutase enzyme 135
Table 6.5: LCCSD and LCCSD(T0) correlation contributions (in kcal mol−1) to the activation energy (∆E0)
as a function of the distance criteria Rc and Rw (in Bohr). SD stands for the correlation energy difference
between TS and reactant, as computed by LCCSD. (T0) gives the triples effect, and LCCSD(T0) the total
correlation contribution. The (∞,∞) result is obtained without any pair approximations (all pairs treated by
CCSD and full triples list). At the bottom, the (1,5) values are also shown (the distance criteria used for
the final activation energy values). All results were computed with the [aug]-cc-pVDZ basis set, using the
domains from [aug]-cc-pVTZ.
(Rc,Rw) SD (T0) LCCSD(T0)
(1,3) -7.74 -3.46 -11.20
(3,5) -7.60 -4.18 -11.78
(5,7) -7.12 -4.91 -12.03
(∞,∞) -6.72 -4.99 -11.71
(1,5) -7.74 -4.36 -12.10
CC values, lowering the barrier. Both effects tend to cancel out, and it is actually easier
to find distance parameters with a good error compensation, than converging both values.
The best combination of parameters according to Fig. 6.8 (and Table 6.5) would be Rc=3,
Rw=5 Bohr. However, the combination of Rc=1 and Rw=5 Bohr also compares quite well.
The error is 0.4 kcal mol−1 relative to the full coupled cluster result. Since the latter is
computationally less demanding, and as the domain approximation error was about the
same, but with opposite sign (+0.5 kcal mol−1), these were the values chosen. Some error
compensation is to be expected.
Results
Single point calculations were carried out on the reactant and transition state structures for
all 16 snapshots. The basis set used was [aug]-cc-pVTZ. The results are shown in Table 6.6
for each pathway individually, together with averages and root-mean square deviations.
Just as in the PHBH case, the LCCSD(T0) values will be taken as reference. The
averages vary between 10-30 kcal mol−1, again revealing the importance of choosing an
adequate QM treatment. Just as before, HF drastically overshoots the barrier, inclusion of
correlation lowers the value, but only SCS-LMP2 compares favorably with LCCSD(T0).
LCCSD and LMP2 deviate by about 5 kcal mol−1. The B3LYP value, which does not in-
clude ZPVE or temperature corrections compares well to the enthalpic experimental value
of 12.7 kcal mol−1. This favorable comparison had already been pointed out by Clayessens
et. al.[114]. However, the comparison made in the above cited Communication is faulty, as
it only takes into account electronic effects (at 0K). The zero point correction to the barrier
136 Chapter 6. Computation of Activation Barriers in Enzymes
has been computed by performing B3LYP frequency calculations on six different opti-
mized reactant and TS structures. The atoms included in the calculation were the substrate,
2 water molecules within 3 Å of chorismate and 4 hydrogen terminated residues close to
the substrate (Arg7, Arg63, Arg90 and Glu78). The water molecules and the amino acids
were frozen during the optimizations. Rows and columns of the Hessian corresponding to
the frozen atoms were deleted prior to diagonalisation. The ZPVE correction to the barrier
ammounts to -1.5 kcal mol−1 and the average enthalpic temperature correction is -0.1 kcal
mol−1. Some selected theoretical values for enthalpies and free energies of activation are
shown in Table 6.7.
6.3. The Chorismate Mutase enzyme 137
Ta
bl
e
6.
6:
A
ct
iv
at
io
n
ba
rr
ie
re
ne
rg
ie
s
(i
n
kc
al
m
ol
−1
)c
al
cu
la
te
d
at
di
ff
er
en
tl
ev
el
s
of
th
eo
ry
an
d
w
ith
th
e
[a
ug
]-
cc
-p
V
T
Z
ba
si
s
se
t.
T
he
re
su
lts
do
no
ti
nc
lu
de
Z
PE
co
rr
ec
tio
n.
H
F
B
3L
Y
P
M
P2
L
M
P2
SC
S-
L
M
P2
L
C
C
SD
L
C
C
SD
(T
0)
AM1
8
ps
32
.5
14
.2
11
.8
12
.4
16
.9
21
.4
16
.2
10
ps
31
.2
13
.4
12
.4
12
.9
17
.1
21
.5
16
.5
12
ps
31
.3
13
.4
11
.5
12
.0
16
.3
20
.7
15
.7
16
ps
29
.9
11
.7
11
.1
11
.5
15
.7
20
.2
15
.0
18
ps
29
.3
11
.3
11
.0
10
.9
15
.1
19
.6
14
.4
20
ps
29
.9
12
.0
10
.6
11
.2
15
.5
19
.8
14
.8
30
ps
24
.7
8.
1
9.
5
10
.0
13
.7
17
.6
12
.9
38
ps
25
.7
8.
6
10
.1
10
.5
14
.3
18
.4
13
.5
PM3
8
ps
31
.8
13
.6
11
.3
11
.9
16
.4
20
.8
15
.6
10
ps
30
.4
11
.1
9.
5
10
.1
14
.5
19
.2
13
.9
12
ps
32
.8
14
.9
11
.6
12
.3
16
.7
21
.1
16
.2
16
ps
30
.1
12
.2
10
.8
11
.3
15
.6
20
.0
15
.0
18
ps
29
.6
11
.7
10
.7
11
.2
15
.3
19
.7
14
.6
20
ps
29
.4
11
.2
8.
5
9.
2
13
.6
18
.2
13
.3
30
ps
30
.4
11
.3
10
.4
10
.9
15
.1
19
.9
14
.6
38
ps
28
.9
10
.4
9.
0
9.
6
13
.8
18
.5
13
.4
av
er
ag
e
29
.9
11
.8
10
.6
11
.1
15
.4
19
.8
14
.7
R
M
S
2.
1
1.
8
1.
0
1.
0
1.
1
1.
1
1.
1
138 Chapter 6. Computation of Activation Barriers in Enzymes
Table 6.7: Activation barrier enthalpies and free energies computed at 300 K, and collected experimental
estimates. The values in parentheses are the root mean square deviations for the 10 computed paths.
∆‡H ∆‡G
HF 28.3 (2.1)
B3LYP 10.2 (1.8)
LMP2 9.5 (1.0)
LCCSD(T0) 13.1 (1.1) 15.6 (1.1)
experiment 12.7±0.4 15.4±0.4
The entropic contribution to the activation barrier (T∆‡S) at 300K has been computed
by the Theoretical Chemistry Group in Bristol. Their estimate is 2.5 kcal mol−1, to be
compared with 2.7 kcal mol−1 from experiment. Once more, the only theoretical result
within the error bounds of experiment and the pathway sampling is LCCSD(T0). The
enthalpy and Gibbs free energy of activation are even within the experimental error. Just
as in the PHBH case, these results seem to validate the use of classical TST in such a
developed system. Even more surprising is the dependence the values have on the QM
treatment, and how this can be converged to reproduce the experimental estimate. This
behavior parallels the one found for smaller molecular systems, and the high dimensionality
of the problem seems to be well captured by a simple pathway sampling.
Including all corrections to the B3LYP value, it underestimates the reaction barrier by
about 2 kcal mol−1, a discredit to some of the previous DFT results.[111, 114] The final
SCS-LMP2 value is not given, but by comparing the values shown in Table 6.6 it should
also reproduce fairly well the LCCSD(T0) enthalpy. This confirms this scaled approach as
a reliable improvement to the (L)MP2 method in the computation of activation barriers. Its
use in the description of weak interaction forces has recently been discussed.[66]
Chapter 7
Summary
139

141
The aim of this research was to expand the scope of applications for local correlation
methods by means of QM/MM and QM/QM hybrid schemes. The efforts have been fruit-
ful. The QM/MM interface between Chemshell and Molpro, which was made available at
the beginning of my PhD, has served several applications to date. The interface was coded
in collaboration with Stephan Thiel (MPI Mülheim) and was based on an earlier imple-
mentation by Paul Sherwood (CLRC Daresbury). Other projects have started since, and
the prospects are positive. The QM/MM calculations featured in this thesis have set a new
standard for the QM treatment of enzymatic systems, and highlighted some deficiencies
in earlier approaches to the description of the active site reactivity. To date, most of the
published work in the field still makes use of computed barrier heights at the semiempirical
or density-functional levels of theory, although the QM system sizes are similar to those
featured in this work (20 to 100 atoms). It has been shown in this thesis that such molecular
systems can be treated with high-accuracy through the use of local correlation methods. In
particular, that static corrections to the barriers may suffice to significantly improve theo-
retical values, and with an inexpensive approach. Many of the tests performed to evaluate
the errors in the QM treatment were carried out with lower level methods (MP2 for basis
set convergence) or smaller basis sets ([aug]-cc-pVDZ for the pair approximations). This
approach has the potential to be extended to much larger systems. Most of the tests could be
carried for system sizes up to at least 100 atoms. Alternative (and cheaper) approaches have
also been tested for obtaining the final barrier values, namely SCS-LMP2 or smaller basis
sets for the triples correction. The error in the QM part of the calculation can be reduced
to within 1 kcal mol−1. The overall accuracy is thus no longer dependent on the active site
treatment, but instead on the modeling and sampling techniques, which are significantly
cheaper.
At the heart of this project was also the development of a local QM/QM hybrid scheme.
Through the use of local orbital spaces, it has been shown that within a single calculation
different levels of theory can be applied to specific parts of the molecule, according to their
relevance to the reaction under study. The advantages of the LMOMO method over other
proposed hybrid schemes are manifold. Polarization effects are implicitly included, since
the HF calculation is performed for the whole system. The approach is flexible enough to
allow for extra coupling terms between correlated regions, successfully including environ-
ments effects in the high-level region residual equations. However, only for very small-
sized high-level regions were these contributions shown to be significant. The approach
is also better suited for biomolecular structures, where aromaticity can play a critical role.
The regions are defined as groups of orbitals, avoiding the need to cut through bonds and
the errors associated with link atoms or frozen localized orbitals.
142 Chapter 7. Summary
The method was tested by computing reaction energies for a model system (glycine
peptide formation), barrier heights for a proton transfer reaction and an enzymatic reaction,
as well by computing SN2 reaction barriers and dissociation energies of some hydrogen
bound complexes. In the first three cases it was found that a simple 2-level scheme, like
LCCSD(T0):HF or LMP2:HF, requires rather large correlation region sizes (typically 20-
30 orbitals) to obtain converged results. However, much smaller high-level regions are
sufficient in a 3-level scheme like LCCSD(T0):LMP2:HF; for the tested examples, it was
enough to include the atoms directly involved in the reactions into the high-level region. In
large systems this leads to a dramatic reduction of the computational effort, and one obtains
results of LCCSD(T0) quality with a computational effort that is only slightly larger than
that of the initial Hartree-Fock calculation.
The bottleneck in large-scale applications will therefore be the Hartree-Fock step. Even
though linear scaling can in principle be achieved, the onset is rather late and the prefac-
tor high. Therefore, for medium size systems, more important than linear scaling is to
reduce the prefactor and the total cost of the HF in general. One possibility is to use local
density fitting approximations as described in Ref. [84]. Further savings may be possible
by using dual- or multiple basis set approaches. For instance, it may be sufficient to use
a smaller basis set for the parts of the system that are distant from the correlated region.
Even though this is not entirely unproblematic since unphysical polarization artifacts may
occur, it should at least be possible to reduce the number of polarization functions. This
possibility is still to be explored.
Most of the applications for which the hybrid schemes are best suited revolve around
the description of chemical reactivity. The use of local correlation methods, however, has
been often criticized for its use of geometry-dependent excitation spaces. The domains
defining the virtual space for each occupied orbital may change in the course of bond
breaking/formation phenomena, leading to discontinuities in the PES. This is a topic which
is still approached in conference talks and several publications. However, in my opinion,
some of the criticism is unfounded. In a recent paper, Subotnik et. al.[120] state that "com-
putational chemists cannot always optimize geometries with confidence according to the
Pulay-Werner scheme. Geometries have certainly been successfully optimized when the
domains are fixed and thus the potential energy surfaces are smooth; but for large and/or
subtle changes in geometry, where the best domains are not obvious and should not be held
static, geometric optimization is not very practical". On the evidence of previous publi-
cations and my own experience with local methods, this claim is excessive. The general
procedure chosen for optimization with local methods is to use a geometry-dependent do-
main definition at first, and then to reoptimize the structure with fixed domains. This has
143
been proven in numerous instances to be a reliable procedure. In fact, it parallels the use
of an integral-grid in density functional theory. The grid for integral calculation in DFT is
also geometry dependent, and typically it will only be kept fixed for the last steps. These
procedures are normally concealed in a black box-like treatment, and the same can be
practiced in local methods. Nonetheless, several groups have focused on developing sim-
ilar local correlation methods, while avoiding the discontinuity problem. These are based
on advanced integral prescreening tools and/or different definitions for virtual and occupied
spaces. The former methods have been shown to reliably decrease the cost of correlated
calculations, but only for MP2 theory, and with small basis sets. The latter approaches
also appear problematic with larger basis sets, due to linear-dependencies in the basis, or
prescreening deficiencies. The local methods, as proposed by Saebø and Pulay, and further
developed by Werner and Schütz, notwithstanding the PES discontinuity problem, are the
most effective methods to date.
The question posed was how could one keep the efficiency of the method, while avoid-
ing the problem of a noncontinuous PES. A merging procedure was proposed, which works
by comparing domains of different structures, and building a domain definition that encom-
passes the changes in the sampled space. The method has been applied to several differ-
ent reaction types, delivering smooth potential surfaces. Moreover, it has been shown that
BSSE effects are reduced in local calculations, and this leads to better basis set convergence
when computing barrier heights or weak interactions. The effect of the domain approxima-
tion was broadly investigated. It has been shown to be similar at the LMP2 and LCCSD(T)
levels. It has hence been suggested to test the error of domain approximations by only
comparing MP2 and LMP2 results. This has been applied to the study of both QM/MM
systems featured in this work.
Since the computational cost of LMP2 and LCCSD(T) calculations increases, respec-
tively, with the third and fourth power of the average domain sizes, the procedure does
lead to an increment in the computational cost. However, for all systems under study, the
changes were found to be localized at the atoms where the bond breaking/formation is tak-
ing place and the linear scaling behavior for larger applications should be maintained. On
the other hand, the procedure is limited by the need to define a priori the sampling space to
be scanned. Notwithstanding, the procedure has been useful for the calculation of reaction
path potentials, and as a valid test for the domain approximation error.
The last subject in my work was the use of Natural Localized Orbitals as an occupied
space for local correlation methods. While computing the various activation and reaction
energies featured in this thesis, a recurrent problem was the basis set dependency of the do-
main definition. In investigating the effect of different basis sets, it was necessary to adjust
144 Chapter 7. Summary
several parameters for the choice of domains, in order to keep the definition constant. This
dependency is connected to both the localization procedure (Pipek-Mezey) and the domain
definition (Boughton-Pulay). The solution pursued was to use the NLMO orbitals, which
are known to be less basis set dependent, and have an associated charge population analy-
sis, more stable than Mulliken or Löwdin. The method is not new (the natural localization
algorithm has been proposed for over 20 years) nor is the idea of applying it to local corre-
lation algorithms, Flocke et. al.[29] introduced NLCCSD in 2004. My implementation has
shown promising results in combination with the Pulay local ansatz. By defining a unique
parameter for the domain criterion, based on orbital populations, it has been shown that
the NLMO orbitals give stable domains with respect to basis set changes. For a test set of
30 molecules not a single domain changed for all 6 basis sets used, even when including
diffuse functions. The domains obtained by the new criterion are physically meaningful,
with pi-orbitals easily identifiable by their delocalized orbital populations. The degree of
delocalization seems to correlate well with chemical intuition. By decreasing the basis
set dependency of the domain definition, a further step is taken in approaching the local
correlation methods of a model chemistry. Preliminary results reveal that the fraction of
correlation energy recovered is not much affected and similar as with the previous methods
used. Therefore, it can be expected that previous conclusions regarding the accuracy of
local correlation methods will not be much affected. Further systematic studies of reaction
energies are in progress, and the initial results support this assertion.
The greatest liability in the use of NLMOs is perhaps in their use for gradient calcula-
tions. For the computation of a local correlation method analytic gradient, a minimization
criterion is needed. This is available for Pipek-Mezey, but not for NLMOs. Our proposal
has been to use the Pipek-Mezey orbitals together with an NPA analysis domain criterion.
The greatest fault in the choice of these orbitals is the redundancy problem which occurs
when using diffuse or higher-angular momentum basis functions. However, has discussed
in Chapter 4, this can be overcome by eliminating some functions from the localization
criterion, without great loss to the stability of the method.
Overall, the research presented in these pages has given a strong contribution to a more
widespread and reliable use of local correlation methods. By the use of hybrid schemes,
the application of these methods has been drastically expanded. During my first PhD year,
the possibility (or even the interest) of using Coupled Cluster in the context of enzymatic
reactions was disputed in several occasions. Two years later, the first work was published,
rewarding the insight and dedication of many involved in the project. I believe that local
correlation methods will have a long lasting impact in the field, and am overjoyed to have
participated in its first steps.
Chapter 8
Zusammenfassung
145

147
Die Berechnung von Reaktionsbarrieren molekularer Systeme ist eine der Hauptheraus-
forderungen der Theoretischen Chemie. Reaktionsbarrieren sind von besonderer Relevanz
für das Verständnis und die Voraussage von Katalysephänomenen, ebenso wie für die Ra-
tionalisierung unserer Kenntnisse der individuellen chemischen Reaktivität im Rahmen der
"General Transition State Theory".
Die Hartree-Fock (HF) Näherung ist der Ausgangspunkt für die meisten quanten-
chemischen Methoden. Sie berücksichtigt die Coulomb Wechselwirkung der Elektronen
nur im Mittel, nicht jedoch die Elektronenkorrelation, d.h., die unmittelbare Coulomb
Wechselwirkung zwischen den einzelnen Elektronen. Dieser Effekt ist bei der Beschrei-
bung chemischer Reaktivität von großer Bedeutung, da es während eines Bindungsbruches
oder einer Bindungsbildung zu starken Änderungen bei der Wechselwirkung der Elektro-
nen kommt. Die HF Methode weist daher normalerweise Fehler im Bereich von 100-500%
für Reaktionsbarrieren auf. Die näherungsweise Behandlung der Elektronenkorrelation
durch Dichte-Funktional-Theorie (DFT) hingegen stellt eine kosten-günstige Art dar, um
diese Effekte zu berücksichtigen. Die Qualität der Ergbnisse hängt jedoch stark davon ab,
welche Parametrisierung bei den Funktionalen verwendet wurde. Kein Funktional liefert
bisher gleich gute Ergebnisse für alle chemischen Systeme. Konventionelle post-HF ab in-
itio Methoden hingegen bieten einen Weg die Ergebnisse systematisch zu verbessern. Der
steile Anstieg der Rechenkosten mit wachsender Molekülgröße erlaubt jedoch nur quanti-
tative Berechnungen für kleine Systeme (bis maximal 15 Atome).
Lokale Korrelationsmethoden vermeiden die steile Skalierung konventioneller, kano-
nischer Methoden durch Verwendung einer lokalen Basis zur Beschreibung des besetzten
und des virtuellen Raumes. Die Zahl der Anregungen vom besetzten in den virtuellen Raum
kann mittels Entfernungskriterien bezüglich der besetzten Orbitale begrenzt werden.[1]
Dies ermöglicht eine hierarchische Behandlung der Korrelation von Elektronen. Dabei
werden nahebeieinanderliegende Orbitale mit genauerenMethoden behandelt und entfernte
Orbitale vernachlässigt.[2] Lineares Skalierungsverhalten wurde bereits für lokale Møller-
Plesset Störungstheorie zweiter Ordnung (LMP2),[3] lokales Coupled-Cluster mit Ein- und
Zweifachanregungen (LCCSD) [4] und LCCSD(T0) mit störungstheoretischer Berücksich-
tigung der Dreifachanregungen[5, 6, 7] nachgewiesen.
In dieser Doktorarbeit sind entscheidende Fortschritte bezüglich der Anwendbarkeit
lokaler Korrelationsverfahren bei der akkuraten Berechnung von Reaktionspfaden und
-barrieren erzielt worden. In Kapitel 3 und 4 werden die Fortschritte bei der Wahl
der Methode für die Domänenberechnung und bei der Lokalisierung der besetzten Or-
bitale beschrieben. In Kapitel 5 ist eine neue QM/QM-Hybridmethode (Quantum Mecha-
nics/Quantum Mechanics) und ihre Anwendung auf biomolekulare Systeme beschrieben.
148 Chapter 8. Zusammenfassung
In Kapitel 6 werden die Berechnungen der Aktivierungsenergien zweier enzymatischer
Reaktionen vorgestellt. In diesen Rechnungen wurden lokale Methoden erstmalig auf
QM/MM (Quantum Mechanics/Molecular Mechanics) Systeme angewendet.
Kapitel 3
Ein neues Verfahren zur Eliminierung der Geometrieabhängigkeit von Anregungsdomä-
nen ("domains") wurde im Molpro Programmpaket [8] implementiert. Diese Abhängigkeit
kann zu Unstetigkeiten in der Potentialfläche führen und stellt einen Nachteil lokaler Me-
thoden bei der Berechnung von Reaktionspfaden dar.
Die Orbitaldomänen werden in der Regel durch ein von Boughton und Pulay
vorgeschlagene Verfahren bestimmt.[9] Dabei werden zunächst alle Atome gemäß ab-
nehmender Ladung (Mulliken oder Löwdin Gross Population) geordnet. Dann werden
Atom für Atom, der Reihe nach, die Basisfunktionen von Atomen der Orbitaldomäne
hinzugefügt, bis die Vollständigkeitsbedingung
1−
∫
|(φi− φˆi)2|dτ > TBP
erfüllt ist. Der Parameter TBP ist ein Vollständigkeitskriterium und nimmt in der Regel
Werte um 0.980 an. Die Versuchsfunktion φˆi ist eine Linearkombination von Atomorbitalen
(AOs), die sich in der Domäne befinden
|φˆi >= ∑
A∈[i]
∑
µ∈A
|χµ > Lˆµi.
In den lokalen Korrelationsverfahren bestimmen die Orbital- oder Paardomänen den
virtuellen Raum für jedes Orbital und Orbitalpaar. Änderungen in dieser Domänen-
definition als Funktion der Geometrie führen zu unstetigen Potentialflächen. Weil das
vorgegebene Kriterium geometrieabhängig ist, kann es durchaus passieren, dass Sprünge
im Energieprofil vorkommen. Dies ist vor allem bei der Dehnung oder dem Bruch von
Molekülbindungen der Fall. Auch die elektronische Struktur des Moleküls wirkt sich auf
die Domänen aus. Wenn der Übergangszustand stärker delokalisiert ist als der des Reak-
tanten, kann dies zur Unterschätzung der berechneten Aktivierungsenergien führen.
Hier soll eine einfache Prozedur vorgeschlagen werden, um diese Probleme zu be-
heben. Die Domänen zweier Geometrien werden verglichen und anhand dieses Vergleichs
neue Domänen definiert. Diese vereinigten Domänen können dann für mehrere Struk-
turen in einem Pfad verwendet werden. Es wird nachgewiesen, dass die Prozedur stetige
Potentialflächen und stabilere Ergebnisse bei der Berechnung von Aktivierungsenergien
149
delokalisierter Übergangszustände liefert.
Kapitel 4
Die Orbitaldomänen in lokalen Korrelationsmethoden werden in der Regel, wie oben
gezeigt, mit Hilfe des Boughton-Pulay Verfahrens bestimmt. Diese Methode zeigt jedoch
eine starke Abhängigkeit vom verwendeten Basissatz. Die Reihenfolge der Atome wird an-
hand der Löwdin Ladungen bestimmt. Diese konvergieren mit steigender Basissatzgröße
jedoch nicht zu einem Grenzwert. Für kleinere Ladungen erhält man in der Regel keine
zuverlässige Vorhersage. Die Vollständigkeitsbedingung selbst ist ebenfalls basissatzab-
hängig. Für größere Basen ist es einfacher das Kriterium zu erfüllen. Der Parameter TBP
muss folglich der Basis angepasst werden. TBP = 0.980 für eine double-zeta Basis und
TBP = 0.985 für triple-zeta sind häufig vorgeschlagene Werte.[10]
Auch das Lokalisierungverfahren ist in der Regel basissatzabhängig. Die Pipek-Mezey
Methode[11] hängt von der AO Überlappmatrix ab, die für größere Basen, insbesondere
mit diffusen Funktionen, lineare Abhängigkeiten aufweist. Im Benzolmolekül werden
die Domänen, die die delokalisierten pi-Orbitalen beschreiben sollen, oft zu groß, und
schließen auch die benachbarten Wasserstoffatome mit ein.
Als Alternative zur Pipek-Mezey/Boughton-Pulay (PM/BP) Kombination zur
Lokalisierung der besetzten Orbitale, bzw. zur Definition der Domänen, wurden die Natural
LocalizedMolecular Orbitals (NLMO)[12] und die Natural Population Analysis (NPA)[13]
vorgeschlagen. Die NPA liefert hierbei eine Ladungsverteilung besetzter Orbitale die als
ein Kriterium für die Domänenbildung verwendet wird. Dieses führt zu stabilen Domänen
für eine vielzahl verschiedener Bassissätze (von cc-pVTZ bis aug-cc-pVQZ). Hierbei wird
kein Überlappkriterium benötigt, was die Basissatzabhängigkeit des Verfahrens deutlich
verringert. Die Atome werden in der Domänenliste aufgenommen wenn ihre Ladungen
den Parameter TNPA überschreiten.
Zur Validierung der Methode wurde ein Satz aus 30 Molekülen gewählt. Dieser besteht
aus typischen, kleinen, organischen Molekülen, gesättigten und ungesättigten Kohlen-
wasserstoffen sowie aromatischen Systemen. Es wird gezeigt, dass das neue Kriterium
für die Domänenbestimmung sehr stabil in Bezug auf den Basissatz ist. Mit dem in dieser
Arbeit empfohlenen Auswahlkriterium TNPA = 0.05 wurde bei allen sechs untersuchten Ba-
sissätzen keine einzige Änderung der Domänenstruktur beobachtet. Außerdem sind die
Domänen physikalisch deutbar. Die pi-Orbitale sind in der NPA Ladungsverteilung deut-
lich erkennbar. Der Prozentsatz der kanonischen MP2-Korrelationenergie, der bei LMP2
bisher erhalten wurde, bleibt bei der vorgeschlagenen Methode ähnlich. Das NPA Domä-
nenkriterium kann auch bei PM-Orbitalen verwendet werden, so dass analytische Gradien-
150 Chapter 8. Zusammenfassung
ten weiterhin verfügbar sind.
Kapitel 5
Die Anwendung hochgenauer quantenchemischer Methoden ist oft aufgrund der System-
größe nicht möglich. Die meisten untersuchten Effekte sind jedoch auf einen verhältnis-
mäßig kleinen Teil des Systems beschränkt, nämlich das reaktive Zentrum. Diese kleine
Region kann jedoch meist mit genauen lokalen Methoden behandelt werden. Der Effekt
der Umgebung sollte aber nicht vernachlässigt werden, denn das reaktive Zentrum kann
durch Polarisations- und sterische Effekte beeinflusst werden. Um das reaktive Zentrum
und Umgebung beschreiben zu können, wurden in den letzten Jahre mehrere "gekoppelte"
(Hybrid) Methoden entwickelt, wobei unterschiedliche Regionen mit verschiedenen Me-
thoden behandelt werden können, je nach Anforderung an die Genauigkeit und je nach
Systemgröße.
Im Rahmen dieser Arbeit wird ein neues QM/QM Kopplungsverfahren vorgeschlagen.
Dieses stellt eine Erweiterung der lokalen Korrelationsmethoden dar. Da in den lokalen
Methoden sowohl der besetzte als auch der virtuelle Raum lokal sind, können die Orbitale
und die entsprechenden Domänen so gruppiert werden (Region), dass unterschiedliche
Korrelationsmethoden angewendet werden können, ohne dass es des Rückgriffs auf Mo-
dellsysteme für Molekülfragmente oder des Bruchs von kovalenter Bindungen bedarf. Die
Methode nutzt die Paarnäherung aus, wobei die Orbitalpaare als "strong" (starke), "weak"
(schwache) oder "very distant" (sehr entfernte) eingeordnet werden, je nach der Region in
der sich die Orbitale befinden. Die Orbitalpaare des reaktiven Zentrums werden als starke
Paare klassifiziert und auf möglichst hohem Niveau (z.B. LCCSD(T)) behandelt, während
die direkte Umgebung über die schwachen Paare auf MP2 Niveau beschrieben wird.
Da es in den lokalen QM Verfahren diese intrinsische Hierarchie gibt, können "gekop-
pelte" Ergebnisse verschiedener Regionen im Molekül aus einer einziger Rechnung ex-
trahiert werden. Polarisationseffekte sind mit inbegriffen, da das Korrelationsverfahren
die HF Orbitale für das gesamte System verwendet. Bei anderen Kopplungsmethoden ist
dies nicht möglich, weil die "high-level" Rechnungen an einem kleineren Modellsystem
durchgeführt werden müssen und dabei die Korrelationsbeiträge des Fragmentes von der
Umgebung unbeeinflusst bleiben.
Rechnungen wurden für mehrere Reaktionen durchgeführt. Mit Hilfe der hier
entwickelten LMOMO Methode könnte man in mehreren Fällen die "high-level" Ergeb-
nisse mit reduziertem Rechenaufwand reproduzieren. Die notwendige "high-level"-
Regionsgröße, um konvergierte Ergebnisse zu bekommen, ist aber systemabhängig, wenn
die direkte Umgebung unkorreliert bleibt. Erkennbare Verbesserungen werden erhalten,
151
wenn die dem reaktiven Zentrum unmittelbar benachbarte Region auf MP2-Niveau korre-
liert wird. Das erfordert nur geringem Mehraufwand. Die Skalierung der Methode (für den
Korrelationsanteil) kann theoretisch asymptotisch O(1) erreichen und wird für LMP2, die
Lösung der Coupled-Cluster Gleichungen und die "triples" Berechnung bei LCCSD(T0)
nachgewiesen.
Kapitel 6
Kombinierte QM/MM Methoden gehören heutzutage zum wichtigsten Handwerkszeug
der Theoretischen Biochemie. Diese verwenden eine quantenchemische Rechnung zur
Beschreibung des reaktiven Zentrums, während die Umgebung mittels einer kostengün-
stigen Kraftfeldmethode behandelt wird. Die QM-Region ist jedoch meistens zu groß,
um eine quantitative QM Methode zu verwenden, und die Diskussion muss oft auf allge-
meine Effekte und Tendenzen beschränkt bleiben. Die lokalen Korrelationsmethoden er-
möglichen es aufgrund ihres reduzierten Rechenaufwands erstmals quantitative QM/MM
Rechnungen durchzuführen. LCCSD(T) Rechnungen können heutzutage für mehr als 50
Atome mit einer triple-zeta oder sogar quadrupel-zeta Basis durchgeführt werden. Die vor-
liegende Doktorarbeit zeigt die erfolgreiche Anwendbarkeit lokaler Methoden auf Über-
gangszustände enzymatischer Systeme und beweist gleichzeitig die Gültigkeit der klassi-
schen Übergangszustandstheorie bei den untersuchten enzymatischen Systemen.
Die Hydroxylierungsschritte im Katalysezyklus der p-Hydroxybenzoat-Hydroxylase.
Die katalysierte Claisen-Umlagerung von Chorismat zu Prephenat.
152 Chapter 8. Zusammenfassung
Zwei Systeme wurden untersucht. Die p-Hydroxybenzoat-Hydroxylase (PHBH)
katalysiert die Hydroxylierung des Substrats p-Hydroxybenzoat (pOBH). Es spielt eine
entscheidende Rolle beim oxidativen Abbau aromatischer Stoffe in Bodenbakterien.
Das Chorismat-Mutase (CM) Enzym katalysiert die Claisen-Umlagerung des Choris-
mats in Prephenat, einem Schritt im Shikimisäureweg für die Produktion aromatischer
Aminosäuren. Für beide Systeme liegen experimentelle Werte für die Aktivierungsen-
thalpie und Gibbs-Energie vor. Durch QM/MMModellierung der Systeme wurden mehrere
Reaktionspfade auf DFT-Niveau optimiert und die Vorhersage der Reaktionsbarriere mit
Hilfe lokaler Korrelationsverfahren verbessert. Die Konvergenz mit steigender Basis-
satzgröße und die lokalen Näherungen wurden überprüft. Die berechneten Barrieren
auf das DF-LCCSD(T0)/[aug]-cc-pVTZ-Niveau befinden sich nach Mittelwertbildung in-
nerhalb der Fehlergrenze des Experimentes und der Rechnung (etwa 1,5 kcal mol−1
Genauigkeit). Die aus der quantenmechanischen Beschreibung herrührende Abweichung
vom CCSD(T)/CBS Limit wurde auf ≤ 1 kcal mol−1 abgeschätzt. Die mittlere qua-
dratische Abweichung der Aktivierungsenergien der Reaktionspfade liegt oberhalb dieser
Abschätzung, und damit ist die QM Methode nicht der entscheidende Faktor für die
Genauigkeit.
Appendix A
Natural Localized Molecular Orbitals
A.1 Notation
The following notation will be used throughout
χµ,ν ,... AO
χ˜r,s,t,... NAO
ϕr,s,t,... NHO
φr,s,t,... NBO
ψi, j,k,... occupied NLMO
A.2 General Structure
This Appendix complements the description made in Chapter 4, on the construction of
Natural Localized Molecular Orbitals. The NBO method performs the analysis of a
many-electron molecular wave function in terms of localized electron-pair "bonding" units.
This involves the construction of Natural Atomic Orbitals (NAOs), Natural Bond Orbitals
(NBOs) and Natural Localized Molecular Orbitals (NLMOs). These may be used to per-
form Natural Population Analysis (NPA) and other tasks pertaining to the locality of wave
function properties. Each natural localized set forms a complete orthonormal set of one-
electron functions for expanding the delocalized canonical orbitals.
To obtain the final NLMO set from the AOs, a series of stepwise transformations are
required:
AO T
NAO−→ NAO TNBO−→ NBO TNLMO−→ NLMO.
153
154 Appendix A. Natural Localized Molecular Orbitals
The next sections detail how each transformation is obtained, starting from the SCF molec-
ular orbitals coefficients and the nonorthogonal AO basis.
A.3 NAO Transformation
source files : nbo.f
nao.f
The aim of this procedure is to find the transformation TNAO from the nonorthogonal
AO basis set {χµ} to the orthonormal NAO basis set {χ˜r}
|χ˜r >=∑
µ
TNAOrµ |χµ > . (A.1)
One starts by computing the first-order reduced density matrix D, obtained from the SCF
coefficients as
Dµν = 2
occ
∑
i
C∗µiCν i (A.2)
and builds the matrix representation for the density
Γ = SDS, (A.3)
where S is the AO overlap matrix. Considering the matrix in block form (each block per-
taining to an atom) 
Γ(AA) Γ(AB) Γ(AC) . . .
Γ(BA) Γ(BB) Γ(BC) . . .
Γ(CA) Γ(CB) Γ(CC) . . .
...
...
... . . .

,
the NAO orbitals will be the eigenvectors of the diagonal density blocks
Γ(AA)|χ˜r >= γr|χ˜r >, (A.4)
where the index A stands for an atom label, r the NAO label, and γr is the occupation of the
rth NAO belonging to center A.
The steps to obtain the transformation matrix TNAO are as follows:
A.3. NAO Transformation 155
(1) Partitioning and symmetry averaging of Γ and S.
Each of these matrices is partitioned into (Al) blocks, l being the angular momentum
of the basis function (s, p,d,... ). Each of the blocks are then averaged over m (the
magnetic quantum number)
Γ(Al)µν =
1
2l+1
2l+1
∑
m=1
P(Alm)µν , (A.5)
S(Al)µν =
1
2l+1
2l+1
∑
m=1
S(Alm)µν . (A.6)
(2) Formation of pre-NAOs
For each (Al) block, one solves the generalized eigenvalue problem
Γ(Al)T˜(Al) = S(Al)T˜(Al)W˜(Al), (A.7)
where T˜(Al) is the pre-NAO transformation matrix for the (Al) block, and W˜(Al) a
diagonal matrix with the symmetry-averaged pre-NAOs occupancies.
(3) Orthogonalization of the high- and low-occupancy NAO spaces
(a) Selection of NMB orbitals
The orthogonalization of the pre-NAOs is done taking into account the occu-
pancy of each orbital. Orbitals with higher occupation should be less distorted
in the process, in order to preserve maximum locality of the electron density.
With this in mind, a group of orbitals is taken as the Natural Minimal Basis
(NMB) set. This selection is made according to the ground state configura-
tion of each atom. For each hydrogen one s-type pre-NAO should be taken,
for carbon two s-type functions and three p-type, and so on. The remaining
orbitals are tagged to belong to a Natural Rydberg Basis (NRB) set. These are
the pre-NAOs of lower occupancy.
(b) Weighted interatomic orthogonalization of the NMB space.
The NMB orbitals are orthogonalized among themselves. The first step is to
find the largest occupation number in the set, and then to divide all values by
this reference to obtain the weighting numbers γ(i). The transformation vector
for each NMB orbital is weighted
TˆNMBµr = γ(r).T˜
NMB
µr . (A.8)
156 Appendix A. Natural Localized Molecular Orbitals
The weighted NMB overlap matrix is built
Sˆ= (TˆNMB)†STˆNMB (A.9)
and the new NMB vectors are obtained as
TNMB = TˆNMBSˆ−
1
2 . (A.10)
For orbitals with the same occupancy this procedure would reduce to a simple
Löwdin orthogonalization.
(c) Schmidt interatomic orthogonalization of NRB to NMB orbitals.
Each NRB orbital is Schmidt orthogonalized to each NMB set orbital:
TNRBµr = Tˆ
NRB
µr −∑
s
TNMBµs
[
(TNMB)†STˆNRB
]
sr
(A.11)
(d) Restoring the natural character of the NRB space.
Due to the Schmidt orthogonalization, one needs to repeat steps (1) and (2),
but only for the NRB space. This is done by transforming both density and
overlap matrices to the NRB basis and rediagonalizing. This is again done in a
symmetry averaged way.
(e) Weighted interatomic orthogonalization of the NRB space.
The NRB space is divided into two sets. One of low occupation (with weights
below 10−4) and a high occupation set. The latter set is orthogonalized with
occupancy weighting. Then, the two sets are Schmidt orthogonalized relative
to each other and, at last, the low occupancy orbitals are orthogonalized without
occupancy weighting (in order to avoid numerical instability).
(f) Final diagonalization
The density and overlap matrices are transformed into the NAO basis, the den-
sity blocks diagonalized (as in steps (1) and (2)), giving the final NMB and
NRB orbitals and occupancies.
A.4. NBO Transformation 157
A.4 NBO Transformation
source files : nbo.f
nho.f
The final NAO density matrix, in the case of a bonded species, will have significantly
large off-diagonal elements. One can therefore expand the search to 2-atom blocks to
include a bond description into the set. The highest occupation orbitals are referred to as
Lewis-type NBOs. These will be the orbitals with occupation exceeding the value of a
parameter thrnbo. This parameter is initialized at 1.90. If the number of orbitals found
is below half the number of electrons in the system, the search is repeated with thrnbo
decremented by 0.10. This procedure is repeated until enough NBOs are found or the
parameter is below 1.50. In the latter case, the search could be expanded to 3-center bonds,
but this option has not yet been implemented.
The orbitals are divided into the following types: core, valence lone pairs and two-
centers bonds (all three classes are Lewis-type), Rydberg and two-center antibonding. For
searching two-center NBOs, it is useful to build a list of probable pairs. The Wiberg Bond
Indices are used. These indices are calculated by summing the square of the off-diagonal
elements of the NAO density matrix
WAB = ∑
r∈A,s∈B
D˜2rs (A.12)
where A and B are atom labels, r and s NAO labels, belonging to the respective centers.
The centers with the largest index will be the first to be considered in the 2-center NBO
bond search. If the first cycle fails to find an appropriate Lewis structure, the ordering is
then defined according to the z-matrix row numbers.
A.4.1 Core and Valence lone pair NBOs
The first NBOs in the search are the core and lone pair orbitals. They are both built in the
same manner, as eigenvectors of the one-center blocks of the NAO density matrix. The
core orbitals are only a convention. The first few eigenvectors for each atom will be taken
to be core, the number depending on the atom involved. The number of core orbitals will
be 1 for the second row elements, 5 for the third row and so on. Both sets of orbitals are
obtained by checking the occupancy of the NAOs. If they are above thrnbo, new NHO
and NBO orbitals are built out of the respective NAO.
158 Appendix A. Natural Localized Molecular Orbitals
After each core/lone pair orbital φr is found, the density matrix block is depleted from
their contribution:
Γ(A) = Γ(A)− γAr|φr >< φr| (A.13)
where γAr is the occupation of the rth NBO at center A.
A.4.2 Two-center Bond NBOs
If the number of core and lone pair NBOs found is below the number of electron pairs, a
search is started for two-center bonds. One takes the depleted density matrix, and forms
subblocks Γ(AB) of the centers with largest Wiberg Bond indices. The block is then diago-
nalized
U†Γ(AB)U=W(AB) (A.14)
where W(AB) is a diagonal matrix with the occupations for each eigenvector. For each
eigenvalue above thrnbo a new NBO of the form
φr = αAr|ϕAr >+αBr|ϕBr > (A.15)
is built, where α is a polarization coefficient and the {ϕAr} functions directed NHOs. The
NHOs are also kept in a NHO transformation matrix, after normalization.
A.4.3 Rydberg NBOs
A projection matrix is built for each NHO
RAr = 1−|ϕAr >< ϕAr|, (A.16)
and a full projection operator for atom A is assembled by multiplying together the projectors
RA =∏
r
RAr. (A.17)
The significant elements of RA are taken and normalized. The resulting matrix R¯A is then
used to transform the density subblock for center A
Γ¯A = R¯†AΓ
(A)R¯A. (A.18)
The density subblock is then diagonalized, its eigenvectors are the Rydberg NHOs/NBOs.
A.5. NLMO Transformation 159
A.4.4 Orthogonalization of the NHOs
At this point, the number of NHOs should be equal to the total number of basis functions.
Although normalized, they are not orthogonal, and therefore they are at this stage symmet-
rically orthogonalized.
A.4.5 Antibonding NBOs
The remaining low-occupation NBOs are antibonding orbitals, which may be found by
calculating the density in the basis of the constituent NHOs of each 2-center NBO and di-
agonalizing. In this way one obtains the occupations as the eigenvalues and the polarization
coefficients as the eigenvectors for both bonding and anti-bonding NBOs.
A.5 NLMO Transformation
source files : nbo.f
nlmo.f
This procedure consists in transforming the Lewis-type orbitals so that they span the
occupied space. Two sets of orbitals are considered
• high-occupancy orbitals (NBO set A) - which consist of core, lone pair and bond
orbitals,
• low-occupancy orbitals (NBO set B) - the remaining Rydberg and anti-bonding or-
bitals.
In order to separate occupied from virtual space, the density matrix should be diagonal
in the D(AA) block (with Dii = 2) and 0 elsewhere. There are various ways to diagonalize
the NBO density matrix, but to maintain the symmetry properties, the procedure suggested
by Reed et. al.[12] is followed:
(1) Find the element Di j in D(AB) of largest magnitude. If |Di j| < e1, the threshold for
zeroing the elements of D(AB), go to step (6) (e1 = 5.10−9)
(2) Find all elements Dkl in D(AB) which are (1− e2)|Di j| or greater in magnitude, and
for which the conditions (1− e3)Dii < Dkk < (1+ e3)Dii and (1− e3)D j j < Dll <
(1+ e3)D j j are true for k and l. Here, e2 is the criterion for degeneracy or near
160 Appendix A. Natural Localized Molecular Orbitals
degeneracy, and we have set it to 1.10−3. The additional criterion e3 should be set
to a value safely larger than e2 (to ensure that elements of D(AB) that are symmetry
equivalent by the e2 criterion are not rejected), and we have set it to 5.10−3. The
number of near-degenerate off-diagonal elements so found is denoted as noff.
(3) Diagonalize the noff matrices (2x2 Jacobi rotation) Dii |Di j|
|Di j| D j j

(4) Find the symmetrized transformation Tsym that reduces the magnitudes of the noff
off-diagonal elements of D in an optimal, yet symmetric manner
(a) Multiply together the noff rotations to yield Tjac
(b) Average the elements in Tjac which are equal in magnitude in D to within a
multiplicative factor of (1± e3), to give the initial Tsym. (All elements in the
AB block of Tsym are made negative, and all elements in the BA block are made
positive.)
(c) Multiply the elements in the AB and BA blocks of Tsym by the sign of the cor-
responding element in D so that all rotation directions are correct.
(d) Normalize the columns of Tsym
(e) Perform a Löwdin symmetric orthogonalization of the column vectors of Tsym
to ensure that Tsym is unitary
(5) Transform D by Tsym and return to step (1) for the next rotation.
(6) The NLMO procedure is finished. TNLMO is just the product of all the Tsym transfor-
mations and thus will retain the symmetry present in D in the NBO basis by virtue of
step (4b).
A.5.1 Exclusion of core orbitals
To exclude the core orbitals from the localization, the same procedure is followed, but with
the valence density as input. This means that the running index in Eq. (A.2) only goes
through the occupied valence orbitals. Also
• the number of NMB orbitals per atom should be accordingly reduced.
A.5. NLMO Transformation 161
• the search for core NBOs is turned off.
• the NLMO procedure is done as detailed before. To build the full orbital set (includ-
ing the canonical core orbitals)
– the nvirt virtual NLMO orbitals are Schmidt orthogonalized relative to the ncore
core orbitals
– the new virtual set should be linearly dependent, with ncore redundant func-
tions. These are removed by building the overlap matrix and diagonalizing.
The eigenfunctions corresponding to the lower ncore eigenvalues are removed.
– the core orbital vectors are then added to the coefficient matrix.
Appendix B
Domain Merging - Quick Guide
B.1 General Procedure
In order to create a merged domain, as a junction of domain lists from two (or more)
different geometries, the following steps should be taken:
(1) Perform an Hartree-Fock energy run for the first structure, followed by orbital local-
ization and domain list build. The domain list should be saved. It is, however, not
necessary to run the full local calculation, only the domains are needed.
Example: hf
{lmp2,domonly=1,save=5400.2}
(2) Perform the same calculation on the second structure, but this time the previously
saved domain list is read, and the merging procedure is applied. The merged domains
can then be saved.
Example: hf
{lmp2,domonly=1,save=5500.2
mergedom,start=5400.2}
(3) Any number of local calculations can be run by reading the merged domain list
Example: hf
{lmp2,start=5500.2}
or step (2) can be repeated in order to include more points.
Some examples on the use of this procedure can be found in the Molpro2006.1
testjobs: hf_loc_merg.test and loc_eom.test.
162
B.2. A step-by-step example: ketene 163
B.2 A step-by-step example: ketene
In this Section, a worked example for domain merging is discussed - the ketene dissociation
from Chapter 3. Geometries were optimized for fixed C-C distances, ranging from 1.2 Å
to 2.5 Å (as depicted in Fig. B.1). The objective of this procedure is to read the domain list
at r(C-C)=1.2 Å (the first point in the path) and merge it with the domain list at r(C-C)=2.5
Å. The resulting domains can then be used for the other structures along the path.
Figure B.1: Diagrammatic representation of the ketene dissociation path.
(1) The following input will perform an SCF calculation for the structure with
r(C-C)=1.2 Å, build the LMOs and PAOs, and save this information on record 5400.2:
geometry
nosym
5
C1 -1.22038 0.00000 0.00000
C2 -0.02038 0.00000 0.00000
O3 1.15610 0.00000 0.00000
H4 -1.78291 0.93638 0.00000
H5 -1.78291 -0.93638 0.00000
endg
hf
{lmp2,domonly=1,save=5400.2}
164 Appendix B. Domain Merging - Quick Guide
The domain information as well as a message confirming the save can be found in
the output:
(...)
Orbital domains
Orb. Atom Charge Crit.
4.1 1 C1 1.11 0.000
5 H5 0.75 0.992
5.1 1 C1 1.11 0.000
4 H4 0.75 0.992
6.1 1 C1 1.17 0.000
2 C2 0.78 0.992
7.1 1 C1 0.98 0.000
2 C2 0.98 0.997
8.1 3 O3 1.28 0.000
2 C2 0.69 0.999
9.1 3 O3 1.39 0.000
2 C2 0.57 0.991
10.1 3 O3 1.68 0.952
2 C2 0.28 0.982
11.1 3 O3 1.82 0.997
(...)
Domain information saved on record 5400.2
(2) A calculation for the second structure (at a 2.5 Å distance) is performed, restoring
the saved domains and merging both sets
geometry
nosym
5
C1 -2.07692 0.00000 0.14319
C2 0.40614 0.00000 -0.14734
O3 1.51994 0.00000 0.07573
H4 -2.10841 0.86712 -0.57628
H5 -2.10841 -0.86712 -0.57628
endg
hf
{lmp2,domonly=1,save=5500.2;mergedom,start=5400.2}
B.2. A step-by-step example: ketene 165
The output should give the following information:
(...)
Orbital domains
Orb. Atom Charge Crit.
4.1 1 C1 1.17 0.000
5 H5 0.81 0.998
5.1 1 C1 1.17 0.000
4 H4 0.81 0.998
6.1 1 C1 1.89 0.995
7.1 2 C2 1.84 0.988
8.1 3 O3 1.29 0.000
2 C2 0.70 0.999
9.1 3 O3 1.44 0.000
2 C2 0.56 1.000
10.1 3 O3 1.43 0.000
2 C2 0.57 1.000
11.1 3 O3 1.84 0.998
Domain list read from record 5400.2
Augmented orbital domains
Orb. Atoms
6.1 1 C1 2 C2
7.1 1 C1 2 C2
(...)
Domain information saved on record 5500.2
In this case, only orbitals 6.1 and 7.1 are changed. The merged domains are coinci-
dent with the orbital domains for the structure r(C-C)=1.2 Å.
166 Appendix B. Domain Merging - Quick Guide
(3) If the same procedure would be repeated, this time starting with the structure at r(C-
C)=1.33 Å, which bears the domain list:
Orbital domains
Orb. Atom Charge Crit.
4.1 1 C1 1.13 0.000
5 H5 0.77 0.994
5.1 1 C1 1.13 0.000
4 H4 0.77 0.994
6.1 1 C1 1.22 0.000
2 C2 0.72 0.990
7.1 2 C2 1.01 0.000
1 C1 0.96 0.997
8.1 3 O3 1.29 0.000
2 C2 0.70 0.998
9.1 3 O3 1.39 0.000
2 C2 0.59 0.994
10.1 3 O3 1.65 0.945
2 C2 0.31 0.980
1 C1 0.04 1.000
11.1 3 O3 1.82 0.997
the domain merge output (by 2.5 Å) would be as following:
Augmented orbital domains
Orb. Atoms
6.1 1 C1 2 C2
7.1 1 C1 2 C2
8.1 1 C1 2 C2 3 O3
9.1 1 C1 2 C2 3 O3
10.1 1 C1 2 C2 3 O3
As before, orbitals 6.1 and 7.1 are slightly augmented, becoming bond orbitals. Or-
bital 10.1, however, includes C1 in its domain. Comparing both domain sets, the
program recognizes that an orbital with domain {O3, C2} should be augmented, but
does not distiguish between orbitals 8.1, 9.1 and 10.1, merging the whole set. This
procedure might seem somewhat wasteful in this case, but it protects the algorithm
from orbital transformations. This has been discussed in further detail in Chapter 3.
Appendix C
LMOMO - Quick Guide
C.1 General Procedure
The use of LMOMO is controlled by the REGION directive
REGION,METHOD=method,[DEFAULT=default_method],
[TYPE=INCLUSIVE|EXCLUSIVE], atom1, atom2 ...
The list of atoms defines the orbitals which will be treated at the level defined by
method. If TYPE=INCLUSIVE, any orbital containing one of the atoms in its domain cen-
tre list will be included. This is the default and has been used throughout this work. For
TYPE=EXCLUSIVE, only orbitals whose domains are exclusively covered by the given atoms
will be added. This is in general not advised. With the use of this option the most delo-
calized orbitals (usually the pi-orbitals) will be in general excluded, even when the major
charge centers are inside the region. The potential energy surface between two atoms is also
affected in an unpredictable way. The use of the EXCLUSIVE option should be restricted to
cases where specific orbitals need to be added, and the INCLUSIVE option fails to give the
desired selection. The remaining atoms, if no further region is assigned, will be treated at
the level given by default_method.
Any local correlation treatment can be given as method, with the restriction that only
MP2 and HF can be used as default_method. Up to two REGION directives may be included
in a single calculation, ordered according to the correlation level (method) specified for the
region. The highest level region should be given last.
A simple LMP2:HF LMOMO calculation can be invoked with the use of
lmp2
region,method=mp2,default=hf,type=inclusive,...
167
168 Appendix C. LMOMO - Quick Guide
A three region LCCSD(T0):LMP2:HF calculation can be called as
lccsd(t)
region,method=mp2,default=hf,type=inclusive,...
region,method=ccsd(t),default=hf,type=inclusive,...
Some examples on the use of this procedure can be found in the Molpro2006.3 [8]
testjobs: lmp2_regions.test, h2odim_regions.test and form_atoml.test.
C.2. A step-by-step example: SN2 reaction 169
C.2 A step-by-step example: SN2 reaction
As an example for the use of the LMOMO approach (and for clarity) a small molecular
system was chosen, the SN2 reaction of ethylchloride with Cl−, a case already discussed
in Chapter 3 in the context of domain merging. Two structures will be used in this Section,
the van der Waals complex and the transition state (both depicted in Fig. C.1).
Figure C.1: Depiction of the van der Waals complex (left side) and transition state (right side) structures for
the SN2 reaction of ethylchloride with Cl−.
Below is the input for an LMP2:HF calculation on the complex, including only the
chlorines in the LMP2 region
set,charge=-1
geomtyp=xyz
geometry
9
C1 -0.017510 0.060386 0.000000
C2 1.489731 -0.001094 0.000000
CL1 -0.596632 1.800801 0.000000
CL2 0.180908 -3.259591 0.000000
H1 -0.445001 -0.403897 -0.876785
H2 -0.445001 -0.403897 0.876785
H3 1.756600 -1.056243 0.000000
H4 1.898238 0.484172 -0.883681
H5 1.898238 0.484172 0.883681
endg
hf
lmp2
region,mp2,type=inclusive,Cl1,Cl2
170 Appendix C. LMOMO - Quick Guide
Notice that the atoms must be numbered, unless there is only one atom of the given
type. The input, as can be seen in this example, can be quite compact. There is no need to
use the METHOD keyword as long as the order is respected. Also, if the low level region is
HF, the DEFAULT keyword may be skipped.
In the output, after the orbital domains information, the LMOMO data is displayed:
==========================
Local Regions
==========================
Region= 1 Method=MP2 Type=1 Class= 2 Atoms:CL1 CL2
Region= 2 Method=HF Type=1 Class= 1 All remaining atoms
Ordering localized MOs according to center regions
Orbital domains and regions
Orb. Atom Region
13.1 4 CL2 MP2
14.1 4 CL2 MP2
15.1 4 CL2 MP2
16.1 4 CL2 MP2
17.1 1 C1 MP2
3 CL1
18.1 3 CL1 MP2
19.1 3 CL1 MP2
20.1 3 CL1 MP2
21.1 2 C2 HF
7 H3
22.1 1 C1 HF
5 H1
23.1 1 C1 HF
6 H2
24.1 1 C1 HF
2 C2
25.1 2 C2 HF
8 H4
26.1 2 C2 HF
9 H5
Region= 1 Method=MP2 Type=1 Class= 2 Orbitals 13.1 14.1 15.1 16.1 17.1
18.1 19.1 20.1
Region= 2 Method=HF Type=1 Class= 1 All remaining orbitals
C.2. A step-by-step example: SN2 reaction 171
The orbitals might have to be reordered, so that the higher level ones are numbered
first. This has to do with the local correlation program internal structure. The orbitals to be
treated at the MP2 level are as expected the chlorine lone pairs and any C-Cl bonds which
may be present (since this is the reactant complex, there is only one bond, connecting the
carbon to the leaving chlorine).
Using the same input, but this time for the transition state geometry, the following
output should be obtained
==========================
Local Regions
==========================
Region= 1 Method=MP2 Type=1 Class= 2 Atoms:CL1 CL2
Region= 2 Method=HF Type=1 Class= 1 All remaining atoms
Ordering localized MOs according to center regions
Orbital domains and regions
Orb. Atom Region
13.1 4 CL2 MP2
14.1 4 CL2 MP2
15.1 4 CL2 MP2
16.1 1 C1 MP2
4 CL2
17.1 1 C1 MP2
3 CL1
18.1 3 CL1 MP2
19.1 3 CL1 MP2
20.1 3 CL1 MP2
21.1 2 C2 HF
7 H3
22.1 1 C1 HF
5 H1
23.1 1 C1 HF
6 H2
24.1 1 C1 HF
2 C2
25.1 2 C2 HF
9 H5
26.1 2 C2 HF
8 H4
172 Appendix C. LMOMO - Quick Guide
Region= 1 Method=MP2 Type=1 Class= 2 Orbitals 13.1 14.1 15.1 16.1 17.1
18.1 19.1 20.1
Region= 2 Method=HF Type=1 Class= 1 All remaining orbitals
In calculating relative energies, the number of orbitals in each region should be con-
sistent. One of the previous lone pairs is now a C-Cl bond, which is also included in the
LMP2 region.
In a three region LCCSD:LMP2:HF calculation, with both chlorines in the LCCSD
region and the carbon in the LMP2 (leaving only the three methyl C-H bonds uncorrelated),
the input should be given as:
(...)
hf
lccsd
region,mp2,type=inclusive,C1,Cl1,Cl2
region,ccsd,type=inclusive,Cl1,Cl2
Notice that the first region (LMP2) contains the second. This is not necessary, but
advisable, in order to avoid that orbitals connecting both sets of atoms are left out. The
local correlation program to be invoked was also changed. It should always be the highest
correlation level.
The output will show the three regions, with the orbitals reordered in a consistent way
relative to the levels of theory. The transition state output would be similar, and will be
skipped. The complex structure output should be as follows:
C.2. A step-by-step example: SN2 reaction 173
==========================
Local Regions
==========================
Region= 1 Method=MP2 Type=1 Class= 2 Atoms:C1
Region= 2 Method=CCSD Type=1 Class= 4 Atoms:CL1 CL2
Region= 3 Method=HF Type=1 Class= 1 All remaining atoms
Ordering localized MOs according to center regions
Orbital domains and regions
Orb. Atom Region
13.1 4 CL2 CCSD
14.1 4 CL2 CCSD
15.1 4 CL2 CCSD
16.1 4 CL2 CCSD
17.1 1 C1 CCSD
3 CL1
18.1 3 CL1 CCSD
19.1 3 CL1 CCSD
20.1 3 CL1 CCSD
21.1 1 C1 MP2
5 H1
22.1 1 C1 MP2
6 H2
23.1 1 C1 MP2
2 C2
24.1 2 C2 HF
7 H3
25.1 2 C2 HF
8 H4
26.1 2 C2 HF
9 H5
Region= 1 Method=MP2 Type=1 Class= 2 Orbitals 21.1 22.1 23.1
Region= 2 Method=CCSD Type=1 Class= 4 Orbitals 13.1 14.1 15.1 16.1 17.1
18.1 19.1 20.1
Region= 3 Method=HF Type=1 Class= 1 All remaining orbitals
Appendix D
Electrostatic embedding - the polarized
QM Hamiltonian
The use of electrostatic embedding in a QM/MM calculation involves minor changes to the
QM program. The total Hamiltonian has already been presented in Section 2.3. The QM
Hamiltonian will include the MM atoms in the form of point charges, so that the extra terms
to be computed will only involve one-electron integrals. In the HF case, a polarization term
is included in the Fock operator
< µ|hpol|ν >=−∑
x
qx
∫ χ∗µ(ri)χν(ri)
rix
dri, (D.1)
where qx is the point charge, and rix the distance between electron i and the point charge x.
In the DFT case, the operator is added to the potential ν(r) in Eq. (2.56). In post-HF
calculations, no further changes are needed since the point-charge effect is induced through
a one-electron operator1. The nuclei interactions are trivial to include, a classical Coulomb
energy term between pairs of point charges is needed.
The extra terms to be added to the gradient are also relatively straightforward, and are
given below. The first set of equations give the terms necessary for the gradient relative to
the movement of a point charge y in the direction λ y. The second set refers to the movement
of a nucleus m in the λm direction.
1The polarization will however have an effect on the correlation energy estimate since the reference func-
tion is changed.
174
175
MMmovement
• point charge-nuclei interaction
∂
∂λ y
(
∑
m
∑
x
Zmqx
|rmx|
)
=∑
m
−Zmqy
r3my
(λ y−λm) (D.2)
• point charge-electron interaction
∂
∂λ y
(
∑
i
∑
x
−2< i| qx|r1x| |i>
)
=∑
µν
−qyDµν < µ| ∂∂λ y
1
|r1y| |ν > (D.3)
QMmovement
• point charge-nuclei interaction
∂
∂λm
(
∑
n
∑
x
Znqx
|rmx|
)
=∑
x
−Zmqx
r3mx
(λ x−λm) (D.4)
• point charge-electron interaction
∂
∂λm
(
∑
i
∑
x
−2< i| qx|r1x| |i>
)
=∑
µν
∑
x
−qxDµν ( < ∂∂λmµ|
1
|r1x| |ν >+
< µ| 1|r1x| |
∂
∂λm
ν >
)
(D.5)
Appendix E
Optimized stationary points structures
In this Appendix, the structures for the reactions discussed in Chapter 3 are provided. All
stationary points were optimized at the MP2 level of theory, with the cc-pVTZ basis set for
C and H atoms, and aug-cc-pVTZ for the halogens (F and Cl). The basis was truncated,
removing d functions for H and f functions for all other atoms, and will be refered to as
[aug]-cc-pVTZ(d/p).
E.1 SN2 Reactions
The following structures have been used in Section 3.3.2. The starting structures for the
ethylchloride and propylchloride reactions were based on the geometries given in Ref. [70].
ethylchloride
8
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -538.684753678278
C 0.0000491284 -0.0281725670 0.0000000000
C 1.5133326805 0.0056340682 0.0000000000
CL -0.6820565678 1.6397425625 0.0000000000
H -0.3919835253 -0.5193607804 -0.8823819824
H -0.3919835253 -0.5193607804 0.8823819824
H 1.9005397634 -1.0117021252 0.0000000000
H 1.8858362939 0.5190142077 -0.8811946822
H 1.8858362939 0.5190142077 0.8811946822
176
E.1. SN2 Reactions 177
ethylchloride + Cl− vdWC
9
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -998.454049715442
C -0.0175103714 0.0603860676 0.0000000000
C 1.4897305750 -0.0010944576 0.0000000000
CL -0.5966320198 1.8008005483 0.0000000000
CL 0.1809079349 -3.2595906730 0.0000000000
H -0.4450013087 -0.4038966857 -0.8767845791
H -0.4450013087 -0.4038966857 0.8767845791
H 1.7566002000 -1.0562428347 0.0000000000
H 1.8982384203 0.4841717569 -0.8836807219
H 1.8982384203 0.4841717569 0.8836807219
ethylchloride + Cl− TS
9
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -998.423735700402
C 0.0893485352 -0.1932118058 0.0057894157
C 1.5814783688 0.0070626598 -0.0156402249
CL -0.4216609371 2.1052282761 -0.0474182077
CL -0.0042796436 -2.5201671935 0.0790955645
H -0.4662212878 -0.2650957137 -0.9035760411
H -0.4501853038 -0.2130344848 0.9269505376
H 2.0856719055 -0.9528148166 -0.0324667740
H 1.8604883980 0.5816952596 -0.8928959171
H 1.8895841831 0.5645728239 0.8631194862
propylchloride
12
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -577.89491330
C 0.0000000000 0.0000000000 0.0000000000
C 0.0000000624 -0.0000001170 1.5162567214
C 1.4008302125 0.0000001225 2.1195011244
CL 0.7081221925 -1.5208117776 -0.6669099573
H -1.0043124474 0.0638302770 -0.4018910141
H 0.5985937500 0.8135720688 -0.3968188980
H -0.5390379766 0.8944022377 1.8331771643
H -0.5668032213 -0.8587898726 1.8715903014
H 1.3542487590 0.0422231295 3.2044425429
H 1.9436674789 -0.8974559929 1.8370533557
H 1.9682241024 0.8620285201 1.7727769374
178 Appendix E. Optimized stationary points structures
propylchloride + Cl− vdWC
12
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -1037.66265603
C 0.0000000000 0.0000000000 0.0000000000
C 0.0000000000 0.0000000000 1.5114596785
C 1.4084641222 -0.0000001231 2.0950450613
CL 0.6097966673 -1.5929584568 -0.6710061629
CL 0.2490790090 3.3144472800 0.5492214989
H -0.9962715091 0.1164498208 -0.4079608971
H 0.6398742353 0.7820538958 -0.3888053057
H -0.4957622007 0.9273630017 1.7956943671
H -0.5804929134 -0.8473090282 1.8800303572
H 1.3759068871 0.0121340360 3.1831720329
H 1.9678927783 -0.8797163406 1.7804518974
H 1.9268347699 0.8964758697 1.7611463879
propylchloride + Cl− TS
12
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -1037.63374396
C1 0.0000000000 0.0000000000 0.0000000000
C2 0.0000000000 0.0000000000 1.5064064098
C3 1.4192325487 -0.0000001241 2.0608496716
CL1 0.0549055380 -2.3222706517 -0.2941727203
CL2 0.0879100877 2.3221864185 -0.2902258106
H1 -0.9207877163 0.0070733658 -0.5451807825
H2 0.9138502473 -0.0060923696 -0.5538840186
H3 -0.5372256635 0.8813345344 1.8451491264
H4 -0.5373399246 -0.8811820150 1.8451907852
H5 1.4170810028 -0.0014231805 3.1500005476
H6 1.9488514668 -0.8859701922 1.7165555767
H7 1.9477997256 0.8872098880 1.7184256460
E.1. SN2 Reactions 179
butylchloride
14
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -617.10200758
C 0.0000000000 0.0000000000 0.0000000000
C 0.0000000559 -0.0000001202 1.5158116543
C 1.3940291449 0.0000001219 2.1343585968
C 1.3392543805 0.0436024024 3.6594206895
CL 0.6899485897 -1.5294819789 -0.6666230012
H -1.0035365571 0.0748891822 -0.4021697762
H 0.6073576460 0.8065702406 -0.3977134665
H -0.5369486485 0.8959010217 1.8371999470
H -0.5670927045 -0.8590815060 1.8745173703
H 1.9315911091 -0.8884746593 1.8086078717
H 1.9493207110 0.8612927435 1.7595217749
H 2.3367364716 0.0441231661 4.0916867761
H 0.8229643015 0.9375045390 4.0052341616
H 0.8058426583 -0.8212480411 4.0496180485
butylchloride + Cl− vdWC
15
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -1076.87090560
C 0.0000000000 0.0000000000 0.0000000000
C 0.0000000000 0.0000000000 1.5102996325
C 1.3993774885 0.0000000000 2.1126834621
C 1.3520694936 0.1690244225 3.6292142616
CL 0.6259591969 -1.5827916892 -0.6746671929
CL 0.5168054290 3.2941666423 0.7798249609
H -0.9972164751 0.1061576928 -0.4096078020
H 0.6356838704 0.7890547645 -0.3825330823
H -0.4887433721 0.9326254731 1.7946175203
H -0.5838312039 -0.8435322269 1.8866176692
H 1.9182017316 -0.9234098419 1.8522705114
H 1.9420707639 0.8367533400 1.6749623563
H 2.3502516815 0.1583711492 4.0647332951
H 0.8851403606 1.1197022957 3.8789304041
H 0.7739878753 -0.6300857020 4.0955058584
180 Appendix E. Optimized stationary points structures
butylchloride + Cl− TS
15
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -1076.84161729
C 0.0000000000 0.0000000000 0.0000000000
C 0.0000000000 0.0000000000 1.5055518053
C 1.4137589632 0.0000000000 2.0707031481
C 1.4181593540 0.0132275139 3.5973812005
CL 0.0612762866 -2.3219666382 -0.2876101794
CL 0.0833007687 2.3214847949 -0.2840617995
H -0.9205252782 0.0045991210 -0.5458359741
H 0.9142263414 -0.0041700513 -0.5534474357
H -0.5349924361 0.8825268537 1.8493052029
H -0.5354847452 -0.8820325399 1.8497749264
H 1.9306747206 -0.8862718504 1.7031168063
H 1.9359012460 0.8769870653 1.6886484156
H 2.4312406659 0.0111462233 3.9972333714
H 0.9104910629 0.9001815777 3.9752340513
H 0.9004927407 -0.8608602334 3.9913745284
E.2 Hydrogen fluoride addition to double bonds
The following structures have been used in Section 3.3.3. The starting structures for the
ethene reaction were based on information available in Ref. [79]. The other geometries
were obtained by replacing terminal hydrogen atoms by methyl groups, followed by reop-
timization.
hydrogen fluoride
2
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -100.313071238678
F 0.0000000000 0.0000000000 -0.0942777040
H 0.0000000000 0.0000000000 0.8270777040
E.2. Hydrogen fluoride addition to double bonds 181
ethene
6
MP2/cc-pVTZ(d/p) ENERGY= -78.369790996442
C 0.0000000000 0.0000000000 -0.6666936911
C 0.0000000000 0.0000000000 0.6666936911
H 0.9222825072 0.0000000000 -1.2271017049
H -0.9222825072 0.0000000000 -1.2271017049
H -0.9222825072 0.0000000000 1.2271017049
H 0.9222825072 0.0000000000 1.2271017049
ethene + FH RC
8
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -178.691016864483
C 0.0000000000 1.1405443988 -0.6692956743
C 0.0000000000 1.1411092880 0.6681435357
H 0.9229383890 1.1449049029 -1.2292129521
H -0.9229383890 1.1449049029 -1.2292129521
H -0.9229383890 1.1459543476 1.2280571943
H 0.9229383890 1.1459543476 1.2280571943
F 0.0000000000 -1.9162868319 0.0008815253
H 0.0000000000 -0.9847713560 0.0004381288
ethene + FH TS1
8
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -178.608060439451
C -0.0084871764 0.6715812017 -0.3943440342
C -0.0098235336 0.5764014180 1.0010527421
H 0.9082502150 0.8137640312 -0.9413457645
H -0.9246095913 0.8098870104 -0.9433636556
H -0.9345397342 0.7356728856 1.5307205818
H 0.9130462100 0.7395881124 1.5327594007
H -0.0066977492 -0.6548863139 0.5337084285
F -0.0042205246 -1.2154386090 -0.6491483445
182 Appendix E. Optimized stationary points structures
ethene + FH TS2
8
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -178.702797959265
C 0.1970312769 -0.3539038725 -0.4071188308
C 0.2201399492 -0.3789610152 1.1181260054
H 1.1947926557 -0.3158045932 -0.8325369728
H -0.2909227813 0.5049570907 1.4875600513
H -0.2892459650 -1.2519156096 1.5134097233
H 1.2340844985 -0.3675880330 1.5048574520
H -0.3346534891 -1.2033511313 -0.8239320528
F -0.4722736819 0.7950856841 -0.8580567482
ethene + FH P
8
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -178.708176825049
C 0.2001382437 -0.3525527560 -0.4004451323
C 0.2108755926 -0.3630964289 1.1086621882
H 1.2054197262 -0.3234125438 -0.8117892700
H -0.8038122366 -0.3705003532 1.4952015949
H 0.7267293939 0.5127828871 1.4904852409
H 0.7255539226 -1.2530244047 1.4645909926
H -0.3299286931 -1.2094898421 -0.8070550089
F -0.4635939493 0.7950874416 -0.8571466054
propene
8
MP2/cc-pVTZ(d/p) ENERGY= -117.577927665734
C 0.4747049479 0.0000000000 -0.1252372511
C 0.4980393138 0.0000000000 1.2099256872
H 1.4179165064 0.0000000000 -0.6564432295
C -0.7917587362 0.0000000000 -0.9477485273
H -0.4165819966 0.0000000000 1.7864720803
H 1.4301228919 0.0000000000 1.7529492961
H -0.5723099707 0.0000000000 -2.0109550090
H -1.3965729782 -0.8783059603 -0.7332590234
H -1.3965729782 0.8783059603 -0.7332590234
E.2. Hydrogen fluoride addition to double bonds 183
propene + FH RC
11
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -217.903717406146
C -0.2949464958 -0.7589248509 -1.1081130073
C -0.2635734587 -0.9397181034 0.2191262125
C 0.9819761214 -0.8877121470 1.0521140392
F -0.5663562167 2.1178713673 -0.0298859660
H -1.1944155203 -1.1399213606 0.7369851660
H -1.2201884136 -0.8188655108 -1.6606605649
H 0.6121512452 -0.5671181860 -1.6642307205
H -0.5174308060 1.2076764453 -0.2320019781
H 0.9024154192 -0.1193064773 1.8186892781
H 1.1380683803 -1.8373237473 1.5605454831
H 1.8528567451 -0.6756304293 0.4375340580
propene + FH TS1
11
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -217.825579045524
C -0.2520535390 -0.3785729341 -1.3034813408
C -0.2214244016 -0.5815695909 0.0847879002
C 1.0415252692 -0.5567653211 0.8669271772
F -0.8226019684 1.2507959262 0.4262276543
H -1.1014975198 -0.9709355986 0.5736881160
H -1.0949189532 -0.7505047492 -1.8634137020
H 0.6888968550 -0.2672857925 -1.8201607574
H -0.5975120282 0.7535134663 -0.7860344691
H 0.8731873015 -0.2346912486 1.8868217680
H 1.4357862877 -1.5749913382 0.8757715055
H 1.7711817743 0.0902096717 0.3906489038
184 Appendix E. Optimized stationary points structures
propene + FH TS2
11
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -217.916808102417
C -0.2241902530 -0.2764932915 -1.3955578473
C -0.2045409651 -0.2936131155 0.1325450575
C 1.1882295601 -0.2807821668 0.7196778762
F -0.8787850632 0.8547575488 0.6046268602
H -0.7717793383 -1.1359468955 0.5210222778
H 0.7825935587 -0.2996466233 -1.8037003764
H -0.7778500136 -1.1177759860 -1.8012112166
H -0.7050142577 0.6375539137 -1.7305519203
H 1.1481783900 -0.2457911070 1.8043458472
H 1.7220651182 -1.1787765824 0.4157372256
H 1.7335177096 0.5869206760 0.3557511258
propene + FH P
11
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -217.922109481487
C -0.1696038150 0.3275921246 -0.1071077965
F -0.1509333087 0.3721646301 1.3030434555
C 1.2623276789 0.3325607362 -0.5923944839
H -0.6851625589 1.2321667456 -0.4254022341
C -0.9450396962 -0.9015173180 -0.5241650741
H 1.7865328801 1.2084487119 -0.2217333192
H 1.2861003530 0.3456294229 -1.6798326474
H 1.7755215727 -0.5596677732 -0.2420503504
H -1.9471817401 -0.8789590111 -0.1063553546
H -1.0197403579 -0.9435331697 -1.6085615897
H -0.4372540079 -1.7967630993 -0.1736336056
E.2. Hydrogen fluoride addition to double bonds 185
butene
12
MP2/cc-pVTZ(d/p) ENERGY= -156.791951913676
C -0.5510587595 -0.0001209307 -1.8789764162
C -0.5510458547 -0.0001192664 -0.3783321672
C 0.5510458007 -0.0001191812 0.3783321662
C 0.5510587509 -0.0001210601 1.8789763937
H -1.5165651447 -0.0001191996 0.1169172580
H 1.5165651188 -0.0001188211 -0.1169172037
H -1.0656507082 0.8759157516 -2.2711936625
H 0.4654374272 -0.0001053773 -2.2652667474
H -1.0656221770 -0.8761758660 -2.2711911162
H 1.0656505574 0.8759156607 2.2711937377
H -0.4654373995 -0.0001057734 2.2652668224
H 1.0656223886 -0.8761759367 2.2711909354
butene + FH RC
14
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -257.115559652543
C -0.5466759865 0.6792339680 -1.8871234670
C -0.5486678474 0.6780042622 -0.3864015332
C 0.5555376227 0.6802587566 0.3764479020
C 0.5536107437 0.7017739885 1.8770048881
F -0.0026055517 -2.2937916224 0.0178696977
H -1.5144247833 0.6896518164 0.1088748132
H 1.5213190374 0.6792701136 -0.1188655978
H -0.0008669764 -1.3582134212 0.0104802135
H -1.0477127078 1.5671786978 -2.2686725491
H 0.4687125148 0.6631776348 -2.2747762999
H -1.0795347601 -0.1852285375 -2.2795759357
H 1.0840693276 -0.1586308269 2.2810689810
H 1.0560066876 1.5938788482 2.2469456243
H -0.4620253205 0.6931123219 2.2644622629
186 Appendix E. Optimized stationary points structures
butene + FH TS1
14
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -257.032805714759
C 0.0773171907 -0.4340726131 -0.8542952823
C 0.0524864538 -0.2504657896 0.5371407928
C 1.2826425288 -0.2723483405 1.3703233945
F -0.1474525052 1.7035606259 0.3704032663
H -0.8985178132 -0.2947979617 1.0487770724
C -1.1571152549 -0.8742472565 -1.6121890503
H 1.0462925297 -0.6612567966 -1.2810279888
H 0.0306118065 0.8262969425 -0.6570131286
H 1.1791234270 0.3524634437 2.2484172053
H 1.4445385698 -1.3058208512 1.6840184037
H 2.1417399774 0.0487366203 0.7898199000
H -1.1444473421 -0.4951361724 -2.6300248254
H -1.2269401153 -1.9590752362 -1.6610349935
H -2.0543288407 -0.4955729848 -1.1282671178
butene + FH TS2
14
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -257.123434043630
C 0.0205319390 -0.0090883416 -1.0040613205
C 0.0263806831 -0.0218617481 0.5267186038
F 1.3641941862 -0.0235428597 0.9808127934
C -0.6675814339 -1.2209181222 1.1305745389
H -0.4046510268 0.9010879085 0.9128627468
H -0.5016500446 -0.8901351757 -1.3740724739
C -0.6153912129 1.2533204915 -1.5830250484
H 1.0542654744 -0.0918337729 -1.3319321074
H -0.6074158466 -1.1960779705 2.2147476800
H -1.7154163547 -1.2225112006 0.8364262873
H -0.2046115734 -2.1363397265 0.7695218162
H -0.6070177767 1.2364410237 -2.6698049293
H -1.6501738300 1.3505028779 -1.2591857766
H -0.0769838260 2.1395834925 -1.2541716464
E.2. Hydrogen fluoride addition to double bonds 187
butene + FH P
14
MP2/[aug]-cc-pVTZ(d/p) ENERGY= -257.128435872515
C -0.2738290972 0.0343659040 -1.0005303569
C -0.2917278645 0.0396897469 0.5150389864
C 1.0768993369 0.0671411131 1.1571113462
F -0.9850971752 1.1953665918 0.9334373942
H -0.8707730775 -0.8073238386 0.8841789265
H -1.2996311468 0.1490498233 -1.3461190644
C 0.3309226137 -1.2460136671 -1.5752191071
H 0.2833531426 0.9067752546 -1.3408738285
H 0.9811164648 0.1667731485 2.2343514038
H 1.6163249151 -0.8520106690 0.9432459299
H 1.6501412670 0.9090995505 0.7754443890
H 0.2426092643 -1.2584132076 -2.6580031277
H 1.3854828144 -1.3363913315 -1.3283466821
H -0.1823514576 -2.1256644189 -1.1900502092
Bibliography
[1] P. Pulay, Chem. Phys. Lett. 100, 151 (1983).
[2] S. Saebø, P. Pulay, J. Chem. Phys. 86, 914 (1987).
[3] M. Schütz, G. Hetzer, H.-J. Werner, J. Chem. Phys. 111, 5691 (1999).
[4] M. Schütz, H.-J. Werner, J. Chem. Phys. 114, 661 (2001).
[5] M. Schütz, H.-J. Werner, Chem. Phys. Lett. 318, 370 (2000).
[6] M. Schütz, J. Chem. Phys. 113, 9986 (2000).
[7] M. Schütz, J. Chem. Phys. 116, 8772 (2002).
[8] H.-J. Werner, P. J. Knowles, R. Lindh, F. R. Manby, M. Schütz, P. Celani, T. Korona,
G. Rauhut, R. D. Amos, A. Bernhardsson, A. Berning, D. L. Cooper, M. J. O. Dee-
gan, A. J. Dobbyn, F. Eckert, C. Hampel, G. Hetzer, A. W. Lloyd, S. J. McNicholas,
W. Meyer, M. E. Mura, A. Nicklass, P. Palmieri, R. Pitzer, U. Schumann, H. Stoll,
A. J. Stone, R. Tarroni, T. Thorsteinsson, Molpro, version 2006.3, a package of ab
initio programs (2007). See http://www.molpro.net.
[9] J. W. Boughton, P. Pulay, J. Comput. Chem. 14, 736 (1993).
[10] H.-J. Werner, K. Pflüger, Ann. Reports in Comput. Chem. 2, 53 (2006).
[11] J. Pipek, P. G. Mezey, J. Chem. Phys. 90, 4916 (1989).
[12] A. E. Reed, F. Weinhold, J. Chem. Phys. 83, 1736 (1985).
[13] A. E. Reed, R. B. Weinstock, F. Weinhold, J. Chem. Phys. 83, 735 (1985).
[14] J. C. Slater, Phys. Rev. 34, 1293 (1929).
[15] A. Szabo, N. S. Ostlund, Modern Quantum Chemistry (Dover Pulications, New
York, 1996).
188
Bibliography 189
[16] S. Grimme, J. Chem. Phys. 118, 9095 (2003).
[17] P. Pulay, S. Saebø, W. Meyer, J. Chem. Phys. 81, 1901 (1984).
[18] P. J. Knowles, H.-J. Werner, M. Schütz, Modern Methods and Algorithms of Quan-
tum Chemistry, J. Grotendorst, ed. (NIC-Directors, Jülich, 2000), pp. 97–161.
[19] C. Hampel, K. A. Peterson, H.-J. Werner, Chem. Phys. Lett. 190, 1 (1992).
[20] M. Urban, J. Noga, S. J. Cole, R. J. Bartlett, J. Chem. Phys. 83, 4041 (1985).
[21] J. A. Pople, M. Head-Gordon, K. Raghavachari, J. Chem. Phys. 87, 5968 (1987).
[22] K. Raghavachari, G. W. Trucks, J. A. Pople, M. Head-Gordon, Chem. Phys. Lett.
157, 479 (1989).
[23] W. Kutzelnigg, Localization and Delocalization in Quantum Chemistry, O. Chalvet
et al., ed. (D. Deidel, Dordrecht, 1975), p. 143.
[24] P. Y. Ayala, G. E. Scuseria, J. Chem. Phys. 110, 3660 (1999).
[25] G. E. Scuseria, P. Y. Ayala, J. Chem. Phys. 111, 8330 (1999).
[26] P. E. Maslen, M. Head-Gordon, Chem. Phys. Lett. 283, 102 (1998).
[27] P. E. Maslen, M. Head-Gordon, J. Chem. Phys. 109, 7093 (1998).
[28] S. Saebø, P. Pulay, Chem. Phys. Lett. 113, 13 (1985).
[29] N. Flocke, R. J. Bartlett, J. Chem. Phys. 121, 10935 (2004).
[30] S. F. Boys, Quantum Theory of Atoms, Molecules, and the Solid State, P. O. Löwdin,
ed. (Academic Press, New York, 1966), pp. 253–262.
[31] C. Edmiston, K. Ruedenberg, Rev. Mod. Phys. 34, 457 (1963).
[32] S. Saebø, W. Tong, P. Pulay, J. Chem. Phys. 98, 2170 (1993).
[33] C. Hampel, H.-J. Werner, J. Chem. Phys. 104, 6286 (1996).
[34] M. Schütz, G. Rauhut, H.-J. Werner, J. Phys. Chem. A 102, 5997 (1998).
[35] B. Hartke, M. Schütz, H.-J. Werner, J. Chem. Phys. 239, 561 (1998).
[36] N. Runeberg, M. Schütz, H.-J. Werner, J. Chem. Phys. 110, 7210 (1999).
190 Bibliography
[37] L. Magnko, M. Schweizer, G. Rauhut, M. Schütz, H. Stoll, H.-J. Werner, Phys.
Chem. Chem. Phys. 4, 1006 (2002).
[38] T. Korona, K. Pflüger, H.-J. Werner, Phys. Chem. Chem. Phys. 6, 2059 (2004).
[39] T. Hrenar, G. Rauhut, H.-J. Werner, J. Phys. Chem. A 110, 2060 (2006).
[40] P. Hohenberg, W. Kohn, Phys. Rev. 136, B864 (1964).
[41] A. D. Becke, J. Chem. Phys. 98, 5648 (1993).
[42] A. D. Becke, Phys. Rev. A 38, 3098 (1988).
[43] J. C. Slater, Phys. Rev. 81, 385 (1951).
[44] C. Lee, W. Yang, R. G. Parr, Phys. Rev. B 37, 785 (1988).
[45] S. H. Vosko, L. Wilk, M. Nusair, Can. J. Phys. 58, 1200 (1980).
[46] M. J. S. Dewar, W. Thiel, J. Am. Chem. Soc. 99, 4899 (1977).
[47] W. Thiel, A. Voityuk, Theor. Chim. Acta 81, 391 (1992).
[48] M. J. S. Dewar, E. G. Zoebisch, E. F. Healy, J. J. P. Stewart, J. Am. Chem. Soc. 107,
3902 (1985).
[49] J. J. P. Stewart, J. Comput. Chem. 10, 221 (1989).
[50] A. D. MacKerell, D. Bashford, M. Bellott, R. L. Dunbrack, A. D. MacKerell,
D. Bashford, M. Bellott, R. L. Dunbrack, D. Joseph-McCarthy, L. Kuchnir, K. Kucz-
era, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, W. E.
Reiher, B. Roux, M. Schlenkrich, J. C. Smith, R.Stote, J. Straub, M. Watanabe,
J. Wiorkiewicz-Kuczera, D. Yin, M. Karplus, J. Chem. Phys. B 102, 3586 (1998).
[51] P. A. Bash, M. J. Field, M. Karplus, J. Am. Chem. Soc. 109, 8092 (1987).
[52] M. Sierka, J. Sauer, Farad. Discuss. Chem. Soc. 106, 41 (1997).
[53] S. Dapprich, I. Komáromi, K. S. Byun, K. Morokuma, M. J. Frisch, J. Mol. Struct.
THEOCHEM 461, 1 (1999).
[54] T. Vreven, K. S. Byun, I. Komaromi, S. Dapprich, J. A. Montgomery, K. Morokuma,
M. J. Frisch, J. Chem. Theory Comput. 2, 815 (2006).
Bibliography 191
[55] G. E. Briggs, J. B. S. Haldane, Biochem. J. 19, 339 (1925).
[56] A. ElAzhary, G. Rauhut, P. Pulay, H.-J. Werner, J. Chem. Phys. 108, 5185 (1998).
[57] G. Rauhut, H.-J. Werner, Phys. Chem. Chem. Phys. 3, 4853 (2001).
[58] M. Schütz, H.-J. Werner, R. Lindh, F. R. Manby, J. Chem. Phys. 121, 737 (2004).
[59] G. Rauhut, A. E. Azhary, F. Eckert, U. Schumann, H.-J. Werner, Spectrochim. Acta
A 55, 647 (1999).
[60] G. Rauhut, H.-J. Werner, Phys. Chem. Chem. Phys. 5, 2001 (2003).
[61] N. J. Russ, T. D. Crawford, Chem. Phys. Lett. 400, 104 (2004).
[62] J. Gauss, H.-J. Werner, Phys. Chem. Chem. Phys. 2, 2083 (2000).
[63] N. J. Russ, T. D. Crawford, J. Chem. Phys. 121, 691 (2004).
[64] W. D. Allen, H. F. S. III, J. Chem. Phys. 89, 329 (1988).
[65] Q. Cui, K. Morokuma, J. Chem. Phys. 107, 4951 (1997).
[66] J. G. Hill, J. A. Platts, H.-J. Werner, Phys. Chem. Chem. Phys. 8, 4072 (2006).
[67] P. Valtazanos, K. Ruedenberg, Theor. Chim. Acta 69, 281 (1986).
[68] W. Quapp, M. Hirsch, D. Heidrich, Theor. Chim. Acta 100, 285 (1993).
[69] A. L. L. East, J. Chem. Phys. 108, 3574 (1998).
[70] F. Jensen, Chem. Phys. Lett. 196, 368 (1992).
[71] S. Humbel, S. Sieber, K. Morokuma, J. Chem. Phys. 105, 1959 (1996).
[72] I. Lee, C. K. Kim, D. S. Chung, B.-S. Lee, J. Org. Chem. 59, 4490 (1994).
[73] J. J. Blavins, D. L. Cooper, P. B. Karadakov, J. Phys. Chem. A 108, 914 (2004).
[74] A. Streitwieser, G. S.-C. Choy, F. Abu-Hasanayan, J. Am. Chem. Soc. 119, 5013
(1997).
[75] T. H. Dunning, Jr., J. Chem. Phys. 90, 1007 (1989).
[76] R. A. Kendall, T. H. Dunning, R. J. Harrison, J. Chem. Phys. 96, 6796 (1992).
192 Bibliography
[77] H.-J. Werner, F. R. Manby, P. Knowles, J. Chem. Phys. 118, 8149 (2003).
[78] F. Bernardi, S. F. Boys, Mol. Phys. 19, 553 (1970).
[79] D. Cremer, A. Wu, E. Kraka, Phys. Chem. Chem. Phys. p. 674 (2001).
[80] F. Jensen, Introduction to Computational Biochemistry (John Wiley & Sons, Eng-
land, 1999).
[81] J. P. Foster, F. Weinhold, J. Am. Chem. Soc. 102, 7211 (1980).
[82] A. E. Reed, L. A. Curtiss, F. Weinhold, Chem. Rev. 88, 899 (1988).
[83] E. D. Glendening, J. K. Badenhoop, A. E. Reed, J. E. Carpenter, J. A. Bohmann,
C. M. Morales, F. Weinhold, Nbo 5.0 (2001). Theoretical Chemistry Institute, Uni-
versity of Winsconsin, Madison.
[84] R. Polly, H.-J. Werner, F. R. Manby, P. J. Knowles, Mol. Phys. 102, 2311 (2004).
[85] K. Wermann, M. Walther, W. Gunther, H. Gorls, E. Anders, J. Org. Chem. 66, 720
(2001).
[86] S. Y. Re, K. Morokuma, Theor. Chim. Acta 112, 59 (2004).
[87] J. A. Anderson, B. W. Hopkins, J. L. Chapman, G. S. Tschumper, J. Mol. Struct.
THEOCHEM 771, 65 (2006).
[88] M. Karplus, J. Phys. Chem. B 104, 11 (2000).
[89] W. J. H. van Berkel, F. Müller, J. Chem. Phys. 179, 307 (1989).
[90] J. Vervoort, I. M. C. M. Rietjens, W. J. H. van Berkel, Eur. J. Biochem. 206, 479
(1992).
[91] L. Ridder, A. J. Mulholland, J. Vervoort, I. M. C. M. Rietjens, J. Am. Chem. Soc.
120, 7641 (1998).
[92] L. Ridder, J. N. Harvey, I. M. C. M. Rietjens, J. Vervoort, A. J. Mulholland, J. Phys.
Chem. B 107, 2118 (2003).
[93] S. R. Billeter, C. F. W. Hanser, T. Z. Mordasini, M. Scholten, W. Thiel, W. F. van
Gunsteren, Phys. Chem. Chem. Phys. 3, 688 (2001).
[94] H. M. Senn, S. Thiel, W. Thiel, J. Chem. Theory Comput. 1, 494 (2005).
Bibliography 193
[95] P. Sherwood, A. H. de Vries, M. F. Guest, G. Schreckenbach, C. R. A. Catlow, S. A.
French, A. A. Sokol, S. T. Bromley, W. Thiel, A. J. Turner, S. Billeter, F. Terstegen,
S. Thiel, J. Kendrick, S. C. Rogers, J. Casci, M. Watson, F. King, E. Karlsen, M. S.
voll, A. Fahmi, A. Schäfer, C. Lennartz, J. Mol. Struc. 632, 1 (2003).
[96] D. L. Gatti, B. Entsch, D. P. Ballou, M. L. Ludwig, Biochemistry 35, 567 (1996).
[97] W. F. van Gunsteren, S. R. Billeter, A. A. Eising, P. H. Hünenberger, P. Krüger,
A. E. Mark, W. R. P. Scott, I. G. Tironi, Biomolecular Simulation: The GROMOS96
Manual and User Guide (Biomos b.v., Zürich and Groningen, VdF Hochschulverlag,
ETH Zürich, Zürich, 1996).
[98] J. G. Kirkwood, J. Chem. Phys. 3, 300 (1935).
[99] A. Schäfer, C. Huber, R. Ahlrichs, J. Chem. Phys. 100, 5829 (1994).
[100] F. R. Manby, H.-J. Werner, T. B. Adler, A. J. May, J. Chem. Phys. 124, 094103
(2006).
[101] G. M. Torrie, J. P. Valleau, Chem. Phys. Lett. 28, 578 (1974).
[102] F. P.-D. Martin, R. Dumas, M. J. Field, J. Am. Chem. Soc. 122, 788 (2000).
[103] J. J. Ruiz-Pernía, E. Silla, I. Tuñón, S. Martí, J. Phys. Chem. B 110, 17663 (2006).
[104] A. Y. Lee, J. D. Stewart, J. Clardy, B. Ganem, Chem. Biol. 2, 195 (1995).
[105] P. Kast, M. Asif-Ullah, D. Hilvert, Tetrahedron Lett. 37, 2691 (1996).
[106] P. D. Lyne, A. J. Mulholland, W. G. Richards, J. Am. Chem. Soc. 117, 11345 (1995).
[107] A. Kienhöfer, P. Kast, D. Hilvert, J. Am. Chem. Soc. 125, 3206 (2003).
[108] C. R. W. Guimarães, M. Udier-Blagovic´, I. Tubert-Brohman, W. L. Jorgensen, J.
Chem. Theory Comput. 1, 617 (2005).
[109] S. E. Worthington, A. E. Roitberg, M. J. Krauss, J. Phys. Chem. B 105, 7087 (2001).
[110] S. Hur, T. C. Bruice, Proc. Natl. Acad. Sci. 100, 12015 (2003).
[111] K. E. Ranaghan, A. J. Mulholland, Chem. Comm. pp. 1238–1239 (2004).
[112] K. E. Ranaghan, L. Ridder, B. Szefczyk, W. A. Sokalski, J. C. Hermann, A. J. Mul-
holland, Org. Biomol. Chem. 2, 968 (2004).
194 Bibliography
[113] X. Zhang, T. C. Bruice, Proc. Natl. Acad. Sci. 102, 18356 (2005).
[114] F. Clayessens, K. E. Ranaghan, F. R. Manby, J. N. Harvey, A. J. Mulholland, Chem.
Comm. pp. 5068–5070 (2005).
[115] Y. Chook, H. Ke, W. Lipscomb, Proc. Natl. Acad. Sci. 90, 8600 (1993).
[116] A. Crespo, D. A. Scherlis, M. A. Martí, P. Ordejon, A. E. Roitberg, D. A. Estrin, J.
Phys. Chem. B 107, 13728 (2003).
[117] J. N. Harvey, Faraday Discuss. 127, 165 (2004).
[118] I. Schödinger, Jaguar, 4.0 (1996-2001).
[119] J. W. Ponder, Tinker: Software tools for molecular design, v4.0 (2003).
[120] J. E. Subotnik, A. Sodt, M. H. Gordon, J. Chem. Phys. 125, 074116 (2006).