PHYSICAL REVIEW RESEARCH 6, 043326 (2024)

Training robust and generalizable quantum models

Julian Berberich,1,* Daniel Fink,2 Daniel Pranjić,3 Christian Tutschku,3 and Christian Holm2
1University of Stuttgart, Institute for Systems Theory and Automatic Control, 70569 Stuttgart, Germany
2University of Stuttgart, Institute for Computational Physics, 70569 Stuttgart, Germany
3Fraunhofer IAO, Fraunhofer Institute for Industrial Engineering, 70569 Stuttgart, Germany

(Received 3 May 2024; accepted 28 November 2024; published 27 December 2024)

Adversarial robustness and generalization are both crucial properties of reliable machine learning models. In this paper, we study these properties in the context of quantum machine learning based on Lipschitz bounds. We derive parameter-dependent Lipschitz bounds for quantum models with trainable encoding, showing that the norm of the data encoding has a crucial impact on the robustness against data perturbations. Further, we derive a bound on the generalization error which explicitly involves the parameters of the data encoding. Based on these theoretical results, we propose a practical strategy for training robust and generalizable quantum models by regularizing the Lipschitz bound in the cost. Moreover, we show that, for fixed and nontrainable encodings, such as those frequently employed in quantum machine learning, the Lipschitz bound cannot be influenced by tuning the parameters. Thus, trainable encodings are crucial for systematically adapting robustness and generalization during training. The practical implications of our theoretical findings are illustrated with numerical results.

DOI: 10.1103/PhysRevResearch.6.043326

*Contact author: julian.berberich@ist.uni-stuttgart.de

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

I. INTRODUCTION

Robustness of machine learning (ML) models is an increasingly important property, especially when operating on real-world data subject to perturbations. In practice, there are various possible sources of perturbations such as noisy data acquisition or adversarial attacks. The latter are tiny but carefully chosen manipulations of the data, and they can lead to dramatic misclassification in neural networks [1,2]. As a result, much research has been devoted to better understanding and improving adversarial robustness [3-5]. It is well known that robustness is closely connected to generalization [1,2,6-8], i.e., the ability of a model to extrapolate beyond the training data. Intuitively, if a model is robust, then small input changes only cause small output changes, thus counteracting the risk of overfitting.

A Lipschitz bound of a model $f$ is any $L > 0$ satisfying
$\| f(x_1) - f(x_2) \| \le L \| x_1 - x_2 \|$   (1)
for all $x_1, x_2 \in D \subseteq \mathbb{R}^d$, where $d$ is the data dimension. By definition, Lipschitz bounds quantify the worst-case output change that can be caused by data perturbations and, thus, they provide a useful measure of adversarial robustness. Therefore, they are a well-established tool for characterizing robustness and generalization properties of ML models [2,6,9-15]. Lipschitz bounds can not only be used to better understand these two properties, but they also allow one to improve them by regularizing the Lipschitz bound during training [2,6,16-18].
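As a simple illustration of what (1) certifies, the following sketch (plain NumPy; the model `f` and the input `sample_input` are placeholders and not part of the paper's setup) estimates an empirical lower bound on the smallest admissible Lipschitz constant by sampling input pairs; any valid Lipschitz bound $L$ must lie above this estimate.

```python
import numpy as np

def empirical_lipschitz_lower_bound(f, sample_input, n_pairs=1000, seed=0):
    # Largest observed ratio ||f(x1) - f(x2)|| / ||x1 - x2|| over random input pairs.
    # Any L satisfying Eq. (1) must be at least this large.
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(n_pairs):
        x1, x2 = sample_input(rng), sample_input(rng)
        dx = np.linalg.norm(x1 - x2)
        if dx > 1e-12:
            worst = max(worst, np.linalg.norm(f(x1) - f(x2)) / dx)
    return worst

# Toy usage: f(x) = sin(3 x_1) has Lipschitz constant 3, so the estimate
# approaches 3 from below as more pairs are sampled.
# est = empirical_lipschitz_lower_bound(lambda x: np.sin(3 * x[:1]),
#                                       lambda rng: rng.uniform(-1, 1, size=2))
```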
In this paper, we study the interplay of robustness and generalization in quantum machine learning (QML). Variational quantum circuits are a well-studied class of quantum models [20-23], and they promise benefits over classical ML in various aspects including trainability, expressivity, and generalization performance [24,25]. Data reuploading circuits generalize these variational circuits by concatenating a data encoding and a parametrized quantum circuit not only once but repeatedly, thus alternating between data- and parameter-dependent gates [26]. This alternation provides substantial improvements in expressivity, leading to a universal quantum classifier even in the single-qubit case [26,28].

Just as in the classical case, robustness is crucial for quantum models. First, if QML is to provide benefits over classical ML, it is necessary to implement QML circuits which are robust with respect to quantum errors occurring due to imperfect hardware in the noisy intermediate-scale quantum (NISQ) era [29]. Questions of robustness of quantum models against such hardware errors have been studied, e.g., in Refs. [30,31]. Lipschitz bounds can be used to study robustness of quantum algorithms against certain types of hardware errors, e.g., coherent control errors [32]. However, robustness against hardware errors is entirely different from, and independent of, the robustness of a quantum model against data perturbations, which is the subject of this paper. The latter type of robustness has been studied in the context of quantum adversarial machine learning [33,34]. Not surprisingly, just like their classical counterparts, quantum models are also vulnerable to adversarial attacks, both when operating on classical data [35,36] and on quantum data [35,37-42]. To mitigate these attacks, it is desirable to design training schemes encouraging adversarial robustness of the resulting quantum model. Existing approaches in this direction include solving an (adversarial) min-max optimization problem during training [35] or adding adversarial examples to the training data set [43].

FIG. 1. Schematic illustration of the quantum model and training setup considered in this work for an exemplary Fashion MNIST data set [19]. The data $x$ enter the quantum circuit via a trainable encoding, i.e., they are encoded into unitary operators $U_{j,\Theta_j}(x)$ via an affine function $w_j^\top x + \theta_j$ with trainable parameters $w_j$, $\theta_j$. During training, we minimize a cost function consisting of the empirical loss as well as an additional regularization term penalizing the norms of the parameters $w_j$. This regularization reduces the Lipschitz bound of the quantum model with respect to data perturbations and, thereby, encourages improved robustness and generalization properties.

Besides robustness, another important aspect of any quantum model is its ability to generalize to unseen data [44,45].
In particular, various works have shown generalization bounds [24,46-49], i.e., bounds on the expected risk of a model depending on its performance on the training data. While these bounds provide insights into possibilities for constructing quantum models that generalize well, they also face inherent limitations due to their uniform nature [50].

A. Contribution

This paper presents a flexible and rigorous framework for robustness and generalization of quantum models, providing both a theoretical analysis and a simple regularization strategy which allows one to systematically adapt robustness and generalization during training (see Fig. 1 for an overview). More precisely, we first derive a Lipschitz bound of a given quantum model which explicitly involves the parameters of the data encoding. Based on this result, we propose a regularized training strategy penalizing the norm of the encoding parameters, which are considered trainable, in order to improve (adversarial) robustness of the model. Further, we derive a generalization bound which explicitly depends on the parameters of the quantum model and therefore does not share the limitations of existing uniform generalization bounds [50]. With numerical results, we demonstrate that the proposed Lipschitz bound regularization can indeed lead to substantial improvements in robustness and generalization of quantum models. Finally, given that the derived Lipschitz bound mainly depends on the norm of the data encoding, our results reveal the importance and benefits of trainable encodings over quantum circuits with a priori fixed encoding as frequently used in variational QML [20-24,28].

B. Outline

The paper is structured as follows. In Sec. II, we introduce the considered class of quantum models with trainable encodings and state their Lipschitz bound. Next, in Sec. III, we use the Lipschitz bound to study robustness of quantum models and to derive a regularization strategy for robust training, whose benefits are demonstrated with numerical simulations. We then derive a generalization bound which depends explicitly on the data encoding parameters and confirm this insight numerically by showing improved generalization under the proposed regularization strategy (Sec. IV). Further, in Sec. V, we discuss an important implication of our results on the benefits of trainable encodings for robustness and generalization. Finally, Sec. VI concludes the paper. In the Appendix, we provide technical proofs, details on the numerical simulations, as well as additional theoretical and numerical results.

II. QUANTUM MODELS AND THEIR LIPSCHITZ BOUNDS

We consider parametrized unitary operators of the form
$U_{j,\Theta_j}(x) = e^{-i(w_j^\top x + \theta_j)H_j}, \quad j = 1, \ldots, N,$   (2)
with input data $x \in \mathbb{R}^d$, trainable parameters $\Theta_j = \{w_j, \theta_j\}$, $w_j \in \mathbb{R}^d$, $\theta_j \in \mathbb{R}$, and fixed Hermitian generators $H_j$. The generators $H_j$ are chosen by the user; see Ref. [44] and the references therein for guidelines. Depending on the choice of $H_j$, the operator $U_{j,\Theta_j}$ acts on either one or multiple qubits. The operators $U_{j,\Theta_j}$ give rise to the parametrized quantum circuit
$U_\Theta(x) = U_{N,\Theta_N}(x) \cdots U_{1,\Theta_1}(x),$   (3)
where $\Theta = \{\Theta_j\}_{j=1}^N$ comprises the set of trainable parameters. Throughout this paper, we abbreviate the $n_q$-qubit input state $|0\rangle^{\otimes n_q}$ by $|0\rangle$. The quantum model considered in this paper consists of $U_\Theta(x)$ applied to $|0\rangle$, followed by a measurement with respect to the observable $M$, i.e.,
$f_\Theta(x) = \langle 0 | U_\Theta(x)^\dagger M U_\Theta(x) | 0 \rangle.$   (4)
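For concreteness, the model (2)-(4) can be written down in a few lines with PennyLane, the library used for the numerics in Appendix C. The sketch below mirrors the three-qubit, three-layer circuit described in Appendix C, but the array shapes, initialization, and function names are illustrative assumptions rather than the exact implementation of Ref. [27].

```python
import pennylane as qml
from pennylane import numpy as np

n_qubits, n_layers, d = 3, 3, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def f_theta(x, w, theta):
    # Quantum model (4): every rotation angle is an affine encoding w_j^T x + theta_j,
    # so both the "frequencies" w_j and the offsets theta_j are trained.
    for l in range(n_layers):
        for q in range(n_qubits):
            qml.RZ(np.dot(w[l, q, 0], x) + theta[l, q, 0], wires=q)
            qml.RY(np.dot(w[l, q, 1], x) + theta[l, q, 1], wires=q)
        for q in range(n_qubits):  # ring of CNOTs for entanglement
            qml.CNOT(wires=[q, (q + 1) % n_qubits])
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1) @ qml.PauliZ(2))

# one encoding vector w_j per rotation angle; the shapes are an illustrative choice
w = np.random.normal(scale=0.1, size=(n_layers, n_qubits, 2, d), requires_grad=True)
theta = np.random.normal(scale=0.1, size=(n_layers, n_qubits, 2), requires_grad=True)

print(f_theta(np.array([0.3, -0.5]), w, theta))  # expectation value in [-1, 1]
```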
Note that each of the unitary operators $U_{j,\Theta_j}(x)$ involves the full data vector $x$, i.e., the data are loaded repeatedly into the circuit, a strategy that is commonly referred to as data reuploading [26]. The encoding of the data $x$ into each $U_{j,\Theta_j}(x)$ is realized via an affine function $w_j^\top x + \theta_j$, where both $w_j$ and $\theta_j$ are trainable parameters. Hence, we refer to (4) as a quantum model with trainable encoding. Such trainable encodings are a generalization of common quantum models [20-24,28], for which the $w_j$'s are fixed (typically unit vectors) and only the $\theta_j$'s are trained.

Our results rely on Lipschitz bounds (1). A Lipschitz bound quantifies the maximum perturbation of $f$ that can be caused by input variations. For the quantum model $f_\Theta$, we can state the following Lipschitz bound:
$L_\Theta = 2 \|M\| \sum_{j=1}^N \|w_j\| \|H_j\|.$   (5)
The formal derivation can be found in Appendix A. For a given set of parameters $w_j$, (5) allows one to compute the Lipschitz bound of the corresponding quantum model. Note that $L_\Theta$ depends only on the $w_j$ but is independent of the $\theta_j$. This fact plays an important role for potential benefits of trainable encodings since the parameters $w_j$ are not optimized during training for fixed-encoding circuits. We note that all results in this paper hold for arbitrary $p$-norms as long as the same $p$ is used for both vector and induced matrix norms.

III. ROBUSTNESS OF QUANTUM MODELS

Suppose we want to evaluate the quantum model $f_\Theta$ at $x$, i.e., we are interested in the value $f_\Theta(x)$, but we can only access $f_\Theta$ at some perturbed input $x' = x + \varepsilon$ with an unknown $\varepsilon$. Such a setup can arise for various reasons, e.g., $x$ may be the output of some physical process which can only be accessed via noisy sensors. The perturbation $\varepsilon$ may also be the result of an adversarial attack, i.e., a perturbation aiming to cause a misclassification by choosing $\varepsilon$ such that
$\| f_\Theta(x + \varepsilon) - f_\Theta(x) \|$   (6)
is maximized. In either case, to correctly classify $x$ despite the perturbation, we require that $f_\Theta(x + \varepsilon)$ is close to $f_\Theta(x)$, meaning that (6) is small. According to (1), a Lipschitz bound $L$ of $f_\Theta$ quantifies exactly this difference, implying that the maximum possible deviation of $f_\Theta(x + \varepsilon)$ from $f_\Theta(x)$ is bounded as
$\| f_\Theta(x + \varepsilon) - f_\Theta(x) \| \le L \|\varepsilon\|.$   (7)
This shows that smaller Lipschitz bounds imply better (worst-case) robustness of models against data perturbations. Thus, using (5), the robustness of the quantum model $f_\Theta$ is mainly influenced by the parameters of the data encoding $w_j$, $H_j$, and by the observable $M$. In particular, smaller values of $\sum_{j=1}^N \|w_j\| \|H_j\|$ and $\|M\|$ lead to a more robust model.

We now apply this theoretical insight to train robust quantum models using regularization. We consider a supervised learning setup with loss $\ell$ and training data set $(x_k, y_k) \in \mathcal{X} \times \mathcal{Y}$ of size $n$. The following optimization problem can be used to train the quantum model $f_\Theta$:
$\min_\Theta \; \frac{1}{n} \sum_{k=1}^n \ell(f_\Theta(x_k), y_k).$   (8)
In order to ensure that $f_\Theta$ not only admits a small training loss but is also robust and generalizes well, we add a regularization, leading to
$\min_\Theta \; \frac{1}{n} \sum_{k=1}^n \ell(f_\Theta(x_k), y_k) + \lambda \sum_{j=1}^N \|w_j\|^2 \|H_j\|^2.$   (9)
Regularizing the parameters $w_j$ encourages small norms of the data encoding and, thereby, small values of the Lipschitz bound $L_\Theta$. We weight the parameter norms $\|w_j\|$ by $\|H_j\|$ due to their joint occurrence in (5).
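Given this structure, the Lipschitz bound (5) and the regularized cost (9) are straightforward to evaluate in code. The following is a minimal sketch reusing `f_theta`, `w`, and `theta` from the sketch in Sec. II; the values $\|H_j\| = 1/2$ (Pauli-rotation generators $e^{-i\phi P/2}$) and $\|M\| = 1$ (a tensor product of Pauli-$Z$ operators, 2-norm) match that sketch, and the square loss is a placeholder rather than the paper's exact loss.

```python
def lipschitz_bound(w, H_norm=0.5, M_norm=1.0):
    # Eq. (5): L_Theta = 2 ||M|| sum_j ||w_j|| ||H_j||.
    w_vecs = w.reshape(-1, w.shape[-1])  # one encoding vector per gate
    return 2 * M_norm * H_norm * np.sum(np.sqrt(np.sum(w_vecs ** 2, axis=1)))

def regularized_cost(w, theta, X, Y, lam):
    # Empirical loss plus the regularizer of Eq. (9) with squared 2-norms.
    preds = np.stack([f_theta(x, w, theta) for x in X])
    loss = np.mean((preds - Y) ** 2)  # placeholder square loss
    w_vecs = w.reshape(-1, w.shape[-1])
    reg = np.sum(w_vecs ** 2) * 0.5 ** 2  # sum_j ||w_j||^2 ||H_j||^2 with ||H_j|| = 1/2
    return loss + lam * reg

# Gradient-based training, e.g., with PennyLane's Adam optimizer (cf. Appendix C):
# opt = qml.AdamOptimizer(stepsize=0.1)
# for _ in range(200):
#     (w, theta), cost = opt.step_and_cost(
#         lambda w_, t_: regularized_cost(w_, t_, X_train, y_train, lam=0.2), w, theta)
```

Evaluating the regularizer only involves a weighted sum of the squared parameter norms, which is why the added cost over the unregularized problem (8) is negligible.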
The hyperparameter $\lambda > 0$ allows for a trade-off in the cost function between the two objectives of a small training loss and robustness/generalization. Note that the regularization does not involve the $\theta_j$'s since they do not influence the Lipschitz bound (5), an issue we discuss in more detail in Sec. V. Moreover, we do not introduce an explicit dependence of the regularization on $M$ since we do not optimize over the observable in this paper. We note that penalty terms similar to the proposed regularization can be used for handling hard constraints in binary optimization via the quantum approximate optimization algorithm [51,52]. Indeed, the above regularization can be interpreted as a penalty-based relaxation of the corresponding constrained training problem, i.e., of training a quantum model with a Lipschitz bound below a specific value.

We now evaluate our theoretical findings based on the circle classification problem from [53]: within the domain $\mathcal{X} = [-1, +1] \times [-1, +1]$, a circle with radius $\sqrt{2/\pi}$ is drawn, and all data points inside the circle are labeled with $y = +1$, whereas points outside are labeled with $y = -1$; see Appendix C (Fig. 6). For the quantum model with trainable encoding, we use general SU(2) operators and encode $w_j^\top x + \theta_j$ into the first two rotation angles. We repeat this encoding for each of the considered 3 qubits, followed by nearest-neighbor entangling gates based on CNOTs. Such a layer is then repeated 3 times. As observable, we use $M = Z \otimes Z \otimes Z$. The resulting circuit is illustrated in Appendix C (Fig. 5). As norms in the regularized training problem (9), we employ 2-norms, but we note that exploring different choices is an interesting direction for future work. For example, regularization with nonsquared norms (e.g., a 1-norm) may enforce sparsity of the trained QML model and can, thereby, simplify its implementation on NISQ hardware.

The numerical results for the robustness simulations are shown in Fig. 2, where we compare the worst-case test accuracy and Lipschitz bound of three trained models with different regularization parameters $\lambda \in \{0, 0.2, 0.5\}$. Additionally, the plot shows the accuracy of a trained quantum model with fixed encoding for the same numbers of qubits and layers [see (14) and Appendix C for details]. The worst-case test accuracy of all models is obtained by sampling different noise samples $\varepsilon$ from $[-\bar\varepsilon, +\bar\varepsilon]^d$. This procedure amounts to finding adversarial noise samples and, therefore, the resulting worst-case test accuracy (approximately) quantifies the adversarial robustness against attacks which are norm-bounded by $\bar\varepsilon$.

FIG. 2. We compare robustness of quantum models trained via (9) for $\lambda \in \{0, 0.2, 0.5\}$ and a quantum model with fixed encoding (14). As training and test set, we draw $n = 200$ and 1000 points $x_i \in \mathcal{X}$, respectively, uniformly at random. To study robustness, we perturb each of the 1000 test data points by random noise drawn uniformly from $[-\bar\varepsilon, +\bar\varepsilon]^d$ ($d = 2$). The test accuracy in the plot is the worst case over 200 noise samples per data point.

As expected, all four models deteriorate with increasing noise level. For zero noise level $\bar\varepsilon = 0$, the model with the largest regularization parameter $\lambda = 0.5$ (and, hence, the smallest Lipschitz bound $L_\Theta = 3.67$) has a smaller test accuracy than the nonregularized model with $\lambda = 0$. This can be explained by a decrease in the training accuracy that is caused by the additional regularization in the cost.
For increasing noise levels, however, the enhanced robustness outweighs the loss of training performance and, therefore, the model with $\lambda = 0.5$ outperforms the model with $\lambda = 0$. The fixed-encoding model achieves comparable performance to the trainable-encoding model with $\lambda = 0.5$ for small noise and the worst performance among all models for high noise. These observations can be explained by the high Lipschitz bound of the fixed-encoding model as well as its reduced expressivity, i.e., its limited ability to approximate functions from data due to the fixed encoding parameters $w_j$. Finally, the model with $\lambda = 0.2$ almost always outperforms the model with $\lambda = 0$ and, in particular, it yields a higher test accuracy for small noise levels. This can be explained by the improved generalization performance caused by the regularization, an effect we discuss in more detail in the following.

IV. GENERALIZATION OF QUANTUM MODELS

The Lipschitz bound (5) not only influences robustness but also has a crucial impact on the generalization properties of the quantum model $f_\Theta$. Intuitively, a smaller Lipschitz bound implies a smaller variability of $f_\Theta$ and, therefore, reduces the risk of overfitting. This intuition is made formal via the following generalization bound.

Theorem IV.1 (Informal version). Consider a supervised learning setup with loss $\ell$ and a data set $(x_k, y_k) \in \mathcal{X} \times \mathcal{Y}$ of size $n$ drawn according to the probability distribution $P$. For the quantum model $f_\Theta$ from (4), define the expected risk $R(f_\Theta) = \int_{\mathcal{X}\times\mathcal{Y}} \ell(y, f_\Theta(x))\, dP(x, y)$ and the empirical risk $R_n(f_\Theta) = \frac{1}{n}\sum_{k=1}^n \ell(y_k, f_\Theta(x_k))$. The generalization error of $f_\Theta$ is bounded as
$|R(f_\Theta) - R_n(f_\Theta)| \le C_1 \|M\| \sum_{j=1}^N \|w_j\| \|H_j\| + \frac{C_2}{\sqrt{n}}$   (10)
for some $C_1, C_2 > 0$.

The detailed version and proof of Theorem IV.1 are provided in Appendix B. Generalization bounds as in (10) quantify the ability of $f_\Theta$ to generalize beyond the available data. The bound (10) depends on the data encoding via $\sum_{j=1}^N \|w_j\| \|H_j\|$ and on the observable via $\|M\|$. In particular, $f_\Theta$ achieves a small generalization error if its Lipschitz bound $L_\Theta$ is small and the size $n$ of the data set is large. Note, however, the following fundamental trade-off: a too small Lipschitz bound $L_\Theta$ may limit the expressivity of $f_\Theta$ and, therefore, lead to a high empirical risk $R_n(f_\Theta)$, in which case the generalization bound (10) is meaningless. In conclusion, Theorem IV.1 implies a small expected risk $R(f_\Theta)$ if $n$ is large, $L_\Theta$ is small, and $f_\Theta$ has a small empirical risk $R_n(f_\Theta)$.

In contrast to existing generalization bounds [24,46-49], the bound (10) is not uniform and explicitly involves the Lipschitz bound (5), i.e., the parameters of the data encoding. Hence, Theorem IV.1 does not share the limitations of uniform QML generalization bounds [50] and it can be used to systematically influence the generalization performance during training via regularization. In particular, according to Theorem IV.1, the regularized training problem (9) encourages models with improved generalization properties, where the hyperparameter $\lambda$ trades off the empirical risk $R_n(f_\Theta)$ against the generalization bound (10). Extending our results by studying the direct impact of $\lambda$ on the generalization performance is an interesting next step, which we expect to be nontrivial due to the nonconvexity of the loss function in (9); compare Ref. [54]. In practice, the hyperparameter $\lambda$ can be tuned, e.g., via cross-validation. We discuss and interpret the impact of the hyperparameter $\lambda$ in more detail with the following numerical results.
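In code, such a validation-based tuning of $\lambda$ can be sketched as follows, reusing `f_theta` and `regularized_cost` from the earlier sketches. The data splits `X_train`, `y_train`, `X_val`, `y_val` (with labels $\pm 1$) are assumed to be given, and the grid of $\lambda$ values is an arbitrary illustrative choice, not the paper's protocol.

```python
def fit(X, Y, lam, epochs=200):
    # Minimize the regularized cost (9) with Adam (cf. Appendix C for the paper's settings).
    w0 = np.random.normal(scale=0.1, size=(n_layers, n_qubits, 2, d), requires_grad=True)
    t0 = np.random.normal(scale=0.1, size=(n_layers, n_qubits, 2), requires_grad=True)
    opt = qml.AdamOptimizer(stepsize=0.1)
    for _ in range(epochs):
        (w0, t0), _ = opt.step_and_cost(
            lambda w_, t_: regularized_cost(w_, t_, X, Y, lam), w0, t0)
    return w0, t0

def accuracy(params, X, Y):
    # Labels y in {-1, +1}; classify by the sign of the model output.
    w_, t_ = params
    return np.mean([float(np.sign(f_theta(x, w_, t_)) == y) for x, y in zip(X, Y)])

val_scores = {lam: accuracy(fit(X_train, y_train, lam), X_val, y_val)
              for lam in [0.0, 0.05, 0.1, 0.15, 0.2, 0.5]}
best_lam = max(val_scores, key=val_scores.get)  # sweet spot on validation data
```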
We evaluate the generalization performance of the trainable encoding again on the circle classification problem. The training setup is identical to the robustness simulations and the numerical results are shown in Fig. 3. Increasing the regularization parameter $\lambda$ decreases the Lipschitz bound $L_\Theta$ of the trained model. In accordance with the generalization bound (10), this reduction of $L_\Theta$ improves the generalization performance, with the maximum test accuracy at $\lambda = 0.15$. Beyond this value, the regularization causes a too small Lipschitz bound, limiting expressivity and, therefore, decreasing the training accuracy. As a result, the test accuracy decreases as well. This illustrates the role of $\lambda$ as a hyperparameter: regularization does not always improve performance, but there is a sweet spot for $\lambda$ at which both superior generalization and robustness over the unregularized setup (i.e., $\lambda = 0$) can be obtained.

FIG. 3. Results for the generalization simulations. The training setup is identical to the robustness simulations as described in Fig. 2. As test set, we draw 10 000 points uniformly at random and evaluate the trained models with different regularization parameter $\lambda$.

V. BENEFITS OF TRAINABLE ENCODINGS

A popular class of quantum models is obtained by constructing circuits which alternate between data- and parameter-dependent gates, i.e., replacing $U_\Theta(x)$ in (3) by
$U^{\mathrm{f}}_\phi(x) = W(\phi_L) V(x) \cdots W(\phi_1) V(x),$   (11)
compare [20-24,28]. The unitary operators $V$ and $W$ are given by
$V(x) = e^{-i x_D G_D} \cdots e^{-i x_1 G_1},$   (12)
$W(\phi_j) = e^{-i \phi_{j,p} S_p} \cdots e^{-i \phi_{j,1} S_1}$   (13)
for trainable parameters $\phi_j$ and generators $G_i = G_i^\dagger$, $S_i = S_i^\dagger$. The corresponding quantum model is given by
$f^{\mathrm{f}}_\phi(x) = \langle 0 | U^{\mathrm{f}}_\phi(x)^\dagger M U^{\mathrm{f}}_\phi(x) | 0 \rangle,$   (14)
see Fig. 4. It is not hard to show that the parametrized quantum circuit $U_\Theta(x)$ in (3) generalizes the one in (11). Indeed, $U_{j,\Theta_j}(x)$ in (2) reduces to either
$e^{-i x_j G_j}$ or $e^{-i \phi_j S_j}$   (15)
for suitable choices of $w_j$, $\theta_j$, and $H_j$. Note that the data encoding of the quantum model $f^{\mathrm{f}}_\phi(x)$ is fixed a priori via the choice of the $w_j$ and, in particular, it cannot be influenced during training. Therefore, we refer to $f^{\mathrm{f}}_\phi(x)$ as a quantum model with fixed encoding, in contrast to $f_\Theta(x)$ in (4), which contains trainable parameters $w_j$ and, therefore, a trainable encoding.

FIG. 4. Circuit representation of the quantum model (14) with fixed encoding.

Benefits of trainable encodings for the expressivity of quantum models have been demonstrated numerically in Refs. [55-57] and theoretically in Refs. [26,28]. In the following, we discuss the importance of trainable encodings for robustness and generalization. Recall that the Lipschitz bound (5), which we showed to be a crucial quantifier of robustness and generalization of quantum models, only depends on the observable $M$ and on the data encoding $w_j$, $H_j$, but is independent of the parameters $\theta_j$. Hence, in the quantum model $f^{\mathrm{f}}_\phi(x)$ with fixed encoding, the Lipschitz bound (5) cannot be influenced during training and, instead, is fixed a priori via the choice of the Hermitian generators $G_j$. As a result, training has a limited effect on robustness and generalization properties of fixed-encoding quantum models.

The distinction between trainable and fixed data encodings becomes even more apparent when expressing quantum models as Fourier series [28].
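For illustration, consider the simplest instance of (2) with a single encoding gate ($N = 1$) whose generator has eigenvalues $\pm 1/2$ (e.g., a Pauli rotation). The model then reduces to a degree-one Fourier series in the encoded angle,
$f_\Theta(x) = a_0 + a_1 \cos\big(w_1^\top x + \theta_1\big) + b_1 \sin\big(w_1^\top x + \theta_1\big),$
where the coefficients $a_0$, $a_1$, $b_1$ are fixed by the generator, the initial state, and the observable $M$, while the frequency vector $w_1$ is itself a trainable parameter whose norm enters the Lipschitz bound (5) directly.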
In this picture, fixed-encoding quantum models fix the frequencies of the Fourier basis functions before training and only optimize over their coefficients. On the contrary, trainable-encoding quantum models simultaneously optimize over the frequencies and the coefficients [57], which, according to the Lipschitz bound (5), is key for influencing robustness and generalization properties.

These insights confirm the observation of Refs. [58,59] that fixed-encoding quantum models are neither sensitive to data perturbations nor prone to overfitting. On the one hand, resilience against these two phenomena is a desirable property. However, the above discussion also implies that Lipschitz bound regularization, which is a systematic and effective tool for influencing robustness and generalization [2,6,16,18], cannot be implemented for fixed-encoding quantum models to improve robustness and generalization. Indeed, our robustness simulations in Fig. 2 show that the fixed-encoding model has a considerably higher Lipschitz bound than all the considered trainable-encoding models. This implies a significantly worse robustness with respect to data perturbations and, therefore, leads to a rapidly decreasing test accuracy for larger noise levels. Further, regularizing the parameters $\phi$ as suggested, e.g., by Ref. [60], does not affect the Lipschitz bound and, therefore, cannot be used to improve the robustness. In Appendix C, we study the effect of regularizing the $\phi_j$'s on generalization. We find that the influence of regularizing the $\phi_j$'s on the test accuracy is limited and likely dependent on the specific ground-truth distribution generating the data and the chosen circuit ansatz. To conclude, our results show that training the encoding in quantum models not only increases the expressivity but also leads to superior robustness and generalization properties.

VI. CONCLUSION

In this paper, we studied robustness and generalization properties of quantum models based on Lipschitz bounds. Lipschitz bounds are a well-established tool in the classical ML literature which not only quantify adversarial robustness but are also closely connected to generalization performance. We derived Lipschitz bounds based on the size of the data encoding, which we then used to study robustness and generalization of quantum models. Given that our generalization bound explicitly involves the parameters of the data encoding, it does not face the limitations of uniform generalization bounds [50]. Further, our theoretical results highlight the role of trainable encodings combined with regularization techniques for obtaining robust and generalizable quantum models. The numerical results confirm our theoretical findings, showing the existence of a sweet spot for the regularization parameter at which our training scheme improves both robustness and generalization compared to a nonregularized training scheme. It is important to emphasize that these numerical results with specific choices of rotation and entangling gates mainly serve as an illustration; our theoretical framework applies to all quantum models that can be written as (4) and, therefore, also allows for different rotation gates, entangling layers, or even parametrized multiqubit gates. While our results indicate the potential of using Lipschitz bounds and regularization techniques in QML, they also open up various promising directions for future research.
First and foremost, transferring existing research on Lipschitz bounds in classical ML to the QML setting provides a systematic framework for handling robustness and generalization, beyond the first results presented in this paper. For example, while we only consider quantum models with affine encodings $w_j^\top x + \theta_j$, it would be interesting to extend our results to more general, nonlinear encodings. Classical neural networks are ideal candidates for realizing a nonlinear encoding since their Lipschitz properties are well studied [2,6,9-15], which would allow one to train hybrid quantum-classical models that are not only expressive but also admit desirable robustness and generalization properties. Finally, although we focus on variational quantum models, the basic principles of our results are transferable to different quantum models, including quantum kernel methods [22,23] or linear quantum models [25].

ACKNOWLEDGMENTS

This work was funded by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy - EXC 2075 - 390740016. We acknowledge the support of the Stuttgart Center for Simulation Science (SimTech). This work was also supported by the German Federal Ministry of Economic Affairs and Climate Action through the project AutoQML (Grant No. 01MQ22002A).

DATA AVAILABILITY

The source code for the numerical case studies is publicly accessible on GitHub [27].

APPENDIX A: LIPSCHITZ BOUNDS OF QUANTUM MODELS

In this section, we study Lipschitz bounds of quantum models as in (4). We first derive a Lipschitz bound which is less tight than the one in (5) but can be shown using a simple concatenation argument (Sec. A 1). Next, in Sec. A 2, we prove that (5) is indeed a Lipschitz bound.

1. Simple Lipschitz bound based on concatenation

Before stating the result, we introduce the notation
$W = \begin{pmatrix} w_1^\top \\ \vdots \\ w_N^\top \end{pmatrix}, \qquad \theta = \begin{pmatrix} \theta_1 \\ \vdots \\ \theta_N \end{pmatrix}.$   (A1)

Theorem A.1. The following is a Lipschitz bound of $f_\Theta$:
$L = 2 \|M\| \|W\| \sum_{j=1}^N \|H_j\|.$   (A2)

Proof. Our proof relies on the fact that a Lipschitz bound of a concatenated function can be obtained as the product of the individual Lipschitz bounds. To be precise, suppose $f$ can be written as $f = f_1 \circ f_2 \circ \cdots \circ f_h$, where $\circ$ denotes concatenation and each $f_i$ admits a Lipschitz bound $L_i$, $i = 1, \ldots, h$. Then, for arbitrary input arguments $x, y$ of $f$, we obtain
$\| f(x) - f(y) \| \le L_1 \| f_2 \circ \cdots \circ f_h(x) - f_2 \circ \cdots \circ f_h(y) \| \le \cdots \le L_1 L_2 \cdots L_h \| x - y \|.$   (A3)
We now prove that (A2) is a Lipschitz bound by representing $f_\Theta$ as a concatenation of the three functions
$g_{\mathrm{meas}}(z_m) = \langle z_m | M | z_m \rangle,$   (A4)
$g_{\mathrm{unitary}}(z_u) = e^{-i z_{u,N} H_N} \cdots e^{-i z_{u,1} H_1} |0\rangle,$   (A5)
$g_{\mathrm{affine}}(z_a) = W z_a + \theta.$   (A6)
More precisely, it holds that
$f_\Theta(x) = g_{\mathrm{meas}} \circ g_{\mathrm{unitary}} \circ g_{\mathrm{affine}}(x).$   (A7)
Hence, any set of Lipschitz bounds $L_{\mathrm{meas}}$, $L_{\mathrm{unitary}}$, $L_{\mathrm{affine}}$ for the three functions $g_{\mathrm{meas}}$, $g_{\mathrm{unitary}}$, $g_{\mathrm{affine}}$ gives rise to a Lipschitz bound of $f_\Theta$ as their product:
$L = L_{\mathrm{meas}} L_{\mathrm{unitary}} L_{\mathrm{affine}}.$   (A8)
Therefore, in the following, we derive the individual Lipschitz bounds $L_{\mathrm{meas}}$, $L_{\mathrm{unitary}}$, and $L_{\mathrm{affine}}$.

Lipschitz bound of $g_{\mathrm{meas}}$. Note that
$\frac{d g_{\mathrm{meas}}(z_m)}{d z_m} = 2 \langle z_m | M.$   (A9)
Using $\|z_m\| = 1$, we infer
$\left\| \frac{d g_{\mathrm{meas}}(z_m)}{d z_m} \right\| \le 2 \|M\|.$   (A10)
Thus $L_{\mathrm{meas}} = 2\|M\|$ is a Lipschitz bound of $g_{\mathrm{meas}}$.

Lipschitz bound of $g_{\mathrm{unitary}}$. It follows from [32, Theorem 2.2] that $L_{\mathrm{unitary}} = \sum_{j=1}^N \|H_j\|$ is a Lipschitz bound of $g_{\mathrm{unitary}}$.

Lipschitz bound of $g_{\mathrm{affine}}$. Given the linear form of $g_{\mathrm{affine}}$, we directly obtain that $L_{\mathrm{affine}} = \|W\|$ is a Lipschitz bound. $\square$
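As an illustrative sanity check (not part of the formal argument), the bounds can also be probed numerically: sampled difference quotients of $f_\Theta$ should never exceed (5), and hence also not the looser bound (A2). A minimal sketch, reusing the hypothetical `f_theta` and `lipschitz_bound` helpers sketched in the main text:

```python
def check_lipschitz_bound(f, w, theta, n_pairs=500):
    # Compare the largest sampled difference quotient of f_Theta with the bound (5).
    L = lipschitz_bound(w)
    worst = 0.0
    for _ in range(n_pairs):
        x1 = np.random.uniform(-1, 1, size=2)
        x2 = np.random.uniform(-1, 1, size=2)
        quot = abs(f(x1, w, theta) - f(x2, w, theta)) / np.linalg.norm(x1 - x2)
        worst = max(worst, float(quot))
    return worst, L  # expect worst <= L

# worst, L = check_lipschitz_bound(f_theta, w, theta)
```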
2. Proof that (5) is a Lipschitz bound

We first derive a Lipschitz bound on the parametrized unitary $U_\Theta(x)$. To this end, we compute its differential
$dU_\Theta(x) = (dU_{N,\Theta_N}(x)) U_{N-1,\Theta_{N-1}}(x) \cdots U_{1,\Theta_1}(x) + \cdots + U_{N,\Theta_N}(x) \cdots U_{2,\Theta_2}(x) (dU_{1,\Theta_1}(x)).$   (A11)
Note that each term $U_{j,\Theta_j}(x)$ can be written as the concatenation of the two maps $g_j$ and $h_j$ defined by
$g_j(x) = w_j^\top x + \theta_j,$   (A12)
$h_j(z_j) = e^{-i z_j H_j}.$   (A13)
To be precise, it holds that $U_{j,\Theta_j}(x) = h_j \circ g_j(x)$. The differentials of the two maps $g_j$ and $h_j$ are given by
$dh_j(z_j)(u) = -i H_j e^{-i z_j H_j} u, \qquad dg_j(x)(v) = w_j^\top v,$   (A14)
where $dh_j(z_j)(u)$ denotes the differential of $h_j$ at $z_j$ applied to $u \in \mathbb{R}$, and similarly for $dg_j(x)(v)$. Thus we have
$dU_{j,\Theta_j}(x)(v) = (dh_j(g_j(x))) \circ (dg_j(x)(v)) = -i H_j e^{-i(w_j^\top x + \theta_j)H_j} w_j^\top v = -i H_j U_{j,\Theta_j}(x) w_j^\top v.$   (A15)
Inserting this into (A11), we obtain
$dU_\Theta(x)(v) = -i \big( H_N U_\Theta(x) w_N^\top v + \cdots + U_\Theta(x) H_1 w_1^\top v \big).$   (A16)
We have thus shown that the Jacobian $J_\Theta(x)$ of the map $U_\Theta(x)|0\rangle$ is given by
$J_\Theta(x) = -i \big( H_N U_\Theta(x) |0\rangle w_N^\top + \cdots + U_\Theta(x) H_1 |0\rangle w_1^\top \big).$   (A17)
Using that $|0\rangle$ has unit norm, that the $U_{j,\Theta_j}$'s are unitary, as well as the triangle inequality, the norm of $J_\Theta(x)$ is bounded as
$\| J_\Theta(x) \| \le \sum_{j=1}^N \|w_j\| \|H_j\|.$   (A18)
Thus $\sum_{j=1}^N \|w_j\| \|H_j\|$ is a Lipschitz bound of $U_\Theta(x)|0\rangle$ [61, p. 356]. Finally, $f_\Theta(x)$ is a concatenation of $U_\Theta(x)|0\rangle$ and the function $z \mapsto \langle z|M|z\rangle$, which admits the Lipschitz bound $2\|M\|$, compare (A10). Hence, a Lipschitz bound of $f_\Theta$ can be obtained as the product of these two individual bounds, i.e., as in (5). $\square$

APPENDIX B: FULL VERSION AND PROOF OF THEOREM IV.1

We first state the main result in a general supervised learning setup, before applying it to the quantum model (4) considered in the paper. Consider a supervised learning setup with data samples $\{x_k, y_k\}_{k=1}^n$ drawn independently and identically distributed from $\mathcal{Z} := \mathcal{X} \times \mathcal{Y} \subseteq \mathbb{R}^d \times \mathbb{R}$ according to some probability distribution $P$. We define the $\varepsilon$-covering number of $\mathcal{Z}$ as follows.

Definition B.1 (adapted from [7, Definition 1]). We say that $\hat{\mathcal{Z}}$ is an $\varepsilon$-cover of $\mathcal{Z}$ if, for all $z \in \mathcal{Z}$, there exists $\hat z \in \hat{\mathcal{Z}}$ such that $\| z - \hat z \| \le \varepsilon$. The $\varepsilon$-covering number of $\mathcal{Z}$ is
$N(\varepsilon, \mathcal{Z}) = \min \{ |\hat{\mathcal{Z}}| \mid \hat{\mathcal{Z}} \text{ is an } \varepsilon\text{-cover of } \mathcal{Z} \}.$   (B1)

For a generic model $f : \mathcal{X} \to \mathcal{Y}$, a loss function $\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$, and the training data $\{x_k, y_k\}_{k=1}^n$, we define the expected loss and the empirical loss by
$R(f) = \int_{\mathcal{X} \times \mathcal{Y}} \ell(y, f(x)) \, dP(x, y)$   (B2)
and
$R_n(f) = \frac{1}{n} \sum_{k=1}^n \ell(y_k, f(x_k)),$   (B3)
respectively. The following result states a generalization bound for $f$.

Lemma B.1. Suppose (1) the loss $\ell$ is nonnegative and admits a Lipschitz bound $L_\ell > 0$, (2) $\mathcal{Z}$ is compact such that the value $\mathcal{M} := \sup_{y, y' \in \mathcal{Y}} \ell(y, y')$ is finite, and (3) $L_f > 0$ is a Lipschitz bound of $f$. Then, for any $\gamma, \delta > 0$, with probability at least $1 - \delta$ the generalization error of $f$ is bounded as
$|R(f) - R_n(f)| \le \gamma L_\ell \max\{1, L_f\} + \mathcal{M} \sqrt{\frac{2 N(\frac{\gamma}{2}, \mathcal{Z}) \ln 2 + 2 \ln(\frac{1}{\delta})}{n}}.$   (B4)

Proof. For the following proof, we invoke the concept of $(K, \varepsilon)$-robustness (adapted from [7, Definition 2]): the classifier $f$ is $(K, \varepsilon)$-robust for $K \in \mathbb{N}$ and $\varepsilon \in \mathbb{R}$ if $\mathcal{Z}$ can be partitioned into $K$ disjoint sets, denoted by $\{C_i\}_{i=1}^K$, such that the following holds: for all $k = 1, \ldots, n$, $(x, y) \in \mathcal{Z}$, $i = 1, \ldots, K$, if $(x_k, y_k), (x, y) \in C_i$, then
$|\ell(y_k, f(x_k)) - \ell(y, f(x))| \le \varepsilon.$   (B5)
This property quantifies robustness of $f$ in the following sense: the set $\mathcal{Z}$ can be partitioned into a number of subsets such that, if a newly drawn sample $(x, y)$ lies in the same subset as a training sample $(x_k, y_k)$, then their associated loss values are close. Let us now proceed by noting that, for any $(x_k, y_k)$, $k = 1, \ldots, n$, and $(x, y) \in \mathcal{Z}$, it holds that
$|\ell(y_k, f(x_k)) - \ell(y, f(x))| \le L_\ell \|(y_k, f(x_k)) - (y, f(x))\|_2 \le L_\ell (\|y_k - y\| + \|f(x_k) - f(x)\|) \le L_\ell \max\{1, L_f\} \|(x_k, y_k) - (x, y)\|,$   (B6)
where we use the Lipschitz bound $L_\ell$ of $\ell$, the triangle inequality, and the Lipschitz bound $L_f$ of $f$, respectively. Using [7, Theorem 6], we infer that $f$ is $(N(\frac{\gamma}{2}, \mathcal{Z}), L_\ell \max\{1, L_f\}\gamma)$-robust for all $\gamma > 0$. It now follows from [7, Theorem 1] that, for any $\delta > 0$, inequality (B4) holds with probability at least $1 - \delta$.

Let us now combine the Lipschitz bound (5) and Lemma B.1 to state a tailored generalization bound for the considered class of quantum models $f_\Theta$, thus proving Theorem IV.1.

Theorem B.1. Suppose (1) the loss $\ell$ is nonnegative and admits a Lipschitz bound $L_\ell > 0$ and (2) $\mathcal{Z}$ is compact such that the value $\mathcal{M} := \sup_{y, y' \in \mathcal{Y}} \ell(y, y')$ is finite. Then, for any $\gamma, \delta > 0$, with probability at least $1 - \delta$ the generalization error of $f_\Theta$ is bounded as
$|R(f_\Theta) - R_n(f_\Theta)| \le \gamma L_\ell \max\Big\{1, 2\|M\| \sum_{j=1}^N \|w_j\| \|H_j\|\Big\} + \mathcal{M} \sqrt{\frac{2 N(\frac{\gamma}{2}, \mathcal{Z}) \ln 2 + 2 \ln(\frac{1}{\delta})}{n}}.$   (B7)

FIG. 5. The trainable-encoding quantum model employed in our numerical case studies uses general $U_{\mathrm{Rot}} \in SU(2)$ unitaries parametrized by three Euler angles $\alpha_i, \beta_i, \gamma_i$ that each have the form $w_i^\top x + \theta_i$ for trainable parameters $w_i$, $\theta_i$. We set $\gamma_j = 0$, $j = 0, \ldots, 8$, to enable a fair comparison to a quantum model with fixed encoding (since the data are two-dimensional, only two angles are needed for encoding the data via the latter).

Theorem B.1 shows that the size of the data encoding and of the observable directly influences the generalization performance of the quantum model $f_\Theta$. In particular, for smaller values of $\sum_{j=1}^N \|w_j\| \|H_j\|$ and $\|M\|$, the expected loss is closer to the empirical loss. The right-hand side of (B7) contains two terms: the first one depends on the parameters of the quantum model $f_\Theta$ and characterizes its robustness via the derived Lipschitz bound, whereas the second term decays with increasing data length $n$. While (B4) holds for arbitrary values of $\gamma > 0$, it is not immediate which value of $\gamma$ leads to the smallest possible bound: smaller values of $\gamma$ decrease the first term but increase $N(\frac{\gamma}{2}, \mathcal{Z})$ in the second term (and vice versa). In contrast to existing QML generalization bounds [24,46-49], Theorem B.1 explicitly highlights the role of the model parameters via the Lipschitz bound.

Using the additional flexibility of the parameter $\gamma$, it can be shown that the generalization bound (B7) converges to zero when the data length $n$ approaches infinity. To this end, we use Ref. [62, Lemma 6.27] to upper bound the covering number
$N\big(\tfrac{\gamma}{2}, \mathcal{Z}\big) \le \Big(\frac{6R}{\gamma}\Big)^{d+1},$   (B8)
where $R$ is the radius of the smallest ball containing $\mathcal{Z}$. Inserting (B8) into (B7) and choosing $\gamma$ depending on $n$ as $\gamma = n^{-\frac{1}{2d+2}}$, we infer
$|R(f_\Theta) - R_n(f_\Theta)| \le \frac{1}{n^{\frac{1}{2d+2}}} L_\ell \max\Big\{1, 2\|M\| \sum_{j=1}^N \|w_j\| \|H_j\|\Big\} + \mathcal{M} \sqrt{\frac{2 \ln 2 \, (6R)^{d+1}}{\sqrt{n}} + \frac{2 \ln(\frac{1}{\delta})}{n}},$   (B9)
which indeed converges to zero for $n \to \infty$.

APPENDIX C: NUMERICS: SETUP AND FURTHER RESULTS

In the following, we provide details regarding the setup of our numerical results (Sec.
C 1) and we present further nu- merical results regarding parameter regularization in quantum models with fixed encoding (Sec. C 2). 1. Numerical setup All numerical simulations within this work were performed using the PYTHON QML library PennyLane [63]. As device, we used the noiseless simulator “lightning.qubit” together with the adjoint differentiation method, to enable fast and memory efficient gradient computations. In order to solve the optimization problem in (9), we apply the ADAM optimizer using a learning rate of η = 0.1 and the suggested values for all other hyperparameters [64]. Furthermore, we run 200 epochs throughout and train 12 models based on different ini- tial parameters for varying regularization parameters λ � 0. Adding the regularization does not introduce significant com- putational overhead as the evaluation of the cost only involves a weighted sum of the terms ‖w j‖2. As final model, we take the set of parameters for the model with minimal cost over all runs and epochs. Furthermore, the training as well as the robustness and generalization analysis were parallelized using Dask [65]. For the trainable encoding model, the classical data is en- coded into the quantum circuit with a general URot ∈ SU(2) unitary parametrized by 3 Euler angles URot (α j, β j, γ j ) with α j = w� j,1x + θ j,1, (C1) β j = w� j,2x + θ j,2, (C2) γ j = w� j,3x + θ j,3. (C3) URot in PennyLane is implemented by the following decom- position: URot (α, β, γ ) = RZ (γ )RY (β )RZ (α), = ( e− i 2 (γ+α) cos ( β 2 ) −e i 2 (γ−α) sin ( β 2 ) e− i 2 (γ−α) sin ( β 2 ) e i 2 (γ+α) cos ( β 2 ) ) . (C4) In our numerical case study, we set γ j = 0 for all j since this (1) still allows to reach arbitrary points on the Bloch sphere and (2) enables an easier comparison to fixed-encoding quantum models. In order to introduce entanglement in a hardware-efficient way, we use a ring of CNOTs. The con- sidered circuit is shown in Fig. 5 and involves three layers of rotations and entanglement, which we observed to be a good trade-off between expressivity and generalization. More precisely, in our simulations, fewer layers were not sufficient to accurately solve the classification task, whereas more layers led to higher degrees of overfitting. For the fixed-encoding quantum models as in (14), we use a similar three-layer ansatz, encoding the two entries of the 2D data points into the first and second angle α j , β j of the rotation gates, followed by a parametrized SU(2) rotation with three free parameters that are optimized, compare φ j in (11). We also perform a classical data pre processing, scaling the input domain from [−1,+1] 043326-8 TRAINING ROBUST AND GENERALIZABLE QUANTUM … PHYSICAL REVIEW RESEARCH 6, 043326 (2024) FIG. 6. From left to right, top to bottom: Illustration of the ground truth of the circle classification problem, the decision bound- ary for the fixed encoding model with regularization parameter λ f = 0.0, the trainable encoding model with λt = 0.15 and with λt = 0.0. For the plot, we took the models with the lowest cost over all runs and epochs. Furthermore, the small circles denote the 200 training points. to [−π,+π ], such that the full possible range of the rotation angles can be utilized. In Fig. 6, we plot the ground truth and the decision bound- aries for the two quantum models corresponding to λ = 0.0 and λ = 0.15, as well as the decision boundary for the fixed- encoding model. 
As expected, the decision boundary resulting from the regularized training is significantly smoother than the unregularized one, explaining the superior robustness and generalization of the former. Further, the fixed-encoding model does not accurately capture the ground truth due to its limited expressivity and high Lipschitz bound. 2. Regularization in quantum models with fixed encoding In the main text, we have seen that the Lipschitz bound of the quantum model with fixed encoding f f φ (x) in (14) cannot be adapted by changing the parameters φ. As a result, it is not possible to use Lipschitz bound regularization for improving robustness and generalization. In the following, we investigate whether regularization of φ can instead be used to improve generalization performance. More precisely, we consider the same numerical setup as for our generalization results with FIG. 7. Results for the generalization simulations for the cir- cle classification problem when using the fixed-encoding quantum model (14) and regularization of the parameters φ. The training and test setup are identical to simulations shown in Figs. 2 and 3. We plot the dependency of the test accuracy and the Lipschitz bound on the hyperparameter λ entering the regularized training problem (C5). Further, we plot the test accuracy for the best solution of the unregularized training problem. trainable encoding depicted in Fig. 3. The main difference is that we consider a fixed-encoding quantum model f f φ (x) (compare Fig. 4) which is trained via the following regularized training problem min φ 1 n n∑ k=1 � ( f f φ (xk ), yk ) + λ L∑ j=1 ‖φ j‖2. (C5) The regularization with hyperparameter λ > 0 aims at keeping the norms of the angles φ j small. Figure 7 depicts the test accuracy and Lipschitz bound of the resulting quantum model for different regularization parameters λ. First, note that the Lipschitz bound is indeed constant for all choices of λ due to the fixed encoding. Comparing Figs. 3 and 7, we see that the trainable encoding yields a significantly higher test accuracy in comparison to the fixed encoding. Moreover, the influence of the regularization parameter on the test ac- curacy is much less pronounced for the fixed encoding than for the trainable encoding, confirming our previous discussion on the benefits of trainable encodings. The test accuracy is not entirely independent of λ since (1) regularization of the parameters φ j influences the optimization and can improve or deteriorate convergence and (2) biasing φ towards zero may be beneficial if the underlying ground-truth distribution is better approximated by a quantum model with small values φ. Whether (2) brings practical benefits is, however, highly problem-specific as it depends on the distribution generating the data and on the circuit ansatz. [1] I. J. Goodfellow, J. Shlens, and C. Szegedy, Explaining and harnessing adversarial examples, arXiv:1412.6572. [2] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, Intriguing properties of neural networks, arXiv:1312.6199. [3] E. Wong and Z. Kolter, Provable defenses against adversarial examples via the convex outer adversarial polytope, in Proceedings of the 35th International Conference on Machine Learning (PMLR, 2018), pp. 5283–5292 043326-9 https://arxiv.org/abs/1412.6572 https://arxiv.org/abs/1312.6199 JULIAN BERBERICH et al. PHYSICAL REVIEW RESEARCH 6, 043326 (2024) [4] Y. Tsuzuku, I. Sato, and M. 
Sugiyama, Lipschitz-margin train- ing: Scalable certification of perturbation invariance for deep neural networks, in Proceedings of the Advances in Neu- ral Information Processing Systems (PMLR, 2018), Vol. 80, pp. 6541–6550. [5] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, Towards deep learning models resistant to adversarial attacks, arXiv:1706.06083. [6] A. Krogh and J. Hertz, A simple weight decay can improve generalization, in Advances in Neural Information Processing Systems (PMLR, 1991), Vol. 4. [7] H. Xu and S. Mannor, Robustness and generalization, Mach. Learn. 86, 391 (2012). [8] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, Distillation as a defense to adversarial perturbations against deep neural networks, in Proceedings of the IEEE Symposium on Security and Privacy (SP) (IEEE, Piscataway, NJ, 2016), pp. 582–597. [9] U. von Luxburg and O. Bousquet, Distance-based classifica- tion with Lipschitz functions, J. Mach. Learn. Res. 5, 669 (2004). [10] P. Bartlett, D. J. Foster, and M. Telgarsky, Spectrally- normalized margin bounds for neural networks, in Advances in Neural Information Processing Systems (PMLR, 2017), Vol. 30, pp. 6240–6249. [11] B. Neyshabur, S. Bhojanapalli, D. McAllester, and N. Srebro, Exploring generalization in deep learning, in Advances in Neural Information Processing Systems (PMLR, 2017), pp. 5947–5956. [12] J. Sokolić, R. Giryes, G. Sapiro, and M. R. D. Rodrigues, Robust large margin deep neural networks, IEEE Trans. Signal Process. 65, 4265 (2017). [13] T.-W. Weng, H. Zhang, P.-Y. Chen, J. Yi, D. Su, Y. Gao, C.-J. Hsieh, and L. Daniel, Evaluating the robustness of neural net- works: an extreme value theory approach, in Proceedings of the 6th International Conference Learning Representations (ICLR) (PMLR, 2018). [14] W. Ruan, X. Huang, and M. Kwiatkowska, Reachability anal- ysis of deep neural networks with provable guarantees, in Proceedings of the 27th International Joint Conference Artificial Intelligence (IJCAI) (AAAI Press, Stockholm, Sweden, 2018), pp. 2651–2659. [15] C. Wei and T. Ma, Data-dependent sample complexity of deep neural networks via Lipschitz augmentation, in Advances in Neural Information Processing Systems (2019), pp. 9725–9736. [16] M. Hein and M. Andriushchenko, Formal guarantees on the robustness of a classifier against adversarial manipulation, in Proceedings of the Advances in Neural Information Processing Systems (PMLR, 2017), pp. 2266–2276. [17] H. Gouk, E. Frank, B. Pfahringer, and M. Cree, Regulari- sation of neural networks by enforcing Lipschitz continuity, Mach. Learn. 110, 393 (2021). [18] P. Pauli, A. Koch, J. Berberich, P. Kohler, and F. Allgöwer, Training robust neural networks using Lipschitz bounds, IEEE Control Syst. Lett. 6, 121 (2022). [19] H. Xiao, K. Rasul, and R. Vollgraf, Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms, arXiv:1708.07747. [20] M. Schuld and F. Petruccione, Machine Learning with Quantum Computers (Springer, Berlin, 2021). [21] M. Benedetti, E. Lloyd, S. Sack, and M. Fiorentini, Parame- terized quantum circuits as machine learning models, Quantum Sci. Technol. 4, 043001 (2019). [22] V. Havlíček, A. D. Córcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta, Supervised learning with quantum-enhanced feature spaces, Nature (London) 567, 209 (2019). [23] M. Schuld and N. Killoran, Quantum machine learning in fea- ture Hilbert spaces, Phys. Rev. Lett. 122, 040504 (2019). [24] A. Abbas, D. Sutter, C. Zoufal, A. Lucchi, A. 
Figalli, and S. Woerner, The power of quantum neural networks, Nat. Comput. Sci. 1, 403 (2021). [25] S. Jerbi, L. J. Fiderer, H. P. Nautrup, J. M. Kübler, H. J. Briegel, and V. Dunjko, Quantum machine learning beyond kernel meth- ods, Nat. Commun. 14, 517 (2023). [26] A. Pérez-Salinas, A. Cervera-Lierta, E. Gil-Fuster, and J. I. Latorre, Data re-uploading for a universal quantum classifier, Quantum 4, 226 (2020). [27] https://github.com/daniel-fink-de/training-robust-and- generalizable-quantum-models. [28] M. Schuld, R. Sweke, and J. J. Meyer, Effect of data encoding on the expressive power of variational quantum-machine- learning models, Phys. Rev. A 103, 032430 (2021). [29] J. Preskill, Quantum computing in the NISQ era and beyond, Quantum 2, 79 (2018). [30] R. LaRose and B. Coyle, Robust data encodings for quantum classifiers, Phys. Rev. A 102, 032420 (2020). [31] L. Cincio, K. Rudinger, M. Sarovar, and P. J. Coles, Machine learning of noise-resilient quantum circuits, PRX Quantum 2, 010324 (2021). [32] J. Berberich, D. Fink, and C. Holm, Robustness of quantum algorithms against coherent control errors, Phys. Rev. A 109, 012417 (2024). [33] D. Edwards and D. B. Rawat, Quantum adversarial machine learning: status, challenges and perspectives, in Proceedings of the 2nd IEEE International Conference Trust, Privacy and Se- curity in Intelligent Systems and Applications (TPS-ISA) (IEEE, Piscataway, NJ, 2020), pp. 128–133. [34] M. T. West, S.-L. Tsang, J. S. Low, C. D. Hill, C. Leckie, L. C. L. Hollenberg, S. M. Erfani, and M. Usman, Towards quantum enhanced adversarial robustness in machine learning, Nat. Mach. Intell. 5, 581 (2023). [35] S. Lu, L.-M. Duan, and D.-L. Deng, Quantum adversarial ma- chine learning, Phys. Rev. Res. 2, 033212 (2020). [36] M. T. West, S. M. Erfani, C. Leckie, M. Sevior, L. C. L. Hollenberg, and M. Usman, Benchmarking adversarially robust quantum machine learning at scale, Phys. Rev. Res. 5, 023186 (2023). [37] N. Liu and P. Wittek, Vulnerability of quantum classification to adversarial perturbations, Phys. Rev. A 101, 062331 (2020). [38] H. Liao, I. Convy, W. J. Huggins, and K. B. Whaley, Robust in practice: adversarial attacks on quantum machine learning, Phys. Rev. A 103, 042427 (2021). [39] Y. Du, M.-H. Hsieh, T. Liu, D. Tao, and N. Liu, Quantum noise protects quantum classifiers against adversaries, Phys. Rev. Res. 3, 023153 (2021). [40] J. Guan, W. Fang, and M. Ying, Robustness verification of quantum classifiers, in Proceedings of the International Con- ference Computer Aided Verification (Springer, Cham, 2021), pp. 151–174. 
043326-10 https://arxiv.org/abs/1706.06083 https://doi.org/10.1007/s10994-011-5268-1 https://www.jmlr.org/papers/v5/luxburg04b.html https://doi.org/10.1109/TSP.2017.2708039 https://doi.org/10.1007/s10994-020-05929-w https://doi.org/10.1109/LCSYS.2021.3050444 https://arxiv.org/abs/1708.07747 https://doi.org/10.1088/2058-9565/ab4eb5 https://doi.org/10.1038/s41586-019-0980-2 https://doi.org/10.1103/PhysRevLett.122.040504 https://doi.org/10.1038/s43588-021-00084-1 https://doi.org/10.1038/s41467-023-36159-y https://doi.org/10.22331/q-2020-02-06-226 https://github.com/daniel-fink-de/training-robust-and-generalizable-quantum-models https://doi.org/10.1103/PhysRevA.103.032430 https://doi.org/10.22331/q-2018-08-06-79 https://doi.org/10.1103/PhysRevA.102.032420 https://doi.org/10.1103/PRXQuantum.2.010324 https://doi.org/10.1103/PhysRevA.109.012417 https://doi.org/10.1038/s42256-023-00661-1 https://doi.org/10.1103/PhysRevResearch.2.033212 https://doi.org/10.1103/PhysRevResearch.5.023186 https://doi.org/10.1103/PhysRevA.101.062331 https://doi.org/10.1103/PhysRevA.103.042427 https://doi.org/10.1103/PhysRevResearch.3.023153 TRAINING ROBUST AND GENERALIZABLE QUANTUM … PHYSICAL REVIEW RESEARCH 6, 043326 (2024) [41] M. Weber, N. Liu, B. Li, C. Zhang, and Z. Zhao, Optimal provable robustness of quantum classification via quantum hy- pothesis testing, npj Quantum. Inf. 7, 76 (2021). [42] W. Gong and D.-L. Deng, Universal adversarial examples and perturbations for quantum classifiers, Natl. Sci. Rev. 9, nwab130 (2022). [43] W. Ren, W. Li, S. Xu, K. Wang, W. Jiang, F. Jin, X. Zhu, J. Chen, Z. Song, P. Zhang, H. Dong, X. Zhang, J. Deng, Y. Gao, C. Zhang, Y. Wu, B. Zhang, Q. Guo, H. Li, Z. Wang, et al., Experimental quantum adversarial learning with programmable superconducting qubits, Nat. Comput. Sci. 2, 711 (2022). [44] M. Cerezo, G. Verdon, H.-Y. Huang, L. Cincio, and P. J. Coles, Challenges and opportunities in quantum machine learning, Nat. Comput. Sci. 2, 567 (2022). [45] E. Peters and M. Schuld, Generalization despite overfitting in quantum machine learning models, Quantum 7, 1210 (2023). [46] H.-Y. Huang, M. Bourghton, M. Mohseni, R. Babbush, S. Boixo, H. Neven, and J. R. McClean, Power of data in quantum machine learning, Nat. Commun. 12, 2631 (2021). [47] L. Banchi, J. Pereira, and S. Pirandola, Generalization in quantum machine learning: A quantum information standpoint, PRX Quantum 2, 040321 (2021). [48] M. C. Caro, E. Gil-Fuster, J. J. Meyer, J. Eisert, and R. Sweke, Encoding-dependent generalization bounds for parametrized quantum circuits, Quantum 5, 582 (2021). [49] S. Jerbi, C. Gyurik, S. C. Marshall, R. Molteni, and V. Dunjko, Shadows of quantum machine learning, Nat. Commun. 15, 5676 (2024). [50] E. Gil-Fuster, J. Eisert, and C. Bravo-Prieto, Understanding quantum machine learning also requires rethinking generaliza- tion, Nat. Commun. 15, 2277 (2024). [51] E. Farhi, J. Goldstone, and S. Gutmann, A quantum approxi- mate optimization algorithm, arXiv:1411.4028. [52] S. Hadfield, Z. Wang, E. G. Rieffel, B. O’Gorman, D. Venturelli, and R. Biswas, Quantum approximate optimization with hard and soft constraints, in Proceedings of the Second International Workshop on Post Moores Era Supercomputing (ACM Press, New York, NY, 2017), pp. 15–21. [53] S. Ahmed, Tutorial: Data reuploading circuits (2021), https:// pennylane.ai/qml/demos/tutorial_data_reuploading_classifier/. [54] P. Huembeli and A. Dauphin, Characterizing the loss landscape of variational quantum algorithms, Quantum Sci. Technol. 
6, 025011 (2021). [55] F. J. Gil Vidal and D. O. Theis, Input redundancy for parame- terized quantum circuits, Front. Phys. 8, 297 (2020). [56] E. Ovalle-Magallanes, D. E. Alvarado-Carrillo, J. G. Avina- Cervantes, I. Cruz-Aceves, and J. Ruiz-Pinales, Quantum angle encoding with learnable rotation applied to quantum- classical convolutional neural networks, Appl. Soft Comput. 141, 110307 (2023). [57] B. Jaderberg, A. A. Gentile, Y. A. Berrada, E. Shishenina, and V. E. Elfving, Let quantum neural networks choose their own frequencies, Phys. Rev. A 109, 042421 (2024). [58] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Quantum circuit learning, Phys. Rev. A 98, 032309 (2018). [59] M. Schuld, A. Bocharov, K. M. Svore, and N. Wiebe, Circuit-centric quantum classifiers, Phys. Rev. A 101, 032308 (2020). [60] Y. Du, M.-H. Hsieh, T. Liu, S. You, and D. Tao, Learnabil- ity of quantum neural networks, PRX Quantum 2, 040337 (2021). [61] T. M. Apostol, Mathematical Analysis, 2nd ed. (Pearson Edu- cation, 1974). [62] M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning, Adaptive Computation and Machine Learn- ing, 2nd ed. (MIT Press, Cambridge, MA, 2018). [63] V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, S. Ahmed, V. Ajith, M. S. Alam, G. Alonso-Linaje, B. AkashNarayanan, A. Asadi, J. M. Arrazola, U. Azad, S. Banning, C. Blank, T. R. Bromley, B. A. Cordier, J. Ceroni, A. Delgado, O. D. Matteo, A. Dusko et al., Pennylane: Automatic differentiation of hybrid quantum-classical computations, arXiv:1811.04968. [64] D. P. Kingma and J. Ba, Adam: A method for stochastic opti- mization, arXiv:1412.6980. [65] Dask Development Team, Dask: Library for dynamic task scheduling, https://dask.org. 043326-11 https://doi.org/10.1038/s41534-021-00410-5 https://doi.org/10.1093/nsr/nwab130 https://doi.org/10.1038/s43588-022-00351-9 https://doi.org/10.1038/s43588-022-00311-3 https://doi.org/10.22331/q-2023-12-20-1210 https://doi.org/10.1038/s41467-021-22539-9 https://doi.org/10.1103/PRXQuantum.2.040321 https://doi.org/10.22331/q-2021-11-17-582 https://doi.org/10.1038/s41467-024-49877-8 https://doi.org/10.1038/s41467-024-45882-z https://arxiv.org/abs/1411.4028 https://pennylane.ai/qml/demos/tutorial_data_reuploading_classifier/ https://doi.org/10.1088/2058-9565/abdbc9 https://doi.org/10.3389/fphy.2020.00297 https://doi.org/10.1016/j.asoc.2023.110307 https://doi.org/10.1103/PhysRevA.109.042421 https://doi.org/10.1103/PhysRevA.98.032309 https://doi.org/10.1103/PhysRevA.101.032308 https://doi.org/10.1103/PRXQuantum.2.040337 https://arxiv.org/abs/1811.04968 https://arxiv.org/abs/1412.6980 https://dask.org