Computational Optimization and Applications (2025) 91:1263–1308 https://doi.org/10.1007/s10589-025-00684-x On the forward–backward method with nonmonotone linesearch for infinite-dimensional nonsmooth nonconvex problems Behzad Azmi1 ·Marco Bernreuther2 Received: 31 May 2023 / Accepted: 9 April 2025 / Published online: 8 May 2025 © The Author(s) 2025 Abstract This paper provides a comprehensive study of the nonmonotone forward–backward splitting (FBS) method for solving a class of nonsmooth composite problems in Hilbert spaces. The objective function is the sum of a Fréchet differentiable (not necessarily convex) function and a proper lower semicontinuous convex (not neces- sarily smooth) function. These problems appear, for example, frequently in the context of optimal control of nonlinear partial differential equations (PDEs) with nonsmooth sparsity-promoting cost functionals. We discuss the convergence and complexity of FBS equipped with the nonmonotone linesearch under different conditions. In par- ticular, R-linear convergence will be derived under quadratic growth-type conditions. We also investigate the applicability of the algorithm to problems governed by PDEs. Numerical experiments are also given that justify our theoretical findings. Keywords Nonsmooth nonconvex optimization · Forward–backward algorithm · Infinite-dimensional problems · Nonmonotone linesearch · Quadratic growth · PDE-constrained optimization Mathematics Subject Classification 90C26 · 49M41 · 65K05 · 65K15 · 49M37 · 65J22 B Behzad Azmi behzad.azmi@uni-konstanz.de Marco Bernreuther marco.bernreuther@iff.uni-stuttgart.de 1 Department of Mathematics and Statistics, University of Konstanz, Universitätsstraße 10, 78457 Konstanz, Germany 2 Institute for Industrial Manufacturing and Management IFF, University of Stuttgart, Allmandring 35, 70569 Stuttgart, Germany 123 http://crossmark.crossref.org/dialog/?doi=10.1007/s10589-025-00684-x&domain=pdf 1264 B. Azmi, M. Bernreuther 1 Introduction In this work, we are concerned with the following composite problem min u∈H �(u) := F(u) + R(u), (P) where H is a real Hilbert space, F is a continuously Fréchet differentiable function (possibly nonconvex), andR is a convex function whose proximal operator is assumed to be explicitly computable. The precise details will be introduced in Sect. 2. Problems of the form (P) appear in several fields of application such as optimal control problems [1, 2], system identification, signal, and image processing [3], machine learning, and statistics [4]. Arguably one of the most well-known algorithms for solving problem (P) is forward–backward splitting (FBS), also known as the proximal gradient method [3, 5], which is the generalization of the classical gradient method for problems with an additional nonsmooth term (see (7)). The iterations of FBS are defined by uk+1 = Prox 1 αk R ( uk − 1 αk ∇F(uk) ) for k ∈ N0, (1) where the step-sizes αk > 0 are supposed to be chosen in a way that guarantees convergence of the algorithm and accelerates it. It is known that the global convergence of FBS is to be sublinear of order (1/k) for the convex case [5], where k stands for the number of iterations. This order can be improved to (1/k2) using an inertial variant of the algorithm based on Nesterov’s accelerated techniques [6, 7]. Convergence of the iterates of FBS to a critical point of problem (P), even for the nonconvex case, has been shown for functions� satisfying the Kurdyka–Łojasiewicz property e.g., [8–12] or quadratic growth error bounds e.g., [13–16]. Due to the simplicity and efficiency of FBS, its convergence, complexity, and applicability have been extensively studied, making it an ongoing area of active research. Numerous step-size strategies have been proposed and analyzed within the context of FBS under various assumptions and structures. For structured nonconvex finite-dimensional problems, works such as [17–19] have focused on the inertial method and constant step-size strategies, while [20–25] have explored linesearch strategies based on function evaluations, as well as quasi-Newton and Newton-type directions. Additionally, for problem (P) posed in an infinite-dimensional Hilbert space, references such as [13, 26–31] have investigated FBS assuming the convexity of F , with [31] also addressing D.C. programming. In case where F is not necessarily convex, we can also mention [32, 33], which explore a class of inexact trust region methods to ensure the convergence of FBS and apply them to problems governed by PDEs. As is well-known for smooth problems, nonmonotone linesearch strategies appear to be numerically efficient in situations where a monotone scheme is forced to prop- agate along the bottom of a narrow, curved valley. Additionally, because they allow for increases in function values, they can be effectively combined with spectral gradi- ent methods, such as the Barzilai-Borwein (BB) step-sizes [34, 35], which naturally exhibit nonmonotone behavior. Building on the nonmonotone approach developed by 123 On the forward–backward method with nonmonotone… 1265 Grippo et al. [36], Wright et al. [37] proposed a nonmonotone FBS for nonsmooth convex composite problems posed in R n . The convergence of this approach, known as SpaRSA, was further studied in [38], where the authors demonstrated the order of (1/k) convergence and R-linear convergence for convex and strongly convex func- tions, respectively. Very recently, in [39], the authors proved convergence to a critical point of this scheme for finite-dimensional nonsmooth nonconvex composite functions under milder conditions, specifically, local Lipschitz continuity of the gradient for the smooth part and uniform continuity for the objective function on its level sets. From a computational perspective, both classical gradient and FBS methods can be accelerated if incorporated with the BB step-sizes. These step-sizes approximate the curvature of the Hessian for the smooth part of the objective function and are partic- ularly efficient in the context of PDE-constrained optimization [34, 40]. For strongly quadratic functionsF , R-linear convergence of the BB step-sizes has been established for (P) provided the condition number of the quadratic operator is sufficiently small (< 2); see [34, Remark 3.4]. However, even for strongly quadratic functions F with condition numbers larger than 2, convergence of the BB step-sizes is not clear. Thus, to guarantee convergence, one needs to combine the BB step-sizes with a nonmonotone linesearch, which relies on function evaluations. Weak and strong convergence can be distinguished only in the infinite-dimensional setting. In practice, numerical solutions to infinite-dimensional problems are typically obtained by implementing algorithms for finite-dimensional approximations. How- ever, analyzing the convergence of these algorithms in the infinite-dimensional context is crucial for ensuring the numerical robustness and stability of the finite-dimensional approximations. Such properties are expressed by the so-called mesh-independent principle (MIP); see, e.g., [40–46]. Roughly speaking, MIP uses results from infinite- dimensional convergence to predict the convergence properties of finite-dimensional approximations (discretized problems). Additionally, MIP offers a theoretical foun- dation for developing refinement strategies; see, e.g., [47]. In light of the above discussion, we investigate the well-posedness, convergence, and complexity of the nonmonotone FBS for problems of the form (P) posed in infinite-dimensional Hilbert spaces, under various assumptions such as nonconvex- ity, convexity, and quadratic growth-type conditions. These results also apply to finite-dimensional problems, where our convergence and complexity analyses and assumptions differ from those in previous works [37–39]. Notably, the convergence of this approach under quadratic growth-type conditions, and its worst-case evaluation complexity for reaching an approximate stationary point have not been previously investigated. We will clearly outline our contributions in the next section. In particular, we investigate the applicability of the nonmonotone FBS for prob- lems governed by PDEs. Despite its numerical efficiency, to the best of our knowledge, this approach has not yet been studied for infinite-dimensional problems. For these problems, due to the lack of compactness in the strong topology, we need to explore convergence in the weak topology, which involves studying notions of sequential continuity for some operators with respect to the weak topology. Here, the nonmono- tonicity of the approach, and working with the weak topology are two issues that we overcomewith our analysis using the Lipschitz continuity of the gradientmapping (see Definition 1). We will also highlight the challenges and differences that arise when 123 1266 B. Azmi, M. Bernreuther considering infinite-dimensional spaces after each proof. Additionally, we investigate and discuss the applicability of our assumptions and, consequently, our results to two examples of nonsmooth nonconvex problems with PDEs. Contributions More precisely, the contributions of this work can be summarized as follows: (i) Starting with the nonconvex case, we prove well-posedness of the algorithm evenwithout Lipschitz continuity of the gradient ofF . Under the global Lipschitz continuity of the gradient ofF , we prove the global convergence of the algorithm with complexity (1/ √ k). To be more precise, we show that the norm of the prox- gradient mapping of iterates vanishes with complexity (1/ √ k).We also establish that every weak sequential cluster point of iterates is a stationary point. (ii) We derive the worst-case evaluation complexity of finding an εtol-stationary point. More precisely, we give estimates on the maximal number of the objec- tive function and prox-grad operator evaluations for computing an approximate stationary point with a user-defined accuracy threshold εtol > 0. (iii) In the convex setting, relying on the concept of quasi-Fejer sequences,we are able to extend previously established results to global convergence, both in terms of function values and iterateswith respect to theweak sequential topology. Further, we show that the convergence is sublinear of order (1/k) in function values. (iv) Under quadratic growth-type conditions, we show global R-linear convergence, both in terms of function values and iterates. The proof of the latter is more deli- cate for infinite-dimensional problems since the transition from weak sequential convergence to strong convergence is not straightforward. (v) Finally, aiming at optimizationproblemsgovernedbyPDEs,wediscuss the valid- ity of the convergence results of the algorithmwithout strongLipschitz continuity of ∇F . Our theoretical framework is supported by two nonsmooth nonconvex problems governed by PDEs, including semilinear elliptic and parabolic equa- tions. We also show that our results are applicable to these problems and report on related numerical experiments. Outline of the paper The rest of this paper is organized as follows: Sect. 2 presents the preliminaries, assumptions on the optimization problem (P), the algorithm, and the nonmonotone linesearch strategy. Section 3 investigates the convergence and complexity of the algo- rithmcomprehensively under different conditions. Section 4 discusses the applicability of the results from the previous section to PDE-constrained optimization problems. Finally, numerical experiments are reported in Sect. 5 that justify our theoretical find- ings. To improve the readability of the paper, we provide proof of some results from Sect. 3 in Appendix A. 123 On the forward–backward method with nonmonotone… 1267 Notation Throughout this paper, the Hilbert space H is endowed with the scalar product (·, ·)H and the induced norm ‖ · ‖H . For a radius r > 0 and ū ∈ H , we define Br (ū) := {u ∈ H : ‖u−ū‖H < r}.We also denotewith PS : H → H , the orthogonal projection onto the set S ⊂ H . Further for every �̃ ∈ R,wedefine [ � ≤ �̃ ] := {u ∈ H : �(u) ≤ �̃}. ByArgmin� we denote the set ofminimizers of�. Sometimeswewill use set-valued (in-)equalities, i.e. A ≤ B for A, B ⊂ H , meaning that a ≤ b for all a ∈ A and b ∈ B. We call a sequence {uk}k ⊂ H quasi-Féjer monotone with respect to a non-empty set S ⊂ H , if for every v ∈ S it holds that ‖uk+1 − v‖2H ≤ ‖uk − v‖2H + εk, where {εk}k ⊂ R≥0 is a summable sequence, i.e., ∑∞ k=0 εk < ∞. To avoid confusion in the notation, the derivative and gradient of F are defined by F ′ : H → H ′ and ∇F : H → H , respectively. In this case, for every u ∈ H we identify ∇F(u) ∈ H by the unique Riesz representative of F ′(u) ∈ H ′, with H ′ denoting the dual space of H . We also use the notion of the convex subdifferential for convex function φ : H → R ∪ {+∞} at u ∈ dom φ := {u ∈ H : φ(u) < +∞} defined as ∂φ(u) := {w ∈ H : φ(v) − φ(u) ≥ (w, v − u)H for all v ∈ H}. 2 Problem formulation and algorithm In this section, the precise theoretical framework and the algorithmwill be introduced. First, we state the precise assumptions on (P). Assumption 1 For problem (P), A1: R : H → R ∪ {+∞} is proper, convex, and lower semicontinuous. A2: F : H → R ∪ {+∞} is continuously Fréchet differentiable on int(domF) con- taining domR, that is, domR ⊆ int(domF). A3: ∇F : int(domF) → H is globally LF ′-Lipschitz continuous. A4: The gradient ∇F : int(domF) → H is weak-to-strong sequentially continuous. Conditions A1–A2 are standard in order to guarantee the well-posedness of the algo- rithm. A3 will be used to derive complexity results for iterations and we will discuss the relaxation of this condition for a large class of problems governed by PDEs later in Sect. 3.4. We use A4 to show that every weak sequential accumulation point of iterates belongs to S∗. Note that, if dim(H) < ∞, A4 follows from A2. According to Assumption 1, the Fermat principle [48, Proposition 9.1.5] for (P) reads as follows: If u∗ ∈ H is a local minimizer of �, then − ∇F(u∗) ∈ ∂R(u∗). (2) 123 1268 B. Azmi, M. Bernreuther Further, with similar arguments as in the proof of [26, Theorem 26.2], the condition (2) can be equivalently expressed as u∗ = Prox 1 α R(u∗ − 1 α ∇F(u∗)) for some α > 0, (3) where the proximal operator Prox 1 α R : H → H is defined by Prox 1 α R(u) := argmin v∈H ( R(v) + α 2 ‖v − u‖2H ) . This operator is well-defined due to A1. We also define the set of critical points by S∗ := {u ∈ H : −∇F(u) ∈ ∂R(u)}. Now we turn our attention towards formalizing (1) and the proposed step-size update strategy by a nonmonotone linesearch method. Therefore we introduce the prox-grad operator and the gradient mapping in the Hilbert space setting analogously to, e.g., [5, 14]. Definition 1 For every α ∈ R>0, we define 1. the prox-grad operator Tα : int(domF) → domR with u �→ Prox 1 α R(u − 1 α ∇F(u)). 2. the gradient mapping Gα : int(domF) → H with u �→ α(u − Tα(u)). Due to Definition 1 and by some simple computations, we obtain for every u ∈ int(domF) that Gα(u) − ∇F(u) ∈ ∂R(Tα(u)). (4) Further, if we define for every u ∈ int(domF), w ∈ H , and α ∈ R>0 Qα(w, u) := F(u) + (∇F(u), w − u)H + α 2 ‖w − u‖2H + R(w), then Tα(u) is the unique minimizer of Qα(·, u), i.e. Tα(u) = argmin w∈H Qα(w, u). As a consequence, we can write (∇F(u), Tα(u) − u)H + 1 2α ‖Gα(u)‖2H + R(Tα(u)) ≤ R(u). (5) By means of the prox-grad operator and the gradient mapping, the iterations (1) can be reformulated as follows: 123 On the forward–backward method with nonmonotone… 1269 uk+1 = Tαk (uk) for k ∈ N0, (6) uk+1 = uk − 1 αk Gαk (uk) for k ∈ N0. (7) In this case, for every u0 ∈ int(domF), the iterations are well-defined, that is {uk}k ⊂ domR ⊆ H . Further, due to (7), we can see the analogy between the proximal gradient method and classical gradient descent for smooth minimization. Using (3) and the notion of the gradient mapping, we also can characterize the critical points of (P), as follows. Proposition 1 For u∗ ∈ int(domF), it holds that Gα(u∗) = 0 for some α ∈ R>0 if and only if u∗ ∈ S∗. ThereforeGα(ū) = 0 defines a natural termination condition analogously to the smooth case. There are many possible choices for the step-size αk in our iterative scheme. In this work, we will consider step-size updates by a BB-type update rule or a combination of them with a nonmonotone linesearch. To be more precise, as the initial trial step- size within the nonmonotone linesearch, we will choose one of the spectral gradient BB-types step-sizes defined by αBB1a k := (uk−uk−1,∇F(uk )−∇F(uk−1))H (uk−uk−1,uk−uk−1)H , αBB2a k := (∇F(uk )−∇F(uk−1),∇F(uk )−∇F(uk−1))H (uk−uk−1,∇F(uk )−∇F(uk−1))H , αBB1b k := (uk−uk−1,Gαk−1 (uk )−Gαk−1 (uk−1))H (uk−uk−1,uk−uk−1)H , αBB2b k := (Gαk−1 (uk )−Gαk−1 (uk−1),Gαk−1 (uk )−Gαk−1 (uk−1))H (uk−uk−1,Gαk−1 (uk )−Gαk−1 (uk−1))H . The first two strategies correspond to the BB-method presented e.g. in [40]. The last two novel BB-methods modify the classical BB-method and try to incorporate full first-order information by using the gradient mapping and not only the gradient of F . Furthermore, we will use so-called alternating BB-type update rules given by αABBa := αBB1a for k even and αBB2a for k odd, αABBb := αBB1b for k even and αBB2b for k odd. Remark 1 In the case of R = 0, we have Gl(u) = ∇F(u) for every l > 0 and u ∈ H . Therefore, many of the introduced step-size updates above are identical, i.e. αBB1b = αBB1a, αBB2b = αBB2a, and αABBb = αABBa. For the nonmonotone linesearch update, we consider �(uk+1) ≤ max 0≤ j≤m(k) �(uk− j ) − δ αk ‖Gαk (uk)‖2H , (8) 123 1270 B. Azmi, M. Bernreuther where 0 < δ < 1 and memory m : N0 → N0 satisfies m(0) = 0 and m(k) = min{m(k − 1) + 1, mmax} for k ∈ N, (9) with a given upper bound mmax ∈ N0. Similarly to [49], we also define the functions : N0 → N0 and ν : N0 → N0 with (k) := k − arg max 0≤ j≤m(k) �(uk− j ) for k ≥ 0 with k − m(k) ≤ (k) ≤ k, ν(k) := (kmmax + k) for k ≥ 0. Thus, by these notations, we have max 0≤ j≤m(k) �(uk− j ) = �(u (k)) and (8) can be rewritten as �(uk+1) ≤ �(u (k)) − δ αk ‖Gαk (uk)‖2H . (10) Further we set αk = αint,kη ik , (11) where η > 1. The initial step-size αint,k > 0 is chosen by the BB-method and a lower bound αinf > 0 is given to ensure nonnegativity. To limit the initial step-size numerically, also an upper bound αsup > αinf is employed. In the update, we choose the smallest integer ik ∈ N0 in (11), such that (8) is satisfied. This procedure is summarized in Algorithm 1. Algorithm 1 Nonmonotone FBS Input: 0 < δ < 1, mmax ∈ N0, η > 1, αsup > αinf > 0, u0 ∈ H , and α0 > 0. Output: A stationary point u∗ ∈ H of (P). 1: Set k = 0; 2: while ‖Gαk (uk )‖H > 0 do 3: if k = 0 then 4: Set αint,k := max{αinf ,min{αsup, α0}; 5: else 6: Compute αBBk according to a BB-method and set αint,k := max{αinf ,min{αsup, αBBk }}; 7: end if 8: Set αk = αint,kηik , where ik ≥ 0 is the smallest integer for which (8) holds; 9: Set uk+1 = uk − 1 αk Gαk (uk ) and k = k + 1 ; 10: end while Note that by Proposition 1, the criterion ‖Gαk (uk)‖H ≤ εtol with some tolerance εtol > 0 is a reasonable stopping criterion in the numerical realization of Algorithm 1. 123 On the forward–backward method with nonmonotone… 1271 3 Convergence and complexity analysis In this section, we present a detailed convergence and complexity analysis for Algo- rithm 1 under different assumptions and conditions. 3.1 General case We start by summarizing useful properties of the gradient mapping. Lemma 2 Suppose that A1–A2 hold, then we have the following properties: P1 For every l1 ≥ l2 > 0 and u ∈ int(domF) it holds that 1 l1 ‖Gl1(u)‖H ≤ 1 l2 ‖Gl2(u)‖H . P2 For every l1 ≥ l2 > 0 and u ∈ int(domF) it holds that ‖Gl1(u)‖H ≥ ‖Gl2(u)‖H . P3 Assume in addition that A3 holds, then the gradient mapping is Lipschitz contin- uous, that is ‖Gl(u) − Gl(v)‖H ≤ (2l + LF ′)‖u − v‖H , for every l > 0 and v, u ∈ int(domF). Proof The proof follows using the same arguments, e.g., given in [5, Theorem 10.9, Lemma 10.10] for finite-dimensional problems. �� Furthermore, thewell-known sufficient decrease condition can be formulated for the gradient mapping analogously to the finite-dimensional case presented in [5, Lemma 10.4]. Lemma 3 (Sufficient decrease lemma) Suppose that A1–A3 hold, then for every u ∈ int(domF) and l ∈ ( LF ′ 2 ,∞), we have �(Tl(u)) ≤ �(u) − l − LF ′ 2 l2 ‖Gl(u)‖2H . The previous lemmas allow us to summarize some important properties of Algo- rithm 1. Lemma 4 Suppose that A1–A2 hold. Then for every k ≥ 0, the following statements hold true: (i) For δ ∈ (0, 1 2 ), the nonmonotone linesearch is well-defined. That is, there exists α > 0 such that for every uk ∈ int(domF) and αk ∈ [α,∞) the nonmonotone rule (8) holds and thus, the nonmonotone linesearch terminates after finitely many iterations. Assume, in addition, A3 holds. (i) Then the statement of (i) is true for α := LF ′ 2(1−δ) and every fixed δ ∈ (0, 1). 123 1272 B. Azmi, M. Bernreuther (ii) The step-sizes are uniformly bounded from above. That is, for every k ≥ 1, we have αk ≤ α with α := max{ ηLF ′ 2(1−δ) , αsup}. (iii) It holds ‖Gαk+1(uk+1)‖H ≤ CG‖Gαk (uk)‖H , where CG := 3α+LF ′ αinf . Proof (i) If uk ∈ S∗, then the claim clearly holds true for every α > 0. Thus, we assume that uk /∈ S∗. We suppose also on contrary there does not exist α > 0 for which the claim holds true. Then the nonmonotone linesearch generates a sequence of step-sizes αk,i := αint,kη i with i ∈ N0 satisfying αk,i → ∞ and δ αk,i ‖Gαk,i (uk)‖2H > �(u (k)) − �(Tαk,i (uk)) ≥ �(uk) − �(Tαk,i (uk)). (12) First, we note that Tαk,i (uk) → uk as i → ∞. This follows from the fact that Prox 1 αk,i R(uk) → uk as i → ∞ (see e.g., [26, Theorem 23.47]) and ‖Tαk,i (uk) − uk‖H ≤ ‖Prox 1 αk,i R(uk − 1 αk,i ∇F(uk)) − Prox 1 αk,i R(uk)‖H + ‖Prox 1 αk,i R(uk) − uk‖H ≤ ‖Prox 1 αk,i R(uk) − uk‖H + 1 αk,i ‖∇F(uk)‖H , wherewe have used the firm nonexpansiveness of the proximal operator. Further, using (12) and the mean value theorem for F , we obtain for every i ∈ N0, that δ αk,i ‖Gαk,i (uk)‖2H ≥ �(uk) − �(Tαk,i (uk)) ≥ R(uk) − R(Tαk,i (uk)) + (∇F(uk + ti (Tαk,i (uk) − uk)), uk − Tαk,i (uk))H , where ti ∈ (0, 1) for all i ∈ N0. Using (5) we obtain that δ αk,i ‖Gαk,i (uk)‖2H ≥ 1 2αk,i ‖Gαk,i (uk)‖2H + (∇F ( uk + ti (Tαk,i (uk) − uk) )− ∇F(uk), Tαk,i (uk) − uk ) H ≥ 1 2αk,i ‖Gαk,i (uk)‖2H − 1 αk,i ‖∇F ( uk+ti (Tαk,i (uk) − uk) )−∇F(uk)‖H ‖Gαk,i (uk)‖H . Together with the fact that δ < 1 2 , we obtain ( 12 − δ)‖Gαk,i (uk)‖H ≤ ‖∇F ( uk + ti (Tαk,i (uk) − uk) )− ∇F(uk)‖H . Sending i → ∞ and using the continuity of ∇F , we obtain that lim i→∞ ‖Gαk,i (uk)‖H = 0. Finally, using P2, we can infer that ‖Gαinf (uk)‖H ≤ ‖Gαk,i (uk)‖H and thus ‖Gαinf (uk)‖H = 0. Now Proposition 1 implies uk ∈ S∗. This contradicts our assump- tion, which concludes the proof. 123 On the forward–backward method with nonmonotone… 1273 (ii) For every given αk ≥ α > LF ′ 2 we can invoke Lemma 3 and write �(Tαk (uk)) ≤ �(uk) − αk− LF ′ 2 α2 k ‖Gαk (uk)‖2H ≤ max 0≤ j≤m(k) �(uk− j ) − αk− LF ′ 2 α2 k ‖Gαk (uk)‖2H . Thus, (8) holds since αk satisfies αk− LF ′ 2 α2 k ≥ δ αk by assumption. (iii) To derive α, we consider the following cases: • ik = 0: In this case, due to (11), we have αk = αint,k ≤ αsup. • ik ≥ 1: In this case, (8) holds for αk and uk . Then due to (11), we can write �(Tαk η (uk)) > max 0≤ j≤m(k) �(uk− j ) − δη αk ‖Gαk η (uk)‖2H ≥ �(uk) − δη αk ‖Gαk η (uk)‖2H . (13) Further, by assumingwithout loss of generality that αk η ≥ LF ′ 2 ,we canuseLemma3 with l = αk η and u = uk to obtain �(Tαk η (uk)) ≤ �(uk) − αk η − LF ′ 2 ( αk η )2 ‖Gαk η (uk)‖2H . (14) Combining (13) and (14), we obtain δη αk > αk η − LF ′ 2 ( αk η )2 and as a consequence, αk < ηLF ′ 2(1−δ) . Summarizing the two above cases, we can conclude the second part with α := max{ ηLF ′ 2(1−δ) , αsup}. (iv) Finally for proving the last part, we use Lemma 2 and (iii) to obtain that ‖Gαk+1(uk+1)‖H P2≤ ‖Gα(uk+1)‖H ≤ ‖Gα(uk+1) − Gα(uk)‖H + ‖Gα(uk)‖H P3≤ (2α + LF ′)‖uk+1 − uk‖H +‖Gα(uk)‖H Def. 1≤ (2α+LF ′ ) αk ‖Gαk (uk)‖H +‖Gα(uk)‖H P1≤ (3α+LF ′ ) αk ‖Gαk (uk)‖H ≤ 3α+LF ′ αinf ‖Gαk (uk)‖H . Setting CG := 3α+LF ′ αinf , the proof is complete. �� Before stating the main convergence result of this section, we present a lemma concerning the subsequence {uν(k)}k and well-posedness of Algorithm 1. 123 1274 B. Azmi, M. Bernreuther Lemma 5 Suppose that the sequences {uk}k and {αk}k ⊂ R>0 are generated by Algo- rithm 1. Then the following properties hold: L1 {uν(k)}k is a subsequence of {uk}k with ν(k) − ν(k − 1) ≤ 2mmax + 1 and ν(k) ≤ (mmax + 1)k. Further, for every k ∈ N0, it holds �(uk) ≤ �(u ν( ⌈ k mmax+1 ⌉ ) ). L2 For every k ≥ 1, we have �(uν(k)) ≤ �(uν(k−1)) − δ αν(k)−1 ‖Gν(k)−1(uν(k)−1)‖2H , (15) and in particular the subsequence {�(uν(k))}k is monotonically decreasing. L3 Assume that � is bounded from below, i.e. inf u∈H �(u) > −∞. Then we have ∞∑ k=1 1 αν(k)−1 ‖Gν(k)−1(uν(k)−1)‖2H < ∞ and lim inf k→∞ 1 αk ‖Gαk (uk)‖2H = 0. (16) In particular, if mmax = 0, we have ∞∑ k=0 1 αk ‖Gαk (uk)‖2H < ∞ and lim k→∞ 1 αk ‖Gαk (uk)‖2H = 0. (17) Proof (L1) Using the fact that (k) ≥ k − mmax for every k, we obtain ν(k) = (kmmax + k) ≥ kmmax + k − mmax > (k − 1)mmax + (k − 1) ≥ ((k − 1)mmax + (k − 1)) = ν(k − 1), and thus {uν(k)}k ⊂ {uk}k . Moreover, we have ν(k) = (kmmax + k) ≤ kmmax + k, and ν(k) − ν(k − 1) = (kmmax + k) − ((k − 1)mmax + (k − 1)) ≤ kmmax + k − ((k − 1)mmax + (k − 1) − mmax) = 2mmax + 1. Further, for a given k ∈ N0, we have 0 ≤ ⌈ k mmax+1 ⌉ (mmax +1)−k ≤ mmax, and thus �(uk) ≤ �(u ( ⌈ k mmax+1 ⌉ (mmax+1)) ) = �(u ν( ⌈ k mmax+1 ⌉ ) ). This completes the verification of L1. (L2) Inserting ν(k) − 1 in place of k in (10), we obtain �(uν(k)) ≤ �(u (ν(k)−1)) − δ αν(k)−1 ‖Gαν(k)−1(uν(k)−1)‖2H . (18) 123 On the forward–backward method with nonmonotone… 1275 Further, we can write �(u (k+1)) = max 0≤ j≤m(k+1) �(uk+1− j ) ≤ max 0≤ j≤m(k)+1 �(uk+1− j ) ≤ max [ �(uk+1), max 1≤ j≤m(k)+1 �(uk+1− j ) ] ≤ max [ �(u (k)) − δ αk ‖Gαk (uk)‖2H , �(u (k)) ] ≤ �(u (k)), (19) where we have used m(k + 1) ≤ m(k) + 1. Therefore, {�(u (k))}k is decreasing and we can write �(u (ν(k)−1)) = �(u ( (kmmax+k)−1)) ≤ �(u (kmmax+k−mmax−1)) = �(u ((k−1)mmax+(k−1))) = �(uν(k−1)). (20) Together with (18) we can conclude (15) and, thus, L2 holds. (L3) Assume that Algorithm 1 does not converge after finitely many iterations. Summing (15) up for k = 1, . . . , k′, we obtain k′∑ k=1 δ αν(k)−1 ‖Gαν(k)−1(uν(k)−1)‖2H ≤ k′∑ k=1 �(uν(k−1)) − �(uν(k)) ≤ �(uν(0)) − �(uν(k′)). (21) Sending k′ to infinity and using the fact that� is bounded from below,we can conclude (16). Similarly (17) follows by using the fact that formmax = 0 it holds ν(k) = (k) = k. Thus, using (16), we can conclude the proof. �� Finally we are ready to present our main convergence result of this section. Theorem 6 Suppose that A1–A3 hold and that � is bounded from below. Then, for the sequence {uk}k ⊂ H generated by Algorithm 1 with {αk}k ⊂ R>0, the following statements hold true: (i) Either Algorithm 1 terminates after finitely many iterations with a stationary point of (P) or the sequence {‖Gαk (uk)‖H }k converges to zero, that is lim k→∞ ‖Gαk (uk)‖H = 0. (22) (ii) The following inequality holds true min 0≤i≤k ‖Gαinf (ui )‖H ≤ Cmmax G √ α(mmax+1)(�(u0)−�(uk )) kδ . (23) (iii) If, in addition, A4 holds, every weak sequential accumulation point of {uk}k ⊂ H is a stationary point of (P). 123 1276 B. Azmi, M. Bernreuther Proof (i) Note that if Algorithm1 converges after finitelymany iterations, by definition of the stopping criterion and Proposition 1, a stationary point of (P) has been found. Now assume that Algorithm 1 does not converge after finitely many iterations. Then, using L3, and the fact that αk ≤ α for all k by (iii) from Lemma 4, we arrive at lim k→∞ ‖Gαν(k)−1(uν(k)−1)‖H = 0. (24) It remains now to show that (22) holds. To show this, we will successively use (iv) of Lemma 4. Let k ≥ 0 be arbitrary. Using the fact that 0 ≤ k − ⌊ k mmax+1 ⌋ (mmax + 1) ≤ mmax and ⌊ k mmax+1 ⌋ (mmax + 1) − (⌊ k mmax+1 ⌋ (mmax + 1) ) ≤ mmax, we can write ‖Gαk (uk)‖H ≤ Cmmax G ‖Gα⌊ k mmax+1 ⌋ (mmax+1) (u⌊ k mmax+1 ⌋ (mmax+1) )‖H ≤ C2mmax G ‖Gα (⌊ k mmax+1 ⌋ (mmax+1) )(u (⌊ k mmax+1 ⌋ (mmax+1) ))‖H ≤ C2mmax G ‖Gα ν (⌊ k mmax+1 ⌋)(u ν (⌊ k mmax+1 ⌋))‖H ≤ C2mmax+1 G ‖Gα ν (⌊ k mmax+1 ⌋) −1 (u ν (⌊ k mmax+1 ⌋) −1 )‖H . (25) Finally, sending k to∞ and using (24), we obtain (22) and the proof of (i) is complete. (ii) Due to the facts that �(uk) ≤ �(u ν( ⌈ k mmax+1 ⌉ ) ) (by L1), �(u0) = �(uν(0)), and by successively using (8), we obtain that �(uk) − �(u0) ≤ �(u ν( ⌈ k mmax+1 ⌉ ) ) − �(u0) ≤ �(u ν( ⌈ k mmax+1 ⌉ ) ) − �(u ν( ⌈ k mmax+1 ⌉ −1) ) + �(u ν( ⌈ k mmax+1 ⌉ −1) ) − · · · − �(uν(1)) + �(uν(1)) − �(u0) ≤ ⌈ k mmax+1 ⌉ ∑ i=1 − δ αν(i)−1 ‖Gαν(i)−1(uν(i)−1)‖2H ≤ ⌈ k mmax+1 ⌉ ∑ i=1 − δ α ‖Gαν(i)−1(uν(i)−1)‖2H . (26) 123 On the forward–backward method with nonmonotone… 1277 Thus, using L1, we can infer that kδ α(mmax+1) min 0≤i≤ ⌈ k mmax+1 ⌉ (mmax+1)−1 ‖Gαi (ui )‖2H ≤ ⌈ k mmax+1 ⌉ δ α min 1≤i≤ ⌈ k mmax+1 ⌉ ‖Gαν(i)−1(uν(i)−1)‖2H ≤ ⌈ k mmax+1 ⌉ ∑ i=1 δ α ‖Gαν(i)−1(uν(i)−1)‖2H ≤ �(u0) − �(uk), and this yields min 0≤i≤ ⌈ k mmax+1 ⌉ (mmax+1)−1 ‖Gαi (ui )‖H ≤ √ α(mmax+1)(�(u0)−�(uk )) kδ . (27) Further, using the second part of Lemma 4 and the fact that ⌈ k mmax+1 ⌉ (mmax + 1) − 1 − k < mmax, we can deduce that min 0≤i≤k ‖Gαi (ui )‖H ≤ Cmmax G min 0≤i≤ ⌈ k mmax+1 ⌉ (mmax+1)−1 ‖Gαi (ui )‖H . (28) Finally, (23) follows from (27), (28), P2, and the fact that αk ≥ αinf for all k ≥ 0. (iii) We show that every weak sequential accumulation point of {uk}k ⊂ H is a stationary point of (P). In other words, we suppose that uki ⇀u∗ for a subsequence {uki }i ⊂ {uk}k and u∗ ∈ H . Then we show that u∗ ∈ S∗. For the sake of convenience, we use the same notation for the subsequence as for the sequence itself. To begin, due to (22) in (i), P2, and the fact that αk ≥ αinf , we can conclude that lim k→∞ ‖Gαinf (uk)‖H = 0 and lim k→∞ ‖Tαinf (uk) − uk‖H = 0. (29) Applying (4) for α = αinf and u = uk , we obtain for every k ∈ N0 that Gαinf (uk) − ∇F(uk) ∈ ∂R(Tαinf (uk)). (30) Due to A4, we can infer that uk⇀u∗ implies ∇F(uk) → ∇F(u∗) and, thus, using (29) we have Gαinf (uk) − ∇F(uk) → −∇F(u∗) as k → ∞. Due to (29), we also can conclude that Tαinf (uk)⇀u∗ as k → ∞. Therefore, sending k → ∞ in (30) and using the fact that the graph of ∂R is sequentially closed [26, Proposition 16.26] under the weak topology for domain and the strong topology for codomain, we obtain −∇F(u∗) ∈ ∂R(u∗). This completes the proof. �� Remark 2 In the case that dim(H) < ∞, the statement of (iii) in Theorem 6 holds true for every (strong) accumulation point of {uk}k without requiring A4. 123 1278 B. Azmi, M. Bernreuther In the next theorem, we derive an estimate that reflects the worst-case complexity of the required function and gradient-like evaluations of Algorithm 1 to find an εtol- stationary point. The proof is inspired by the one given in [50, Theorem3.4] for smooth problems. Theorem 7 (Worst-case complexity) Suppose that A1–A3 hold and that � is bounded from below with �̄ := infu∈H �(u) > −∞. Then for a given tolerance εtol > 0, Algorithm 1 requires at most k f max := ⌊ γ f comp(�(u0)−�̄) ε2tol ⌋ (31) function evaluations of � and kg max := ⌊ γ g comp(�(u0)−�̄) ε2tol ⌋ (32) Gradient-like Gα(·) evaluations to find an iterate uk satisfying ‖Gαk (uk)‖H ≤ εtol, where γ f comp := (mmax+1)C2mmax G γdecr and γ g comp := (mmax+1)αC2mmax G δ , with γdecr := min { δ αsup , 2(1−δ)δ n1ηLF ′ } and n1 := ⌊∣∣∣logη ( ηLF ′ 2αinf (1−δ) )∣∣∣⌋ . Proof The proof can be found in Appendix A. 1. �� This finishes our considerations of convergence and complexity in the general setting. 3.2 Convex case In this section, higher-order convergence rates will be shown in two cases of addi- tional structural assumptions. Firstly, we consider the case where F is also convex. Afterwards, the case of a quadratic growth assumption on � will be investigated. We assume the following modified version of Assumption 1. Assumption 2 Assume that A1–A3 hold. Further, instead of A4, assume that A’4: F : H → R is convex. Under Assumption 2, the whole function � is convex. In this case, we can conclude that the set of minimizers of � coincides with S∗ provided that S∗ �= ∅ and that S∗ is closed and convex. Further, we have S∗ = Argmin� = (∂�)−1(0). 123 On the forward–backward method with nonmonotone… 1279 The associated minimal function value is denoted by �∗ ∈ R. Next we prove an auxiliary lemma which will be used later. Lemma 8 Suppose that Assumption 2 holds, S∗ �= ∅, and {uk}k is generated by Algo- rithm 1. Then for every λ ∈ [0, 1], it holds �(uν(k)) − �∗ ≤ (1 − λ) ( �(uν(k−1)) − �∗)+ αλ2 2 dist2(uν(k)−1, S∗) + C̃ αν(k)−1 ‖Gαν(k)−1(uν(k)−1)‖2H , (33) where C̃ := LF ′ 2αinf . Further, we have the following inequality for the initial iterations �(uν(1)) − �∗ ≤ C0 dist 2(u0, S∗), (34) with constant C0 which is independent of u0. Proof The proof can be found in Appendix A. 2. �� Now we are ready to provide the main convergence result for the convex case. For the sake of convenience in the presentation, we set Ek := �(uk) − �∗ for every k ≥ 0, and use this notation in the remainder of this section. Theorem 9 (Global convergence and O(k−1) complexity for the convex case) Sup- pose that Assumption 2 holds, S∗ �= ∅, and {uk}k is generated by Algorithm 1. Then the following statements hold true: (i) Every weak sequential accumulation point of {uk}k belongs to S∗. (ii) {uk}k converges weakly to a minimizer u∗ ∈ S∗ and its “shadow” sequence converges strongly to u∗, that is PS∗uk → u∗. (iii) It holds �(uk) → �∗ as k → ∞ and for large enough k ≥ 0, there exist constants ρ1 > 0 and ρ2 > 0 such that �(uk) − �∗ ≤ ρ1 ρ2 + k . (35) Proof (i) We suppose that a subsequence {uki }i with uki ⇀u∗ is given. We will show that u∗ ∈ S∗. To show this, we prove that there exists a vanishing sequence of sub- gradients {wki }i ⊂ H , i.e. wki → 0, corresponding to {uki }i with wki ∈ ∂�(uki ) for every i ∈ N. Using (4), we define wk+1 := Gαk (uk) + ∇F(uk+1) − ∇F(uk) ∈ ∂�(uk+1). (36) Further, since S∗ �= ∅, � is bounded from below. Hence, we can use (i) of Theorem 6 and (22) holds. This, together with (36) and the boundedness of αk , implies ‖wk+1‖H ≤ ‖Gαk (uk)‖H + ‖∇F(uk+1) − ∇F(uk)‖H ≤ (1 + LF ′ αk )‖Gαk (uk)‖H → 0, 123 1280 B. Azmi, M. Bernreuther as k → ∞. In particular, we can infer that wki → 0 as i → ∞. Using the fact that the graph of ∂� is sequentially closed [26, Proposition 16.26] under the weak topology for domain and the strong topology for codomain together withwki → 0 and uki ⇀u∗, we arrive at 0 ∈ ∂�(u∗) and therefore u∗ ∈ S∗. (ii) We show that uk⇀u∗ with u∗ ∈ S∗. In this matter, we show that {uk}k is a quasi-Fejér sequence with respect to S∗ �= ∅. Using (4), we can write for every k ∈ N0 that 0 ∈ αk(uk+1 − uk) + ∂R(uk+1) + ∇F(uk). (37) Further, since R is convex and ∇F is Lipschitz continuous, by the Haddad-Bailon Theorem [26, Corollary 18.16, p. 270], we have (∇F(v) − ∇F(w), v − w)H ≥ LF ′−1‖∇F(v) − ∇F(w)‖2H . Further, we can write for every v,w, z ∈ int(domF) that (∇F(v) − ∇F(w), z − w)H = (∇F(v) − ∇F(w), v − w)H + (∇F(v) − ∇F(w), z − v)H ≥ LF ′−1‖∇F(v) − ∇F(w)‖2H − ‖∇F(v) − ∇F(w)‖H ‖z − v‖H ≥ − LF ′ 4 ‖z − v‖2H , (38) where in the last line we have used the fact that f (x) := LF ′−1x2 − x‖z − v‖H is strictly convex and attains its global minimum at x∗ = LF ′ 2 ‖z − v‖H . Using (38) for uk+1, uk , and any u∗ ∈ S∗ in place of v, z, and w, respectively, we obtain (∇F(uk) − ∇F(u∗), uk+1 − u∗)H ≥ − LF ′ 4 ‖uk+1 − uk‖2H . Together with the fact that ∂R is monotone, cf. [26], and (2), we can write (η + ∇F(uk), uk+1 − u∗)H ≥ − LF ′ 4 ‖uk+1 − uk‖2H for all η ∈ ∂R(uk+1). (39) Using (37) and (39), we can deduce that (uk+1 − uk, uk+1 − u∗)H ≤ LF ′ 4αk ‖uk+1 − uk‖2H . (40) Further, using (40) and the fact that (w − v,w − z)H = 1 2‖w − v‖2H − 1 2‖v − z‖2H + 1 2‖w − z‖2H for all z, w, v ∈ H , 123 On the forward–backward method with nonmonotone… 1281 we can infer that 1 2‖uk+1 − u∗‖2H − 1 2‖uk − u∗‖2H ≤ ( LF ′ 4αk − 1 2 ) ‖uk+1 − uk‖2H ≤ LF ′ 4α3 inf ‖Gαk (uk)‖2H , (41) where it can be seen from (21) and (25) that ∞∑ k=mmax+1 ‖Gαk (uk)‖2H ≤ (mmax + 1)C4mmax+2 G ∞∑ k=1 ‖Gαν(k)−1(uν(k)−1)‖H < ∞. Therefore, the sequence {uk}k ⊂ H is quasi-Fejér monotone with respect to S∗ and since, due to (i), every weak sequential accumulation point of {uk}k ⊂ H belongs to S∗ �= ∅, we can conclude by [51, Proposition 1(3)] that {uk}k is weakly convergent and it has a unique accumulation point. In addition, since S∗ is closed and convex, we can conclude, due to [52, Proposition 3.6 (iv)], that {PS∗uk}k converges strongly to a point û ∈ S∗. Moreover, since u∗ − PS∗uk → u∗ − û and uk − PS∗uk⇀u∗ − û, it follows from the definition of orthogonal projection that ‖u∗ − û‖2H = limk→∞(u∗ − PS∗uk, uk − PS∗uk)H ≤ 0. Hence, we obtain that u∗ = û. (iii) The proof of this part is inspired by the one in [38, Theorem 3.2.]. First, we show that �(uk) → �∗. Due to (ii), uk⇀u∗ for some u∗ ∈ S∗. As in the proof of (i), there exists a sequence wk → 0 with wk ∈ ∂�(uk). Therefore, we can write �(uk) ≤ �∗ + (wk, uk − u∗)H for every k ∈ N0. (42) Sending k → ∞ in (42) and using the facts that uk⇀u∗ and wk → 0 and the weak sequential lower semicontinuity of �, we can conclude that �∗ ≤ lim inf k→∞ �(uk) ≤ lim sup k→∞ �(uk) ≤ �∗. Hence, �(uk) → �∗. Next, we turn to the verification of (35) for a large enough k. Due to (33) of Lemma 8, for every λ ∈ [0, 1], it holds Eν(k) ≤ (1 − λ)Eν(k−1) + αλ2 2 dist2(uν(k)−1, S∗) + C̃ αν(k)−1 ‖Gαν(k)−1(uν(k)−1)‖2H . (43) Since {uk}k is quasi-Féjer monotone with respect to S∗ �= ∅, due to [52, Proposition 3.6 (ii)], the sequence dist2(uk, S∗) is convergent. Therefore, for a positive constant κ > 0, we have dist2(uν(k)−1, S∗) ≤ κ, for all k ≥ 1. (44) 123 1282 B. Azmi, M. Bernreuther Further, using (43), (44), and L2, we can write Eν(k) ≤ (1 − λ)Eν(k−1) + αλ2κ 2 + C̃ δ ( Eν(k−1) − Eν(k) ) , (45) where the expression on the right-hand side is strictly convex in λ, since d2 dλ2 ( (1 − λ)Eν(k−1) + αλ2κ 2 ) = ακ > 0. Therefore, it possesses the unique minimizer λ = Eν(k−1) ακ . Since {Eν(k)}k → 0, this implies that for large enough k, we can set λ = Eν(k−1) ακ ≤ 1 in (45) and obtain that Eν(k) ≤ Eν(k−1) − E2 ν(k−1) 2ακ + C̃ δ ( Eν(k−1) − Eν(k) ) . Together with the fact that Eν(k) ≤ Eν(k−1), we can write Eν(k) ≤ Eν(k−1) − Eν(k−1)Eν(k) 2ακ + C̃ δ ( Eν(k−1) − Eν(k) ) and, thus, obtain Eν(k) ≤ ( 1 + δ 2ακ(C̃+δ) Eν(k−1) )−1 Eν(k−1). (46) Further, (46) can be expressed as 1 Eν(k) ≥ 1 Eν(k−1) + δ 2ακ(C̃+δ) . Applying this inequality recursively for integers k1 and k2 with k2 ≥ k1 and large enough k1 satisfying Eν(k1) ακ ≤ 1, we obtain 1 Eν(k2) ≥ 1 Eν(k1) + δ(k2−k1) 2ακ(C̃+δ) , and by easy computations, also Eν(k2) ≤ 2ακ(C̃+δ)Eν(k1) 2ακ(C̃+δ)+Eν(k1)δ(k2−k1) . Thus, for k ≥ 0 large enough, we set k2 = ⌈ k mmax+1 ⌉ and obtain Ek ≤ E ν( ⌈ k mmax+1 ⌉ ) ≤ 2ακ(C̃+δ)Eν(k1) 2ακ(C̃+δ)+Eν(k1)δ( ⌈ k mmax+1 ⌉ −k1) ≤ 2ακ(C̃+δ)Eν(k1) −k1Eν(k1)δ+2ακ(C̃+δ)+Eν(k1)δ( k mmax+1 ) . 123 On the forward–backward method with nonmonotone… 1283 Therefore, (35) follows by setting ρ1 := (mmax + 1)δ−12ακ(C̃ + δ) and ρ2 := (mmax + 1)(δEν(k1)) −1 ( 2ακ(C̃ + δ) − k1Eν(k1)δ ) . Thus, the proof is complete. �� Comparing Theorem 9 to Theorem 6, one obtains weak convergence of the whole sequence and convergence of the associated objective function evaluations with rate 1/k. Remark 3 Note that for the case where dim(H) < ∞, due to the equivalence of the weak and strong topologies, it follows from (ii) of Theorem 9 that uk converges to u∗ in the strong topology. Moreover, we will later use the convergence PS∗uk → u∗ of the shadow sequence to derive the strong convergence uk → u∗ under the quadratic growth conditions. This is obviously no longer necessary when dim(H) < ∞. 3.3 Convergence under quadratic growth-type conditions In this section, we turn our attention towards quadratic growth-type conditions and study convergence of Algorithm 1 under these conditions. Definition 2 (Quadratic growth condition) We say that � satisfies the quadratic growth condition, if �(u) − �∗ ≥ γ�,� dist2(u, S∗) for all u ∈ � ∩ dom� (47) holds with a set � ⊂ H , a constant γ�,� > 0, and S∗ �= ∅. We refer to this notion as global if � = H , and as local, if for u∗ ∈ S∗, r ∈ (0,∞], and ω > 0, we have � = Br (u∗) ∩ [�∗ < � + ω]. Additionally, � is said to satisfy the strong quadratic growth condition at u∗ if S∗ = {u∗} on �. That is, �(u) − �(u∗) ≥ γ�,�‖u − u∗‖2H for all u ∈ � ∩ dom�. (48) The quadratic growth condition is a geometrical assumption which describes the flat- ness of the objective function around its minimizers. Roughly speaking, this condition is considered as a relaxation of the strong convexity condition and allows us to obtain faster rates of convergence (linear) and also convergence in the strong topology for the iteration sequence. It is also closely related to the notion of Tikhonov well-posedness [53]. The relationship between the quadratic growth condition and the so-called met- ric subregularity of the subdifferential has been investigated e.g. in [10, 54–57]. The strong quadratic growth condition (48) is said to be the quadratic functional growth property in [16] provided that � is continuously differentiable over a closed convex 123 1284 B. Azmi, M. Bernreuther set. In [15, 58], � is also called 2-conditional on � if it satisfies the quadratic growth condition (47). This property was recently proved in [10, Theorem 5] to be equivalent with the case where � satisfies the Kurdyka-Łojasiewicz inequality with order 1/2. Theorem 10 Suppose that Assumption 2 and the quadratic growth condition (47) hold for � := [� < �∗+ω] with ω > 0. Then, for the sequence of iterates {uk}k generated by Algorithm 1, there exists k̄ ∈ N such that for every k ≥ k̄, we have �(uk) − �∗ ≤ Ccσ k, (49) and dist2(uk, S∗) ≤ Cdσ k, (50) where the constants Cc > 0, Cd > 0, and 0 < σ < 1 are independent of k. Further, there exists u∗ ∈ S∗ such that uk converges in the strong topology to u∗, i.e. uk → u∗, and it holds ‖uk − u∗‖2H ≤ C pσ k, (51) with a constant C p > 0 which is independent of u∗ and k. Proof First, due to L2 and (iii) from Theorem 9, the sequence {�(uν(k))}k is monoton- ically decreasing and converges to �∗. Further, using (20), we can deduce for k ≥ 1 that �(uν(k)−1) ≤ �(u (ν(k)−1)) ≤ �(uν(k−1)). (52) Thus, for given ω > 0, there exists k̄ω ∈ N such that �(uν(k)−1)) ∈ [� < �∗ + ω] for all k ≥ k̄ω. Next, we show that Eν(k) ≤ θEν(k−1) for all k ≥ k̄ω, (53) with a constant θ ∈ (0, 1) independent of k. Let an arbitrary k ≥ k̄ω be given. To show (53), we choose an arbitrary ζ satisfying 0 < ζ < min{ 1 δ , 1 2C̃ , γ�,� 2αC̃ }. Now we consider the following cases: • The inequality 1 αν(k)−1 ‖Gν(k)−1(uν(k)−1)‖2H ≥ ζEν(k−1) holds. In this case, using L2, we obtain Eν(k) ≤ Eν(k−1) − δ αν(k)−1 ‖Gν(k)−1(uν(k)−1)‖2H ≤ (1 − ζ δ)Eν(k−1), 123 On the forward–backward method with nonmonotone… 1285 where due to the choice of ζ we have 1 − ζ δ < 1. • The inequality 1 αν(k)−1 ‖Gν(k)−1(uν(k)−1)‖2H < ζEν(k−1) (54) holds. This case is more delicate. Using the quadratic growth condition (47) and (52), we obtain for k ≥ k̄ω that dist2(uν(k)−1, S∗) ≤ 1 γ�,� Eν(k)−1 ≤ 1 γ�,� Eν(k−1). (55) Now, using Lemma 8, (33), and (55), we can write for every λ ∈ [0, 1] that Eν(k) ≤ ( C̃ζ + 1 − λ + α 2γ�,� λ2 ) Eν(k−1), (56) where it can be easily seen that the minimum of ( C̃ζ + 1 − λ + α 2γ�,� λ2 ) attained at λ∗ := min{1, γ�,� α } is strictly smaller than 1 and therefore (53) holds for k ≥ k̄ω. Let now k ≥ k̄ := k̄ω(mmax + 1) be given. By successively applying (53), we obtain Ek L1≤ E ν( ⌈ k mmax+1 ⌉ ) ≤ θ1−k̄ωθ ⌈ k mmax+1 ⌉ Eν(k̄ω−1) ≤ θ1−k̄ωθ k mmax+1 Eν(k̄ω−1) ≤ θ1−k̄ω ( θ 1 mmax+1 )k Eν(k̄ω−1). (57) Together with the quadratic growth condition (47), we have dist2(uk, S∗) ≤ 1 γ�,� Ek ≤ θ1−k̄ω γ�,� ( θ 1 mmax+1 )k Eν(k̄ω−1). (58) Thus, for every k ≥ k̄, the iterates uk stay in �. Setting Cc := θ1−k̄ωEν(k̄ω−1), Cd := γ −1 �,�θ1−k̄ωEν(k̄ω−1), and σ := θ 1 mmax+1 < 1, we are finished with the verification of (49) and (50). Next, we show that uk → u∗ for uk⇀u∗ with u∗ ∈ S∗ given in Theorem 9 (ii). Due to (50), dist(uk, S∗) → 0 and we can infer that uk − PS∗uk → 0. This together with fact that PS∗uk → u∗ (see (i i) in Theorem 9) leads to uk → u∗. Finally, we show that (51) holds true. To see this, let p ∈ N be arbitrary, using Young’s inequality we have ‖uk − uk+p‖2H ≤ 2 ( ‖uk − PS∗uk‖2H + ‖uk+p − PS∗uk‖2H ) = 2 ( dist2(uk, S∗) + ‖uk+p − PS∗uk‖2H ) . (59) 123 1286 B. Azmi, M. Bernreuther Using (41), we obtain for an arbitrary u∗ ∈ S∗ that ‖uk+p − u∗‖2H ≤ ‖uk − u∗‖2H + LF ′ 2α2 inf k+p−1∑ j=k ‖Gα j (u j )‖2H . In particular, we have for u∗ = PS∗uk that ‖uk+p − PS∗uk‖2H ≤ dist2(uk, S∗) + LF ′ 2α2 inf k+p−1∑ j=k ‖Gα j (u j )‖2H . (60) Further, using L2 and (25), we can write for large enough k that k+p−1∑ j=k ‖Gα j (u j )‖2H ≤ C4mmax+2 G k+p−1∑ j=k ‖Gα ν (⌊ j mmax+1 ⌋) −1 (u ν (⌊ j mmax+1 ⌋) −1 )‖2H ≤ C4mmax+2 G αδ−1 k+p−1∑ j=k δ α ν (⌊ j mmax+1 ⌋) −1 ‖Gα ν (⌊ j mmax+1 ⌋) −1 (u ν (⌊ j mmax+1 ⌋) −1 )‖2H ≤ C4mmax+2 G αδ−1 ⎛ ⎝E ν (⌊ k mmax+1 ⌋ −1 ) − E ν (⌊ k+p mmax+1 ⌋) ⎞ ⎠ . (61) Combining (59), (60), (61), and setting C̄ p := C4mmax+2 G αLF ′ δα2 inf , we arrive at ‖uk − uk+p‖2H ≤ 4 dist2(uk, S∗) + C̄ p ⎛ ⎝E ν (⌊ k mmax+1 ⌋ −1 ) − E ν (⌊ k+p mmax+1 ⌋) ⎞ ⎠ . Sending p → ∞ and using Theorem 9 (iii), (50), and similar computations as in (57), we obtain for every k ≥ k̄ that ‖uk − u∗‖2H ≤ 4 dist2(uk, S∗) + C̄ pE ν (⌊ k mmax+1 ⌋ −1 ) ≤ 4Cdσ k + C̄ pθ −1−k̄ωσ kEν(k̄ω−1) = C pσ k . Thus, (51) holds true with C p := 4Cd + C̄ pθ −1−k̄ωEν(k̄ω−1) and this completes the proof. �� Corollary 11 Suppose that Assumption 2 holds and the quadratic growth condition (47) is satisfied for � := Br (u∗)∩[� < �∗ +ω] with r , ω ∈ (0,∞). Further assume that, {uk}k generated by Algorithm 1, converges strongly to some u∗ ∈ S∗. Then there 123 On the forward–backward method with nonmonotone… 1287 exists k̄ ∈ N such that (49)–(51) hold with constants Cc > 0, Cd > 0, C p > 0, and 0 < σ < 1, which are independent of k and u∗. Proof The proof proceeds along the lines of the proof of Theorem 10 with the differ- ence that one also needs to be sure that {uk}k ⊂ Br (u∗) for a large enough k̄ ∈ N. This follows from the fact that uk → u∗. �� Remark 4 Due to the equivalence of weak and strong convergence in finite- dimensional spaces, the assumption of Corollary 11 automatically holds if dim(H) < ∞. For the case that dim(H) = ∞, it is not clear how to guarantee that {uk}k ⊂ Br (u∗) for large enough k ∈ N. In many cases of PDE-constrained optimization, the assumptions of the following corollary are very likely satisfied. Corollary 12 Suppose that A1–A3 in Assumption 1 hold and that F is convex on Br (u∗) with u∗ ∈ S∗ and r ∈ (0,∞). Further, we assume that the strong quadratic growth condition (48) holds for � := Br (u∗)∩[� < �∗ +ω] with ω ∈ (0,∞). Then, the sequence of iterations {uk}k generated by Algorithm 1 converges locally R-linear with respect to the strong topology. In other words, there exists a radius r0 ≤ r such that for every u0 ∈ Br0(u ∗) ∩ [� < �∗ + ω] and k ≥ 1, we have ‖uk − u∗‖2H ≤ CRσ k‖u0 − u∗‖2H (62) and �(uk) − �(u∗) ≤ σ k (�(u0) − �∗) , (63) where the constants CR > 0 and 0 < σ < 1 are independent of u0 and k. Proof By similar argument as in the proof of Theorem 10, we can show that for any uν(k)−1 ∈ Br (u∗) ∩ [� < �∗ + ω] with k ≥ 1, it holds Eν(k) ≤ θEν(k−1), (64) with θ ∈ (0, 1) independent of k. Let now k ≥ 1 be given.Assuming ui ∈ Br (u∗)∩[� < �∗+ω] for i = 0, . . . , k−1 and successively applying (64), we obtain Ek L2≤ E ν( ⌈ k mmax+1 ⌉ ) ≤ θ−1θ ⌈ k mmax+1 ⌉ Eν(1) ≤ θ−1θ k mmax+1 Eν(1) ≤ θ−1 ( θ 1 mmax+1 )k Eν(1). (65) 123 1288 B. Azmi, M. Bernreuther Together with the quadratic growth condition (48) and (34) from Lemma 8, we have ‖uk − u∗‖2H ≤ 1 γ�,� Ek ≤ 1 θγ�,� ( θ 1 mmax+1 )k Eν(1) ≤ C0 θγ�,� ( θ 1 mmax+1 )k ‖u0 − u∗‖2H . (66) Thus, for every u0 ∈ Br0(u ∗) with r0 ≤ ( θγ�,� C0 ) 1 2 r , iterates uk stay in Br (u∗) for k ≥ 0 and, as a consequence, the inequalities (65) and (66) are well-defined. By setting CR := C0 θγ�,� and σ := θ 1 mmax+1 < 1, we are finished with the verification of (62). As in (65), we can also write for every k ≥ 0 and u0 ∈ Br0(u ∗) ∩ [� < �∗ + ω] that Ek ≤ E ν( ⌈ k mmax+1 ⌉ ) ≤ θ ⌈ k mmax+1 ⌉ Eν(0) ≤ θ k mmax+1 Eν(0) ≤ σ kEν(0). Thus (63) holds also true and this completes the proof. �� Compared to Corollary 11, we do not need to assume the strong convergence of the sequence {uk}k in Corollary 12 and the following corollary. Corollary 13 Suppose that A1–A3 in Assumption 1 hold and F is locally strongly convex with a constant κ > 0 on Br (u∗) with some r ∈ (0,∞) and u∗ ∈ S∗. Then, the sequence of iterates {uk}k generated by Algorithm 1 converges locally R-linear in the strong topology to u∗. In other words, there exists r0 ≤ r such that (62) and (63) hold for every u0 ∈ Br0(u ∗). Proof Since � is locally strongly convex, using [59, Proposition 3.23], we can write �(u) − �(v) ≥ (w, u − v)H + κ 2‖u − v‖2H for all u, v ∈ Br (u ∗) and w ∈ ∂�(v). (67) Setting v = u∗ and w = 0, we can easily see that the strong quadratic growth property (48) holds for � := Br (u∗) and γ�,� := κ 2 . Therefore, the proof follows by Corollary 12. �� Remark 5 Note that, if the strong convexity of F in Corollary 13 holds globally, then R-linear convergence is also obtained globally. More precisely, (62) and (63) hold for every u0 ∈ H . 3.4 On relaxing the global Lipschitz continuity of∇F for problems governed by PDEs In this section,we analyze a list of assumptions satisfied for a large class of optimization problems governed by PDEs. We will study the applicability of these assumptions in Sect. 4. 123 On the forward–backward method with nonmonotone… 1289 Assumption 3 For problem (P), H1: � : H → R ∪ {+∞} is bounded from below and radially unbounded, i.e. lim‖u‖H →∞ �(u) = ∞. H2: R : H → R ∪ {+∞} is proper, convex, and lower semicontinuous. H3: F : H → R ∪ {+∞} is continuously Fréchet differentiable on int(domF) containing domR, that is, domR ⊆ int(domF). H4: ∇F : int(domF) → H is LF ′-Lipschitz continuous on every weakly sequen- tially compact subset of domR. H5: F is weakly sequentially lower semicontinuous (wlsc) and∇F : int(domF) → H is weak-to-strong sequentially continuous. Here, compared to Assumption 1, we do not impose the global Lipschitz condition on ∇F . Nevertheless, Assumption 3 will be sufficient to reproduce the results of Sect. 3.1. Furthermore, H1, H2, and H5 impose a set of conditions that ensure the existence of a global minimizer to (P), as will be shown in the following. Proposition 14 Suppose that H1, H2, and H5 hold. Then, problem (P) possesses a global minimizer ū ∈ H, and as a consequence, S∗ �= ∅. Further, every level set of � is weakly sequentially compact. Proof The proof uses standard arguments based on the direct methods in the calculus of variations. By H1, there exists �̄ := inf u∈H �(u), and a minimizing sequence {un}n ⊂ H with �(un) → �̄ for n → ∞. Due to H1 and [26, Proposition 11.11]), every level set of � is bounded and, thus, {un}n admits a weakly convergent subsequence {unk }k with unk ⇀ū ∈ H on [� ≤ �(ũ)] for some ũ ∈ domR. Properties H2 and H5 imply that � is wlsc, and thus, �(ū) ≤ lim infk→∞ �(unk ) = �̄. This shows that�(ū) = �̄, i.e. ū ∈ H is a globalminimizer of (P). Hence S∗ �= ∅. Further, since � is wlsc, every level set of � is weakly sequentially closed. Together, with the boundedness of the level sets, we conclude that every level set of � is weakly sequentially compact. �� Comparing the properties given in Assumptions 1 and Assumption 3 in detail, we realize that besides the global Lipschitz continuity of ∇F , the remaining properties of Assumption 1 follow from those of Assumption 3. Further, due to Lemma 4(i) and Lemma 5, Algorithm 1 is well-defined, even without the Lipschitz continuity of ∇F . Further, it can be seen from (10) and Definition 1, that for every u0 ∈ domR, the whole sequence {uk}k generated by Algorithm 1 stays in U0 := [� ≤ �(u0)]. Since U0 is weakly sequentially compact and U0 ⊂ domR, using H4 we can deduce that ∇F is LF ′-Lipschitz continuous on U0 with {uk}k ⊂ U0. Therefore all the results and estimates in Sect. 3.1 can be easily applied in the presence of Assumption 3. Further, similar observations are also valid for the results given in Sect. 3.2, provided that we replace H5 by A’4 and analogously also for Sect. 3.3. 123 1290 B. Azmi, M. Bernreuther 4 Application to PDE-constrained optimization In this section,we investigate the applicability of our theoretical results to twoproblems governed by semilinear elliptic and parabolic PDEs. For each example, we discuss the justification of Assumption 3. 4.1 Elliptic problem As a first example, we consider the following semilinear elliptic problem min (y,u) 1 2 ‖y − yd‖2L2(�) + σ 2 ‖u‖2L2(�) + λ‖u‖L1(�) (PE) subject to ⎧⎪⎨ ⎪⎩ −κ�y + exp(y) = u in �, y = 0 on ∂�, ua ≤ u ≤ ub a.e. in �, (68) on a bounded domain � ⊂ R n with n = 2, 3, which is either convex or possesses a C1,1-boundary ∂�. Here the parameter κ > 0 stands for the diffusion, the parameters σ, λ > 0 weigh the cost terms, ua, ub ∈ R with ua < 0 < ub are control bounds, and yd ∈ L2(�) denotes the desired state. First, we show that problem (PE) can be rewritten in the form (P). It is well-known, cf. [60, Theorems 2.7 and 2.12], that for given u ∈ H := L2(�), the state equation (68) is uniquely solvable in the weak sense, i.e., y(u) ∈ W ∩C(�)with W := H1 0 (�), and the control-to-state operator u �→ y(u) is well-defined and twice continuously Fréchet differentiable from H to W ∩ C(�). This operator is typically constructed by applying the implicit function theorem on the following equality constraints E(y, u) = 0 in Z ′ := H−1(�) with E(y, u) := −κ�y + exp(y) − u. Here, E : W × H → Z ′ is twice continuously Fréchet differentiable, for every u ∈ H , Ey(y, u) ∈ L(W , Z ′) has a bounded inverse, and Eu(y, u) ∈ L(H , Z ′) is continuous. Then, by defining F(u) := 1 2 ‖y(u) − yd‖2L2(�) , R(u) := σ 2 ‖u‖2L2(�) + λ‖u‖L1(�) + δU (u), (69) with the indicator function δU of U := {u : ua ≤ u ≤ ub a.e. in �} ⊂ H , problem (PE) has the form (P). Further, by standard computations as in [61, Chapter 1.6], it can be shown that ∇F(u) = −p(u) in H , (70) 123 On the forward–backward method with nonmonotone… 1291 where p = p(u) ∈ W (see e.g., [60, Theorem 3.2]) is the weak solution of the adjoint equation { −κ�p + exp(y)p = −(y − yd) in �, p = 0 on ∂�, (71) with y = y(u). Now it remains to verify Assumption 3 for the reduced problem (P) with (69). Properties H1 and H2 are clearly satisfied. H3 follows using the chain rule and the continuous Fréchet differentiability of the control-to-state mapping y(u). Note that ∇F(u) = (y′(u))∗J ′(y(u)) with J (y) := 1 2 ‖y − yd‖2L2(�) , (72) where y′(u)h = −E−1 y (y(u), u)Eu(y(u), u)h for every h ∈ H and the superscript “*” denotes the adjoint operator. It remains to verify H4 and H5. Since the control-to- state operator u → y(u) and J are twice continuously Fréchet differentiable, using (72), and the compact embedding W c ↪−→ H , one obtains that the second Fréchet derivative of F is bounded on bounded sets. Therefore ∇F is Lipschitz continuous on any bounded set in H . Thus, H4 holds. Finally using the weak formulation of the state (68) and adjoint (71) equations and [60, Theorem 2.11], it can be shown that y(uk)⇀y(ū) and p(uk)⇀p(ū) in W as uk⇀ū in H . This together with W c ↪−→ H , (70), and (69), implies thatF(uk) → F(ū) and ∇F(uk) → ∇F(ū) in W as uk⇀ū in H . Hence, H5 holds. This finishes the verification of Assumption 3. Note that using (70) and (3), the Fermat principle can be expressed as u∗ = Prox 1 α R(u∗ + 1 α p(u∗)) for some α > 0, where p(u∗) is the solution of (71) for u = u∗ and Prox 1 α R can be characterized in a pointwise a.e. sense. Due to [26, Proposition 24.13], we have for any α > 0 [ Prox 1 α R(u) ] (x) = Prox 1 α R(u(x)) for almost all x ∈ �, where a pointwise a.e. closed form representation of the proximal operator is given by [ Prox 1 α R(u) ] (x) = min{max{ ⎧⎪⎨ ⎪⎩ 1 C1 (u(x) − C2), u(x) > C2, 1 C1 (u(x) + C2), u(x) < −C2, 0, otherwise, , ua}, ub}, withC1 := 1+σ/α andC2 := λ/α. The calculations are made similarly to [4, Section 6]. 123 1292 B. Azmi, M. Bernreuther 4.2 Parabolic problem As a second example, we will consider the following semilinear parabolic problem min (y,u) 1 2 ‖y − yd‖2L2(0,T ;L2(�)) + λ‖u‖L1(0,T ;L1(�)) (PP) subject to ⎧⎪⎪⎪⎨ ⎪⎪⎪⎩ ẏ − κ�y + y3 = u in (0, T ) × �, y = 0 on (0, T ) × ∂�, y(0) = y0 in �, ua ≤ u ≤ ub a.e. in �, (73) where � ⊂ R n with n = 2, 3 is a bounded domain with Lipschitz boundary ∂� and T > 0. Further, κ > 0, λ ≥ 0, and the control bounds ua, ub ∈ R are defined as in problem (PE). We also consider the desired state yd ∈ L2(0, T ; L2(�)) and initial function y0 ∈ H1 0 (�). Similarly to the problem (PE), we define the control-to-state operator u �→ y(u) from H := L2(0, T ; L2(�)) to W := L2(0, T ; H2(�)∩ H1 0 (�))∩ H1(0, T ; L2(�)). The well-posedness and twice continuous Fréchet differentiability of the control-to- state operator can be established by similar arguments as given in [1, 62]. Compared to the elliptic problem (PE), we define F(u) := 1 2 ‖y(u) − yd‖2L2(0,T ;L2(�)) , R(u) := λ‖u‖L1(0,T ;L1(�)) + δU (u), (74) with U := {u : ua ≤ u ≤ ub a.e. in (0, T ) × �} ⊂ H and also consider E(y, u) := ( ẏ − κ�y + y3 − u y(0) ) with Z ′ := L2(0, T ; L2(�)) × H1 0 (�). In this case, (70) holds with p = p(u) ∈ W as the strong solution of the adjoint equation ⎧⎪⎨ ⎪⎩ − ṗ − κ�p + 3y2 p = −(y − yd) in (0, T ) × �, p = 0 on ∂�, p(T ) = 0 in �, (75) with y = y(u). The verification of Assumption 3 for (PP) follows by the similar arguments given for (PE) and using the compact embedding W c ↪−→ L2(0, T ; H1 0 (�)) and consequently W c ↪−→ H . Moreover, the closed form pointwise a.e. representation 123 On the forward–backward method with nonmonotone… 1293 of the proximal operator is given by Prox 1 α R(u)(t, x) = min{max{ ⎧⎪⎨ ⎪⎩ u(t, x) − C2, u(t, x) > C2, u(t, x) + C2, u(t, x) < −C2, 0, otherwise, , ua}, ub}, where C2 = λ/α. 4.3 Discussion on the quadratic growth condition In the remainder of the section, we present a situation in which F is locally strongly convex and, as a consequence,Algorithm1 converges locally R-linear byCorollary 13. We consider the case that for u∗ ∈ S∗, F is twice continuously Fréchet differentiable and its second derivative satisfies F ′′(u∗)(h, h) ≥ C‖h‖2H for all h ∈ H , (76) with some C > 0. For the two previous problems, we express F ′′(u∗) in terms of solutions to PDEs and discuss when F is locally strongly convex. For both problems (PE) and (PP), the control-to-state operator u �→ y(u) from H to W is twice continuously Fréchet differentiable, and by similar computation as in [63], its second derivative can be expressed as y′′(u)(h, q) = −E−1 y (y(u), u)Eyy(y(u), u)(y′(u)h, y′(u)q) for all h, q ∈ H , (77) since the control operator is linear. To be able to use Corollary 13 for problem (PE), we first redefine F and R as follows F(u) := 1 2 ‖y(u) − yd‖2L2(�) + σ 2 ‖u‖2L2(�) , R(u) := λ‖u‖L1(�) + δU (u). In this case, with the same arguments given in Sect. 4.1, (PE) can be rewritten as (P) for which H1–H4 hold. Further, using the chain rule and (77), we can write for every h, q ∈ H that F ′′(u)(h, q) = 〈J ′′(y(u))y′(u)h, y′(u)q〉W ′,W + 〈−E−∗ y (y(u), u)J ′(y(u)), Eyy(y(u), u)(y′(u)h, y′(u)q)〉Z ,Z ′ + σ(h, q)H , where 〈·, ·〉 stands for the dual pairing and J was defined in (72). Hence,F ′′(u∗)with u∗ ∈ S∗ can be expressed as F ′′(u∗)(h, h) = ‖yh‖2L2(�) + ∫ � p∗ exp(y∗)(yh)2dx + σ‖h‖2H , (78) 123 1294 B. Azmi, M. Bernreuther where y∗ = y(u∗), p∗ = p(u∗), and yh ∈ W is the weak solution of the linearized state equation { −κ�yh + exp(y∗)yh = h in �, yh = 0 on ∂�. Note that, F is locally strongly convex on a neighborhood of u∗ ∈ S∗ provided that (76) holds for some C > 0. Further, using (78), (76) can be equivalently rewritten as σ‖h‖2L2(�) + ‖yh‖2L2(�) + ∫ � p∗ exp(y∗)(yh)2dx ≥ C‖h‖2L2(�) for all h ∈ H . (79) We observe that the only term in (79) that can spoil the positive definiteness ofF ′′(u∗) and, hence, local strong convexity ofF , is the term involving p∗. This term originates from the nonlinearity in the state equation. Note that either a small enough adjoint p∗ or a large enough parameter σ ensure that (79) and equivalently (76) hold. A small enough adjoint can occur, for instance, if ‖y∗ − yd‖2 L2(�) is sufficiently small. Similarly, for the second example (PP), we obtain for F and R defined in (74) that F ′′(u∗)(h, h) = ‖yh‖2L2(0,T ;L2(�)) + T∫ 0 ∫ � 6p∗y∗(yh)2dxdt, (80) where yh ∈ W is the weak solution to the linearized state equation ⎧⎪⎨ ⎪⎩ ẏh − κ�yh + 3(y∗)2yh = h in (0, T ) × �, yh = 0 on (0, T ) × ∂�, yh(T ) = 0 in �, and p∗ solves the weak formulation of (75) for y∗. For this problem, due to the absence of the term σ 2 ‖·‖2H in the objective function, the local strong convexity is not clear. For the elliptic problem (PE), the authors in [64] have studied a weaker condition which implies that the quadratic growth condition (48) is satisfied. Furthermore from [65, Section 3], it especially follows that for σ = 0 condition (79) cannot be satisfied for the elliptic problem (PE). 5 Numerical experiments In this section, we report on the numerical experiments for the problems (PE) and (PP) in order to verify the capabilities of Algorithm 1 numerically. Throughout, we use ‖Gαk (uk)‖H ≤ εtol with some tolerance εtol > 0 as the termination condition, as it is proposed in Sect. 2. Our codes have been implemented in Python 3 and use FEniCS 123 On the forward–backward method with nonmonotone… 1295 Table 1 Example 1: Parameter setting Optimization problem Algorithm 1 � κ σ λ yd ua ub εtol αinf αsup α0 η δ mmax (0, 1)2 10−2 10−4 10−3 See Fig. 1a −3 2 10−6 10−4 102 10 8 0.9 8 Fig. 1 Example 1: The desired state yd , the optimal state, and the optimal control (left to right) (see [66]) for the matrix assembly. Sparse memory management and computations have been implemented with SciPy (see [67]). All computations below have been run on an Ubuntu 22.04 notebook with 32 GB main memory and an Intel Core i7-8565U CPU. We will also compare different step-size approaches for the iterations (1) with respect to gradient-like evaluations and function evaluations as introduced in The- orem 7. We consider a fixed step-size, different combination of BB-type step-sizes presented in Sect. 2 (without linesearch method), and BB-type step-sizes incorporated with the (non-)monotone linesearch approach. Note that for problems governed by nonlinear PDEs such as (PE) and (PP), any gradient-like Gα evaluation requires solving a nonlinear state equation and a linear adjoint equation and any function � evaluation is involved with solving a nonlinear state equation. Furthermore, the number of gradient-like evaluations corresponds to the number of iterations k of Algorithm 1. Example 1 (Elliptic problem) In this example, we consider problem (PE). For the spatial discretization, we follow a discretize-before-optimize approach and use P1- type finite elements on a Friedrichs-Keller triangulation of the spatial domain �. To efficiently evaluate the nonlinearity, we resort to mass lumping. For the numerical tests, we choose the parameters summarized in Table 1. Note that for the fixed step- size approach, we set αk = α0 in (1) for all k ∈ N0. The desired state, the optimal state, and the optimal control are illustrated in Fig. 1. We can see that the bounds are active for the control, though no strong sparsity is promoted, due to the choices of λ and σ . We compare the different BB-type step-sizes presented in Sect. 2with the fixed step- size approach and with a (non-)monotone linesearch approach. The results regarding computational time, function evaluations, and gradient-like evaluations are gathered 123 1296 B. Azmi, M. Bernreuther Fig. 2 Example 1: Convergence of Algorithm 1. “Error” refers to ‖Gαk (uk )‖H at the current iterate Fig. 3 Example 1: Example of non-convergence without linesearch. “Error” refers to ‖Gαk (uk )‖H at the current iterate in Table 2. For the linesearchmethods, the most volatile of the novel BB-type step-size updates, i.e. BB1b, is used as the initial trial step-sizes within Algorithm 1. As can be seen from Table 2, for this example, all other approaches outperform the one with fixed step-size by a huge margin of about two orders of magnitude. The alternatingBB-methods appear to bemore efficient compared to the single BB-updates for both the old and novel step-sizes. Furthermore, except in the case of BB1, the novel step-sizes outperform the old ones and, overall, they appear to be competitive. All of these considerations are valid for both computational time and gradient-like evaluations. Function evaluations are only needed if a linesearch method is used. Compared to the BB1b method, the nonmonotone linesearch method needs about 250 fewer gradient-like evaluations, but this comes at the cost of 887 additional function evaluations for the linesearch and this results in an increased overall computational time. Compared to the monotone (mmax = 0) linesearch, the nonmonotone approach performs significantly better. The convergence behavior is also visualized in Fig. 2. Figure 3 presents an example, where a BB-type step-size update without linesearch fails to converge. In this example, we start the algorithms with α0 = 1 instead of α0 = 10. This confirms the necessity of incorporating a linesearch strategy to ensure convergence. Example 2 (Parabolic problem) In this example, we consider (PP). Here, the spatial domain is discretized in the samemanner as in Example 1. For the temporal discretiza- tion, we use the Crank Nicolson/Adams Bashforth scheme [68]. In this scheme, the 123 On the forward–backward method with nonmonotone… 1297 Ta bl e 2 E xa m pl e 1: N um er ic al re su lts fo r fix ed st ep -s iz e, di ff er en ts pe ct ra lg ra di en tm et ho ds as in tr od uc ed in Se ct .2 ,a nd a (n on -) m on ot on e lin es ea rc h m et ho d w ith re sp ec t to B B 1b Fi xe d B B 1a B B 2a A B B a B B 1b B B 2b A B B b no nm on .L S (B B 1b ) m on .L S (B B 1b ) gr ad .- lik e ev al . 15 3, 66 2 61 8 10 46 57 1 94 1 60 8 38 3 69 7 99 1 fu n. ev al . 0 0 0 0 0 0 0 88 7 15 27 tim e [s ] 1. 56 ·1 03 1. 01 ·1 01 1. 36 ·1 01 1. 55 ·1 01 8. 39 ·1 00 7. 92 ·1 00 5. 65 ·1 00 2. 21 ·1 01 3. 30 ·1 01 123 1298 B. Azmi, M. Bernreuther Table 3 Example 2: Parameter setting Optimization problem Algorithm 1 � T κ λ yd ua ub εtol αinf αsup α0 η δ mmax (0, 1)2 1 10−2 10−2 See Fig. 4 −100 100 10−6 10−4 102 10 4 0.8 4 Fig. 4 Example 2: Snapshots of the desired state yd at time instances 0, 0.25, 0.5 and 0.75 Fig. 5 Example 2: Snapshots of the optimal state at time instances 0, 0.25, 0.5 and 0.75 Fig. 6 Example 2: Snapshots of the optimal control at time instances 0, 0.25, 0.5 and 0.75 implicit Crank Nicolson scheme is used except for the nonlinear terms which are treated using the explicit Adams Bashforth scheme and mass lumping. For the numer- ical tests, we choose the parameters summarized in Table 3. For the fixed step-size approach, we choose αk = α0 for k ∈ N0, similarly to the previous example. The desired state, the optimal state, and the optimal control at time instances t = 0, 0.25, 0.5, 0.75 are depicted in Figs. 4, 5, and 6, respectively. We can see sparsity in space for the control. Note that sparsity in time can also be observed in the sense that the control stays zero on an interval between time instances t = 0.25 and t = 0.75. Similarly to Example 1, we compare the different BB-type step-sizes presented in Sect. 2 with the fixed step-size approach and with a (non-)monotone linesearch 123 On the forward–backward method with nonmonotone… 1299 Ta bl e 4 E xa m pl e 2: N um er ic al re su lts fo r fix ed st ep -s iz e, di ff er en ts pe ct ra lg ra di en tm et ho ds as in tr od uc ed in Se ct .2 ,a nd a (n on -) m on ot on e lin es ea rc h m et ho d w ith re sp ec t to B B 1b Fi xe d B B 1a B B 2a A B B a B B 1b B B 2b A B B b no nm on .L S (B B 1b ) m on .L S (B B 1b ) gr ad .- lik e ev al . 51 5, 04 4 37 5 75 8 78 4 47 0 44 5 22 1 46 3 47 1 fu n. ev al . 0 0 0 0 0 0 0 63 9 71 3 tim e [s ] 5. 66 ·1 04 4. 28 ·1 01 8. 70 ·1 01 8. 89 ·1 01 5. 35 ·1 01 5. 08 ·1 01 2. 50 ·1 01 8. 65 ·1 01 9. 57 ·1 01 123 1300 B. Azmi, M. Bernreuther Fig. 7 Example 2: convergence of Algorithm 1. “Error” refers to ‖Gαk (uk )‖H at the current iterate approach, and the results regarding computational time, function evaluations, and gradient-like evaluations are presented in Table 4. To reveal the efficiency of the linesearch strategy, we incorporate the least efficient BB-type step-size, namely BB1b, with the linesearch strategy. All considerations and observations fromExample 1 hold also true for this example. That is, the iterations (1) for the choice of the BB-type step-sizes significantly outper- form those with fixed step-size, as is to be expected. The novel alternating BB-method outperforms the other approaches. The nonmonotone linesearch method outperforms the monotone version. It also outperforms the BB1b version concerning gradient-like evaluations, but the cost of additional 639 function evaluations again does not pay off for overall computational time. The convergence behavior is also visualized in Fig. 7. To summarize, all the numerical results from the above examples show the capa- bilities of Algorithm 1 and the necessity for the use of a linesearch method. From the above example, we can conclude that the incorporation of nonmonotone linesearch and BB step-sizes leads to an efficient algorithm for a class of nonsmooth nonconvex PDE-constrained optimization problems. Moreover, the convergence behavior is far better than what our worst-case complexity results suggest. In particular, when com- bined with the BB methods, this is typical behavior, see e.g., [34, 35, 40]. Whether the strong quadratic growth condition helps to accelerate convergence in the elliptic example (PE) towards the end is not entirely clear, although the orange curves in Figs. 2c and 3 suggest it. Conclusion We have studied the nonmonotone FBS method for a class of nonsmooth infinite- dimensional composite problems in Hilbert spaces. Starting with the general noncon- vex setting, we have established global convergence with complexity (1/ √ k) and also provided a worst-case complexity analysis. Under additional convexity assumption, convergence has been improved to the sublinear of order (1/k) in function values. If additionally, a quadratic growth-type condition is satisfied, we have shown R-linear convergence, both in function values and iterates. Additional difficulties arising in the transition from weak to strong convergence in the infinite-dimensional setting have been discussed in detail. Finally, the nonmonotone FBS and the novel BB step-size 123 On the forward–backward method with nonmonotone… 1301 rules exploiting the nonsmooth part were successfully tested for elliptic and parabolic PDE-constrained problems. Appendix A Proofs A. 1 Proof of Theorem 7 Here, we restrict ourselves mainly to deriving (31). Justification of (32) follows by similar arguments. The proof is carried out by determining the minimum reduction of the objective function between iteration step uk+1 and its maximal predecessor, i.e. u (k), with respect to (11) as long as Algorithm 1 has not been terminated. Note that u (k) can be calculated without further evaluations of the objective function by storing previous iterates. When using Algorithm 1, we have the following two cases: • Assume that (8) in Step 4 of Algorithm 1 holds for αk := αint,k and therefore ik = 0. Then we have a decrease �(u (k)) − �(uk+1) ≥ δ αint,k ‖Gαk (uk)‖2H ≥ δ αsup ‖Gαk (uk)‖2H . In this case, only one additional function evaluation is necessary to obtain this decrease. • Assume that (8) in Step 4 of Algorithm 1 fails to hold for αint,k and therefore ik ≥ 1 backtracking steps are performed. Due to (iii) of Lemma 4 we have αk = αint,kη ik < ηLF ′ 2(1−δ) , and, as a consequence, we can bound ik as follows ik <∣∣∣logη ( ηLF ′ 2αint,k (1−δ) )∣∣∣ . Hence, Step 4 requires at most n1 := ⌊∣∣∣logη ( ηLF ′ 2αinf (1−δ) )∣∣∣⌋ function evaluations. At the time that the linesearch strategy terminates in this step, we have the decrease �(u (k)) − �(uk+1) ≥ δ αk ‖Gαk (uk)‖2H > 2(1−δ)δ ηLF ′ ‖Gαk (uk)‖2H . Gathering the two above cases, we can see that function value decrease per function evaluation is given, in the worst case, by �(u (k)) − �(uk+1) ≥ γdecr‖Gαk (uk)‖2H , 123 1302 B. Azmi, M. Bernreuther with γdecr := min { δ αsup , 2(1−δ)δ n1ηLF ′ } . Further, using similar arguments as in the proof of Theorem 6, we obtain �(u0) − �̄ ≥ �(u0) − �(uk) ≥ �(uν(0)) − �(u ν( ⌈ k mmax+1 ⌉ ) ) = ⌈ k mmax+1 ⌉ ∑ i=1 �(uν(i−1)) − �(uν(i)) ≥ ⌈ k mmax+1 ⌉ ∑ i=1 γdecr‖Gαν(i)−1(uν(i)−1))‖2H ≥ γdecr ⌈ k mmax+1 ⌉ min 1≤i≤ ⌈ k mmax+1 ⌉ ‖Gαν(i)−1(uν(i)−1)‖2H ≥ ( γdecrk mmax+1 ) min 0≤i≤ ⌈ k mmax+1 ⌉ (mmax+1)−1 ‖Gαi (ui )‖2H ≥ ( γdecrk C2mmax G (mmax+1) ) min 0≤i≤k ‖Gαi (ui )‖2H . (A1) To find an εtol-stationary point, we assume that up to the current iteration k Algorithm 1 has not been terminated, i.e. ‖Gαi (ui )‖H > ε for all i = 0, . . . , k. In this case, using (A1), we can write �(u0) − �̄ > ( γdecrk C2mmax G (mmax+1) ) ε2tol, and, therefore, the total number of function evaluations of � in Algorithm 1 is bounded from above by k f max ≤ ⌊ C2mmax G (mmax+1)(�(u0)−�̄) γdecrε 2 tol ⌋ = ⌊ γ f comp(�(u0)−�̄) ε2tol ⌋ . Thus, we are finished with the verification of (31). Similarly, using (23), it can be easily shown that the total number of Gαk (uk) evaluations is bounded by kg max ≤ ⌊ C2mmax G α(mmax+1)(�(u0)−�̄) δε2tol ⌋ = ⌊ γ g comp(�(u0)−�̄) ε2tol ⌋ , and, thus, the proof is complete. A. 2 Proof of Lemma 8 First, we show for every k ≥ 1 that �(uk) ≤ min w∈H Qαk−1(w, uk−1) + LF ′ 2αinfαk−1 ‖Gαk−1(uk−1)‖2H . (A2) Due to A3, the descent lemma, see [59, Lemma 1.30], implies F(uk) ≤ F(uk−1) + (∇F(uk), uk − uk−1)H + LF ′ 2α2 k−1 ‖Gαk−1(uk−1)‖2H . 123 On the forward–backward method with nonmonotone… 1303 Using the definition of Qαk−1 and the fact that uk is the minimizer of Qαk−1(·, uk−1), we obtain �(uk) ≤ min w∈H Qαk−1(w, uk−1) + ( LF ′ 2α2 k−1 − 1 2αk−1 ) ‖Gαk−1(uk−1)‖2H ≤ min w∈H Qαk−1(w, uk−1) + LF ′ 2α2 k−1 ‖Gαk−1(uk−1)‖2H , and, thus, (A2) follows from the fact that αk ≥ αinf for all k ∈ N0. Due to the convexity of F , we can write min w∈H Qαk−1(w, uk−1) ≤ min w∈H {�(w) + αk−1 2 ‖w − uk−1‖2H }. (A3) Further, due to the convexity of �, we obtain for every w = (1− λ)uk−1 + λu∗ with λ ∈ [0, 1] and u∗ ∈ S∗ that min w∈H {�(w) + αk−1 2 ‖w − uk−1‖2H } ≤ �((1 − λ)uk−1 + λu∗) + αk−1λ 2 2 ‖uk−1 − u∗‖2H ≤ (1 − λ)�(uk−1) + λ�∗ + αk−1λ 2 2 ‖uk−1 − u∗‖2H . Combining with (A2) and (A3), we can write �(uk) ≤ (1 − λ)�(uk−1) + λ�∗+αk−1λ 2 2 ‖uk−1−u∗‖2H + LF ′ 2αinfαk−1 ‖Gαk−1(uk−1)‖2H ≤ (1 − λ)�(u (k−1)) + λ�∗ + αk−1λ 2 2 ‖uk−1 − u∗‖2H + LF ′ 2αinfαk−1 ‖Gαk−1(uk−1)‖2H . (A4) Now, inserting ν(k) in the place of k in (A4) and using (20), we have �(uν(k)) ≤ (1 − λ)�(uν(k−1)) + λ�∗ + αν(k)−1λ 2 2 ‖uν(k)−1 − u∗‖2H + LF ′ 2αinfαν(k)−1 ‖Gαν(k)−1(uν(k)−1)‖2H . (A5) Subtracting�∗ from both sides of (A5), setting C̃ := LF ′ 2αinf , and using (iii) of Lemma 4, we obtain �(uν(k)) − �∗ ≤ (1 − λ) ( �(uν(k−1)) − �∗)+ αλ2 2 ‖uν(k)−1 − u∗‖2H + C̃ αν(k)−1 ‖Gαν(k)−1(uν(k)−1)‖2H . (A6) Finally, (33) follows from the fact that u∗ ∈ S∗ was arbitrary and S∗ is non-empty, closed, and convex. Nowwe deal with the verification of (34). Let an arbitrary u∗ ∈ S∗ be given, setting k = 1 and λ = 1 in (A6), we obtain �(uν(1)) − �∗ ≤ α 2 ‖uν(1)−1 − u∗‖2H + C̃ αν(1)−1 ‖Gαν(1)−1(uν(1)−1)‖2H . (A7) 123 1304 B. Azmi, M. Bernreuther Using, the fact that 0 ≤ ν(1) − 1 ≤ mmax (see L1), the firm nonexpansiveness of the proximal operator, and the Lipschitz continuity of ∇F , we can write that ‖uν(1)−1 − u∗‖H ≤ ( 1 + LF ′ αν(1)−2 ) ‖uν(1)−2 − u∗‖H ≤ ( 1 + LF ′ αinf ) ‖uν(1)−2 − u∗‖H ≤ · · · ≤ ( 1 + LF ′ αinf )mmax ‖u0 − u∗‖H . (A8) Further, using (iv) of Lemma 4 successively, we obtain that ‖Gαν(1)−1(uν(1)−1)‖H ≤ Cmmax G ‖Gα0(u0)‖H . (A9) Combining (A7), (A8), and (A9), we can write �(uν(1)) − �∗ ≤ α 2 ( 1 + LF ′ αinf )2mmax ‖u0 − u∗‖2H + C̃C2mmax G αinf ‖Gα0(u0)‖2H ≤ α 2 ( 1 + LF ′ αinf )2mmax ‖u0 − u∗‖2H + α2C̃C2mmax G αinf ‖u1 − u0‖2H . (A10) Further, the firm nonexpansiveness of the proximal operator and the Lipschitz conti- nuity of ∇F again imply that ‖u1 − u0‖H ≤ ‖u0 − u∗‖H + ‖u1 − u∗‖H ≤ 2‖u0 − u∗‖H + LF ′ α0 ‖u0 − u∗‖H ≤ (2 + LF ′ αinf )‖u0 − u∗‖H . (A11) Thus, combining (A10) and (A11) and setting C0 := α 2 ( 1 + LF ′ αinf )2mmax + α2C̃C2mmax G αinf ( 2 + LF ′ αinf )2 , we arrive at �(uν(1)) − �∗ ≤ C0‖u0 − u∗‖2H . Hence, (34) follows from the fact that u∗ ∈ S∗ is arbitrary. Acknowledgements The work of Marco Bernreuther was supported by the German Research Foundation (DFG) under grant number VO 1658/5-2 within the priority program “Non-smooth and Complementarity- based Distributed Parameter Systems: Simulation and Hierarchical Optimization” (SPP 1962). Funding Open Access funding enabled and organized by Projekt DEAL. Data Availability The solution data generated during the current study are available from the corresponding author on reasonable request. Declarations Conflict of interest The authors declare that they have no conflict of interest. OpenAccess This article is licensedunder aCreativeCommonsAttribution 4.0 InternationalLicense,which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included 123 On the forward–backward method with nonmonotone… 1305 in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. References 1. Azmi, B., Kunisch, K., Rodrigues, S.S.: Saturated feedback stabilizability to trajectories for the schlögl parabolic equation. IEEE Trans. Autom. Control (2023). https://doi.org/10.1109/TAC.2023.3247511 2. O’Donoghue, B., Stathopoulos, G., Boyd, S.: A splitting method for optimal control. IEEE Trans. Control Syst. Technol. 21(6), 2432–2442 (2013). https://doi.org/10.1109/TCST.2012.2231960 3. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Fixed-Point Algo- rithms for Inverse Problems in Science and Engineering. Springer Optimization Application, vol. 49, pp. 185–212. Springer, New York (2011). https://doi.org/10.1007/978-1-4419-9569-8_10 4. Parikh, N., Boyd, S., et al.: Proximal algorithms. Found. Trends Optim. 1(3), 127–239 (2014) 5. Beck, A.: First-order Methods in Optimization. MOS-SIAM Series on Optimization, vol. 25, p. 475. Society for Industrial and AppliedMathematics (SIAM), Philadelphia (2017). https://doi.org/10.1137/ 1.9781611974997.ch1 6. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009). https://doi.org/10.1137/080716542 7. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013). https://doi.org/10.1007/s10107-012-0629-5 8. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116(1–2), 5–16 (2009). https://doi.org/10.1007/s10107-007-0133- 5 9. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013). https://doi.org/10.1007/s10107-011-0484-9 10. Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017). https://doi.org/10. 1007/s10107-016-1091-6 11. Frankel, P., Garrigos, G., Peypouquet, J.: Splitting methods with variable metric for Kurdyka- łojasiewicz functions and general convergence rates. J. Optim. Theory Appl. 165(3), 874–900 (2015). https://doi.org/10.1007/s10957-014-0642-3 12. Ioan Boţ, R., Robert Csetnek, E.: A forward–backward dynamical approach to the minimization of the sum of a nonsmooth convex with a smooth nonconvex function. ESAIM Control Optim. Calc. Var. 24(2), 463–477 (2018). https://doi.org/10.1051/cocv/2017020 13. Bello-Cruz, Y., Li, G., Nghia, T.: On the linear convergence of forward–backward splitting method: Part I-convergence analysis. J. Optim. Theory Appl. 188(2), 378–401 (2021). https://doi.org/10.1007/ s10957-020-01787-7 14. Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018). https://doi.org/10.1287/moor.2017.0889 15. Garrigos, G., Rosasco, L., Villa, S.: Convergence of the forward–backward algorithm: beyond the worst-case with the help of geometry. Math. Program. (2022). https://doi.org/10.1007/s10107-022- 01809-4 16. Necoara, I., Nesterov, Y., Glineur, F.: Linear convergence of first ordermethods for non-strongly convex optimization. Math. Program. 175(1–2), 69–107 (2019). https://doi.org/10.1007/s10107-018-1232-1 17. Boţ, R.I., Csetnek, E.R., László, S.C.: An inertial forward–backward algorithm for the minimization of the sum of two nonconvex functions. EURO J. Comput. Optim. 4(1), 3–25 (2016). https://doi.org/ 10.1007/s13675-015-0045-8 18. Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015). https://doi.org/10.1137/140998135 19. Boţ, R.I., Csetnek, E.R., Nguyen, D.-K.: A proximal minimization algorithm for structured noncon- vex and nonsmooth problems. SIAM J. Optim. 29(2), 1300–1328 (2019). https://doi.org/10.1137/ 18M1190689 123 http://creativecommons.org/licenses/by/4.0/ https://doi.org/10.1109/TAC.2023.3247511 https://doi.org/10.1109/TCST.2012.2231960 https://doi.org/10.1007/978-1-4419-9569-8_10 https://doi.org/10.1137/1.9781611974997.ch1 https://doi.org/10.1137/1.9781611974997.ch1 https://doi.org/10.1137/080716542 https://doi.org/10.1007/s10107-012-0629-5 https://doi.org/10.1007/s10107-007-0133-5 https://doi.org/10.1007/s10107-007-0133-5 https://doi.org/10.1007/s10107-011-0484-9 https://doi.org/10.1007/s10107-016-1091-6 https://doi.org/10.1007/s10107-016-1091-6 https://doi.org/10.1007/s10957-014-0642-3 https://doi.org/10.1051/cocv/2017020 https://doi.org/10.1007/s10957-020-01787-7 https://doi.org/10.1007/s10957-020-01787-7 https://doi.org/10.1287/moor.2017.0889 https://doi.org/10.1007/s10107-022-01809-4 https://doi.org/10.1007/s10107-022-01809-4 https://doi.org/10.1007/s10107-018-1232-1 https://doi.org/10.1007/s13675-015-0045-8 https://doi.org/10.1007/s13675-015-0045-8 https://doi.org/10.1137/140998135 https://doi.org/10.1137/18M1190689 https://doi.org/10.1137/18M1190689 1306 B. Azmi, M. Bernreuther 20. Ahookhosh, M., Themelis, A., Patrinos, P.: A Bregman forward–backward linesearch algorithm for nonconvex composite optimization: superlinear convergence to nonisolated local minima. SIAM J. Optim. 31(1), 653–685 (2021). https://doi.org/10.1137/19M1264783 21. DeMarchi, A.: Proximal gradientmethods beyondmonotony. J. NonsmoothAnal. Optim. 4, 18 (2023). https://doi.org/10.46298/jnsao-2023-10290 22. Bonettini, S., Loris, I., Porta, F., Prato, M.: Variable metric inexact line-search-based methods for non- smooth optimization. SIAM J. Optim. 26(2), 891–921 (2016). https://doi.org/10.1137/15M1019325 23. Scheinberg, K., Tang, X.: Practical inexact proximal quasi-Newton method with global complexity analysis. Math. Program. 160(1–2), 495–529 (2016). https://doi.org/10.1007/s10107-016-0997-3 24. Themelis, A., Stella, L., Patrinos, P.: Forward-backward envelope for the sum of two nonconvex functions: further properties and nonmonotone linesearch algorithms. SIAM J. Optim. 28(3), 2274– 2303 (2018). https://doi.org/10.1137/16M1080240 25. Kanzow, C., Lechner, T.: Globalized inexact proximal Newton-type methods for nonconvex composite functions. Comput. Optim. Appl. 78(2), 377–410 (2021). https://doi.org/10.1007/s10589-020-00243- 6 26. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, 2nd edn., p. 619. Springer, New York (2017). https://doi.org/10.1007/ 978-3-319-48311-5 27. Bello-Cruz, Y., Li, G., Nghia, T.: Quadratic growth conditions and uniqueness of optimal solution to lasso. J. Optim. Theory Appl. 194(1), 167–190 (2022). https://doi.org/10.1007/s10957-022-02013-2 28. Salzo, S.: The variable metric forward–backward splitting algorithm under mild differentiability assumptions. SIAM J. Optim. 27(4), 2153–2181 (2017). https://doi.org/10.1137/16M1073741 29. Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than 1/k2. SIAM J. Optim. 26(3), 1824–1834 (2016). https://doi.org/10. 1137/15M1046095 30. Clason, C., Valkonen, T.: Introduction to Nonsmooth Analysis and Optimization (2020). https://arxiv. org/abs/2001.00216 31. Banert, S., Boţ, R.I.: A general double-proximal gradient algorithm for dc programming. Math. Pro- gram. 178(1–2), 301–326 (2019). https://doi.org/10.1007/s10107-018-1292-2 32. Baraldi, R.J., Kouri, D.P.: A proximal trust-region method for nonsmooth optimization with inexact function and gradient evaluations. Math. Program. 201(1–2), 559–598 (2023). https://doi.org/10.1007/ s10107-022-01915-3 33. Baraldi, R.J., Kouri, D.P.: Local convergence analysis of an inexact trust-region method for nonsmooth optimization. Optim. Lett. 18(3), 663–680 (2024). https://doi.org/10.1007/s11590-023-02092-8 34. Azmi, B., Kunisch, K.: Analysis of the Barzilai–Borwein step-sizes for problems in Hilbert spaces. J. Optim. Theory Appl. 185(3), 819–844 (2020). https://doi.org/10.1007/s10957-020-01677-y 35. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988). https://doi.org/10.1093/imanum/8.1.141 36. Grippo, L., Lampariello, F., Lucidi, S.: A nonmonotone line search technique for Newton’s method. SIAM J. Numer. Anal. 23(4), 707–716 (1986). https://doi.org/10.1137/0723046 37. Wright, S.J., Nowak, R.D., Figueiredo, M.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009). https://doi.org/10.1109/TSP.2009.2016892 38. Hager, W.W., Phan, D.T., Zhang, H.: Gradient-based methods for sparse recovery. SIAM J. Imaging Sci. 4(1), 146–165 (2011). https://doi.org/10.1137/090775063 39. Kanzow, C., Mehlitz, P.: Convergence properties of monotone and nonmonotone proximal gradient methods revisited. J. Optim. Theory Appl. 195(2), 624–646 (2022). https://doi.org/10.1007/s10957- 022-02101-3 40. Azmi, B., Kunisch, K.: On the convergence and mesh-independent property of the Barzilai–Borwein method for PDE-constrained optimization. IMA J. Numer. Anal. 42(4), 2984–3021 (2022). https://doi. org/10.1093/imanum/drab056 41. Allgower, E.L., Böhmer,K., Potra, F.A., Rheinboldt,W.C.:Amesh-independence principle for operator equations and their discretizations. SIAM J. Numer. Anal. 23(1), 160–169 (1986). https://doi.org/10. 1137/0723011 42. Heinkenschloss, M.: Mesh independence for nonlinear least squares problems with norm constraints. SIAM J. Optim. 3(1), 81–117 (1993). https://doi.org/10.1137/0803005 43. Hintermüller, M., Ulbrich, M.: A mesh-independence result for semismooth Newton methods. Math. Program. 101(1), 151–184 (2004). https://doi.org/10.1007/s10107-004-0540-9 123 https://doi.org/10.1137/19M1264783 https://doi.org/10.46298/jnsao-2023-10290 https://doi.org/10.1137/15M1019325 https://doi.org/10.1007/s10107-016-0997-3 https://doi.org/10.1137/16M1080240 https://doi.org/10.1007/s10589-020-00243-6 https://doi.org/10.1007/s10589-020-00243-6 https://doi.org/10.1007/978-3-319-48311-5 https://doi.org/10.1007/978-3-319-48311-5 https://doi.org/10.1007/s10957-022-02013-2 https://doi.org/10.1137/16M1073741 https://doi.org/10.1137/15M1046095 https://doi.org/10.1137/15M1046095 https://arxiv.org/abs/2001.00216 https://arxiv.org/abs/2001.00216 https://doi.org/10.1007/s10107-018-1292-2 https://doi.org/10.1007/s10107-022-01915-3 https://doi.org/10.1007/s10107-022-01915-3 https://doi.org/10.1007/s11590-023-02092-8 https://doi.org/10.1007/s10957-020-01677-y https://doi.org/10.1093/imanum/8.1.141 https://doi.org/10.1137/0723046 https://doi.org/10.1109/TSP.2009.2016892 https://doi.org/10.1137/090775063 https://doi.org/10.1007/s10957-022-02101-3 https://doi.org/10.1007/s10957-022-02101-3 https://doi.org/10.1093/imanum/drab056 https://doi.org/10.1093/imanum/drab056 https://doi.org/10.1137/0723011 https://doi.org/10.1137/0723011 https://doi.org/10.1137/0803005 https://doi.org/10.1007/s10107-004-0540-9 On the forward–backward method with nonmonotone… 1307 44. Kelley, C.T., Sachs, E.W.: Quasi-Newton methods and unconstrained optimal control problems. SIAM J. Control. Optim. 25(6), 1503–1516 (1987). https://doi.org/10.1137/0325083 45. Kelley, C.T., Sachs, E.W.: Approximate quasi-Newton methods. Math. Program. 48(1), 41–70 (1990) 46. Volkwein, S.: Mesh-independence for an augmented Lagrangian-SQPmethod in Hilbert spaces. SIAM J. Control. Optim. 38(3), 767–785 (2000). https://doi.org/10.1137/S0363012998334468 47. Allgower, E.L., Böhmer, K.: Application of the mesh independence principle to mesh refinement strategies. SIAM J. Numer. Anal. 24(6), 1335–1351 (1987). https://doi.org/10.1137/0724086 48. Schirotzek, W.: Nonsmooth Analysis. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3- 540-71333-3 49. Hüther, B.: Global convergence of algorithms with nonmonotone line search strategy in unconstrained optimization. Results Math. 41(3–4), 320–333 (2002). https://doi.org/10.1007/BF03322774 50. Cartis, C., Sampaio, P.R., Toint, P.L.: Worst-case evaluation complexity of non-monotone gradient- related algorithms for unconstrained optimization. Optimization 64(5), 1349–1361 (2015). https://doi. org/10.1080/02331934.2013.869809 51. Alber, Y.I., Iusem, A.N., Solodov, M.V.: On the projected subgradient method for nonsmooth con- vex optimization in a Hilbert space. Math. Program. 81(1), 23–35 (1998). https://doi.org/10.1007/ BF01584842 52. Combettes, P.L.: Quasi-Fejérian analysis of some optimization algorithms. In: Inherently Parallel Algorithms in Feasibility andOptimization andTheirApplications (Haifa, 2000). Studies inComputing Mathematics, vol. 8, pp. 115–152. North-Holland, Amsterdam (2001). https://doi.org/10.1016/S1570- 579X(01)80010-0 53. Dontchev, A.L., Zolezzi, T.: Well-posed Optimization Problems. Lecture Notes in Mathematics, vol. 1543, p. 421. Springer, Berlin (1993). https://doi.org/10.1007/BFb0084195 54. Aragón Artacho, F.J., Geoffroy, M.H.: Characterization of metric regularity of subdifferentials. J. Convex Anal. 15(2), 365–380 (2008) 55. Artacho, F., Geoffroy, M.H.: Metric subregularity of the convex subdifferential in Banach spaces. J. Nonlinear Convex Anal. 15(1), 35–47 (2014) 56. Azé, D., Corvellec, J.-N.: Nonlinear local error bounds via a change of metric. J. Fixed Point Theory Appl. 16(1–2), 351–372 (2014). https://doi.org/10.1007/s11784-015-0220-9 57. Drusvyatskiy, D., Mordukhovich, B.S., Nghia, T.: Second-order growth, tilt stability, and metric reg- ularity of the subdifferential. J. Convex Anal. 21(4), 1165–1192 (2014) 58. Garrigos, G., Rosasco, L., Villa, S.: Thresholding gradient methods in Hilbert spaces: support identi- fication and linear convergence. ESAIM Control Optim. Calc. Var. 26, 28–20 (2020). https://doi.org/ 10.1051/cocv/2019011 59. Peypouquet, J.: Convex Optimization in Normed Spaces. SpringerBriefs in Optimization. Theory, Methods and Examples. With a Foreword by Hedy Attouch, p. 124. Springer, Cham (2015). https:// doi.org/10.1007/978-3-319-13710-0 60. Casas, E., Mateos, M., Rösch, A.: Analysis of control problems of nonmontone semilinear elliptic equations. ESAIMControl Optim. Calc. Var. 26, 80–21 (2020). https://doi.org/10.1051/cocv/2020032 61. Hinze, M., Pinnau, R., Ulbrich, M., Ulbrich, S.: Optimization with PDE constraints. In: Mathematical Modelling: Theory and Applications, vol. 23, p. 270. Springer, Dordrecht (2009). https://doi.org/10. 1007/978-1-4020-8839-1 62. Kunisch, K., Rodrigues, S.S.: Global stabilizability to trajectories for the schlögl equation in a sobolev norm. Discrete Contin. Dyn. Syst. (2023). https://doi.org/10.3934/dcds.2023017 63. Hinze, M., Kunisch, K.: Second order methods for optimal control of time-dependent fluid flow. SIAM J. Control. Optim. 40(3), 925–946 (2001). https://doi.org/10.1137/S0363012999361810 64. Casas, E.: A review on sparse solutions in optimal control of partial differential equations. SeMA J. 74(3), 319–344 (2017). https://doi.org/10.1007/s40324-017-0121-5 65. Casas, E.: Second order analysis for bang-bang control problems of PDEs. SIAM J. Control. Optim. 50(4), 2355–2372 (2012). https://doi.org/10.1137/120862892 66. Alnæs, M., Blechta, J., Hake, J., Johansson, A., Kehlet, B., Logg, A., Richardson, C., Ring, J., Rognes, M.E., Wells, G.N.: The fenics project version 1.5. Arch. Numer. Softw. 3(100) (2015) 67. Virtanen, P., Gommers, R., Oliphant, T.E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P.,Weckesser,W., Bright, J., et al.: Scipy 1.0: fundamental algorithms for scientific computing in python. Nat. Methods 17(3), 261–272 (2020) 123 https://doi.org/10.1137/0325083 https://doi.org/10.1137/S0363012998334468 https://doi.org/10.1137/0724086 https://doi.org/10.1007/978-3-540-71333-3 https://doi.org/10.1007/978-3-540-71333-3 https://doi.org/10.1007/BF03322774 https://doi.org/10.1080/02331934.2013.869809 https://doi.org/10.1080/02331934.2013.869809 https://doi.org/10.1007/BF01584842 https://doi.org/10.1007/BF01584842 https://doi.org/10.1016/S1570-579X(01)80010-0 https://doi.org/10.1016/S1570-579X(01)80010-0 https://doi.org/10.1007/BFb0084195 https://doi.org/10.1007/s11784-015-0220-9 https://doi.org/10.1051/cocv/2019011 https://doi.org/10.1051/cocv/2019011 https://doi.org/10.1007/978-3-319-13710-0 https://doi.org/10.1007/978-3-319-13710-0 https://doi.org/10.1051/cocv/2020032 https://doi.org/10.1007/978-1-4020-8839-1 https://doi.org/10.1007/978-1-4020-8839-1 https://doi.org/10.3934/dcds.2023017 https://doi.org/10.1137/S0363012999361810 https://doi.org/10.1007/s40324-017-0121-5 https://doi.org/10.1137/120862892 1308 B. Azmi, M. Bernreuther 68. He, Y.: Two-level method based on finite element and Crank–Nicolson extrapolation for the time- dependent Navier–Stokes equations. SIAM J. Numer. Anal. 41(4), 1263–1285 (2003). https://doi.org/ 10.1137/S0036142901385659 Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. 123 https://doi.org/10.1137/S0036142901385659 https://doi.org/10.1137/S0036142901385659 On the forward–backward method with nonmonotone linesearch for infinite-dimensional nonsmooth nonconvex problems Abstract 1 Introduction Contributions Outline of the paper Notation 2 Problem formulation and algorithm 3 Convergence and complexity analysis 3.1 General case 3.2 Convex case 3.3 Convergence under quadratic growth-type conditions 3.4 On relaxing the global Lipschitz continuity of mathcalF for problems governed by PDEs 4 Application to PDE-constrained optimization 4.1 Elliptic problem 4.2 Parabolic problem 4.3 Discussion on the quadratic growth condition 5 Numerical experiments Conclusion Appendix A Proofs A. 1 Proof of Theorem 7 A. 2 Proof of Lemma 8 Acknowledgements References