HKU Scholars Hub

HKU Scholars Hubhttp://hub.hku.hkThe DSpace digital repository system captures, stores, indexes, preserves, and distributes digital research material.Sat, 27 Apr 2024 03:31:36 GMT2024-04-27T03:31:36Z50241Adaptive restart of accelerated gradient methods under local quadratic growth conditionhttp://hdl.handle.net/10722/275730Title: Adaptive restart of accelerated gradient methods under local quadratic growth condition Authors: Fercoq, O; Qu, Z Abstract: By analyzing accelerated proximal gradient methods under a local quadratic growth condition, we show that restarting these algorithms at any frequency gives a globally linearly convergent algorithm. This result was previously known only for long enough frequencies. Then as the rate of convergence depends on the match between the frequency and the quadratic error bound, we design a scheme to automatically adapt the frequency of restart from the observed decrease of the norm of the gradient mapping. Our algorithm has a better theoretical bound than previously proposed methods for the adaptation to the quadratic error bound of the objective. We illustrate the efficiency of the algorithm on Lasso, regularized logistic regression and total variation denoising problems. Tue, 01 Jan 2019 00:00:00 GMThttp://hdl.handle.net/10722/2757302019-01-01T00:00:00ZEven faster accelerated coordinate descent using non-uniform samplinghttp://hdl.handle.net/10722/235017Title: Even faster accelerated coordinate descent using non-uniform sampling Authors: Allen-Zhu, Z; Qu, Z; Richtarik, P; Yuan, Y Abstract: Accelerated coordinate descent is widely used in optimization due to its cheap per-iteration cost and scalability to large-scale problems. Up to a primal-dual transformation, it is also the same as accelerated stochastic gradient descent that is one of the central methods used in machine learning. In this paper, we improve the best known running time of accelerated coordinate descent by a factor up to square root of n. Our improvement is based on a clean, novel non-uniform sampling that selects each coordinate with a probability proportional to the square root of its smoothness parameter. Our proof technique also deviates from the classical estimation sequence technique used in prior work. Our speed-up applies to important problems such as empirical risk minimization and solving linear systems, both in theory and in practice. Description: This journal vol. entitled: Proceedings of the 33 rd International Conference on Machine Learning, ICML 2016; The full version of this paper can be found on http://arxiv.org/abs/1512.09103 Fri, 01 Jan 2016 00:00:00 GMThttp://hdl.handle.net/10722/2350172016-01-01T00:00:00ZContraction of Riccati flows applied to the convergence analysis of the max-plus curse of dimensionality free methodhttp://hdl.handle.net/10722/219738Title: Contraction of Riccati flows applied to the convergence analysis of the max-plus curse of dimensionality free method Authors: Qu, Zheng Abstract: Max-plus based methods have been recently explored for solution of first-order Hamilton-Jacobi-Bellman equations by several authors. In particular, McEneaney's curse-of-dimensionality free method applies to the equations where the Hamiltonian takes the form of a (pointwise) maximum of linear/quadratic forms. In previous works of McEneaney and Kluberg, the approximation error of the method was shown to be O(1/(Nτ))+O(√τ) where τ is the time discretization step and N is the number of iterations. Here we use a recently established contraction result of the indefinite Riccati flow in Thompson's metric to show that under different technical assumptions, still covering an important class of problems, the total error incorporating a pruning procedure of error order τ2 is O(e-αNτ) +O(τ) for some α > 0 related to the contraction rate of the indefinite Riccati flow. © 2013 EUCA. Tue, 01 Jan 2013 00:00:00 GMThttp://hdl.handle.net/10722/2197382013-01-01T00:00:00ZQuartz: Randomized dual coordinate ascent with arbitrary samplinghttp://hdl.handle.net/10722/235016Title: Quartz: Randomized dual coordinate ascent with arbitrary sampling Authors: Qu, Z; Richtarik, P; Zhang, T Abstract: We study the problem of minimizing the average of a large number of smooth convex functions penalized with a strongly convex regularizer. We propose and analyze a novel primal-dual method (Quartz) which at every iteration samples and updates a random subset of the dual variables, chosen according to an arbitrary distribution. In contrast to typical analysis, we directly bound the decrease of the primal-dual error (in expectation), without the need to first analyze the dual error. Depending on the choice of the sampling, we obtain efficient serial and mini-batch variants of the method. In the serial case, our bounds match the best known bounds for SDCA (both with uniform and importance sampling). With standard mini-batching, our bounds predict initial data-independent speedup as well as additional data-driven speedup which depends on spectral and sparsity properties of the data. Description: Free Access of NIPS Proceedings' website located at: http://papers.nips.cc/ Thu, 01 Jan 2015 00:00:00 GMThttp://hdl.handle.net/10722/2350162015-01-01T00:00:00ZMarkov operators on cones and non-commutative consensushttp://hdl.handle.net/10722/219736Title: Markov operators on cones and non-commutative consensus Authors: Gaubert, Stephane; Qu, Zheng Abstract: The analysis of classical consensus algorithms relies on contraction properties of Markov matrices with respect to the Hilbert semi-norm (infinitesimal version of Hilbert's projective metric) and to the total variation norm. We generalize these properties to the case of operators on cones. This is motivated by the study of 'non-commutative consensus', i.e., of the dynamics of linear maps leaving invariant cones of positive semi-definite matrices. Such maps appear in quantum information (Kraus maps), and in the study of matrix means. We give a characterization of the contraction rate of an abstract Markov operator on a cone, which extends classical formulæ obtained by Dœblin and Dobrushin in the case of Markov matrices. In the special case of Kraus maps, we relate the absence of contraction to the positivity of the 'zero-error capacity' of a quantum channel. We finally show that a number of decision problems concerning the contraction rate of Kraus maps reduce to finding a rank one matrix in linear spaces satisfying certain conditions and discuss complexity issues. © 2013 EUCA. Tue, 01 Jan 2013 00:00:00 GMThttp://hdl.handle.net/10722/2197362013-01-01T00:00:00ZChecking strict positivity of Kraus maps is NP-hardhttp://hdl.handle.net/10722/234648Title: Checking strict positivity of Kraus maps is NP-hard Authors: Gaubert, S; Qu, Z Sun, 01 Jan 2017 00:00:00 GMThttp://hdl.handle.net/10722/2346482017-01-01T00:00:00ZAn inexact proximal augmented Lagrangian framework with arbitrary linearly convergent inner solver for composite convex optimizationhttp://hdl.handle.net/10722/307877Title: An inexact proximal augmented Lagrangian framework with arbitrary linearly convergent inner solver for composite convex optimization Authors: Li, F; Qu, Z Abstract: We propose an inexact proximal augmented Lagrangian framework with explicit inner problem termination rule for composite convex optimization problems. We consider arbitrary linearly convergent inner solver including in particular stochastic algorithms, making the resulting framework more scalable facing the ever-increasing problem dimension. Each subproblem is solved inexactly with an explicit and self-adaptive stopping criterion, without requiring to set an a priori target accuracy. When the primal and dual domain are bounded, our method achieves O(1/ϵ√) and O(1/ϵ) complexity bound in terms of number of inner solver iterations, respectively for the strongly convex and non-strongly convex case. Without the boundedness assumption, only logarithm terms need to be added and the above two complexity bounds increase respectively to O~(1/√ϵ) and O~(1/ϵ), which hold both for obtaining ϵ-optimal and ϵ-KKT solution. Within the general framework that we propose, we also obtain O~(1/ϵ) and O~(1/ϵ2) complexity bounds under relative smoothness assumption on the differentiable component of the objective function. We show through theoretical analysis as well as numerical experiments the computational speedup possibly achieved by the use of randomized inner solvers for large-scale problems. Fri, 01 Jan 2021 00:00:00 GMThttp://hdl.handle.net/10722/3078772021-01-01T00:00:00ZSolving ergodic Markov decision processes and perfect information zero-sum stochastic games by variance reduced deflated value iterationhttp://hdl.handle.net/10722/316993Title: Solving ergodic Markov decision processes and perfect information zero-sum stochastic games by variance reduced deflated value iteration Authors: Akian, M; Gaubert, S; Qu, Z; Saadi, O Abstract: Recently, Sidford, Wang, Wu and Ye (2018) developed an algorithm combining variance reduction techniques with value iteration to solve discounted Markov decision processes. This algorithm has a sublinear complexity when the discount factor is fixed. Here, we extend this approach to mean-payoff problems, including both Markov decision processes and perfect information zero-sum stochastic games. We obtain sublinear complexity bounds, assuming there is a distinguished state which is accessible from all initial states and for all policies. Our method is based on a reduction from the mean payoff problem to the discounted problem by a Doob h-transform, combined with a deflation technique. The complexity analysis of this algorithm uses at the same time the techniques developed by Sidford et al. in the discounted case and non-linear spectral theory techniques (Collatz-Wielandt characterization of the eigenvalue). Tue, 01 Jan 2019 00:00:00 GMThttp://hdl.handle.net/10722/3169932019-01-01T00:00:00ZL-SVRG and L-Katyusha with Arbitrary Samplinghttp://hdl.handle.net/10722/307673Title: L-SVRG and L-Katyusha with Arbitrary Sampling Authors: Qian, X; Qu, Z; Richtarik, P Abstract: We develop and analyze a new family of nonaccelerated and accelerated loopless variancereduced methods for finite-sum optimization problems. Our convergence analysis relies on a novel expected smoothness condition which upper bounds the variance of the stochastic gradient estimation by a constant times a distance-like function. This allows us to handle with ease arbitrary sampling schemes as well as the nonconvex case. We perform an indepth estimation of these expected smoothness parameters and propose new importance samplings which allow linear speedup when the expected minibatch size is in a certain range. Furthermore, a connection between these expected smoothness parameters and expected separable overapproximation (ESO) is established, which allows us to exploit data sparsity as well. Our general methods and results recover as special cases the loop Description: Open Access Journal Fri, 01 Jan 2021 00:00:00 GMThttp://hdl.handle.net/10722/3076732021-01-01T00:00:00ZSDNA: Stochastic Dual Newton Ascent for epirical risk minimizationhttp://hdl.handle.net/10722/235018Title: SDNA: Stochastic Dual Newton Ascent for epirical risk minimization Authors: Qu, Z; Richtarik, P; Takac, M; Fercoq, O Abstract: We propose a new algorithm for minimizing regularized empirical loss: Stochastic Dual Newton Ascent (SDNA). Our method is dual in nature: in each iteration we update a random subset of the dual variables. However, unlike existing methods such as stochastic dual coordinate ascent, SDNA is capable of utilizing all local curvature information contained in the examples, which leads to striking improvements in both theory and practice – sometimes by orders of magnitude. In the special case when an L2-regularizer is used in the primal, the dual problem is a concave quadratic maximization problem plus a separable term. In this regime, SDNA in each step solves a proximal subproblem involving a random principal submatrix of the Hessian of the quadratic function; whence the name of the method. Description: This journal vol. entitled: Proceedings of the 33 rd International Conference on Machine Learning, ICML 2016; The full version of this paper can be found on https://arxiv.org/abs/1502.02268 Fri, 01 Jan 2016 00:00:00 GMThttp://hdl.handle.net/10722/2350182016-01-01T00:00:00ZCurse of dimensionality reduction in max-plus based approximation methods: Theoretical estimates and improved pruning algorithmshttp://hdl.handle.net/10722/219665Title: Curse of dimensionality reduction in max-plus based approximation methods: Theoretical estimates and improved pruning algorithms Authors: Gaubert, Stéphane; McEneaney, William; Qu, Zheng Abstract: Max-plus based methods have been recently developed to approximate the value function of possibly high dimensional optimal control problems. A critical step of these methods consists in approximating a function by a supremum of a small number of functions (max-plus "basis functions") taken from a prescribed dictionary. We study several variants of this approximation problem, which we show to be continuous versions of the facility location and k-center combinatorial optimization problems, in which the connection costs arise from a Bregman distance. We give theoretical error estimates, quantifying the number of basis functions needed to reach a prescribed accuracy. We derive from our approach a refinement of the curse of dimensionality free method introduced previously by McEneaney, with a higher accuracy for a comparable computational cost. © 2011 IEEE. Sat, 01 Jan 2011 00:00:00 GMThttp://hdl.handle.net/10722/2196652011-01-01T00:00:00ZSemi-stochastic coordinate descenthttp://hdl.handle.net/10722/245050Title: Semi-stochastic coordinate descent Authors: Konecny, J; Richtarik, P; Qu, Z Sun, 01 Jan 2017 00:00:00 GMThttp://hdl.handle.net/10722/2450502017-01-01T00:00:00ZThe contraction rate in Thompson's part metric of order-preserving flows on a cone - Application to generalized Riccati equationshttp://hdl.handle.net/10722/219740Title: The contraction rate in Thompson's part metric of order-preserving flows on a cone - Application to generalized Riccati equations Authors: Gaubert, Stéphane; Qu, Zheng Abstract: We give a formula for the Lipschitz constant in Thompson's part metric of any order-preserving flow on the interior of a (possibly infinite dimensional) closed convex pointed cone. This shows that in the special case of order-preserving flows, a general characterization of the contraction rate in Thompson's part metric, given by Nussbaum, leads to an explicit formula. As an application, we show that the flow of the generalized Riccati equation arising in stochastic linear quadratic control is a local contraction on the cone of positive definite matrices and characterize its Lipschitz constant by a matrix inequality. We also show that the same flow is no longer a contraction in other invariant Finsler metrics on this cone, including the standard invariant Riemannian metric. This is motivated by a series of contraction properties concerning the standard Riccati equation, established by Bougerol, Liverani, Wojtkowski, Lawson, Lee and Lim: we show that some of these properties do, and that some other do not, carry over to the generalized Riccati equation. © 2014 Elsevier Inc. Wed, 01 Jan 2014 00:00:00 GMThttp://hdl.handle.net/10722/2197402014-01-01T00:00:00ZA max-plus based randomized algorithm for solving a class of HJB PDEshttp://hdl.handle.net/10722/219789Title: A max-plus based randomized algorithm for solving a class of HJB PDEs Authors: Qu, Zheng Abstract: © 2014 IEEE. McEneaney introduced the curse of dimensionality free method for the special class of infinite horizon optimal control problems where the Hamiltonian is represented as a maximum of quadratic affine functions. This method is featured by its cubic complexity with respect to the state space dimension, but the number of basis functions is multiplied by the number of switches at each iteration, referred to as the 'curse of complexity'. In previous works, an SDP-based pruning technique was incorporated into the method in order to reduce the curse of complexity. Its efficiency was proved on many examples. In this paper we develop a new max-plus based randomized algorithm to solve the same class of infinite horizon optimal control problems. The major difference between the new algorithm and the previous SDP-based curse of dimensionality free method is that, instead of adding a large number of functions and then pruning the less useful ones, the new algorithm finds in cheap computation time (linear in the current number of basis functions), by a randomized procedure, useful quadratic functions and adds only those functions to the set of basis functions. Experimental results show that the max-plus randomized algorithm can reach the same precision order obtained by the SDP-based method with a speedup varying from 10 up to 100 and that the maximal precision order attainable by the new algorithm is much better than what can be done by the SDP-based algorithm in reasonable computation time. Besides, with the randomized algorithm we are now able to tackle switched problems with more number of switches, which will allow us to extend the algorithm to more general classes of optimal control problems. Wed, 01 Jan 2014 00:00:00 GMThttp://hdl.handle.net/10722/2197892014-01-01T00:00:00ZCoordinate descent with arbitrary sampling II: expected separable overapproximationhttp://hdl.handle.net/10722/234647Title: Coordinate descent with arbitrary sampling II: expected separable overapproximation Authors: Qu, Z; Richtarik, P Fri, 01 Jan 2016 00:00:00 GMThttp://hdl.handle.net/10722/2346472016-01-01T00:00:00ZSAGA with Arbitrary Samplinghttp://hdl.handle.net/10722/275289Title: SAGA with Arbitrary Sampling Authors: Qian, X; Qu, Z; Richtarik, P Abstract: We study the problem of minimizing the average of a very large number of smooth functions, which is of key importance in training supervised learning models. One of the most celebrated methods in this context is the SAGA algorithm of Defazio et al. (2014). Despite years of research on the topic, a general-purpose version of SAGA—one that would include arbitrary importance sampling and minibatching schemes—does not exist. We remedy this situation and propose a general and flexible variant of SAGA following the arbitrary sampling paradigm. We perform an iteration complexity analysis of the method, largely possible due to the construction of new stochastic Lyapunov functions. We establish linear convergence rates in the smooth and strongly convex regime, and under certain error bound conditions also in a regime without strong convexity. Our rates match those of the primal-dual method Quartz (Qu et al., 2015) for which an arbitrary sampling analysis is available, which makes a significant step towards closing the gap in our understanding of complexity of primal and dual methods for finite sum problems. Finally, we show through experiments that specific variants of our general SAGA method can perform better in practice than other competing methods. Tue, 01 Jan 2019 00:00:00 GMThttp://hdl.handle.net/10722/2752892019-01-01T00:00:00ZContraction of riccati flows applied to the convergence analysis of a max-plus curse-of-dimensionality-free methodhttp://hdl.handle.net/10722/219383Title: Contraction of riccati flows applied to the convergence analysis of a max-plus curse-of-dimensionality-free method Authors: Qu, Zheng Abstract: © 2014 Society for Industrial and Applied Mathematics Max-plus based methods have been recently explored for solution of first-order Hamilton-Jacobi-Bellman equations by several authors. Among several max-plus numerical methods, McEneaney's curse-of-dimensionality-free method applies to the equations where the Hamiltonian takes the form of a (pointwise) maximum of linear/quadratic forms. In previous works of McEneaney and Kluberg, the approximation error of the method was shown to be O (1/(NT))+ O(√t), where T is the time discretization step and N is the number of iterations. Here we use a recently established contraction result for the indefinite Riccati flow in Thompson's part metric to show that under different technical assumptions, still covering an important class of problems, the error is only of order O (e-αN T ) + O (T) for some α > 0. This also allows us to obtain improved estimates of the execution time and to tune the precision of the pruning procedure, which in practice is a critical element of the method. Wed, 01 Jan 2014 00:00:00 GMThttp://hdl.handle.net/10722/2193832014-01-01T00:00:00ZRestarting the accelerated coordinate descent method with a rough strong convexity estimatehttp://hdl.handle.net/10722/290916Title: Restarting the accelerated coordinate descent method with a rough strong convexity estimate Authors: Fercoq, O; Qu, Z Abstract: We propose new restarting strategies for the accelerated coordinate descent method. Our main contribution is to show that for a well chosen sequence of restarting times, the restarted method has a nearly geometric rate of convergence. A major feature of the method is that it can take profit of the local quadratic error bound of the objective function without knowing the actual value of the error bound. We also show that under the more restrictive assumption that the objective function is strongly convex, any fixed restart period leads to a geometric rate of convergence. Finally, we illustrate the properties of the algorithm on a regularized logistic regression problem and on a Lasso problem. Wed, 01 Jan 2020 00:00:00 GMThttp://hdl.handle.net/10722/2909162020-01-01T00:00:00ZCoordinate descent with arbitrary sampling I: algorithms and complexityhttp://hdl.handle.net/10722/234646Title: Coordinate descent with arbitrary sampling I: algorithms and complexity Authors: Qu, Z; Richtarik, P Fri, 01 Jan 2016 00:00:00 GMThttp://hdl.handle.net/10722/2346462016-01-01T00:00:00ZPolyhedron over-approximation for complexity reduction in static analysishttp://hdl.handle.net/10722/275055Title: Polyhedron over-approximation for complexity reduction in static analysis Authors: Seladji, Y; Qu, Z Abstract: Polyhedra are widely used in the verification of numerical programs. Specially, in the field of static analysis by abstract interpretation to express the program invariants. Polyhedra make the analysis very expressive but also very time consuming. That cost is mostly due to the minimization function, which is used to maintain polyhedra in their minimal representation without redundant constraints or generators. In this article, we propose method to over-approximate a polyhedron by minimizing the loss of accuracy. The idea is to find a good trade off between accuracy and execution time. The proposed method is applied as an alternative to the minimization function for the template polyhedra abstract domain. Mon, 01 Jan 2018 00:00:00 GMThttp://hdl.handle.net/10722/2750552018-01-01T00:00:00ZDobrushin’s Ergodicity Coefficient for Markov Operators on Coneshttp://hdl.handle.net/10722/219773Title: Dobrushin’s Ergodicity Coefficient for Markov Operators on Cones Authors: Gaubert, Stéphane; Qu, Zheng Abstract: © 2014, Springer Basel. Doeblin and Dobrushin characterized the contraction rate of Markov operators with respect the total variation norm. We generalize their results by giving an explicit formula for the contraction rate of a Markov operator over a cone in terms of pairs of extreme points with disjoint support in a set of abstract probability measures. By duality, we derive a characterization of the contraction rate of consensus dynamics over a cone with respect to Hopf’s oscillation seminorm (the infinitesimal seminorm associated with Hilbert’s projective metric). We apply these results to Kraus maps (noncommutative Markov chains, representing quantum channels), and characterize the ultimate contraction of the map in terms of the existence of a rank one matrix in a certain subspace. Wed, 01 Jan 2014 00:00:00 GMThttp://hdl.handle.net/10722/2197732014-01-01T00:00:00ZMultiply Accelerated Value Iteration for NonSymmetric Affine Fixed Point Problems and Application to Markov Decision Processeshttp://hdl.handle.net/10722/316749Title: Multiply Accelerated Value Iteration for NonSymmetric Affine Fixed Point Problems and Application to Markov Decision Processes Authors: Akian, M; Gaubert, S; Qu, Z; Saadi, O Sat, 01 Jan 2022 00:00:00 GMThttp://hdl.handle.net/10722/3167492022-01-01T00:00:00ZMaximizing concave piecewise affine functions on the unitary grouphttp://hdl.handle.net/10722/316699Title: Maximizing concave piecewise affine functions on the unitary group Authors: Gaubert, S; Qu, Z; Sridharan, S Fri, 01 Jan 2016 00:00:00 GMThttp://hdl.handle.net/10722/3166992016-01-01T00:00:00ZDobrushin’s Ergodicity Coefficient for Markov Operators on Coneshttp://hdl.handle.net/10722/316700Title: Dobrushin’s Ergodicity Coefficient for Markov Operators on Cones Authors: Gaubert, S; Qu, Z Thu, 01 Jan 2015 00:00:00 GMThttp://hdl.handle.net/10722/3167002015-01-01T00:00:00Z