Optimization Toolbox
For Use with MATLAB
?
Computation Visualization Programming
User’s Guide
Version 2
How to Contact The MathWorks:
www.mathworks.com comp.soft-sys.matlab support@mathworks.com suggest@mathworks.com bugs@mathworks.com doc@mathworks.com service@mathworks.com info@mathworks.com
Web Newsgroup Technical support Product enhancement suggestions Bug reports Documentation error reports Order status, license renewals, passcodes Sales, pricing, and general information Phone Fax Mail
508-647-7000 508-647-7001 The MathWorks, Inc. 3 Apple Hill Drive Natick, MA 01760-2098
For contact information about worldwide offices, see the MathWorks Web site. Optimization Toolbox User’s Guide ? COPYRIGHT 1990 - 2002 by The MathWorks, Inc.
The software described in this document is furnished under a license agreement. The software may be used or copied only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written consent from The MathWorks, Inc. FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by or for the federal government of the United States. By accepting delivery of the Program, the government hereby agrees that this software qualifies as "commercial" computer software within the meaning of FAR Part 12.212, DFARS Part 227.7202-1, DFARS Part 227.7202-3, DFARS Part 252.227-7013, and DFARS Part 252.227-7014. The terms and conditions of The MathWorks, Inc. Software License Agreement shall pertain to the government’s use and disclosure of the Program and Documentation, and shall supersede any conflicting contractual terms or conditions. If this license fails to meet the government’s minimum needs or is inconsistent in any respect with federal procurement law, the government agrees to return the Program and Documentation, unused, to MathWorks. MATLAB, Simulink, Stateflow, Handle Graphics, and Real-Time Workshop are registered trademarks, and TargetBox is a trademark of The MathWorks, Inc. Other product or brand names are trademarks or registered trademarks of their respective holders.
Printing History: November 1990 December 1996 January 1999 September 2000 June 2001 July 2002
First printing Second printing Third printing Fourth printing Online only Online only
For MATLAB 5 For Version 2 (Release 11) For Version 2.1 (Release 12) Revised for Version 2.1.1 (Release 12.1) Revised for Version 2.2 (Release 13)
Contents
Preface
Using This Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii Related Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix Typographical Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Introduction
1
What Is the Optimization Toolbox? . . . . . . . . . . . . . . . . . . . . . 1-2 New Features in Version 2.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3 New fsolve Default Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3 Configuration Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-4 Technical Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5 Matrix, Vector, and Scalar Notation . . . . . . . . . . . . . . . . . . . . . 1-5 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6
Tutorial
2
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3 Problems Covered by the Toolbox . . . . . . . . . . . . . . . . . . . . . . . . 2-3 Using the Optimization Functions . . . . . . . . . . . . . . . . . . . . . . . 2-5
i
Examples that Use Standard Algorithms . . . . . . . . . . . . . . . . 2-7 Unconstrained Minimization Example . . . . . . . . . . . . . . . . . . . . 2-8 Nonlinear Inequality Constrained Example . . . . . . . . . . . . . . . 2-9 Constrained Example with Bounds . . . . . . . . . . . . . . . . . . . . . 2-11 Constrained Example with Gradients . . . . . . . . . . . . . . . . . . . 2-12 Gradient Check: Analytic Versus Numeric . . . . . . . . . . . . . . . 2-14 Equality Constrained Example . . . . . . . . . . . . . . . . . . . . . . . . . 2-15 Maximization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16 Greater-Than-Zero Constraints . . . . . . . . . . . . . . . . . . . . . . . . 2-16 Additional Arguments: Avoiding Global Variables . . . . . . . . . 2-16 Nonlinear Equations with Analytic Jacobian . . . . . . . . . . . . . . 2-17 Nonlinear Equations with Finite-Difference Jacobian . . . . . . 2-20 Multiobjective Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-21 Large-Scale Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-33 Problems Covered by Large-Scale Methods . . . . . . . . . . . . . . . 2-34 Nonlinear Equations with Jacobian . . . . . . . . . . . . . . . . . . . . . 2-37 Nonlinear Equations with Jacobian Sparsity Pattern . . . . . . . 2-39 Nonlinear Least-Squares with Full Jacobian Sparsity Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-41 Nonlinear Minimization with Gradient and Hessian . . . . . . . 2-43 Nonlinear Minimization with Gradient and Hessian Sparsity Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-44 Nonlinear Minimization with Bound Constraints and Banded Preconditioner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-46 Nonlinear Minimization with Equality Constraints . . . . . . . . 2-50 Nonlinear Minimization with a Dense but Structured Hessian and Equality Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-51 Quadratic Minimization with Bound Constraints . . . . . . . . . . 2-55 Quadratic Minimization with a Dense but Structured Hessian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-57 Linear Least-Squares with Bound Constraints . . . . . . . . . . . . 2-60 Linear Programming with Equalities and Inequalities . . . . . . 2-61 Linear Programming with Dense Columns in the Equalities . 2-62 Default Parameter Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-65 Changing the Default Settings . . . . . . . . . . . . . . . . . . . . . . . . . 2-65 Displaying Iterative Output . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-68 Output Headings: Medium-Scale Algorithms . . . . . . . . . . . . . 2-68
ii
Contents
Output Headings: Large-Scale Algorithms . . . . . . . . . . . . . . . 2-71 Optimization of Inline Objects Instead of M-Files . . . . . . . 2-74 Typical Problems and How to Deal with Them . . . . . . . . . . 2-76 Converting Your Code to Version 2 Syntax . . . . . . . . . . . . . Using optimset and optimget . . . . . . . . . . . . . . . . . . . . . . . . . . New Calling Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Example of Converting from constr to fmincon . . . . . . . . . . . .
2-80 2-81 2-81 2-89
Selected Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-92
Standard Algorithms
3
Optimization Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3 Unconstrained Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-4 Quasi-Newton Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6 Line Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8 Quasi-Newton Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 3-11 Least-Squares Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . Gauss-Newton Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Levenberg-Marquardt Method . . . . . . . . . . . . . . . . . . . . . . . . . Nonlinear Least-Squares Implementation . . . . . . . . . . . . . . . . Nonlinear Systems of Equations . . . . . . . . . . . . . . . . . . . . . . . Gauss-Newton Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Trust-Region Dogleg Method . . . . . . . . . . . . . . . . . . . . . . . . . . . Nonlinear Equations Implementation . . . . . . . . . . . . . . . . . . . Constrained Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sequential Quadratic Programming (SQP) . . . . . . . . . . . . . . . Quadratic Programming (QP) Subproblem . . . . . . . . . . . . . . . SQP Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-18 3-19 3-20 3-22 3-24 3-24 3-24 3-26 3-28 3-29 3-30 3-31
iii
Multiobjective Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Goal Attainment Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Algorithm Improvements for Goal Attainment Method . . . . .
3-38 3-38 3-44 3-45
Selected Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-48
Large-Scale Algorithms
4
Trust-Region Methods for Nonlinear Minimization . . . . . . . 4-2 Preconditioned Conjugate Gradients . . . . . . . . . . . . . . . . . . . 4-5 Linearly Constrained Problems . . . . . . . . . . . . . . . . . . . . . . . . 4-7 Linear Equality Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7 Box Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7 Nonlinear Least-Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10 Quadratic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11 Linear Least-Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-12 Large-Scale Linear Programming . . . . . . . . . . . . . . . . . . . . . 4-13 Main Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-13 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16 Selected Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-17
iv
Contents
Function Reference
5
Functions —By Category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Equation Solving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Least Squares (Curve Fitting) . . . . . . . . . . . . . . . . . . . . . . . . . . . Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Demos of Large-Scale Methods . . . . . . . . . . . . . . . . . . . . . . . . . . Demos of Medium-Scale Methods . . . . . . . . . . . . . . . . . . . . . . . .
5-2 5-2 5-2 5-3 5-3 5-3 5-4
Function Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-5 Optimization Options Parameters . . . . . . . . . . . . . . . . . . . . . 5-10 Functions — Alphabetical List . . . . . . . . . . . . . . . . . . . . . . . . 5-15
Index
v
vi
Contents
Preface
The Preface consists of these sections: Using This Guide (p. viii) Related Products (p. ix) Typographical Conventions (p. xi) Explains the organization of this guide. Lists products that may be relevant to the kinds of tasks you can perform with the Optimization Toolbox. Describes the typographical conventions used in this guide.
Preface
Using This Guide
This guide has the following chapters: ? The “Introduction” introduces the Optimization Toolbox, explains technical conventions used in the book, and lists features that are new in Version 2.2. It also directs you to installation and configuration information ? The “Tutorial” chapter shows you how to solve a variety of different optimization problems. It includes a section that highlights large-scale problems. This chapter also provides information on how to use the toolbox functions in conjunction with Simulink using multiobjective optimization. Other sections include information about changing default parameters and using inline objects. ? The “Standard Algorithms” and “Large-Scale Algorithms” chapters describe the algorithms used by the optimization functions. “Standard Algorithms” describes the problem formulations and algorithms for the medium-scale algorithms. “Large-Scale Algorithms” focuses on algorithms used to solve large sparse or structured problems. ? The “Function Reference” chapter provides a detailed reference description of each toolbox function. Reference descriptions include the function’s syntax, a description of the different calling sequences available, and detailed information about arguments to the function, including relevant optimization options parameters. Reference descriptions may also include examples, a summary of the function’s algorithms, and references to additional reading material.
viii
Related Products
Related Products
The MathWorks provides several products that are relevant to the kinds of tasks you can perform with the Optimization Toolbox. For more information about any of these products, see either ? The online documentation for that product, if it is installed or if you are reading the documentation from the CD ? The MathWorks Web site at www.mathworks.com; see the “products” section
Note The toolboxes listed below all include functions that extend the capabilities of MATLAB. The blocksets all include blocks that extend the capabilities of Simulink.
Product
Description
Curve Fitting Toolbox Data Acquisition Toolbox Database Toolbox Financial Time Series Toolbox Financial Toolbox GARCH Toolbox LMI Control Toolbox Neural Network Toolbox
Perform model fitting and analysis Acquire and send out data from plug-in data acquisition boards Exchange data with relational databases Analyze and manage financial time series data Model financial data and develop financial analysis algorithms Analyze financial volatility using univariate GARCH models Design robust controllers using convex optimization techniques Design and simulate neural networks
ix
Preface
Product
Description
Nonlinear Control Design Blockset Signal Processing Toolbox Simulink Spline Toolbox Statistics Toolbox Symbolic/Extended Symbolic Math Toolbox System Identification Toolbox
Optimize design parameters in nonlinear control systems Perform signal processing, analysis, and algorithm development Design and simulate continuous- and discrete-time systems Create and manipulate spline approximation models of data Apply statistical algorithms and probability models Perform computations using symbolic mathematics and variable-precision arithmetic Create linear dynamic models from measured input-output data
x
Typographical Conventions
Typographical Conventions
This manual uses some or all of these conventions.
Item Convention Monospace font Example
Example code
To assign the value 5 to A, enter
A = 5
Function names, syntax, filenames, directory/folder names, and user input
Monospace font
The cos function finds the cosine of each array element. Syntax line example is
MLGetVar ML_var_name
Buttons and keys Literal strings (in syntax descriptions in reference chapters) Mathematical expressions MATLAB output
Boldface with book title caps
Monospace bold for literals
Press the Enter key.
f = freqspace(n,'whole')
Italics for variables Standard text font for functions, operators, and constants
Monospace font
This vector represents the polynomial p = x2 + 2x + 3. MATLAB responds with
A = 5
Menu and dialog box titles New terms and for emphasis Omitted input arguments
Boldface with book title caps
Choose the File Options menu. An array is an ordered collection of information.
[c,ia,ib] = union(...)
Italics (...) ellipsis denotes all of the input/output arguments from preceding syntaxes.
Monospace italics
String variables (from a finite list)
sysc = d2c(sysd,'method')
xi
Preface
xii
1
Introduction
The Introduction consists of these sections: What Is the Optimization Toolbox? (p. 1-2) Introduces the Optimization Toolbox, and describes its intended use and its capabilities. New Features in Version 2.2 (p. 1-3) Configuration Information (p. 1-4) Technical Conventions (p. 1-5) Acknowledgments (p. 1-6) Introduces features that are new in Version 2.2. Directs you to installation and configuration information. Describes mathematical notation used in this guide. Acknowledges significant contributions to the Optimization Toolbox.
1
Introduction
What Is the Optimization Toolbox?
The Optimization Toolbox is a collection of functions that extend the capability of the MATLAB? numeric computing environment. The toolbox includes routines for many types of optimization including ? Unconstrained nonlinear minimization ? Constrained nonlinear minimization, including goal attainment problems, minimax problems, and semi-infinite minimization problems ? Quadratic and linear programming ? Nonlinear least squares and curve-fitting ? Nonlinear system of equation solving ? Constrained linear least squares ? Sparse and structured large-scale problems All the toolbox functions are MATLAB M-files, made up of MATLAB statements that implement specialized optimization algorithms. You can view the MATLAB code for these functions using the statement
type function_name
You can extend the capabilities of the Optimization Toolbox by writing your own M-files, or by using the toolbox in combination with other toolboxes, or with MATLAB or Simulink?.
1-2
New Features in Version 2.2
New Features in Version 2.2
Version 2.2 (Beta 2) of the Optimization Toolbox offers a “New fsolve Default Algorithm”
New fsolve Default Algorithm
The fsolve function, which is used to solve systems of nonlinear equations, has a new default algorithm for medium-scale systems where the number of equations is equal to the number of variables. The new algorithm uses a trust-region dogleg method that has improved convergence properties over the previous default method. In keeping with the new default trust-region dogleg algorithm, fsolve now defaults to the medium-scale method. A new 'NonlEqnAlgorithm' fsolve parameter enables you to choose the Levenberg-Marquardt or Gauss-Newton algorithm over the trust-region dogleg algorithm. For more information, see “Nonlinear Systems of Equations” on page 3-24, “Nonlinear Equations Implementation” on page 3-26, and the fsolve reference page.
1-3
1
Introduction
Configuration Information
To determine whether the Optimization Toolbox is installed on your system, type this command at the MATLAB prompt.
ver
When you enter this command, MATLAB displays information about the version of MATLAB you are running, including a list of all toolboxes installed on your system and their version numbers. If the Optimization Toolbox is not installed, check the Installation documentation for your platform for instructions on how to install it.
Note For the most up-to-date information about system requirements, see the individual product pages at the MathWorks Web site (http://www.mathworks.com).
1-4
Technical Conventions
Technical Conventions
Matrix, Vector, and Scalar Notation
Uppercase letters such as A are used to denote matrices. Lowercase letters such as x are used to denote vectors, except where noted that the variable is a scalar. For functions, the notation differs slightly to follow the usual conventions in optimization. For vector functions, we use an upper-case letter such as F in F ( x ) . A function that returns a scalar value is denoted with a lowercase letter such as f in f ( x ) .
1-5
1
Introduction
Acknowledgments
The MathWorks would like to acknowledge these contributors:
Thomas F. Coleman researched and contributed the large scale algorithms for constrained and unconstrained minimization, nonlinear least squares and curve fitting, constrained linear least squares, quadratic programming, and nonlinear equations.
Dr. Coleman is Professor of Computer Science and Applied Mathematics at Cornell University. He is Director of the Cornell Theory Center and the Cornell Computational Finance Institute. Dr. Coleman is Chair of the SIAM Activity Group on Optimization, and a member of the Editorial Boards of Applied Mathematics Letters, SIAM Journal of Scientific Computing, Computational Optimization and Applications, Communications on Applied Nonlinear Analysis, and Mathematical Modeling and Scientific Computing. Dr. Coleman has published 4 books and over 70 technical papers in the areas of continuous optimization and computational methods and tools for large-scale problems.
Yin Zhang researched and contributed the large-scale linear programming algorithm.
Dr. Zhang is Associate Professor of Computational and Applied Mathematics on the faculty of the Keck Center for Computational Biology at Rice University. He is on the Editorial Board of SIAM Journal on Optimization, and is Associate Editor of Journal of Optimization: Theory and Applications. Dr. Zhang has published over 40 technical papers in the areas of interior-point methods for linear programming and computation mathematical programming.
1-6
2
Tutorial
The Tutorial provides information on how to use the toolbox functions. It also provides examples for solving different optimization problems. It consists of these sections. Introduction (p. 2-3) Summarizes, in tabular form, the functions available for minimization, equation solving, and solving least-squares or data fitting problems. It also provides basic guidelines for using the optimization routines and introduces the algorithms and line-search strategies that are available for solving medium- and large-scale problems. Presents medium-scale algorithms through a selection of minimization examples. These examples include unconstrained and constrained problems, as well as problems with and without user-supplied gradients. This section also discusses maximization, greater-than-zero constraints, passing additional arguments, and multiobjective examples. Presents large-scale algorithms through a selection of large-scale examples. These examples include specifying sparsity structures, and preconditioners, as well as unconstrained and constrained problems. Describes the use of default parameter settings and tells you how to change them. It also tells you how to determine which parameters are used by a specified function, and provides examples of setting some commonly used parameters. Describes the column headings used in the iterative output of both medium-scale and large-scale algorithms.
Examples that Use Standard Algorithms (p. 2-7)
Large-Scale Examples (p. 2-33)
Default Parameter Settings (p. 2-65)
Displaying Iterative Output (p. 2-68)
2
Tutorial
Optimization of Inline Objects Instead of M-Files (p. 2-74) Typical Problems and How to Deal with Them (p. 2-76)
Tells you how to represent a mathematical function at the command line by creating an inline object from a string expression. Provides tips to help you improve solutions found using the optimization functions, improve efficiency of the algorithms, overcome common difficulties, and transform problems that are typically not in standard form. Compares a Version 1.5 call to the equivalent Version 2 call for each function. This section also describes the Version 2 calling sequences and provides a detailed example of converting from constr to its replacement fmincon. Lists published materials that support concepts implemented in the Optimization Toolbox.
Converting Your Code to Version 2 Syntax (p. 2-80)
Selected Bibliography (p. 2-92)
2-2
Introduction
Introduction
Optimization concerns the minimization or maximization of functions. The Optimization Toolbox consists of functions that perform minimization (or maximization) on general nonlinear functions. Functions for nonlinear equation solving and least-squares (data-fitting) problems are also provided. This introduction includes the following sections: ? Problems Covered by the Toolbox ? Using the Optimization Functions
Problems Covered by the Toolbox
The following tables show the functions available for minimization, equation solving, and solving least-squares or data-fitting problems.
Note The following tables list the types of problems in order of increasing complexity.
Table 2-1: Minimization Type Notation Function fminbnd fminunc, fminsearch linprog
Scalar Minimization Unconstrained Minimization
min f ( a ) such that a 1 < a < a2 a min f ( x ) x min f x such that x A ? x ≤ b, Aeq ? x = beq, l ≤ x ≤ u
T 1 T - x Hx + f x such that min -x 2 A ? x ≤ b, Aeq ? x = beq, l ≤ x ≤ u T
Linear Programming
Quadratic Programming
quadprog
2-3
2
Tutorial
Table 2-1: Minimization (Continued) Type Notation Function fmincon
Constrained Minimization
min f ( x ) such that x c ( x ) ≤ 0, ceq ( x ) = 0 A ? x ≤ b, Aeq ? x = beq, l ≤ x ≤ u min γ such that x, γ F ( x ) – w γ ≤ goal c ( x ) ≤ 0, ceq ( x ) = 0 A ? x ≤ b, Aeq ? x = beq, l ≤ x ≤ u min max { F i(x) } x {F } i
Goal Attainment
fgoalattain
Minimax
such that
fminimax
c ( x ) ≤ 0, ceq ( x ) = 0 A ? x ≤ b, Aeq ? x = beq, l ≤ x ≤ u Semi-Infinite Minimization min f ( x ) such that x K(x, w) ≤ 0 for all w c ( x ) ≤ 0, ceq ( x ) = 0 A ? x ≤ b, Aeq ? x = beq, l ≤ x ≤ u
fseminf
Table 2-2: Equation Solving Type Notation Function \ (slash) fzero
Linear Equations Nonlinear Equation of One Variable Nonlinear Equations
C ? x = d , n equations, n variables f( a ) = 0 F(x) = 0 , n equations, n variables
fsolve
2-4
Introduction
Table 2-3: Least-Squares (Curve Fitting) Type Notation Function
2 2 2 2 2
Linear Least-Squares Nonnegative Linear-Least-Squares Constrained Linear-Least-Squares Nonlinear Least-Squares Nonlinear Curve Fitting
min C ? x – d x min C ? x – d x
, m equations, n variables such that x ≥ 0
\ (slash) lsqnonneg
min C ? x – d 2 such that x A ? x ≤ b, Aeq ? x = beq, l ≤ x ≤ u 1 - F(x) min -x 2
2 2
lsqlin
1 = -2
Fi ( x) ∑ i
2
such that l ≤ x ≤ u such that l ≤ x ≤ u
lsqnonlin lsqcurvefit
1 - F(x, xdata) – ydata min -x 2
2 2
Using the Optimization Functions
Most of these optimization routines require the definition of an M-file containing the function to be minimized, i.e., the objective function. Alternatively, you can use an inline object created from a MATLAB expression. Maximization is achieved by supplying the routines with -f, where f is the function being optimized. Optimization options passed to the routines change optimization parameters. Default optimization parameters are used extensively but can be changed through an options structure. Gradients are calculated using an adaptive finite-difference method unless they are supplied in a function. Parameters can be passed directly to functions, avoiding the need for global variables. This guide separates “medium-scale” algorithms from “large-scale” algorithms. Medium-scale is not a standard term and is used here only to differentiate these algorithms from the large-scale algorithms, which are designed to handle large-scale problems efficiently.
2-5
2
Tutorial
Medium-Scale Algorithms
The Optimization Toolbox routines offer a choice of algorithms and line search strategies. The principal algorithms for unconstrained minimization are the Nelder-Mead simplex search method and the BFGS (Broyden, Fletcher, Goldfarb, and Shanno) quasi-Newton method. For constrained minimization, minimax, goal attainment, and semi-infinite optimization, variations of sequential quadratic programming (SQP) are used. Nonlinear least-squares problems use the Gauss-Newton and Levenberg-Marquardt methods. Nonlinear equation solving also uses the trust-region dogleg algorithm. A choice of line search strategy is given for unconstrained minimization and nonlinear least-squares problems. The line search strategies use safeguarded cubic and quadratic interpolation and extrapolation methods.
Large-Scale Algorithms
All the large-scale algorithms, except linear programming, are trust-region methods. Bound constrained problems are solved using reflective Newton methods. Equality constrained problems are solved using a projective preconditioned conjugate gradient iteration. You can use sparse iterative solvers or sparse direct solvers in solving the linear systems to determine the current step. Some choice of preconditioning in the iterative solvers is also available. The linear programming method is a variant of Mehrotra’s predictor-corrector algorithm, a primal-dual interior-point method.
2-6
Examples that Use Standard Algorithms
Examples that Use Standard Algorithms
This section presents the medium-scale (i.e., standard) algorithms through a tutorial. Examples similar to those in the first part of this tutorial (“Unconstrained Minimization Example” through the “Equality Constrained Example”) can also be found in the first demonstration, “Tutorial Walk Through,” in the M-file optdemo. The examples in this manual differ in that they use M-file functions for the objective functions, whereas the online demonstrations use inline objects for some functions.
Note Medium-scale is not a standard term and is used to differentiate these algorithms from the large-scale algorithms described in “Large-Scale Algorithms” on page 4-1.
The tutorial uses the functions fminunc, fmincon, and fsolve. The other optimization routines, fgoalattain, fminimax, lsqnonlin, and fseminf, are used in a nearly identical manner, with differences only in the problem formulation and the termination criteria. The section “Multiobjective Examples” on page 2-21 discusses multiobjective optimization and gives several examples using lsqnonlin, fminimax, and fgoalattain, including how Simulink can be used in conjunction with the toolbox. This section includes the following examples: ? Unconstrained Minimization Example ? Nonlinear Inequality Constrained Example ? Constrained Example with Bounds ? Constrained Example with Gradients ? Gradient Check: Analytic Versus Numeric ? Equality Constrained Example It also discusses ? Maximization ? Greater-Than-Zero Constraints ? Additional Arguments: Avoiding Global Variables ? Nonlinear Equations with Analytic Jacobian
2-7
2
Tutorial
? Nonlinear Equations with Finite-Difference Jacobian ? Multiobjective Examples
Unconstrained Minimization Example
Consider the problem of finding a set of values [x1, x2] that solves minimize f ( x ) = e 1 ( 4 x 1 + 2 x 2 + 4 x 1 x 2 + 2 x 2 + 1 ) x
x 2 2
(2-1)
To solve this two-dimensional problem, write an M-file that returns the function value. Then, invoke the unconstrained minimization routine fminunc.
Step 1: Write an M-file objfun.m.
function f = objfun(x) f = exp(x(1))*(4*x(1)^2+2*x(2)^2+4*x(1)*x(2)+2*x(2)+1);
Step 2: Invoke one of the unconstrained optimization routines.
x0 = [-1,1]; % Starting guess options = optimset('LargeScale','off'); [x,fval,exitflag,output] = fminunc(@objfun,x0,options);
After 40 function evaluations, this produces the solution
x = 0.5000 -1.0000
The function at the solution x is returned in fval:
fval = 1.3030e-10
The exitflag tells whether the algorithm converged. An exitflag > 0 means a local minimum was found:
exitflag = 1
The output structure gives more details about the optimization. For fminunc, it includes the number of iterations in iterations, the number of function evaluations in funcCount, the final step-size in stepsize, a measure of first-order optimality (which in this unconstrained case is the infinity norm of
2-8
Examples that Use Standard Algorithms
the gradient at the solution) in firstorderopt, and the type of algorithm used in algorithm:
output = iterations: funcCount: stepsize: firstorderopt: algorithm: 7 40 1 9.2801e-004 'medium-scale: Quasi-Newton line search'
When more than one local minimum exists, the initial guess for the vector [x1, x2] affects both the number of function evaluations and the value of the solution point. In the preceding example, x0 is initialized to [-1,1]. The variable options can be passed to fminunc to change characteristics of the optimization algorithm, as in
x = fminunc(@objfun,x0,options); options is a structure that contains values for termination tolerances and algorithm choices. An options structure can be created using the optimset function: options = optimset('LargeScale','off');
In this example, we have turned off the default selection of the large-scale algorithm and so the medium-scale algorithm is used. Other options include controlling the amount of command line display during the optimization iteration, the tolerances for the termination criteria, whether a user-supplied gradient or Jacobian is to be used, and the maximum number of iterations or function evaluations. See optimset, the individual optimization functions, and Table 5, Optimization Parameters, on page 5-10 for more options and information.
Nonlinear Inequality Constrained Example
If inequality constraints are added to Eq. 2-1, the resulting problem can be solved by the fmincon function. For example, find x that solves minimize f ( x ) = e 1 ( 4 x 1 + 2 x 2 + 4 x 1 x 2 + 2 x 2 + 1 ) x subject to the constraints
x 2 2
(2-2)
2-9
2
Tutorial
x 1 x 2 – x 1 – x 2 ≤ – 1.5 x 1 x 2 ≥ – 10 Because neither of the constraints is linear, you cannot pass the constraints to fmincon at the command line. Instead you can create a second M-file, confun.m, that returns the value at both constraints at the current x in a vector c. The constrained optimizer, fmincon, is then invoked. Because fmincon expects the constraints to be written in the form c ( x ) ≤ 0 , you must rewrite your constraints in the form x 1 x 2 – x 1 – x 2 + 1.5 ≤ 0 – x 1 x 2 – 10 ≤ 0
(2-3)
Step 1: Write an M-file confun.m for the constraints.
function [c, ceq] = confun(x) % Nonlinear inequality constraints c = [1.5 + x(1)*x(2) - x(1) - x(2); -x(1)*x(2) - 10]; % Nonlinear equality constraints ceq = [];
Step 2: Invoke constrained optimization routine.
x0 = [-1,1]; % Make a starting guess at the solution options = optimset('LargeScale','off'); [x, fval] = ... fmincon(@objfun,x0,[],[],[],[],[],[],@confun,options)
After 38 function calls, the solution x produced with function value fval is
x = -9.5474 fval = 0.0236 1.0474
We can evaluate the constraints at the solution
[c,ceq] = confun(x) c= 1.0e-14 * 0.1110
2-10
Examples that Use Standard Algorithms
-0.1776 ceq = []
Note that both constraint values are less than or equal to zero; that is, x satisfies c ( x ) ≤ 0 .
Constrained Example with Bounds
The variables in x can be restricted to certain limits by specifying simple bound constraints to the constrained optimizer function. For fmincon, the command
x = fmincon(@objfun,x0,[],[],[],[],lb,ub,@confun,options);
limits x to be within the range lb <= x <= ub. To restrict x in Eq. 2-2 to be greater than zero (i.e., x 1 ≥ 0 , x 2 ≥ 0 ), use the commands
x0 = [-1,1]; % Make a starting guess at the solution lb = [0,0]; % Set lower bounds ub = [ ]; % No upper bounds options = optimset('LargeScale','off'); [x,fval = ... fmincon(@objfun,x0,[],[],[],[],lb,ub,@confun,options) [c, ceq] = confun(x)
Note that to pass in the lower bounds as the seventh argument to fmincon, you must specify values for the third through sixth arguments. In this example, we specified [] for these arguments since there are no linear inequalities or linear equalities. After 13 function evaluations, the solution produced is
x = 0 fval = 8.5000 c = 0 -10 ceq = 1.5000
2-11
2
Tutorial
[]
When lb or ub contains fewer elements than x, only the first corresponding elements in x are bounded. Alternatively, if only some of the variables are bounded, then use -inf in lb for unbounded below variables and inf in ub for unbounded above variables. For example,
lb = [-inf 0]; ub = [10 inf];
bounds x 1 ≤ 10 , 0 ≤ x 2 ( x 1 has no lower bound and x 2 has no upper bound). Using inf and -inf give better numerical results than using a very large positive number or a very large negative number to imply lack of bounds. Note that the number of function evaluations to find the solution is reduced because we further restricted the search space. Fewer function evaluations are usually taken when a problem has more constraints and bound limitations because the optimization makes better decisions regarding step size and regions of feasibility than in the unconstrained case. It is, therefore, good practice to bound and constrain problems, where possible, to promote fast convergence to a solution.
Constrained Example with Gradients
Ordinarily the medium-scale minimization routines use numerical gradients calculated by finite-difference approximation. This procedure systematically perturbs each of the variables in order to calculate function and constraint partial derivatives. Alternatively, you can provide a function to compute partial derivatives analytically. Typically, the problem is solved more accurately and efficiently if such a function is provided. To solve Eq. 2-2 using analytically determined gradients, do the following.
Step 1: Write an M-file for the objective function and gradient.
function [f,G] = objfungrad(x) f = exp(x(1))*(4*x(1)^2+2*x(2)^2+4*x(1)*x(2)+2*x(2)+1); % Gradient of the objective function t = exp(x(1))*(4*x(1)^2+2*x(2)^2+4*x(1)*x(2)+2*x(2)+1); G = [ t + exp(x(1)) * (8*x(1) + 4*x(2)), exp(x(1))*(4*x(1)+4*x(2)+2)];
2-12
Examples that Use Standard Algorithms
Step 2: Write an M-file for the nonlinear constraints and the gradients of the nonlinear constraints.
function [c,ceq,DC,DCeq] = confungrad(x) c(1) = 1.5 + x(1) * x(2) - x(1) - x(2); %Inequality constraints c(2) = -x(1) * x(2)-10; % Gradient of the constraints DC= [x(2)-1, -x(2); x(1)-1, -x(1)]; % No nonlinear equality constraints ceq=[]; DCeq = [ ]; G contains the partial derivatives of the objective function, f, returned by objfungrad(x), with respect to each of the elements in x:
?f ----- = ?x
e ( 4 x 1 + 2 x 2 + 4 x 1 x 2 + 2 x2 + 1 ) + e ( 8 x1 + 4 x 2 )
(2-4)
x1
2
2
x1
e ( 4 x1 + 4 x2 + 2 ) The columns of DC contain the partial derivatives for each respective constraint (i.e., the ith column of DC is the partial derivative of the ith constraint with respect to x). So in the above example, DC is ?c 1 -------?x 1 ?c 1 -------?x 2 ?c 2 -------?x 1 = ?c 2 -------?x 2 x1 – 1 –x1
x1
x2 – 1
–x2
(2-5)
Since you are providing the gradient of the objective in objfungrad.m and the gradient of the constraints in confungrad.m, you must tell fmincon that these M-files contain this additional information. Use optimset to turn the parameters GradObj and GradConstr to 'on' in the example’s existing options structure:
options = optimset(options,'GradObj','on','GradConstr','on');
2-13
2
Tutorial
If you do not set these parameters to 'on' in the options structure, fmincon does not use the analytic gradients. The arguments lb and ub place lower and upper bounds on the independent variables in x. In this example, there are no bound constraints and so they are both set to [].
Step 3: Invoke constrained optimization routine.
x0 = [-1,1]; % Starting guess options = optimset('LargeScale','off'); options = optimset(options,'GradObj','on','GradConstr','on'); lb = [ ]; ub = [ ]; % No upper or lower bounds [x,fval] = fmincon(@objfungrad,x0,[],[],[],[],lb,ub,... @confungrad,options) [c,ceq] = confungrad(x) % Check the constraint values at x
After 20 function evaluations, the solution produced is
x = -9.5474 1.0474 fval = 0.0236 c = 1.0e-14 * 0.1110 -0.1776 ceq = []
Gradient Check: Analytic Versus Numeric
When analytically determined gradients are provided, you can compare the supplied gradients with a set calculated by finite-difference evaluation. This is particularly useful for detecting mistakes in either the objective function or the gradient function formulation. If you want such gradient checks, set the DerivativeCheck parameter to 'on' using optimset:
options = optimset(options,'DerivativeCheck','on');
2-14
Examples that Use Standard Algorithms
The first cycle of the optimization checks the analytically determined gradients (of the objective function and, if they exist, the nonlinear constraints). If they do not match the finite-differencing gradients within a given tolerance, a warning message indicates the discrepancy and gives the option to abort the optimization or to continue.
Equality Constrained Example
For routines that permit equality constraints, nonlinear equality constraints must be computed in the M-file with the nonlinear inequality constraints. For linear equalities, the coefficients of the equalities are passed in through the matrix Aeq and the right-hand-side vector beq. For example, if you have the nonlinear equality constraint x 1 + x 2 = 1 and the nonlinear inequality constraint x 1 x 2 ≥ – 10 , rewrite them as x1 + x2 – 1 = 0 – x 1 x 2 – 10 ≤ 0 and then solve the problem using the following steps.
2 2
Step 1: Write an M-file objfun.m.
function f = objfun(x) f = exp(x(1))*(4*x(1)^2+2*x(2)^2+4*x(1)*x(2)+2*x(2)+1);
Step 2: Write an M-file confuneq.m for the nonlinear constraints.
function [c, ceq] = confuneq(x) % Nonlinear inequality constraints c = -x(1)*x(2) - 10; % Nonlinear equality constraints ceq = x(1)^2 + x(2) - 1;
Step 3: Invoke constrained optimization routine.
x0 = [-1,1]; % Make a starting guess at the solution options = optimset('LargeScale','off'); [x,fval] = fmincon(@objfun,x0,[],[],[],[],[],[],... @confuneq,options) [c,ceq] = confuneq(x) % Check the constraint values at x
2-15
2
Tutorial
After 21 function evaluations, the solution produced is
x = -0.7529 0.4332 fval = 1.5093 c = -9.6739 ceq = 4.0684e-010
Note that ceq is equal to 0 within the default tolerance on the constraints of 1.0e-006 and that c is less than or equal to zero as desired.
Maximization
The optimization functions fminbnd, fminsearch, fminunc, fmincon, fgoalattain, fminimax, lsqcurvefit, and lsqnonlin all perform minimization of the objective function f ( x ) . Maximization is achieved by supplying the routines with – f ( x ) . Similarly, to achieve maximization for quadprog supply -H and -f, and for linprog supply -f.
Greater-Than-Zero Constraints
The Optimization Toolbox assumes that nonlinear inequality constraints are of the form C i ( x ) ≤ 0 . Greater-than-zero constraints are expressed as less-than-zero constraints by multiplying them by -1. For example, a constraint of the form C i ( x ) ≥ 0 is equivalent to the constraint ( – C i ( x ) ) ≤ 0 ; a constraint of the form C i ( x ) ≥ b is equivalent to the constraint ( – C i ( x ) + b ) ≤ 0 .
Additional Arguments: Avoiding Global Variables
You can pass parameters that would otherwise have to be declared as global directly to M-file functions using additional arguments at the end of the calling sequence. For example, entering a number of variables at the end of the call to fsolve
[x,fval] = fsolve(@objfun,x0,options,P1,P2,...)
passes the arguments directly to the function objfun when it is called from inside fsolve:
2-16
Examples that Use Standard Algorithms
F = objfun(x,P1,P2, ... )
Consider, for example, finding zeros of the function ellipj(u,m). The function needs parameter m as well as input u. To look for a zero near u = 3, for m = 0.5,
m = 0.5; options = optimset('Display','off'); % Turn off Display x = fsolve(@ellipj,3,options,m)
returns
x = 3.7081
Then, solve for the function ellipj:
f = ellipj(x,m) f = -2.9925e-008
The call to optimset to get the default options for fsolve implies that default tolerances are used and that analytic gradients are not provided.
Nonlinear Equations with Analytic Jacobian
This example demonstrates the use of the default medium-scale fsolve algorithm. It is intended for problems where ? The system of nonlinear equations is square, i.e., the number of equations equals the number of unknowns. ? There exists a solution x such that F ( x ) = 0 . The example uses fsolve to obtain the minimum of the banana (or Rosenbrock) function by deriving and then solving an equivalent system of nonlinear equations. The Rosenbrock function, which has a minimum at F ( x ) = 0 , is a common test problem in optimization. It has a high degree of nonlinearity and converges extremely slowly if you try to use steepest descent type methods. It is given by f ( x ) = 100 ( x 2 – x 1 ) + ( 1 – x 1 )
2 2 2
First generalize this function to an n-dimensional function, for any positive, even value of n:
2-17
2
Tutorial
n?2
f(x) =
∑ 100 ( x2 i – x2 i – 1 )
2 i=1
2
+ ( 1 – x2 i – 1 )
2
This function is referred to as the generalized Rosenbrock function. It consists of n squared terms involving n unknowns. Before you can use fsolve to find the values of x such that F ( x ) = 0 , i.e., obtain the minimum of the generalized Rosenbrock function, you must rewrite the function as the following equivalent system of nonlinear equations: F ( 1 ) = 1 – x1 F ( 2 ) = 10 ( x 2 – x 1 ) F ( 3 ) = 1 – x3 F ( 4 ) = 10 ( x 4 – x 3 )
2 2
. . .
F ( n – 1 ) = 1 – xn – 1 F ( n ) = 10 ( x n – x n – 1 ) This system is square, and you can use fsolve to solve it. As the example demonstrates, this system has a unique solution given by x i = 1, i = 1, …, n .
2
Step 1: Write an M-file bananaobj.m to compute the objective function values and the Jacobian.
function [F,J] = bananaobj(x); % Evaluate the vector function and the Jacobian matrix for % the system of nonlinear equations derived from the general % n-dimensional Rosenbrock function. % Get the problem size n = length(x); if n == 0, error('Input vector, x, is empty.'); end if mod(n,2) ~= 0, error('Input vector, x, must have an even number of components.'); end
2-18
Examples that Use Standard Algorithms
% Evaluate the vector function odds = 1:2:n; evens = 2:2:n; F = zeros(n,1); F(odds,1) = 1-x(odds); F(evens,1) = 10.*(x(evens)-x(odds).^2); % Evaluate the Jacobian matrix if nargout > 1 if nargout > 1 c = -ones(n/2,1); C = sparse(odds,odds,c,n,n); d = 10*ones(n/2,1); D = sparse(evens,evens,d,n,n); e = -20.*x(odds); E = sparse(evens,odds,e,n,n); J = C + D + E; end
Step 2: Call the solve routine for the system of equations.
n = 64; x0(1:n,1) = -1.9; x0(2:2:n,1) = 2; options=optimset('Display','iter','Jacobian','on'); [x,F,exitflag,output,JAC] = fsolve(@bananaobj,x0,options);
Use the starting point x ( i ) = – 1.9 for the odd indices, and x ( i ) = 2 for the even indices. Accept the fsolve default 'off' for the LargeScale parameter, and the default medium-scale nonlinear equation algorithm 'dogleg'. Then set Jacobian to 'on' to use the Jacobian defined in bananaobj.m . The fsolve function generates the following output:
Iteration Func-count f(x) 0 1 4281.92 1 2 1546.86 2 3 112.552 3 4 106.24 4 5 106.24 5 6 51.3854 6 7 51.3854 7 8 43.8722 8 9 37.0713 9 10 37.0713 10 11 26.2485 Norm of First-order Trust-region step optimality radius 615 1 1 329 1 2.5 34.8 2.5 6.25 34.1 6.25 6.25 34.1 6.25 1.5625 6.39 1.56 3.90625 6.39 3.91 0.976562 2.19 0.977 2.44141 6.27 2.44 2.44141 6.27 2.44 0.610352 1.52 0.61
2-19
2
Tutorial
11 12 20.6649 1.52588 4.63 12 13 17.2558 1.52588 6.97 13 14 8.48582 1.52588 4.69 14 15 4.08398 1.52588 3.77 15 16 1.77589 1.52588 3.56 16 17 0.692381 1.52588 3.31 17 18 0.109777 1.16206 1.66 18 19 0 0.0468565 0 Optimization terminated successfully: First-order optimality is less than options.TolFun
1.53 1.53 1.53 1.53 1.53 1.53 1.53 1.53
Nonlinear Equations with Finite-Difference Jacobian
In the preceding example, the function bananaobj evaluates F and computes the Jacobian J. What if the code to compute the Jacobian is not available? By default, if you do not indicate that the Jacobian can be computed in the objective function (using the Jacobian parameter in options), fsolve, lsqnonlin, and lsqcurvefit instead use finite differencing to approximate the Jacobian. This is the default Jacobian option.You can select finite differencing by setting the Jacobian parameter to 'off' in optimset. This example uses bananaobj from the preceding example as the objective function, but sets the Jacobian parameter to 'off' so that fsolve approximates the Jacobian and ignores the second bananaobjoutput. It accepts the fsolve default 'off' for the LargeScale parameter, and the default nonlinear equation medium-scale algorithm 'dogleg':
n = 64; x0(1:n,1) = -1.9; x0(2:2:n,1) = 2; options=optimset('Display','iter','Jacobian','off'); [x,F,exitflag,output,JAC] = fsolve(@bananaobj,x0,options);
The example produces the following output:
Iteration Func-count f(x) 0 65 4281.92 1 130 1546.86 2 195 112.552 3 260 106.24 4 261 106.24 Norm of First-order Trust-region step optimality radius 615 1 1 329 1 2.5 34.8 2.5 6.25 34.1 6.25 6.25 34.1 6.25
2-20
Examples that Use Standard Algorithms
5 326 51.3854 1.5625 6.39 6 327 51.3854 3.90625 6.39 7 392 43.8722 0.976562 2.19 8 457 37.0713 2.44141 6.27 9 458 37.0713 2.44141 6.27 10 523 26.2485 0.610352 1.52 11 588 20.6649 1.52588 4.63 12 653 17.2558 1.52588 6.97 13 718 8.48582 1.52588 4.69 14 783 4.08398 1.52588 3.77 15 848 1.77589 1.52588 3.56 16 913 0.692381 1.52588 3.31 17 978 0.109777 1.16206 1.66 18 1043 0 0.0468565 0 Optimization terminated successfully: First-order optimality is less than options.TolFun
1.56 3.91 0.977 2.44 2.44 0.61 1.53 1.53 1.53 1.53 1.53 1.53 1.53 1.53
The finite-difference version of this example requires the same number of iterations to converge as the analytic Jacobian version in the preceding example. It is generally the case that both versions converge at about the same rate in terms of iterations. However, the finite-difference version requires many additional function evaluations. The cost of these extra evaluations might or might not be significant, depending on the particular problem.
Multiobjective Examples
The previous examples involved problems with a single objective function. This section demonstrates solving problems with multiobjective functions using lsqnonlin, fminimax, and fgoalattain. Included is an example of how to optimize parameters in a Simulink model.
Simulink Example
Let’s say that you want to optimize the control parameters in the Simulink model optsim.mdl. (This model can be found in the Optimization Toolbox optim directory. Note that Simulink must be installed on your system to load this model.) The model includes a nonlinear process plant modeled as a Simulink block diagram shown in Figure 2-1, Plant with Actuator Saturation.
2-21
2
Tutorial
Actuator Model 1 u Limit Rate
Plant 1.5 50s 3+a2.s 2+a1.s+1 1 y
Figure 2-1: Plant with Actuator Saturation
The plant is an under-damped third-order model with actuator limits. The actuator limits are a saturation limit and a slew rate limit. The actuator saturation limit cuts off input values greater than 2 units or less than -2 units. The slew rate limit of the actuator is 0.8 units/sec. The open-loop response of the system to a step input is shown in Figure 2-2, Closed-Loop Response. You can see this response by opening the model (type optsim at the command line or click the model name), and selecting Start from the Simulation menu. The response plots to the scope.
2-22
Examples that Use Standard Algorithms
Figure 2-2: Closed-Loop Response
The problem is to design a feedback control loop that tracks a unit step input to the system. The closed-loop plant is entered in terms of the blocks where the plant and actuator have been placed in a hierarchical Subsystem block. A Scope block displays output trajectories during the design process. See Figure 2-3, Closed-Loop Model.
Figure 2-3: Closed-Loop Model
2-23
2
Tutorial
One way to solve this problem is to minimize the error between the output and the input signal. The variables are the parameters of the PID controller. If you only need to minimize the error at one time unit, it would be a single objective function. But the goal is to minimize the error for all time steps from 0 to 100, thus producing a multiobjective function (one function for each time step). The routine lsqnonlin is used to perform a least-squares fit on the tracking of the output. This is defined via a MATLAB function in the file tracklsq.m, shown below, that defines the error signal. The error signal is yout, the output computed by calling sim, minus the input signal 1. The function tracklsq must run the simulation. The simulation can be run either in the base workspace or the current workspace, i.e., the workspace of the function calling sim, which in this case is tracklsq’s workspace. In this example, the simset command is used to tell sim to run the simulation in the current workspace by setting 'SrcWorkspace' to 'Current'. To run the simulation in optsim, the variables Kp, Ki, Kd, a1, and a2 (a1 and a2 are variables in the Plant block) must all be defined. Kp, Ki, and Kd are the variables to be optimized. You can initialize a1 and a2 before calling lsqnonlin and then pass these two variables as additional arguments. lsqnonlin then passes a1 and a2 to tracklsq each time it is called, so you do not have to use global variables. After choosing a solver using the simset function, the simulation is run using sim. The simulation is performed using a fixed-step fifth-order method to 100 seconds. When the simulation completes, the variables tout, xout, and yout are now in the current workspace (that is, the workspace of tracklsq). The Outport block is used in the block diagram model to put yout into the current workspace at the end of the simulation.
Step 1: Write an M-file tracklsq.m.
function F = tracklsq(pid,a1,a2) Kp = pid(1); % Move variables into model parameter names Ki = pid(2); Kd = pid(3); % Choose solver and set model workspace to this function opt = simset('solver','ode5','SrcWorkspace','Current'); [tout,xout,yout] = sim('optsim',[0 100],opt); F = yout-1; % Compute error signal
2-24
Examples that Use Standard Algorithms
Step 2: Invoke optimization routine.
optsim % Load the model pid0 = [0.63 0.0504 1.9688] % Set initial values a1 = 3; a2 = 43; % Initialize plant variables in model options = optimset('LargeScale','off','Display','iter',... 'TolX',0.001,'TolFun',0.001); pid = lsqnonlin(@tracklsq, pid0, [], [], options, a1, a2) % Put variables back in the base workspace Kp = pid(1); Ki = pid(2); Kd = pid(3);
The variable options passed to lsqnonlin defines the criteria and display characteristics. In this case you ask for output, use the medium-scale algorithm, and give termination tolerances for the step and objective function on the order of 0.001. The optimization gives the solution for the proportional, integral, and derivative (Kp, Ki, Kd) gains of the controller after 64 function evaluations:
Iteration Func-count Residual Step-size 1 3 8.66531 1 2 17 5.21602 85.4 3 24 4.54036 1 4 31 4.47786 0.918 5 39 4.47552 2.12 6 46 4.47524 0.203 7 64 4.47524 -4.11e-007 Optimization terminated successfully: Search direction less than tolX pid = 2.9186 0.1398 12.6221 Directional derivative Lambda -3.48 -0.00813 0.0403059 -0.0331 0.393189 -0.00467 0.201985 0.00121 0.100992 -0.00193 0.0718569 -0.00157 2595.3
The resulting closed-loop step response is shown in Figure 2-4.
2-25
2
Tutorial
Figure 2-4: Closed-Loop Response Using lsqnonlin
Note The call to sim results in a call to one of the Simulink ordinary differential equation (ODE) solvers. A choice must be made about the type of solver to use. From the optimization point of view, a fixed-step solver is the best choice if that is sufficient to solve the ODE. However, in the case of a stiff system, a variable-step method might be required to solve the ODE. The numerical solution produced by a variable-step solver, however, is not a smooth function of parameters, because of step-size control mechanisms. This lack of smoothness can prevent the optimization routine from converging. The lack of smoothness is not introduced when a fixed-step solver is used. (For a further explanation, see [1].)
2-26
Examples that Use Standard Algorithms
The Nonlinear Control Design Blockset is recommended for solving multiobjective optimization problems in conjunction with variable-step solvers in Simulink. It provides a special numeric gradient computation that works with Simulink and avoids introducing a problem of lack of smoothness.
Another solution approach is to use the fminimax function. In this case, rather than minimizing the error between the output and the input signal, you minimize the maximum value of the output at any time t between 0 and 100. Then in the function trackmmobj the objective function is simply the output yout returned by the sim command. But minimizing the maximum output at all time steps may force the output far below unity for some time steps. To keep the output above 0.95 after the first 20 seconds, in the constraint function trackkmmcon add a constraint yout >= 0.95 from t=20 to t=100. Because constraints must be in the form g <= 0, the constraint in the function is g = -yout(20:100)+.95. Both trackmmobj and trackmmcon use the result yout from sim, calculated from the current pid values. The nonlinear constraint function is always called immediately after the objective function in fmincon, fminimax, fgoalattain, and fseminf with the same values. Thus you can avoid calling the simulation twice by using assignin to assign the current value of F to the variable F_TRACKMMOBJ in the base workspace. Then the first step in trackmmcon is to use evalin to evaluate the variable F_TRACKMMOBJ in the base workspace, and assign the result to F locally in trackmmcon.
Step 1: Write an M-file trackmmobj.m to compute objective function.
function F = trackmmobj(pid,a1,a2) Kp = pid(1); Ki = pid(2); Kd = pid(3); % Compute function value opt = simset('solver','ode5','SrcWorkspace','Current'); [tout,xout,yout] = sim('optsim',[0 100],opt); F = yout; assignin('base','F_TRACKMMOBJ',F);
Step 2: Write an M-file trackmmcon.m to compute nonlinear constraints.
function [c,ceq] = trackmmcon(pid,a1,a2) F = evalin('base','F_TRACKMMOBJ');
2-27
2
Tutorial
% Compute constraints c = -F(20:100)+.95; ceq = [ ];
Note that fminimax passes a1 and a2 to the objective and constraint values, so trackmmcon needs input arguments for these variables even though it does not use them.
Step 3: Invoke constrained optimization routine.
optsim pid0 = [0.63 0.0504 1.9688] a1 = 3; a2 = 43; options = optimset('Display','iter',... 'TolX',0.001,'TolFun',0.001); pid = fminimax(@trackmmobj,pid0,[],[],[],[],[],[],... 'trackmmcon',options,a1,a2) % Put variables back in the base workspace Kp = pid(1); Ki = pid(2); Kd = pid(3);
resulting in
Max Directional Iter F-count {F,constraints} Step-size derivative Procedure 1 11 1.264 1 1.18 2 17 1.055 1 -0.172 3 23 1.004 1 -0.0128 Hessian modified twice 4 29 0.9997 1 3.48e-005 Hessian modified 5 35 0.9996 1 -1.36e-006 Hessian modified twice Optimization terminated successfully: Search direction less than 2*options.TolX and maximum constraint violation is less than options.TolCon Active Constraints: 1 14 182 pid = 0.5894
0.0605
5.5295
The last value shown in the MAX{F,constraints} column of the output shows that the maximum value for all the time steps is 0.9996. The closed loop response with this result is shown in Figure 2-5, Closed-Loop Response Using fminimax.
2-28
Examples that Use Standard Algorithms
This solution differs from the lsqnonlin solution, because you are solving different problem formulations.
Figure 2-5: Closed-Loop Response Using fminimax
Signal Processing Example
Consider designing a linear-phase Finite Impulse Response (FIR) filter. The problem is to design a lowpass filter with magnitude one at all frequencies between 0 and 0.1 Hz and magnitude zero between 0.15 and 0.5 Hz. The frequency response H(f) for such a filter is defined by
2-29
2
Tutorial
2M
H(f ) =
∑
n=0 M–1
h ( n ) e – j 2 π fn
= A ( f ) e –j 2 π fM A(f ) =
∑
n=0
a ( n ) cos ( 2 π fn )
(2-6)
where A(f) is the magnitude of the frequency response. One solution is to apply a goal attainment method to the magnitude of the frequency response. Given a function that computes the magnitude, the function fgoalattain will attempt to vary the magnitude coefficients a(n) until the magnitude response matches the desired response within some tolerance. The function that computes the magnitude response is given in filtmin.m. This function takes a, the magnitude function coefficients, and w, the discretization of the frequency domain we are interested in. To set up a goal attainment problem, you must specify the goal and weights for the problem. For frequencies between 0 and 0.1, the goal is one. For frequencies between 0.15 and 0.5, the goal is zero. Frequencies between 0.1 and 0.15 are not specified, so no goals or weights are needed in this range. This information is stored in the variable goal passed to fgoalattain. The length of goal is the same as the length returned by the function filtmin. So that the goals are equally satisfied, usually weight would be set to abs(goal). However, since some of the goals are zero, the effect of using weight=abs(goal) will force the objectives with weight 0 to be satisfied as hard constraints, and the objectives with weight 1 possibly to be underattained (see “Goal Attainment Method” on page 3-44). Because all the goals are close in magnitude, using a weight of unity for all goals will give them equal priority. (Using abs(goal) for the weights is more important when the magnitude of goal differs more significantly.) Also, setting
options = optimset('GoalsExactAchieve',length(goal));
specifies that each objective should be as near as possible to its goal value (neither greater nor less than).
2-30
Examples that Use Standard Algorithms
Step 1: Write an M-file filtmin.m.
function y = filtmin(a,w) n = length(a); y = cos(w'*(0:n-1)*2*pi)*a ;
Step 2: Invoke optimization routine.
% Plot with initial coefficients a0 = ones(15,1); incr = 50; w = linspace(0,0.5,incr); y0 = filtmin(a0,w); clf, plot(w,y0.'-:'); drawnow; % Set up the goal attainment problem w1 = linspace(0,0.1,incr) ; w2 = linspace(0.15,0.5,incr); w0 = [w1 w2]; goal = [1.0*ones(1,length(w1)) zeros(1,length(w2))]; weight = ones(size(goal)); % Call fgoalattain options = optimset('GoalsExactAchieve',length(goal)); [a,fval,attainfactor,exitflag] = fgoalattain(@filtmin,... a0,goal,weight,[],[],[],[],[],[],[],options,w0); % Plot with the optimized (final) coefficients y = filtmin(a,w); hold on, plot(w,y,'r') axis([0 0.5 -3 3]) xlabel('Frequency (Hz)') ylabel('Magnitude Response (dB)') legend('initial', 'final') grid on
Compare the magnitude response computed with the initial coefficients and the final coefficients (Figure 2-6). Note that you could use the remez function in the Signal Processing Toolbox to design this filter.
2-31
2
Tutorial
3 initial final 2
Magnitude Response (dB)
1
0
?1
?2
?3 0
0.05
0.1
0.15
0.2 0.25 0.3 Frequency (Hz)
0.35
0.4
0.45
0.5
Figure 2-6: Magnitude Response with Initial and Final Magnitude Coefficients
2-32
Large-Scale Examples
Large-Scale Examples
Some of the optimization functions include algorithms for continuous optimization problems especially targeted to large problems with sparsity or structure. The main large-scale algorithms are iterative, i.e., a sequence of approximate solutions is generated. In each iteration a linear system is (approximately) solved. The linear systems are solved using the sparse matrix capabilities of MATLAB and a variety of sparse linear solution techniques, both iterative and direct. Generally speaking, the large-scale optimization methods preserve structure and sparsity, using exact derivative information wherever possible. To solve the large-scale problems efficiently, some problem formulations are restricted (such as only solving overdetermined linear or nonlinear systems), or require additional information (e.g., the nonlinear minimization algorithm requires that the gradient be computed in the user-supplied function). This section summarizes the kinds of problems covered by large-scale methods and provides these examples: ? Nonlinear Equations with Jacobian ? Nonlinear Equations with Jacobian Sparsity Pattern ? Nonlinear Least-Squares with Full Jacobian Sparsity Pattern ? Nonlinear Minimization with Gradient and Hessian ? Nonlinear Minimization with Gradient and Hessian Sparsity Pattern ? Nonlinear Minimization with Bound Constraints and Banded Preconditioner ? Nonlinear Minimization with Equality Constraints ? Nonlinear Minimization with a Dense but Structured Hessian and Equality Constraints ? Quadratic Minimization with Bound Constraints ? Quadratic Minimization with a Dense but Structured Hessian ? Linear Least-Squares with Bound Constraints ? Linear Programming with Equalities and Inequalities ? Linear Programming with Dense Columns in the Equalities
2-33
2
Tutorial
Problems Covered by Large-Scale Methods
Not all possible problem formulations are covered by the large-scale algorithms. The following table describes what functionality is covered by the large-scale methods. For example, for fmincon, the large-scale algorithm covers the case where there are only bound constraints or only linear equalities. For each problem formulation, the third column describes what additional information is needed for the large-scale algorithms. For fminunc and fmincon, the gradient must be computed along with the objective in the user-supplied function (the gradient is not required for the medium-scale algorithms). Since these methods can also be used on small- to medium-scale problems that are not necessarily sparse, the last column of the table emphasizes what conditions are needed for large-scale problems to run efficiently without exceeding your computer system’s memory capabilities, e.g., the linear constraint matrices should be sparse. For smaller problems the conditions in the last column are unnecessary.
Note The following table lists the functions in order of increasing problem complexity.
Several examples, which follow this table, clarify the contents of the table.
Table 2-4: Large-Scale Problem Coverage and Requirements Function Problem Formulations Additional Information Needed For Large Problems
fminunc
min f ( x ) x
Must provide gradient for f(x) in fun.
? Provide sparsity structure of the Hessian, or compute the Hessian in fun. ? The Hessian should be sparse.
2-34
Large-Scale Examples
Table 2-4: Large-Scale Problem Coverage and Requirements (Continued) Function Problem Formulations Additional Information Needed For Large Problems
fmincon
? min f ( x ) x such that l ≤ x ≤ u where l<u ? min f ( x ) x such that Aeq ? x = beq , and Aeq is an m-by-n matrix where m ≤ n . 2 2 1 1 - F(x) 2 = -Fi ( x) ? min -2 x 2 i 2 2 1 -- F(x) 2 = 1 Fi ( x) ? min -2 2 x i such that l ≤ x ≤ u where l<u
Must provide gradient for f(x) in fun.
? Provide sparsity structure of the Hessian or compute the Hessian in fun. ? The Hessian should be sparse. ? Aeq should be sparse.
lsqnonlin
∑ ∑
None
? Provide sparsity structure of the Jacobian or compute the Jacobian in fun. ? The Jacobian should be sparse.
lsqcurvefit
F(x) must be overdetermined (have at least as many equations as variables). 1 - F(x, xdata) – ydata ? min -x 2 1 - F(x, xdata) – ydata ? min -x 2 such that l ≤ x ≤ u where l<u F(x, xdata) must be overdetermined (have at least as many equations as variables).
2 2 2 2
None
? Provide sparsity structure of the Jacobian or compute the Jacobian in fun. ? The Jacobian should be sparse.
2-35
2
Tutorial
Table 2-4: Large-Scale Problem Coverage and Requirements (Continued) Function Problem Formulations Additional Information Needed For Large Problems
fsolve
F(x) = 0 F ( x ) must have the same number of equations as variables.
2
None
? Provide sparsity structure of the Jacobian or compute the Jacobian in fun. ? The Jacobian should be sparse.
lsqlin
min C ? x – d 2 x such that l ≤ x ≤ u where l < u C is an m-by-n matrix where m ≥ n , i.e., the problem must be overdetermined.
None
C should be sparse.
linprog
quadprog
min f x x such that A ? x ≤ b and Aeq ? x = beq , where l ≤ x ≤ u 1 T T - x Hx + f x ? min -2 x such that l ≤ x ≤ u where l<u T 1 T - x Hx + f x ? min -2 x such that Aeq ? x = beq , and Aeq is an m-by-n matrix where m ≤ n .
T
None
A and Aeq should be sparse.
None
? H should be sparse. ? Aeq should be sparse.
In the following examples, many of the M-file functions are available in the Optimization Toolbox optim directory. Most of these do not have a fixed problem size, i.e., the size of your starting point xstart determines the size problem that is computed. If your computer system cannot handle the size suggested in the examples below, use a smaller-dimension start point to run
2-36
Large-Scale Examples
the problems. If the problems have upper or lower bounds or equalities, you must adjust the size of those vectors or matrices as well.
Nonlinear Equations with Jacobian
Consider the problem of finding a solution to a system of nonlinear equations whose Jacobian is sparse. The dimension of the problem in this example is 1000. The goal is to find x such that F(x) = 0. Assuming n = 1000, the nonlinear equations are F ( 1 ) = 3 x 1 – 2 x 1 – 2 x2 + 1 F ( i ) = 3 x i – 2 x i – xi – 1 – 2 x i + 1 + 1 F ( n ) = 3 xn – 2 xn – xn – 1 + 1 To solve a large nonlinear system of equations, F(x) = 0, use the large-scale method available in fsolve.
2 2 2
Step 1: Write an M-file nlsf1.m that computes the objective function values and the Jacobian.
function [F,J] = nlsf1(x); % Evaluate the vector function n = length(x); F = zeros(n,1); i = 2:(n-1); F(i) = (3-2*x(i)).*x(i)-x(i-1)-2*x(i+1)1+ 1; F(n) = (3-2*x(n)).*x(n)-x(n-1) + 1; F(1) = (3-2*x(1)).*x(1)-2*x(2) + 1; % Evaluate the Jacobian if nargout > 1 if nargout > 1 d = -4*x + 3*ones(n,1); D = sparse(1:n,1:n,d,n,n); c = -2*ones(n-1,1); C = sparse(1:n-1,2:n,c,n,n); e = -ones(n-1,1); E = sparse(2:n,1:n-1,e,n,n); J = C + D + E; end
Step 2: Call the solve routine for the system of equations.
xstart = -ones(1000,1); fun = @nlsf1;
2-37
2
Tutorial
options = optimset('Display','iter','LargeScale','on','Jacobian','on'); [x,fval,exitflag,output] = fsolve(fun,xstart,options);
A starting point is given as well as the function name. The default method for fsolve is medium-scale, so it is necessary to specify 'LargeScale' as 'on' in the options argument. Setting the Display option to 'iter' causes fsolve to display the output at each iteration. Setting the Jacobian parameter 'on', causes fsolve to use the Jacobian information available in nlsf1.m. The commands display this output:
Norm of First-order CGIteration Func-count f(x) step optimality Iterations 1 2 1011 1 19 0 2 3 16.1942 7.91898 2.35 3 3 4 0.0228027 1.33142 0.291 3 4 5 0.000103359 0.0433329 0.0201 4 5 6 7.3792e-007 0.0022606 0.000946 4 6 7 4.02299e-010 0.000268381 4.12e-005 5 Optimization terminated successfully: Relative function value changing by less than OPTIONS.TolFun
A linear system is (approximately) solved in each major iteration using the preconditioned conjugate gradient method. The default value for PrecondBandWidth is 0 in options, so a diagonal preconditioner is used. (PrecondBandWidth specifies the bandwidth of the preconditioning matrix. A bandwidth of 0 means there is only one diagonal in the matrix.) From the first-order optimality values, fast linear convergence occurs. The number of conjugate gradient (CG) iterations required per major iteration is low, at most five for a problem of 1000 dimensions, implying that the linear systems are not very difficult to solve in this case (though more work is required as convergence progresses). It is possible to override the default choice of preconditioner (diagonal) by choosing a banded preconditioner through the use of the parameter PrecondBandWidth. If you want to use a tridiagonal preconditioner, i.e., a preconditioning matrix with three diagonals (or bandwidth of one), set PrecondBandWidth to the value 1:
options = optimset('Display','iter','Jacobian','on',... 'LargeScale','on','PrecondBandWidth',1);
2-38
Large-Scale Examples
[x,fval,exitflag,output] = fsolve(fun,xstart,options);
In this case the output is
Norm of First-order CGIteration Func-count f(x) step optimality Iterations 1 2 1011 1 19 0 2 3 16.0839 7.92496 1.92 1 3 4 0.0458181 1.3279 0.579 1 4 5 0.000101184 0.0631898 0.0203 2 5 6 3.16615e-007 0.00273698 0.00079 2 6 7 9.72481e-010 0.00018111 5.82e-005 2 Optimization terminated successfully: Relative function value changing by less than OPTIONS.TolFun
Note that although the same number of iterations takes place, the number of PCG iterations has dropped, so less work is being done per iteration. See “Preconditioned Conjugate Gradients” on page 4-5.
Nonlinear Equations with Jacobian Sparsity Pattern
In the preceding example, the function nlsf1 computes the Jacobian J, a sparse matrix, along with the evaluation of F. What if the code to compute the Jacobian is not available? By default, if you do not indicate that the Jacobian can be computed in nlsf1 (using the Jacobian parameter in options), fsolve, lsqnonlin, and lsqcurvefit instead uses finite differencing to approximate the Jacobian. In order for this finite differencing to be as efficient as possible, you should supply the sparsity pattern of the Jacobian, using the JacobPattern parameter in options. That is, supply a sparse matrix Jstr whose nonzero entries correspond to nonzeros of the Jacobian for all x. Indeed, the nonzeros of Jstr can correspond to a superset of the nonzero locations of J; however, in general the computational cost of the sparse finite-difference procedure will increase with the number of nonzeros of Jstr. Providing the sparsity pattern can drastically reduce the time needed to compute the finite differencing on large problems. If the sparsity pattern is not provided (and the Jacobian is not computed in the objective function either) then, in this problem nlsfs1, the finite-differencing code attempts to compute all 1000-by-1000 entries in the Jacobian. But in this case there are only 2998 nonzeros, substantially less than the 1,000,000 possible nonzeros the
2-39
2
Tutorial
finite-differencing code attempts to compute. In other words, this problem is solvable if you provide the sparsity pattern. If not, most computers run out of memory when the full dense finite-differencing is attempted. On most small problems, it is not essential to provide the sparsity structure. Suppose the sparse matrix Jstr, computed previously, has been saved in file nlsdat1.mat. The following driver calls fsolve applied to nlsf1a, which is the same as nlsf1 except that only the function values are returned; sparse finite-differencing is used to estimate the sparse Jacobian matrix as needed.
Step 1: Write an M-file nlsf1a.m that computes the objective function values.
function F = nlsf1a(x); % Evaluate the vector function n = length(x); F = zeros(n,1); i = 2:(n-1); F(i) = (3-2*x(i)).*x(i)-x(i-1)-2*x(i+1) + 1; F(n) = (3-2*x(n)).*x(n)-x(n-1) + 1; F(1) = (3-2*x(1)).*x(1)-2*x(2) + 1;
Step 2: Call the system of equations solve routine.
xstart = -ones(1000,1); fun = @nlsf1a; load nlsdat1 % Get Jstr options = optimset('Display','iter','JacobPattern',Jstr,... 'LargeScale','on','PrecondBandWidth',1); [x,fval,exitflag,output] = fsolve(fun,xstart,options);
In this case, the output displayed is
Norm of First-order CGIteration Func-count f(x) step optimality Iterations 1 6 1011 1 19 0 2 11 16.0839 7.92496 1.92 1 3 16 0.0458181 1.3279 0.579 1 4 21 0.000101184 0.0631898 0.0203 2 5 26 3.16615e-007 0.00273698 0.00079 2 6 31 9.72482e-010 0.00018111 5.82e-005 2 Optimization terminated successfully:
2-40
Large-Scale Examples
Relative function value changing by less than OPTIONS.TolFun
Alternatively, it is possible to choose a sparse direct linear solver (i.e., a sparse QR factorization) by indicating a “complete” preconditioner. I.e., if you set PrecondBandWidth to Inf, then a sparse direct linear solver is used instead of a preconditioned conjugate gradient iteration:
xstart = -ones(1000,1); fun = @nlsf1a; load nlsdat1 % Get Jstr options = optimset('Display','iter','JacobPattern',Jstr,... 'LargeScale','on','PrecondBandWidth',inf); [x,fval,exitflag,output] = fsolve(fun,xstart,options);
and the resulting display is
Norm of First-order CGIteration Func-count f(x) step optimality Iterations 1 6 1011 1 19 0 2 11 15.9018 7.92421 1.89 1 3 16 0.0128163 1.32542 0.0746 1 4 21 1.73538e-008 0.0397925 0.000196 1 5 26 1.13169e-018 4.55544e-005 2.76e-009 1 Optimization terminated successfully: Relative function value changing by less than OPTIONS.TolFun
When the sparse direct solvers are used, the CG iteration is 1 for that (major) iteration, as shown in the output under CG-Iterations. Notice that the final optimality and f(x) value (which for fsolve, f(x), is the sum of the squares of the function values) are closer to zero than using the PCG method, which is often the case.
Nonlinear Least-Squares with Full Jacobian Sparsity Pattern
The large-scale methods in lsqnonlin, lsqcurvefit, and fsolve can be used with small- to medium-scale problems without computing the Jacobian in fun or providing the Jacobian sparsity pattern. (This example also applies to the case of using fmincon or fminunc without computing the Hessian or supplying the Hessian sparsity pattern.) How small is small- to medium-scale? No absolute answer is available, as it depends on the amount of virtual memory available in your computer system configuration.
2-41
2
Tutorial
Suppose your problem has m equations and n unknowns. If the command J = sparse(ones(m,n)) causes an Out of memory error on your machine, then this is certainly too large a problem. If it does not result in an error, the problem might still be too large, but you can only find out by running it and seeing if MATLAB is able to run within the amount of virtual memory available on your system. Let’s say you have a small problem with 10 equations and 2 unknowns, such as finding x that minimizes
10
∑ (2 + 2k – e
k=1
kx 1
–e
kx 2 2
)
starting at the point x = [0.3, 0.4]. Because lsqnonlin assumes that the sum of squares is not explicitly formed in the user function, the function passed to lsqnonlin should instead compute the vector valued function Fk ( x ) = 2 + 2 k – e
kx 1
–e
kx 2
for k = 1 to 10 (that is, F should have k components).
Step 1: Write an M-file myfun.m that computes the objective function values.
function F = myfun(x) k = 1:10; F = 2 + 2*k-exp(k*x(1))-exp(k*x(2));
Step 2: Call the nonlinear least-squares routine.
x0 = [0.3 0.4] % Starting guess [x,resnorm] = lsqnonlin(@myfun,x0) % Invoke optimizer
Because the Jacobian is not computed in myfun.m , and no Jacobian sparsity pattern is provided using the JacobPattern parameter in options, lsqnonlin calls the large-scale method with JacobPattern set to Jstr = sparse(ones(10,2)). This is the default for lsqnonlin. Note that the Jacobian parameter in options is 'off' by default.
2-42
Large-Scale Examples
When the finite-differencing routine is called the first time, it detects that Jstr is actually a dense matrix, i.e., that no speed benefit is derived from storing it as a sparse matrix. From then on the finite-differencing routine uses Jstr = ones(10,2) (a full matrix) for the optimization computations. After about 24 function evaluations, this example gives the solution
x = 0.2578 0.2578 resnorm % Residual or sum of squares resnorm = 124.3622
Most computer systems can handle much larger full problems, say into the 100’s of equations and variables. But if there is some sparsity structure in the Jacobian (or Hessian) that can be taken advantage of, the large-scale methods will always run faster if this information is provided.
Nonlinear Minimization with Gradient and Hessian
This example involves solving a nonlinear minimization problem with a tridiagonal Hessian matrix H(x) first computed explicitly, and then by providing the Hessian’s sparsity structure for the finite-differencing routine. The problem is to find x to minimize
n–1
f( x) =
∑
i=1
2 ( xi + 1 + 1 ) ( xi )
2
+
( xi + 1 ) 2 ( xi + 1 )
2
(2-7)
where n = 1000.
Step 1: Write an M-file brownfgh.m that computes the objective function, the gradient of the objective, and the sparse tridiagonal Hessian matrix.
This file is rather long and is not included here. You can view the code with the command
type brownfgh
Because brownfgh computes the gradient and Hessian values as well as the objective function, you need to use optimset to indicate that this information is available in brownfgh, using the GradObj and Hessian parameters.
2-43
2
Tutorial
Step 2: Call a nonlinear minimization routine with a starting point xstart.
n = 1000; xstart = -ones(n,1); xstart(2:2:n,1) = 1; options = optimset('GradObj','on','Hessian','on'); [x,fval,exitflag,output] = fminunc(@brownfgh,xstart,options);
This 1000 variable problem is solved in 8 iterations and 7 conjugate gradient iterations with a positive exitflag indicating convergence. The final function value and measure of optimality at the solution x are both close to zero. For fminunc, the first order optimality is the infinity norm of the gradient of the function, which is zero at a local minimum:
exitflag = 1 fval = 2.8709e-017 output.iterations ans = 8 output.cgiterations ans = 7 output.firstorderopt ans = 4.7948e-010
Nonlinear Minimization with Gradient and Hessian Sparsity Pattern
Next, solve the same problem but the Hessian matrix is now approximated by sparse finite differences instead of explicit computation. To use the large-scale method in fminunc, you must compute the gradient in fun; it is not optional as in the medium-scale method. The M-file function brownfg computes the objective function and gradient.
Step 1: Write an M-file brownfg.m that computes the objective function and the gradient of the objective.
function [f,g] = brownfg(x,dummy)
2-44
Large-Scale Examples
% BROWNFG Nonlinear minimization test problem % % Evaluate the function n=length(x); y=zeros(n,1); i=1:(n-1); y(i)=(x(i).^2).^(x(i+1).^2+1) + ... (x(i+1).^2).^(x(i).^2+1); f=sum(y); % Evaluate the gradient if nargout > 1 if nargout > 1 i=1:(n-1); g = zeros(n,1); g(i) = 2*(x(i+1).^2+1).*x(i).* ... ((x(i).^2).^(x(i+1).^2))+ ... 2*x(i).*((x(i+1).^2).^(x(i).^2+1)).* ... log(x(i+1).^2); g(i+1) = g(i+1) + ... 2*x(i+1).*((x(i).^2).^(x(i+1).^2+1)).* ... log(x(i).^2) + ... 2*(x(i).^2+1).*x(i+1).* ... ((x(i+1).^2).^(x(i).^2)); end
To allow efficient computation of the sparse finite-difference approximation of the Hessian matrix H(x), the sparsity structure of H must be predetermined. In this case assume this structure, Hstr, a sparse matrix, is available in file brownhstr.mat. Using the spy command you can see that Hstr is indeed sparse (only 2998 nonzeros). Use optimset to set the HessPattern parameter to Hstr. When a problem as large as this has obvious sparsity structure, not setting the HessPattern parameter requires a huge amount of unnecessary memory and computation because fminunc attempts to use finite differencing on a full Hessian matrix of one million nonzero entries. You must also set the GradObj parameter to 'on' using optimset, since the gradient is computed in brownfg.m. Then execute fminunc as shown in Step 2.
Step 2: Call a nonlinear minimization routine with a starting point xstart.
fun = @brownfg; load brownhstr spy(Hstr) n = 1000; % Get Hstr, structure of the Hessian % View the sparsity structure of Hstr
2-45
2
Tutorial
xstart = -ones(n,1); xstart(2:2:n,1) = 1; options = optimset('GradObj','on','HessPattern',Hstr); [x,fval,exitflag,output] = fminunc(fun,xstart,options);
This 1000-variable problem is solved in eight iterations and seven conjugate gradient iterations with a positive exitflag indicating convergence. The final function value and measure of optimality at the solution x are both close to zero (for fminunc, the first-order optimality is the infinity norm of the gradient of the function, which is zero at a local minimum):
exitflag = 1 fval = 7.4738e-017 output.iterations ans = 8 output.cgiterations ans = 7 output.firstorderopt ans = 7.9822e-010
Nonlinear Minimization with Bound Constraints and Banded Preconditioner
The goal in this problem is to minimize the nonlinear function
n n -2
f(x) = 1 +
∑
i=1
( 3 – 2 xi ) x i – x i – 1 – x i + 1 + 1 +
p
∑
i=1
xi + x i + n ? 2
p
such that – 10.0 ≤ x i ≤ 10.0 , where n is 800 (n should be a multiple of 4), p = 7 ? 3 , and x 0 = x n + 1 = 0 .
2-46
Large-Scale Examples
Step 1: Write an M-file tbroyfg.m that computes the objective function and the gradient of the objective
The M-file function tbroyfg.m computes the function value and gradient. This file is long and is not included here. You can see the code for this function using the command
type tbroyfg
The sparsity pattern of the Hessian matrix has been predetermined and stored in the file tbroyhstr.mat. The sparsity structure for the Hessian of this problem is banded, as you can see in the followint spy plot.
load tbroyhstr spy(Hstr)
0
100
200
300
400
500
600
700
800
0
100
200
300
400 nz = 4794
500
600
700
800
In this plot, the center stripe is itself a five-banded matrix. The following plot shows the matrix more clearly:
spy(Hstr(1:20,1:20))
2-47
2
Tutorial
0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 nz = 94 14 16 18 20
Use optimset to set the HessPattern parameter to Hstr. When a problem as large as this has obvious sparsity structure, not setting the HessPattern parameter requires a huge amount of unnecessary memory and computation. This is because fmincon attempts to use finite differencing on a full Hessian matrix of 640,000 nonzero entries. You must also set the GradObj parameter to 'on' using optimset, since the gradient is computed in tbroyfg.m. Then execute fmincon as shown in Step 2.
Step 2: Call a nonlinear minimization routine with a starting point xstart.
fun = @tbroyfg; load tbroyhstr % Get Hstr, structure of the Hessian n = 800; xstart = -ones(n,1); xstart(2:2:n) = 1; lb = -10*ones(n,1); ub = -lb; options = optimset('GradObj','on','HessPattern',Hstr); [x,fval,exitflag,output] = ... fmincon(fun,xstart,[],[],[],[],lb,ub,[],options);
After eight iterations, the exitflag, fval, and output values are
2-48
Large-Scale Examples
exitflag = 1 fval = 270.4790 output = iterations: funcCount: cgiterations: firstorderopt: algorithm:
8 8 18 0.0163 'large-scale: trust-region reflective Newton'
For bound constrained problems, the first-order optimality is the infinity norm of v.*g, where v is defined as in “Box Constraints” on page 4-7, and g is the gradient. Because of the five-banded center stripe, you can improve the solution by using a five-banded preconditioner instead of the default diagonal preconditioner. Using the optimset function, reset the PrecondBandWidth parameter to 2 and solve the problem again. (The bandwidth is the number of upper (or lower) diagonals, not counting the main diagonal.)
fun = @tbroyfg; load tbroyhstr % Get Hstr, structure of the Hessian n = 800; xstart = -ones(n,1); xstart(2:2:n,1) = 1; lb = -10*ones(n,1); ub = -lb; options = optimset('GradObj','on','HessPattern',Hstr, ... 'PrecondBandWidth',2); [x,fval,exitflag,output] = ... fmincon(fun,xstart,[],[],[],[],lb,ub,[],options);
The number of iterations actually goes up by two; however the total number of CG iterations drops from 18 to 15. The first-order optimality measure is reduced by a factor of 1e-3:
exitflag = 1 fval = 2.7048e+002 output = iterations: 10 funcCount: 10
2-49
2
Tutorial
cgiterations: 15 firstorderopt: 7.5339e-005 algorithm: 'large-scale: trust-region reflective Newton'
Nonlinear Minimization with Equality Constraints
The large-scale method for fmincon can handle equality constraints if no other constraints exist. Suppose you want to minimize the same objective as in Eq. 2-7, which is coded in the function brownfgh.m, where n = 1000, such that Aeq ? x = beq for Aeq that has 100 equations (so Aeq is a 100-by-1000 matrix).
Step 1: Write an M-file brownfgh.m that computes the objective function, the gradient of the objective, and the sparse tridiagonal Hessian matrix.
As before, this file is rather long and is not included here. You can view the code with the command
type brownfgh
Because brownfgh computes the gradient and Hessian values as well as the objective function, you need to use optimset to indicate that this information is available in brownfgh, using the GradObj and Hessian parameters. The sparse matrix Aeq and vector beq are available in the file browneq.mat:
load browneq
The linear constraint system is 100-by-1000, has unstructured sparsity (use spy(Aeq) to view the sparsity structure), and is not too badly ill-conditioned:
condest(Aeq*Aeq') ans = 2.9310e+006
Step 2: Call a nonlinear minimization routine with a starting point xstart.
fun = @brownfgh; load browneq % Get Aeq and beq, the linear equalities n = 1000; xstart = -ones(n,1); xstart(2:2:n) = 1; options = optimset('GradObj','on','Hessian','on', ... 'PrecondBandWidth', inf); [x,fval,exitflag,output] = ...
2-50
Large-Scale Examples
fmincon(fun,xstart,[],[],Aeq,beq,[],[],[],options);
Setting the parameter PrecondBandWidth to inf causes a sparse direct solver to be used instead of preconditioned conjugate gradients. The exitflag indicates convergence with the final function value fval after 16 iterations:
exitflag = 1 fval = 205.9313 output = iterations: funcCount: cgiterations: firstorderopt: algorithm:
16 16 14 2.1434e-004 'large-scale: projected trust-region Newton'
The linear equalities are satisfied at x.
norm(Aeq*x-beq) ans = 1.1913e-012
Nonlinear Minimization with a Dense but Structured Hessian and Equality Constraints
The fmincon and fminunc large-scale methods can solve problems where the Hessian is dense but structured. For these problems, fmincon and fminunc do not compute H*Y with the Hessian H directly, as they do for medium-scale problems and for large-scale problems with sparse H, because forming H would be memory-intensive. Instead, you must provide fmincon or fminunc with a function that, given a matrix Y and information about H, computes W = H*Y. In this example, the objective function is nonlinear and linear equalities exist so fmincon is used. The objective function has the structure
T ? 1 T -x VV x f ( x ) = f ( x ) – -2
? where V is a 1000-by-2 matrix. The Hessian of f is dense, but the Hessian of f ? ? is sparse. If the Hessian of f is H , then H , the Hessian of f , is
2-51
2
Tutorial
T ? H = H – VV
To avoid excessive memory usage that could happen by working with H directly, the example provides a Hessian multiply function, hmfleq1. This function, when passed a matrix Y, uses sparse matrices Hinfo, which ? , and V to compute the Hessian matrix product corresponds to H
W = H*Y = (Hinfo - V*V')*Y
? In this example, the Hessian multiply function needs H and V to compute the Hessian matrix product. V is a constant, so V can be passed as an additional parameter to fmincon. Then fmincon passes V as an additional parameter to hmfleq1. ? However, H is not a constant and must be computed at the current x. You can ? ? do this by computing H in the objective function and returning H as Hinfo in the third output argument. By using optimset to set the 'Hessian' options to 'on', fmincon knows to get the Hinfo value from the objective function and pass it to the Hessian multiply function hmfleq1.
Step 1: Write an M-file brownvv.m that computes the objective function, the gradient, and the sparse part of the Hessian.
The example passes brownvv to fmincon as the objective function. The brownvv.m file is long and is not included here. You can view the code with the command
type brownvv
Because brownvv computes the gradient and part of the Hessian as well as the objective function, the example (Step 3) uses optimset to set the GradObj and Hessian parameters to 'on'.
Step 2: Write a function to compute Hessian-matrix products for H given a matrix Y.
Now, define a function hmfleq1 that uses Hinfo, which is computed in brownvv, and V, which the example passes to fmincon as an additional parameter, to compute the Hessian matrix product W where W = H*Y = (Hinfo - V*V')*Y. This function must have the form
W = hmfleq1(Hinfo,Y,p1,p2...)
2-52
Large-Scale Examples
The first argument must be the same as the third argument returned by the objective function brownvv. The second argument to the Hessian multiply function is the matrix Y (of W = H*Y). Because fmincon expects the second argument Y to be used to form the Hessian matrix product, Y is always a matrix with n rows where n is the number of dimensions in the problem. The number of columns in Y can vary. Finally, any additional parameters passed to fmincon are passed to the Hessian multiply function, so hmfleq1 must accept the same additional parameters, e.g., the matrix V:
function W = hmfleq1(Hinfo,Y,V); %HMFLEQ1 Hessian-matrix product function for BROWNVV objective. % W = hmfleq1(Hinfo,Y,V) computes W = (Hinfo-V*V')*Y % where Hinfo is a sparse matrix computed by BROWNVV % and V is a 2 column matrix. W = Hinfo*Y - V*(V'*Y);
Note The function hmfleq1 is available in the Optimization Toolbox as the M-file hmfleq1.m.
Step 3: Call a nonlinear minimization routine with a starting point and linear equality constraints.
Load the problem parameter, V, and the sparse equality constraint matrices, Aeq and beq, from fleq1.mat, which is available in the Optimization Toolbox. Use optimset to set the GradObj and Hessian options to 'on' and to set the HessMult option to a function handle that points to hmfleq1. Call fmincon with objective function brownvv and with V as an additional parameter:
load fleq1 % Get V, Aeq, beq n = 1000; % problem dimension mtxmpy = @hmfleq1; % Function handle to function hmfleq1 xstart = -ones(n,1); xstart(2:2:n,1) = ones(length(2:2:n),1); options = optimset('GradObj','on','Hessian','on',... 'HessMult',mtxmpy,'Display','iter'); [x,fval,exitflag,output] = fmincon(@brownvv,xstart,[],[],... Aeq,beq,[],[],[],... options,V);
2-53
2
Tutorial
Note Type [fval,exitflag,output] = runfleq1 to run the preceding code. This command displays the values for fval, exitflag, and output, as well as the following iterative display.
Because the iterative display was set using optimset, the results displayed are
Norm of First-order Iteration f(x) step optimality CG-iterations 1 1997.07 1 555 0 2 1072.56 6.31716 377 1 3 480.232 8.19554 159 2 4 136.861 10.3015 59.5 2 5 44.3708 9.04697 16.3 2 6 44.3708 100 16.3 2 7 44.3708 25 16.3 0 8 -8.90967 6.25 28.5 0 9 -318.486 12.5 107 1 10 -318.486 12.5 107 1 11 -415.445 3.125 73.9 0 12 -561.688 3.125 47.4 2 13 -785.326 6.25 126 3 14 -785.326 4.30584 126 5 15 -804.414 1.07646 26.9 0 16 -822.399 2.16965 2.8 3 17 -823.173 0.40754 1.34 3 18 -823.241 0.154885 0.555 3 19 -823.246 0.0518407 0.214 5 2 -823.246 0.00977601 0.00724 6 Optimization terminated successfully: Relative function value changing by less than OPTIONS.TolFun
Convergence is rapid for a problem of this size with the PCG iteration cost increasing modestly as the optimization progresses. Feasibility of the equality constraints is maintained at the solution
norm(Aeq*x-beq) = 1.2861e-013
2-54
Large-Scale Examples
Preconditioning
In this example, fmincon cannot use H to compute a preconditioner because H only exists implicitly. Instead of H, fmincon uses Hinfo, the third argument returned by brownvv, to compute a preconditioner. Hinfo is a good choice because it is the same size as H and approximates H to some degree. If Hinfo were not the same size as H, fmincon would compute a preconditioner based on some diagonal scaling matrices determined from the algorithm. Typically, this would not perform as well.
Quadratic Minimization with Bound Constraints
To minimize a large-scale quadratic with upper and lower bounds, you can use the quadprog function. The problem stored in the MAT-file qpbox1.mat is a positive definite quadratic, and the Hessian matrix H is tridiagonal, subject to upper (ub) and lower (lb) bounds.
Step 1: Load the Hessian and define f, lb, ub.
load qpbox1 % Get H lb = zeros(400,1); lb(400) = -inf; ub = 0.9*ones(400,1); ub(400) = inf; f = zeros(400,1); f([1 400]) = -2;
Step 2: Call a quadratic minimization routine with a starting point xstart.
xstart = 0.5*ones(400,1); [x,fval,exitflag,output] = ... quadprog(H,f,[],[],[],[],lb,ub,xstart);
Looking at the resulting values of exitflag and output,
exitflag = 1 output = firstorderopt: iterations: cgiterations: algorithm:
7.8435e-006 20 1809 'large-scale: reflective trust-region'
you can see that while convergence occurred in 20 iterations, the high number of CG iterations indicates that the cost of the linear system solve is high. In
2-55
2
Tutorial
light of this cost, one strategy would be to limit the number of CG iterations per optimization iteration. The default number is the dimension of the problem divided by two, 200 for this problem. Suppose you limit it to 50 using the MaxPCGIter flag in options:
options = optimset('MaxPCGIter',50); [x,fval,exitflag,output] = ... quadprog(H,f,[],[],[],[],lb,ub,xstart,options);
This time convergence still occurs and the total number of CG iterations (1547) has dropped:
exitflag = 1 output = firstorderopt: iterations: cgiterations: algorithm:
2.3821e-005 36 1547 'large-scale: reflective trust-region'
A second strategy would be to use a direct solver at each iteration by setting the PrecondBandWidth parameter to inf:
options = optimset('PrecondBandWidth',inf); [x,fval,exitflag,output] = ... quadprog(H,f,[],[],[],[],lb,ub,xstart,options);
Now the number of iterations has dropped to 10:
exitflag = 1 output = firstorderopt: iterations: cgiterations: algorithm:
4.8955e-007 10 9 'large-scale: reflective trust-region'
Using a direct solve at each iteration usually causes the number of iterations to decrease, but often takes more time per iteration. For this problem, the tradeoff is beneficial, as the time for quadprog to solve the problem decreases by a factor of 10.
2-56
Large-Scale Examples
Quadratic Minimization with a Dense but Structured Hessian
The quadprog large-scale method can also solve large problems where the Hessian is dense but structured. For these problems, quadprog does not compute H*Y with the Hessian H directly, as it does for medium-scale problems and for large-scale problems with sparse H, because forming H would be memory-intensive. Instead, you must provide quadprog with a function that, given a matrix Y and information about H, computes W = H*Y. In this example, the Hessian matrix H has the structure H = B + A*A' where B is a sparse 512-by-512 symmetric matrix, and A is a 512-by-10 sparse matrix composed of a number of dense columns. To avoid excessive memory usage that could happen by working with H directly because H is dense, the example provides a Hessian multiply function, qpbox4mult. This function, when passed a matrix Y, uses sparse matrices A and B to compute the Hessian matrix product W = H*Y = (B + A*A )*Y. In this example, the matrices A and B need to be passed to the Hessian multiply function qpbox4mult from quadprog. There are two ways to indicate this in the call to quadprog. The first argument passed to quadprog is passed to the Hessian multiply function. Also, parameters passed to quadprog as additional parameters are passed to the Hessian multiply function.
Step 1: Decide what part of H to pass to quadprog as the first argument.
Either A, or B can be passed as the first argument to quadprog. The example chooses to pass B as the first argument because this results in a better preconditioner (see “Preconditioning” on page 2-59). A is then passed as an additional parameter:
quadprog(B,f,[],[],[],[],l,u,xstart,options,A)
Step 2: Write a function to compute Hessian-matrix products for H.
Now, define a function qpbox4mult that uses A and B to compute the Hessian matrix product W where W = H*Y = (B + A*A )*Y. This function must have the form
W = qpbox4mult(Hinfo,Y,p1,p2...) qpbox4mult must accept the same first argument as passed to quadprog, e.g., the example passes B as the first argument to quadprog, so qpbox4mult must accept B as the first argument.
2-57
2
Tutorial
The second argument to the Hessian multiply function is the matrix Y (of W = H*Y). Because quadprog expects Y to be used to form the Hessian matrix product, Y is always a matrix with n rows where n is the number of dimensions in the problem. The number of columns in Y can vary. Finally, any additional parameters passed to quadprog are passed to the Hessian multiply function, so qpbox4mult must accept the same additional parameters, e.g., the matrix A:
function W = qpbox4mult(B,Y,A); %QPBOX4MULT Hessian matrix product with dense structured Hessian. % W = qpbox4mult(B,Y,A) computes W = (B + A*A')*Y where % INPUT: % B - sparse square matrix (512 by 512) % Y - vector (or matrix) to be multiplied by B + A'*A. % A - sparse matrix with 512 rows and 10 columns. % % OUTPUT: % W - The product (B + A*A')*Y. % Order multiplies to avoid forming A*A', % which is large and dense W = B*Y + A*(A'*Y);
Note qpbox4mult is a subfunction of runqpbox4.m in the Optimization Toolbox.
Step 3: Call a quadratic minimization routine with a starting point.
Load the problem parameters from qpbox4.mat. Use optimset to set the HessMult option to a function handle that points to qpbox4mult. Call quadprog with B as the first argument and A as an additional parameter:
load qpbox4 % Get xstart, u, l, B, A, f mtxmpy = @qpbox4mult; % Function handle to function qpbox4mult options = optimset('HessMult',mtxmpy); [x,fval,exitflag,output] = quadprog(B,f,[],[],[],[],l,u,... xstart,options,A); Optimization terminated successfully: Relative function value changing by less than sqrt(OPTIONS.TolFun), no negative curvature detected in Hessian this iteration, and the rate of progress (change in f(x)) is slow
2-58
Large-Scale Examples
After 18 iterations with a total of 30 PCG iterations, the function value is reduced to
fval = -1.0538e+003
and the first-order optimality is
output.firstorderopt = 0.0043
Note Type [fval,exitflag,output] = runqpbox4 to run the preceding code and display the values for fval, exitflag, and output.
Preconditioning
In this example, quadprog cannot use H to compute a preconditioner because H only exists implicitly. Instead, quadprog uses B, the argument passed in instead of H, to compute a preconditioner. B is a good choice because it is the same size as H and approximates H to some degree. If B were not the same size as H, quadprog would compute a preconditioner based on some diagonal scaling matrices determined from the algorithm. Typically, this would not perform as well. Because the preconditioner is more approximate than when H is available explicitly, adjusting the TolPcg parameter to a somewhat smaller value might be required. This example is the same as the previous one, but reduces TolPcg from the default 0.1 to 0.01.
options = optimset('HessMult',mtxmpy,'TolPcg',0.01); [x,fval,exitflag,output]=quadprog(B,f,[],[],[],[],l,u,xstart,... options,A); Optimization terminated successfully: Relative function value changing by less than sqrt(OPTIONS.TolFun), no negative curvature detected in Hessian this iteration, and the rate of progress (change in f(x)) is slow
After 18 iterations and 50 PCG iterations, the function value has the same value to five significant digits
fval = -1.0538e+003
2-59
2
Tutorial
but the first-order optimality is further reduced.
output.firstorderopt = 0.0028
Note Decreasing TolPcg too much can substantially increase the number of PCG iterations.
Linear Least-Squares with Bound Constraints
Many situations give rise to sparse linear least-squares problems, often with bounds on the variables. The next problem requires that the variables be nonnegative. This problem comes from fitting a function approximation to a piecewise linear spline. Specifically, particles are scattered on the unit square. The function to be approximated is evaluated at these points, and a piecewise linear spline approximation is constructed under the condition that (linear) coefficients are not negative. There are 2000 equations to fit on 400 variables:
load particle % Get C, d lb = zeros(400,1); [x,resnorm,residual,exitflag,output] = ... lsqlin(C,d,[],[],[],[],lb);
The default diagonal preconditioning works fairly well:
exitflag = 1 resnorm = 22.5794 output = algorithm: firstorderopt: iterations: cgiterations:
'large-scale: trust-region reflective Newton' 2.7870e-005 10 42
For bound constrained problems, the first-order optimality is the infinity norm of v.*g, where v is defined as in “Box Constraints” on page 4-7, and g is the gradient.
2-60
Large-Scale Examples
You can improve (decrease) the first-order optimality by using a sparse QR factorization in each iteration. To do this, set PrecondBandWidth to inf.
options = optimset('PrecondBandWidth',inf); [x,resnorm,residual,exitflag,output] = ... lsqlin(C,d,[],[],[],[],lb,[],[],options);
The number of iterations and the first-order optimality both decrease:
exitflag = 1 resnorm = 22.5794 output = algorithm: firstorderopt: iterations: cgiterations:
'large-scale: trust-region reflective Newton' 5.5907e-015 12 11
Linear Programming with Equalities and Inequalities
The problem is Aeq ? x = beq min f x
T
such that
A?x≤b x≥0
and you can load the matrices and vectors A, Aeq, b, beq, f, and the lower bounds lb into the MATLAB workspace with
load sc50b
This problem in sc50b.mat has 48 variables, 30 inequalities, and 20 equalities. You can use linprog to solve the problem:
[x,fval,exitflag,output] = ... linprog(f,A,b,Aeq,beq,lb,[],[],optimset('Display','iter'));
Because the iterative display was set using optimset, the results displayed are
2-61
2
Tutorial
Residuals:
Primal Dual Duality Total Infeas Infeas Gap Rel A*x-b A'*y+z-f x'*z Error -----------------------------------------------------Iter 0: 1.50e+003 2.19e+001 1.91e+004 1.00e+002 Iter 1: 1.15e+002 2.94e-015 3.62e+003 9.90e-001 Iter 2: 1.16e-012 2.21e-015 4.32e+002 9.48e-001 Iter 3: 3.23e-012 5.16e-015 7.78e+001 6.88e-001 Iter 4: 5.78e-011 7.61e-016 2.38e+001 2.69e-001 Iter 5: 9.31e-011 1.84e-015 5.05e+000 6.89e-002 Iter 6: 2.96e-011 1.62e-016 1.64e-001 2.34e-003 Iter 7: 1.51e-011 2.74e-016 1.09e-005 1.55e-007 Iter 8: 1.51e-012 2.37e-016 1.09e-011 1.51e-013 Optimization terminated successfully.
For this problem, the large-scale linear programming algorithm quickly reduces the scaled residuals below the default tolerance of 1e-08. The exitflag value is positive, telling you linprog converged. You can also get the final function value in fval and the number of iterations in output.iterations:
exitflag = 1 fval = -70.0000 output = iterations: 8 cgiterations: 0 algorithm: 'lipsol'
Linear Programming with Dense Columns in the Equalities
The problem is min f x
T
such that
Aeq ? x = beq lb ≤ x ≤ ub
2-62
Large-Scale Examples
and you can load the matrices and vectors Aeq, beq, f, lb, and ub into the MATLAB workspace with
load densecolumns
The problem in densecolumns.mat has 1677 variables and 627 equalities with lower bounds on all the variables, and upper bounds on 399 of the variables. The equality matrix Aeq has dense columns among its first 25 columns, which is easy to see with a spy plot:
spy(Aeq)
You can use linprog to solve the problem:
[x,fval,exitflag,output] = ... linprog(f,[],[],Aeq,beq,lb,ub,[],optimset('Display','iter'));
Because the iterative display was set using optimset, the results displayed are
Residuals: Primal Dual Upper Duality Total Infeas Infeas Bounds Gap Rel A*x-b A'*y+z-w-f {x}+s-ub x'*z+s'*w Error --------------------------------------------------------------Iter 0: 1.67e+003 8.11e+002 1.35e+003 5.30e+006 2.92e+001 Iter 1: 1.37e+002 1.33e+002 1.11e+002 1.27e+006 2.48e+000 Iter 2: 3.56e+001 2.38e+001 2.89e+001 3.42e+005 1.99e+000 Iter 3: 4.86e+000 8.88e+000 3.94e+000 1.40e+005 1.89e+000 Iter 4: 4.24e-001 5.89e-001 3.44e-001 1.91e+004 8.41e-001 Iter 5: 1.23e-001 2.02e-001 9.97e-002 8.41e+003 5.79e-001 Iter 6: 3.98e-002 7.91e-002 3.23e-002 4.05e+003 3.52e-001 Iter 7: 7.25e-003 3.83e-002 5.88e-003 1.85e+003 1.85e-001 Iter 8: 1.47e-003 1.34e-002 1.19e-003 8.12e+002 8.52e-002
2-63
2
Tutorial
Iter Iter Iter Iter Iter Iter Iter
9: 10: 11: 12: 13: 14: 15:
2.52e-004 3.46e-005 6.95e-007 1.04e-006 3.08e-006 3.75e-007 5.21e-008
3.39e-003 1.08e-003 1.53e-012 2.26e-012 1.23e-012 1.09e-012 1.30e-012
2.04e-004 2.81e-005 5.64e-007 3.18e-008 3.86e-009 6.53e-012 3.27e-013
2.78e+002 1.09e+002 1.48e+001 8.32e-001 7.26e-002 1.11e-003 8.62e-008
2.99e-002 1.18e-002 1.62e-003 9.09e-005 7.94e-006 1.21e-007 9.15e-010
Optimization terminated successfully.
You can see the returned values of exitflag, fval, and output:
exitflag = 1 fval = 9.1464e+003 output = iterations: 15 cgiterations: 225 algorithm: 'lipsol'
This time the number of PCG iterations (in output.cgiterations) is nonzero because the dense columns in Aeq are detected. Instead of using a sparse Cholesky factorization, linprog tries to use the Sherman-Morrison formula to solve a linear system involving Aeq*Aeq'. If the Sherman-Morrison formula does not give a satisfactory residual, a PCG iteration is used. See the “Main Algorithm” section in “Large-Scale Linear Programming” on page 4-13.
2-64
Default Parameter Settings
Default Parameter Settings
The options structure contains parameters used in the optimization routines. If, on the first call to an optimization routine, the options structure is not provided, or is empty, a set of default parameters is generated. Some of the default options parameters are calculated using factors based on problem size, such as MaxFunEvals. Some parameters are dependent on the specific optimization routines and are documented on those function reference pages (See “Function Reference” on page 5-1). Table 5, Optimization Parameters, on page 5-10 provides an overview of all the parameters in the options structure.
Changing the Default Settings
The function optimset creates or updates an options structure to pass to the various optimization functions. The arguments to the optimset function are parameter name and parameter value pairs, such as TolX and 1e-4. Any unspecified properties have default values. You need to type only enough leading characters to define the parameter name uniquely. Case is ignored for parameter names. For parameter values that are strings, however, case and the exact string are necessary.
help optimset provides information that defines the different parameters and describes how to use them.
Here are some examples of the use of optimset.
Returning All Parameters
optimset returns all the parameters that can be set with typical values and default values.
Determining Parameters Used by a Function
The options structure defines the parameters that can be used by the functions provided by the toolbox. Because functions do not use all the parameters, it can be useful to find which parameters are used by a particular function. To determine which options structure fields are used by a function, pass the name of the function (in this example, fmincon) to optimset.
2-65
2
Tutorial
optimset('fmincon')
or
optimset fmincon
This statement returns a structure. Fields not used by the function have empty values ([]); fields used by the function are set to their default values for the given function.
Displaying Output
To display output at each iteration, enter
options = optimset('Display', 'iter');
This command sets the value of the Display parameter to 'iter', which causes the toolbox to display output at each iteration. You can also turn off any output display ('off'), display output only at termination ('final'), or display output only if the problem fails to converge ('notify').
Running Medium-Scale Optimization
For all functions that support medium- and large-scale optimization problems except fsolve, the default is for the function to use the large-scale algorithm. To use the medium-scale algorithm, enter
options = optimset('LargeScale', 'off');
For fsolve, the default is the medium-scale algorithm. To use the large-scale algorithm, enter
options = optimset('LargeScale', 'on');
Setting More Than One Parameter
You can specify multiple parameters with one call to optimset. For example, to reset the output option and the tolerance on x, enter
options = optimset('Display', 'iter', 'TolX', 1e-6);
Updating an options Structure
To update an existing options structure, call optimset and pass options as the first argument:
options = optimset(options, 'Display', 'iter', 'TolX', 1e-6);
2-66
Default Parameter Settings
Retrieving Parameter Values
Use the optimget function to get parameter values from an options structure. For example, to get the current display option, enter
verbosity = optimget(options, 'Display');
2-67
2
Tutorial
Displaying Iterative Output
This section describes the column headings used in the iterative output of ? Medium-scale algorithms ? Large-scale algorithms
Output Headings: Medium-Scale Algorithms
When the options Display parameter is set to 'iter' for fminsearch, fminbnd, fzero, fgoalattain, fmincon, lsqcurvefit, fminunc, fsolve, lsqnonlin, fminimax, and fseminf, output is produced in column format.
fminsearch
For fminsearch, the column headings are
Iteration Func-count min f(x) Procedure
where ? Iteration is the iteration number. ? Func-count is the number of function evaluations. ? min f(x) is the minimum function value in the current simplex. ? Procedure gives the current simplex operation: initial, expand, reflect, shrink, contract inside, and contract outside.
fzero and fminbnd
For fzero and fminbnd, the column headings are
Func-count x f(x) Procedure
where ? Func-count is the number of function evaluations (which for fzero is the same as the number of iterations). ? x is the current point. ? f(x) is the current function value at x. ? Procedure gives the current operation. For fzero, these include initial (initial point), search (search for an interval containing a zero), bisection
2-68
Displaying Iterative Output
(bisection search), and interpolation. For fminbnd, the possible operations are initial, golden (golden section search), and parabolic (parabolic interpolation).
fminunc
For fminunc, the column headings are
Iteration Func-count f(x) Step-size Directional derivative
where ? Iteration is the iteration number. ? Func-count is the number of function evaluations. ? f(x) is the current function value. ? Step-size is the step size in the current search direction. ? Directional derivative is the gradient of the function along the search direction.
lsqnonlin and lsqcurvefit
For lsqnonlin and lsqcurvefit, the headings are
Iteration Func-count Residual Step-size Directional derivative Lambda
where Iteration, Func-count, Step-size, and Directional derivative are the same as for fminunc, and ? Residual is the residual (sum of squares) of the function. ? Lambda is the λk value defined in “Least-Squares Optimization” on page 3-18. (This value is displayed when you use the Levenberg-Marquardt method and omitted when you use the Gauss-Newton method.)
fsolve
For fsolve with the default trust-region dogleg method, the headings are
Iteration Func-count f(x) Norm of step First-order Trust-region optimality radius
where
2-69
2
Tutorial
? Iteration is the iteration number. ? Func-count is the number of function evaluations. ? f(x) is the sum of squares of the current function value. ? Norm of step is the norm of the current step size. ? First-order optimality is the infinity norm of the current gradient. ? Trust-region radius is the radius of the trust region for that step. For fsolve with either the Levenberg-Marquardt or Gauss-Newton method, the headings are
Iteration Func-count Residual Step-size Directional derivative
where ? Residual is the residual (sum of squares) of the function. ? Step-size is the step-size in the current search direction. ? Directional derivative is the gradient of the function along the search direction.
fmincon and fseminf
For fmincon and fseminf, the headings are
Iter F-count f(x) max constraint Step-size Directional derivative Procedure
where ? Iter is the iteration number. ? F-count is the number of function evaluations. ? f(x) is the current function value. ? max constraint is the maximum constraint violation. ? Step-size is the step size in the search direction. ? Directional derivative is the gradient of the function along the search direction. ? Procedure gives a message about the Hessian update and QP subproblem.
2-70
Displaying Iterative Output
The Procedure messages are discussed in “Updating the Hessian Matrix” on page 3-31. For fgoalattain and fminimax, the headings are the same as for fmincon except that f(x) and max constraint are combined into Max{F,constraints}. Max{F,constraints} gives the maximum goal violation or constraint violation for fgoalattain and the maximum function value or constraint violation for fminimax.
Output Headings: Large-Scale Algorithms
fminunc
For fminunc, the column headings are
Iteration f(x) Norm of step First-order optimality CG-iterations
where ? Iteration is the iteration number. ? f(x) is the current function value. ? Norm of step is the norm of the current step size. ? First-order optimality is the infinity norm of the current gradient. ? CG-iterations is the number of iterations taken by PCG (see “Preconditioned Conjugate Gradients” on page 4-5) at the current (optimization) iteration.
lsqnonlin, lsqcurvefit, and fsolve
For lsqnonlin, lsqcurvefit, and fsolve, the column headings are
Iteration Func-count f(x) Norm of step First-order optimality CG-iterations
where ? Iteration is the iteration number. ? Func-count is the number of function evaluations. ? f(x) is the sum of the squares of the current function values. ? Norm of step is the norm of the current step size.
2-71
2
Tutorial
? First-order optimality is a measure of first-order optimality. For bound constrained problems, the first-order optimality is the infinity norm of v.*g, where v is defined as in “Box Constraints” on page 4-7 and g is the gradient. For unconstrained problems, it is the infinity norm of the current gradient. ? CG-iterations is the number of iterations taken by PCG (see “Preconditioned Conjugate Gradients” on page 4-5) at the current (optimization) iteration.
fmincon
For fmincon, the column headings are
Iteration f(x) Norm of step First-order optimality CG-iterations
where ? Iteration is the iteration number. ? f(x) is the current function value. ? Norm of step is the norm of the current step size. ? First-order optimality is a measure of first-order optimality. For bound constrained problems, the first-order optimality is the infinity norm of v.*g, where v is defined as in “Box Constraints” on page 4-7 and g is the gradient. For equality constrained problems, it is the infinity norm of the projected gradient. (The projected gradient is the gradient projected into the nullspace of Aeq.) ? CG-iterations is the number of iterations taken by PCG (see “Preconditioned Conjugate Gradients” on page 4-5) at the current (optimization) iteration.
linprog
For linprog, the column headings are
Residuals: Primal Infeas A*x-b Dual Infeas A'*y+z-w-f Upper Bounds {x}+s-ub Duality Gap x'*z+s'*w Total Rel Error
where ? Primal Infeas A*x-b is the norm of the residual A*x - b.
2-72
Displaying Iterative Output
? Dual Infeas A'*y+z-w-f is the norm of the residual A'*y+z-w-f (where w is all zero if there are no finite upper bounds). ? Upper Bounds {x}+s-ub is the norm of the residual spones(s).*x+s-ub (which is defined to be zero if all variables are unbounded above). This column is not printed if no finite upper bounds exist. ? Duality Gap x'*z+s'*w is the duality gap (see “Large-Scale Linear Programming” on page 4-13) between the primal objective and the dual objective. s and w only appear in this equation if there are finite upper bounds. ? Total Rel Error is the total relative error described at the end of the “Main Algorithm” subsection of “Large-Scale Linear Programming” on page 4-13.
2-73
2
Tutorial
Optimization of Inline Objects Instead of M-Files
The routines in the Optimization Toolbox also perform optimization on inline objects, avoiding the need to write M-files to define functions. To represent a mathematical function at the command line, create an inline object from a string expression. For example, you can create an inline object of the humps function (use the command type humps to see the M-file function humps.m):
f = inline('1./((x-0.3).^2 + 0.01) + 1./((x-0.9).^2 + 0.04)-6');
You can then evaluate f at 2.0:
f(2.0) ans = -4.8552
And you can pass f to an optimization routine to minimize it:
x = fminbnd(f, 3, 4)
You can also create functions of more than one argument with inline by specifying the names of the input arguments along with the string expression. For example, to use lsqcurvefit, you first need a function that takes two input arguments, x and xdata,
f= inline('sin(x).*xdata +(x.^2).*cos(xdata)','x','xdata') x = pi; xdata = pi*[4;2;3]; f(x, xdata) ans = 9.8696e+000 9.8696e+000 -9.8696e+000
and you then call lsqcurvefit.
% Assume ydata exists x = lsqcurvefit(f,x,xdata,ydata)
Other examples that use this technique: ? A matrix equation
x = fsolve(inline('x?x?x-[1,2;3,4]'),ones(2,2))
2-74
Optimization of Inline Objects Instead of M-Files
? A nonlinear least-squares problem x = lsqnonlin(inline('x?x-[3 5;9 10]'),eye(2,2)) ? An example using fgoalattain where the function has additional arguments to pass to the optimization routine. For example, if the function to be minimized has additional arguments A, B, and C,
fun = inline('sort(eig(A+B*x*C))','x','A','B','C'); x = fgoalattain(fun,-ones(2,2),[-5,-3,-1],[5, 3, 1],... [ ],[ ],[ ],[ ],-4*ones(2),4*ones(2),[ ],[ ],A,B,C);
solves the problem described on the fgoalattain reference page.
2-75
2
Tutorial
Typical Problems and How to Deal with Them
Optimization problems can take many iterations to converge and can be sensitive to numerical problems such as truncation and round-off error in the calculation of finite-difference gradients. Most optimization problems benefit from good starting guesses. This improves the execution efficiency and can help locate the global minimum instead of a local minimum. Advanced problems are best solved by an evolutionary approach, whereby a problem with a smaller number of independent variables is solved first. You can generally use solutions from lower order problems as starting points for higher order problems by using an appropriate mapping. The use of simpler cost functions and less stringent termination criteria in the early stages of an optimization problem can also reduce computation time. Such an approach often produces superior results by avoiding local minima. The Optimization Toolbox functions can be applied to a large variety of problems. Used with a little “conventional wisdom,” you can overcome many of the limitations associated with optimization techniques. Additionally, you can handle problems that are not typically in the standard form by using an appropriate transformation. Below is a list of typical problems and recommendations for dealing with them.
Table 2-1: Troubleshooting Problem Recommendation
The solution does not appear to be a global minimum.
There is no guarantee that you have a global minimum unless your problem is continuous and has only one minimum. Starting the optimization from a number of different starting points can help to locate the global minimum or verify that there is only one minimum. Use different methods, where possible, to verify results.
2-76
Typical Problems and How to Deal with Them
Table 2-1: Troubleshooting (Continued) Problem fminunc produces warning messages and seems to exhibit slow convergence near the solution. Recommendation
If you are not supplying analytically determined gradients and the termination criteria are stringent, fminunc often exhibits slow convergence near the solution due to truncation error in the gradient calculation. Relaxing the termination criteria produces faster, although less accurate, solutions. For the medium-scale algorithm, another option is adjusting the finite-difference perturbation levels, DiffMinChange and DiffMaxChange, which might increase the accuracy of gradient calculations. Place bounds on the independent variables or make a penalty function to give a large positive value to f and g when infeasibility is encountered. For gradient calculation, the penalty function should be smooth and continuous.
Sometimes an optimization problem has values of x for which it is impossible to evaluate the objective function fun or the nonlinear constraints function nonlcon. The function that is being minimized has discontinuities.
The derivation of the underlying method is based upon functions with continuous first and second derivatives. Some success might be achieved for some classes of discontinuities when they do not occur near solution points. One option is to smooth the function. For example, the objective function might include a call to an interpolation function to do the smoothing. Or, for the medium-scale algorithms, you can adjust the finite-difference parameters in order to jump over small discontinuities. The variables DiffMinChange and DiffMaxChange control the perturbation levels for x used in the calculation of finite-difference gradients. The perturbation, ? x , is always in the range DiffMinChange < Dx < DiffMaxChange.
2-77
2
Tutorial
Table 2-1: Troubleshooting (Continued) Problem Recommendation
Warning messages are displayed.
This sometimes occurs when termination criteria are overly stringent, or when the problem is particularly sensitive to changes in the independent variables. This usually indicates truncation or round-off errors in the finite-difference gradient calculation, or problems in the polynomial interpolation routines. These warnings can usually be ignored because the routines continue to make steps toward the solution point; however, they are often an indication that convergence will take longer than normal. Scaling can sometimes improve the sensitivity of a problem. This type of problem commonly occurs when, for example, the variables are the coefficients of a filter that are realized using finite-precision arithmetic or when the independent variables represent materials that are manufactured only in standard amounts. Although the Optimization Toolbox functions are not explicitly set up to solve discrete problems, you can solve some discrete problems by first solving an equivalent continuous problem. Do this by progressively eliminating discrete variables from the independent variables, which are free to vary. Eliminate a discrete variable by rounding it up or down to the nearest best discrete value. After eliminating a discrete variable, solve a reduced order problem for the remaining free variables. Having found the solution to the reduced order problem, eliminate another discrete variable and repeat the cycle until all the discrete variables have been eliminated.
dfildemo is a demonstration routine that shows how filters
The independent variables, x , can only take on discrete values, for example, integers.
with fixed-precision coefficients can be designed using this technique.
2-78
Typical Problems and How to Deal with Them
Table 2-1: Troubleshooting (Continued) Problem Recommendation
The minimization routine appears to enter an infinite loop or returns a solution that does not satisfy the problem constraints.
Your objective (fun), constraint (nonlcon, seminfcon), or gradient (computed by fun) functions might be returning Inf, NaN, or complex values. The minimization routines expect only real numbers to be returned. Any other values can cause unexpected results. Insert some checking code into the user-supplied functions to verify that only real numbers are returned (use the function isfinite). You might be forming the sum of squares explicitly and returning a scalar value. lsqnonlin expects a vector (or matrix) of function values that are squared and summed internally.
You do not get the convergence you expect from the lsqnonlin routine.
2-79
2
Tutorial
Converting Your Code to Version 2 Syntax
Most of the function names and calling sequences have changed in Version 2 to accommodate new functionality and to clarify the roles of the input and output variables. As a result, if you want to use the new versions of these functions, you need to modify any code that currently uses the old function names and calling sequences. This table lists the functions provided by the Optimization Toolbox and indicates the functions whose names changed in Version 2.
Old (Version 1.5) Name attgoal conls constr curvefit fmin fmins fminu fsolve fzero leastsq minimax nnls lp qp seminf New (Version 2) Name fgoalattain lsqlin fmincon lsqcurvefit fminbnd fminsearch fminunc fsolve (name unchanged) fzero (name unchanged) lsqnonlin fminimax lsqnonneg linprog quadprog fseminf
2-80
Converting Your Code to Version 2 Syntax
This section ? Tells you how to override default parameter settings with the new optimset and optimget functions ? Explains the reasons for the new calling sequences and explains how to convert your code ? Provides a detailed example of rewriting a call to the constr function to call the new fmincon function instead In addition to the information in this section, consult the M-file help for the new functions for more information about the arguments they take. For example, to see the help for fmincon, type
help fmincon
Using optimset and optimget
The optimset function replaces foptions for overriding default parameter settings. See “Changing the Default Settings” on page 2-65 for more information on using optimset and optimget.
New Calling Sequences
Version 2 of the toolbox makes these changes in the calling sequences: ? Equality constraints and inequality constraints are now supplied as separate input arguments. ? Linear constraints and nonlinear constraints are now supplied as separate input arguments. ? The gradient of the objective is computed in the same function as the objective, rather than in a separate function, in order to provide more efficient computation (because the gradient and objective often share similar computations). Similarly, the gradient of the nonlinear constraints is computed by the (now separate) nonlinear constraint function. ? The Hessian matrix can be provided by the objective function. (This matrix is used only by the new large-scale algorithms.) ? Each function takes an options structure to adjust parameters to the optimization functions (see optimset, optimget).
2-81
2
Tutorial
? Flags are required to indicate when extra information is available. These flags reside in the new options structure: - Set GradObj to 'on' to indicate that a user-supplied gradient of the objective function is available. - Set GradConstr to 'on' to indicate that a user-supplied gradient of the constraints is available. - Set the Hessian parameter to 'on' to indicate that a user-supplied Hessian of the objective function is available. ? The new default output gives information upon termination. The old default was no output. The new default for the Display parameter is 'final'. ? Each function returns an exitflag that denotes the termination state. ? Except for fsolve, the default uses the new large-scale methods when possible. If you want to use the older algorithms (referred to as medium-scale algorithms in other parts of this User’s Guide), set the LargeScale parameter to 'off' using optimset. For fsolve, the default is the medium-scale algorithm. To use the large-scale fsolve algorithm, set LargeScale to 'on'. Algorithm terminating conditions have been fine tuned. The stopping conditions relating to TolX and TolFun for the large-scale and medium-scale code are joined using OR instead of AND for these functions: fgoalattain, fmincon, fminimax, fminunc, fseminf, fsolve, and lsqnonlin. As a result, you might need to specify stricter tolerances; the defaults reflect this change. Each function now has an output structure that contains information about the problem solution relevant to that function. The lambda is now a structure where each field is the Lagrange multipliers for a type of constraint. For more information, see the individual functions in “Function Reference” on page 5-1. The following sections describe how to convert from the old function names and calling sequences to the new ones. The calls shown are the most general cases, involving all possible input and output arguments. Note that many of these arguments are optional; see the online help for these functions for more information.
Converting from attgoal to fgoalattain
In Version 1.5, you used this call to attgoal:
2-82
Converting Your Code to Version 2 Syntax
OPTIONS = foptions; [X,OPTIONS] = attgoal('FUN',x0,GOAL, WEIGHT, OPTIONS, VLB, VUB, 'GRADFUN', P1, P2,...);
with [F] = FUN(X,P1,...) and [DF] = GRADFUN(X,P1,...). In Version 2, you call fgoalattain like this
OPTIONS = optimset('fgoalattain'); [X,FVAL,ATTAINFACTOR,EXITFLAG,OUTPUT,LAMBDA] = fgoalattain(@FUN,x0,GOAL,WEIGHT,A,B,Aeq,Beq,VLB,VUB, @NONLCON,OPTIONS,P1,P2,...);
with [F,DF] = FUN(X,P1,P2,...) and NONLCON = []. The fgoalattain function allows nonlinear constraints, so you can now define
[Cineq,Ceq,DCineq,DCeq] = NONLCON(X,P1,...)
Converting from conls to lsqlin
In Version 1.5, you used this call to conls:
[X,LAMBDA,HOW] = conls(A,b,C,d,VLB,VUB,X0,N,DISPLAY);
In Version 2, convert the input arguments to the correct form for lsqlin by separating the equality and inequality constraints.
Ceq deq C = d = = C(1:N,:); = d(1:N); C(N+1:end,:); d(N+1:end,:);
Now call lsqlin like this:
OPTIONS = optimset('Display','final'); [X,RESNORM,RESIDUAL,EXITFLAG,OUTPUT,LAMBDA] = lsqlin(A,b,C,d,Ceq,deq,VLB,VUB,X0,OPTIONS);
Converting from constr to fmincon
In Version 1.5, you used this call to constr:
[X,OPTIONS,LAMBDA,HESS] = constr('FUN',x0,OPTIONS,VLB,VUB,'GRADFUN',P1,P2,...);
2-83
2
Tutorial
with [F,C] = FUN(X,P1,...) and [G,DC] = GRADFUN(X,P1,...). In Version 2, replace FUN and GRADFUN with two new functions: ? OBJFUN, which returns the objective function, the gradient (first derivative) of this function, and its Hessian matrix (second derivative)
[F,G,H] = OBJFUN(X,P1,...)
? NONLCON, which returns the functions for the nonlinear constraints (both inequality and equality constraints) and their gradients
[C,Ceq,DC,DCeq] = NONLCON(X,P1,...)
Now call fmincon like this:
% OBJFUN supplies the objective gradient and Hessian; % NONLCON supplies the constraint gradient. OPTIONS = optimset('GradObj','on','GradConstr','on','Hessian','on'); [X,FVAL,EXITFLAG,OUTPUT,LAMBDA,GRAD,HESSIAN] = fmincon(@OBJFUN,x0,A,B,Aeq,Beq,VLB,VUB,@NONLCON,OPTIONS, P1,P2,...);
See “Example of Converting from constr to fmincon” on page 2-89 for a detailed example.
Converting from curvefit to lsqcurvefit
In Version 1.5, you used this call to curvefit:
[X,OPTIONS,FVAL,JACOBIAN] = curvefit('FUN',x0,XDATA,YDATA,OPTIONS,'GRADFUN',P1,P2,...);
with F = FUN(X,P1,...) and G = GRADFUN(X,P1,...). In Version 2, replace FUN and GRADFUN with a single function that returns both F and J, the objective function and the Jacobian. (The Jacobian is the transpose of the gradient.):
[F,J] = OBJFUN(X,P1,...)
Now call lsqcurvefit like this:
OPTIONS = optimset('Jacobian','on'); % Jacobian is supplied VLB = []; VUB = []; % New arguments not in curvefit [X,RESNORM,F,EXITFLAG,OUTPUT,LAMBDA,JACOB] =
2-84
Converting Your Code to Version 2 Syntax
lsqcurvefit(@OBJFUN,x0,XDATA,YDATA,VLB,VUB,OPTIONS, P1,P2,...);
Converting from fmin to fminbnd
In Version 1.5, you used this call to fmin:
[X,OPTIONS] = fmin('FUN',x1,x2,OPTIONS,P1,P2,...);
In Version 2, you call fminbnd like this:
[X,FVAL,EXITFLAG,OUTPUT] = fminbnd(@FUN,x1,x2,... OPTIONS,P1,P2,...);
Converting from fmins to fminsearch
In Version 1.5, you used this call to fmins:
[X,OPTIONS] = fmins('FUN',x0,OPTIONS,[],P1,P2,...);
In Version 2, you call fminsearch like this:
[X,FVAL,EXITFLAG,OUTPUT] = fminsearch(@FUN,x0,... OPTIONS,P1,P2,...);
Converting from fminu to fminunc
In Version 1.5, you used this call to fminu:
[X,OPTIONS] = fminu('FUN',x0,OPTIONS,'GRADFUN',P1,P2,...);
with F = FUN(X,P1, ...) and G = GRADFUN(X,P1, ...). In Version 2, replace FUN and GRADFUN with a single function that returns both F and G (the objective function and the gradient).
[F,G] = OBJFUN(X,P1, ...)
(This function can also return the Hessian matrix as a third output argument.) Now call fminunc like this:
OPTIONS = optimset('GradObj','on'); % Gradient is supplied [X,FVAL,EXITFLAG,OUTPUT,GRAD,HESSIAN] = fminunc(@OBJFUN,x0,OPTIONS,P1,P2,...);
If you have an existing FUN and GRADFUN that you do not want to rewrite, you can pass them both to fminunc by placing them in a cell array.
2-85
2
Tutorial
OPTIONS = optimset('GradObj','on'); % Gradient is supplied [X,FVAL,EXITFLAG,OUTPUT,GRAD,HESSIAN] = fminunc({@FUN,@GRADFUN},x0,OPTIONS,P1,P2,...);
Converting to the New Form of fsolve
In Version 1.5, you used this call to fsolve:
[X,OPTIONS] = fsolve('FUN',x0,XDATA,YDATA,OPTIONS,'GRADFUN',P1,P2,...);
with F = FUN(X,P1,...) and G = GRADFUN(X,P1,...). In Version 2, replace FUN and GRADFUN with a single function that returns both F and G (the objective function and the gradient).
[F,G] = OBJFUN(X,P1, ...)
Now call fsolve like this:
OPTIONS = optimset('GradObj','on'); [X,FVAL,EXITFLAG,OUTPUT,JACOBIAN] = fsolve(@OBJFUN,x0,OPTIONS,P1,P2,...); % Gradient is supplied
If you have an existing FUN and GRADFUN that you do not want to rewrite, you can pass them both to fsolve by placing them in a cell array.
OPTIONS = optimset('GradObj','on'); % Gradient is supplied [X,FVAL,EXITFLAG,OUTPUT,JACOBIAN] = fsolve({@FUN,@GRADFUN},x0,OPTIONS,P1,P2,...);
Converting to the New Form of fzero
In Version 1.5, you used this call to fzero:
X = fzero('F',X,TOL,TRACE,P1,P2,...);
In Version 2, replace the TRACE and TOL arguments with
if TRACE == 0, val = 'none'; elseif TRACE == 1 val = 'iter'; end OPTIONS = optimset('Display',val,'TolX',TOL);
2-86
Converting Your Code to Version 2 Syntax
Now call fzero like this:
[X,FVAL,EXITFLAG,OUTPUT] = fzero(@F,X,OPTIONS,P1,P2,...);
Converting from leastsq to lsqnonlin
In Version 1.5, you used this call to leastsq:
[X,OPTIONS,FVAL,JACOBIAN] = leastsq('FUN',x0,OPTIONS,'GRADFUN',P1,P2,...);
with F = FUN(X,P1,...) and G = GRADFUN(X,P1, ...). In Version 2, replace FUN and GRADFUN with a single function that returns both F and J, the objective function and the Jacobian. (The Jacobian is the transpose of the gradient.):
[F,J] = OBJFUN(X,P1, ...)
Now call lsqnonlin like this:
OPTIONS = optimset('Jacobian','on'); % Jacobian is supplied VLB = []; VUB = []; % New arguments not in leastsq [X,RESNORM,F,EXITFLAG,OUTPUT,LAMBDA,JACOBIAN] = lsqnonlin(@OBJFUN,x0,VLB,VUB,OPTIONS,P1,P2,...);
Converting from lp to linprog
In Version 1.5, you used this call to lp:
[X,LAMBDA,HOW] = lp(f,A,b,VLB,VUB,X0,N,DISPLAY);
In Version 2, convert the input arguments to the correct form for linprog by separating the equality and inequality constraints.
Aeq = A(1:N,:); beq = b(1:N); A = A(N+1:end,:); b = b(N+1:end,:); if DISPLAY val = 'final'; else val = 'none'; end OPTIONS = optimset('Display',val);
2-87
2
Tutorial
Now call linprog like this:
[X,FVAL,EXITFLAG,OUTPUT,LAMBDA] = linprog(f,A,b,Aeq,beq,VLB,VUB,X0,OPTIONS);
Converting from minimax to fminimax
In Version 1.5, you used this call to minimax:
[X,OPTIONS] = minimax('FUN',x0,OPTIONS,VLB,VUB,'GRADFUN',P1,P2,...);
with F = FUN(X,P1,...) and G = GRADFUN(X,P1,...). In Version 2, you call fminimax like this:
OPTIONS = optimset('fminimax'); [X,FVAL,MAXFVAL,EXITFLAG,OUTPUT,LAMBDA] = fminimax(@OBJFUN,x0,A,B,Aeq,Beq,VLB,VUB,@NONLCON,OPTIONS, P1,P2,...);
with [F,DF] = OBJFUN(X,P1,...) and
[Cineq,Ceq,DCineq,DCeq] = NONLCON(X,P1,...).
Converting from nnls to lsqnonneg
In Version 1.5, you used this call to nnls:
[X,LAMBDA] = nnls(A,b,tol);
In Version 2, replace the tol argument with
OPTIONS = optimset('Display','none','TolX',tol);
Now call lsqnonneg like this:
[X,RESNORM,RESIDUAL,EXITFLAG,OUTPUT,LAMBDA] = lsqnonneg(A,b,X0,OPTIONS);
Converting from qp to quadprog
In Version 1.5, you used this call to qp:
[X,LAMBDA,HOW] = qp(H,f,A,b,VLB,VUB,X0,N,DISPLAY);
In Version 2, convert the input arguments to the correct form for quadprog by separating the equality and inequality constraints.
2-88
Converting Your Code to Version 2 Syntax
Aeq = A(1:N,:); beq = b(1:N); A = A(N+1:end,:); b = b(N+1:end,:); if DISPLAY val = 'final'; else val = 'none'; end OPTIONS = optimset('Display',val);
Now call quadprog like this:
[X,FVAL,EXITFLAG,OUTPUT,LAMBDA] = quadprog(H,f,A,b,Aeq,beq,VLB,VUB,X0,OPTIONS);
Converting from seminf to fseminf
In Version 1.5, you used this call to seminf:
[X,OPTIONS] = seminf('FUN',N,x0,OPTIONS,VLB,VUB,P1,P2,...);
with [F,C,PHI1,PHI2,...,PHIN,S] = FUN(X,S,P1,P2,...). In Version 2, call fseminf like this:
[X,FVAL,EXITFLAG,OUTPUT,LAMBDA] = fseminf(@OBJFUN,x0,N,@NONLCON,A,B,Aeq,Beq,VLB,VUB,OPTIONS, P1,P2,...);
with F = OBJFUN(X,P1,...) and
[Cineq,Ceq,PHI1,PHI2,...,PHIN,S] = NONLCON(X,S,P1,...).
Example of Converting from constr to fmincon
Old Call to constr
OPTIONS = foptions; OPTIONS(13) = 2; OPTIONS(1) = 1; OPTIONS(9) = 1; A1 = [ 1 4 -3]; b1 = 2; A2 = [ 2 5 0]; b2 = 9; % Two equality constraints
2-89
2
Tutorial
x0 = [1; .5; .8]; LB = []; UB = []; [X,OPTIONS,LAMBDA,HESS] = ... constr('myfuncon',x0,OPTIONS,LB,UB,'mygradcon',A1,b1,A2,b2); % myfuncon.m [F, C] = myfuncon(x,A1,b1,A2,b2) F = x(1) + 0.0009*x(2)^3 + sin(x(3)); C(1,1) = A1*x-b; C(2,1) = 3*x(1)^2-1; C(3,1) = A2*x-b2; C(4,1) = 7*sin(x(2))-1; % mygradcon.m [G, DC] = mygradcon(x,alpha) G = [1; 3*0.0009*x(2)^2; cos(x(3))]; DC(:,1) DC(:,2) DC(:,3) DC(:,4) = = = = % Equality linear constraint % Equality nonlinear constraint % Inequality linear constraint % Inequality nonlinear constraint
% Gradient of the objective
A1'; % Gradient of the constraints [6*x(1); 0; 0]; A2'; [0; 7*cos(x(2)); 0];
New Call to fmincon
OPTIONS = optimset(... 'Display', 'iter', ... 'GradCheck', 'on', ... % Check gradients. 'GradObj', 'on', ... % Gradient of objective is provided. 'GradConstr', 'on'); % Gradient of constraints is provided. A1 = [ 1 4 -3]; b1 = 2; A2 = [ 2 5 0]; b2 = 9; % Linear equalities % Linear inequalities
x0 = [1; .5; .8]; LB = []; UB = []; [X,FVAL,EXITFLAG,OUTPUT,LAMBDA,GRAD,HESSIAN] = ...
2-90
Converting Your Code to Version 2 Syntax
fmincon(@myfun,x0,A2,b2,A1,b1,LB,UB,@mycon,OPTIONS); % myfun.m function [F,G] = myfun(x) F = x(1) + 0.0009*x(2)^3 + sin(x(3)); G = [1; 3*0.0009*x(2)^2; cos(x(3))]; % mycon.m function [C,Ceq,DC,DCeq]= mycon(x) Ceq(1,1) = 3*x(1)^2-1; % Equality nonlinear constraint C(1,1) = 7*sin(x(2))-1; % Inequality nonlinear constraint DCeq(:,1) = [6*x(1); 0; 0]; % Gradient of equality % nonlinear constraint DC(:,1) = [0; 7*cos(x(2)); 0]; % Gradient of inequality % nonlinear constraint
2-91
2
Tutorial
Selected Bibliography
[1] Hairer, E., S. P. Norsett, and G. Wanner, Solving Ordinary Differential Equations I – Nonstiff Problems, Springer-Verlag, pp. 183-184.
2-92
3
Standard Algorithms
Standard Algorithms provides an introduction to the different optimization problem formulations, and describes the medium-scale (i.e., standard) algorithms used in the toolbox functions. These algorithms have been chosen for their robustness and iterative efficiency. The choice of problem formulation (e.g., unconstrained, least-squares, constrained, minimax, multiobjective, or goal attainment) depends on the problem being considered and the required execution efficiency. This chapter consists of these sections: Optimization Overview (p. 3-3) Introduces optimization as a way of finding a set of parameters that can in some way be defined as optimal. These parameters are obtained by minimizing or maximizing an objective function, subject to equality or inequality constraints and/or parameter bounds. Discusses the use of quasi-Newton and line search methods for unconstrained optimization. Also provides implementation details for the Hessian update and line search phases of the quasi-Newton algorithm used in fminunc. Discusses the use of the Gauss-Newton and Levenberg-Marquardt methods for nonlinear least-squaresleast-squares (LS) optimization. Also provides implementation details for the Gauss-Newton and Levenberg-Marquardt methods used in the nonlinear least-squares optimization routines, lsqnonlin and lsqcurvefit. Discusses the use of Gauss-Newton, Newton’s, and trust-region dogleg methods for the solution of nonlinear systems of equations. Also provides implementation details for the Gauss-Newton and trust-region dogleg methods used by the fsolve function.
Unconstrained Optimization (p. 3-4)
Least-Squares Optimization (p. 3-18)
Nonlinear Systems of Equations (p. 3-24)
3
Standard Algorithms
Constrained Optimization (p. 3-28) Discusses the use of the Kuhn-Tucker (KT) equations as the basis for Sequential Quadratic Programming (SQP) methods. Also provides implementation details for the Hessian matrix update, quadratic programming problem solution, and line search and merit function calculation phases of the SQP algorithm used in fmincon, fminimax, fgoalattain, and fseminf. Multiobjective Optimization (p. 3-38) Introduces multiobjective optimization and discusses strategies for dealing with competing objectives. It discusses in detail the use of the Goal Attainment method, and suggests improvements to the SQP method for use with the Goal Attainment method. Lists published materials that support concepts implemented in the medium-scale algorithms.
Selected Bibliography (p. 3-48)
Note Medium-scale is not a standard term and is used here only to differentiate these algorithms from the large-scale algorithms described in “Large-Scale Algorithms” on page 4-1.
3-2
Optimization Overview
Optimization Overview
Optimization techniques are used to find a set of design parameters, x = { x 1 ,x 2 ,… ,x n } , that can in some way be defined as optimal. In a simple case this might be the minimization or maximization of some system characteristic that is dependent on x. In a more advanced formulation the objective function, f(x), to be minimized or maximized, might be subject to constraints in the form of equality constraints, G i ( x ) = 0 ( i = 1 ,… ,m e ) ; inequality constraints, G i ( x ) ≤ 0 ( i = m e + 1 ,… ,m ) ; and/or parameter bounds, x l , x u . A General Problem (GP) description is stated as minimize f ( x ) n x∈? subject to G i ( x ) = 0, G i ( x ) ≤ 0, xl ≤ x ≤ xu where x is the vector of design parameters ( x ∈ ? ), f(x) is the objective n function that returns a scalar value ( f ( x ) : ? → ? ), and the vector function G(x) returns the values of the equality and inequality constraints evaluated at n m x ( G ( x ) : ? → ? ). An efficient and accurate solution to this problem depends not only on the size of the problem in terms of the number of constraints and design variables but also on characteristics of the objective function and constraints. When both the objective function and the constraints are linear functions of the design variable, the problem is known as a Linear Programming (LP) problem. Quadratic Programming (QP) concerns the minimization or maximization of a quadratic objective function that is linearly constrained. For both the LP and QP problems, reliable solution procedures are readily available. More difficult to solve is the Nonlinear Programming (NP) problem in which the objective function and constraints can be nonlinear functions of the design variables. A solution of the NP problem generally requires an iterative procedure to establish a direction of search at each major iteration. This is usually achieved by the solution of an LP, a QP, or an unconstrained subproblem.
n
(3-1)
i = 1 ,… ,m e i = m e + 1 ,… ,m
3-3
3
Standard Algorithms
Unconstrained Optimization
Although a wide spectrum of methods exists for unconstrained optimization, methods can be broadly categorized in terms of the derivative information that is, or is not, used. Search methods that use only function evaluations (e.g., the simplex search of Nelder and Mead [32]) are most suitable for problems that are very nonlinear or have a number of discontinuities. Gradient methods are generally more efficient when the function to be minimized is continuous in its first derivative. Higher order methods, such as Newton’s method, are only really suitable when the second order information is readily and easily calculated, because calculation of second order information, using numerical differentiation, is computationally expensive. Gradient methods use information about the slope of the function to dictate a direction of search where the minimum is thought to lie. The simplest of these is the method of steepest descent in which a search is performed in a direction, – ?f ( x ) , where ?f ( x ) is the gradient of the objective function. This method is very inefficient when the function to be minimized has long narrow valleys as, for example, is the case for Rosenbrock’s function f ( x ) = 100 ( x 2 – x 1 ) + ( 1 – x 1 )
2 2 2
(3-2)
The minimum of this function is at x = [1,1] where f ( x ) = 0 . A contour map of this function is shown in Figure 3-1, along with the solution path to the minimum for a steepest descent implementation starting at the point [-1.9,2]. The optimization was terminated after 1000 iterations, still a considerable distance from the minimum. The black areas are where the method is continually zigzagging from one side of the valley to another. Note that toward the center of the plot, a number of larger steps are taken when a point lands exactly at the center of the valley.
3-4
Unconstrained Optimization
3 2.5 Start Point 2
o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o oooo oo oo o
x2
1.5 1 0.5 0 -0.5 -1 -2
o o o o o o o o o o o o o
Solution
o o o o oo o oo o o
-1.5
-1
-0.5
0
0.5
1
1.5
2
x1
Figure 3-1: Steepest Descent Method on Rosenbrock’s Function (Eq. 2-2)
This type of function (Eq. 3-2), also known as the banana function, is notorious in unconstrained examples because of the way the curvature bends around the origin. Eq. 3-2 is used throughout this section to illustrate the use of a variety of optimization techniques. The contours have been plotted in exponential increments because of the steepness of the slope surrounding the U-shaped valley. This section continues with discussions of the following: ? Quasi-Newton Methods ? Line Search ? Quasi-Newton Implementation
3-5
3
Standard Algorithms
Quasi-Newton Methods
Of the methods that use gradient information, the most favored are the quasi-Newton methods. These methods build up curvature information at each iteration to formulate a quadratic model problem of the form
T 1 T - x Hx + c x + b min -x 2
(3-3)
where the Hessian matrix, H, is a positive definite symmetric matrix, c is a constant vector, and b is a constant. The optimal solution for this problem occurs when the partial derivatives of x go to zero, i.e., ?f ( x? ) = Hx? + c = 0 The optimal solution point, x? , can be written as x? = – H c
–1
(3-4)
(3-5)
Newton-type methods (as opposed to quasi-Newton methods) calculate H directly and proceed in a direction of descent to locate the minimum after a number of iterations. Calculating H numerically involves a large amount of computation. Quasi-Newton methods avoid this by using the observed behavior of f(x) and ?f ( x ) to build up curvature information to make an approximation to H using an appropriate updating technique. A large number of Hessian updating methods have been developed. However, the formula of Broyden [3], Fletcher [14], Goldfarb [22], and Shanno [39] (BFGS) is thought to be the most effective for use in a General Purpose method. The formula given by BFGS is Hk + 1 where sk = xk + 1 – xk q k = ? f(x k + 1) – ? f(x k) As a starting point, H 0 can be set to any symmetric positive definite matrix, for example, the identity matrix I. To avoid the inversion of the Hessian H, you can
T H s s H q k qk k k k k = H k + ------------ – ---------------------------Ts T qk s k k Hk sk T T
(3-6)
3-6
Unconstrained Optimization
derive an updating method that avoids the direct inversion of H by using a formula that makes an approximation of the inverse Hessian H – 1 at each update. A well known procedure is the DFP formula of Davidon [9], Fletcher, and Powell [16]. This uses the same formula as the BFGS method (Eq. 3-6) except that q k is substituted for s k . The gradient information is either supplied through analytically calculated gradients, or derived by partial derivatives using a numerical differentiation method via finite differences. This involves perturbing each of the design variables, x, in turn and calculating the rate of change in the objective function. At each major iteration, k, a line search is performed in the direction
– 1 ? ?f ( x ) d = –Hk k
(3-7)
The quasi-Newton method is illustrated by the solution path on Rosenbrock’s function (Eq. 3-2) in Figure 3-2, BFGS Method on Rosenbrock’s Function. The method is able to follow the shape of the valley and converges to the minimum after 140 function evaluations using only finite difference gradients.
3-7
3
Standard Algorithms
3 2.5 Start Point 2 1.5 1
o o o o
o o o o
Solution
o
0.5 0 -0.5 -1 -2
o o o o o o o o o o o
o
o
-1.5
-1
-0.5
0
0.5
1
1.5
2
Figure 3-2: BFGS Method on Rosenbrock’s Function
Line Search
Most unconstrained and constrained methods use the solution of a subproblem to yield a search direction in which the solution is estimated to lie. The minimum along the line formed from this search direction is generally approximated using a search procedure (e.g., Fibonacci, Golden Section) or by a polynomial method involving interpolation or extrapolation (e.g., quadratic, cubic). Polynomial methods approximate a number of points with a univariate polynomial whose minimum can be calculated easily. Interpolation refers to the condition that the minimum is bracketed (i.e., the minimum lies in the area spanned by the available points), whereas extrapolation refers to a minimum located outside the range spanned by the available points. Extrapolation methods are generally considered unreliable for estimating minima for nonlinear functions. However, they are useful for estimating step length when
3-8
Unconstrained Optimization
you are trying to bracket the minimum as shown in “Line Search Procedures” on page 3-11. Polynomial interpolation methods are generally the most effective in terms of efficiency when the function to be minimized is continuous. The problem is to find a new iterate x k + 1 of the form x k + 1 = x k + α? d
(3-8)
where x k denotes the current iterate, d is the search direction obtained by an appropriate method, and α? is a scalar step length parameter that is the distance to the minimum.
Quadratic Interpolation
Quadratic interpolation involves a data fit to a univariate function of the form mq ( α ) = a α 2 + b α + c where an extremum occurs at a step length of –b α? = -----2a
(3-10) (3-9)
This point can be a minimum or a maximum. It is a minimum when interpolation is performed (i.e., using a bracketed minimum) or when a is positive. Determination of coefficients a and b can be found using any combination of three gradient or function evaluations. It can also be carried out with just two gradient evaluations. The coefficients are determined through the formulation and solution of a linear set of simultaneous equations. Various simplifications in the solution of these equations can be achieved when particular characteristics of the points are used. For example, the first point can generally be taken as α = 0 . Other simplifications can be achieved when the points are evenly spaced. A general problem formula is as follows. Given three unevenly spaced points {x 1, x 2 ,x 3} and their associated function values {f ( x 1 ), f ( x 2 ) ,f ( x 3 )} the minimum resulting from a second-order fit is given by 1 β 23 f ( x 1 ) + β 31 f ( x 2 ) + β12 f ( x 3 ) - -----------------------------------------------------------------------------x k + 1 = -2 γ 23 f ( x 1 ) + γ 31 f ( x 2 ) + γ 12 f ( x 3 ) where
(3-11)
3-9
3
Standard Algorithms
β ij = x i – x j γ ij = x i – x j
2
2
For interpolation to be performed, as opposed to extrapolation, the minimum must be bracketed so that the points can be arranged to give f ( x 2 ) < f ( x 1 ) and f ( x 2 ) < f ( x 3 )
Cubic Interpolation
Cubic interpolation is useful when gradient information is readily available or when more than three function evaluations have been calculated. It involves a data fit to the univariate function mc( α ) = aα3 + bα2 + cα + d where the local extrema are roots of the quadratic equation 3 aα2 + 2bα + c = 0 To find the minimum extremum, take the root that gives 6 a α + 2 b as positive. You can determine coefficients a and b using any combination of four gradient or function evaluations or, alternatively, with just three gradient evaluations. The coefficients are calculated by the formulation and solution of a linear set of simultaneous equations. A general formula, given two points, {x1, x 2} , their corresponding gradients with respect to x, {?f ( x 1 ), ?f ( x 2 )} , and associated function values, {f ( x 1 ), f ( x 2 )} , is ?f ( x 2 ) + β2 – β 1 x k + 1 = x 2 – ( x 2 – x 1 ) -------------------------------------------------------?f ( x 2 ) – ?f ( x 1 ) + 2 β 2 where f ( x1 ) – f ( x2 ) β 1 = ?f ( x 1 ) + ?f ( x 2 ) – 3 ------------------------------x 1 – x2
2 – ?fx ?f ( x ) ) 1 / 2 . β2 = ( β1 1 2
(3-12)
(3-13)
3-10
Unconstrained Optimization
Quasi-Newton Implementation
A quasi-Newton algorithm is used in fminunc. The algorithm consists of two phases: ? Determination of a direction of search (Hessian update) ? Line search procedures Implementation details of the two phases are discussed below.
Hessian Update
The direction of search is determined by a choice of either the BFGS (Eq. 3-6) or the DFP method given in “Quasi-Newton Methods” on page 3-6 (set the options parameter HessUpdate to 'dfp' to select the DFP method). The Hessian, H, is always maintained to be positive definite so that the direction of search, d, is always in a descent direction. This means that for some arbitrarily small step α in the direction d, the objective function decreases in magnitude. You achieve positive definiteness of H by ensuring that H is initialized to be T s (from Eq. 3-14) is always positive. The positive definite and thereafter q k k T term q k s k is a product of the line search step length parameter α k and a combination of the search direction d with past and present gradient evaluations,
T s = α ( ?f ( x T T qk k k k + 1 ) d – ?f ( x k ) d )
(3-14)
T s is positive by performing a You always achieve the condition that q k k sufficiently accurate line search. This is because the search direction, d, is a descent direction, so that α k and – ?f ( x k ) T d are always positive. Thus, the possible negative term ?f ( x k + 1 ) T d can be made as small in magnitude as required by increasing the accuracy of the line search.
Line Search Procedures
Two line search strategies are used, depending on whether gradient information is readily available or whether it must be calculated using a finite difference method. When gradient information is available, the default is to use a cubic polynomial method. When gradient information is not available, the default is to use a mixed quadratic and cubic polynomial method.
Cubic Polynomial Method . In the proposed cubic polynomial method, a gradient and a function evaluation are made at every iteration k. At each iteration an
3-11
3
Standard Algorithms
update is performed when a new point is found, x k + 1 , that satisfies the condition f ( xk + 1 ) < f ( xk )
(3-15)
At each iteration a step, α k , is attempted to form a new iterate of the form xk + 1 = xk + αk d
(3-16)
If this step does not satisfy the condition (Eq. 3-15), then α k is reduced to form a new step, α k + 1 . The usual method for this reduction is to use bisection, i.e., to continually halve the step length until a reduction is achieved in f(x). However, this procedure is slow when compared to an approach that involves using gradient and function evaluations together with cubic interpolation/extrapolation methods to identify estimates of step length. When a point is found that satisfies the condition (Eq. 3-15), an update is T s is positive. If it is not, then further cubic interpolations are performed if q k k performed until the univariate gradient term ?f ( x k + 1 ) T d is sufficiently small T s is positive. so that q k k It is usual practice to reset α k to unity after every iteration. However, note that the quadratic model (Eq. 3-3) is generally only a good one near to the solution point. Therefore, α k is modified at each major iteration to compensate for the case when the approximation to the Hessian is monotonically increasing or decreasing. To ensure that, as x k approaches the solution point, the procedure T s – ?f ( x ) T d and α reverts to a value of α k close to unity, the values of q k k k k+1 are used to estimate the closeness to the solution point and thus to control the variation in α k . α k is attempted, following which a number of scenarios are possible. Consideration of all the possible cases is quite complicated and so they are represented pictorially below. For each case: ? The left point on the graph represents the point x k . ? The slope of the line bisecting each point represents the slope of the univariate gradient, ?f ( x k ) T d , which is always negative for the left point. ? The right point is the point x k + 1 after a step of α k is taken in the direction d.
Cubic Polynomial Line Search Procedures. After each update procedure, a step length
3-12
Unconstrained Optimization
Case 1: f(x k + 1) > f(x k), ? f(x k + 1) d > 0
T
f(x)
Reduce step length
0 αk αk + 1 α
? ? α ?2 αk + 1 = ? c ? αc ?
T
if α k < 0.1 otherwise
Case 2: f(x k + 1) ≤ f(x k), ? f(x k + 1) d ≥ 0
≥0 f(x) Update H Reset d 0 αk αk + 1 α
Ts qk k
<0
Reduce step length
α k + 1 = min { 1 ,α c }
α k + 1 = 0.9 α c
3-13
3
Standard Algorithms
Case 3:
f(x k + 1) < f(x k), ? f(x k + 1) d < 0 ≥0 f( x) Update H Reset d
Ts qk k
T
<0 Change to steepest descent method temporarily
0 αk αk + 1 α
α k + 1 = min { 2 , p , 1.2 α c } α k + 1 = min { 2, max { 1.5, α k }, α c }
Case 4: f(x k + 1) ≥ f(x k), ? f(x k + 1) d ≤ 0 where p = 1 + q k s k – ? f(x k + 1) d + min { 0, α k + 1 }
T
T
T
f( x) Reduce step length
0 αk αk + 1 α
α k + 1 = min { α c , α k ? 2 }
Cases 1 and 2 show the procedures performed when the value ?f ( x k + 1 ) T d is positive. Cases 3 and 4 show the procedures performed when the value ?f ( x k + 1 ) T d is negative. The notation min { a, b, c } refers to the smallest value of the set { a, b, c } . At each iteration a cubicly interpolated step length α c is calculated and then used to adjust the step length parameter α k + 1 . Occasionally, for very nonlinear functions α c can be negative, in which case α c is given a value of 2 αk . Certain robustness measures have also been included so that, even in the case when false gradient information is supplied, you can achieve a reduction in f(x) by taking a negative step. You do this by setting α k + 1 = – α k ? 2 when α k falls
3-14
Unconstrained Optimization
below a certain threshold value (e.g., 1e-8). This is important when extremely high precision is required, if only finite difference gradients are available.
Mixed Cubic/Quadratic Polynomial Method. The cubic interpolation/extrapolation method has proved successful for a large number of optimization problems. However, when analytic derivatives are not available, evaluating finite difference gradients is computationally expensive. Therefore, another interpolation/extrapolation method is implemented so that gradients are not needed at every iteration. The approach in these circumstances, when gradients are not readily available, is to use a quadratic interpolation method. The minimum is generally bracketed using some form of bisection method. This method, however, has the disadvantage that all the available information about the function is not used. For instance, a gradient calculation is always performed at each major iteration for the Hessian update. Therefore, given three points that bracket the minimum, it is possible to use cubic interpolation, which is likely to be more accurate than using quadratic interpolation. Further efficiencies are possible if, instead of using bisection to bracket the minimum, extrapolation methods similar to those used in the cubic polynomial method are used.
Hence, the method that is used in fminunc, lsqnonlin, lsqcurvefit, and fsolve is to find three points that bracket the minimum and to use cubic interpolation to estimate the minimum at each line search. The estimation of step length at each minor iteration, j, is shown in the following graphs for a number of point combinations. The left-most point in each graph represents the function value f ( x 1 ) and univariate gradient ?f ( x k ) obtained at the last update. The remaining points represent the points accumulated in the minor iterations of the line search procedure. The terms α q and α c refer to the minimum obtained from a respective quadratic and cubic interpolation or extrapolation. For highly nonlinear functions, α c and α q can be negative, in which case they are set to a value of 2 α k so that they are always maintained to be positive. Cases 1 and 2 use quadratic interpolation with two points and one gradient to estimate a third point that brackets the minimum. If this fails, cases 3 and 4 represent the possibilities for changing the step length when at least three points are available. When the minimum is finally bracketed, cubic interpolation is achieved using one gradient and three function evaluations. If the interpolated point is greater than any of the three used for the interpolation, then it is replaced with the
3-15
3
Standard Algorithms
point with the smallest function value. Following the line search procedure, the Hessian update procedure is performed as for the cubic polynomial line search method. The following graphs illustrate the line search procedures for cases 1 through 4, with a gradient only for the first point.
Case 1:
f(x j) ≥ f(x k)
f(x) Reduce step length
0 αj + 1 αj
Case 2:
α
αj + 1 = αq
f(x j) < f(x k)
f(x)
Increase step length
0 αj αj + 1
Case 3:
α
α j + 1 = 1.2 α q
f(x j + 1) < f(x k)
f(x)
Increase step length
α 0 α j αj + 1 α j + 2
α j + 2 = max { 1.2 α q, 2 α j + 1 }
3-16
Unconstrained Optimization
Case 4:
f(x j + 1) > f(x k )
f(x)
Reduce step length
α 0 αj α j + 1 αj + 2
αj + 2 = αc
3-17
3
Standard Algorithms
Least-Squares Optimization
The line search procedures used in conjunction with a quasi-Newton method are used in the function fminunc. They are also used as part of the nonlinear least-squares (LS) optimization routines, lsqnonlin and lsqcurvefit. In the least-squares problem a function f(x) is minimized that is a sum of squares. 1 - F(x ) min f ( x ) = -n 2 x∈?
2 2
1 = -2
∑ Fi ( x)
i
2
(3-17)
Problems of this type occur in a large number of practical applications, especially when fitting model functions to data, i.e., nonlinear parameter estimation. They are also prevalent in control where you want the output, y ( x, t ) , to follow some continuous model trajectory, φ ( t ) , for vector x and scalar t. This problem can be expressed as min n x∈?
∫t ( y ( x, t ) –φ ( t ) )
2
t1
2
dt
(3-18)
where y ( x, t ) and φ ( t ) are scalar functions. When the integral is discretized using a suitable quadrature formula, Eq. 3-18 can be formulated as a least-squares problem
m
min f(x) = x ∈ ?n
∑ ( y ( x, t i ) – φ ( t i ) ) 2
i=1
(3-19)
where y and φ include the weights of the quadrature scheme. Note that in this problem the vector F(x) is y ( x, t 1 ) – φ ( t 1 ) F(x) = y ( x, t 2 ) – φ ( t 2 ) … y ( x, tm ) – φ ( t m ) In problems of this kind, the residual F ( x ) is likely to be small at the optimum since it is general practice to set realistically achievable target trajectories. Although the function in LS (Eq. 3-18) can be minimized using a
3-18
Least-Squares Optimization
general unconstrained minimization technique, as described in “Unconstrained Optimization” on page 3-4, certain characteristics of the problem can often be exploited to improve the iterative efficiency of the solution procedure. The gradient and Hessian matrix of LS (Eq. 3-18) have a special structure. Denoting the m-by-n Jacobian matrix of F(x) as J(x), the gradient vector of f(x) as G ( x ) , the Hessian matrix of f(x) as H ( x ) , and the Hessian matrix of each F i ( x ) as H i ( x ) , you have G ( x ) = 2 J ( x )T F ( x ) H( x) = 2J( x)TJ(x ) + 2Q( x) where
m
(3-20)
Q(x) =
∑ Fi ( x ) ? Hi ( x )
i=1
The matrix Q(x) has the property that when the residual F ( x ) tends to zero as x k approaches the solution, then Q(x) also tends to zero. Thus when F ( x ) is small at the solution, a very effective method is to use the Gauss-Newton direction as a basis for an optimization procedure. This section continues with discussions of the following: ? Gauss-Newton Method ? Levenberg-Marquardt Method ? Nonlinear Least-Squares Implementation
Gauss-Newton Method
In the Gauss-Newton method, a search direction, d k , is obtained at each major iteration, k, that is a solution of the linear least-squares problem. min x ∈ ?n J ( xk ) dk – F ( xk )
2 2
(3-21)
The direction derived from this method is equivalent to the Newton direction when the terms of Q(x) can be ignored. The search direction d k can be used as part of a line search strategy to ensure that at each iteration the function f(x) decreases.
3-19
3
Standard Algorithms
Consider the efficiencies that are possible with the Gauss-Newton method. Figure 3-3 shows the path to the minimum on Rosenbrock’s function (Eq. 3-2) when posed as a least-squares problem. The Gauss-Newton method converges after only 48 function evaluations using finite difference gradients, compared to 140 iterations using an unconstrained BFGS method.
3 2.5 Start Point 2 1.5 1
o o o
Solution o o
0.5 0 -0.5 -1 -2
o o
o o o
-1.5
-1
-0.5
0
0.5
1
1.5
2
Figure 3-3: Gauss-Newton Method on Rosenbrock’s Function
The Gauss-Newton method often encounters problems when the second order term Q(x) in Eq. 3-20 is significant. A method that overcomes this problem is the Levenberg-Marquardt method.
Levenberg-Marquardt Method
The Levenberg-Marquardt [27],[29] method uses a search direction that is a solution of the linear set of equations ( J ( x k ) T J ( xk ) + λ k I ) d k = – J ( x k ) F ( xk )
(3-22)
where the scalar λ k controls both the magnitude and direction of d k . When λ k is zero, the direction d k is identical to that of the Gauss-Newton method. As
3-20
Least-Squares Optimization
λk tends to infinity, d k tends toward a vector of zeros and a steepest descent direction. This implies that for some sufficiently large λ k , the term F ( x k + d k ) < F ( x k ) holds true. The term λ k can therefore be controlled to ensure descent even when second order terms, which restrict the efficiency of the Gauss-Newton method, are encountered. The Levenberg-Marquardt method therefore uses a search direction that is a cross between the Gauss-Newton direction and the steepest descent. This is illustrated in Figure 3-4, Levenberg-Marquardt Method on Rosenbrock’s Function. The solution for Rosenbrock’s function (Eq. 3-2) converges after 90 function evaluations compared to 48 for the Gauss-Newton method. The poorer efficiency is partly because the Gauss-Newton method is generally more effective when the residual is zero at the solution. However, such information is not always available beforehand, and the increased robustness of the Levenberg-Marquardt method compensates for its occasional poorer efficiency.
3 2.5 2 1.5 1 0.5 0 -0.5 -1 -2
o o o o o o o o o o o o
Start Point
o o o
Solution
o
-1.5
-1
-0.5
0
0.5
1
1.5
2
Figure 3-4: Levenberg-Marquardt Method on Rosenbrock’s Function
3-21
3
Standard Algorithms
Nonlinear Least-Squares Implementation
For a general survey of nonlinear least-squares methods, see Dennis [10]. Specific details on the Levenberg-Marquardt method can be found in Moré [30]. Both the Gauss-Newton method and the Levenberg-Marquardt method are implemented in the Optimization Toolbox. Details of the implementations are discussed below: ? Gauss-Newton Implementation ? Levenberg-Marquardt Implementation
Gauss-Newton Implementation
The Gauss-Newton method is implemented using polynomial line search strategies similar to those discussed for unconstrained optimization. In solving the linear least-squares problem (Eq. 3-18), you can avoid exacerbation of the conditioning of the equations by using the QR decomposition of J ( x k ) and applying the decomposition to F ( x k ) (using the MATLAB \ operator). This is in contrast to inverting the explicit matrix, J ( x k ) T J ( x k ) , which can cause unnecessary errors to occur. Robustness measures are included in the method. These measures consist of changing the algorithm to the Levenberg-Marquardt method when either the step length goes below a threshold value (1e-15 in this implementation) or when the condition number of J ( x k ) is below 1e-10. The condition number is a ratio of the largest singular value to the smallest.
Levenberg-Marquardt Implementation
The main difficulty in the implementation of the Levenberg-Marquardt method is an effective strategy for controlling the size of λ k at each iteration so that it is efficient for a broad spectrum of problems. The method used in this implementation is to estimate the relative nonlinearity of f ( x ) using a linear predicted sum of squares fp ( x k ) and a cubicly interpolated estimate of the minimum f k ( x * ) . In this way the size of λ k is determined at each iteration. The linear predicted sum of squares is calculated as fp ( xk ) = J ( xk – 1 ) dk – 1 + F ( xk – 1 )
(3-23)
and the term fk ( x * ) is obtained by cubicly interpolating the points f ( x k ) and f ( x k – 1 ) . A step length parameter α * is also obtained from this interpolation, which is the estimated step to the minimum. If f p ( x k ) is greater than f k ( x * ) ,
3-22
Least-Squares Optimization
then λk is reduced, otherwise it is increased. The justification for this is that the difference between f p ( x k ) and f k ( x * ) is a measure of the effectiveness of the Gauss-Newton method and the linearity of the problem. This determines whether to use a direction approaching the steepest descent direction or the Gauss-Newton direction. The formulas for the reduction and increase in λ k , which have been developed through consideration of a large number of test problems, are shown in the following figure.
No
fp ( xk ) > fk ( x* )
Yes
Increase
λk
Reduce
λk
f k ( x * ) – f p ( xk ) λ k = λ k + ------------------------------------α*
Figure 3-5: Updating λk
λk λ k = --------------1 + α*
Following the update of λ k , a solution of Eq. 3-22 is used to obtain a search direction, d k . A step length of unity is then taken in the direction d k , which is followed by a line search procedure similar to that discussed for the unconstrained implementation. The line search procedure ensures that f ( x k + 1 ) < f ( x k ) at each major iteration and the method is therefore a descent method. The implementation has been successfully tested on a large number of nonlinear problems. It has proved to be more robust than the Gauss-Newton method and iteratively more efficient than an unconstrained method. The Levenberg-Marquardt algorithm is the default method used by lsqnonlin. You can select the Gauss-Newton method by setting the options parameter LevenbergMarquardt to 'off'.
3-23
3
Standard Algorithms
Nonlinear Systems of Equations
Solving a nonlinear system of equations F ( x ) involves finding a solution such that every equation in the nonlinear system is 0. That is, we have n equations n and n unknowns and we want to find x ∈ ? such that F ( x ) = 0 where F1 ( x ) F(x) = F2 ( x ) Fn( x ) The assumption is that a zero, or root, of the system exists. These equations may represent economic constraints, for example, that must all be satisfied.
Gauss-Newton Method
One approach to solving this problem is to use a Nonlinear Least-Squares solver, such those described in “Least-Squares Optimization” on page 3-18. Since we assume the system has a root, it would have a small residual, and so using the Gauss-Newton Method is effective. In this case, at each iteration we solve a linear least-squares problem, as described in Eq. 3-21, to find the search direction. (See “Gauss-Newton Method” on page 3-19 for more information.)
Trust-Region Dogleg Method
Another approach is to solve a linear system of equations to find the search direction, namely, Newton’s method says to solve for the search direction d k such that J ( x k ) d k = –F ( x k ) xk + 1 = xk + dk where J ( x k ) is the n-by-n Jacobian
...
3-24
Nonlinear Systems of Equations
? F 1 ( xk ) J ( xk ) = ? F 2 ( xk ) ? Fn ( xk )
T T
Newton’s method can run into difficulties. J ( x k ) may be singular, and so the Newton step d k is not even defined. Also, the exact Newton step d k may be expensive to compute. In addition, Newton’s method may not converge if the starting point is far from the solution. Using trust-region techniques (introduced in “Trust-Region Methods for Nonlinear Minimization” on page 4-2) improves robustness when starting far from the solution and handles the case when J ( x k ) is singular. To use a trust-region strategy, a merit function is needed to decide if x k + 1 is better or worse than x k . A possible choice is
T 1 - F(x k + d) F(x k + d) min f(d) = -2 d
But a minimum of f ( d ) is not necessarily a root of F ( x ) . The Newton step d k is a root of M ( xk + d ) = F ( xk ) + J( xk ) d and so it is also a minimum of m ( d ) where 1 - M ( xk + d ) min m(d) = -2 d
2 2
Then m ( d ) is a better choice of merit function than f ( d ) , and so the trust region subproblem is 1 T T T T 1 T - F ( x k ) F ( x k ) + d J ( x k ) F ( x k ) + -- d ( J ( xk ) J ( x k ) ) d min -2 2 d
(3-25)
...
T
1 - F ( xk ) + J ( xk ) d = -2
T T T 1 - F ( xk ) F ( xk ) + d J( xk ) F ( xk ) = -2
(3-24)
1 T T - d ( J ( x k ) J ( xk ) ) d + -2
3-25
3
Standard Algorithms
such that D ? d ≤ ? . This subproblem can be efficiently solved using a dogleg strategy. For an overview of trust-region methods, see Conn [5], and Nocedal [33].
Nonlinear Equations Implementation
Both the Gauss-Newton and trust-region dogleg methods are implemented in the Optimization Toolbox. Details of their implementations are discussed below.
Gauss-Newton Implementation
The Gauss-Newton implementation is the same as that for least-squares optimization. It is described in “Gauss-Newton Implementation” on page 3-22.
Trust-Region Dogleg Implementation
The key feature of this algorithm is the use of the Powell dogleg procedure for computing the step d , which minimizes Eq. 3-25. For a detailed description, see Powell [36]. The step d is constructed from a convex combination of a Cauchy step (a step along the steepest descent direction) and a Gauss-Newton step for f ( x ) . The Cauchy step is calculated as dC = –α J ( xk ) F ( xk ) where α is chosen to minimize Eq. 3-24. The Gauss-Newton step is calculated by solving J ( x k ) ? d GN = – F ( x k ) using the MATLAB \ (matrix left division) operator. The step d is chosen so that d = d C + λ ( d GN – d C ) where λ is the largest value in the interval [0,1] such that (nearly) singular, d is just the Cauchy direction. d ≤ ? . If J k is
T
3-26
Nonlinear Systems of Equations
The dogleg algorithm is efficient since it requires only one linear solve per iteration (for the computation of the Gauss-Newton step). Additionally, it can be more robust than using the Gauss-Newton method with a line search.
3-27
3
Standard Algorithms
Constrained Optimization
In constrained optimization, the general aim is to transform the problem into an easier subproblem that can then be solved and used as the basis of an iterative process. A characteristic of a large class of early methods is the translation of the constrained problem to a basic unconstrained problem by using a penalty function for constraints that are near or beyond the constraint boundary. In this way the constrained problem is solved using a sequence of parameterized unconstrained optimizations, which in the limit (of the sequence) converge to the constrained problem. These methods are now considered relatively inefficient and have been replaced by methods that have focused on the solution of the Kuhn-Tucker (KT) equations. The KT equations are necessary conditions for optimality for a constrained optimization problem. If the problem is a so-called convex programming problem, that is, f ( x ) and G i ( x ), i = 1, …, m , are convex functions, then the KT equations are both necessary and sufficient for a global solution point. Referring to GP (Eq. 3-1), the Kuhn-Tucker equations can be stated as
m
? f ( x? ) +
∑ λ i? ? ? G i ( x ? )
i=1
= 0 i = 1, …, m i = m e + 1, …, m
(3-26)
λ i * ? G i( x * ) = 0 λ i? ≥ 0
The first equation describes a canceling of the gradients between the objective function and the active constraints at the solution point. For the gradients to be canceled, Lagrange multipliers ( λ i, i = 1, … m ) are necessary to balance the deviations in magnitude of the objective function and constraint gradients. Because only active constraints are included in this canceling operation, constraints that are not active must not be included in this operation and so are given Lagrange multipliers equal to zero. This is stated implicitly in the last two equations of Eq. 3-26. The solution of the KT equations forms the basis to many nonlinear programming algorithms. These algorithms attempt to compute the Lagrange multipliers directly. Constrained quasi-Newton methods guarantee superlinear convergence by accumulating second order information regarding the KT equations using a quasi-Newton updating procedure. These methods are commonly referred to as Sequential Quadratic Programming (SQP)
3-28
Constrained Optimization
methods, since a QP subproblem is solved at each major iteration (also known as Iterative Quadratic Programming, Recursive Quadratic Programming, and Constrained Variable Metric methods). This section continues with discussions of the following: ? Sequential Quadratic Programming (SQP) ? A Quadratic Programming (QP) Subproblem ? SQP Implementation
Sequential Quadratic Programming (SQP)
SQP methods represent the state of the art in nonlinear programming methods. Schittkowski [38], for example, has implemented and tested a version that outperforms every other tested method in terms of efficiency, accuracy, and percentage of successful solutions, over a large number of test problems. Based on the work of Biggs [1], Han [24], and Powell ([34],[35]), the method allows you to closely mimic Newton’s method for constrained optimization just as is done for unconstrained optimization. At each major iteration, an approximation is made of the Hessian of the Lagrangian function using a quasi-Newton updating method. This is then used to generate a QP subproblem whose solution is used to form a search direction for a line search procedure. An overview of SQP is found in Fletcher [15], Gill et al. [21], Powell [37], and Schittkowski [25]. The general method, however, is stated here. Given the problem description in GP (Eq. 3-1) the principal idea is the formulation of a QP subproblem based on a quadratic approximation of the Lagrangian function.
m
L ( x, λ ) = f ( x ) +
∑ λi ? gi ( x )
i=1
(3-27)
Here you simplify Eq. 3-1 by assuming that bound constraints have been expressed as inequality constraints. You obtain the QP subproblem by linearizing the nonlinear constraints.
3-29
3
Standard Algorithms
Quadratic Programming (QP) Subproblem
1 T - d H k d + ?f ( x k ) T d minimize -n 2 d∈? ?g i ( x k ) T d + g i ( x k ) = 0 ?g i ( x k ) T d + g i ( x k ) ≤ 0 i = 1, … m e i = m e + 1, … m
(3-28)
This subproblem can be solved using any QP algorithm (see, for instance, “Quadratic Programming Solution” on page 3-33). The solution is used to form a new iterate xk + 1 = xk + αk dk The step length parameter α k is determined by an appropriate line search procedure so that a sufficient decrease in a merit function is obtained (see “Updating the Hessian Matrix” on page 3-31). The matrix H k is a positive definite approximation of the Hessian matrix of the Lagrangian function (Eq. 3-27). H k can be updated by any of the quasi-Newton methods, although the BFGS method (see “Updating the Hessian Matrix” on page 3-31) appears to be the most popular. A nonlinearly constrained problem can often be solved in fewer iterations than an unconstrained problem using SQP. One of the reasons for this is that, because of limits on the feasible area, the optimizer can make informed decisions regarding directions of search and step length. Consider Rosenbrock’s function (Eq. 3-2) with an additional nonlinear inequality constraint, g(x), x 1 + x 2 – 1.5 ≤ 0
2 2
(3-29)
This was solved by an SQP implementation in 96 iterations compared to 140 for the unconstrained case. Figure 3-6 shows the path to the solution point x = [0.9072,0.8228] starting at x = [– 1.9,2] .
3-30
Constrained Optimization
3 2.5 Start Point 2 1.5 1 0.5 Constraint boundary 0 -0.5 -1 -2 Feasible region
o o
Infeasible region
o o o o o oo o o o
Solution
-1.5
-1
-0.5
0
0.5
1
1.5
2
Figure 3-6: SQP Method on Nonlinear Linearly Constrained Rosenbrock’s Function (Eq.3-2)
SQP Implementation
The SQP implementation consists of three main stages, which are discussed briefly in the following subsections: ? Updating of the Hessian matrix of the Lagrangian function ? Quadratic programming problem solution ? Line search and merit function calculation
Updating the Hessian Matrix
At each major iteration a positive definite quasi-Newton approximation of the Hessian of the Lagrangian function, H, is calculated using the BFGS method, where λ i (i = 1, …, m) is an estimate of the Lagrange multipliers.
3-31
3
Standard Algorithms
T T qk q k Hk Hk H k + 1 = H k + ------------ – ------------------- where T T qk sk sk H k sk
(3-30)
sk = xk + 1 – xk
n ? ? ? λ i ? ?g i ( x k + 1 ) – ?f ( x k ) + λi ? ?g i ( x k )? q k = ?f ( x k + 1 ) + ? ? ? ? i=1 i=1 n
∑
∑
Powell [35] recommends keeping the Hessian positive definite even though it might be positive indefinite at the solution point. A positive definite Hessian is T s is positive at each update and that H is initialized maintained providing q k k T s is not positive, q is modified on with a positive definite matrix. When q k k k T an element-by-element basis so that q k s k > 0 . The general aim of this modification is to distort the elements of q k , which contribute to a positive definite update, as little as possible. Therefore, in the initial phase of the · is repeatedly halved. This modification, the most negative element of q k .? s k T s is greater than or equal to 1e-5. If, after this procedure is continued until q k k T s is still not positive, modify q by adding a vector v multiplied procedure, q k k k by a constant scalar w, that is, q k = q k + wv where v i = ?gi ( x k + 1 ) ? g i ( x k + 1 ) – ?g i ( x k ) ? gi ( x k ), if ( q k ) i ? w < 0 and ( q k ) i ? ( s k ) i < 0 ( i = 1, … m ) v i = 0 otherwise
T s k becomes positive. and increase w systematically until q k
(3-31)
The functions fmincon, fminimax, fgoalattain, and fseminf all use SQP. If the options parameter Display is set to 'iter', then various information is given such as function values and the maximum constraint violation. When the Hessian has to be modified using the first phase of the preceding procedure to keep it positive definite, then Hessian modified is displayed. If the Hessian has to be modified again using the second phase of the approach described above, then Hessian modified twice is displayed. When the QP subproblem
3-32
Constrained Optimization
is infeasible, then infeasible is displayed. Such displays are usually not a cause for concern but indicate that the problem is highly nonlinear and that convergence might take longer than usual. Sometimes the message no update T is displayed, indicating that q k s k is nearly zero. This can be an indication that the problem setup is wrong or you are trying to minimize a noncontinuous function.
Quadratic Programming Solution
At each major iteration of the SQP method, a QP problem of the following form is solved, where A i refers to the ith row of the m-by-n matrix A . minimize d ∈ ?n 1 T - d Hd + c T d q ( d ) = -2 Ai d = bi Ai d ≤ bi i = 1, …, m e i = m e + 1, …, m
(3-32)
The method used in the Optimization Toolbox is an active set strategy (also known as a projection method) similar to that of Gill et al., described in [20] and [19]. It has been modified for both Linear Programming (LP) and Quadratic Programming (QP) problems. The solution procedure involves two phases. The first phase involves the calculation of a feasible point (if one exists). The second phase involves the generation of an iterative sequence of feasible points that converge to the solution. In this method an active set, A k , is maintained that is an estimate of the active constraints (