Optimization algorithms: Fundamentals

In this course, we will use optimization methods that are iterative: they generate a sequence of points \(x_{1}\), \(x_{2}\), … designed to move towards a local minimizer \(x^{*}\). These methods work by imposing the requirement that \(f(x_{k+1}) < f(x_{k})\). Nonmonotone methods relax this requirement and instead require that the objective decrease relative to the last few objective function values. In this course, we'll examine two classes of unconstrained optimization methods, line search methods and trust region methods, which achieve this goal in different ways.

Line Search Methods

The way a line search method works:

  1. Start from \(x_{1}\), set \(k = 1\)

  2. If \(||\nabla f(x_{k})|| \le \epsilon\) quit

  3. Choose a direction \(p_{k}\) along which \(f(x_{k} + \alpha p_{k})\) is decreasing (a descent direction)

  4. Find a step length by approximately solving \(\min_{\alpha} f(x_{k} + \alpha p_{k})\)

  5. Set \(x_{k+1} = x_{k} + \alpha p_{k}\), \(k = k+1\) and goto step 2
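The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a robust implementation: it uses the steepest-descent direction \(p_{k} = -\nabla f_{k}\) and backtracking with the Armijo sufficient-decrease condition for step 4, and the test function is made up for demonstration.

```python
import numpy as np

def line_search_descent(f, grad, x1, eps=1e-6, max_iter=1000):
    """Generic line search sketch: steepest-descent direction + backtracking."""
    x = np.asarray(x1, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:        # step 2: convergence check
            break
        p = -g                              # step 3: a descent direction
        alpha, c, rho = 1.0, 1e-4, 0.5      # step 4: backtracking (Armijo condition)
        while f(x + alpha * p) > f(x) + c * alpha * (g @ p):
            alpha *= rho
        x = x + alpha * p                   # step 5: take the step
    return x

# Example: minimize f(x, y) = (x - 1)^2 + 10 y^2, minimizer at (1, 0)
f = lambda x: (x[0] - 1)**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * (x[0] - 1), 20 * x[1]])
x_star = line_search_descent(f, grad, [0.0, 1.0])
```

Because the iteration only accepts steps satisfying the sufficient-decrease condition, it enforces \(f(x_{k+1}) < f(x_{k})\) at every step.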

Key components of a line search method:

  • Convergence check, \(||\nabla f(x_{k})|| \le \epsilon\), checks first-order necessary conditions

  • The search direction \(p_{k}\) must be a descent direction, i.e. \(p_{k}^{T} \nabla f_{k} < 0\)

  • Different line search methods can be used to find \(\alpha\)

  • Different methods to select the search direction \(p\)

Methods for selecting the search direction:

  1. Steepest descent: \(p_{k} = - \nabla f_{k}\), (cheap per iteration, but convergence is very slow)

  2. Conjugate gradient: \(p_{k} = - \nabla f_{k} + \beta p_{k-1}\), \(\beta = ||\nabla f_{k}||_{2}^{2}/||\nabla f_{k-1}||_{2}^{2}\), (much better than steepest descent)

  3. Newton method: \(p_{k} = - H(x_{k})^{-1} \nabla f_{k}\), (fast local convergence, but \(H_{k}\) may be computationally expensive, and \(p_{k}\) may not be a descent direction if \(H_{k}\) is not positive definite)

  4. Quasi-Newton: \(p_{k} = - B_{k}^{-1} \nabla f_{k}\), \(B_{k}\) approximates the Hessian, formed by accumulating gradient information
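A small sketch comparing two of these direction choices on a convex quadratic (the matrix and point below are made-up test data). For a positive definite Hessian, both the steepest-descent and Newton directions satisfy the descent condition \(p_{k}^{T} \nabla f_{k} < 0\):

```python
import numpy as np

# Convex quadratic test problem: f(x) = 0.5 x^T A x - b^T x (illustrative data)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = np.array([2.0, -1.0])

g = A @ x - b            # gradient at x
H = A                    # Hessian (constant for a quadratic)

p_sd = -g                          # 1. steepest descent
p_newton = -np.linalg.solve(H, g)  # 3. Newton: solve H p = -g, not an explicit inverse

# Both satisfy the descent condition p^T grad f < 0
print(p_sd @ g, p_newton @ g)
```

Note that on a quadratic the Newton step lands exactly on the minimizer \(A^{-1}b\) in one iteration, which is why Newton-type directions converge so much faster near a solution.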

Trust Region Methods

The way a trust region method works:

  1. Start from \(x_{1}\), set \(k = 1\)

  2. If \(||\nabla f(x_{k})|| \le \epsilon\) quit

  3. Build or update a model function \(m_{k}(x_{k} + p) \approx f(x_{k} + p)\) for \(||p|| \le \Delta\), \(\Delta\) is the trust region radius

  4. Approximately solve \(\min_{p} m_{k}(x_{k} + p)\) such that \(||p|| \le \Delta\)

  5. If the solution \(x_{k} + p_{k}\) does not produce a sufficient decrease in \(f\), shrink the trust region radius \(\Delta\) and goto step 4

  6. Otherwise accept the step and set \(x_{k+1} = x_{k} + p_{k}\), set \(k = k+1\) and goto step 2
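The trust region loop above can be sketched as follows. This is a simplified illustration, not a production method: step 4 is solved only approximately using the Cauchy point (the minimizer of the model along \(-\nabla f_{k}\) within the region), the acceptance/shrink/expand thresholds are conventional choices, and the test function is made up.

```python
import numpy as np

def trust_region(f, grad, hess, x1, eps=1e-6, delta=1.0, max_iter=500):
    """Trust region sketch using the Cauchy point as the approximate subproblem solution."""
    x = np.asarray(x1, dtype=float)
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) <= eps:              # step 2: convergence check
            break
        # step 4: minimize the quadratic model along -g subject to ||p|| <= delta
        gBg = g @ B @ g
        tau = 1.0 if gBg <= 0 else min(1.0, np.linalg.norm(g)**3 / (delta * gBg))
        p = -(tau * delta / np.linalg.norm(g)) * g
        # steps 5-6: compare the actual decrease in f with the model's prediction
        pred = -(g @ p + 0.5 * p @ B @ p)
        rho = (f(x) - f(x + p)) / pred
        if rho < 0.25:
            delta *= 0.5                          # poor model: shrink the region
        elif rho > 0.75:
            delta *= 2.0                          # good model: expand the region
        if rho > 0.1:                             # sufficient decrease: accept the step
            x = x + p
    return x

# Example: minimize f(x, y) = (x - 1)^2 + 10 y^2, minimizer at (1, 0)
f = lambda x: (x[0] - 1)**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * (x[0] - 1), 20 * x[1]])
hess = lambda x: np.diag([2.0, 20.0])
x_star = trust_region(f, grad, hess, [0.0, 1.0])
```

The ratio \(\rho\) of actual to predicted reduction measures how well the model \(m_{k}\) matches \(f\) over the current region, and drives both the accept/reject decision and the radius update.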

  • Typically the model \(m_{k}(x_{k} + p)\) is constructed using a quadratic model \(m_{k} = f(x_{k}) + \nabla f(x_{k})^{T}p + \frac{1}{2} p^{T}B_{k}p\)

  • \(B_{k}\) may be the Hessian, or may approximate the Hessian

  • Trust region methods allow indefinite or semi-definite Hessian approximations \(B_{k}\), whereas line search methods need a positive definite \(B_{k}\) to guarantee a descent direction
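The quadratic model agrees with \(f\) to second order at \(x_{k}\) when \(B_{k}\) is the exact Hessian, so the model error shrinks like \(O(||p||^{3})\). A quick check on a made-up test function:

```python
import numpy as np

# Quadratic model of f(x, y) = x^4 + y^2 about x_k (illustrative test function)
f = lambda x: x[0]**4 + x[1]**2
grad = lambda x: np.array([4 * x[0]**3, 2 * x[1]])
hess = lambda x: np.diag([12 * x[0]**2, 2.0])

xk = np.array([1.0, 1.0])
m = lambda p: f(xk) + grad(xk) @ p + 0.5 * p @ hess(xk) @ p

# Halving-by-ten the step size should shrink the model error by ~1000x (cubic)
errors = []
for t in [0.1, 0.01]:
    p = t * np.array([1.0, 1.0])
    errors.append(abs(f(xk + p) - m(p)))
print(errors)
```

This is why the model is trusted only within the radius \(\Delta\): far from \(x_{k}\) the cubic error term dominates and the model's predictions become unreliable.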
