Optimization algorithms: Fundamentals

In this course, we will use optimization methods that are iterative: they generate a sequence of points \(x_{1}\), \(x_{2}\), … designed to move towards a local minimizer \(x^{*}\). These methods work by imposing the requirement that \(f(x_{k+1}) < f(x_{k})\). Nonmonotone methods relax this requirement and instead require that the objective decrease relative to the last few objective function values. In this course, we'll examine two classes of unconstrained optimization methods, line search methods and trust region methods, which achieve this goal in different ways.

Line Search Methods

The way a line search method works:

  1. Start from \(x_{1}\), set \(k = 1\)

  2. If \(||\nabla f(x_{k})|| \le \epsilon\) quit

  3. Choose a direction \(p_{k}\) along which \(f(x_{k} + \alpha p_{k})\) is decreasing (a descent direction)

  4. Find a step length by approximately solving \(\min_{\alpha} f(x_{k} + \alpha p_{k})\)

  5. Set \(x_{k+1} = x_{k} + \alpha p_{k}\), \(k = k+1\) and goto step 2
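The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a robust implementation: it uses the steepest-descent direction \(p_{k} = -\nabla f_{k}\) and backtracking with the Armijo sufficient-decrease condition for step 4, and the test function is made up for demonstration.

```python
import numpy as np

def line_search_descent(f, grad, x1, eps=1e-6, max_iter=1000):
    """Generic line search sketch: steepest-descent direction + backtracking."""
    x = np.asarray(x1, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:        # step 2: convergence check
            break
        p = -g                              # step 3: a descent direction
        alpha, c, rho = 1.0, 1e-4, 0.5      # step 4: backtracking (Armijo condition)
        while f(x + alpha * p) > f(x) + c * alpha * (g @ p):
            alpha *= rho
        x = x + alpha * p                   # step 5: take the step
    return x

# Example: minimize f(x, y) = (x - 1)^2 + 10 y^2, minimizer at (1, 0)
f = lambda x: (x[0] - 1)**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * (x[0] - 1), 20 * x[1]])
x_star = line_search_descent(f, grad, [0.0, 1.0])
```

Because the iteration only accepts steps satisfying the sufficient-decrease condition, it enforces \(f(x_{k+1}) < f(x_{k})\) at every step.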

Key components of a line search method:

  • Convergence check, \(||\nabla f(x_{k})|| \le \epsilon\), checks first-order necessary conditions

  • The search direction \(p_{k}\) must be a descent direction, i.e. \(p_{k}^{T} \nabla f_{k} < 0\)

  • Different line search methods can be used to find \(\alpha\)

  • Different methods to select the search direction \(p\)

Methods for selecting the search direction:

  1. Steepest descent: \(p_{k} = - \nabla f_{k}\), (cheap per iteration, but convergence is very slow)

  2. Conjugate gradient: \(p_{k} = - \nabla f_{k} + \beta p_{k-1}\), \(\beta = ||\nabla f_{k}||_{2}^{2}/||\nabla f_{k-1}||_{2}^{2}\), (much better than steepest descent)

  3. Newton method: \(p_{k} = - H(x_{k})^{-1} \nabla f_{k}\), (fast local convergence, but \(H_{k}\) may be computationally expensive, and \(p_{k}\) may not be a descent direction if \(H_{k}\) is not positive definite)

  4. Quasi-Newton: \(p_{k} = - B_{k}^{-1} \nabla f_{k}\), \(B_{k}\) approximates the Hessian, formed by accumulating gradient information
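A small sketch comparing two of these direction choices on a convex quadratic (the matrix and point below are made-up test data). For a positive definite Hessian, both the steepest-descent and Newton directions satisfy the descent condition \(p_{k}^{T} \nabla f_{k} < 0\):

```python
import numpy as np

# Convex quadratic test problem: f(x) = 0.5 x^T A x - b^T x (illustrative data)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = np.array([2.0, -1.0])

g = A @ x - b            # gradient at x
H = A                    # Hessian (constant for a quadratic)

p_sd = -g                          # 1. steepest descent
p_newton = -np.linalg.solve(H, g)  # 3. Newton: solve H p = -g, not an explicit inverse

# Both satisfy the descent condition p^T grad f < 0
print(p_sd @ g, p_newton @ g)
```

Note that on a quadratic the Newton step lands exactly on the minimizer \(A^{-1}b\) in one iteration, which is why Newton-type directions converge so much faster near a solution.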

Trust Region Methods

The way a trust region method works:

  1. Start from \(x_{1}\), set \(k = 1\)

  2. If \(||\nabla f(x_{k})|| \le \epsilon\) quit

  3. Build or update a model function \(m_{k}(x_{k} + p) \approx f(x_{k} + p)\) for \(||p|| \le \Delta\), \(\Delta\) is the trust region radius

  4. Approximately solve \(\min_{p} m_{k}(x_{k} + p)\) such that \(||p|| \le \Delta\)

  5. If the solution \(x_{k} + p_{k}\) does not produce a sufficient decrease in \(f\), shrink the trust region radius \(\Delta\) and goto step 4

  6. Otherwise accept the step and set \(x_{k+1} = x_{k} + p_{k}\), set \(k = k+1\) and goto step 2
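The trust region loop above can be sketched as follows. This is a simplified illustration, not a production method: step 4 is solved only approximately using the Cauchy point (the minimizer of the model along \(-\nabla f_{k}\) within the region), the acceptance/shrink/expand thresholds are conventional choices, and the test function is made up.

```python
import numpy as np

def trust_region(f, grad, hess, x1, eps=1e-6, delta=1.0, max_iter=500):
    """Trust region sketch using the Cauchy point as the approximate subproblem solution."""
    x = np.asarray(x1, dtype=float)
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) <= eps:              # step 2: convergence check
            break
        # step 4: minimize the quadratic model along -g subject to ||p|| <= delta
        gBg = g @ B @ g
        tau = 1.0 if gBg <= 0 else min(1.0, np.linalg.norm(g)**3 / (delta * gBg))
        p = -(tau * delta / np.linalg.norm(g)) * g
        # steps 5-6: compare the actual decrease in f with the model's prediction
        pred = -(g @ p + 0.5 * p @ B @ p)
        rho = (f(x) - f(x + p)) / pred
        if rho < 0.25:
            delta *= 0.5                          # poor model: shrink the region
        elif rho > 0.75:
            delta *= 2.0                          # good model: expand the region
        if rho > 0.1:                             # sufficient decrease: accept the step
            x = x + p
    return x

# Example: minimize f(x, y) = (x - 1)^2 + 10 y^2, minimizer at (1, 0)
f = lambda x: (x[0] - 1)**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * (x[0] - 1), 20 * x[1]])
hess = lambda x: np.diag([2.0, 20.0])
x_star = trust_region(f, grad, hess, [0.0, 1.0])
```

The ratio \(\rho\) of actual to predicted reduction measures how well the model \(m_{k}\) matches \(f\) over the current region, and drives both the accept/reject decision and the radius update.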

  • Typically the model \(m_{k}(x_{k} + p)\) is constructed using a quadratic model \(m_{k} = f(x_{k}) + \nabla f(x_{k})^{T}p + \frac{1}{2} p^{T}B_{k}p\)

  • \(B_{k}\) may be the Hessian, or may approximate the Hessian

  • Trust region methods allow indefinite or semi-definite Hessian approximations \(B_{k}\), whereas line search methods need a positive definite \(B_{k}\) to guarantee a descent direction
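The quadratic model agrees with \(f\) to second order at \(x_{k}\) when \(B_{k}\) is the exact Hessian, so the model error shrinks like \(O(||p||^{3})\). A quick check on a made-up test function:

```python
import numpy as np

# Quadratic model of f(x, y) = x^4 + y^2 about x_k (illustrative test function)
f = lambda x: x[0]**4 + x[1]**2
grad = lambda x: np.array([4 * x[0]**3, 2 * x[1]])
hess = lambda x: np.diag([12 * x[0]**2, 2.0])

xk = np.array([1.0, 1.0])
m = lambda p: f(xk) + grad(xk) @ p + 0.5 * p @ hess(xk) @ p

# Halving-by-ten the step size should shrink the model error by ~1000x (cubic)
errors = []
for t in [0.1, 0.01]:
    p = t * np.array([1.0, 1.0])
    errors.append(abs(f(xk + p) - m(p)))
print(errors)
```

This is why the model is trusted only within the radius \(\Delta\): far from \(x_{k}\) the cubic error term dominates and the model's predictions become unreliable.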
