Minimization of Quadratic Functions

A good first example is the minimization of quadratic functions. This will be useful in more general situations since we’ll often form a quadratic approximation of the objective function. We’ll study a quadratic function that takes the following form

\begin{equation*} f(x) = \frac{1}{2} x^{T} A x + b^{T} x + c, \end{equation*}

where \(x \in \mathbb{R}^{n}\), \(A = A^{T} \in \mathbb{R}^{n \times n}\), \(b \in \mathbb{R}^{n}\), and \(c \in \mathbb{R}\). Since \(A\) is symmetric, it has an eigenvalue decomposition \(A = Q \Lambda Q^{T}\), where the columns of \(Q\) are the orthonormal eigenvectors \(q_{i}\) and \(\Lambda\) is the diagonal matrix of eigenvalues \(\lambda_{i}\).

The gradient of \(f(x)\) is

\begin{equation*} \nabla f(x) = \frac{1}{2}(A x + A^{T} x) + b = A x + b \end{equation*}

The Hessian of \(f(x)\) is

\begin{equation*} H(x) = A \end{equation*}
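As a quick sanity check of these formulas, we can compare the analytic gradient against central finite differences. This is a minimal sketch with arbitrarily chosen values of \(A\), \(b\), and \(c\):

import numpy as np

# Illustrative (arbitrary) problem data
A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
c = 0.5

def f(x):
    return 0.5 * x @ A @ x + b @ x + c

# Central finite-difference gradient at an arbitrary point
x0 = np.array([0.4, -0.2])
h = 1e-6
grad_fd = np.array([(f(x0 + h*e) - f(x0 - h*e)) / (2*h) for e in np.eye(2)])

print(np.allclose(grad_fd, A @ x0 + b))  # True: the gradient is A x + b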

To analyze \(f(x)\), it is convenient to change to the eigenvector basis as follows

\begin{equation*} x = Q \xi, \end{equation*}

where \(\xi \in \mathbb{R}^{n}\) contains the components of \(x\) along each eigenvector. Given \(\xi\), we recover the original point as \(x = Q \xi\); conversely, \(\xi = Q^{T} x\).

The advantage of this new basis is that it leads to a simplified form for the quadratic as follows

\begin{equation*} \begin{aligned} f(\xi) & = \frac{1}{2} \xi^{T} Q^{T} A Q \xi + \xi^{T} Q^{T} b + c = \frac{1}{2} \xi^{T} \Lambda \xi + \xi^{T} \bar{b} + c \\ &= \sum_{i=1}^{n} (\frac{1}{2}\lambda_{i} \xi_{i}^2 + \bar{b}_{i} \xi_{i} ) + c \end{aligned} \end{equation*}

where we have defined \(\bar{b} = Q^{T} b\) or component-wise \(\bar{b}_{i} = q_{i}^{T}b\). This is a separable function because it can be written as

\begin{equation*} f(\xi) = \sum_{i=1}^{n} f_{i}(\xi_{i}) \end{equation*}

so that we can minimize each \(f_{i}(\xi_{i})\) independently, since these functions share no variables. Focusing on a single function \(f_{i}(\xi_{i})\), we have the form

\begin{equation*} f_{i}(\xi_{i}) = \frac{1}{2} \lambda_{i} \xi_{i}^{2} + \bar{b}_{i} \xi_{i} \end{equation*}

(We ignore the constant term because it has no impact on the location or characterization of the minimizer.)
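A short numerical check of this separability, again with arbitrary illustrative data, confirms that the quadratic evaluated in the design coordinates matches the separable sum in the eigenvector coordinates:

import numpy as np

# Illustrative (arbitrary) problem data
A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
c = 0.5

lam, Q = np.linalg.eigh(A)   # A = Q * diag(lam) * Q^T
bbar = Q.T @ b               # components of b in the eigenvector basis

xi = np.array([0.3, -0.7])   # arbitrary point in eigenvector coordinates
x = Q @ xi                   # the same point in design coordinates

f_design = 0.5 * x @ A @ x + b @ x + c
f_separable = np.sum(0.5 * lam * xi**2 + bbar * xi) + c
print(np.isclose(f_design, f_separable))  # True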

Minimizing this one-dimensional function requires considering a number of different cases. First, if \(\lambda_{i} = 0\) and \(\bar{b}_{i} = 0\), then the function is constant and any \(\xi_{i}\) we pick is simultaneously a minimizer and a maximizer! If \(\lambda_{i} = 0\) and \(\bar{b}_{i} \ne 0\), then the function is a line and there is no minimizer or maximizer.

Now, if \(\lambda_{i} \ne 0\), we have a true quadratic. Taking the derivative of \(f_{i}(\xi_{i})\) and setting it equal to zero to obtain the critical point gives

\begin{equation*} \xi_{i}^{cr} = -\frac{\bar{b}_{i}}{\lambda_{i}} \end{equation*}

The second derivative of \(f_{i}(\xi_{i})\) is simply \(\lambda_{i}\). So when \(\lambda_{i} > 0\), we have a minimizer at \(\xi_{i}^{*} = -{\bar{b}_{i}}/{\lambda_{i}}\), whereas when \(\lambda_{i} < 0\), we have a maximizer at the same critical point \(\xi_{i}^{cr} = -{\bar{b}_{i}}/{\lambda_{i}}\).

As a result, we have the following cases (implemented in the small sketch after this list):

  1. If \(\lambda_{i} > 0\), then \(\xi_{i}^{*}\) is a minimizer

  2. If \(\lambda_{i} < 0\), then \(\xi_{i}^{*}\) is a maximizer

  3. If \(\lambda_{i} = 0\) and \(\bar{b}_{i} = 0\), then every value of \(\xi_{i}\) is simultaneously a minimizer and a maximizer

  4. If \(\lambda_{i} = 0\) and \(\bar{b}_{i} \ne 0\), then the function is unbounded from below (no minimizer)
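A minimal sketch of this one-dimensional classification (the helper name classify_1d and the tolerance are our own choices for illustration, not part of the original development):

def classify_1d(lam_i, bbar_i, tol=1e-12):
    """Classify f_i(xi) = 0.5*lam_i*xi**2 + bbar_i*xi (hypothetical helper)."""
    if lam_i > tol:
        return f'minimizer at xi = {-bbar_i / lam_i}'
    elif lam_i < -tol:
        return f'maximizer at xi = {-bbar_i / lam_i}'
    elif abs(bbar_i) <= tol:
        return 'constant: every xi is a minimizer and a maximizer'
    else:
        return 'linear: unbounded from below, no minimizer'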

We can now apply these conditions to all the eigenvalues.

  1. If \(A\) is positive definite, then all \(\xi_{i}\) directions are minimizers

  2. If \(A\) is positive semi-definite, then the function is either unbounded from below or has a non-unique minimizer depending on \(b\)

  3. If \(A\) is indefinite, negative semi-definite or negative definite, then the function is unbounded from below and has no minimizer

For case 1, we have that the location of the minimizer is given by

\begin{equation*} \xi^{*} = -\Lambda^{-1}\bar{b} = -\Lambda^{-1} Q^{T} b \end{equation*}

Transforming this back into the design coordinates gives

\begin{equation*} x^{*} = Q \xi^{*} = -Q \Lambda^{-1} Q^{T} b = -A^{-1} b \end{equation*}

Note that \(x^{*} = -A^{-1} b\) is a minimizer only if \(A\) is positive definite!
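In code, the positive definite case is best handled by solving the linear system \(A x^{*} = -b\) rather than forming \(A^{-1}\) explicitly. A sketch with illustrative data:

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # positive definite (illustrative)
b = np.array([1.0, -1.0])

xstar = -np.linalg.solve(A, b)           # solves A x = -b
print(xstar)                             # [-1.  1.]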

For case 2, we have that \(\lambda_{i} \ge 0\). For those \(\lambda_{i} = 0\), we have to test the conditions on \(\bar{b}_{i} = q_{i}^{T} b\). So if \(q_{i}^{T} b = 0\) for each \(i\) where we have \(\lambda_{i} = 0\), then we have a non-unique minimizer, because we can move along \(\xi_{i}\) without changing the function value.

Note that for case 2, the matrix \(A\) is singular, so we have to be careful!

In the case where \(q_{i}^{T} b = 0\) for all \(i\) with \(\lambda_{i} = 0\), a particular minimizer can be expressed as follows:

\begin{equation*} \xi_{i}^{*} = \left\{ \begin{array}{lr} -\frac{\bar{b}_{i}}{\lambda_{i}} & \lambda_{i} \ne 0 \\ 0 & \lambda_{i} = 0 \\ \end{array} \right. \end{equation*}

Multiplying both sides by \(\lambda_{i}\), we see that \(\Lambda \xi^{*} = -\bar{b}\), but the matrix \(\Lambda\) has zeros at some entries along the diagonal. To address these, we add a matrix \(E\) whose entries are all zero, except for a 1 on the diagonal wherever there is a zero eigenvalue. Since \(\bar{b}_{i} = 0\) at those entries, this simply sets \(\xi_{i}^{*} = 0\) whenever \(\lambda_{i} = 0\). Now we can write

\begin{equation*} \begin{aligned} (\Lambda + E) \xi^{*} &= -\bar{b} \\ (\Lambda + E) Q^{T} x^{*} &= -Q^{T} b \\ Q (\Lambda + E) Q^{T} x^{*} &= -b \\ (Q \Lambda Q^{T} + Q E Q^{T}) x^{*} &= -b \\ \left(A + \sum_{i \,:\, \lambda_{i} = 0} q_{i} q_{i}^{T} \right) x^{*} &= -b \\ \bar{A} x^{*} &= -b \\ \end{aligned} \end{equation*}

where we have defined

\begin{equation*} \bar{A} = A + \sum_{i \,:\, \lambda_{i} = 0} q_{i} q_{i}^{T} \end{equation*}

Note that \(\bar{A}\) is nonsingular.
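A short sketch of this construction, using the semi-definite matrix from Example 2 below and a tolerance of our own choosing to detect zero eigenvalues:

import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # positive semi-definite
lam, Q = np.linalg.eigh(A)

Abar = A.copy()
for i, eig in enumerate(lam):
    if abs(eig) < 1e-10:
        # Add q_i q_i^T for each zero eigenvalue
        Abar += np.outer(Q[:, i], Q[:, i])

print(Abar)                         # [[ 1.5 -0.5] [-0.5  1.5]]
print(np.linalg.matrix_rank(Abar))  # 2, so Abar is nonsingular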

The following examples illustrate these cases for several quadratic functions.

Examples

Example 1

Consider the quadratic function

\begin{equation*} f(x) = x_{1}^2 + x_{2}^2 + x_{1} x_{2} \end{equation*}

The \(A\) matrix and \(b\) vector for this case are

\begin{equation*} A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \\ \end{bmatrix} \quad\quad b = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{equation*}

The eigenvalues for this case are \(\lambda_{1} = 1\) and \(\lambda_{2} = 3\), so \(A\) is positive definite. Since \(b = 0\), the unique minimizer is the point \(x^{*} = (0, 0)\).
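As a check, the eigenvalues follow from the characteristic polynomial:

\begin{equation*} \det(A - \lambda I) = (2 - \lambda)^{2} - 1 = \lambda^{2} - 4\lambda + 3 = (\lambda - 1)(\lambda - 3) = 0 \end{equation*}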

Example 2

Consider the quadratic function

\begin{equation*} f(x) = \frac{1}{2} x_{1}^2 + \frac{1}{2} x_{2}^2 - x_{1} x_{2} \end{equation*}

The \(A\) matrix and \(b\) vector for this case are

\begin{equation*} A = \begin{bmatrix} 1 & -1 \\ -1 & 1 \\ \end{bmatrix} \quad\quad b = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{equation*}

The eigenvalues of the matrix \(A\) are \(\lambda_{1} = 0\), \(\lambda_{2} = 2\). Therefore the matrix is positive semi-definite. Since \(b\) is zero, there is an infinite line of minimizers. The matrix of eigenvectors is

\begin{equation*} Q = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \end{equation*}

The line of minimizers is given by

\begin{equation*} x^{*} = \alpha q_{1} = \dfrac{\alpha}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \end{equation*}

where \(\alpha \in \mathbb{R}\) is arbitrary. Note that the non-singular matrix \(\bar{A}\) is

\begin{equation*} \bar{A} = A + q_{1} q_{1}^{T} = \begin{bmatrix} 1 & -1 \\ -1 & 1 \\ \end{bmatrix} + \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix} \end{equation*}

Example 3

Consider the quadratic function

\begin{equation*} f(x) = \frac{1}{2} x_{1}^2 + \frac{1}{2} x_{2}^2 - x_{1} x_{2} + x_{1} + x_{2} \end{equation*}

The \(A\) matrix and \(b\) vector for this case are

\begin{equation*} A = \begin{bmatrix} 1 & -1 \\ -1 & 1 \\ \end{bmatrix} \quad\quad b = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \end{equation*}

The \(A\) matrix is the same as in Example 2. Since \(\lambda_{1} = 0\) and \(q_{1}^{T} b = \sqrt{2} \ne 0\), the function is unbounded from below. There is no minimizer.
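To see this directly, move along the null direction \(q_{1}\): the quadratic term vanishes because \(A q_{1} = 0\), leaving

\begin{equation*} f(t q_{1}) = \frac{1}{2} t^{2} q_{1}^{T} A q_{1} + t\, q_{1}^{T} b = \sqrt{2}\, t \to -\infty \quad \text{as } t \to -\infty \end{equation*}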

Example 4

Consider the quadratic function

\begin{equation*} f(x) = \frac{1}{2} x_{1}^2 + \frac{1}{2} x_{2}^2 - x_{1} x_{2} + x_{1} - x_{2} \end{equation*}

The \(A\) matrix and \(b\) vector for this case are

\begin{equation*} A = \begin{bmatrix} 1 & -1 \\ -1 & 1 \\ \end{bmatrix} \quad\quad b = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \end{equation*}

The \(A\) matrix is the same as in Example 2. However, since \(\lambda_{1} = 0\) and \(q_{1}^{T} b = 0\), there is an infinite line of minimizers.

To find this line, we can first solve for a point along the line as

\begin{equation*} x^{*} = - \bar{A}^{-1} b = - 2 \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix}^{-1} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = - \frac{1}{4} \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} -1 \\ 1 \end{bmatrix} \end{equation*}

However, we can move along the direction \(q_{1}\) without changing the value of the quadratic. As a result, the line of minimizers is

\begin{equation*} x^{*} = \frac{1}{2} \begin{bmatrix} -1 \\ 1 \end{bmatrix} + \dfrac{\alpha}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \end{equation*}
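We can verify numerically that \(f\) takes the same value everywhere on this line (a minimal sketch; the sample values of \(\alpha\) are arbitrary):

import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])
b = np.array([1.0, -1.0])
xstar = np.array([-0.5, 0.5])
q1 = np.array([1.0, 1.0]) / np.sqrt(2.0)

def f(x):
    return 0.5 * x @ A @ x + b @ x

for alpha in (-2.0, 0.0, 3.0):
    print(f(xstar + alpha * q1))   # -0.5 for every alpha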

Example 5

Consider the quadratic function

\begin{equation*} f(x) = 2 x_{1}^2 - 2 x_{2}^2 + x_{1} x_{2} \end{equation*}

The \(A\) matrix and \(b\) vector for this case are

\begin{equation*} A = \begin{bmatrix} 4 & 1 \\ 1 & -4 \\ \end{bmatrix} \quad\quad b = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{equation*}

The eigenvalues can be obtained by solving

\begin{equation*} \det(A - \lambda I) = -(4 - \lambda)(4 + \lambda) - 1 = \lambda^{2} - 17 = 0 \end{equation*}

Therefore the eigenvalues are \(\lambda_{1} = -\sqrt{17}\) and \(\lambda_{2} = \sqrt{17}\), and the matrix \(A\) is indefinite. The critical point at the origin is a saddle point, and the function is unbounded from below. There is no minimizer.

import numpy as np
import matplotlib.pyplot as plt

def quadratic_decomposition(A, b, tol=1e-6):
    # Perform the eigenvalue decomposition
    lam, Q = np.linalg.eigh(A)

    # Compute bar{b} = Q^{T}*b
    bbar = np.dot(Q.T, b)

    if lam[0] > tol:
        print('Positive definite; Unique minimizer')
        xstar = -np.linalg.solve(A, b)
    elif lam[0] > -tol:
        # The problem is semi-definite

        # Keep track of whether this problem is unbounded from below or not
        unbounded = False

        # Copy the values of A to a new matrix
        Abar = np.array(A, dtype=float)

        for index, eig in enumerate(lam):
            if np.fabs(eig) < tol:
                # Add the outer product of the two vectors to make the matrix
                # Abar non-singular
                Abar += np.outer(Q[:, index], Q[:, index])

                # Keep track of whether bbar is zero or not
                if np.fabs(bbar[index]) > tol:
                    unbounded = True

        if not unbounded:
            print('Semi-definite; Infinite number of minimizers')
            xstar = -np.linalg.solve(Abar, b)
        else:
            print('Semi-definite; Quadratic unbounded')
            xstar = None
    else:
        # The smallest eigenvalue is negative
        print('Indefinite; Quadratic unbounded')
        xstar = None

    if xstar is not None:
        print('xstar = ', xstar)

    if b.shape[0] == 2:
        m = 50
        x1 = np.linspace(-2, 2, m)
        x2 = np.linspace(-2, 2, m)
        X1, X2 = np.meshgrid(x1, x2)

        Fx = np.zeros(X1.shape)
        Fxi = np.zeros(X1.shape)

        for j in range(m):
            for i in range(m):
                x = np.array([X1[i,j], X2[i,j]])
                Fx[i, j] = 0.5*np.dot(x, np.dot(A, x)) + np.dot(b, x)
                Fxi[i, j] = 0.5*np.sum(lam*x**2) + np.dot(bbar, x)

        fig, ax = plt.subplots(1, 2)
        ax[0].contour(X1, X2, Fxi)
        ax[1].contour(X1, X2, Fx)
        if xstar is not None:
            ax[1].plot([xstar[0]], [xstar[1]], 'bo')
        ax[0].set_aspect('equal', 'box')
        ax[0].set_title('Eig. space')

        ax[1].set_aspect('equal', 'box')
        ax[1].set_title('Design space')
        fig.tight_layout()
        plt.show()

# Positive definite example
A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.zeros(2)
quadratic_decomposition(A, b)

# Semi-definite examples
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
b = np.zeros(2)
quadratic_decomposition(A, b)

b = np.array([1.0, 1.0])
quadratic_decomposition(A, b)

b = np.array([1.0, -1.0])
quadratic_decomposition(A, b)

# Indefinite example
A = np.array([[4.0, 1.0], [1.0, -4.0]])
b = np.zeros(2)
quadratic_decomposition(A, b)
Positive definite; Unique minimizer
xstar =  [-0. -0.]
Semi-definite; Infinite number of minimizers
xstar =  [-0. -0.]
Semi-definite; Quadratic unbounded
Semi-definite; Infinite number of minimizers
xstar =  [-0.5  0.5]
Indefinite; Quadratic unbounded

[Each call also produces side-by-side contour plots of the quadratic in the eigenvector space and the design space.]