This post deals with understanding the necessary and sufficient conditions, fundamental Lipschitz continuity assumptions and the terminal boundary conditions imposed on the Hamilton-Jacobi equation to assure that the problem of minimizing an integral performance index is well-posed.

The Problem of Bolza

Suppose we have the following nonlinear dynamical system

$$\label{eq:system} \dot{x} =f(x, u, t), \qquad \qquad x(t_0) = x_0$$

which starts at state $x_0$ at time $t_0$.

Assumption I

If the function $f(\centerdot)$ is continuously differentiable in all its arguments, then the initial value problem (IVP) of \eqref{eq:system} has a unique solution on a finite time interval; this is a sufficient assumption (Khalil, 1976).

Assumption II

$T$ is small enough to lie within the time interval on which the system’s solutions are defined.

Qualitatively, our goal is to optimally steer the system from a state $x_0$ at a fixed time $t_0$ to a neighborhood of the terminal manifold at the final time $T$, while expending as little control energy as possible. Quantitatively, we can define this goal in terms of a performance index:

$$\label{eq:cost} J = J(x(t_0), u(\centerdot), t_0) = V(x(T)) + \int\limits_{t=t_0}^{T} L\left(x(\tau), u(\tau), \tau\right)d\tau.$$

Here $L\left(x(\tau), u(\tau), \tau\right)$ is the instantaneous (running) cost and $V(x(T))$ is the terminal cost; both are nonnegative functions of their arguments. We can think of $J$ as the total control effort and state energy spent in bringing the state from $x_0$ to a neighborhood of the terminal manifold, where the cost $V(x(T)) = 0$. In the above equation, the initial time $t_0$ is fixed, and the final time $T$ (also written $t_f$ in this post) may be a variable. Problems that involve a cost on only the initial and final states are called problems of Mayer; those that involve only the integral, or running, cost are problems of Lagrange; and those written out in the full form of \eqref{eq:cost} are the so-called problems of Bolza. In every case, $J$ is evaluated along the trajectories $x(t)$ of the system under an applied control $u(\centerdot)|_{t_0 \le t \le T}$.

Mayer problem: $J = x_0(T) + V(x(T), T)$, where $x_0$ here denotes an extra state component (not the initial state) with $\dot{x}_0 = L(x, u, t)$ and $x_0(t_0) = 0$. Lagrange problem: $J = \int\limits_{t=t_0}^{T} L\left(x(\tau), u(\tau), \tau\right)d\tau$ with $V(x(T), T) = 0$.
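To make the performance index concrete, here is a minimal sketch that approximates $J$ for a given control signal using forward Euler integration and a left Riemann sum. The function name `bolza_cost` and the scalar test problem ($\dot{x} = u$, $L = u^2$, $V = x^2$) are assumptions chosen for illustration; they do not come from the derivation above.

```python
from typing import Callable, List

Vec = List[float]

def bolza_cost(f: Callable[[Vec, float, float], Vec],
               L: Callable[[Vec, float, float], float],
               V: Callable[[Vec], float],
               u: Callable[[float], float],
               x0: Vec, t0: float, T: float, steps: int = 1000) -> float:
    """Approximate J = V(x(T)) + integral of L(x, u, t) over [t0, T] for a given u(t)."""
    dt = (T - t0) / steps
    x = list(x0)
    running = 0.0
    for k in range(steps):
        t = t0 + k * dt
        ut = u(t)
        running += L(x, ut, t) * dt                           # left Riemann sum of the running cost
        x = [xi + dt * fi for xi, fi in zip(x, f(x, ut, t))]  # forward Euler step of the dynamics
    return V(x) + running

# Hypothetical scalar problem (an assumption, not from the text): x' = u, L = u^2, V = x^2.
dynamics = lambda x, u, t: [u]
running_cost = lambda x, u, t: u * u
terminal_cost = lambda x: x[0] ** 2

J_idle = bolza_cost(dynamics, running_cost, terminal_cost, lambda t: 0.0, [1.0], 0.0, 1.0)
J_push = bolza_cost(dynamics, running_cost, terminal_cost, lambda t: -0.5, [1.0], 0.0, 1.0)
print(J_idle, J_push)
```

Doing nothing leaves the full terminal cost in place, while a modest constant push trades some control energy for a smaller terminal cost, so `J_push` comes out lower than `J_idle`.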

The question to ask, then, is: given the performance index $J$, how do we find a control law $u^\star$ that is optimal along a unique state trajectory $x^\star$ in the interval $[t_0, T]$? The optimal cost is the minimum over all the costs we could incur, and it is attained when we implement the optimal control law $u^\star$. Mathematically, we can express this cost as:

\begin{gather} J^\star(x(t_0), t_0) = \int\limits_{t=t_0}^{T} L \left(x^{\star}(\tau), u^\star(\tau), \tau \right) d\tau + V(x^\star(T))
= \min_{ u_{[t_0, T]}} J(x_0, u, t_0). \end{gather}

Therefore, the optimal cost is a function of the starting state and time so that we can write:

$$J^\star(x(t_0), t_0) = \min_{ u_{[t_0, T]}} J(x(t_0), u(\centerdot), t_0) = \min_{ u_{[t_0, T]}} \int\limits_{t_0}^{T} L\left(x(\tau), u(\tau), \tau\right)d\tau + V(x(T)).$$

Now suppose that we start at an arbitrary initial condition $x$ at time $t$. It follows that the optimal cost-to-go from $x(t)$ to $x(T)$ is (abusing notation and dropping the templated arguments in $J$):

$$\label{eq:cost-to-go} J^\star(x, t) = \min_{ u_{[t, T]}} \left[\int\limits_{t}^{T} L\left(x(\tau), u(\tau), \tau\right)d\tau + V(x(T))\right].$$

Things get a little more interesting when we split the integral in \eqref{eq:cost-to-go} at an intermediate time $t_1$, with $t < t_1 < t_2 = T$, namely:

$$\label{eq:spliced} J^\star(x, t) = \min_{ u_{[t, T]}} \left[\int\limits_{t}^{t_1} L\left(x(\tau), u(\tau), \tau\right)d\tau + \int\limits_{t_1}^{t_2} L\left(x(\tau), u(\tau), \tau\right)d\tau + V(x(T))\right].$$

We can split the minimization over two time intervals, e.g.,

$$\label{eq:two_mins} J^\star(x, t) = \min_{ u_{[t, t_1]}} \min_{ u_{[t_1, t_2]}} \left[\int\limits_{t}^{t_1} L\left(x(\tau), u(\tau), \tau\right)d\tau + \int\limits_{t_1}^{t_2} L\left(x(\tau), u(\tau), \tau\right)d\tau + V(x(T))\right].$$

Equation \eqref{eq:two_mins} gives the beautiful intuition that one can divide the integration into two or more time slices, solve the optimal control problem on each slice, and thereby minimize the overall cost function $J$ of the system. This is, in essence, a statement of Richard E. Bellman’s principle of optimality:

Bellman’s Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

– Bellman, Richard. Dynamic Programming, 1957, Chap. III.3.

With the principle of optimality, the problem takes a more intuitive meaning, namely that the cost-to-go from $x$ at time $t$ to a terminal state $x(T)$ can be computed by minimizing the sum of the cost to go from $x = x(t)$ to $x_1 = x(t_1)$ and then, the optimal cost-to-go from $x_1$ onwards.

Therefore, \eqref{eq:two_mins} can be restated as:

$$\label{eq:two_mins_sep} J^\star(x, t) = \min_{ u_{[t, t_1]}} \left[\int\limits_{t}^{t_1} L\left(x(\tau), u(\tau), \tau\right)d\tau + \underbrace{\min_{ u_{[t_1, t_2]}} \int\limits_{t_1}^{t_2} L\left(x(\tau), u(\tau), \tau\right)d\tau + V(x(T))}_{J^\star(x_1, \, t_1)} \right].$$

$J^\star(x_1, \, t_1)$ in \eqref{eq:two_mins_sep} can be seen as the optimal cost-to-go from $x_1$ to $x(T)$, so that the overall cost can be written as

$$\label{eq:optimal_pre} J^\star(x, t) = \min_{ u_{[t, t_1]}} \left[\int\limits_{t}^{t_1} L\left(x(\tau), u(\tau), \tau\right)d\tau + J^\star(x_1, \, t_1) \right].$$
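The recursion in \eqref{eq:optimal_pre} can be sketched numerically by stepping backward from the terminal cost on a state grid. The following is a minimal illustration, assuming a scalar state, a finite set of candidate controls, forward Euler dynamics and linear interpolation of the tabulated cost-to-go; the helper names `interp` and `dp_cost_to_go` and the test problem ($\dot{x} = u$, $L = u^2$, $V = x^2$) are hypothetical.

```python
def interp(x_min, h, values, x):
    """Linearly interpolate tabulated values at x, clamping x to the grid."""
    n = len(values)
    s = min(max((x - x_min) / h, 0.0), n - 1.0)
    i = min(int(s), n - 2)
    w = s - i
    return (1.0 - w) * values[i] + w * values[i + 1]

def dp_cost_to_go(f, L, V, x_min, x_max, nx, controls, t0, T, steps):
    """Tabulate J*(x, t0) via J*(x, t) = min_u [L dt + J*(x + f dt, t + dt)]."""
    h = (x_max - x_min) / (nx - 1)
    xs = [x_min + i * h for i in range(nx)]
    dt = (T - t0) / steps
    J = [V(x) for x in xs]                       # start from the terminal cost at t = T
    for k in reversed(range(steps)):
        t = t0 + k * dt
        # One stage of running cost plus the already-optimal cost-to-go from the next state.
        J = [min(L(x, u, t) * dt + interp(x_min, h, J, x + f(x, u, t) * dt)
                 for u in controls)
             for x in xs]
    return x_min, h, J

# Hypothetical scalar problem (an assumption): x' = u, L = u^2, V = x^2 on [0, 1].
controls = [-2.0 + 0.05 * j for j in range(81)]
grid_min, grid_h, J0 = dp_cost_to_go(lambda x, u, t: u,
                                     lambda x, u, t: u * u,
                                     lambda x: x * x,
                                     -2.0, 2.0, 81, controls, 0.0, 1.0, 20)
print(interp(grid_min, grid_h, J0, 1.0))
```

For this particular problem the continuous-time optimal cost from $x = 1$ at $t = 0$ is $0.5$, so the printed value should land slightly above that, with the gap coming from the grid and the control discretization.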

Replacing $t_1$ by $t + \delta t$ and with the assumption that $J^\star(x, t)$ is differentiable, we can expand \eqref{eq:optimal_pre} into a first-order Taylor series around $(x, t)$ as follows:

$$\label{eq:taylor} J^\star(x, t) = \min_{ u_{[t, t + \delta t]}} \left[ L\left(x, u, t\right)\delta t + J^\star(x, \, t) + \left(\dfrac{\partial J^\star(x, t)}{\partial t}\right) \delta t + \left(\dfrac{\partial J^\star(x, t)}{\partial x} \right) \delta x + o(\delta) \right]$$

where $o(\delta)$ denotes higher order terms satisfying $\lim_{\delta \rightarrow 0}\dfrac{o(\delta)}{\delta} = 0$.

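Cancelling $J^\star(x, t)$ from both sides of \eqref{eq:taylor}, dividing through by $\delta t$ and letting $\delta t \rightarrow 0$ (so that $\delta x / \delta t \rightarrow \dot{x}$), we find that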

$$\label{eq:hamiltonian_pre} \dfrac{\partial J^\star(x, t)}{\partial t} = -\min_{ u(t)} \left[ L\left(x, u, t\right) + \left(\dfrac{\partial J^\star(x, t)}{\partial x} \right) \underbrace{\dot{x}(\centerdot)}_{f(x,u,t)} \right],$$

which is in some sense a statement of the Hamilton-Jacobi (HJ) equation. As $L(x(t), u(t), t)$ is a functional, the HJ equation is not precisely a partial differential equation (p.d.e.) but a mixture of functionals and p.d.e.’s. We shall define the terms in the square brackets of the above equation as the Hamiltonian, $H(\centerdot)$, so that \eqref{eq:hamiltonian_pre} can be rewritten as:

$$\label{eq:hamiltonian} \dfrac{\partial J^\star(x, t)}{\partial t} = -\min_{ u(t)} H\left(x, \nabla_x J^\star (x, t), u, t \right).$$

Under the smoothness assumptions on all function arguments in \eqref{eq:system}, and with $u$ unconstrained, a minimizing control must make the first-order sensitivity of the Hamiltonian to changes in $u$ zero, so that $\nabla H_u$ vanishes at the optimal point, i.e.,

$$\label{eq:hamiltonian_deri} \nabla H_u(x, \nabla_x J^\star (x, t), u, t) = 0$$

ensuring that we satisfy the first-order (local) optimality property of the controller. In addition, if the Hessian of the Hamiltonian is positive definite along the trajectories of the solution, i.e.,

$$\dfrac{\partial^2 H}{\partial u^2} > 0,$$

then we have a sufficient condition for local optimality. These conditions are referred to as the Legendre-Clebsch conditions, essentially guaranteeing that along an extremal arc the Hamiltonian is minimized, i.e. the functional (in the Bolza problem) attains an extremum if the Hessian of the Hamiltonian is positive definite at all points of the extremal arc for all systems of variations consistent with the constraining equations.

You begin to see the beauty of optimal control in that \eqref{eq:hamiltonian_deri} allows us to translate the complicated functional minimization integral of \eqref{eq:cost} into a minimization problem that can be solved by ordinary calculus (i.e. a function minimization problem). If we let

$$H^\star(x, \nabla_x J^\star (x, t), t) = \min_u \left[H(x, \nabla_x J^\star (x, t), u, t)\right]$$

then it follows that solving \eqref{eq:hamiltonian_deri} for the optimal $u = u^\star$ and putting the result in \eqref{eq:hamiltonian}, one obtains the Hamilton-Jacobi-Bellman pde whose solution is the optimal cost $J^\star(x(t), t)$ such that

$$\label{eq:optimal_cost} \dfrac{\partial J^\star(x, t)}{\partial t} = -H^\star \left(x, \nabla_x J^\star (x, t), t \right).$$

We can introduce a boundary condition that assures that the cost function of \eqref{eq:cost} is well-posed viz,

$$\label{eq:boundary_cost} J^\star(x(T), T) = V(x(T)).$$

Taken together, \eqref{eq:optimal_cost} and \eqref{eq:boundary_cost} determine the optimal cost of \eqref{eq:cost}: the p.d.e. \eqref{eq:optimal_cost} propagates $J^\star$ backward in time, and \eqref{eq:boundary_cost} supplies the terminal boundary condition that makes the problem well-posed. If we can solve for $J^\star(x,t)$ and recover $u^\star$ from \eqref{eq:hamiltonian_deri}, then $u^\star$ constitutes the optimal control policy for the nonlinear dynamical system in \eqref{eq:system} given the cost index \eqref{eq:cost}.
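As a sanity check on \eqref{eq:optimal_cost} and \eqref{eq:boundary_cost}, consider a scalar linear-quadratic problem, which is an assumption made here for illustration: $\dot{x} = ax + bu$, $L = qx^2 + ru^2$, $V = sx^2$. Trying $J^\star(x,t) = P(t)x^2$, stationarity gives $u^\star = -(bP/r)x$ and the HJB equation collapses to the scalar Riccati equation $-\dot{P} = q + 2aP - b^2P^2/r$ with $P(T) = s$. The sketch below integrates it backward with Euler steps; the function names are hypothetical.

```python
def riccati_backward(a, b, q, r, s, t0, T, steps=2000):
    """Integrate -dP/dt = q + 2 a P - (b^2 / r) P^2 backward from P(T) = s; return P(t0)."""
    dt = (T - t0) / steps
    P = s                                                  # boundary condition J*(x, T) = s x^2
    for _ in range(steps):
        P += dt * (q + 2.0 * a * P - (b * b / r) * P * P)  # Euler step from t to t - dt
    return P

def optimal_feedback(b, r, P, x):
    """u* = -(b / (2 r)) * dJ*/dx with J* = P x^2, i.e. u* = -(b P / r) x."""
    return -(b * P / r) * x

# Hypothetical data: x' = u, L = u^2, V = x^2 on [0, 1], so a = 0, b = 1, q = 0, r = 1, s = 1.
P_start = riccati_backward(0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0)
print(P_start, optimal_feedback(1.0, 1.0, P_start, 1.0))
```

For these values the Riccati equation has the closed-form solution $P(t) = 1/(1 + T - t)$, so `P_start` should approach $0.5$ as `steps` grows, and the feedback at $x = 1$ should approach $-0.5$.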

Optimality Conditions via Calculus of Variations

We can also derive the necessary and sufficient conditions by performing calculus of variations on the cost function of \eqref{eq:cost} subject to the constraint on the final state (where I have replaced $T$ by $t_f$):

\begin{align} \bar{V}(x(t_f), t_f)& = 0 \end{align}

with the smooth mapping $\bar{V}(x(t_f), t_f): \mathbb{R}^n \times \mathbb{R} \rightarrow \mathbb{R}^p, \; p \geq 2n+2$. We can rewrite the problem of Bolza using Lagrange multipliers $\lambda \in \mathbb{R}^p \text{ and } p(t) \in \mathbb{R}^n$ as follows,

\begin{align} \label{eq:modified_cost} \tilde{J} = V(x(t_f), t_f) + \lambda^T \bar{V}(x(t_f), t_f) + \int\limits_{t=t_0}^{t_f} \left[L\left(x(\tau), u(\tau), \tau \right) + p^T\left(f(x(\tau), u(\tau), \tau) - \dot{x}(\centerdot)\right)\right] d\tau \end{align}

where by the Legendre transformation, we may define the Hamiltonian as

\begin{align} H(x, p, u, t) = L(x, u, t) + p^T f(x, u, t). \end{align}

Assuming independent variations in $\delta u(\centerdot), \delta x(\centerdot), \delta p(\centerdot), \delta \lambda \text{ and } \delta t_f$, the variation of \eqref{eq:modified_cost} becomes

\begin{align} \delta \tilde{J} = \left(\dfrac{\partial V}{\partial x} + \dfrac{\partial \bar{V}^T}{\partial x}\lambda \right) \delta x|_{t_f} + \left(\dfrac{\partial V}{\partial t_f} + \dfrac{\partial \bar{V}^T}{\partial t_f}\lambda \right) \delta t|_{t_f} + \bar{V}^T \delta \lambda + \left(H-p^T \dot{x}\right) \delta t|_{t_f} + \nonumber \end{align}

\begin{align} \int\limits_{t_0}^{t_f} \left[ \dfrac{\partial H}{\partial x} \delta x + \dfrac{\partial H}{\partial u} \delta u - p^T \delta \dot{x} + \left(\dfrac{\partial H^T}{\partial p} - \dot{x}\right)^T \delta p \right] d\tau \end{align}

whereupon, integrating $\int p^T \delta\dot{x} d\tau$ by parts, we find that

\begin{align} \delta \tilde{J} = \left(\dfrac{\partial V}{\partial x} + \dfrac{\partial \bar{V}^T}{\partial x}\lambda -p^T\right) \delta x({t_f}) + \left(\dfrac{\partial V}{\partial t_f} + \dfrac{\partial \bar{V}^T}{\partial t_f}\lambda + H\right) \delta t({t_f}) + \bar{V}^T \delta \lambda + \nonumber \end{align}

\begin{align} \label{eq:cost-var} \int\limits_{t_0}^{t_f} \left[ \left(\dfrac{\partial H}{\partial x} + \dot{p}^T \right) \delta x + \dfrac{\partial H}{\partial u} \delta u + \left(\dfrac{\partial H^T}{\partial p} - \dot{x}\right)^T \delta p \right] d\tau. \end{align}

For the independent variations $\delta \lambda, \delta x, \delta u, \delta p$, an optimal value of the modified cost $\tilde{J}$ occurs when the variation in \eqref{eq:cost-var} becomes zero, i.e. $\delta \tilde{J}=0$, so that we have the following necessary conditions for optimality:

| Variation | Name | Equation |
| --- | --- | --- |
| $\delta \lambda$ | Final state constraint | $\bar{V}(x(t_f), t_f) = 0$ |
| $\delta p$ | State equation | $\dot{x} = \dfrac{\partial H^T}{\partial p} \triangleq f(x, u, t)$ |
| $\delta x$ | Co-state equation | $\dot{p} = -\dfrac{\partial H^T}{\partial x}$ |
| $\delta u$ | Input stationarity | $\dfrac{\partial H}{\partial u} =0$ |
| $\delta x\vert_{t_f}$ | Boundary condition | $\dfrac{\partial V}{\partial x} - p^T = -\dfrac{\partial \bar{V}^T}{\partial x} \lambda\vert_{t_f}$ |
| $\delta t_f$ | Boundary condition | $H + \dfrac{\partial V}{\partial t_f} = -\dfrac{\partial \bar{V}^T}{\partial t_f} \lambda \vert_{t_f}$ |

These necessary conditions may be split into the following categories viz.,

Optimality conditions:

\begin{align} \dot{x} = \dfrac{\partial H^T}{\partial p} \nonumber \end{align}

\begin{align} \dot{p} = -\dfrac{\partial H^T}{\partial x}. \end{align}

Stationarity condition:

\begin{align} \dfrac{\partial H (x, u^\star, p)}{\partial u} = 0. \end{align}

Terminal constraint: \begin{align} \bar{V}(x(t_f), t_f) = 0. \end{align}

And the transversality condition:

\begin{align} \dfrac{\partial V}{\partial x} - p^T = -\dfrac{\partial \bar{V}^T}{\partial x} \lambda|_{t_f} \nonumber \end{align}

\begin{align} H + \dfrac{\partial V}{\partial t_f} = -\dfrac{\partial \bar{V}^T}{\partial t_f} \lambda |_{t_f}. \end{align}

To find the sufficient condition, we could take a second variation of $\tilde{J}$, but given its messiness, we instead use the necessary conditions to guide us toward a sufficient condition for a trajectory $(x^\star(\centerdot), u^\star(\centerdot), p^\star(\centerdot))$ to be a local minimum. Essentially, as before, local minimality requires that the Hessian of the Hamiltonian,

\begin{align} \label{eq:Hessian} \dfrac{ \partial^2 H(x^\star, u^\star, p^\star, t)}{\partial u^2} \end{align}

be positive definite along the optimal trajectory. A sufficient condition for this is to require positive definiteness of the Hessian for all arguments, i.e.

\begin{align}
\dfrac{\partial^2 H(x, u, p, t)}{\partial u^2} > 0. \end{align}

For global minimality, a sufficient condition is that $u^\star$ minimizes the Hamiltonian over all admissible $u$, i.e.

\begin{align} u^\star(t) = \arg \min_u H(x^\star(t), u, p^\star(t), t). \end{align}

If the final time $t_f$ is fixed, then we have the fixed final time problem, where there is no variation $\delta t_f$, and only the boundary condition on $\delta x|_{t_f}$ remains:

\begin{align} p^T(t_f) = \dfrac{\partial V}{\partial x} + \dfrac{\partial \bar{V}^T}{\partial x} \lambda|_{t_f}. \end{align}

And when no final state constraint exists, the boundary condition becomes

\begin{align} p(t_f) = \dfrac{\partial V^T}{\partial x}|_{t_f}. \end{align}
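The state equation, co-state equation, stationarity condition and the terminal condition $p(t_f) = \partial V^T / \partial x$ together form a two-point boundary value problem, which can be attacked by shooting on the unknown initial co-state. The sketch below does this for a scalar linear-quadratic problem chosen purely as an assumption ($\dot{x} = ax + bu$, $L = qx^2 + ru^2$, $V = sx^2$, $t_0 = 0$, fixed $t_f = T$); `shoot` and `solve_initial_costate` are hypothetical names.

```python
def shoot(p0, x0, a, b, q, r, T, steps=1000):
    """Integrate state and co-state forward with RK4 from (x0, p0); return (x(T), p(T))."""
    def rhs(x, p):
        u = -b * p / (2.0 * r)                        # stationarity: dH/du = 2 r u + b p = 0
        return a * x + b * u, -(2.0 * q * x + a * p)  # x' = dH/dp, p' = -dH/dx
    dt = T / steps
    x, p = x0, p0
    for _ in range(steps):
        k1 = rhs(x, p)
        k2 = rhs(x + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1])
        k3 = rhs(x + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1])
        k4 = rhs(x + dt * k3[0], p + dt * k3[1])
        x += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        p += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return x, p

def solve_initial_costate(x0, a, b, q, r, s, T):
    """Find p(0) such that the terminal condition p(T) = dV/dx = 2 s x(T) holds."""
    def residual(p0):
        xT, pT = shoot(p0, x0, a, b, q, r, T)
        return pT - 2.0 * s * xT
    # For this linear-quadratic problem the residual is affine in p0,
    # so a single secant step through two trial values lands on the root.
    g0, g1 = residual(0.0), residual(1.0)
    return -g0 / (g1 - g0)

# Hypothetical data: x' = u, L = u^2, V = x^2, x(0) = 1, T = 1.
p_init = solve_initial_costate(1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0)
print(p_init, -0.5 * p_init)   # initial co-state and the corresponding initial control
```

For a genuinely nonlinear system the residual is no longer affine in the initial co-state, so the single secant step would have to be replaced by an iterative root finder.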

Time-Invariant Systems

When the dynamics of \eqref{eq:system} and the running cost $L(x, u, t)$ have no explicit dependence on time, and neither the terminal cost nor the terminal constraint depends explicitly on $t_f$, the necessary conditions reduce to

| Variation | Name | Equation |
| --- | --- | --- |
| $\delta p$ | State equation | $\dot{x} = \dfrac{\partial H^T}{\partial p} \triangleq f(x, u)$ |
| $\delta x$ | Co-state equation | $\dot{p} = -\dfrac{\partial H^T}{\partial x} = - \dfrac{\partial f^T}{\partial x} p - \dfrac{\partial L^T}{\partial x}$ |
| $\delta u$ | Input stationarity | $\dfrac{\partial H^T}{\partial u} = \dfrac{\partial L^T}{\partial u} + \dfrac{\partial f^T}{ \partial u} p = 0$ |
| $\delta x\vert_{t_f}$ | Boundary condition | $\dfrac{\partial V}{\partial x} - p^T = -\dfrac{\partial \bar{V}^T}{\partial x} \lambda$ |
| $\delta t_f$ | Boundary condition | $H(t_f) = 0$ |

Moreover, it is trivial to show that the total derivative of the Hamiltonian is given by the expression:

\begin{align} \dfrac{dH^\star}{dt} = \dfrac{\partial H^\star}{\partial x}(x, p)\dot{x} + \dfrac{\partial H^\star}{\partial p}(x, p)\dot{p} = 0. \end{align}
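The cancellation follows from substituting $\dot{x} = \partial H^{\star T}/\partial p$ and $\dot{p} = -\partial H^{\star T}/\partial x$ into the total derivative. A quick numerical check is sketched below for an assumed scalar time-invariant problem ($\dot{x} = ax + bu$, $L = qx^2 + ru^2$), for which $H^\star(x, p) = qx^2 + apx - b^2p^2/(4r)$ after substituting $u^\star = -bp/(2r)$; the function names are hypothetical.

```python
def hamiltonian(x, p, a, b, q, r):
    """H*(x, p) = q x^2 + a p x - b^2 p^2 / (4 r), i.e. H with u* = -b p / (2 r) substituted."""
    return q * x * x + a * p * x - (b * b) * p * p / (4.0 * r)

def hamiltonian_drift(x0, p0, a, b, q, r, T, steps=2000):
    """Integrate x' = dH*/dp, p' = -dH*/dx with RK4; return max |H*(t) - H*(0)| on [0, T]."""
    def rhs(x, p):
        return a * x - (b * b) * p / (2.0 * r), -(2.0 * q * x + a * p)
    dt = T / steps
    x, p = x0, p0
    h0 = hamiltonian(x, p, a, b, q, r)
    drift = 0.0
    for _ in range(steps):
        k1 = rhs(x, p)
        k2 = rhs(x + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1])
        k3 = rhs(x + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1])
        k4 = rhs(x + dt * k3[0], p + dt * k3[1])
        x += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        p += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        drift = max(drift, abs(hamiltonian(x, p, a, b, q, r) - h0))
    return drift

# Hypothetical data: a = b = q = r = 1, starting from (x, p) = (1, 0.5), horizon T = 2.
max_drift = hamiltonian_drift(1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 2.0)
print(max_drift)   # small: only integration and rounding error remain
```

Note that the check does not need the boundary conditions to hold: any solution of the state and co-state equations with the stationarity condition imposed keeps $H^\star$ constant when there is no explicit time dependence.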

Connections with Classical Mechanics

Most of the context in this section comes from my education from V.I. Arnold’s Mathematical Methods of Classical Mechanics book

(Under development.)

Example Optimal Control using the Hamilton-Jacobi-Bellman Equation

Here, we’ll show by way of two examples how to solve the HJB pde+functional equation for a simple second-order system.

Example 1: Consider the system of equations

\begin{align} \dot{x}_1 = x_1 + 0.8x_2 \end{align}

\begin{align} \dot{x}_2 = -x_1 -x_2 + x_1^2 x_2 + u \nonumber \end{align}

where $x(0) = [3, 5]^T$ and the performance index given by

\begin{align} J = \int \limits_0^5 (x_1^2 + u^2) dt + x_1^4(5) + x_2^4(5). \end{align}

We see that $L(x, u, t) = x_1^2 + u^2 \text{ and } V(x(T)) = x_1^4(5) + x_2^4(5)$. If we write $z = [x_1^2(5), x_2^2(5)]^T$, then it follows that $V(x(T)) = z^T z$.

In addition, we can write the following:

Optimal cost: $J^\star(x, t) = \min_{u_{[0, 5]}} J(x, u, t)$

Hamiltonian: $H(x, u, t) = L(x, u, t) + \left(\dfrac{\partial J^\star(x,t)}{\partial x}\right)\dot{x} := x_1^2 +u^2 + \dfrac{\partial J^\star}{\partial x_1}\dot{x}_1 + \dfrac{\partial J^\star}{\partial x_2}\dot{x}_2.$

By the stationarity condition, $\nabla H_u$ must vanish at the minimum point so that $\nabla H_u = 2u + \dfrac{\partial J^\star}{\partial x_2} = 0$ or $u^\star = -\dfrac{1}{2}\dfrac{\partial J^\star}{\partial x_2}$.

Now, putting $u = u^\star$ back in the Hamiltonian, we obtain the HJB equation

$\dfrac{\partial J^\star}{\partial t} = -H^\star(x, \nabla_x J^\star(x,t), t)$

i.e.

\begin{align} H^\star(x, \nabla_x J^\star, t) = x_1^2 + \dfrac{1}{4} \left(\dfrac{\partial J^\star}{\partial x_2}\right)^2 + \dfrac{\partial J^\star}{\partial x_1} \left(x_1 + 0.8 x_2\right) + \dfrac{\partial J^\star}{\partial x_2} \left(-x_1 - x_2 +x_1^2x_2 - \dfrac{1}{2}\dfrac{\partial J^\star}{\partial x_2}\right) \end{align}

and

\begin{align} \dfrac{\partial J^\star}{\partial t} = -x_1^2 - \dfrac{\partial J^\star}{\partial x_1} \left(x_1 + 0.8 x_2\right) - \dfrac{\partial J^\star}{\partial x_2} \left(-x_1 - x_2 +x_1^2x_2\right) + \dfrac{1}{4}\left(\dfrac{\partial J^\star}{\partial x_2}\right)^2 \end{align}

with boundary condition

\begin{align} J^\star(x(5), 5) = V(x(5)) = x_1^4(5) + x_2^4(5). \end{align}
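The stationarity step in this example can be checked numerically without knowing $J^\star$: for any trial values of the gradient, $u^\star = -\tfrac{1}{2}\,\partial J^\star/\partial x_2$ should minimize the Hamiltonian over $u$. The sketch below uses the dynamics as stated at the start of the example ($\dot{x}_1 = x_1 + 0.8x_2$); the gradient values `J1`, `J2` are made-up numbers for illustration, and the function names are hypothetical.

```python
def hamiltonian_ex1(x1, x2, J1, J2, u):
    """H = L + (dJ*/dx) f for Example 1, with J1 = dJ*/dx1 and J2 = dJ*/dx2."""
    return (x1 ** 2 + u ** 2
            + J1 * (x1 + 0.8 * x2)
            + J2 * (-x1 - x2 + x1 ** 2 * x2 + u))

def u_star(J2):
    """Stationarity: dH/du = 2 u + J2 = 0."""
    return -0.5 * J2

def h_star_ex1(x1, x2, J1, J2):
    """Hamiltonian with u* substituted: the u^2 and J2*u terms combine into -J2^2 / 4."""
    return (x1 ** 2 - 0.25 * J2 ** 2
            + J1 * (x1 + 0.8 * x2)
            + J2 * (-x1 - x2 + x1 ** 2 * x2))

# Trial point: the initial state x(0) = [3, 5] with a made-up gradient (J1, J2) = (2, -4).
x1, x2, J1, J2 = 3.0, 5.0, 2.0, -4.0
u_opt = u_star(J2)
print(hamiltonian_ex1(x1, x2, J1, J2, u_opt), h_star_ex1(x1, x2, J1, J2))
print(hamiltonian_ex1(x1, x2, J1, J2, u_opt + 0.1) > hamiltonian_ex1(x1, x2, J1, J2, u_opt))
```

Because $H$ is a convex quadratic in $u$ with $\partial^2 H/\partial u^2 = 2 > 0$, the stationary point is the global minimizer over $u$, which is what the perturbation check confirms.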

Example 2: Consider the damped pendulum system of equations

\begin{align} \dot{x}_1 = x_2 \end{align}

\begin{align} \dot{x}_2 = 0.2x_2 + \sin(x_1) + u \nonumber \end{align}

with $x(0) = [0, 3]^T$ and the performance index given by

\begin{align} J = \int \limits_0^5 (x_1^2 + x_1x_2 + x_2^2 + u^2) dt + x_1(5)x_2(5)+ x_1^3(5)x_2^3(5). \end{align}

We see that $L(x, u, t) = x_1^2 + x_1x_2 + x_2^2 + u^2 \text{ and } V(x(T)) = x_1(5)x_2(5)+ x_1^3(5)x_2^3(5)$.

As before, we write

\begin{align} H = x_1^2 + x_1x_2 + x_2^2 + u^2 + \dfrac{\partial J^\star}{\partial x_1} \dot{x}_1+ \dfrac{\partial J^\star}{\partial x_2} \dot{x}_2 \end{align}

\begin{align} \triangleq x_1^2 + x_1x_2 + x_2^2 + u^2 + \dfrac{\partial J^\star}{\partial x_1} x_2+ \dfrac{\partial J^\star}{\partial x_2} \left(0.2x_2 + \sin(x_1) + u\right) \end{align}

The stationarity condition shows that $u^\star = -\dfrac{1}{2}\dfrac{\partial J^\star}{\partial x_2}$, so that we can rewrite the Hamiltonian as

\begin{align} H^\star = x_1^2 + x_1x_2 + x_2^2 + \dfrac{1}{4}\left(\dfrac{\partial J^\star}{\partial x_2}\right)^2 + \dfrac{\partial J^\star}{\partial x_1} x_2 + \dfrac{\partial J^\star}{\partial x_2} \left(0.2x_2 + \sin(x_1) -\dfrac{1}{2}\dfrac{\partial J^\star}{\partial x_2} \right) \end{align}

with boundary condition

\begin{align} J^\star(x(T), T) = x_1(5) x_2(5) + x_1^3(5) x_2^3(5). \end{align}

Summary

| Properties | Equations |
| --- | --- |
| Dynamics | $\dot{x} =f(x, u, t), \quad x(t_0) = x_0$ |
| Cost | $J(x_0,u,t_0) = \int\limits_{t=t_0}^{T} L\left(x(\tau), u(\tau), \tau\right)d\tau + V(x(T))$ |
| Optimal cost | $J^\star(x,t) = \min_{u_{[t,T]}}J$ |
| Hamiltonian | $H(x,u,t) = L\left(x, u, t\right) + \left(\dfrac{\partial J^\star(x, t)}{\partial x} \right) f(x,u,t)$ |
| Optimal control | $u^\star(t) = \arg\min_u H(x,u,t)$, which satisfies $\nabla H_u(x,u^\star,t) = 0$ |
| HJB equation | $-\dfrac{\partial J^\star(x,t)}{\partial t} = H^\star(x, \nabla_x J^\star(x,t),t)$ and $J^\star(x(T), T) = V(x(T))$ |

Changelog

• First Date: June 04, 2017
• Added Calculus of Variations Derivation: June 29, 2021
• Added HJB Examples: June 29, 2021