This post deals with understanding the necessary and sufficient conditions, fundamental Lipschitz continuity assumptions and the terminal boundary conditions imposed on the Hamilton-Jacobi equation to assure that the problem of minimizing an integral performance index is well-posed.

Problem Statement

Suppose we have the following nonlinear dynamical system

\begin{equation} \label{eq:system} \dot{x} =f(x, u, t), \qquad \qquad x(t_0) = x_0 \end{equation}

which starts at state, \(x_0\) and time, \(t_0\).

Assumption I

If the function \(f(\centerdot)\) is continuously differentiable in all its arguments, then the initial value problem (IVP) of \eqref{eq:system} has a unique solution on a finite time interval; this is a sufficient assumption (Khalil, 1976).

Assumption II

\(T\) is sufficiently small enough to reside within the time interval where the system’s solutions are defined.

Qualitatively, our goal is to optimally control the system when it starts in a state \(x_0\), at time \(t_0\), to a neighborhood of the terminal manifold \(T\), whilst exerting as minimal a control energy as possible. Quantitatively, we can define this goal in terms of an index of performance evaluation defined thus:

\begin{equation} \label{eq:cost} J = J(x(t_0, u(\centerdot), t_0) = \int\limits_{t=t_0}^{T} L\left(x(\tau), u(\tau), \tau\right)d\tau + V(x(T)) \end{equation}

where \(J\) is evaluated along the trajectories of the system \(x(t)\), based on an applied control \(u(\centerdot)|_{t_0 \le t \le T} \). With \(L\left(x(\tau), u(\tau), \tau\right)\) as the instantaneous cost and \(V(x(T))\) as the terminal cost (which are nonnegative funtions of their arguments), we can think of \(J\) as the total amount of actions we take (controls) and the state energy utilized in bearing the states from \(x_0\) to a neighborhood of the terminal manifold \(V(x(T)) = 0\).

The question to ask then is that given the cost of performance index \(J\), how do we find a control law \(u^\star\) that is optimal along a unique state trajectory, \(x^\star\), in the interval \([t_0, T]\)? This optimal cost would be the minimum of all the possible costs that we could possibly incur when we implement the optimal control law \(u^\star\). Mathematically, we can express this cost as:

\begin{gather} J^\star(x(t_0), t_0) = \int\limits_{t=t_0}^{T} L \left(x^{\star}(\tau), u^\star(\tau), \tau \right) d\tau + V(x^\star(T))
= \min_{ u_{[t_0, T]}} J(x_0, u, t_0) \end{gather}

Therefore, the optimal cost is a function of the starting state and time so that we can write:

\begin{equation} J^\star(x(t_0), t_0) = \min_{ u_{[t_0, T]}} J(x(t_0), u(\centerdot), t_0) = \min_{ u_{[t_0, T]}} \int\limits_{t_0}^{T} L\left(x(\tau), u(\tau), \tau\right)d\tau + V(x(T)) \end{equation}

Now, assume that we start at an arbitrary initial condition \(x\), at time \(t\), it follows that the optimal cost-to-go from \(x(t)\) to \(x(T)\) is (abusing notation and dropping the templated arguments in \(J\)):

\begin{equation} \label{eq:cost-to-go} J^\star(x, t) = \min_{ u_{[t, T]}} \left[\int\limits_{t}^{T} L\left(x(\tau), u(\tau), \tau\right)d\tau\right] + V(x(T)) \end{equation}

Things get a little bit interesting when we splice up the integral kernel in \eqref{eq:cost-to-go} along two different time-paths, namely:

\begin{equation} \label{eq:spliced} J^\star(x, t) = \min_{ u_{[t, T]}} \left[\int\limits_{t}^{t_1} L\left(x(\tau), u(\tau), \tau\right)d\tau + \int\limits_{t_1}^{t_2} L\left(x(\tau), u(\tau), \tau\right)d\tau\right] + V(x(T)) \end{equation}

We can split the minimization over two time intervals, e.g.,

\begin{equation} \label{eq:two_mins} J^\star(x, t) = \min_{ u_{[t, t_1]}} \min_{ u_{[t_1, t_2]}} \left[\int\limits_{t}^{t_1} L\left(x(\tau), u(\tau), \tau\right)d\tau + \int\limits_{t_1}^{t_2} L\left(x(\tau), u(\tau), \tau\right)d\tau\right] + V(x(T)) \end{equation}

Equation \eqref{eq:two_mins} gives the beautiful intuition that one can divide the integration into two or more time slices, solve the optimal control problem for each time slice and in the overall, minimize the effective cost function \(J\) of the overall system. This in essence is a statement of Richard E. Bellman’s principle of optimality:

Bellman’s Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

– Bellman, Richard. Dynamic Programming, 1957, Chap. III.3.

With the principle of optimality, the problem takes a more intuitive meaning, namely that the cost to go from \(x\) at time \(t\) to a terminal state \(x(T)\) can be computed by minimizing the sum of the cost to go from \(x = x(t)\) to \(x_1 = x(t_1)\) and then, the optimal cost-to-go from \(x_1\) onwards.

Therefore, \eqref{eq:two_mins} can be restated as:

\begin{equation} \label{eq:two_mins_sep} J^\star(x, t) = \min_{ u_{[t, t_1]}} \left[\int\limits_{t}^{t_1} L\left(x(\tau), u(\tau), \tau\right)d\tau + \underbrace{\min_{ u_{[t_1, t_2]}} \int\limits_{t_1}^{t_2} L\left(x(\tau), u(\tau), \tau\right)d\tau + V(x(T))}_{J^\star(x_1, \, t_1)} \right] \end{equation}

\(J^\star(x_1, \, t_1)\) in \eqref{eq:two_mins_sep} can be seen as the optimal cost-to-go from \(x_1\) to \(x(T)\), with the overall cost given by

\begin{equation} \label{eq:optimal_pre} J^\star(x, t) = \min_{ u_{[t, t_1]}} \left[\int\limits_{t}^{t_1} L\left(x(\tau), u(\tau), \tau\right)d\tau + J^\star(x_1, \, t_1) \right] \end{equation}

Replacing \(t_1\) by \(t + \delta t\) and with the assumption that \(J^\star(x, t)\) is differentiable, we can expand \eqref{eq:optimal_pre} into a first-order Taylor series around \((\delta t, x)\) as follows:

\begin{equation} \label{eq:taylor} J^\star(x, t) = \min_{ u_{[t, t + \delta]}} \left[ L\left(x, u, \tau\right)\delta t + J^\star(x, \, t) + \left(\dfrac{\partial J^\star(x, t)}{\partial t}\right) \delta t + \left(\dfrac{\partial J^\star(x, t)}{\partial x} \right) \delta x + o(\delta) \right] \end{equation}

where \(o(\delta)\) denotes higher order terms satisfying \(\lim_{\delta \rightarrow 0}\dfrac{o(\delta)}{\delta} = 0\).

Refactoring \eqref{eq:taylor}, we find that

\begin{equation} \label{eq:hamiltonian_pre} \dfrac{\partial J^\star(x, t)}{\partial t} = -\min_{ u_{[t, t + \delta]}} \left[ L\left(x, u, \tau\right) + \left(\dfrac{\partial J^\star(x, t)}{\partial x} \right) \underbrace{\dot{x}(\centerdot)}_{f(x,u,t)} \right] \end{equation}

We shall define the components in the square column of the above equation as the Hamiltonian, \(H(\centerdot)\) such that \eqref{eq:hamiltonian_pre} can be thus rewritten:

\begin{equation} \label{eq:hamiltonian} \dfrac{\partial J^\star(x, t)}{\partial t} = -\min_{ u_{[t, t + \delta]}} H\left(x, \nabla_x J^\star (x, t), u, t \right) \end{equation}

Based on the smoothness assumption of all function arguments in \eqref{eq:system}, when the linear sensitivity of the Hamiltonian to changes in \(u\) is zero, then \(\nabla H_u\) must vanish at the optimal point i.e.,

\begin{equation} \label{eq:hamiltonian_deri} \nabla H_u(x, \nabla_x J^\star (x, t), u, t) = 0 \end{equation}

ensuring that we satisfy the local optimality property of the controller. In addition, if the Hessian of the Hamiltonian is positive definite along the trajectories of the solution, i.e.,

\dfrac{\partial^2 H}{\partial^2 u} > 0 \end{equation}

then we have the sufficient condition for global optimality. These conditions are referred to as the Legendre-Clebsch conditions, essentially guaranteeing that over a singular arc, the Hamiltonian is minimized.

You begin to see the beauty of optimal control in that \eqref{eq:hamiltonian_deri} allows us to translate the complicated functional minimization integral of \eqref{eq:cost} into a minimization problem that can be solved by ordinary calculus.

If we let

H^\star(x, \nabla_x J^\star (x, t), t) = \min_u \left[H(x, \nabla_x J^\star (x, t), u, t)\right] \end{equation}

then it follows that solving \eqref{eq:hamiltonian_deri} for the optimal \(u = u^\star\) and putting the result in \eqref{eq:hamiltonian}, one obtains the Hamilton-Jacobi-Bellman pde whose solution is the optimal cost \(J^\star(x(t), t)\) such that

\begin{equation} \label{eq:optimal_cost} \dfrac{\partial J^\star(x, t)}{\partial t} = -H^\star \left(x, \nabla_x J^\star (x, t), u, t \right) \end{equation}

We can introduce a boundary condition that assures that the cost function of \eqref{eq:cost} is well-posed viz,

\begin{equation} \label{eq:boundary_cost} J^\star(x(T), T) = V(x(T)) \end{equation}

Taken together, equations \eqref{eq:optimal_cost} allows us to analytically solve for the instanteneous kinetic energy of the cost function in \eqref{eq:cost} and \eqref{eq:boundary_cost} allows us to solve for the boundary condition that assure the sufficiency of an optimal control law to exist. If we can solve for \(u^\star\) from \(J^\star(x,t)\), then \eqref{eq:boundary_cost} must constitute the optimal control policy for the nonlinear dynamical system in \eqref{eq:system} given the cost index \eqref{eq:cost}.


Notice that the optimal policy \(u^\star(t)\) is basically an open-loop control strategy. Why so? \(u^\star\) was derived as a function of time \(t\). As a result, the strategy may not be robust to uncertainties and may be very sensitive. For practical applications, we generally want to have a feedback control policy that is state dependent in order to guarantee robustness to parametric variations and achieve robust stability and performance. Such a \(u = u^\star(x)\) would be helpful in analyzing the stability of states and convergence of system dynamics to equilibrium for all future times. Will post such methods in the future.


Properties Equations
Dynamics: \(\dot{x} =f(x, u, t), \quad x(t_0) = x_0 \)
Cost: \(J(x,u,\tau) = \int\limits_{t=t_0}^{T} L\left(x(\tau), u(\tau), \tau\right)d\tau + V(x(T))\)
Optimal cost : \(J^\star(x,t) = \min_{u[t,T]}J\)
Hamiltonian: \(H(x,u,t) = L\left(x, u, \tau\right) + \left(\dfrac{\partial J^\star(x, t)}{\partial x} \right) f(x,u,t)\)
Optimal Control: \(u^\star(t) = H^\star(x,u,t) = \nabla H_u(x,u,t)\)
HJB Equation: \(-\dfrac{\partial J^\star(x,t)}{\partial t} = H^\star(x, \nabla_x J^\star(x,t),t) )\) and \(J^\star(x(T), T) = V(x(T))\)

Further Readings

Richard Bellman: On The Birth Of Dynamic Programming

Optimal Control: Linear Quadratic Methods