Lekan Molu

Optimality vs. Stability of Feedback Control Systems.

Wed, 22 Aug 2018 13:10:00 +0000

Table of Contents
Introduction
Stability vs optimality
Optimality

Intro

The question of the connection between optimality and stability is a curious thing. On the one hand, we are led to believe that if we can find an optimal control law that executes a plan with the least amount of energy possible, then we have fulfilled the specifications posed in our objective function. But consider: an optimal controller is not necessarily a stabilizing controller for a system. Why so? I lay out my case in the next section.

Stability vs optimality

Systems under the influence of optimal control laws enjoy a nice set of properties, provided that the associated cost functional enforces a constraint that is desirable on the state and control. LQ optimal control systems have nice gain and phase margins coupled with reduced sensitivity and I understand that there are similar properties that have been shown for nonlinear systems. Optimal control has the attractive property that the control effort is not wasted in mitigating the effects of nonlinearities as it chooses among a set of policies (or stabilizing control sequences) that yield a desirable effect on the system. The intractability of the HJB equation however makes optimal control as a synthesis tool for nonlinear problems a painful one.

Enter Lyapunov stability. Lyapunov defines classical stability in terms of the system’s behavior near an equilibrium point: for every real $\epsilon > 0$ there exists a real number $\delta(\epsilon, t_0)>0$ such that a state starting within $\delta$ of the equilibrium stays within $\epsilon$ of it. This is essentially a local stability concept, a scalar bound that expresses how far away a system could ever get from the equilibrium (based on how far away it started). As engineers, we do not want to limit ourselves to this local stability context. We want every motion starting sufficiently close to the equilibrium state to converge to the equilibrium as time approaches infinity. Asymptotic stability captures this need. But again, asymptotic stability is also a local concept, since we do not know a priori how large the neighborhood of admissible initial states is. Enter equiasymptotic stability in the large. For an $r>0$ that is fixed and arbitrarily large, we find that as $t \rightarrow \infty$, all motions converge to the equilibrium uniformly in the initial state from which they start for $||x_0|| \le r$.

Note that all these definitions merely impose a constraint on the behavior of the states as they evolve over the trajectories of the system. That begs the question, can a control law be stable, yet not optimal (or vice versa)? I think so. Why?

Optimality

This section has an update based on what I found in Freeman and Kokotovic’s 1996 paper in the SIAM Journal on Control and Optimization. Please skip to the updated part.

Optimality, as Bellman would have us think, deals with reaching the goal state with as little energy as possible. I would think that the principle of optimality and Lyapunov stability have a fundamental disconnect. It seems to me that we may find an optimal control law that is not stabilizing (i.e., its associated function $V(x)$ does not strictly decrease along the trajectories of the solution to the dynamic system’s differential equation).

To buttress this point, consider that the concepts of stability and optimality appeared in the consciousness of control theorists at two distinct and disconnected eras (so to speak) in history. On the one hand, Lyapunov’s thesis was published in Russia in the 1890s, but his work was not available in English until 1947. Even so, western researchers did not adequately grasp its usefulness until Kalman’s 1960 seminal paper on the second method of Lyapunov. Meanwhile, Bellman’s last formal work on DP and applied DP was not published until 1962. What is more intriguing is that nowhere in Bellman’s stability tests (as far as I can tell from what I have read from his books) did he use the rigor of Lyapunov analysis to establish the stability of his principle-of-optimality methods. Kalman remarked in his 1960 paper that few researchers were aware of Lyapunov methods. We can make a fairly accurate “guesstimation” that had Bellman been aware of Lyapunov’s analyses earlier, they might have crept into his optimality analyses.

~~I had an exchange with someone about this a while ago, and I am quoting the caveats they expressed in their agreement with my observation below.~~

~~1) If optimality is concerned only with the cost from initial condition to final condition, a control law that makes the system unstable might be desirable as unstable systems tend to be very fast.~~

2) The problem is what happens when you reach the final condition? An unstable system will not stop there, but will overshoot the goal and go off to infinity. So you must have the ability to switch to a stabilizing controller when you reach the goal.

An example is in fighter aircraft. I understand that they become unstable during certain maneuvers such as tight turns so they can move very fast, but then “catch” themselves and stabilize before going too far from the equilibrium.

UPDATE [Aug 28, 2018]

Most of the discussions below are drawn from Freeman and Kokotovic’s ¹ 1996 work on pointwise min-norm control laws for robust control Lyapunov functions (rclfs).

They provide an optimality-based method for choosing a stabilizing control law once an rclf is known without resorting to cancellation or domination of nonlinear terms, which do not necessarily possess the desirable properties of optimality and may lead to poor robustness and wasted control effort.

The value function for a meaningful optimal stabilization problem is a Lyapunov function for the closed-loop system.

- Every meaningful value function is a Lyapunov function (Freeman and Kokotovic, 1996).
- Every Lyapunov function is a meaningful value function: every Lyapunov function for every stable closed-loop system is also a value function for a meaningful optimal stabilization problem.

Both bullets above are important: the first helps with the analysis of the stability of an optimal feedback control system, while the second has implications for its synthesis.
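The claim that an optimal value function is a Lyapunov function can be checked numerically in the linear-quadratic case. The sketch below is only an illustration under assumed data: the matrices `A`, `B`, `Q` and `R` are hypothetical and not taken from Freeman and Kokotovic. It solves the algebraic Riccati equation with SciPy and verifies that $V(x) = x^T P x$ is positive definite and decreases along the closed-loop trajectories.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical open-loop unstable plant (eigenvalues of A are +1 and -1).
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)          # assumed state weight
R = np.array([[1.0]])  # assumed control weight

# Value function of the infinite-horizon LQ problem: J*(x) = x^T P x.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)   # optimal feedback u = -K x
A_cl = A - B @ K                  # closed-loop dynamics

# Along closed-loop trajectories, d/dt (x^T P x) = x^T (A_cl^T P + P A_cl) x.
lyap_rate = A_cl.T @ P + P @ A_cl

p_eigs = np.linalg.eigvalsh(P)
rate_eigs = np.linalg.eigvalsh((lyap_rate + lyap_rate.T) / 2.0)
cl_eigs = np.linalg.eigvals(A_cl)

print("eigenvalues of P:        ", p_eigs)
print("eigenvalues of dV/dt form:", rate_eigs)
print("closed-loop poles:       ", cl_eigs)
```

For this problem the Riccati equation implies $A_{cl}^T P + P A_{cl} = -(Q + K^T R K)$, so the decrease of $V$ is exactly the running cost, which is the sense in which the value function doubles as a Lyapunov function here.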

Every robust control lyapunov function (rclf) is a meaningful upper value function
- Every rclf solves the Hamilton-Jacobi-Isaacs (HJI) equation associated with a meaningful game. For a known rclf, a feedback law that is optimal w.r.t. a meaningful cost functional can be constructed. In fact, this can be accomplished without solving the HJI equation for the upper value function and without constructing a cost functional, since the optimal feedback can be calculated directly from the rclf. Such control laws are called pointwise min-norm control laws, and each one inherits the desirable properties of optimality because every pointwise min-norm control law is optimal for a meaningful game.
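To make the idea of a pointwise min-norm law concrete, here is a minimal sketch for the disturbance-free, single-input case $\dot{x} = f(x) + g(x)u$. Everything in it is an assumption made for illustration: the scalar plant $\dot{x} = -x^3 + u$, the clf $V(x) = x^2/2$ and the required decay rate $\sigma(x) = x^2$ are my choices, and the formula is the standard smallest-magnitude solution of the inequality $L_f V + L_g V \, u \le -\sigma(x)$ rather than the full robust construction in the paper.

```python
import numpy as np

def f(x):
    return -x**3          # drift term; already helpful for large |x|

def g(x):
    return 1.0            # input gain

def sigma(x):
    return x**2           # required decay: dV/dt <= -sigma(x) (a design choice)

def min_norm_control(x):
    """Smallest |u| such that L_f V + L_g V * u <= -sigma(x), with V = x^2 / 2."""
    LfV = x * f(x)
    LgV = x * g(x)
    psi0 = LfV + sigma(x)
    if psi0 <= 0.0:
        return 0.0        # the drift alone meets the decay requirement
    return -psi0 * LgV / (LgV**2)

def cancellation_control(x):
    """Feedback-linearizing law that cancels -x^3 and imposes dx/dt = -x."""
    return x**3 - x

for x in (0.5, 2.0):
    print(x, min_norm_control(x), cancellation_control(x))
```

At $x = 2$ the min-norm law applies no control because $-x^3$ is already doing the work, whereas the cancellation law spends effort fighting a nonlinearity that was helping. This is the kind of wasted control effort the text refers to.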

Essentially, this task is an inverse optimal stabilization problem. For LTI systems, the solution involves choosing a candidate value function and then constructing a meaningful cost functional in order to make the HJB equation valid. For open-loop stable nonlinear systems, one can find a solution by choosing the candidate value function as a Lyapunov function for the open-loop system. For open-loop unstable systems, one can choose a candidate value function as a clf for the system. In their paper ¹, Freeman and Kokotovic actually solve the inverse optimal robust stabilization problem for systems with disturbances and show that every rclf is an upper value function for a meaningful differential game.

¹ Freeman, R. A., & Kokotovic, P. V. (1996). Inverse Optimality in Robust Stabilization. SIAM Journal on Control and Optimization, 34(4), 1365–1391. https://doi.org/10.1137/S0363012993258732

Control Commons

Sat, 04 Aug 2018 13:10:00 +0000

Table of Contents
Introduction
Definitions, Theorems, Lemmas etc
Nonlinear Control Theory
Stability

Intro

Here are a few control theorems, concepts and diagrams that I think every control student should know. I keep updating this post, so please check back from time to time.

Definitions, Theorems, Lemmas and such.

Nonlinear Control Theory

A differential equation of the form

\begin{align} dx/dt = f(x, u(t), t), \quad -\infty < t < +\infty \label{eq:diff_eq} \end{align}

is said to be free (or unforced) if $u(t) \equiv 0$ for all $t$. That is, \eqref{eq:diff_eq} becomes

\begin{align} dx/dt = f(x, t), \quad -\infty < t < +\infty \label{eq:unforced} \end{align}

If the differential equation in \eqref{eq:diff_eq} does not have an explicit dependence on time, but has an implicit dependence on time, through $u(t)$, then the system is said to be stationary. In other words, a dynamic system is stationary if

\begin{align} f(x, u(t), t) \equiv f(x, u(t)) \label{eq:stationary} \end{align}

A stationary system \eqref{eq:stationary} that is free is said to be invariant under time translation, i.e.

\begin{align} \Phi(t; x_0, t_0) = \Phi(t + \tau; x_0, t_0 + \tau) \label{eq:free_stat} \end{align}

- $\Phi(t; x_0, t_0)$ is the analytical solution to \eqref{eq:diff_eq}; it is generally interpreted as the solution of \eqref{eq:diff_eq}, with fixed $u(t)$, going through state $x_0$ at time $t_0$ and observed at time $t$ later on. This is a clearer way of representing the d.e.’s solution than $x(t)$, which is popularly used in most texts nowadays.

$\Phi(\cdot)$ is generally referred to as the transition function, since it maps $x(t_0)$ to $x(t)$.
For a physical system, $\Phi$ has to be continuous in all of its arguments.
If the rate of change $dE(x)/dt$ of the energy $E(x)$ of an isolated physical system is negative for every possible state $x$, except for a single equilibrium state $x_e$, then the energy will continually decrease until it finally assumes its minimum value $E(x_e)$.
The first method of Lyapunov deals with questions of stability using an explicit representation of the solutions of a differential equation
- Note that the second method is more of a historical misnomer, perhaps more accurately described as a philosophical point of view rather than a systematic method. Successful application requires the user’s ingenuity.
Contrary to popular belief, the energy of a system and a Lyapunov function are not the same. Why? Because the Lyapunov function, $V(x)$, is not unique. To quote Kalman, “a system whose energy $E$ decreases on the average, but not necessarily at each instant, is stable but $E$ is not necessarily a Lyapunov function.”
Lyapunov analysis and optimization: Suppose a performance index is defined to be the error criterion between a measured and an estimated signal; suppose further that this criterion is integrated w.r.t time, then the performance index is actually a Lyapunov function – provided that the error is not identically zero along any trajectory of the system.
Existence, uniqueness, and continuity theorem:

Let $f(x, t)$ be continuous in $x,t$, and satisfy a Lipschitz condition in some region about any state $x_0$ and time $t_0$:

\begin{align} R(x_0, t_0) = \lbrace (x, t) : ||x - x_0|| \le b(x_0), \quad |t - t_0| \le c(t_0) \rbrace, \quad (b, c) > 0 \end{align}

where the Lipschitz condition means that, for all $(x,t), (y,t)$ $\in$ $R(x_0, t_0)$, \begin{align} ||f(x,t) - f(y,t)|| \le k \, ||x-y|| \nonumber \end{align}

with $k>0$ depending on $b, c$. Then,

- there exists a unique solution $\Phi(t; x_0, t_0)$ of $dx/dt = f(x, t)$ that starts at $x_0, t_0$, for all $|t - t_0| \le a(t_0)$;
- $a(t_0) \ge \min \lbrace c(t_0), \, b(x_0)/M(x_0, t_0) \rbrace$, where $M(x_0, t_0)$ is the maximum assumed by the continuous function $||f(x,t)||$ in the closed, bounded set $R(x_0, t_0)$;
- in some small neighborhood of $x_0, t_0$, the solution is continuous in its arguments.

Observe that the Lipschitz condition implies continuity of $f$ in $x$ but not necessarily in $t$; the condition is itself implied by bounded derivatives of $f$ with respect to $x$. Note that the local Lipschitz condition required by the theorem only guarantees the desired properties of a solution near $x_0, t_0$.

The finite escape time quandary does not allow us to make conclusions surrounding arbitrarily large values of $t$. The phrase “finite escape time” describes a trajectory that escapes to infinity at a finite time, i.e., the solution leaves any compact set within a finite time. In order that a differential equation accurately represent a physical system, the possibility of finite escape time has to be ruled out by an explicit assumption to the contrary. If the Lipschitz condition holds for $f$ everywhere, then there can be no finite escape time. The proof is easy: with the equilibrium at the origin, so that $f(0, t) = 0$ and hence $||f(x,t)|| \le k \, ||x||$, integrate both sides of \eqref{eq:diff_eq} and use

\begin{align} ||\Phi(t; x_0, t_0)|| \le ||x_0|| + \left|\left| \int_{t_0}^{t}f(\Phi(\tau; x_0, t_0), \tau)d\tau \right|\right| \le ||x_0|| + k \int_{t_0}^{t} ||\Phi(\tau; x_0, t_0)|| d\tau \end{align}

where $f(\cdot)$ obeys the Lipschitz condition,

\begin{align} ||f(x,t) - f(y,t)|| \le k \, ||x-y||. \nonumber \end{align}

By the Gronwall-Bellman lemma,

\begin{align} ||\Phi(t; x_0, t_0) || \le [\exp \, k (t - t_0)] ||x_0 || \nonumber \end{align}

which is less than $\infty $ for any finite $(t - t_0)$.
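The Gronwall-Bellman bound can be sanity-checked numerically. The sketch below assumes a hypothetical linear system $dx/dt = Ax$ (my choice, picked because its transition function is available in closed form through the matrix exponential), for which the Lipschitz constant in the 2-norm is $k = ||A||_2$.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical unstable linear system dx/dt = A x; f(x) = A x is globally Lipschitz.
A = np.array([[0.0, 1.0],
              [-2.0, 0.5]])
k = np.linalg.norm(A, 2)          # Lipschitz constant of f in the 2-norm
x0 = np.array([1.0, -1.0])
t0 = 0.0

times = np.linspace(t0, 5.0, 51)
# ||Phi(t; x0, t0)|| computed from the exact transition matrix exp(A (t - t0)).
norms = np.array([np.linalg.norm(expm(A * (t - t0)) @ x0) for t in times])
# Gronwall-Bellman bound exp(k (t - t0)) ||x0||.
bounds = np.exp(k * (times - t0)) * np.linalg.norm(x0)

print("largest ratio norm/bound:", np.max(norms / bounds))
```

The trajectory grows, since this $A$ has eigenvalues with positive real part, but it never exceeds the exponential envelope, so it cannot escape to infinity in finite time.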

Stability

My definitions follow from R.E Kalman’s 1960 seminal paper since they are clearer to understand compared to the myriad of definitions that exist in many texts today. Stability concerns the deviation about some fixed motion. So, we will be considering the deviations from the equilibrium state $x_e$ of a free dynamic system.

Simply put, here is how Kalman defines stability: if \eqref{eq:diff_eq} is slightly perturbed from its equilibrium state at the origin, all subsequent motions remain in a correspondingly small neighborhood of the origin. Harmonic oscillators are a good example of this kind of stability. Lyapunov himself defines stability like so:

An equilibrium state $x_e$ of a free dynamic system is stable if for every real number $\epsilon>0$, there exists a real number $\delta(\epsilon, t_0)>0$ such that $||x_0 - x_e|| \le \delta$ implies

\begin{align} ||\Phi(t; x_0, t_0) - x_e|| \le \epsilon \quad \forall \quad t \ge t_0 \nonumber \end{align}

This is best imagined from the figure below:

Fig. 1. The basic concept of stability. Courtesy of R.E. Kalman

Put differently, the system trajectory can be kept arbitrarily close to the origin/equilibrium if we start the trajectory sufficiently close to it. If there is stability at some initial time, $t_0$, there is stability for any other initial time $t_1$, provided that all motions are continuous in the initial state.
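The harmonic oscillator mentioned above makes the $\epsilon$-$\delta$ definition easy to test. In this sketch the oscillator $\dot{x}_1 = \omega x_2$, $\dot{x}_2 = -\omega x_1$ and the values of `eps` and `omega` are assumptions chosen for illustration; its transition function is a rotation, so the choice $\delta = \epsilon$ works for every $\epsilon$.

```python
import numpy as np

def phi(t, x0, t0=0.0, omega=1.0):
    """Transition function of dx1/dt = omega * x2, dx2/dt = -omega * x1."""
    c = np.cos(omega * (t - t0))
    s = np.sin(omega * (t - t0))
    return np.array([[c, s], [-s, c]]) @ x0

eps = 0.1
delta = eps                                          # delta(eps) = eps suffices here
x0 = delta * np.array([np.cos(0.3), np.sin(0.3)])    # start on the boundary ||x0|| = delta

times = np.linspace(0.0, 50.0, 501)
dist = np.array([np.linalg.norm(phi(t, x0)) for t in times])

print("max distance from equilibrium:", dist.max())
print("min distance from equilibrium:", dist.min())
```

The distance from the origin never exceeds $\epsilon$, but it never shrinks either: the oscillator is stable without being asymptotically stable.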

Asymptotic stability: The requirement that we start sufficiently close to the origin and stay in the neighborhood of the origin is a rather limiting one in most practical engineering applications. We would want to require that our motion should return to equilibrium after any small perturbation. Thus, the classical definition of asymptotic stability is
- an equilibrium state $x_e$ of a free dynamic system is asymptotically stable if
  - it is stable and
  - every motion starting sufficiently near $x_e$ converges to $x_e$ as $t \rightarrow \infty$.
- put differently, there is some real constant $r(t_0)>0$ and to every real number $\mu > 0$ there corresponds a real number $T(\mu, x_0, t_0)$ such that $||x_0 - x_e|| \le r(t_0)$ implies
\begin{align} ||\Phi(t; x_0, t_0) - x_e|| \le \mu \quad \forall \quad t \ge t_0 + T \nonumber \end{align}

Fig. 1. Definition of asymptotic stability. Courtesy of R.E. Kalman

Asymptotic stability is also a local concept since we do not know a priori how small $r(t_0)$ should be. A stronger requirement is that, for motions starting at the same distance from $x_e$, none will remain at a larger distance than $\mu$ from $x_e$ at arbitrarily large values of time. Or to use Massera’s definition:

An equilibrium state $x_e$ of a free dynamic system is equiasymptotically stable if
- it is stable
- every motion starting sufficiently near $x_e$ converges to $x_e$ as $t \rightarrow \infty$, uniformly in $x_0$
- Interrelations between stability concepts: This I gleaned from Kalman’s 1960 paper on the second method of Lyapunov.
  
  Fig. 1. Interrelations between stability concepts. Courtesy of R.E. Kalman
For linear systems, stability is independent of the distance of the initial state from $x_e$. Nicely defined as such:
- an equilibrium state $x_e$ of a free dynamic system is asymptotically (equiasymptotically) stable in the large if
  - (i) it is stable, and
  - (ii) every motion converges to $x_e$ as $t \rightarrow \infty $ (respectively, every motion converges to $x_e$ uniformly in $x_0$ for $||x_0|| \le r$, where $r$ is fixed but arbitrarily large)
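As a small illustration of the time $T$ in these definitions, take the scalar system $dx/dt = -x$ with $x_e = 0$ (an example of my choosing, not one from Kalman's paper). Its transition function is $\Phi(t; x_0, t_0) = x_0 e^{-(t - t_0)}$, so for every $|x_0| \le r$ the single value $T = \max(0, \ln(r/\mu))$ works, independent of $x_0$ and $t_0$. That independence is exactly the uniformity required for equiasymptotic stability in the large.

```python
import math

def phi(t, x0, t0=0.0):
    """Transition function of dx/dt = -x."""
    return x0 * math.exp(-(t - t0))

def settling_time(mu, r):
    """A T(mu, r) valid for every |x0| <= r: T = max(0, ln(r / mu))."""
    return max(0.0, math.log(r / mu))

r, mu = 1.0e3, 1.0e-3          # arbitrarily large ball, small target neighborhood
T = settling_time(mu, r)

for x0 in (-r, -0.5 * r, r / 3.0, r):
    print(x0, abs(phi(T, x0)), abs(phi(T + 1.0, x0)))
```

Making $r$ larger only increases $T$ logarithmically, and no initial state inside the ball needs a longer time than the one on its boundary.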

To be Continued

What Good Research is Not

Thu, 02 Aug 2018 14:21:00 +0000

This post includes curated links to advice on doing good research from people I respect.

I hope you find time to enjoy reading them.

How to write a good research paper ~ Bill Freeman.

Many readers will skim over formulas on their first reading of your exposition. Therefore, your sentences should flow smoothly when all but the simplest formulas are replaced by “blah” or some other grunting noise.
How to do good research ~ Bill Freeman.

Sometimes it’s useful to think that everyone else is an idiot. This lets you do things that no one else is doing. It’s best not to be too vocal about that. You can say something like “Oh, I just thought I’d try out this direction”.
Elements of a Successful Graduate Career.

I think the most important thing in research is a story – not a theorem or an algorithm – but the story that makes the theorem or algorithm interesting and exciting. It’s important to have an “ear” for a good story… when do the stories make sense, when are they bogus? ~ Tomas Lozano-Perez.

The best students are possessed by a problem. They’re independent. They teach their advisors. They don’t do what they’re told…they do something more interesting. ~ Leslie Kaelbling.

Don’t tell your advisor you’re doing what they advised against until you’ve solved the problem. ~ Manolis Kellis.

Which brings us to the moral of the story: More important than your thesis topic is who your advisor is. ~ Charles Leiserson.

Eat, sleep, and breathe a problem until you crack it. Become the world’s foremost expert on your thesis topic. Surpass your advisor. ~ Daniel Jackson.
On how to write papers, Ted Adelson.

Start by stating which problem you are addressing, keeping the audience in mind. They must care about it, which means that sometimes you must tell them why they should care about the problem. Then state briefly what the other solutions are to the problem, and why they aren’t satisfactory. If they were satisfactory, you wouldn’t need to do the work. Then explain your own solution, compare it with other solutions, and say why it’s better. At the end, talk about related work where similar techniques and experiments have been used, but applied to a different problem. Since I developed this formula, it seems that all the papers I’ve written have been accepted.

Vladlen Koltun’s Advice

Picking a problem
- Formulate a larger goal.
- Personally meaningful.
- Fits into the scheme of collective progress.
Analyze bottlenecks.
Understand the state of the art.
Making a contribution:
- Read the papers.
- Look for unwarranted assumptions.
- What are the limitations? When will this break? How could this be done better?
Reimplement a state-of-the-art technique
- Reproduce the results.
- Then bombard it with controlled experiments.
- Look for surprises, cracks that lead to deeper realizations.
Be on the lookout for interesting contributions.
Many important findings are not what the researchers set out to find
- “Scheele happened upon chlorine while trying to isolate manganese; Claude Bernard planned experiments to characterize the destructive agent in sugar but instead discovered the glycogenic function of the liver; and so on.” ~ Ramón y Cajal, Letters to a Young Investigator
Publications
- Quality, not quantity.
- Do not compromise on methodology or ethics.
- Be willing to bury drafts and move on.
Publication portfolio
- Most are prosaic.
- Some are significant.
- None are sloppy.
High standards
- Bury the weak, boring, and sloppy results.
- Weak and sloppy work is a drain on the community. Can mislead. Goes against the goal of contributing something useful to the community.
- Quantity is easy. The community doesn’t need more quantity.
Research over time
- Research begets research.
- Keep track of favorite problems, revisit occasionally.
- Go back to the larger goals.
- Read. A lot.
- Write down ideas. Talk to people.
- Quiet time for reading, writing, thinking.
For Pete’s sake, get a good work ethic
- I do not believe a person can ever leave their business. They ought to think of it by day and dream of it by night. […] if they intend to go forward and do anything, the whistle is only a signal to start thinking over the day’s work in order to discover how it might be done better. […] The person who has the largest capacity for work and thought is the person who is bound to succeed. ~ Henry Ford, My Life and Work.
- In science as in the lottery, luck favors those who wager the most – that is, by another analogy, those who are tilling constantly the ground in their garden. ~ Ramón y Cajal, Letters to a Young Investigator.
- Successful people exhibit more activity, more energy, than most people do. They look more places, they work harder, they think longer than less successful people. Knowledge and ability are much like compound interest – the more you do the more you can do, and the more opportunities are open for you. ~ Hamming, Striving for Greatness in All You Do.

David Mermin.

Always punctuate your equations. Math is prose. Number all equations in your text. It helps your readers.

More Advice.

Daniel Liberzon Research Quotes.

How to write Mathematics.

+ [Alternative downloads website](downloads/Halmos.pdf)

Daniel Liberzon - How to write a good paper.

Daniel Liberzon - How to peer review.

George M. Whitesides – Writing a Paper.

Dmitri Bertsekas – Ten Simple Rules for Mathematical Writing.

Don Knuth – Mathematical Writing.

Don Knuth – The Elements of Mathematical Writing.

N. David Mermin – What’s Wrong With These Equations.

What's behind IEEE RAS Best Conference Papers?

Sat, 24 Jun 2017 09:15:00 +0000

Through the looking-glass, tired of working my brain out while familiarizing myself with a giant codebase I was reverse-engineering to prove a theory for an upcoming conference, I started asking myself why I was doing what I was doing? Just to get a paper out? Or to make a great contribution to science and my field? I googled something along the lines of “How to write a best IEEE RAS conference paper”. What I found was very interesting as I came across this IEEE Student Activities Committee Paper. It offers great insight into what constitutes writing a paper that merits IEEE Robotics and Automation Society best conference awards.

I hope you enjoy reading it as much as I did.

On the necessary and sufficient conditions for optimal controllers

Sun, 04 Jun 2017 13:28:00 +0000

This post deals with understanding the necessary and sufficient conditions, fundamental Lipschitz continuity assumptions and the terminal boundary conditions imposed on the Hamilton-Jacobi equation to assure that the problem of minimizing an integral performance index is well-posed.

Problem Statement

Suppose we have the following nonlinear dynamical system

\begin{equation} \label{eq:system} \dot{x} =f(x, u, t), \qquad \qquad x(t_0) = x_0 \end{equation}

which starts at state, $x_0$ and time, $t_0$.

Assumption I

If the function $f(\centerdot)$ is continuously differentiable in all its arguments, then the initial value problem (IVP) of \eqref{eq:system} has a unique solution on a finite time interval; this is a sufficient assumption (Khalil, 1976).

Assumption II

$T$ is sufficiently small to reside within the time interval where the system’s solutions are defined.

Qualitatively, our goal is to optimally drive the system from a state $x_0$, at time $t_0$, to a neighborhood of the terminal manifold at the final time $T$, whilst exerting as little control energy as possible. Quantitatively, we can define this goal in terms of an index of performance evaluation defined thus:

\begin{equation} \label{eq:cost} J = J(x(t_0), u(\centerdot), t_0) = \int\limits_{t_0}^{T} L\left(x(\tau), u(\tau), \tau\right)d\tau + V(x(T)) \end{equation}

where $J$ is evaluated along the trajectories of the system $x(t)$, based on an applied control $u(\centerdot)|_{t_0 \le t \le T} $. With $L\left(x(\tau), u(\tau), \tau\right)$ as the instantaneous cost and $V(x(T))$ as the terminal cost (both nonnegative functions of their arguments), we can think of $J$ as the total amount of control action and state energy expended in bearing the states from $x_0$ to a neighborhood of the terminal manifold $V(x(T)) = 0$.

The question to ask then is: given the performance index $J$, how do we find a control law $u^\star$ that is optimal along a unique state trajectory, $x^\star$, in the interval $[t_0, T]$? The optimal cost is the minimum of all the costs that we could possibly incur, and it is attained when we implement the optimal control law $u^\star$. Mathematically, we can express this cost as:

\begin{gather} J^\star(x(t_0), t_0) = \int\limits_{t=t_0}^{T} L \left(x^{\star}(\tau), u^\star(\tau), \tau \right) d\tau + V(x^\star(T))
= \min_{ u_{[t_0, T]}} J(x_0, u, t_0) \end{gather}

Therefore, the optimal cost is a function of the starting state and time so that we can write:

\begin{equation} J^\star(x(t_0), t_0) = \min_{ u_{[t_0, T]}} J(x(t_0), u(\centerdot), t_0) = \min_{ u_{[t_0, T]}} \int\limits_{t_0}^{T} L\left(x(\tau), u(\tau), \tau\right)d\tau + V(x(T)) \end{equation}

Now, assume that we start at an arbitrary initial condition $x$, at time $t$, it follows that the optimal cost-to-go from $x(t)$ to $x(T)$ is (abusing notation and dropping the templated arguments in $J$):

\begin{equation} \label{eq:cost-to-go} J^\star(x, t) = \min_{ u_{[t, T]}} \left[\int\limits_{t}^{T} L\left(x(\tau), u(\tau), \tau\right)d\tau\right] + V(x(T)) \end{equation}

Things get a little bit interesting when we split the integral in \eqref{eq:cost-to-go} over two consecutive time intervals, $[t, t_1]$ and $[t_1, t_2]$ with $t_2 = T$, namely:

\begin{equation} \label{eq:spliced} J^\star(x, t) = \min_{ u_{[t, T]}} \left[\int\limits_{t}^{t_1} L\left(x(\tau), u(\tau), \tau\right)d\tau + \int\limits_{t_1}^{t_2} L\left(x(\tau), u(\tau), \tau\right)d\tau\right] + V(x(T)) \end{equation}

We can split the minimization over two time intervals, e.g.,

\begin{equation} \label{eq:two_mins} J^\star(x, t) = \min_{ u_{[t, t_1]}} \min_{ u_{[t_1, t_2]}} \left[\int\limits_{t}^{t_1} L\left(x(\tau), u(\tau), \tau\right)d\tau + \int\limits_{t_1}^{t_2} L\left(x(\tau), u(\tau), \tau\right)d\tau\right] + V(x(T)) \end{equation}

Equation \eqref{eq:two_mins} gives the beautiful intuition that one can divide the integration into two or more time slices, solve the optimal control problem for each time slice and, in so doing, minimize the cost function $J$ of the overall system. This in essence is a statement of Richard E. Bellman’s principle of optimality:

Bellman’s Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

– Bellman, Richard. Dynamic Programming, 1957, Chap. III.3.

With the principle of optimality, the problem takes a more intuitive meaning, namely that the cost to go from $x$ at time $t$ to a terminal state $x(T)$ can be computed by minimizing the sum of the cost to go from $x = x(t)$ to $x_1 = x(t_1)$ and then, the optimal cost-to-go from $x_1$ onwards.

Therefore, \eqref{eq:two_mins} can be restated as:

\begin{equation} \label{eq:two_mins_sep} J^\star(x, t) = \min_{ u_{[t, t_1]}} \left[\int\limits_{t}^{t_1} L\left(x(\tau), u(\tau), \tau\right)d\tau + \underbrace{\min_{ u_{[t_1, t_2]}} \int\limits_{t_1}^{t_2} L\left(x(\tau), u(\tau), \tau\right)d\tau + V(x(T))}_{J^\star(x_1, \, t_1)} \right] \end{equation}

$J^\star(x_1, \, t_1)$ in \eqref{eq:two_mins_sep} can be seen as the optimal cost-to-go from $x_1$ to $x(T)$, with the overall cost given by

\begin{equation} \label{eq:optimal_pre} J^\star(x, t) = \min_{ u_{[t, t_1]}} \left[\int\limits_{t}^{t_1} L\left(x(\tau), u(\tau), \tau\right)d\tau + J^\star(x_1, \, t_1) \right] \end{equation}
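A discrete-time analogue makes the recursion in \eqref{eq:optimal_pre} easy to verify. The sketch below is an illustration under assumptions of my own: an integer state with dynamics $x_{k+1} = x_k + u_k$, controls in $\lbrace -1, 0, 1 \rbrace$, stage cost $x^2 + u^2$, terminal cost $x^2$ and a horizon of four steps. It computes the optimal cost-to-go by backward recursion and compares it with a brute-force minimization over every complete control sequence.

```python
from functools import lru_cache
from itertools import product

CONTROLS = (-1, 0, 1)   # admissible controls (assumed)
N = 4                   # horizon: number of decisions (assumed)

def step(x, u):
    return x + u

def stage_cost(x, u):
    return x * x + u * u

def terminal_cost(x):
    return x * x

@lru_cache(maxsize=None)
def cost_to_go(x, k):
    """Optimal cost from state x at stage k, using the principle of optimality."""
    if k == N:
        return terminal_cost(x)
    return min(stage_cost(x, u) + cost_to_go(step(x, u), k + 1) for u in CONTROLS)

def brute_force(x0):
    """Minimize over all complete control sequences at once."""
    best = float("inf")
    for seq in product(CONTROLS, repeat=N):
        x, total = x0, 0
        for u in seq:
            total += stage_cost(x, u)
            x = step(x, u)
        best = min(best, total + terminal_cost(x))
    return best

for x0 in range(-3, 4):
    print(x0, cost_to_go(x0, 0), brute_force(x0))
```

The two columns agree, which is the discrete counterpart of replacing the minimization over $u_{[t, T]}$ by a minimization over $u_{[t, t_1]}$ plus the optimal cost-to-go $J^\star(x_1, t_1)$.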

Replacing $t_1$ by $t + \delta t$ and with the assumption that $J^\star(x, t)$ is differentiable, we can expand \eqref{eq:optimal_pre} into a first-order Taylor series around $(x, t)$ as follows:

\begin{equation} \label{eq:taylor} J^\star(x, t) = \min_{ u_{[t, t + \delta]}} \left[ L\left(x, u, \tau\right)\delta t + J^\star(x, \, t) + \left(\dfrac{\partial J^\star(x, t)}{\partial t}\right) \delta t + \left(\dfrac{\partial J^\star(x, t)}{\partial x} \right) \delta x + o(\delta) \right] \end{equation}

where $o(\delta)$ denotes higher order terms satisfying $\lim_{\delta \rightarrow 0}\dfrac{o(\delta)}{\delta} = 0$.

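Since $J^\star(x, t)$ does not depend on $u$, it cancels from both sides of \eqref{eq:taylor}. Dividing what remains by $\delta t$ and letting $\delta t \rightarrow 0$, we find that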

\begin{equation} \label{eq:hamiltonian_pre} \dfrac{\partial J^\star(x, t)}{\partial t} = -\min_{ u_{[t, t + \delta]}} \left[ L\left(x, u, \tau\right) + \left(\dfrac{\partial J^\star(x, t)}{\partial x} \right) \underbrace{\dot{x}(\centerdot)}_{f(x,u,t)} \right] \end{equation}

We shall define the terms in the square brackets of the above equation as the Hamiltonian, $H(\centerdot)$, such that \eqref{eq:hamiltonian_pre} can be rewritten thus:

\begin{equation} \label{eq:hamiltonian} \dfrac{\partial J^\star(x, t)}{\partial t} = -\min_{ u_{[t, t + \delta]}} H\left(x, \nabla_x J^\star (x, t), u, t \right) \end{equation}

Based on the smoothness assumption of all function arguments in \eqref{eq:system}, a necessary condition for $u$ to minimize the Hamiltonian is that the linear sensitivity of the Hamiltonian to changes in $u$ is zero, so that $\nabla H_u$ must vanish at the optimal point i.e.,

\begin{equation} \label{eq:hamiltonian_deri} \nabla H_u(x, \nabla_x J^\star (x, t), u, t) = 0 \end{equation}

ensuring that we satisfy the first-order (local) optimality condition of the controller. In addition, if the Hessian of the Hamiltonian with respect to $u$ is positive definite along the trajectories of the solution, i.e.,

\begin{equation}
\dfrac{\partial^2 H}{\partial u^2} > 0 \end{equation}

then we have a sufficient condition for $u$ to be a local minimizer of the Hamiltonian. This is the strengthened Legendre-Clebsch condition. Over a singular arc, where the Hessian is only positive semidefinite, it does not hold and the minimization of the Hamiltonian has to be checked by other means.

You begin to see the beauty of optimal control in that \eqref{eq:hamiltonian_deri} allows us to translate the complicated functional minimization integral of \eqref{eq:cost} into a minimization problem that can be solved by ordinary calculus.
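A quick numerical check of this reduction to ordinary calculus, under assumed data: for a scalar problem with $f = ax + bu$ and $L = qx^2 + ru^2$, and with $\lambda$ standing in for $\partial J^\star / \partial x$ at the current state, the stationarity condition gives $u^\star = -b\lambda/(2r)$. All the numbers below are hypothetical values chosen for the sketch.

```python
import numpy as np

a, b, q, r = 1.0, 2.0, 1.0, 0.5   # hypothetical problem data
x, lam = 1.5, 0.8                  # current state and an assumed value of dJ*/dx

def hamiltonian(u):
    """H(x, lam, u) = L(x, u) + lam * f(x, u) for the scalar problem above."""
    return q * x**2 + r * u**2 + lam * (a * x + b * u)

# Ordinary calculus: dH/du = 2 r u + lam b = 0, and d2H/du2 = 2 r > 0.
u_star = -b * lam / (2.0 * r)

# Brute-force minimization of H over a fine grid of candidate controls.
u_grid = np.linspace(-10.0, 10.0, 200001)
u_numeric = u_grid[np.argmin(hamiltonian(u_grid))]

print("stationary point:", u_star)
print("grid minimizer:  ", u_numeric)
```

Because $2r > 0$, the stationary point is the minimizer, and the grid search lands on the same value up to the grid resolution.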

If we let

\begin{equation}
H^\star(x, \nabla_x J^\star (x, t), t) = \min_u \left[H(x, \nabla_x J^\star (x, t), u, t)\right] \end{equation}

then it follows that solving \eqref{eq:hamiltonian_deri} for the optimal $u = u^\star$ and putting the result in \eqref{eq:hamiltonian}, one obtains the Hamilton-Jacobi-Bellman pde whose solution is the optimal cost $J^\star(x(t), t)$ such that

\begin{equation} \label{eq:optimal_cost} \dfrac{\partial J^\star(x, t)}{\partial t} = -H^\star \left(x, \nabla_x J^\star (x, t), t \right) \end{equation}

We can introduce a boundary condition that assures that the cost function of \eqref{eq:cost} is well-posed viz,

\begin{equation} \label{eq:boundary_cost} J^\star(x(T), T) = V(x(T)) \end{equation}

Taken together, \eqref{eq:optimal_cost} describes how the optimal cost-to-go of \eqref{eq:cost} evolves in time, while \eqref{eq:boundary_cost} supplies the terminal boundary condition that makes the problem well-posed. If we can solve \eqref{eq:optimal_cost}, subject to \eqref{eq:boundary_cost}, for $J^\star(x,t)$, then the $u^\star$ that minimizes the Hamiltonian constitutes the optimal control policy for the nonlinear dynamical system in \eqref{eq:system} given the cost index \eqref{eq:cost}.
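As a worked illustration of these two equations, consider a scalar linear-quadratic instance of \eqref{eq:system} and \eqref{eq:cost}. The constants $a$, $b$, $q > 0$, $r > 0$ and $s \ge 0$ are hypothetical and serve only this sketch:

\begin{align} \dot{x} = a x + b u, \qquad L(x, u, t) = q x^2 + r u^2, \qquad V(x(T)) = s \, x(T)^2 \nonumber \end{align}

Trying the candidate $J^\star(x, t) = p(t) x^2$, so that $\partial J^\star / \partial x = 2 p(t) x$, the Hamiltonian is

\begin{align} H = q x^2 + r u^2 + 2 p(t) x \left(a x + b u\right) \nonumber \end{align}

Setting $\nabla H_u = 2 r u + 2 b \, p(t) x = 0$ gives $u^\star = -\dfrac{b \, p(t)}{r} x$, and substituting it back yields

\begin{align} H^\star = \left(q + 2 a \, p(t) - \dfrac{b^2 p(t)^2}{r}\right) x^2 \nonumber \end{align}

Since $\partial J^\star / \partial t = \dot{p}(t) x^2$, matching the coefficients of $x^2$ in \eqref{eq:optimal_cost} and applying \eqref{eq:boundary_cost} reduces the pde to a scalar Riccati differential equation that is integrated backward from $T$:

\begin{align} -\dot{p}(t) = q + 2 a \, p(t) - \dfrac{b^2 p(t)^2}{r}, \qquad p(T) = s \nonumber \end{align}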

Conclusions

Notice that the optimal policy $u^\star(t)$ is basically an open-loop control strategy. Why so? $u^\star$ was derived as a function of time $t$. As a result, the strategy may not be robust to uncertainties and may be very sensitive. For practical applications, we generally want to have a feedback control policy that is state dependent in order to guarantee robustness to parametric variations and achieve robust stability and performance. Such a $u = u^\star(x)$ would be helpful in analyzing the stability of states and convergence of system dynamics to equilibrium for all future times. Will post such methods in the future.

Summary

Properties	Equations
Dynamics:	$\dot{x} =f(x, u, t), \quad x(t_0) = x_0 $
Cost:	$J(x,u,\tau) = \int\limits_{\tau=t_0}^{T} L\left(x(\tau), u(\tau), \tau\right)d\tau + V(x(T))$
Optimal cost:	$J^\star(x,t) = \min_{u[t,T]}J$
Hamiltonian:	$H(x,u,t) = L\left(x, u, t\right) + \left(\dfrac{\partial J^\star(x, t)}{\partial x} \right) f(x,u,t)$
Optimal Control:	$u^\star(t) = \arg\min_u H(x,u,t)$, so that $\nabla H_u(x,u^\star,t) = 0$
HJB Equation:	$-\dfrac{\partial J^\star(x,t)}{\partial t} = H^\star(x, \nabla_x J^\star(x,t),t)$ and $J^\star(x(T), T) = V(x(T))$

Should I use ROS or MuJoCo?

Sun, 14 May 2017 13:28:00 +0000

This was my answer to a question posted in an email thread to our research group’s email lists. The question goes like this:

QUESTION
______________
From: XXX@uni-x.edu
Sent: Sunday, May 14, 2017 9:29 AM
To: XXX@lists.uni-x.edu
Subject: RE: [robotec] MuJoCo

From the documentation, looks like MuJoCo is faster and therefore better to simulate computational intensive controllers like MPC. Gazebo provides other engines as well and seems like is more popular in the ROS community. How to choose between the available options? Which one would you recommend?

Thanks,
XXX

Answer

TL; DR: If you do not care for accuracy of simulated controller numerical results, if you are not simulating parallel linkages, if you do not need code parallelization (i.e. your computation is not crazy intensive) or if you are not simulating contact and friction, I would choose ROS. Easy to use and straightforward to build fairly complex models.

Proper (Long) Answer

ROS and Gazebo (OSRF tools) are indeed popular in the robotics community like you mentioned and they have their pros. It took me a while to see their limit when using them for research purposes.

Definition

ROS = Plumbing + Tools + Capabilities + Ecosystem

Plumbing: ROS provides publish-subscribe messaging infrastructure designed to support the quick and easy construction of distributed computing systems.

Tools: ROS provides an extensive set of tools for configuring, starting, introspecting, debugging, visualizing, logging, testing, and stopping distributed computing systems.

Capabilities: ROS provides a broad collection of libraries that implement useful robot functionality, with a focus on mobility, manipulation, and perception.

Ecosystem: ROS is supported and improved by a large community, with a strong focus on integration and documentation. ros.org is a one-stop-shop for finding and learning about the thousands of ROS packages that are available from developers around the world. answers.ros.org is a rich online community of ros packages users from around the world asking questions and getting help on how to use ROS.

So getting a simple dynamics kicking should not be a lot of hassle as the documentation is rich and the online community is very active in supporting newbies.

In the early days, the plumbing, tools, and capabilities were tightly coupled, which has both advantages and disadvantages. On the one hand, by making strong assumptions about how a particular component will be used, developers are able to quickly and easily build and test complex integrated systems. On the other hand, users are given an “all or nothing” choice: to use an interesting ROS component, you pretty much had to jump in to using all of ROS.

More than 8 years after Andrew Ng and co. conceived the platform, the core system has matured considerably, and developers are hard at work refactoring code to separate plumbing from tools from capabilities, so that each may be used in isolation. In particular, people are aiming for important libraries that were developed within ROS to become available to non-ROS users in a minimal-dependency fashion (e.g. OMPL and PCL libraries).

Disclaimer: Borrowed from Brian Gerkey’s/my answer to a similar quora question about a year ago.

For serially linked robot arms and other non-parallel linkages, ROS is a great simulation tool and “middleware”. However, there are bottlenecks with ROS.

What ROS calls URDF (Unified Robot Description Format), which is the abstraction tool for rigid body dynamics, is not unified in any sense of the word. URDF models written in ROS are out-of-the-box incompatible with Gazebo, its sister physics engine (see this question/wiki). More so, state in OSRF tools such as ROS is represented in a tree-like manner. I learnt this late last year when simulating parallel linkages. The internal ROS XML parser interprets constructed linkages as a deep tree and not as a graph. This makes simulating parallel linkages almost (actually) impossible. Repeat, actually impossible. They have a fix for this in Gazebo SDF but it is not straightforward. So developers spend a huge chunk of time migrating code from one OSRF framework to another.
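
To see why a tree-only description rules out parallel linkages, here is a minimal sketch in Python. It does not use any ROS API: is_kinematic_tree and the link names are hypothetical, and a robot is reduced to a list of (parent, child) joints. A closed loop such as a four-bar linkage forces one link to have two parents, which a tree cannot express.

```python
from collections import defaultdict, deque


def is_kinematic_tree(joints):
    """Return True if the (parent_link, child_link) pairs form one rooted tree."""
    children = defaultdict(list)
    parents = defaultdict(list)
    links = set()
    for parent, child in joints:
        children[parent].append(child)
        parents[child].append(parent)
        links.update((parent, child))

    # A tree has exactly one root and no link with more than one parent.
    roots = [link for link in links if not parents[link]]
    if len(roots) != 1 or any(len(parents[link]) > 1 for link in links):
        return False

    # Every link must be reachable from the root.
    seen = {roots[0]}
    queue = deque(roots)
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen == links


serial_arm = [("base", "link1"), ("link1", "link2"), ("link2", "link3")]
four_bar = [("base", "crank"), ("crank", "coupler"),
            ("base", "rocker"), ("rocker", "coupler")]

print(is_kinematic_tree(serial_arm))  # serial chain: a valid tree
print(is_kinematic_tree(four_bar))    # coupler has two parents: not a tree
```

The serial arm passes, while the four-bar mechanism fails because the coupler closes a loop.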

Good controller algorithm formulations are based on numerical optimization (think MPC, differential dynamic programming, sampling-based motion planning or reinforcement learning). Gazebo was designed around the ODE (Open Dynamics Engine) and Bullet physics engines, which provide the states in over-complete Cartesian coordinates and enforce joint constraints via numerical optimization. This is good enough for disconnected bodies with few joint constraints but becomes a pain for complex dynamics such as humanoids or simulating human-robot interactions. Evaluating a huge number of candidate controllers for humanoids can run into months using ROS (e.g. Todorov’s de novo synthesis), whereas MuJoCo is optimized for parallel processing: possible controllers are evaluated in a distributed fashion and a candidate controller is chosen from among them.

With ODE-based simulators, the optimizer ends up tuning the controller to the engine. This lets the controller cheat during simulations by exploiting quirks of the physics engine, which means the generated control laws may be physically unrealizable. Speed and accuracy? Controller optimization with MoveIt! (a motion planning framework from OSRF) is mostly done in single-threaded code, without the advantage of explicit parallelization to make e.g. IK solutions faster. Implementation of concurrency and multithreading is left to the user (this is a big no-no for someone not interested in software engineering).

ROS is strictly written based on the assumption that the user is running a Linux kernel. So users not familiar with Linux are taken aback when they first get exposed to it. With MuJoCo, you do not need Linux or OSX as it works on Windows OS just fine. MuJoCo also uses an XML parser to interpret links and joints and so it is able to read ROS URDFs and xacro files okay. But it doesn’t work the other way (see this answer from Todorov).

MPC implementations are elegant only when the model is accurate. Unexpected poor performance of an MPC controller will often be due to poor modeling assumptions (Rossiter). If the simulation engine emphasizes simulation stability over control law precision, we have a problem. And this is my problem with Gazebo and ROS generally. I read somewhere in one of Todorov’s papers (can’t remember where I found it) that the floating point ops of MuJoCo were unit tested to >>355 decimal points. The OSRF community may be good at community-based software engineering for robotics but you have to give Todorov the credit. He had the patience and tenacity to develop such robust software for control simulation. People stopped paying attention to floating point operational precision back in the late 80’s/90’s.

What’s more? MUJOCO allows you to write your models in C. Engineers are head over boots for Matlab but I am all for a program or modeling software that stays close to ones and zeros as much as possible. It means being less dumbfounded when things do not work as you envisioned and greater flexibility in being the master and architect of your creation.

Twanging git pull, push and clone

Fri, 12 May 2017 11:37:00 +0000

Table of Contents
Introduction
Pulling/Pushing git remotes from LAN/WAN repos
Cloning git remotes from LAN/WAN repos

Introduction

Git is a useful tool for remote/online work collaboration, as well as social coding. It is useful to be able to share one’s work among different computers using native git commands such as merge, fetch, push, clone, or pull without resorting to plain ssh or scp, which come without the benefits of git’s diff and merge strategies. More so, not everyone enjoys exposing their incomplete work/code to a remote repo for the sake of fetching to local origins on different computers. This post is meant to show how to go about these git ops strategies without going through a hosted remote, e.g. an http[s] server.

Pulling/Pushing git remotes from a LAN/WAN repo

As an example, suppose we have a repo named sensors in the Documents directory of a computer with username and host name drumpf@dissembler, and it is a few commits ahead of a tracking repo on a computer named robots@killem. We can fetch and merge our recent commits on drumpf@dissembler into robots@killem as follows:

We could use ssh, http[s], ftp[s] or rsync transport protocols. To pull updates from drumpf@dissembler:~/Documents/sensors.git to robots@killem:~/Documents/sensors.git repo, we would do one of the following:

via ssh:

robots@killem:~/Documents/sensors$ git pull ssh://drumpf@dissembler/~/Documents/sensors.git

via https:

robots@killem:~/Documents/sensors$ git pull http[s]://drumpf@dissembler/robots/killem/Documents/sensors.git

via ftp

robots@killem:~/Documents/sensors$ git pull ftp[s]://drumpf@dissembler/home/drumpf/Documents/sensors.git

via rsync

robots@killem:~/Documents/sensors$ git pull rsync://drumpf@dissembler/home/drumpf/Documents/sensors.git

Note that user expansion (~) is available for both ssh and git. ftp[s] and rsync do not allow user expansion when pulling, pushing or cloning, so the full path to the repo has to be specified. Plain http is unencrypted and can be dangerous on unsecured networks, so prefer https. If the host names of the computers are not resolvable (e.g. through /etc/hosts), you can use the IP address of the computer in place of the host name. Note that ftp[s] can be used for fetching while rsync can be used for both fetching and pushing. Neither is very efficient, however, and both are actually deprecated; so you should refrain from using them as much as you can.

All the commands above would also work for git push.

SCP-like syntaxes are valid as well:

git pull [user@]host.ng:path/to/repo.git/

but note that there must be no slash before the first colon; this helps distinguish a local path that contains a colon from an ssh url
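
The rule for telling an scp-like address from a local path (Git only treats an address as scp-like when no slash appears before the first colon) can be sketched in a few lines of Python. This is a simplified illustration, not Git's actual parser: classify_git_location is a hypothetical helper, and corner cases such as Windows drive letters are ignored.

```python
def classify_git_location(location):
    """Roughly classify a string the way git would when given it as a remote."""
    if "://" in location:
        return "url"  # e.g. ssh://, git://, http[s]://
    head, sep, tail = location.partition(":")
    if sep and tail.startswith(":"):
        return "remote helper"  # the <transport>::<address> form
    if sep and head and "/" not in head:
        return "scp-like"  # [user@]host:path/to/repo.git
    return "local path"


for location in ("ssh://drumpf@dissembler/~/Documents/sensors.git",
                 "drumpf@dissembler:Documents/sensors.git",
                 "./sensors:old.git",
                 "/home/drumpf/Documents/sensors.git"):
    print(location, "->", classify_git_location(location))
```

Prefixing a local path with ./ is therefore enough to keep a colon in a directory name from being read as a host separator.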

All of the above commands also support cloning git repos from one directory to another on the same host or between workstations on the same LAN/WAN. All that would need to change would be to replace the LAN/WAN hostname with the path we are cloning from. See examples below:

Cloning git remotes from a LAN/WAN repo

The procedure is the same as above save we replace pull/push with clone, e.g

git clone ssh://[you@]remote.ng[:port]/path/to/repo.git/

git clone git://remote.ng[:port]/path/to/repo.git/

git clone http[s]://remote.ng[:port]/path/to/repo.git/

git clone ftp[s]://remote.ng[:port]/path/to/repo.git/

git clone rsync://remote.ng/path/to/repo.git/
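
Since all of these follow the same transport://[user@]host[:port]/path pattern, a script that clones from several machines can assemble the address instead of hard-coding it. The helper below is only a sketch: git_remote_url is a hypothetical name, and it does no validation beyond checking the transport.

```python
def git_remote_url(transport, host, path, user=None, port=None):
    """Assemble a transport://[user@]host[:port]/path address for git."""
    if transport not in ("ssh", "git", "http", "https", "ftp", "ftps"):
        raise ValueError("unsupported transport: %s" % transport)
    authority = host
    if user:
        authority = "%s@%s" % (user, host)
    if port is not None:
        authority = "%s:%d" % (authority, port)
    # Keep exactly one slash between the authority and the path.
    return "%s://%s/%s" % (transport, authority, path.lstrip("/"))


print(git_remote_url("ssh", "remote.ng", "/path/to/repo.git/", user="you", port=22))
print(git_remote_url("ssh", "dissembler", "~/Documents/sensors.git", user="drumpf"))
```

The result can be passed to git clone, git pull or git push as is.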

If, when doing any of the operations specified so far, Git does not natively know how to handle the transport protocol, no problem! Git attempts to use the corresponding remote helper (remote-<transport>) if one exists. A remote helper can also be requested explicitly, so we could for example do

robots@killem:~/Documents/sensors$ git push <transport>::<address>

where <address> is the path to the repo on the LAN/WAN (a URL-like string interpreted by the helper) and <transport> names the remote helper to invoke.

An alternative scp-like syntax is also valid when using the ssh protocol:

git clone [you@]remote.ng:path/to/repo.git/

Just as is the case for pull/push, plain http is not encrypted and should be used with caution.

Ubuntu-16.04 and Cuda-8.0 Install Guide

Fri, 28 Apr 2017 10:32:00 +0000

Introduction

NVIDIA libraries are notorious for breaking Xserver, particularly in the Ubuntu Linux distro. Here’s my installation guide on how to do a clean install without breaking display drivers. Hope it helps.

Installation

Pull Cuda 8.0 from here

Add a blacklist-nouveau.conf file to your /etc/modprobe.d directory like so:

  sudo touch /etc/modprobe.d/blacklist-nouveau.conf

Add the following contents to the file you just created using your fave editor:

blacklist nouveau
options nouveau modeset=0
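
Before stopping the X server, it can be worth double-checking that both directives really made it into the file. The check below is a small sketch; missing_nouveau_directives is a hypothetical helper that only inspects the text it is given, and the path in the usage example is the one created above.

```python
import os

REQUIRED_DIRECTIVES = ("blacklist nouveau", "options nouveau modeset=0")


def missing_nouveau_directives(conf_text):
    """Return the required directives that are absent from the config text."""
    present = set()
    for raw_line in conf_text.splitlines():
        # Drop comments and normalize whitespace before comparing.
        line = " ".join(raw_line.split("#", 1)[0].split())
        if line:
            present.add(line)
    return [d for d in REQUIRED_DIRECTIVES if d not in present]


if __name__ == "__main__":
    path = "/etc/modprobe.d/blacklist-nouveau.conf"
    if os.path.exists(path):
        with open(path) as conf:
            print(missing_nouveau_directives(conf.read()) or "all directives present")
    else:
        print("no such file: " + path)
```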

Turn off X server

  sudo service lightdm stop

Install Cuda 8.0
- cd to the directory where the cuda install file was stored and run it with admin rights e.g.
```
sudo ./cuda_8.0.61_375.26_linux.run
```
- Accept the EULA Licence agreement
- Accept yes for NVIDIA drivers install
- Accept yes for cuda-8.0 and cuda symlink
- Decline the installation of OpenGL Libraries (this breaks Xserver)
- Install Samples
- Decline the installation of nvidia-xconfig (you wouldn’t need it)
- Reboot your system after installation

Voila! We’re set to start developing with cuda.

PyTorch and rospy interoperability

Thu, 27 Apr 2017 09:15:00 +0000

Yesterday was mighty nightmarish in the life of this developer. I had trained a conv-net meant to classify an object I was trying to recognize and later on manipulate using vision-based control. PyTorch offers tensor computation with strong GPU acceleration and differentiable backprop capabilities based on the torch autograd system, and it is python compatible, so I figured I could easily write my control code in rospy or roscpp and publish vision/control topics, which reduces interoperability issues when working with different Linux processes. Only that I didn’t anticipate Python 2 and Python 3 module import problems ahead of time. I give more background below.

Background

For the record, I run ROS 1.x (indigo bare bones) on an Ubuntu 14.04 machine with 32GB of RAM. The pytorch developers encourage users to install Torch with conda and typically use python3 since python 2 will be phased out in the near future. So, I had been using pytorch in a conda setup that had both a python 2 and a python 3 environment. I could easily switch environments by turning on or off whichever python version I wanted. For details on how to do this, see this doc from the folks at conda.

So far, everything was working great. For ros applications that do not involve image processing classes such as CvBridge, I was able to get ros and pytorch to talk in python3 despite python3 being only unofficially supported for ROS 1.x (see this github wiki). Getting this to work involves pip installing the necessary ros dependencies in python3 using this requirements.txt file. This github repo page shows how I do this.

Anyways, so I trained a conv net model in pytorch, no big deal. I had a roscpp node running on a different workstation, but within the same ros network, broadcasting sensor_msgs/Image RGB images on a designated topic. Given what I know, it should be easy subscribing to the image topic and forwarding the video stream through the pre-trained neural network model to obtain classification results. But boy was I wrong.

Importing `torch` into `rospy`

When you install pytorch with conda, it typically places the installation relative to your anaconda install path. For me this was in /home/$USER/anaconda3. So to be able to import Torch and use the CvBridge class from rospy code simultaneously, I installed the following modules: netifaces, catkin_pkg and rospkg via pip while in the python3 conda environment. Then I tried to import the convnet model from a different module’s class into a rospy module I had written.

Say convnet.py model had entries like so:

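    import torch
    import torch.nn as nn

    class ResNet(nn.Module):
      def __init__(self, args=None, **kwargs):
        super(ResNet, self).__init__()
        # placeholder layers so the sketch runs; the real model defines its own
        self.features = self.convModel(3, 16)
        self.fc = nn.Linear(16, kwargs.get('num_classes', 2))

      def convModel(self, in_channels, out_channels):
        '''
          define some conv models
        '''
        return nn.Sequential(
          nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
          nn.ReLU(),
          nn.AdaptiveAvgPool2d(1))

      def forward(self, x):
        '''
         do stuff with conv layers
        '''
        prev_layer = self.features(x).view(x.size(0), -1)
        return self.fc(prev_layer)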

and process_images.py file had an import statement like so

  from convnet import ResNet

  '''
    do stuff with imported model
  '''

I got weird errors like

    >> Python 3.6.0 (default, Oct 26 2016, 20:30:19)
    [GCC 4.8.4] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >> No module named Torch

Huh? Country boy comes to town. But the convnet.py model imports Torch okay. I figured the problem must be because I installed pytorch with the python3 version. And so I pulled the python2 version of pytorch from Soumith’s channel.

Now when I import, it says stuff like convnet module xx compiled with a different Torch version. What the heck pytorch?
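
One way to narrow down this kind of mismatch is to ask the interpreter itself where it would load each module from. The snippet below is a minimal sketch, assuming a Python 3 interpreter (it relies on importlib.util); describe_module is a hypothetical helper name and is not part of rospy or pytorch.

```python
import importlib.util
import sys


def describe_module(name):
    """Report where a top-level module would be imported from, without importing it."""
    spec = importlib.util.find_spec(name)
    return {
        "module": name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "python": "%d.%d" % sys.version_info[:2],
        "executable": sys.executable,
    }


if __name__ == "__main__":
    # Run this with the same interpreter that launches the rospy node.
    for module_name in ("torch", "rospy", "cv_bridge"):
        print(describe_module(module_name))
```

If torch resolves under the anaconda tree while rospy resolves under the system python path, the two were built for different interpreters.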

Solution

At this moment, I stepped out for a walk, and had a brainwave. What if I do away with the conda build of pytorch and instead install pytorch from source or PyPI?

It turns out that this is the least error-prone way to import pytorch models into a rospy file or indeed a python2 file. To do this, I temporarily moved my anaconda3 folder out of bash’s native path, pulled the latest pytorch commit from github and then installed with python setup.py install.

Now when I try out the above commands, everything works well.

So my two cents to the robotics community running neural net models in pytorch or tensorflow and using such models in rospy or equivalent environments is to always go for the source installation whenever and if possible. You would save yourself a lot of headache and time-waste.

Backpropagation and convex programming in MRAS systems

Wed, 05 Apr 2017 11:15:00 +0000

Introduction

The backpropagation algorithm is very useful for general optimization tasks, particularly in neural network function approximators and deep learning applications. Great progress in nonlinear function approximation has been made due to the effectiveness of the backprop algorithm. Whereas in traditional control applications, we typically use feedback regulation to stabilize the states of the system, in model reference adaptive control systems, we want to specify an index of performance to determine the “goodness” of our adaptation. An auxiliary dynamic system called the reference model is used in generating this index of performance (IP). The reference model specifies in terms of the input and states of the model a given index of performance and a comparison check determines appropriate control laws by comparing the given IP and measured IP based on the outputs of the adjustable system to that of the reference model system. This is called the error state space.

Nonlinear Model Reference Adaptive Systems

With nonlinear systems, the unknown nonlinearity, say $f(\cdot)$, is usually approximated with a function approximator such as a single hidden layer neural network. To date, the state-of-the-art method for adjusting the weights of a neural network is the backpropagation algorithm. The optimization in classical backprop is unrolled end-to-end, so the complexity of the network increases when we want to add an argmin differentiation layer before the final neural network layer. The final layer determines the controller parameters or generates the control laws used in adjusting the plant behavior. Fitting the control laws into actuator constraints, as model predictive control schemes allow, is not explicitly formulated when using the backprop algorithm; ideally, we would want to fit a quadratic convex layer to compute controller parameters exactly. We cannot easily fit a convex optimization layer into the backprop algorithm using classical gradient descent because the explicit Jacobians of the gradients of the system’s energy function with respect to system parameters are not exactly formulated (rather, they are ordered derivatives which fluctuate about the global/local minimum when the weights of the network converge).

To generate control laws such as torques to control a motor arm in a multi-dof robot-arm for example, we would want to define a quadratic programming layer as the last layer of our neural network optimization algorithm so that effective control laws that exactly fit into actuator saturation limits are generated. Doing this requires a bit of tweaking of the backprop algorithm on our part.

Solving Quadratic Programming in a Backprop setting

When trying to construct a controller for a regulator, or an MRAS system, we may imagine that the control law determination is a search process for a control scheme that takes an arbitrary nonzero initial state to a zero state, ideally in a short amount of time. If the system is controllable, then we may require the controller to take the system from state $x(t_0)$ to the zero state at time $T$. The closer $T$ is to $t_0$, the more control effort is required to drive the states to zero. In most engineering systems, an upper bound is set on the magnitudes of the variables for pragmatic purposes. It therefore becomes impossible to take $T$ arbitrarily close to $t_0$ without exceeding the control bounds. Unless we are ready to tolerate high gain terms in the controller parameters, the control is not feasible for arbitrarily short $T$. So what do we do? To meet the practical bounds manufacturers place on physical actuators, it suffices to manually formulate these bounds as constraints into the control design objectives.

Model predictive controllers have explicit ways of incorporating these constraints into the control design. There are no rules for tuning the parameters of an MRAC system so that the control laws generated in our adjustment mechanism are scaled into the bounds of the underlying actuator.

Since most controller hardware constraints are specified in terms of lower and upper bounded saturation, the QP problem formulated below is limited to inequality constraints. For equality-constrained QP problems, Mattingley and Boyd, Vandenberghe’s CVXOPT, or Brandon Amos’ ICML submission offer good treatments.

We define the standard QP canonical form problem with inequality constraints thus:

\begin{align} \text{minimize} \quad \frac{1}{2}x^TQx + q^Tx \label{eq:orig} \end{align}

subject to

\begin{align} G x \le h \nonumber \end{align}

where $Q \in \mathbb{S}^n_+ $ (i.e. a symmetric, positive semi-definite $n \times n$ matrix), $q \in \mathbb{R}^n, G \in \mathbb{R}^{p \times n}, \text{ and } h \in \mathbb{R}^p $. With our convex quadratic optimization problem in canonical form, we can use primal-dual interior point methods (PDIPM) to find an optimal solution to such a problem. PDIPMs are the state-of-the-art in solving such problems. Primal-dual methods with the Mehrotra predictor-corrector reliably solve QP embedded optimization problems within 5-25 iterations, without warm-start (Boyd and Mattingley, 2012).

Slack Variables

Given \eqref{eq:orig}, one can introduce slack variables, $s \in \mathbb{R}^p$ as follows,

\begin{align} \text{minimize} \quad \frac{1}{2}x^TQx + q^Tx \label{eq:orig1} \end{align}

subject to

\begin{align} \quad G x + s = h, \qquad s \ge 0 \nonumber \end{align}

where $x \in \mathbb{R}^n, s \in \mathbb{R}^p$. If we let a dual variable $z \in \mathbb{R}^p $ be associated with the inequality constraint, then we can define the KKT conditions for \eqref{eq:orig1} as

\[Gx + s = h, \quad s \ge 0 \\ z \ge 0 \\ Qx + q + G^T z = 0 \\ z_i s_i = 0, \quad i = 1, \ldots, p.\]
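
As a sanity check on these conditions, a scalar QP is small enough to solve by hand. The sketch below is pure Python, with hypothetical helper names and illustrative numbers that are not from any real actuator: it solves minimize $\frac{1}{2}Qx^2 + qx$ subject to $Gx \le h$ for scalar $Q > 0$ and $G \ne 0$, then evaluates each KKT residual.

```python
def solve_scalar_qp(Q, q, G, h):
    """Solve min 0.5*Q*x^2 + q*x  s.t.  G*x <= h for scalars, Q > 0 and G != 0."""
    x_free = -q / Q  # unconstrained minimizer
    if G * x_free <= h:
        return x_free, 0.0  # constraint inactive, zero multiplier
    x = h / G  # constraint active: sit on the boundary
    z = -(Q * x + q) / G  # multiplier from stationarity
    return x, z


def kkt_residuals(Q, q, G, h, x, z):
    """Evaluate the KKT conditions at (x, z) with slack s = h - G*x."""
    s = h - G * x
    return {
        "stationarity": Q * x + q + G * z,  # should be 0
        "slack": s,                         # should be >= 0
        "dual": z,                          # should be >= 0
        "complementarity": z * s,           # should be 0
    }


# A saturated case: the unconstrained optimum x = 4 violates x <= 1.
x_opt, z_opt = solve_scalar_qp(Q=2.0, q=-8.0, G=1.0, h=1.0)
print(x_opt, z_opt)
print(kkt_residuals(2.0, -8.0, 1.0, 1.0, x_opt, z_opt))
```

Here the bound is active, so the slack is zero and the multiplier is strictly positive; loosening $h$ beyond the unconstrained optimum flips the two.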

More formally, if we write the Lagrangian of system \eqref{eq:orig} as

\begin{align} L(x, \lambda) = \frac{1}{2}x^TQx + q^Tx +\lambda^T(Gx -h) \label{eq:Lagrangian} \end{align}

it follows that the KKT for stationarity, primal feasibility and complementary slackness are,

\begin{align} Q x^\ast + q + G^T \lambda^\ast = 0 , \label{eq:KKTLagrangian} \end{align}

\[K \left(\lambda^\ast\right) \left(G x^\ast - h\right) = 0\]

where $K(\cdot) = \textbf{diag}(k) $ is an operator that creates a diagonal matrix from the entries of the vector $k$. Taking the differentials of \eqref{eq:KKTLagrangian} and of the complementary slackness condition, we find that

\begin{align} dQ x^* + Q dx + dq + dG^T \lambda^* + G^T d\lambda = 0 \label{eq:KKTDiff} \end{align}

\[K(\lambda^*)\left(dG x^* + G dx - dh\right) + K\left(G x^* - h\right) d\lambda = 0\]

QP Layer as the last layer in backpropagation

Writing \eqref{eq:KKTDiff} and the differentiated complementary slackness condition in matrix form, we find

\[\begin{bmatrix} Q & G^T \\ K(\lambda^\ast) G & K(Gx^\ast - h) \\ \end{bmatrix} \begin{bmatrix} dx \\ d\lambda \\ \end{bmatrix} = \begin{bmatrix} -dQ x^\ast - dq - dG^T \lambda^\ast \\ -K(\lambda^\ast) dG x^\ast + K(\lambda^\ast) dh \\ \end{bmatrix}\]

so that the Jacobians of the variables to be optimized can be formed with respect to the parameters of the problem. Finding $\dfrac{\partial x^\ast}{\partial h}$, for example, would involve passing $dh$ as identity and setting the other differential terms on the rhs in the equation above to zero. After solving the equation, the desired Jacobian would be $dx$. With backpropagation, however, the explicit Jacobians are not needed since the gradients of the network parameters are computed using the chain rule for ordered derivatives, i.e.

\[\dfrac{\partial ^+ J}{ \partial h_i} = \dfrac{\partial J}{ \partial h_i} + \sum_{j > i} \dfrac{\partial ^+ J}{\partial h_j} \dfrac{ {\partial} h_j}{ \partial h_i}\]

where the derivatives with superscripts denote ordered derivatives and those without superscripts denote ordinary partial derivatives. The simple partial derivatives denote the direct effect of $h_i$ on $h_j$ through the linear set of equations that determine $h_j$. To illustrate further, suppose that we have a system of equations given by

\[x_2 = 3 x_1 \\ x_3 = 5 x_1 + 8 x_2\]

The ordinary partial derivatives of $x_3$ with respect to $x_1$ would be $5$. However, the ordered derivative of $x_3$ with respect to $x_1$ would be $29$ (because of the indirect effect by way of $x_2$).
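
The recursion behind that number can be written out in a few lines. The sketch below is an illustration only (ordered_derivatives is a hypothetical helper, and variables are indexed from 1 to n): it sweeps backwards from the last variable, exactly as the ordered chain rule above prescribes, with $J = x_3$.

```python
def ordered_derivatives(direct, partials, n):
    """Ordered derivatives of J with respect to h_1..h_n.

    direct[i]        -- ordinary partial dJ/dh_i (missing entries count as 0)
    partials[(j, i)] -- ordinary partial dh_j/dh_i for j > i (missing = 0)
    """
    ordered = {}
    for i in range(n, 0, -1):  # backward sweep, last variable first
        ordered[i] = direct.get(i, 0.0) + sum(
            ordered[j] * partials.get((j, i), 0.0) for j in range(i + 1, n + 1)
        )
    return ordered


# x_2 = 3 x_1 and x_3 = 5 x_1 + 8 x_2, with J = x_3.
direct = {3: 1.0}
partials = {(2, 1): 3.0, (3, 1): 5.0, (3, 2): 8.0}
print(ordered_derivatives(direct, partials, n=3))
```

The entry for $x_1$ is $5 + 8 \times 3 = 29$: the direct term plus the path through $x_2$.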

So with the backprop algorithm, we would form the left matrix-vector product with a previous backward pass vector, $\frac{\partial J}{\partial x^\ast} \in \mathbb{R}^n $; this is mathematically equivalent to $\frac{\partial J}{ \partial x^\ast} \cdot \frac{\partial x^\ast}{ \partial h} $. Therefore, solving for the backward-pass variables $d_x, d_\lambda$ through the inversion of the transposed matrix of the differentiated KKT system \eqref{eq:KKTDiff}, we have

\[\begin{bmatrix} d_x \\ d_\lambda \end{bmatrix} = -\begin{bmatrix} Q & G^T K(\lambda^\ast) \\ G & K(Gx^\ast - h) \end{bmatrix}^{-1} \begin{bmatrix} \left(\dfrac{\partial J}{\partial x^\ast}\right)^T \\ 0 \end{bmatrix}.\]

The relevant gradients with respect to every QP parameter are given by

\[\dfrac{\partial J}{\partial q} = d_x, \qquad \dfrac{\partial J}{ \partial h} = -K(\lambda^\ast) d_\lambda \\ \dfrac{\partial J}{\partial Q} = \frac{1}{2}(d_x x^T + x d_x^T), \qquad \dfrac{\partial J}{\partial G} = K(\lambda^\ast)(d_\lambda x^T + \lambda d_x^T )\]

QP Initialization

For the primal problem,

\[\text{minimize} \quad \frac{1}{2}x^T Q x + q^T x + \frac{1}{2}\|s\|^2_2 \\ \text{ subject to } \quad Gx + s = h\]

with $x$ and $s$ as variables to be optimized, the corresponding dual problem is,

\[\text{maximize} \quad -\frac{1}{2}w^T Q w - h^T z - \frac{1}{2}\|z\|^2_2 \\ \text{ subject to } \quad Qw + G^T z + q = 0\]

with variables $w$ and $z$ to be optimized.

Optimization Steps

When the primal and dual starting points $\hat{x}, \hat{s}, \hat{z} $ are unknown, they can be initialized as proposed by Vandenberghe in cvxopt; namely, we solve the following linear equations

\[\begin{bmatrix} G & -I \\ Q & G^T \end{bmatrix} \begin{bmatrix} x \\ z \\ \end{bmatrix} = \begin{bmatrix} h \\ -q \\ \end{bmatrix}\]

and set $\hat{x} = x$.

The initial value of $\hat{s}$ is computed from the residual $h - Gx = -z$, as

\[\hat{s} = \begin{cases} -z & \text{if } \alpha_p < 0, \\ -z + (1+\alpha_p)\textbf{e} & \text{otherwise,} \end{cases}\]

for $\alpha_p = \inf \{ \alpha \mid -z + \alpha \textbf{e} \succeq 0 \} $.

Similarly, $z$ at the first iteration is computed as follows

\[\hat{z} = \begin{cases} z & \text{if } \alpha_d < 0, \\ z + (1+\alpha_d)\textbf{e} & \text{otherwise,} \end{cases}\]

for $\alpha_d = \inf \{ \alpha \mid z + \alpha \textbf{e} \succeq 0 \} $.

Note that $\textbf{e}$ is the vector of ones.
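
The two case definitions above share one pattern: leave the vector alone if it is already strictly positive, otherwise shift every entry by $1 + \alpha$. Below is a hedged pure-Python sketch of that pattern, assuming $\textbf{e}$ is treated as a vector of ones; shift_into_interior is a hypothetical helper name and is not a cvxopt function.

```python
def shift_into_interior(v):
    """Apply the starting-point rule to a vector v given as a list of floats.

    alpha = inf{alpha | v + alpha*e >= 0} = -min(v); v is returned unchanged
    when alpha < 0 and shifted by (1 + alpha) in every entry otherwise.
    """
    alpha = -min(v)
    if alpha < 0:
        return list(v)
    return [vi + (1.0 + alpha) for vi in v]


z = [1.0, -2.0]                                  # z from the linear solve
s_hat = shift_into_interior([-zi for zi in z])   # rule applied to -z
z_hat = shift_into_interior(z)                   # rule applied to z
print(s_hat, z_hat)
```

Both starting points come out strictly positive, which is what the interior point iterations need.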

Following Boyd and Mattingley’s convention, we can compute the affine scaling directions by solving the system,

\[\begin{bmatrix} G &I &0\\ 0 &K(z) & K(s) \\ Q &0 &G^T \end{bmatrix} \begin{bmatrix} \Delta x^{aff} \\ \Delta s^{aff} \\ \Delta z^{aff} \end{bmatrix} = \begin{bmatrix} -(Gx + s - h) \\ -K(s)z \\ -(G^Tz + Qx + q) \end{bmatrix}\]

with $ K(s) = \textbf{diag}(s) \text{ and } K(z) = \textbf{diag}(z) $. The centering-plus-corrector directions can be used to efficiently compute the primal and dual variables by solving

\[\begin{bmatrix} G &I &0\\ 0 &K(z) & K(s) \\ Q &0 &G^T \end{bmatrix} \begin{bmatrix} \Delta x^{cc} \\ \Delta s^{cc} \\ \Delta z^{cc} \end{bmatrix} = \begin{bmatrix} 0 \\ \sigma \mu \textbf{e} - K(\Delta s^{aff}) \Delta z^{aff} \\ 0 \end{bmatrix}\]

where $\mu = s^Tz/p$,

\begin{align} \sigma = \left(\dfrac{(s+ \alpha \Delta s^{aff})^T(z + \alpha \Delta z^{aff})}{s^Tz}\right)^3 \nonumber \end{align}

and the step size $\alpha = \sup \{\alpha \in [0, 1] \mid s + \alpha \Delta s^{aff} \ge 0, \, z + \alpha \Delta z^{aff} \ge 0\}. $

Finding the primal and dual variables is then a question of composing the two updates in the foregoing, $\Delta x = \Delta x^{aff} + \Delta x^{cc}$ (and likewise for $s$ and $z$), to yield

\[x \leftarrow x + \alpha \Delta x, \\ s \leftarrow s + \alpha \Delta s, \\ z \leftarrow z + \alpha \Delta z.\]
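
The step-size rule and the cubed ratio used for the centering parameter $\sigma$ are easy to get wrong in code, so here is a small pure-Python sketch of both. The function names and the numbers are illustrative assumptions; vectors are plain lists, and the linear solves for the directions are not shown.

```python
def max_step(s, ds, z, dz):
    """Largest alpha in [0, 1] keeping s + alpha*ds >= 0 and z + alpha*dz >= 0."""
    alpha = 1.0
    for value, delta in zip(list(s) + list(z), list(ds) + list(dz)):
        if delta < 0:  # only shrinking entries can hit the boundary
            alpha = min(alpha, -value / delta)
    return alpha


def centering_parameter(s, ds_aff, z, dz_aff):
    """Cube of the ratio between the predicted and the current s^T z."""
    alpha = max_step(s, ds_aff, z, dz_aff)
    predicted = sum((si + alpha * dsi) * (zi + alpha * dzi)
                    for si, dsi, zi, dzi in zip(s, ds_aff, z, dz_aff))
    current = sum(si * zi for si, zi in zip(s, z))
    return (predicted / current) ** 3


s, ds_aff = [1.0, 2.0], [-2.0, 1.0]
z, dz_aff = [1.0, 1.0], [0.5, -4.0]
print(max_step(s, ds_aff, z, dz_aff))
print(centering_parameter(s, ds_aff, z, dz_aff))
```

A small $\sigma$ means the affine step already made good progress, so little centering is added; a value near one pulls the iterate back towards the central path.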

Example code

An example implementation of this algorithm in the PyTorch Library is available on my github page.

Acknowledgment

I would like to thank Brandon Amos of the CMU Locus Lab for his generosity in answering my questions while using his qpth OptNet framework.
