Press "Enter" to skip to content

A Distributed Algorithm That Finds Almost Best Possible Estimate Under Non-Vanishing and Time-Varying Measurement Noise

  • Published in: IEEE Control Systems Letters (L-CSS)
  • Authors: Jin Gyu Lee and Hyungbo Shim
  • DOI: 10.1109/LCSYS.2019.2923475
  • Abstract: In this letter, we review an existing distributed least-squares solver and share some new insights on it. Then, by the observation that an estimation of a constant vector under output noise can be translated into finding the least-squares solution, we present an algorithm for distributed estimation of the state of linear time-invariant systems under measurement noise. The proposed algorithm consists of a network of local observers, where each of them utilizes local measurements and information transmitted from the neighbors. It is proven that even under non-vanishing and time-varying measurement noise, we could obtain an almost best possible estimate with arbitrary precision. Some discussions regarding the plug-and-play operation are also given.

This work is a continued effort of CDSL on the distributed estimation problem, providing a method to handle non-vanishing and time-varying measurement noise. The main idea of this paper is the distributed least-squares solver. For more technical details, we refer the reader to the paper.

A distributed least-squares solver

We first introduce the distributed least-squares solver for the linear equation Ax = b, developed by G. Shi and B. D. O. Anderson, in which the individual dynamics of agent i is given by

$$\dot{x}_i(t) = -A_i^T(A_ix_i(t) - b_i) + \gamma\sum_{j \in \mathcal{N}_i}(x_j(t) - x_i(t)),$$

where \gamma > 0 is the coupling gain, and \mathcal{N}_i is a subset of \mathcal{N} := \{1, 2, \dots, N\} whose elements are the indices of the agents that send their information to agent i. Each agent i uses only its local data A_i and b_i in order to solve the global problem Ax = b, where A = {\rm col}(A_1, \dots, A_N) and b = {\rm col}(b_1, \dots, b_N).
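As an illustration, the following minimal sketch simulates this solver on an undirected ring of four agents with randomly generated data (all values here are illustrative assumptions, not from the paper). Each agent integrates its own dynamics using only (A_i, b_i) and its neighbors' states, and all agents end up near the least-squares solution of the stacked problem.

```python
# Minimal sketch of the Shi-Anderson distributed least-squares solver.
# Network, data, and gain below are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
N, n, m = 4, 3, 2                       # agents, unknowns, rows per agent
A_list = [rng.standard_normal((m, n)) for _ in range(N)]
b_list = [rng.standard_normal(m) for _ in range(N)]
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # undirected ring
gamma = 50.0                            # large coupling gain

def dynamics(t, x):
    x = x.reshape(N, n)
    dx = np.empty_like(x)
    for i in range(N):
        grad = A_list[i].T @ (A_list[i] @ x[i] - b_list[i])  # local gradient
        coup = sum(x[j] - x[i] for j in neighbors[i])        # diffusive coupling
        dx[i] = -grad + gamma * coup
    return dx.ravel()

sol = solve_ivp(dynamics, (0.0, 20.0), np.zeros(N * n))
x_final = sol.y[:, -1].reshape(N, n)

A, b = np.vstack(A_list), np.concatenate(b_list)
print("agent estimates:\n", x_final)
print("least-squares solution:", np.linalg.lstsq(A, b, rcond=None)[0])
```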

Another insight into this distributed least-squares solver comes from the so-called blended dynamics approach. In this approach, it has been shown that the behavior of the heterogeneous multi-agent system

$$\dot{x}_i(t) = f_i(t,x_i(t)) + \gamma \sum_{j \in \mathcal{N}_i} (x_j(t) - x_i(t)), \quad i \in \mathcal{N},$$

with large coupling gain \gamma, is approximated by the behavior of the blended dynamics defined by

$$\dot{\hat{x}}(t) = \frac{1}{N}\sum_{i=1}^N f_i(t,\hat{x}(t)), \qquad \hat x(0) = \frac{1}{N} \sum_{i=1}^N x_i(0)$$

when the blended dynamics has a certain type of stability, such as asymptotic stability with respect to a compact set. In our case, the blended dynamics of the distributed least-squares solver is the gradient descent for the minimization of \|Ax - b\|^2, given by

$$\dot{\hat{x}}(t) = -\frac{1}{N}A^T(A\hat{x}(t) - b),$$

which is a linear system. Since its solution converges exponentially to the least-squares solution (A^TA)^{-1}A^Tb, the solution of the distributed least-squares solver is expected to converge to a neighborhood of the least-squares solution.
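To see where this blended dynamics comes from, sum the agent dynamics over i. When the communication is bidirectional, each coupling term x_j(t) - x_i(t) is cancelled by its mirror term x_i(t) - x_j(t), so that
$$\frac{1}{N}\sum_{i=1}^N \dot{x}_i(t) = -\frac{1}{N}\sum_{i=1}^N A_i^T(A_ix_i(t) - b_i) \approx -\frac{1}{N}A^T(A\hat{x}(t) - b),$$
where the approximation replaces each x_i(t) by the synchronized trajectory \hat{x}(t) enforced by the large coupling gain, and uses \sum_{i=1}^N A_i^TA_i = A^TA and \sum_{i=1}^N A_i^Tb_i = A^Tb.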

The least-squares solver is a best possible estimator

Consider for a moment the estimation of a constant vector \chi from a corrupted measurement b = A\chi + o, where A is a known output matrix and o is an unknown constant noise. In this setting, any estimator \mathsf{E} can be represented as a mapping from the pair (A, b) to its estimate \mathsf{E}(A, b). Then, a fundamental limitation arises that applies to any possible estimator, as follows.

For any estimator \mathsf{E}, there exists a pair (\chi,o) such that the estimate \mathsf{E}(A, b) has an estimation error larger than or equal to |o|/\sigma_1(A), i.e., $$|\mathsf{E}(A, b) - \chi| = |\mathsf{E}(A, A\chi + o) - \chi| \ge |o|/\sigma_1(A),$$
where \sigma_1(A) denotes the smallest singular value of A.

According to this fundamental limitation, let us call \mathsf{E} a best possible estimator if there is no pair (\chi, o) such that
$$|\mathsf{E}(A, A\chi + o) - \chi| > |o|/\sigma_1(A).$$
In fact, the mapping \mathsf{E}_{\text{lss}}, which maps the pair (A, b) to the least-squares solution of Ax = b, is an example of a best possible estimator, because we have $$\mathsf{E}_{\text{lss}}(A, A\chi + o) = \chi + (A^TA)^{-1}A^To,$$
from which we get, for any \chi and o,
$$|\mathsf{E}_{\text{lss}}(A, A\chi + o) - \chi| = |(A^TA)^{-1}A^To| \le |o|/\sigma_1(A),$$
where the inequality follows because the largest singular value of (A^TA)^{-1}A^T is 1/\sigma_1(A).
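As a sanity check, this bound is easy to verify numerically; the following sketch uses illustrative random data (the sizes and the instance are assumptions, not from the paper).

```python
# Numerical check that the least-squares estimate attains the bound |o|/sigma_1(A).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))                 # full column rank, generically
chi = rng.standard_normal(3)                    # true constant vector
o = rng.standard_normal(6)                      # unknown constant noise
b = A @ chi + o

chi_hat = np.linalg.lstsq(A, b, rcond=None)[0]  # E_lss(A, b)
sigma_1 = np.linalg.svd(A, compute_uv=False).min()  # smallest singular value

print("error |E_lss(A,b) - chi|:", np.linalg.norm(chi_hat - chi))
print("bound |o|/sigma_1(A):    ", np.linalg.norm(o) / sigma_1)
```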

A distributed algorithm that finds almost best possible estimate

Based on the foregoing discussion, we now propose a distributed state estimator for the linear system \dot{\chi}(t) = S\chi(t) and y_i(t) = G_i\chi(t) + n_i(t). For the time being, suppose that information about the state vector \chi(t) is available in the form of the vectors b_i(t) = A_i\chi(t) + o_i(t), i = 1, \dots, N, where the stacked matrix A = {\rm col}(A_1, \dots, A_N) has full column rank. Here we note that the vector \chi(t) to be estimated is time-varying, and so the distributed least-squares solver may not efficiently track its variation. Therefore, inspired by the internal model principle, our idea is to embed the model of the state \chi(t) into the distributed least-squares solver as
$$\dot{\chi}_i(t) = S\chi_i(t) -\kappa A_i^T(A_i \chi_i(t) - b_i(t)) + \kappa\gamma\sum_{j \in \mathcal{N}_i}(\chi_j(t) - \chi_i(t)).$$
The algorithm achieves an almost best possible estimate of \chi(t) with sufficiently large gains \kappa and \gamma.
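As a rough illustration, the following sketch simulates this estimator on a two-agent network observing a harmonic oscillator; S, the matrices A_i, the noise, the graph, and the gains are all illustrative assumptions, and b_i(t) = A_i\chi(t) + o_i(t) is fed to each agent directly, as assumed above.

```python
# Minimal sketch of the proposed distributed estimator (illustrative setup).
import numpy as np
from scipy.integrate import solve_ivp

S = np.array([[0.0, 1.0], [-1.0, 0.0]])                     # dchi/dt = S chi
A_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]   # A = col(A_1, A_2) = I
neighbors = {0: [1], 1: [0]}
kappa, gamma = 20.0, 5.0                                    # large gains
N, n = 2, 2

def o(i, t):
    return 0.05 * np.sin(3.0 * t + i)   # non-vanishing, time-varying noise

def dynamics(t, z):
    chi, X = z[:n], z[n:].reshape(N, n)                     # plant and estimates
    dX = np.empty_like(X)
    for i in range(N):
        b_i = A_list[i] @ chi + o(i, t)                     # noisy information
        innov = A_list[i].T @ (A_list[i] @ X[i] - b_i)
        coup = sum(X[j] - X[i] for j in neighbors[i])
        dX[i] = S @ X[i] - kappa * innov + kappa * gamma * coup
    return np.concatenate([S @ chi, dX.ravel()])

z0 = np.concatenate([np.array([1.0, 0.0]), np.zeros(N * n)])
sol = solve_ivp(dynamics, (0.0, 30.0), z0, max_step=0.01)
chi_T = sol.y[:n, -1]
X_T = sol.y[n:, -1].reshape(N, n)
print("estimation errors:", [float(np.linalg.norm(x - chi_T)) for x in X_T])
```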

Here, the notion of a best possible estimate is extended to cover non-vanishing and time-varying but bounded output noise o(t) in the sense of limit supremum, i.e., the estimate \hat{\chi}(t) generated by the knowledge of the output matrix A and the output A\chi(t) + o(t) is called a best possible estimate if we have
$$\limsup_{t \to \infty} |\hat{\chi}(t) - \chi(t)| \le \frac{1}{\sigma_1(A)} \limsup_{t \to \infty} |o(t)|.$$

1. Utilization of local partial state observer

Now, let us consider the system \dot{\chi}(t) = S\chi(t) with the measurement output y_i(t) = G_i\chi(t) + n_i(t). Then, the information vector b_i(t) can be generated by a local partial state observer that estimates the detectable portion of the state \chi(t) from the actual measurement y_i(t).

In particular, by performing the detectability decomposition for each pair (G_i, S), we obtain an orthogonal matrix \begin{bmatrix} A_i^T & B_i^T\end{bmatrix} such that
$$\begin{bmatrix} A_i \\ B_i \end{bmatrix} S \begin{bmatrix}A_i^T & B_i^T \end{bmatrix} = \begin{bmatrix} \bar{S}_i^{11} & 0 \\ \bar{S}_i^{21} & \bar{S}_i^{22} \end{bmatrix}, \quad G_i\begin{bmatrix} A_i^T & B_i^T \end{bmatrix} = \begin{bmatrix} \bar{G}_i & 0 \end{bmatrix},$$
and that (\bar{G}_i, \bar{S}_i^{11}) is detectable. Then, one can always find a local partial state observer of the form
$$\dot{b}_i(t) = P_i(b_i(t), y_i(t)),$$
which provides an estimate of A_i\chi(t) with the estimation error o_i(t) := b_i(t) - A_i\chi(t). Note that the detectability of the pair (G, S), where G = {\rm col}(G_1, \dots, G_N), ensures that the matrix A has full column rank.
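For a concrete picture of the decomposition, the sketch below (with illustrative matrices) builds the observability matrix of a single pair (G_i, S) and extracts A_i and B_i from its SVD. In general, B_i should span only the undetectable (unstable and unobservable) subspace; in this example the single unobservable mode is unstable, so the unobservable and undetectable subspaces coincide and the SVD recovers the decomposition.

```python
# Sketch of the detectability decomposition for one pair (G_i, S).
# Illustrative matrices; the unobservable mode (eigenvalue +1) is unstable,
# so the null space of the observability matrix is the undetectable subspace.
import numpy as np

S = np.diag([-1.0, -2.0, 1.0])
G_i = np.array([[1.0, 1.0, 0.0]])
n = S.shape[0]

# Observability matrix col(G_i, G_i S, ..., G_i S^{n-1}).
O = np.vstack([G_i @ np.linalg.matrix_power(S, k) for k in range(n)])
_, s, Vt = np.linalg.svd(O)
r = int(np.sum(s > 1e-9))               # dimension of the observable subspace
A_i, B_i = Vt[:r], Vt[r:]               # orthonormal rows; [A_i; B_i] orthogonal

T = np.vstack([A_i, B_i])
S_bar, G_bar = T @ S @ T.T, G_i @ T.T
print("top-right block of S_bar (should be ~0):\n", S_bar[:r, r:])
print("G_i [A_i^T  B_i^T] (last columns ~0):\n", G_bar)
```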

An example of a local partial state observer is given by
$$\dot{b}_i(t) = \bar{S}_i^{11}b_i(t) - \bar{L}_i(\bar{G}_i b_i(t) - y_i(t)),$$ where \bar{L}_i is any matrix such that \bar{S}_i^{11} - \bar{L}_i\bar{G}_i is Hurwitz, which exists because (\bar{G}_i, \bar{S}_i^{11}) is detectable. For this case, the estimation error o_i satisfies $$\dot{o}_i(t) = (\bar{S}_i^{11} - \bar{L}_i\bar{G}_i)o_i(t) + \bar{L}_in_i(t) =: \bar{S}_i^H o_i(t) + \bar{L}_in_i(t),$$ and thus,
$$o_i(t) = e^{\bar{S}_i^{H}t}o_i(0) + \int_0^t e^{\bar{S}_i^{H}(t-\tau)}\bar{L}_in_i(\tau) d\tau,$$
which shows that o_i(t) is bounded whenever n_i(t) is bounded; in addition, the stacked errors o(t) = {\rm col}(o_1(t), \dots, o_N(t)) and noises n(t) = {\rm col}(n_1(t), \dots, n_N(t)) satisfy
$$\limsup_{t \to \infty} |o(t)| \le \sqrt{\sum_{i=1}^N \left|(\bar{S}_i^H)^{-1}\bar{L}_i\right|^2}\,\limsup_{t \to \infty} |n(t)|.$$
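To make this construction concrete, the following sketch (with an illustrative detectable pair and assumed observer poles) computes \bar{L}_i by pole placement on the dual pair and checks that, under a constant noise n_i, the error settles at -(\bar{S}_i^H)^{-1}\bar{L}_i n_i, which is where the matrix (\bar{S}_i^H)^{-1}\bar{L}_i in the bound above enters.

```python
# Sketch of the example observer: pick L_bar so S_bar11 - L_bar G_bar is
# Hurwitz, then check the steady-state error under constant noise.
# Matrices and pole locations are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.signal import place_poles

S_bar11 = np.array([[0.0, 1.0], [0.0, 0.0]])
G_bar = np.array([[1.0, 0.0]])          # (G_bar, S_bar11) observable

# Observer gain via the dual problem: eig(S_bar11 - L_bar G_bar) = {-2, -3}.
L_bar = place_poles(S_bar11.T, G_bar.T, [-2.0, -3.0]).gain_matrix.T
S_H = S_bar11 - L_bar @ G_bar           # Hurwitz by construction

n_const = np.array([0.1])               # constant, non-vanishing noise
sol = solve_ivp(lambda t, o: S_H @ o + L_bar @ n_const,
                (0.0, 20.0), np.array([1.0, -1.0]))
o_inf = sol.y[:, -1]                    # error after the transient has decayed

print("steady-state error:          ", o_inf)
print("-(S_H)^{-1} L_bar n_i:       ", -np.linalg.solve(S_H, L_bar @ n_const))
print("bound |S_H^{-1} L_bar| |n_i|:",
      np.linalg.norm(np.linalg.inv(S_H) @ L_bar, 2) * np.linalg.norm(n_const))
```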
