Consider the system in scalar variables, for $t = 1, 2, \ldots, h$:
$$
\begin{aligned}
x_t &= x_{t-1} + u_{t-1}, \\
y_t &= x_{t-1} + \eta_t, \\
\hat{x}_0 &= x_0 + \eta_0,
\end{aligned}
$$
where $\hat{x}_0$ is given, $y_t, u_t$ are observed at $t$, but $x_0, x_1, \ldots$ and $\eta_0, \eta_1, \ldots$ are unobservable, and $\eta_0, \eta_1, \ldots$ are independent random variables with mean $0$ and variance $v$. Define $\hat{x}_{t-1}$ to be the estimator of $x_{t-1}$ with minimum variance amongst all estimators that are unbiased and linear functions of $W_{t-1} = (\hat{x}_0, y_1, \ldots, y_{t-1}, u_0, \ldots, u_{t-2})$. Suppose $\hat{x}_{t-1} = a^T W_{t-1}$ and its variance is $V_{t-1}$. After observation at $t$ of $(y_t, u_{t-1})$, a new unbiased estimator of $x_{t-1}$, linear in $W_t$, is expressed as
$$
x^*_{t-1} = (1 - H)\, b^T W_{t-1} + H y_t .
$$
Find $b$ and $H$ to minimize the variance of $x^*_{t-1}$. Hence find $\hat{x}_t$ in terms of $\hat{x}_{t-1}$, $y_t$, $u_{t-1}$, $V_{t-1}$ and $v$. Calculate $V_h$.
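As a numerical check on this part, the following is a minimal sketch under one reading of the problem, not an official solution. It assumes $b = a$, so that the variance of $x^*_{t-1}$ is $(1-H)^2 V_{t-1} + H^2 v$, which is minimized at $H = V_{t-1}/(V_{t-1}+v)$, and it assumes $V_0 = v$ because $\hat{x}_0 = x_0 + \eta_0$. The function names `update` and `error_variances` are hypothetical.

```python
from fractions import Fraction


def update(x_hat_prev, V_prev, y, u_prev, v):
    """One step of the assumed recursion: returns (x_hat_t, V_t).

    Assumes the optimal weight H = V_prev / (V_prev + v), which minimizes
    (1 - H)**2 * V_prev + H**2 * v.
    """
    H = V_prev / (V_prev + v)
    # Blend the old estimate with the new observation y_t of x_{t-1},
    # then shift by the known control u_{t-1} to estimate x_t.
    x_hat = (1 - H) * x_hat_prev + H * y + u_prev
    # Adding a known control does not change the error variance.
    V = V_prev * v / (V_prev + v)
    return x_hat, V


def error_variances(v, h):
    """Return [V_0, ..., V_h], assuming V_0 = v."""
    v = Fraction(v)
    V = [v]
    for _ in range(h):
        # The estimate and observation values do not affect the variance.
        _, V_next = update(Fraction(0), V[-1], Fraction(0), Fraction(0), v)
        V.append(V_next)
    return V
```

Under these assumptions the recursion is equivalent to $1/V_t = 1/V_{t-1} + 1/v$, so the computed values should agree with $V_t = v/(t+1)$; exact rational arithmetic makes that comparison straightforward.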
Suppose $\eta_0, \eta_1, \ldots$ are Gaussian and thus $\hat{x}_t = E[x_t \mid W_t]$. Consider minimizing $E\left[x_h^2 + \sum_{t=0}^{h-1} u_t^2\right]$, under the constraint that the control $u_t$ can only depend on $W_t$. Show that the value function of dynamic programming for this problem can be expressed as
$$
F(W_t) = \Pi_t \hat{x}_t^2 + \cdots
$$
where $F(W_h) = \hat{x}_h^2 + V_h$ and $+ \cdots$ is independent of $W_t$ and linear in $v$.
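The coefficient $\Pi_t$ can be explored numerically. The sketch below is an illustration under stated assumptions rather than a definitive derivation: it assumes the dynamic programming step reduces to minimizing $u^2 + \Pi_{t+1}(\hat{x}_t + u)^2$ over $u$ (the remaining terms not involving $u$ or $\hat{x}_t$), which gives $u_t = -\Pi_{t+1}\hat{x}_t/(1+\Pi_{t+1})$ and the backward recursion $\Pi_t = \Pi_{t+1}/(1+\Pi_{t+1})$ with $\Pi_h = 1$. The function names are hypothetical.

```python
from fractions import Fraction


def riccati(h):
    """Return [Pi_0, ..., Pi_h] from the assumed backward recursion.

    Assumes Pi_h = 1 and Pi_t = Pi_{t+1} / (1 + Pi_{t+1}).
    """
    Pi = [None] * (h + 1)
    Pi[h] = Fraction(1)
    for t in range(h - 1, -1, -1):
        Pi[t] = Pi[t + 1] / (1 + Pi[t + 1])
    return Pi


def control(x_hat_t, Pi_next):
    """Minimizer of u**2 + Pi_next * (x_hat_t + u)**2 over u."""
    return -Pi_next * x_hat_t / (1 + Pi_next)
```

Under these assumptions the recursion gives $\Pi_t = 1/(h - t + 1)$, and the minimizing control can equivalently be written $u_t = -\Pi_t \hat{x}_t$.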