Let $\Sigma_1=\{\mu_1,\dots,\mu_N\}$ be a finite alphabet and $X$ a random variable that takes each value $\mu_i$ with probability $p_i$. Define the entropy $H(X)$ of $X$.
Suppose $\Sigma_2=\{0,1\}$ and $c:\Sigma_1\to\Sigma_2^*$ is a decipherable code. Write down an expression for the expected word length $E(S)$ of $c$.
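For concreteness, here is a small Python sketch that evaluates the entropy and the expected word length for a dyadic example; the distribution and word lengths below are illustrative assumptions, not part of the question.

```python
import math

# Illustrative distribution and codeword lengths (assumptions for this sketch).
p = [0.5, 0.25, 0.125, 0.125]      # p_i = P(X = mu_i)
s = [1, 2, 3, 3]                   # word lengths of a decipherable code c

# Entropy: -sum_i p_i * log2(p_i); terms with p_i = 0 contribute 0.
H = -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Expected word length: sum_i p_i * s_i.
ES = sum(pi * si for pi, si in zip(p, s))

print(f"H(X) = {H}, E(S) = {ES}")  # both equal 1.75 for this dyadic example
```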
Prove that the minimum expected word length $S^*$ of a decipherable code $c:\Sigma_1\to\Sigma_2^*$ satisfies
$$H(X)\leqslant S^*<H(X)+1.$$
[You may use Kraft's and Gibbs' inequalities, provided they are clearly stated.]
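As a numerical illustration of both bounds (the distribution below is an assumption), one can take $s_i=\lceil-\log p_i\rceil$, check Kraft's inequality, and verify that the resulting expected length lies in $[H(X),H(X)+1)$:

```python
import math

# Illustrative (non-dyadic) distribution -- an assumption for this sketch.
p = [0.4, 0.3, 0.2, 0.1]
H = -sum(pi * math.log2(pi) for pi in p)

# The lengths s_i = ceil(-log2 p_i) satisfy Kraft's inequality, so a
# decipherable code with these lengths exists; its expected length is then
# an upper bound for S*, and it is strictly less than H(X) + 1.
s = [math.ceil(-math.log2(pi)) for pi in p]
kraft = sum(2.0 ** -si for si in s)
ES = sum(pi * si for pi, si in zip(p, s))

assert kraft <= 1.0
assert H <= ES < H + 1
print(f"H(X) = {H:.4f}, Kraft sum = {kraft:.4f}, E(S) = {ES:.4f}")
```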
Suppose a decipherable binary code has word lengths $s_1,\dots,s_N$. Show that
$$N\log N\leqslant s_1+\cdots+s_N.$$
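A quick numerical check of this bound for one set of word lengths satisfying Kraft's inequality (the length multiset is an assumed example):

```python
import math

# Word lengths of a decipherable binary code (an assumed example): Kraft's
# inequality holds, and N * log2(N) <= s_1 + ... + s_N as claimed.
s = [1, 2, 3, 3]
N = len(s)

assert sum(2.0 ** -si for si in s) <= 1.0     # Kraft sum = 1
assert N * math.log2(N) <= sum(s)             # 4 * 2 = 8 <= 9
```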
Suppose $X$ is a source that emits $N$ sourcewords $a_1,\dots,a_N$, and $p_i$ is the probability that $a_i$ is emitted, where $p_1\geqslant p_2\geqslant\cdots\geqslant p_N$. Let $b_1=0$ and $b_i=\sum_{j=1}^{i-1}p_j$ for $2\leqslant i\leqslant N$. Let $s_i=\lceil-\log p_i\rceil$ for $1\leqslant i\leqslant N$. Now define a code $c$ by $c(a_i)=b_i^*$, where $b_i^*$ is the (fractional part of the) binary expansion of $b_i$ to $s_i$ binary places. Prove that this defines a decipherable code.
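The construction can be sketched directly in Python, together with a prefix-freeness check; the helper name `shannon_code` and the example distribution are assumptions for this sketch only.

```python
import math
from fractions import Fraction
from itertools import combinations

def shannon_code(p):
    """c(a_i) = first s_i binary places of b_i = p_1 + ... + p_{i-1},
    with s_i = ceil(-log2 p_i); probabilities must be non-increasing."""
    assert all(p[i] >= p[i + 1] for i in range(len(p) - 1))
    codewords, b = [], Fraction(0)
    for p_i in p:
        s_i = math.ceil(-math.log2(p_i))
        bits, x = [], b
        for _ in range(s_i):               # binary expansion of b_i, truncated
            x *= 2
            if x >= 1:
                bits.append("1")
                x -= 1
            else:
                bits.append("0")
        codewords.append("".join(bits))
        b += p_i                           # b_{i+1} = b_i + p_i
    return codewords

# Illustrative distribution, written with exact fractions (an assumption).
p = [Fraction(2, 5), Fraction(3, 10), Fraction(1, 5), Fraction(1, 10)]
code = shannon_code(p)
print(code)                                # ['00', '01', '101', '1110']

# No codeword is a prefix of another, so the code is prefix-free,
# hence decipherable.
assert all(not u.startswith(v) and not v.startswith(u)
           for u, v in combinations(code, 2))
```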
What does it mean for a code to be optimal? Is the code $c$ defined in the previous paragraph in terms of the $b_i^*$ necessarily optimal? Justify your answer.
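One way to probe this numerically (the two-symbol distribution below is an illustrative assumption) is to compare the expected word length of the code built from the $b_i^*$ with that of the obvious two-word code:

```python
import math

# With p = (0.99, 0.01) -- an illustrative assumption -- the construction
# above gives lengths ceil(-log2 p_i) = (1, 7), so E(S) = 0.99 + 0.07 = 1.06,
# while the decipherable code a_1 -> 0, a_2 -> 1 has E(S) = 1.
p = [0.99, 0.01]
s = [math.ceil(-math.log2(pi)) for pi in p]
ES = sum(pi * si for pi, si in zip(p, s))
print(s, round(ES, 4))                     # [1, 7] 1.06
```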