Azuma-Hoeffding inequality

Martingale

Note that the random variables of a martingale are not in general independent. However, the following general concentration bound holds for every martingale with bounded differences.

Theorem [Azuma-Hoeffding inequality]: Let $X_0,X_1,\ldots,X_n$ be a martingale such that

$$|X_i - X_{i-1}| \leq c_i.$$

Then for any $\lambda>0$,

$$P(X_n-X_0\geq \lambda) \leq \exp\left(-\frac{\lambda^2}{2\sum_{i=1}^n c_i^2}\right), \quad\text{and}\quad P(X_n-X_0\leq -\lambda) \leq \exp\left(-\frac{\lambda^2}{2\sum_{i=1}^n c_i^2}\right).$$
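
As a quick numerical illustration (not part of the original note), the bound can be sanity-checked on the simplest bounded-difference martingale: the $\pm 1$ random walk, for which $c_i=1$ and $\sum_{i=1}^n c_i^2 = n$. A minimal sketch in Python/NumPy:

```python
import numpy as np

# Sanity check (illustration only): the simple +/-1 random walk is a
# martingale with bounded differences c_i = 1, so Azuma-Hoeffding gives
# P(X_n - X_0 >= lambda) <= exp(-lambda^2 / (2n)).

rng = np.random.default_rng(0)
n, trials = 100, 200_000

# Each row is one walk; X_n - X_0 is the sum of n independent +/-1 steps.
steps = rng.choice([-1, 1], size=(trials, n))
endpoints = steps.sum(axis=1)

for lam in [5, 10, 20, 30]:
    empirical = np.mean(endpoints >= lam)
    bound = np.exp(-lam**2 / (2 * n))
    print(f"lambda={lam:3d}  empirical={empirical:.5f}  bound={bound:.5f}")
```

For each $\lambda$, the empirical tail frequency should lie below $e^{-\lambda^2/(2n)}$ up to sampling noise; the bound is loose for small deviations but captures the $e^{-\Theta(\lambda^2/n)}$ decay rate.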


Proof: We will only prove the first inequality, as the proof of the second is very similar. The proof is again an application of Markov's inequality to an appropriate random variable, and it is similar to the proof of the Chernoff bounds.

To simplify the notation, we use the variables $Y_i=X_i-X_{i-1}$. The steps of the proof are as follows.

We use the standard technique of the Chernoff bounds: instead of bounding $P(X_n-X_0\geq \lambda)$ directly, we bound $P(e^{t(X_n-X_0)}\geq e^{\lambda t})$ for a parameter $t>0$, using Markov's inequality

$$P(e^{t(X_n-X_0)}\geq e^{\lambda t}) \leq e^{-\lambda t}\, E\left[e^{t(X_n-X_0)}\right].$$
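
Written out, this step combines the monotonicity of $x \mapsto e^{tx}$ for $t>0$ (so the two events coincide) with Markov's inequality applied to the nonnegative random variable $e^{t(X_n-X_0)}$:

$$P(X_n-X_0\geq \lambda) = P\left(e^{t(X_n-X_0)}\geq e^{\lambda t}\right) \leq \frac{E\left[e^{t(X_n-X_0)}\right]}{e^{\lambda t}}.$$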

From now on we focus on $E[e^{t(X_n-X_0)}]$, which can be rewritten in terms of the $Y_i$'s instead of the $X_i$'s, as

$$E\left[e^{t(X_n-X_0)}\right]=E\left[\prod_{i=1}^n e^{t Y_i}\right],$$

because, by telescoping, $X_n-X_0=\sum_{i=1}^n (X_i-X_{i-1})=\sum_{i=1}^n Y_i$.

At this point in the proof of the Chernoff bounds, we used the fact that the variables are independent and rewrote the expectation of the product as a product of expectations. We cannot do this here, because the random variables $Y_i$ are not independent. Instead, we consider the conditional expectation

$$E\left[\prod_{i=1}^n e^{t Y_i} \,\middle|\, X_0,\ldots, X_{n-1} \right] = \left( \prod_{i=1}^{n-1} e^{t Y_i} \right) E\left[e^{t Y_n} \,\middle|\, X_0,\ldots, X_{n-1} \right],$$

because for fixed $X_0,\ldots, X_{n-1}$, all but the last factor in the product are constants and can be moved out of the expectation.

With this in mind, we turn our attention to finding an upper bound on $E[e^{t Y_i} \mid X_0,\ldots, X_{i-1}]$.

We first observe that $E[Y_i \mid X_0,\ldots, X_{i-1}]=0$, by the martingale property:

$$E[Y_i \mid X_0,\ldots, X_{i-1}] = E[X_i-X_{i-1} \mid X_0,\ldots,X_{i-1}] = E[X_i \mid X_0,\ldots,X_{i-1}] - E[X_{i-1} \mid X_0,\ldots,X_{i-1}] = X_{i-1}-X_{i-1} = 0.$$

Using the premise that $|Y_i|\leq c_i$, we bound

$$e^{t Y_i} \leq \beta_i + \gamma_i Y_i,$$

for $\beta_i=(e^{tc_i}+e^{-tc_i})/2 \leq e^{(tc_i)^2/2}$ and $\gamma_i=(e^{tc_i}-e^{-tc_i})/(2c_i)$. To show this, rewrite $Y_i$ as $Y_i=r c_i + (1-r)(-c_i)$, where $r=\frac{1+Y_i/c_i}{2}\in [0,1]$, and use the convexity of $e^{tx}$ to get

$$e^{t Y_i} \leq r e^{t c_i}+(1-r) e^{-t c_i} = \frac{e^{t c_i}+e^{-t c_i}}{2} + Y_i \, \frac{e^{t c_i}-e^{-t c_i}}{2 c_i} = \beta_i + \gamma_i Y_i.$$

To bound $\beta_i$ from above, use the fact that for every $x$: $e^x+e^{-x} \leq 2 e^{x^2/2}$.
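
This inequality follows by comparing the Taylor expansions of the two sides term by term, using $(2k)! \geq 2^k\, k!$:

$$\frac{e^x+e^{-x}}{2} = \sum_{k=0}^{\infty} \frac{x^{2k}}{(2k)!} \leq \sum_{k=0}^{\infty} \frac{x^{2k}}{2^k\, k!} = e^{x^2/2}.$$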

Combine the above to get

$$E[e^{t Y_i} \mid X_0,\ldots, X_{i-1}] \leq E[\beta_i + \gamma_i Y_i \mid X_0,\ldots, X_{i-1}] = \beta_i \leq e^{(tc_i)^2/2},$$

where the equality holds because $\beta_i$ and $\gamma_i$ are constants given $X_0,\ldots,X_{i-1}$ and $E[Y_i \mid X_0,\ldots, X_{i-1}]=0$.

It follows that

$$E\left[ \prod_{i=1}^n e^{t Y_i} \,\middle|\, X_0,\ldots, X_{n-1} \right] = \left(\prod_{i=1}^{n-1} e^{t Y_i}\right) E\left[e^{t Y_n} \,\middle|\, X_0,\ldots, X_{n-1}\right] \leq \left(\prod_{i=1}^{n-1} e^{t Y_i}\right) e^{(t c_n)^2/2}.$$

We now take expectations on both sides and use the tower property $E[Z]=E\left[E[Z \mid X_0,\ldots,X_{n-1}]\right]$ to get rid of the conditional expectation,

$$E\left[ \prod_{i=1}^n e^{t Y_i} \right] = E\left[ E\left[\left(\prod_{i=1}^{n-1} e^{t Y_i}\right) e^{t Y_n} \,\middle|\, X_0,\ldots, X_{n-1}\right] \right] \leq E\left[ \prod_{i=1}^{n-1} e^{t Y_i} \right] e^{(tc_n)^2/2}.$$

Using standard techniques, we can now finish the proof. By induction,

$$E\left[\prod_{i=1}^n e^{t Y_i}\right] \leq \prod_{i=1}^n e^{(tc_i)^2/2} = e^{t^2 \sum_{i=1}^n c_i^2/2}.$$
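
Spelled out, the induction repeats the conditioning argument to peel off one factor at a time:

$$E\left[\prod_{i=1}^{n} e^{t Y_i}\right] \leq e^{(tc_n)^2/2}\, E\left[\prod_{i=1}^{n-1} e^{t Y_i}\right] \leq e^{(tc_n)^2/2}\, e^{(tc_{n-1})^2/2}\, E\left[\prod_{i=1}^{n-2} e^{t Y_i}\right] \leq \cdots \leq \prod_{i=1}^{n} e^{(tc_i)^2/2}.$$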

Therefore,

$$P(e^{t(X_n-X_0)}\geq e^{\lambda t}) \leq e^{-\lambda t}\, e^{t^2 \sum_{i=1}^n c_i^2/2}.$$

Set $t=\lambda/\sum_{i=1}^n c_i^2$ to minimise the above expression and get the bound of the theorem.
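
To see that this choice of $t$ is the minimiser, write $S=\sum_{i=1}^n c_i^2$; the exponent $-\lambda t + t^2 S/2$ is convex in $t$, so setting its derivative $-\lambda + tS$ to zero gives $t=\lambda/S$, and

$$e^{-\lambda t + t^2 S/2} = e^{-\lambda^2/S + \lambda^2/(2S)} = \exp\left(-\frac{\lambda^2}{2\sum_{i=1}^n c_i^2}\right).$$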

The conditional-expectation step of the proof is crucial: by conditioning, it bounds the product of the random variables $\prod_{i=1}^{n-1} e^{t Y_i}$ and $e^{t Y_n}$, although the two variables are not in general independent.


Applications

See here