Azuma-Hoeffding inequality

Martingale

Note that the random variables of a martingale are not in general independent. However, the following general concentration bound holds for every martingale with bounded differences.

Theorem [Azuma-Hoeffding inequality]: Let $X_0,X_1,\ldots,X_n$ be a martingale such that

$$|X_i - X_{i-1}| \leq c_i.$$

Then for any $\lambda>0$,

$$P(X_n-X_0\geq \lambda) \leq \exp\left(-\frac{\lambda^2}{2\sum_{i=1}^n c_i^2}\right), \quad\text{and}\quad P(X_n-X_0\leq -\lambda) \leq \exp\left(-\frac{\lambda^2}{2\sum_{i=1}^n c_i^2}\right).$$
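
As a quick numerical illustration (not part of the original note), the bound can be sanity-checked on the simplest bounded-difference martingale: the $\pm 1$ random walk, for which $c_i=1$ and $\sum_{i=1}^n c_i^2 = n$. A minimal sketch in Python/NumPy:

```python
import numpy as np

# Sanity check (illustration only): the simple +/-1 random walk is a
# martingale with bounded differences c_i = 1, so Azuma-Hoeffding gives
# P(X_n - X_0 >= lambda) <= exp(-lambda^2 / (2n)).

rng = np.random.default_rng(0)
n, trials = 100, 200_000

# Each row is one walk; X_n - X_0 is the sum of n independent +/-1 steps.
steps = rng.choice([-1, 1], size=(trials, n))
endpoints = steps.sum(axis=1)

for lam in [5, 10, 20, 30]:
    empirical = np.mean(endpoints >= lam)
    bound = np.exp(-lam**2 / (2 * n))
    print(f"lambda={lam:3d}  empirical={empirical:.5f}  bound={bound:.5f}")
```

For each $\lambda$, the empirical tail frequency should lie below $e^{-\lambda^2/(2n)}$ up to sampling noise; the bound is loose for small deviations but captures the $e^{-\Theta(\lambda^2/n)}$ decay rate.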


Proof: We will only prove the first inequality, as the proof of the second is very similar. The proof is again an application of Markov's inequality to an appropriate random variable, and it is similar to the proof of the Chernoff bounds.

To simplify the notation, we use the variables $Y_i=X_i-X_{i-1}$. The steps of the proof are as follows.

We use the standard technique of the Chernoff bounds: instead of bounding $P(X_n-X_0\geq \lambda)$ directly, we bound $P(e^{t(X_n-X_0)}\geq e^{\lambda t})$ for a parameter $t>0$, using Markov's inequality

$$P(e^{t(X_n-X_0)}\geq e^{\lambda t}) \leq e^{-\lambda t}\, E\left[e^{t(X_n-X_0)}\right].$$
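
Written out, this step combines the monotonicity of $x \mapsto e^{tx}$ for $t>0$ (so the two events coincide) with Markov's inequality applied to the nonnegative random variable $e^{t(X_n-X_0)}$:

$$P(X_n-X_0\geq \lambda) = P\left(e^{t(X_n-X_0)}\geq e^{\lambda t}\right) \leq \frac{E\left[e^{t(X_n-X_0)}\right]}{e^{\lambda t}}.$$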

From now on we focus on $E[e^{t(X_n-X_0)}]$, which can be rewritten in terms of the $Y_i$'s instead of the $X_i$'s, as

$$E\left[e^{t(X_n-X_0)}\right]=E\left[\prod_{i=1}^n e^{t Y_i}\right],$$

because, by telescoping, $X_n-X_0=\sum_{i=1}^n (X_i-X_{i-1})=\sum_{i=1}^n Y_i$.

At this point in the proof of the Chernoff bounds, we used the fact that the variables are independent and rewrote the expectation of the product as a product of expectations. We cannot do this here, because the random variables $Y_i$ are not independent. Instead, we consider the conditional expectation

$$E\left[\prod_{i=1}^n e^{t Y_i} \,\middle|\, X_0,\ldots, X_{n-1} \right] = \left( \prod_{i=1}^{n-1} e^{t Y_i} \right) E\left[e^{t Y_n} \,\middle|\, X_0,\ldots, X_{n-1} \right],$$

because for fixed $X_0,\ldots, X_{n-1}$, all but the last factor in the product are constants and can be moved out of the expectation.

With this in mind, we turn our attention to finding an upper bound on $E[e^{t Y_i} \mid X_0,\ldots, X_{i-1}]$.

We first observe that $E[Y_i \mid X_0,\ldots, X_{i-1}]=0$, by the martingale property:

$$E[Y_i \mid X_0,\ldots, X_{i-1}] = E[X_i-X_{i-1} \mid X_0,\ldots,X_{i-1}] = E[X_i \mid X_0,\ldots,X_{i-1}] - E[X_{i-1} \mid X_0,\ldots,X_{i-1}] = X_{i-1}-X_{i-1} = 0.$$

Using the premise that $|Y_i|\leq c_i$, we bound

$$e^{t Y_i} \leq \beta_i + \gamma_i Y_i,$$

for $\beta_i=(e^{tc_i}+e^{-tc_i})/2 \leq e^{(tc_i)^2/2}$ and $\gamma_i=(e^{tc_i}-e^{-tc_i})/(2c_i)$. To show this, rewrite $Y_i$ as $Y_i=r c_i + (1-r)(-c_i)$, where $r=\frac{1+Y_i/c_i}{2}\in [0,1]$, and use the convexity of $e^{tx}$ to get

$$e^{t Y_i} \leq r e^{t c_i}+(1-r) e^{-t c_i} = \frac{e^{t c_i}+e^{-t c_i}}{2} + Y_i \, \frac{e^{t c_i}-e^{-t c_i}}{2 c_i} = \beta_i + \gamma_i Y_i.$$

To bound $\beta_i$ from above, use the fact that for every $x$: $e^x+e^{-x} \leq 2 e^{x^2/2}$.
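
This inequality follows by comparing the Taylor expansions of the two sides term by term, using $(2k)! \geq 2^k\, k!$:

$$\frac{e^x+e^{-x}}{2} = \sum_{k=0}^{\infty} \frac{x^{2k}}{(2k)!} \leq \sum_{k=0}^{\infty} \frac{x^{2k}}{2^k\, k!} = e^{x^2/2}.$$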

Combine the above to get

$$E[e^{t Y_i} \mid X_0,\ldots, X_{i-1}] \leq E[\beta_i + \gamma_i Y_i \mid X_0,\ldots, X_{i-1}] = \beta_i \leq e^{(tc_i)^2/2},$$

where the equality holds because $\beta_i$ and $\gamma_i$ are constants given $X_0,\ldots,X_{i-1}$ and $E[Y_i \mid X_0,\ldots, X_{i-1}]=0$.

It follows that

$$E\left[ \prod_{i=1}^n e^{t Y_i} \,\middle|\, X_0,\ldots, X_{n-1} \right] = \left(\prod_{i=1}^{n-1} e^{t Y_i}\right) E\left[e^{t Y_n} \,\middle|\, X_0,\ldots, X_{n-1}\right] \leq \left(\prod_{i=1}^{n-1} e^{t Y_i}\right) e^{(t c_n)^2/2}.$$

We now take expectations on both sides and use the tower property $E[Z]=E\left[E[Z \mid X_0,\ldots,X_{n-1}]\right]$ to get rid of the conditional expectation,

$$E\left[ \prod_{i=1}^n e^{t Y_i} \right] = E\left[ E\left[\left(\prod_{i=1}^{n-1} e^{t Y_i}\right) e^{t Y_n} \,\middle|\, X_0,\ldots, X_{n-1}\right] \right] \leq E\left[ \prod_{i=1}^{n-1} e^{t Y_i} \right] e^{(tc_n)^2/2}.$$

Using standard techniques, we can now finish the proof. By induction,

$$E\left[\prod_{i=1}^n e^{t Y_i}\right] \leq \prod_{i=1}^n e^{(tc_i)^2/2} = e^{t^2 \sum_{i=1}^n c_i^2/2}.$$
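
Spelled out, the induction repeats the conditioning argument to peel off one factor at a time:

$$E\left[\prod_{i=1}^{n} e^{t Y_i}\right] \leq e^{(tc_n)^2/2}\, E\left[\prod_{i=1}^{n-1} e^{t Y_i}\right] \leq e^{(tc_n)^2/2}\, e^{(tc_{n-1})^2/2}\, E\left[\prod_{i=1}^{n-2} e^{t Y_i}\right] \leq \cdots \leq \prod_{i=1}^{n} e^{(tc_i)^2/2}.$$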

Therefore,

$$P(e^{t(X_n-X_0)}\geq e^{\lambda t}) \leq e^{-\lambda t}\, e^{t^2 \sum_{i=1}^n c_i^2/2}.$$

Set $t=\lambda/\sum_{i=1}^n c_i^2$ to minimise the above expression and get the bound of the theorem.
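
To see that this choice of $t$ is the minimiser, write $S=\sum_{i=1}^n c_i^2$; the exponent $-\lambda t + t^2 S/2$ is convex in $t$, so setting its derivative $-\lambda + tS$ to zero gives $t=\lambda/S$, and

$$e^{-\lambda t + t^2 S/2} = e^{-\lambda^2/S + \lambda^2/(2S)} = \exp\left(-\frac{\lambda^2}{2\sum_{i=1}^n c_i^2}\right).$$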

The conditional-expectation step of the proof is crucial: by conditioning, it bounds the product of the random variables $\prod_{i=1}^{n-1} e^{t Y_i}$ and $e^{t Y_n}$, although the two variables are not in general independent.


Applications

See here