Note that the random variables of a martingale are not in general independent. Nevertheless, the following concentration bound holds for every martingale with bounded differences.
Theorem [Azuma-Hoeffding inequality]: Let $X_0, X_1, \ldots, X_n$ be a martingale such that
$$|X_i - X_{i-1}| \le c_i.$$
Then for any $\lambda > 0$,
$$\Pr(X_n - X_0 \ge \lambda) \le \exp\left(-\frac{\lambda^2}{2\sum_{i=1}^{n} c_i^2}\right), \quad\text{and}\quad \Pr(X_n - X_0 \le -\lambda) \le \exp\left(-\frac{\lambda^2}{2\sum_{i=1}^{n} c_i^2}\right).$$
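Before the proof, here is a minimal numerical sketch of the theorem; the martingale and all parameters are illustrative choices, not taken from the text. It simulates a martingale whose increments depend on the past (so they are not independent), yet satisfy $|X_i - X_{i-1}| \le 1$, and compares the empirical tail $\Pr(X_n - X_0 \ge \lambda)$ against the Azuma-Hoeffding bound $e^{-\lambda^2/(2n)}$ (here $\sum_{i=1}^{n} c_i^2 = n$ since $c_i = 1$).

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 200_000

# Martingale with history-dependent, bounded increments:
# Y_i = s_i * a(X_{i-1}), where s_i = +/-1 is a fair coin and
# a(x) = 1 if x >= 0 else 1/2. Then E[Y_i | X_0,...,X_{i-1}] = 0
# and |Y_i| <= c_i = 1, but the Y_i are not independent.
X = np.zeros(trials)
for _ in range(n):
    step = np.where(X >= 0, 1.0, 0.5)            # a(X_{i-1}): depends on the past
    sign = rng.choice([-1.0, 1.0], size=trials)  # fair coin: conditional mean 0
    X += sign * step

for lam in [5, 10, 15, 20]:
    empirical = np.mean(X >= lam)        # X_0 = 0, so X_n - X_0 = X_n
    azuma = np.exp(-lam**2 / (2 * n))    # bound with sum of c_i^2 equal to n
    print(f"lambda={lam:2d}  empirical={empirical:.5f}  bound={azuma:.5f}")
```

The empirical tail should come out below the bound for every $\lambda$.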
Proof: We will only prove the first inequality; the second follows by applying the first to the martingale $-X_0, -X_1, \ldots, -X_n$. The proof is again an application of Markov's inequality to an appropriate random variable, and it is similar to the proof of the Chernoff bounds.
To simplify the notation, we use the variables $Y_i = X_i - X_{i-1}$. The steps of the proof are as follows.
Step 1. We use the standard technique of Chernoff bounds: instead of bounding $\Pr(X_n - X_0 \ge \lambda)$ directly, we fix a $t > 0$ and bound $\Pr(e^{t(X_n - X_0)} \ge e^{\lambda t})$; the two events coincide because $x \mapsto e^{tx}$ is increasing. Since $e^{t(X_n - X_0)}$ is nonnegative, Markov's inequality gives
$$\Pr\left(e^{t(X_n - X_0)} \ge e^{\lambda t}\right) \le e^{-\lambda t}\, \mathrm{E}\left[e^{t(X_n - X_0)}\right].$$
Step 2. From now on we focus on $\mathrm{E}[e^{t(X_n - X_0)}]$, which can be rewritten in terms of the $Y_i$'s instead of the $X_i$'s, as
$$\mathrm{E}\left[e^{t(X_n - X_0)}\right] = \mathrm{E}\left[\prod_{i=1}^{n} e^{t Y_i}\right],$$
because, by telescoping, $X_n - X_0 = \sum_{i=1}^{n} (X_i - X_{i-1}) = \sum_{i=1}^{n} Y_i$.
At this point in the proof of the Chernoff bounds we used the fact that the variables are independent, and we rewrote the expectation of the product as a product of expectations. We cannot do this here, because the random variables $Y_i$ are not independent. Instead, we consider the conditional expectation
$$\mathrm{E}\left[\prod_{i=1}^{n} e^{t Y_i} \,\middle|\, X_0, \ldots, X_{n-1}\right] = \left(\prod_{i=1}^{n-1} e^{t Y_i}\right) \mathrm{E}\left[e^{t Y_n} \mid X_0, \ldots, X_{n-1}\right],$$
which holds because each $Y_i$ with $i \le n-1$ is determined by $X_0, \ldots, X_{n-1}$: for fixed $X_0, \ldots, X_{n-1}$, all but the last factor in the product are constants and can be moved out of the expectation.
Step 3. With this in mind, we turn our attention to finding an upper bound on $\mathrm{E}[e^{t Y_i} \mid X_0, \ldots, X_{i-1}]$.
We first observe that $\mathrm{E}[Y_i \mid X_0, \ldots, X_{i-1}] = 0$, by the martingale property:
$$\mathrm{E}[Y_i \mid X_0, \ldots, X_{i-1}] = \mathrm{E}[X_i - X_{i-1} \mid X_0, \ldots, X_{i-1}] = \mathrm{E}[X_i \mid X_0, \ldots, X_{i-1}] - \mathrm{E}[X_{i-1} \mid X_0, \ldots, X_{i-1}] = X_{i-1} - X_{i-1} = 0.$$
Using the premise that $|Y_i| \le c_i$, we bound
$$e^{t Y_i} \le \beta_i + \gamma_i Y_i,$$
for $\beta_i = (e^{t c_i} + e^{-t c_i})/2 \le e^{(t c_i)^2/2}$ and $\gamma_i = (e^{t c_i} - e^{-t c_i})/(2 c_i)$. To show this, write $Y_i$ as the convex combination $Y_i = r c_i + (1 - r)(-c_i)$, where $r = \frac{1 + Y_i/c_i}{2} \in [0, 1]$, and use the convexity of $e^{tx}$ to get
$$e^{t Y_i} \le r e^{t c_i} + (1 - r) e^{-t c_i} = \frac{e^{t c_i} + e^{-t c_i}}{2} + Y_i \cdot \frac{e^{t c_i} - e^{-t c_i}}{2 c_i} = \beta_i + \gamma_i Y_i.$$
To bound $\beta_i$ from above, use the fact that for every $x$: $e^x + e^{-x} \le 2 e^{x^2/2}$ (verified after this step).
Combine the above to get
$$\mathrm{E}[e^{t Y_i} \mid X_0, \ldots, X_{i-1}] \le \mathrm{E}[\beta_i + \gamma_i Y_i \mid X_0, \ldots, X_{i-1}] = \beta_i \le e^{(t c_i)^2/2}.$$
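The inequality $e^x + e^{-x} \le 2 e^{x^2/2}$ used above can be verified by comparing Taylor series term by term:
$$e^x + e^{-x} = 2 \sum_{k=0}^{\infty} \frac{x^{2k}}{(2k)!} \le 2 \sum_{k=0}^{\infty} \frac{x^{2k}}{2^k\, k!} = 2 \sum_{k=0}^{\infty} \frac{(x^2/2)^k}{k!} = 2 e^{x^2/2},$$
where the middle inequality holds because $(2k)! \ge 2^k k!$ for every $k \ge 0$ (the product of the first $k$ even numbers alone is $2^k k!$).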
Step 4. It follows that
$$\mathrm{E}\left[\prod_{i=1}^{n} e^{t Y_i} \,\middle|\, X_0, \ldots, X_{n-1}\right] = \left(\prod_{i=1}^{n-1} e^{t Y_i}\right) \mathrm{E}\left[e^{t Y_n} \mid X_0, \ldots, X_{n-1}\right] \le \left(\prod_{i=1}^{n-1} e^{t Y_i}\right) e^{(t c_n)^2/2}.$$
We now take expectations on both sides and use the law of total expectation, $\mathrm{E}[Z] = \mathrm{E}[\mathrm{E}[Z \mid W]]$, to get rid of the conditional expectation:
$$\mathrm{E}\left[\prod_{i=1}^{n} e^{t Y_i}\right] = \mathrm{E}\left[\mathrm{E}\left[\prod_{i=1}^{n} e^{t Y_i} \,\middle|\, X_0, \ldots, X_{n-1}\right]\right] \le \mathrm{E}\left[\prod_{i=1}^{n-1} e^{t Y_i}\right] e^{(t c_n)^2/2}.$$
Using standard techniques we can now finish the proof. By induction on $n$,
$$\mathrm{E}\left[\prod_{i=1}^{n} e^{t Y_i}\right] \le \prod_{i=1}^{n} e^{(t c_i)^2/2} = e^{t^2 \sum_{i=1}^{n} c_i^2 / 2}.$$
Therefore
$$\Pr\left(e^{t(X_n - X_0)} \ge e^{\lambda t}\right) \le e^{-\lambda t}\, e^{t^2 \sum_{i=1}^{n} c_i^2 / 2}.$$
Set $t = \lambda / \sum_{i=1}^{n} c_i^2$ to minimise the above expression and get the bound of the theorem.
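To verify the last step: writing $S = \sum_{i=1}^{n} c_i^2$, the exponent $-\lambda t + t^2 S/2$ is a convex quadratic in $t$, minimised where its derivative $-\lambda + t S$ vanishes, that is, at $t = \lambda/S > 0$. Substituting this value gives
$$-\lambda \cdot \frac{\lambda}{S} + \frac{\lambda^2}{S^2} \cdot \frac{S}{2} = -\frac{\lambda^2}{2 S},$$
so $\Pr(X_n - X_0 \ge \lambda) \le e^{-\lambda^2 / (2 \sum_{i=1}^{n} c_i^2)}$, as claimed.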
Step 2 in the proof is crucial because, by passing to conditional expectations, it lets us bound the product of the random variables $\prod_{i=1}^{n-1} e^{t Y_i}$ and $e^{t Y_n}$, although the two variables are not in general independent.
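As a concrete sanity check of this conditioning argument, the following sketch (again with an arbitrary, illustrative martingale) estimates $\mathrm{E}[e^{t Y_n} \mid X_{n-1} \ge 0]$ and $\mathrm{E}[e^{t Y_n} \mid X_{n-1} < 0]$ by sampling, and checks that both stay below the Step 3 bound $e^{(t c_n)^2/2}$ even though $Y_n$ depends on the history.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials, t, c = 50, 500_000, 0.3, 1.0

# History-dependent martingale: the step size a(x) depends on the
# current position, so Y_n is not independent of the past, but
# E[Y_n | X_0,...,X_{n-1}] = 0 and |Y_n| <= c.
def step_size(x):
    return np.where(x >= 0, c, c / 2)

X = np.zeros(trials)
for _ in range(n - 1):
    X += rng.choice([-1.0, 1.0], size=trials) * step_size(X)

# Final increment Y_n, examined conditionally on the sign of X_{n-1}.
Y = rng.choice([-1.0, 1.0], size=trials) * step_size(X)
bound = np.exp((t * c) ** 2 / 2)
for label, mask in [("X_{n-1} >= 0", X >= 0), ("X_{n-1} < 0", X < 0)]:
    cond_mean = np.mean(np.exp(t * Y[mask]))
    print(f"E[e^(t*Y_n) | {label}] ~ {cond_mean:.5f}  vs bound {bound:.5f}")
```

Each conditional average should land near $\cosh(t a)$ for the corresponding step size $a$, safely below the bound.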
Applications
See here