Sufficient statistic

cosmos 3rd April 2019 at 12:41am
Statistic

A sufficient statistic for a random variable $y$ with respect to a random variable $x$ is a function of $y$ that retains all of the information in $y$ that is relevant to $x$.

An equivalent formulation is given by the Data processing theorem. See page 35 of Elements of Information Theory by Cover and Thomas.
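As a sketch of that formulation in symbols (the notation $T(y)$ for the statistic and $I(\cdot\,;\cdot)$ for mutual information is assumed here, not taken from the note): since $T(y)$ is a function of $y$, $x \to y \to T(y)$ is a Markov chain, and the data processing inequality gives

```latex
I\bigl(x; T(y)\bigr) \le I(x; y),
\qquad \text{with equality iff } x \to T(y) \to y \text{ is also a Markov chain.}
```

$T$ is sufficient exactly in the equality case, $I(x; T(y)) = I(x; y)$: passing from $y$ to $T(y)$ loses no information about $x$.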

A Minimal sufficient statistic is a sufficient statistic that is a function of every other sufficient statistic.

The notion of minimal sufficient statistics was introduced by Lehmann and Scheffé (Lehmann and Scheffé, 1950) as the simplest sufficient statistics, or the coarsest sufficient partition of the sample space which captures the relevant components of the sample with respect to the parameter.

An equivalent definition is that $P(y \mid x, T) = P(y \mid T)$, so that if we are given $T$, knowing $x$ doesn't change what we know about $y$. So $T$ is a sufficient statistic of $y$ w.r.t. $x$.

Basically, the first definition says that $T$ tells us everything $y$ has to say about $x$. The second says that, given $T$, $x$ and $y$ are conditionally independent; by symmetry, once we already know $T$, knowing $y$ or not is irrelevant for $x$, which makes sense if you think about it.
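The conditional-independence definition, $P(y \mid x, T) = P(y \mid T)$, can be checked by brute force on a toy model. The sketch below is only an illustration under assumed choices that are not in the note: $x$ is the bias of a coin, $y$ is a sequence of four i.i.d. flips, and the helper name `cond_prob` is hypothetical. It compares the number of heads (a sufficient statistic) with the first flip alone (not sufficient).

```python
from fractions import Fraction
from itertools import product


def cond_prob(y, theta, stat):
    """Exact P(y | x=theta, T=stat(y)) for i.i.d. Bernoulli(theta) flips y."""
    n = len(y)

    def joint(seq):
        # P(seq | theta) for an i.i.d. Bernoulli sequence.
        k = sum(seq)
        return theta**k * (1 - theta) ** (n - k)

    # Normalise over every sequence that shares the same statistic value.
    norm = sum(
        joint(seq) for seq in product((0, 1), repeat=n) if stat(seq) == stat(y)
    )
    return joint(y) / norm


# Assumed toy values for x (the coin bias); exact fractions avoid rounding.
thetas = [Fraction(1, 4), Fraction(1, 2), Fraction(9, 10)]
y = (1, 0, 1, 0)

# T = number of heads: the conditional probability does not depend on theta.
with_sum = [cond_prob(y, th, sum) for th in thetas]

# T = first flip only: the conditional probability still varies with theta.
with_first = [cond_prob(y, th, lambda seq: seq[0]) for th in thetas]

print(with_sum)    # every entry is Fraction(1, 6)
print(with_first)  # entries differ across theta
```

With the sum as $T$, each of the six sequences containing two heads is equally likely whatever the bias, so $x$ drops out. With the first flip as $T$, the remaining flips still carry information about $x$, so the conditional probability changes with it.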