In information theory and statistics, Kullback's inequality is a lower bound on the Kullback–Leibler divergence expressed in terms of the large deviations rate function.[1]  If P and Q are probability distributions on the real line, such that P is absolutely continuous with respect to Q, i.e. P << Q, and whose first moments exist, then

$$D_{KL}(P\parallel Q)\geq \Psi_Q^*(\mu'_1(P)),$$

where $\Psi_Q^*$ is the rate function, i.e. the convex conjugate of the cumulant-generating function, of $Q$, and $\mu'_1(P)$ is the first moment of $P$.
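For example, if $P=N(\mu_P,\sigma^2)$ and $Q=N(\mu_Q,\sigma^2)$ are normal distributions with a common variance, then $\Psi_Q(\theta)=\mu_Q\theta+\sigma^2\theta^2/2$ and its convex conjugate is $\Psi_Q^*(\mu)=(\mu-\mu_Q)^2/(2\sigma^2)$, so the bound reads

$$D_{KL}(P\parallel Q)\geq \frac{(\mu_P-\mu_Q)^2}{2\sigma^2},$$

which in this case holds with equality, since the divergence between two normal distributions with equal variances is exactly $(\mu_P-\mu_Q)^2/(2\sigma^2)$.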
The Cramér–Rao bound is a corollary of this result.
Proof
Let P and Q be probability distributions (measures) on the real line, whose first moments exist, and such that P << Q. Consider the natural exponential family of Q given by

$$Q_\theta(A)=\frac{\int_A e^{\theta x}\,Q(dx)}{\int_{-\infty}^{\infty}e^{\theta x}\,Q(dx)}=\frac{1}{M_Q(\theta)}\int_A e^{\theta x}\,Q(dx)$$

for every measurable set A, where $M_Q$ is the moment-generating function of Q.  (Note that $Q_0=Q$.)  Then, since $\log\frac{\mathrm{d}P}{\mathrm{d}Q}=\log\frac{\mathrm{d}P}{\mathrm{d}Q_\theta}+\log\frac{\mathrm{d}Q_\theta}{\mathrm{d}Q}$, integrating with respect to P gives

$$D_{KL}(P\parallel Q)=D_{KL}(P\parallel Q_\theta)+\int_{\operatorname{supp}P}\left(\log\frac{\mathrm{d}Q_\theta}{\mathrm{d}Q}\right)\mathrm{d}P.$$
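For instance, if Q is the standard normal distribution, then $M_Q(\theta)=e^{\theta^2/2}$ and $Q_\theta$ has density proportional to $e^{\theta x-x^2/2}$, i.e. $Q_\theta=N(\theta,1)$: exponential tilting of a Gaussian shifts its mean.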
By Gibbs' inequality we have $D_{KL}(P\parallel Q_\theta)\geq 0$, so that

$$D_{KL}(P\parallel Q)\geq \int_{\operatorname{supp}P}\left(\log\frac{\mathrm{d}Q_\theta}{\mathrm{d}Q}\right)\mathrm{d}P=\int_{\operatorname{supp}P}\left(\log\frac{e^{\theta x}}{M_Q(\theta)}\right)P(dx).$$

Simplifying the right side, we have, for every real θ where $M_Q(\theta)<\infty$,

$$D_{KL}(P\parallel Q)\geq \mu'_1(P)\,\theta-\Psi_Q(\theta),$$

where $\mu'_1(P)$ is the first moment, or mean, of P, and $\Psi_Q=\log M_Q$ is called the cumulant-generating function.  Taking the supremum over θ completes the process of convex conjugation and yields the rate function:

$$D_{KL}(P\parallel Q)\geq \sup_\theta\left\{\mu'_1(P)\,\theta-\Psi_Q(\theta)\right\}=\Psi_Q^*(\mu'_1(P)).$$
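The proof translates directly into a numerical sanity check. The following sketch (a minimal illustration assuming SciPy is available; the choice of P = N(1, 2²) and Q = N(0, 1) is arbitrary) evaluates the divergence by quadrature and the rate function by maximizing $\mu\theta-\Psi_Q(\theta)$:

```python
import numpy as np
from scipy import integrate, optimize, stats

# Illustrative distributions (arbitrary choice): P = N(1, 2^2), Q = N(0, 1).
P = stats.norm(loc=1.0, scale=2.0)
Q = stats.norm(loc=0.0, scale=1.0)

# Left side: D_KL(P || Q) = integral of log(dP/dQ) dP, evaluated by quadrature.
kl, _ = integrate.quad(
    lambda x: P.pdf(x) * (P.logpdf(x) - Q.logpdf(x)), -np.inf, np.inf
)

# Right side: rate function Psi_Q*(mu) = sup_theta { mu*theta - Psi_Q(theta) },
# where Psi_Q = log M_Q is the cumulant-generating function of Q.
def cgf_Q(theta):
    m, _ = integrate.quad(lambda x: np.exp(theta * x) * Q.pdf(x), -np.inf, np.inf)
    return np.log(m)

mu = P.mean()  # first moment of P
res = optimize.minimize_scalar(lambda t: cgf_Q(t) - mu * t)  # maximize mu*t - Psi_Q(t)
rate = -res.fun

# Kullback's inequality: D_KL(P || Q) >= Psi_Q*(mu).
print(f"D_KL(P||Q) = {kl:.4f} >= Psi_Q*(mu) = {rate:.4f}")  # 1.3069 >= 0.5000
```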
Corollary: the Cramér–Rao bound
Start with Kullback's inequality
Let Xθ be a family of probability distributions on the real line indexed by the real parameter θ, and satisfying certain regularity conditions.  Then

$$\lim_{h\to 0}\frac{D_{KL}(X_{\theta+h}\parallel X_\theta)}{h^2}\geq \lim_{h\to 0}\frac{\Psi_\theta^*(\mu_{\theta+h})}{h^2},$$

where $\Psi_\theta^*$ is the convex conjugate of the cumulant-generating function of $X_\theta$ and $\mu_{\theta+h}$ is the first moment of $X_{\theta+h}$.
Left side
The left side of this inequality can be simplified as follows:
$$\begin{aligned}\lim_{h\to 0}\frac{D_{KL}(X_{\theta+h}\parallel X_\theta)}{h^2}&=\lim_{h\to 0}\frac{1}{h^2}\int_{-\infty}^\infty \log\left(\frac{\mathrm{d}X_{\theta+h}}{\mathrm{d}X_\theta}\right)\mathrm{d}X_{\theta+h}\\&=-\lim_{h\to 0}\frac{1}{h^2}\int_{-\infty}^\infty \log\left(\frac{\mathrm{d}X_\theta}{\mathrm{d}X_{\theta+h}}\right)\mathrm{d}X_{\theta+h}\\&=-\lim_{h\to 0}\frac{1}{h^2}\int_{-\infty}^\infty \log\left(1-\left(1-\frac{\mathrm{d}X_\theta}{\mathrm{d}X_{\theta+h}}\right)\right)\mathrm{d}X_{\theta+h}\\&=\lim_{h\to 0}\frac{1}{h^2}\int_{-\infty}^\infty\left[\left(1-\frac{\mathrm{d}X_\theta}{\mathrm{d}X_{\theta+h}}\right)+\frac{1}{2}\left(1-\frac{\mathrm{d}X_\theta}{\mathrm{d}X_{\theta+h}}\right)^2+o\left(\left(1-\frac{\mathrm{d}X_\theta}{\mathrm{d}X_{\theta+h}}\right)^2\right)\right]\mathrm{d}X_{\theta+h}&&\text{Taylor series for }\log(1-t)\\&=\lim_{h\to 0}\frac{1}{h^2}\int_{-\infty}^\infty\left[\frac{1}{2}\left(1-\frac{\mathrm{d}X_\theta}{\mathrm{d}X_{\theta+h}}\right)^2\right]\mathrm{d}X_{\theta+h}\\&=\lim_{h\to 0}\frac{1}{h^2}\int_{-\infty}^\infty\left[\frac{1}{2}\left(\frac{\mathrm{d}X_{\theta+h}-\mathrm{d}X_\theta}{\mathrm{d}X_{\theta+h}}\right)^2\right]\mathrm{d}X_{\theta+h}\\&=\frac{1}{2}\mathcal{I}_X(\theta),\end{aligned}$$

which is half the Fisher information of the parameter θ.  (The first-order Taylor term integrates to zero, since $\int\left(1-\frac{\mathrm{d}X_\theta}{\mathrm{d}X_{\theta+h}}\right)\mathrm{d}X_{\theta+h}=1-1=0$.)
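As a concrete check, for the Gaussian location family $X_\theta=N(\theta,\sigma^2)$ we have $D_{KL}(X_{\theta+h}\parallel X_\theta)=h^2/(2\sigma^2)$, so the limit is $1/(2\sigma^2)$, which is indeed half the Fisher information $\mathcal{I}_X(\theta)=1/\sigma^2$.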
Right side
The right side of the inequality can be developed as follows:
$$\lim_{h\to 0}\frac{\Psi_\theta^*(\mu_{\theta+h})}{h^2}=\lim_{h\to 0}\frac{1}{h^2}\sup_t\left\{\mu_{\theta+h}t-\Psi_\theta(t)\right\}.$$

This supremum is attained at a value of t = τ where the first derivative of the cumulant-generating function is $\Psi'_\theta(\tau)=\mu_{\theta+h}$, but we have $\Psi'_\theta(0)=\mu_\theta$, so that

$$\Psi''_\theta(0)=\frac{\mathrm{d}\mu_\theta}{\mathrm{d}\theta}\lim_{h\to 0}\frac{h}{\tau}.$$

Moreover,

$$\lim_{h\to 0}\frac{\Psi_\theta^*(\mu_{\theta+h})}{h^2}=\frac{1}{2\Psi''_\theta(0)}\left(\frac{\mathrm{d}\mu_\theta}{\mathrm{d}\theta}\right)^2=\frac{1}{2\mathrm{Var}(X_\theta)}\left(\frac{\mathrm{d}\mu_\theta}{\mathrm{d}\theta}\right)^2,$$

since the second derivative of the cumulant-generating function at 0 is the variance of $X_\theta$.
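Continuing the Gaussian check: for $X_\theta=N(\theta,\sigma^2)$, $\Psi_\theta(t)=\theta t+\sigma^2 t^2/2$, so $\Psi_\theta^*(\mu_{\theta+h})=h^2/(2\sigma^2)$ and the limit is again $1/(2\sigma^2)$, matching $\frac{1}{2\mathrm{Var}(X_\theta)}\left(\mathrm{d}\mu_\theta/\mathrm{d}\theta\right)^2$ with $\mathrm{Var}(X_\theta)=\sigma^2$ and $\mathrm{d}\mu_\theta/\mathrm{d}\theta=1$.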
Putting both sides back together
We have:

$$\frac{1}{2}\mathcal{I}_X(\theta)\geq \frac{1}{2\mathrm{Var}(X_\theta)}\left(\frac{\mathrm{d}\mu_\theta}{\mathrm{d}\theta}\right)^2,$$

which can be rearranged as:

$$\mathrm{Var}(X_\theta)\geq \frac{(\mathrm{d}\mu_\theta/\mathrm{d}\theta)^2}{\mathcal{I}_X(\theta)},$$

which is the Cramér–Rao bound.
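For the Gaussian location family this bound is tight: $\mathrm{Var}(X_\theta)=\sigma^2$ and $(\mathrm{d}\mu_\theta/\mathrm{d}\theta)^2/\mathcal{I}_X(\theta)=\sigma^2$, so the inequality holds with equality.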