In information theory, the error exponent of a channel code or source code over the block length of the code is the rate at which the error probability decays exponentially with the block length of the code. Formally, it is defined as the limiting ratio of the negative logarithm of the error probability to the block length of the code for large block lengths. For example, if the probability of error $P_{\mathrm{error}}$ of a decoder drops as $e^{-n\alpha}$, where $n$ is the block length, the error exponent is $\alpha$. In this example, $-\tfrac{\ln P_{\mathrm{error}}}{n}$ approaches $\alpha$ for large $n$. Many information-theoretic theorems are of asymptotic nature; for example, the channel coding theorem states that for any rate less than the channel capacity, the probability of error of the channel code can be made to go to zero as the block length goes to infinity. In practical situations, there are limitations on the delay of the communication and the block length must be finite. Therefore, it is important to study how the probability of error drops as the block length grows.
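As a numerical illustration of this definition (a minimal sketch, not from the original text, with an assumed binary symmetric channel of crossover probability $p = 0.1$ and a length-$n$ repetition code decoded by majority vote), the ratio $-\ln(P_{\mathrm{error}})/n$ can be computed exactly and compared with its limit, which for this particular code is $-\ln\!\big(2\sqrt{p(1-p)}\big)$:

```python
# A minimal numerical sketch (not from the article): a BSC with assumed
# crossover probability p and a length-n repetition code with majority
# decoding.  -ln(P_error)/n is computed exactly and compared to its limit.

import math

def log_binom_tail(n: int, p: float, k_min: int) -> float:
    """ln P[Binomial(n, p) >= k_min], computed in log space to avoid underflow."""
    terms = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
        for k in range(k_min, n + 1)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

p = 0.1                                          # assumed channel parameter
limit = -math.log(2 * math.sqrt(p * (1 - p)))    # asymptotic exponent of this code
for n in (11, 101, 1001):                        # odd lengths: majority vote has no ties
    log_pe = log_binom_tail(n, p, (n + 1) // 2)  # ln P(majority decoding error)
    print(f"n = {n:4d}   -ln(P_error)/n = {-log_pe / n:.4f}   (limit {limit:.4f})")
```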
Error exponent in channel coding
For time-invariant DMCs
The channel coding theorem states that for any ε > 0 and for any rate less than the channel capacity, there is an encoding and decoding scheme that can be used to ensure that the probability of block error is less than ε for a sufficiently large block length. Also, for any rate greater than the channel capacity, the probability of block error at the receiver goes to one as the block length goes to infinity.
Assume a channel coding setup as follows: the channel can transmit any of $M$ messages by transmitting the corresponding codeword (which is of length $n$). Each component in the codebook is drawn i.i.d. according to some probability distribution with probability mass function $Q$. At the decoding end, maximum likelihood decoding is done.
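To make this setup concrete, the following Monte Carlo sketch (an illustration only; the block lengths, the rate, the crossover probability and the helper `simulate_block_error` are assumptions, not part of the original derivation) draws a random codebook with a uniform $Q$ over $\{0,1\}$, transmits the first codeword through a binary symmetric channel, and performs maximum likelihood decoding, which for such a channel amounts to minimum Hamming distance decoding:

```python
# A small Monte Carlo sketch of the random-coding setup described above,
# under assumed parameters: a binary symmetric channel BSC(p), codebook
# symbols drawn i.i.d. from a uniform Q on {0, 1}, and maximum likelihood
# decoding (minimum Hamming distance for a BSC with p < 1/2).

import random

def simulate_block_error(n: int, R: float, p: float, trials: int = 2000) -> float:
    """Estimate the block error probability of a random code of rate R bits/use."""
    M = 2 ** int(n * R)                       # number of messages
    errors = 0
    for _ in range(trials):
        # Draw a fresh random codebook for every trial.
        codebook = [[random.randint(0, 1) for _ in range(n)] for _ in range(M)]
        # Message 1 is transmitted; the BSC flips each bit with probability p.
        received = [bit ^ (random.random() < p) for bit in codebook[0]]
        # ML decoding: declare an error if any other codeword is at least as
        # close to the received word (ties are counted as errors, which only
        # makes the estimate pessimistic).
        d_true = sum(b != r for b, r in zip(codebook[0], received))
        if any(sum(b != r for b, r in zip(cw, received)) <= d_true
               for cw in codebook[1:]):
            errors += 1
    return errors / trials

for n in (8, 16, 24):                         # assumed block lengths and rate
    print(f"n = {n:2d}   estimated block error = {simulate_block_error(n, 0.25, 0.05):.3f}")
```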
Let $X_i^n$ be the $i$th random codeword in the codebook, where $i$ goes from $1$ to $M$. Suppose the first message is selected, so codeword $X_1^n$ is transmitted. Given that $y_1^n$ is received, the probability that the codeword is incorrectly detected as $X_2^n$ is:

$$P_{\mathrm{error}\ 1\to 2} \le \sum_{x_2^n} Q(x_2^n)\,\mathbb{1}\!\left[p(y_1^n \mid x_2^n) \ge p(y_1^n \mid x_1^n)\right].$$

The function $\mathbb{1}\!\left[p(y_1^n \mid x_2^n) \ge p(y_1^n \mid x_1^n)\right]$ has upper bound

$$\left(\frac{p(y_1^n \mid x_2^n)}{p(y_1^n \mid x_1^n)}\right)^{s}$$

for $s > 0.$ Thus,

$$P_{\mathrm{error}\ 1\to 2} \le \sum_{x_2^n} Q(x_2^n) \left(\frac{p(y_1^n \mid x_2^n)}{p(y_1^n \mid x_1^n)}\right)^{s}.$$
Since there are a total of $M$ messages, and the entries in the codebook are i.i.d., the probability that $X_1^n$ is confused with any other message is at most $M$ times the above expression. Using the union bound together with the inequality $\min(1,a)\le a^{\rho}$ for $\rho\in[0,1]$, the probability of confusing $X_1^n$ with any message is bounded by:

$$P_{\mathrm{error}\ 1\to \mathrm{any}}\le M^{\rho}\left(\sum_{x_2^n}Q(x_2^n)\left(\frac{p(y_1^n\mid x_2^n)}{p(y_1^n\mid x_1^n)}\right)^{s}\right)^{\rho}$$

for any $\rho\in[0,1]$. Averaging over all combinations of $x_1^n$ and $y_1^n$:
$$P_{\mathrm{error}\ 1\to \mathrm{any}}\le M^{\rho}\sum_{y_1^n}\left(\sum_{x_1^n}Q(x_1^n)\,[p(y_1^n\mid x_1^n)]^{1-s\rho}\right)\left(\sum_{x_2^n}Q(x_2^n)\,[p(y_1^n\mid x_2^n)]^{s}\right)^{\rho}.$$
Choosing $s=\frac{1}{1+\rho}$ and combining the two sums over $x_1^n$ in the above formula:
$$P_{\mathrm{error}\ 1\to \mathrm{any}}\le M^{\rho}\sum_{y_1^n}\left(\sum_{x_1^n}Q(x_1^n)\,[p(y_1^n\mid x_1^n)]^{\frac{1}{1+\rho}}\right)^{1+\rho}.$$
Using the independence of the elements of the codeword and the discrete memoryless nature of the channel:
$$P_{\mathrm{error}\ 1\to \mathrm{any}}\le M^{\rho}\prod_{i=1}^{n}\sum_{y_i}\left(\sum_{x_i}Q_i(x_i)\,[p_i(y_i\mid x_i)]^{\frac{1}{1+\rho}}\right)^{1+\rho}.$$
Using the fact that each element of the codeword is identically distributed (and the channel is stationary):
$$P_{\mathrm{error}\ 1\to \mathrm{any}}\le M^{\rho}\left(\sum_{y}\left(\sum_{x}Q(x)\,[p(y\mid x)]^{\frac{1}{1+\rho}}\right)^{1+\rho}\right)^{n}.$$
Replacing $M$ by $e^{nR}$ (with the rate $R$ measured in nats) and defining
$$E_o(\rho,Q)=-\ln\left(\sum_{y}\left(\sum_{x}Q(x)\,[p(y\mid x)]^{1/(1+\rho)}\right)^{1+\rho}\right),$$
the probability of error becomes

$$P_{\mathrm{error}}\le e^{-n\left(E_o(\rho,Q)-\rho R\right)}.$$
$Q$ and $\rho$ should be chosen so that the bound is tightest. Thus, the error exponent can be defined as
$$E_r(R)=\max_{Q}\max_{\rho\in[0,1]}\left[E_o(\rho,Q)-\rho R\right].$$
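This maximization can be carried out numerically. The sketch below is illustrative only: it assumes a binary symmetric channel with crossover probability $p$ (for which the uniform input distribution is the natural choice of $Q$ by symmetry), rates measured in nats to match the natural logarithm in $E_o$, and a simple grid search over $\rho$:

```python
# Illustrative evaluation of E_r(R) above for an assumed BSC(p).  By the
# channel's symmetry the uniform input distribution Q = (1/2, 1/2) is used,
# and the maximization over rho in [0, 1] is a simple grid search.
# Rates are measured in nats, matching the natural logarithm in E_o.

import math

def E0(rho: float, p: float) -> float:
    """Gallager's E_o(rho, Q) for a BSC(p) with uniform Q."""
    Q = [0.5, 0.5]
    P = [[1 - p, p], [p, 1 - p]]              # P[x][y] = p(y | x)
    total = 0.0
    for y in range(2):
        inner = sum(Q[x] * P[x][y] ** (1.0 / (1.0 + rho)) for x in range(2))
        total += inner ** (1.0 + rho)
    return -math.log(total)

def Er(R: float, p: float, grid: int = 1000) -> float:
    """Random-coding exponent E_r(R) = max_rho [E_o(rho, Q) - rho * R]."""
    return max(E0(k / grid, p) - (k / grid) * R for k in range(grid + 1))

p = 0.1                                        # assumed crossover probability
C = math.log(2) + p * math.log(p) + (1 - p) * math.log(1 - p)   # capacity in nats
for R in (0.1, 0.2, 0.3):
    print(f"R = {R:.1f} nats   E_r(R) = {Er(R, p):.4f}   (capacity = {C:.3f} nats)")
```

For rates approaching the capacity the exponent tends to zero, and for rates at or above capacity the maximum is attained at $\rho=0$, giving an exponent of zero.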
Error exponent in source coding
For time-invariant discrete memoryless sources
The source coding theorem states that for any $\varepsilon > 0$, any discrete-time i.i.d. source $X$, and any rate $R$ greater than the entropy of the source, there is a large enough $n$ and an encoder that takes $n$ i.i.d. repetitions of the source, $X_1^n$, and maps them to $nR$ binary bits such that the source symbols $X_1^n$ are recoverable from the binary bits with probability of at least $1-\varepsilon$.
Let $M=e^{nR}$ be the total number of possible messages. Next, map each of the possible source output sequences to one of the messages randomly, using a uniform distribution and independently of everything else. When a source sequence is generated, the corresponding message $m$ is then transmitted to the destination. The message gets decoded to one of the possible source strings. In order to minimize the probability of error, the decoder will decode to the source sequence $X_1^n$ that maximizes $P(X_1^n\mid A_m)$, where $A_m$ denotes the event that message $m$ was transmitted. This rule is equivalent to finding the source sequence $X_1^n$, among the set of source sequences that map to message $m$, that maximizes $P(X_1^n)$. This reduction follows from the fact that the messages were assigned randomly and independently of everything else.
Thus, as an example of when an error occurs, suppose that the source sequence $X_1^n(1)$ was mapped to message $m$, as was the source sequence $X_1^n(2)$. If $X_1^n(1)$ was generated at the source, but $P(X_1^n(2)) > P(X_1^n(1))$, then an error occurs.
Let $S_i$ denote the event that the source sequence $X_1^n(i)$ was generated at the source, so that $P(S_i)=P(X_1^n(i)).$ Then the probability of error can be broken down as $P(E)=\sum_{i}P(E\mid S_i)\,P(S_i).$ Thus, attention can be focused on finding an upper bound to $P(E\mid S_i)$.
Let $A_{i'}$ denote the event that the source sequence $X_1^n(i')$ was mapped to the same message as the source sequence $X_1^n(i)$ and that $P(X_1^n(i'))\geq P(X_1^n(i))$. Thus, letting $X_{i,i'}$ denote the event that the two source sequences $X_1^n(i)$ and $X_1^n(i')$ map to the same message, we have that

$$P(A_{i'})=P\!\left(X_{i,i'}\cap \left\{P(X_1^n(i'))\geq P(X_1^n(i))\right\}\right)$$

and, using the fact that $P(X_{i,i'})=\frac{1}{M}$ and is independent of everything else, we have that

$$P(A_{i'})=\frac{1}{M}\,P\!\left(P(X_1^n(i'))\geq P(X_1^n(i))\right).$$
A simple upper bound for this last term can be established as

$$P\!\left(P(X_1^n(i'))\geq P(X_1^n(i))\right)\leq \left(\frac{P(X_1^n(i'))}{P(X_1^n(i))}\right)^{s}$$
for some arbitrary real number $s>0.$ This upper bound can be verified by noting that $P\!\left(P(X_1^n(i'))\geq P(X_1^n(i))\right)$ equals either $1$ or $0$, because the probabilities of a given input sequence are completely deterministic. Thus, if $P(X_1^n(i'))\geq P(X_1^n(i)),$ then $\frac{P(X_1^n(i'))}{P(X_1^n(i))}\geq 1,$ so that the inequality holds in that case. The inequality holds in the other case as well, because

$$\left(\frac{P(X_1^n(i'))}{P(X_1^n(i))}\right)^{s}\geq 0$$
for all possible source strings. Thus, combining everything and introducing some $\rho\in[0,1]$, we have that

$$P(E\mid S_i)\leq P\!\left(\bigcup_{i'\neq i}A_{i'}\right)\leq \left(\sum_{i'\neq i}P(A_{i'})\right)^{\rho}\leq \left(\frac{1}{M}\sum_{i'\neq i}\left(\frac{P(X_1^n(i'))}{P(X_1^n(i))}\right)^{s}\right)^{\rho}$$
where the inequalities follow from a variation on the union bound. Finally, applying this upper bound to the summation for $P(E)$, we have that:

$$P(E)=\sum_{i}P(E\mid S_i)\,P(S_i)\leq \sum_{i}P(X_1^n(i))\left(\frac{1}{M}\sum_{i'}\left(\frac{P(X_1^n(i'))}{P(X_1^n(i))}\right)^{s}\right)^{\rho}$$
where the sum can now be taken over all $i'$, because that will only increase the bound. This ultimately yields

$$P(E)\leq \frac{1}{M^{\rho}}\sum_{i}P(X_1^n(i))^{1-s\rho}\left(\sum_{i'}P(X_1^n(i'))^{s}\right)^{\rho}.$$
Now, for simplicity, let $1-s\rho=s,$ so that $s=\frac{1}{1+\rho}.$ Substituting this new value of $s$ into the above bound on the probability of error, and using the fact that $i'$ is just a dummy variable in the sum, gives the following upper bound on the probability of error:

$$P(E)\leq \frac{1}{M^{\rho}}\left(\sum_{i}P(X_1^n(i))^{\frac{1}{1+\rho}}\right)^{1+\rho}.$$
Recall that $M=e^{nR}$ and that the components of $X_1^n(i)$ are independent and identically distributed with marginal $P$. Thus, simplifying the above equation yields

$$P(E)\leq \exp\left(-n\left[\rho R-(1+\rho)\ln\left(\sum_{x}P(x)^{\frac{1}{1+\rho}}\right)\right]\right).$$
The term in the exponent should be maximized over $\rho$ in order to obtain the tightest upper bound on the probability of error.
Letting $E_0(\rho)=(1+\rho)\ln\left(\sum_{x}P(x)^{\frac{1}{1+\rho}}\right),$ we see that the error exponent for the source coding case is:

$$E_r(R)=\max_{\rho\in[0,1]}\left[\rho R-E_0(\rho)\right].$$
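As an illustrative sketch (assuming, purely as an example, a Bernoulli($q$) source and rates measured in nats), this exponent can be evaluated numerically; it is positive exactly when $R$ exceeds the source entropy:

```python
# Illustrative evaluation of the source-coding exponent above, assuming a
# Bernoulli(q) source and rates in nats, with
# E_0(rho) = (1 + rho) * ln( sum_x P(x)^{1/(1+rho)} ).

import math

def E0(rho: float, q: float) -> float:
    s = q ** (1.0 / (1.0 + rho)) + (1.0 - q) ** (1.0 / (1.0 + rho))
    return (1.0 + rho) * math.log(s)

def Er(R: float, q: float, grid: int = 1000) -> float:
    """Source-coding exponent E_r(R) = max_{rho in [0,1]} [rho * R - E_0(rho)]."""
    return max((k / grid) * R - E0(k / grid, q) for k in range(grid + 1))

q = 0.2                                          # assumed source parameter
H = -q * math.log(q) - (1 - q) * math.log(1 - q) # source entropy in nats
for R in (0.4, 0.6, 0.8):
    print(f"R = {R:.1f} nats   E_r(R) = {Er(R, q):.4f}   (entropy = {H:.3f} nats)")
```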
References
R. Gallager, Information Theory and Reliable Communication, Wiley 1968