In probability theory, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that in the general theory of Markov processes plays the role that the transition matrix does in the theory of Markov processes with a finite state space.[1]
Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be measurable spaces. A Markov kernel with source $(X, \mathcal{A})$ and target $(Y, \mathcal{B})$, sometimes written as $\kappa : (X, \mathcal{A}) \to (Y, \mathcal{B})$, is a function $\kappa : \mathcal{B} \times X \to [0,1]$ with the following properties:

- For every (fixed) $B \in \mathcal{B}$, the map $x \mapsto \kappa(B, x)$ is $\mathcal{A}$-measurable.
- For every (fixed) $x \in X$, the map $B \mapsto \kappa(B, x)$ is a probability measure on $(Y, \mathcal{B})$.

In other words it associates to each point $x \in X$ a probability measure $\kappa(\mathrm{d}y \mid x) : B \mapsto \kappa(B, x)$ on $(Y, \mathcal{B})$ such that, for every measurable set $B \in \mathcal{B}$, the map $x \mapsto \kappa(B, x)$ is measurable with respect to the $\sigma$-algebra $\mathcal{A}$.[2]
Examples
Take $X = Y = \mathbb{Z}$, and $\mathcal{A} = \mathcal{B} = \mathcal{P}(\mathbb{Z})$ (the power set of $\mathbb{Z}$). Then a Markov kernel is fully determined by the probability it assigns to singletons $\{m\}$, $m \in Y = \mathbb{Z}$, for each $n \in X = \mathbb{Z}$:

$$\kappa(B \mid n) = \sum_{m \in B} \kappa(\{m\} \mid n), \qquad \forall n \in \mathbb{Z},\ \forall B \in \mathcal{B}.$$

Now the random walk $\kappa$ that goes to the right with probability $p$ and to the left with probability $1 - p$ is defined by

$$\kappa(\{m\} \mid n) = p\,\delta_{m,n+1} + (1-p)\,\delta_{m,n-1}, \qquad \forall n, m \in \mathbb{Z},$$

where $\delta$ is the Kronecker delta. The transition probabilities $P(m \mid n) = \kappa(\{m\} \mid n)$ for the random walk are equivalent to the Markov kernel.
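In code, this kernel is just a function of a target set and a current state; a minimal Python sketch (the names `kappa_singleton` and `kappa` are illustrative):

```python
# Markov kernel of the simple random walk on the integers:
# kappa({m} | n) = p if m = n + 1, 1 - p if m = n - 1, and 0 otherwise.

def kappa_singleton(m: int, n: int, p: float = 0.5) -> float:
    """Probability that the walk steps from state n to state m."""
    if m == n + 1:
        return p
    if m == n - 1:
        return 1.0 - p
    return 0.0

def kappa(B, n: int, p: float = 0.5) -> float:
    """kappa(B | n): probability of landing in the set B from state n."""
    return sum(kappa_singleton(m, n, p) for m in B)

# For every state n, kappa(. | n) is a probability measure:
assert abs(kappa({-1, 1}, 0, p=0.7) - 1.0) < 1e-12
```

Summing the singleton probabilities over a set `B` is exactly the countable-additivity property from the definition.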
More generally take $X$ and $Y$ both countable and $\mathcal{A} = \mathcal{P}(X)$, $\mathcal{B} = \mathcal{P}(Y)$. Again a Markov kernel is defined by the probability it assigns to singleton sets for each $i \in X$,

$$\kappa(B \mid i) = \sum_{j \in B} \kappa(\{j\} \mid i), \qquad \forall B \in \mathcal{B}.$$

We define a Markov process by defining a transition probability $P(j \mid i) = K_{ji}$, where the numbers $K_{ji}$ define a (countable) stochastic matrix $(K_{ji})$, i.e.

$$K_{ji} \geq 0, \qquad \forall (j, i) \in Y \times X,$$
$$\sum_{j \in Y} K_{ji} = 1, \qquad \forall i \in X.$$

We then define

$$\kappa(\{j\} \mid i) = K_{ji} = P(j \mid i), \qquad \forall i \in X,\ \forall j \in Y.$$
Again the transition probability, the stochastic matrix and the Markov kernel are equivalent reformulations.
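For a finite state space this equivalence can be made concrete; a minimal Python sketch with an illustrative $2 \times 2$ column-stochastic matrix `K`, where `K[j][i]` plays the role of $K_{ji}$:

```python
# A finite stochastic matrix K with K[j][i] = kappa({j} | i);
# in this (column-stochastic) convention each column i sums to 1.
K = [
    [0.9, 0.2],   # kappa({0}|0), kappa({0}|1)
    [0.1, 0.8],   # kappa({1}|0), kappa({1}|1)
]

def kappa(B, i):
    """kappa(B | i): sum of the singleton probabilities over B."""
    return sum(K[j][i] for j in B)

# kappa(. | i) assigns the whole space probability 1 for every i.
for i in range(2):
    assert abs(kappa({0, 1}, i) - 1.0) < 1e-12
```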
Markov kernel defined by a kernel function and a measure
Let $\nu$ be a measure on $(Y, \mathcal{B})$, and $k : Y \times X \to [0, \infty]$ a measurable function with respect to the product $\sigma$-algebra $\mathcal{A} \otimes \mathcal{B}$ such that

$$\int_Y k(y, x)\,\nu(\mathrm{d}y) = 1, \qquad \forall x \in X,$$

then $\kappa(\mathrm{d}y \mid x) = k(y, x)\,\nu(\mathrm{d}y)$, i.e. the mapping

$$\begin{cases}\kappa : \mathcal{B} \times X \to [0,1]\\ \kappa(B \mid x) = \int_B k(y, x)\,\nu(\mathrm{d}y)\end{cases}$$

defines a Markov kernel.[3] This example generalises the countable Markov process example, where $\nu$ was the counting measure. Moreover it encompasses other important examples such as the convolution kernels, in particular the Markov kernels defined by the heat equation. The latter example includes the Gaussian kernel on $X = Y = \mathbb{R}$ with $\nu(\mathrm{d}x) = \mathrm{d}x$ the standard Lebesgue measure and

$$k(y, x) = \frac{1}{\sqrt{2\pi}}\,e^{-(y - x)^2/2}.$$
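The normalisation condition $\int_Y k(y, x)\,\nu(\mathrm{d}y) = 1$ can be checked numerically for the Gaussian kernel; a sketch using plain trapezoidal integration over a wide window (function names are illustrative):

```python
import math

# Gaussian kernel k(y, x) = exp(-(y - x)^2 / 2) / sqrt(2 * pi).
def k(y: float, x: float) -> float:
    return math.exp(-((y - x) ** 2) / 2) / math.sqrt(2 * math.pi)

def integral_over_y(x: float, n: int = 20001) -> float:
    """Trapezoidal approximation of int k(y, x) dy over [x-10, x+10];
    the mass outside 10 standard deviations is negligible."""
    a, b = x - 10.0, x + 10.0
    h = (b - a) / (n - 1)
    ys = [a + i * h for i in range(n)]
    total = sum(k(y, x) for y in ys) - 0.5 * (k(a, x) + k(b, x))
    return total * h

# The kernel integrates to 1 for every x, as required.
assert abs(integral_over_y(0.0) - 1.0) < 1e-6
assert abs(integral_over_y(3.5) - 1.0) < 1e-6
```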
Measurable functions
Take $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ arbitrary measurable spaces, and let $f : X \to Y$ be a measurable function. Now define $\kappa(\mathrm{d}y \mid x) = \delta_{f(x)}(\mathrm{d}y)$, i.e.

$$\kappa(B \mid x) = \mathbf{1}_B(f(x)) = \mathbf{1}_{f^{-1}(B)}(x) = \begin{cases}1 & \text{if } f(x) \in B\\ 0 & \text{otherwise}\end{cases} \qquad \text{for all } B \in \mathcal{B},\ x \in X.$$

Note that the indicator function $\mathbf{1}_{f^{-1}(B)}$ is $\mathcal{A}$-measurable for all $B \in \mathcal{B}$ iff $f$ is measurable.
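A deterministic kernel of this form can be written down directly; a minimal Python sketch (the factory `deterministic_kernel` and the choice $f(x) = x^2$ are illustrative):

```python
# Deterministic Markov kernel induced by a function f:
# kappa(B | x) = 1 if f(x) is in B, else 0 (the Dirac measure at f(x)).

def deterministic_kernel(f):
    def kappa(B, x):
        return 1.0 if f(x) in B else 0.0
    return kappa

kappa = deterministic_kernel(lambda x: x * x)  # illustrative f(x) = x^2

# All probability mass sits on the single point f(x):
assert kappa({4}, 2) == 1.0
assert kappa({5}, 2) == 0.0
```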
This example allows us to think of a Markov kernel as a generalised function whose value is (in general) random rather than certain. That is, it is a multivalued function where the values are not equally weighted.
As a less obvious example, take $X = \mathbb{N}$, $\mathcal{A} = \mathcal{P}(\mathbb{N})$, and $Y$ the real numbers $\mathbb{R}$ with the standard $\sigma$-algebra of Borel sets. Then

$$\kappa(B \mid n) = \begin{cases}\mathbf{1}_B(0) & \text{if } n = 0\\ \Pr(\xi_1 + \cdots + \xi_n \in B) & \text{otherwise}\end{cases}$$

where $n$ is the number of elements at the state, the $\xi_i$ are i.i.d. random variables (usually with mean 0) and where $\mathbf{1}_B$ is the indicator function. For the simple case of coin flips this models the different levels of a Galton board.
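For the coin-flip case the one-step distribution is an explicit (shifted) binomial; a minimal Python sketch, assuming fair $\pm 1$ steps (function names are illustrative):

```python
from math import comb

# One-step distribution kappa(. | n) when each xi_i is +1 or -1 with
# probability 1/2: the sum S_n = xi_1 + ... + xi_n equals 2*H - n,
# where H ~ Binomial(n, 1/2) counts the +1 steps.
def kappa_singleton(s: int, n: int) -> float:
    """Pr(xi_1 + ... + xi_n = s) for fair +-1 coin flips."""
    if n == 0:
        return 1.0 if s == 0 else 0.0
    h = (s + n) // 2                       # number of +1 steps needed
    if (s + n) % 2 != 0 or not 0 <= h <= n:
        return 0.0                         # wrong parity or out of range
    return comb(n, h) / 2 ** n

def kappa(B, n: int) -> float:
    return sum(kappa_singleton(s, n) for s in B)

# The reachable values carry total probability 1:
assert abs(kappa(set(range(-3, 4)), 3) - 1.0) < 1e-12
```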
Composition of Markov Kernels
Given measurable spaces $(X, \mathcal{A})$, $(Y, \mathcal{B})$, we consider a Markov kernel $\kappa : \mathcal{B} \times X \to [0,1]$ as a morphism $\kappa : X \to Y$. Intuitively, rather than assigning to each $x \in X$ a sharply defined point $y \in Y$, the kernel assigns a "fuzzy" point in $Y$ which is only known with some level of uncertainty, much like actual physical measurements. If we have a third measurable space $(Z, \mathcal{C})$, and probability kernels $\kappa : X \to Y$ and $\lambda : Y \to Z$, we can define a composition $\lambda \circ \kappa : X \to Z$ by the Chapman-Kolmogorov equation

$$(\lambda \circ \kappa)(\mathrm{d}z \mid x) = \int_Y \lambda(\mathrm{d}z \mid y)\,\kappa(\mathrm{d}y \mid x).$$

The composition is associative by the Monotone Convergence Theorem, and the identity function considered as a Markov kernel (i.e. the delta measure $\kappa(\mathrm{d}x' \mid x) = \delta_x(\mathrm{d}x')$) is the unit for this composition.
This composition defines the structure of a category on the measurable spaces with Markov kernels as morphisms, first defined by Lawvere,[4] the category of Markov kernels.
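On finite spaces the Chapman-Kolmogorov integral reduces to a matrix product of column-stochastic matrices; a minimal Python sketch with illustrative $2 \times 2$ kernels:

```python
# Compose two finite kernels given as column-stochastic matrices:
# (lam . kap)[k][i] = sum_j lam[k][j] * kap[j][i]  (Chapman-Kolmogorov).

def compose(lam, kap):
    nz, ny, nx = len(lam), len(kap), len(kap[0])
    return [
        [sum(lam[k][j] * kap[j][i] for j in range(ny)) for i in range(nx)]
        for k in range(nz)
    ]

kap = [[0.9, 0.2], [0.1, 0.8]]   # illustrative kernel X -> Y
lam = [[0.5, 0.3], [0.5, 0.7]]   # illustrative kernel Y -> Z
comp = compose(lam, kap)

# The composite is again a Markov kernel: every column sums to 1.
for i in range(2):
    assert abs(sum(row[i] for row in comp) - 1.0) < 1e-12
```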
Probability Space defined by Probability Distribution and a Markov Kernel
A composition of a probability space $(X, \mathcal{A}, P_X)$ and a probability kernel $\kappa : (X, \mathcal{A}) \to (Y, \mathcal{B})$ defines a probability space $(Y, \mathcal{B}, P_Y = \kappa \circ P_X)$, where the probability measure is given by

$$P_Y(B) = \int_X \int_B \kappa(\mathrm{d}y \mid x)\,P_X(\mathrm{d}x) = \int_X \kappa(B \mid x)\,P_X(\mathrm{d}x) = \mathbb{E}_{P_X}\,\kappa(B \mid \cdot).$$
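On a finite space this pushforward is just a matrix-vector product; a minimal Python sketch with illustrative numbers:

```python
# Pushforward of a distribution through a finite kernel:
# P_Y({j}) = sum_i kappa({j} | i) * P_X({i}).

P_X = [0.25, 0.75]              # illustrative probability measure on X
K = [[0.9, 0.2], [0.1, 0.8]]    # illustrative kernel, K[j][i] = kappa({j}|i)

P_Y = [sum(K[j][i] * P_X[i] for i in range(len(P_X))) for j in range(len(K))]

# P_Y is again a probability measure.
assert abs(sum(P_Y) - 1.0) < 1e-12
```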
Properties
Semidirect product
Let $(X, \mathcal{A}, P)$ be a probability space and $\kappa$ a Markov kernel from $(X, \mathcal{A})$ to some $(Y, \mathcal{B})$. Then there exists a unique measure $Q$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$, such that

$$Q(A \times B) = \int_A \kappa(B \mid x)\,P(\mathrm{d}x), \qquad \forall A \in \mathcal{A},\ \forall B \in \mathcal{B}.$$
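For finite spaces the product measure $Q$ can be tabulated directly; a minimal Python sketch with illustrative numbers:

```python
# Semidirect-product measure on X x Y for finite spaces:
# Q({i} x {j}) = kappa({j} | i) * P({i}).

P = [0.4, 0.6]                  # illustrative probability measure on X
K = [[0.9, 0.2], [0.1, 0.8]]    # illustrative kernel, K[j][i] = kappa({j}|i)

Q = {(i, j): K[j][i] * P[i] for i in range(2) for j in range(2)}

# Q is a probability measure on the product, and its X-marginal is P.
assert abs(sum(Q.values()) - 1.0) < 1e-12
assert abs(Q[(0, 0)] + Q[(0, 1)] - P[0]) < 1e-12
```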
Regular conditional distribution
Let $(S, Y)$ be a Borel space, $X$ a $(S, Y)$-valued random variable on the measure space $(\Omega, \mathcal{F}, P)$ and $\mathcal{G} \subseteq \mathcal{F}$ a sub-$\sigma$-algebra. Then there exists a Markov kernel $\kappa$ from $(\Omega, \mathcal{G})$ to $(S, Y)$, such that $\kappa(\cdot, B)$ is a version of the conditional expectation $\mathbb{E}[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}]$ for every $B \in Y$, i.e.

$$P(X \in B \mid \mathcal{G}) = \mathbb{E}\left[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}\right] = \kappa(\cdot, B), \qquad P\text{-a.s.}\ \forall B \in \mathcal{G}.$$

It is called the regular conditional distribution of $X$ given $\mathcal{G}$ and is not uniquely defined.
Generalizations
Transition kernels generalize Markov kernels in the sense that for all $x \in X$, the map

$$B \mapsto \kappa(B \mid x)$$

can be any type of (non-negative) measure, not necessarily a probability measure.
References
- Bauer, Heinz (1996), *Probability Theory*, de Gruyter, ISBN 3-11-013935-9, §36. Kernels and semigroups of kernels