Disintegration theorem

In mathematics, the disintegration theorem is a result in measure theory and probability theory. It rigorously defines the idea of a non-trivial "restriction" of a measure to a measure zero subset of the measure space in question. It is related to the existence of conditional probability measures. In a sense, "disintegration" is the opposite process to the construction of a product measure.

Motivation

Consider the unit square $S = [0,1] \times [0,1]$ in the Euclidean plane $\mathbb{R}^2$. Consider the probability measure $\mu$ defined on $S$ by the restriction of two-dimensional Lebesgue measure $\lambda^2$ to $S$. That is, the probability of an event $E \subseteq S$ is simply the area of $E$. We assume $E$ is a measurable subset of $S$.

Consider a one-dimensional subset of $S$ such as the line segment $L_x = \{x\} \times [0,1]$. $L_x$ has $\mu$-measure zero, and since the Lebesgue measure space is a complete measure space, every subset of $L_x$ is a $\mu$-null set:

$$E \subseteq L_x \implies \mu(E) = 0.$$

While true, this is somewhat unsatisfying. It would be nice to say that $\mu$ "restricted to" $L_x$ is the one-dimensional Lebesgue measure $\lambda^1$, rather than the zero measure. The probability of a "two-dimensional" event $E$ could then be obtained as an integral of the one-dimensional probabilities of the vertical "slices" $E \cap L_x$: more formally, if $\mu_x$ denotes one-dimensional Lebesgue measure on $L_x$, then

$$\mu(E) = \int_{[0,1]} \mu_x(E \cap L_x) \, \mathrm{d}x$$
for any "nice" $E \subseteq S$. The disintegration theorem makes this argument rigorous in the context of measures on metric spaces.
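The slice formula can be checked numerically for simple events. The following is a minimal sketch, not part of the theorem: the two events (a triangle and a quarter disc), the helper name `integrate_slices`, and the choice of a midpoint rule are all illustrative assumptions. For each event, the length of the vertical slice $E \cap L_x$ is known in closed form, and integrating it over $[0,1]$ should recover the area of $E$.

```python
import math

def integrate_slices(slice_length, n=10_000):
    """Midpoint-rule approximation of the integral over [0, 1] of
    x -> mu_x(E ∩ L_x), the one-dimensional length of the slice at x."""
    h = 1.0 / n
    return h * sum(slice_length((i + 0.5) * h) for i in range(n))

def triangle_slice(x):
    # E = {(a, b) in S : b <= a}; the slice at x is {x} × [0, x], of length x.
    return x

def quarter_disc_slice(x):
    # E = {(a, b) in S : a^2 + b^2 <= 1}; the slice at x has length sqrt(1 - x^2).
    return math.sqrt(1.0 - x * x)

triangle_area = integrate_slices(triangle_slice)
quarter_disc_area = integrate_slices(quarter_disc_slice)

print("triangle:", triangle_area, "expected area:", 0.5)
print("quarter disc:", quarter_disc_area, "expected area:", math.pi / 4)
```

Both approximations should agree with the elementary areas $1/2$ and $\pi/4$ up to discretisation error.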

Statement of the theorem

(Hereafter, $\mathcal{P}(X)$ will denote the collection of Borel probability measures on a topological space $(X, T)$.) The assumptions of the theorem are as follows:

  • Let $Y$ and $X$ be two Radon spaces (i.e. topological spaces on which every Borel probability measure is inner regular, e.g. separable completely metrizable spaces; in particular, every Borel probability measure on such a space is outright a Radon measure).
  • Let $\mu \in \mathcal{P}(Y)$.
  • Let $\pi : Y \to X$ be a Borel-measurable function. Here one should think of $\pi$ as a function that "disintegrates" $Y$, in the sense of partitioning $Y$ into $\{\pi^{-1}(x) \mid x \in X\}$. In the motivating example above, one can define $\pi((a,b)) = a$ for $(a,b) \in [0,1] \times [0,1]$, which gives $\pi^{-1}(a) = \{a\} \times [0,1]$, exactly the slice we want to capture.
  • Let $\nu \in \mathcal{P}(X)$ be the pushforward measure $\nu = \pi_*(\mu) = \mu \circ \pi^{-1}$. This measure provides the distribution of $x$ (which corresponds to the events $\pi^{-1}(x)$).

The conclusion of the theorem: There exists a $\nu$-almost everywhere uniquely determined family of probability measures $\{\mu_x\}_{x \in X} \subseteq \mathcal{P}(Y)$, which provides a "disintegration" of $\mu$ into $\{\mu_x\}_{x \in X}$, such that:

  • the function $x \mapsto \mu_x$ is Borel measurable, in the sense that $x \mapsto \mu_x(B)$ is a Borel-measurable function for each Borel-measurable set $B \subseteq Y$;
  • $\mu_x$ "lives on" the fiber $\pi^{-1}(x)$: for $\nu$-almost all $x \in X$,
    $$\mu_x\left(Y \setminus \pi^{-1}(x)\right) = 0,$$
    and so $\mu_x(E) = \mu_x(E \cap \pi^{-1}(x))$;
  • for every Borel-measurable function $f : Y \to [0, \infty]$,
    $$\int_Y f(y) \, \mathrm{d}\mu(y) = \int_X \int_{\pi^{-1}(x)} f(y) \, \mathrm{d}\mu_x(y) \, \mathrm{d}\nu(x).$$
    In particular, for any event $E \subseteq Y$, taking $f$ to be the indicator function of $E$,[1]
    $$\mu(E) = \int_X \mu_x(E) \, \mathrm{d}\nu(x).$$
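On a finite space every hypothesis holds trivially and the disintegration can be written down explicitly, which makes the three conclusions easy to inspect. The sketch below is an illustration only: the five-point space, its weights, and the helper name `disintegrate` are assumptions made for the example. It computes the pushforward $\nu$, builds each $\mu_x$ by renormalising $\mu$ on the fibre $\pi^{-1}(x)$, and recombines them to recover $\mu(E)$.

```python
from collections import defaultdict
from fractions import Fraction as F

def disintegrate(mu, pi):
    """Return (nu, fibres) for a finite measure mu given as {point: weight}.

    nu is the pushforward of mu under pi; fibres[x] is the probability
    measure mu_x, supported on the fibre pi^{-1}(x). Fibres of nu-measure
    zero are skipped, since mu_x is only determined nu-almost everywhere.
    """
    nu = defaultdict(F)
    for y, w in mu.items():
        nu[pi(y)] += w
    fibres = defaultdict(dict)
    for y, w in mu.items():
        x = pi(y)
        if nu[x] > 0:
            fibres[x][y] = w / nu[x]
    return dict(nu), dict(fibres)

# Hypothetical example: five points in the plane, pi = first coordinate.
mu = {(0, 0): F(1, 8), (0, 1): F(3, 8), (1, 0): F(1, 4),
      (1, 1): F(1, 8), (2, 0): F(1, 8)}
nu, fibres = disintegrate(mu, lambda y: y[0])

# Check mu(E) = sum over x of mu_x(E) * nu(x) for one event E.
E = {(0, 1), (1, 1), (2, 0)}
mu_E = sum(w for y, w in mu.items() if y in E)
recombined = sum(
    nu[x] * sum(p for y, p in fibres[x].items() if y in E) for x in fibres
)
print(mu_E, recombined)
```

Exact rational arithmetic is used so that the two sides can be compared with equality rather than a tolerance.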

Applications

Product spaces

The original example was a special case of the problem of product spaces, to which the disintegration theorem applies.

When $Y$ is written as a Cartesian product $Y = X_1 \times X_2$ and $\pi_i : Y \to X_i$ is the natural projection, then each fibre $\pi_1^{-1}(x_1)$ can be canonically identified with $X_2$ and there exists a Borel family of probability measures $\{\mu_{x_1}\}_{x_1 \in X_1}$ in $\mathcal{P}(X_2)$ (which is $(\pi_1)_*(\mu)$-almost everywhere uniquely determined) such that

$$\mu = \int_{X_1} \mu_{x_1} \, \mu\left(\pi_1^{-1}(\mathrm{d}x_1)\right) = \int_{X_1} \mu_{x_1} \, \mathrm{d}(\pi_1)_*(\mu)(x_1).$$
Writing $\mu(\cdot \mid x_1)$ for $\mu_{x_1}$, this means in particular that, for every Borel-measurable $f : X_1 \times X_2 \to [0, \infty]$,
$$\int_{X_1 \times X_2} f(x_1, x_2) \, \mu(\mathrm{d}x_1, \mathrm{d}x_2) = \int_{X_1} \left( \int_{X_2} f(x_1, x_2) \, \mu(\mathrm{d}x_2 \mid x_1) \right) \mu\left(\pi_1^{-1}(\mathrm{d}x_1)\right)$$
and, for Borel sets $A \subseteq X_1$ and $B \subseteq X_2$,
$$\mu(A \times B) = \int_A \mu\left(B \mid x_1\right) \, \mu\left(\pi_1^{-1}(\mathrm{d}x_1)\right).$$

The relation to conditional expectation is given by the identities

$$\operatorname{E}(f \mid \pi_1)(x_1) = \int_{X_2} f(x_1, x_2) \, \mu(\mathrm{d}x_2 \mid x_1),$$
$$\mu(A \times B \mid \pi_1)(x_1) = 1_A(x_1) \cdot \mu(B \mid x_1).$$
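For a finite product space these identities reduce to the familiar rules for a joint probability table. The following sketch assumes a small hypothetical joint distribution on $\{a, b\} \times \{0, 1\}$ and an arbitrary test function `f`; the helper names `marginal`, `kernel` and `cond_exp` are likewise illustrative. It computes the conditional measures $\mu(\cdot \mid x_1)$, the conditional expectation $\operatorname{E}(f \mid \pi_1)$, and checks the product-set formula.

```python
from fractions import Fraction as F

# Hypothetical joint distribution mu on X1 × X2 = {"a", "b"} × {0, 1}.
joint = {("a", 0): F(1, 6), ("a", 1): F(1, 3),
         ("b", 0): F(1, 4), ("b", 1): F(1, 4)}

def marginal(joint):
    """Pushforward of mu under the projection pi_1 onto X1."""
    nu = {}
    for (x1, _), w in joint.items():
        nu[x1] = nu.get(x1, F(0)) + w
    return nu

def kernel(joint, nu):
    """Conditional measures mu(. | x1) on X2 (all marginal weights assumed > 0)."""
    k = {x1: {} for x1 in nu}
    for (x1, x2), w in joint.items():
        k[x1][x2] = w / nu[x1]
    return k

def cond_exp(f, k, x1):
    """E(f | pi_1)(x1): integral of f(x1, .) against mu(. | x1)."""
    return sum(f(x1, x2) * p for x2, p in k[x1].items())

nu = marginal(joint)
k = kernel(joint, nu)

def f(x1, x2):
    # Arbitrary non-negative test function.
    return 10 * x2 + (1 if x1 == "b" else 0)

# Integrating E(f | pi_1) against the marginal recovers the integral of f.
total = sum(f(x1, x2) * w for (x1, x2), w in joint.items())
iterated = sum(cond_exp(f, k, x1) * nu[x1] for x1 in nu)

# mu(A × B) as the sum over x1 in A of mu(B | x1) * nu(x1).
A, B = {"a"}, {1}
direct = sum(w for (x1, x2), w in joint.items() if x1 in A and x2 in B)
via_kernel = sum(
    sum(p for x2, p in k[x1].items() if x2 in B) * nu[x1] for x1 in A
)
print(total, iterated, direct, via_kernel)
```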

Vector calculus

The disintegration theorem can also be seen as justifying the use of a "restricted" measure in vector calculus. For instance, in Stokes' theorem as applied to a vector field flowing through a compact surface $\Sigma \subset \mathbb{R}^3$, it is implicit that the "correct" measure on $\Sigma$ is the disintegration of three-dimensional Lebesgue measure $\lambda^3$ on $\Sigma$, and that the disintegration of this measure on $\partial\Sigma$ is the same as the disintegration of $\lambda^3$ on $\partial\Sigma$.[2]

Conditional distributions

The disintegration theorem can be applied to give a rigorous treatment of conditional probability distributions in statistics, while avoiding purely abstract formulations of conditional probability.[3]

References

  1. ^ Dellacherie, C.; Meyer, P.-A. (1978). Probabilities and Potential. North-Holland Mathematics Studies. Amsterdam: North-Holland. ISBN 0-7204-0701-X.
  2. ^ Ambrosio, L.; Gigli, N.; Savaré, G. (2005). Gradient Flows in Metric Spaces and in the Space of Probability Measures. ETH Zürich, Birkhäuser Verlag, Basel. ISBN 978-3-7643-2428-5.
  3. ^ Chang, J.T.; Pollard, D. (1997). "Conditioning as disintegration" (PDF). Statistica Neerlandica. 51 (3): 287. CiteSeerX 10.1.1.55.7544. doi:10.1111/1467-9574.00056. S2CID 16749932.