
Minimum Relative Information Principle


In Chapter 1, we saw that the "principle of maximum entropy" is applicable in situations where information becomes available in the form of expected values, or constraints, and we wish to determine the underlying probability density. This restriction on the form of the available information limits the type of problem we can consider; the principle is applicable only if the partial information is given in the form of averages.

First, as an example, consider a situation where we have partial information in the form of averages and, in addition, a prior estimate of the probability distribution, say a Poisson distribution with a given value of its parameter (the mean). This problem cannot be handled by the M.E.P.: if we were to apply the M.E.P., we would not be able to use the information coded in the prior estimate of the unknown distribution.

This serious limitation of the M.E.P. cannot in general be relaxed. However, the problem can be handled when some prior knowledge of the underlying distribution is available in addition to the partial information in the form of averages. This leads to the minimum relative information principle, stated as follows:
 

When a prior distribution $ q(x)$ estimating the underlying density $ p(x)$ is known in addition to some constraints, then of all the densities $ p(x)$ which satisfy the constraints we should select as the underlying probability density the one which yields the minimum relative information.


Given a prior density $ q(x)$, we wish to arrive at a density function $ p(x)$, where this underlying density function $ p(x)$ satisfies the usual probability constraint

$\displaystyle \int_{I\!\!R}{p(x)\ dx} =1$
    (2.12)

and the partial information in the form of averages

$\displaystyle \int_{I\!\!R}{f_k(x)p(x)\ dx} =\overline{f}_k,\quad k=1,2,...,\vartheta.$
    (2.13)

By the minimum relative information principle, $ p(x)$ is to be chosen so that it minimizes the relative information $ D(p\vert\vert q)$ subject to the constraints (2.12) and (2.13).

Using Lagrange's method of multipliers, we have

$\displaystyle D(p\vert\vert q)+\lambda'_0+\sum_{k=1}^{\vartheta}{\lambda_k\overline{f}_k}$
$\displaystyle =\int_{I\!\!R}{p(x)\, \ell n\, {p(x)\over q(x)}\,dx}+ \lambda'_0\int_{I\!\!R}{p(x)\,dx}+\sum_{k=1}^{\vartheta}{\lambda_k\int_{I\!\!R}{f_k(x)p(x)\,dx}},$
    (2.14)

where $ \lambda'_0,\lambda_1,...,\lambda_{\vartheta}$ are constants.

Equating the variations of this quantity with respect to $ p$ to zero, we obtain

$\displaystyle \ell n\, {p(x)\over q(x)} +1+\lambda'_0+\sum_{k=1}^{\vartheta}{\lambda_kf_k(x)}=0.$
    (2.15)

This yields

$\displaystyle p(x)=q(x)\,exp\Big(-\lambda_0-\sum_{k=1}^{\vartheta}{\lambda_kf_k(x)}\Big),$
    (2.16)

where $ \lambda_0=1+\lambda'_0$. Thus we have

(i) $ D_{min}=-\lambda_0-\sum_{k=1}^{\vartheta}{\lambda_k\overline{f}_k}$.

(ii) $ {\partial\lambda_0\over\partial\lambda_k}=-\overline{f}_k$.
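
As a concrete illustration of (2.16) and of property (i), the following minimal sketch (in Python; the finite support, the uniform prior and the prescribed average are arbitrary choices made only for illustration, not taken from the text) minimizes $ D(p\vert\vert q)$ numerically under a single average constraint and compares the result with the tilted form $ p(x)=q(x)\,exp(-\lambda_0-\lambda_1 f_1(x))$.

import numpy as np
from scipy.optimize import minimize, brentq

# Assumed toy setup: support {0,...,5}, uniform prior q, constraint function
# f(x) = x with prescribed average fbar = 3.2 (all purely illustrative).
x = np.arange(6)
q = np.ones(6) / 6
f = x.astype(float)
fbar = 3.2

def D(p):
    # relative information D(p||q) on the discrete support
    return np.sum(p * np.log(p / q))

# Direct constrained minimization of D(p||q) subject to (2.12) and (2.13).
cons = ({'type': 'eq', 'fun': lambda p: p.sum() - 1.0},
        {'type': 'eq', 'fun': lambda p: p @ f - fbar})
res = minimize(D, q.copy(), method='SLSQP',
               bounds=[(1e-12, 1.0)] * len(q), constraints=cons)
p_num = res.x

# Closed form (2.16): p proportional to q*exp(-lam1*f), with lam1 chosen
# so that the prescribed average is matched.
def gap(lam1):
    w = q * np.exp(-lam1 * f)
    return (w / w.sum()) @ f - fbar

lam1 = brentq(gap, -50.0, 50.0)
w = q * np.exp(-lam1 * f)
lam0 = np.log(w.sum())          # exp(-lam0) is the normalizing constant
p_closed = w / w.sum()

print(np.allclose(p_num, p_closed, atol=1e-5))        # both give the same density
print(np.isclose(D(p_closed), -lam0 - lam1 * fbar))   # property (i): D_min = -lam0 - lam1*fbar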

Next we show that $ p(x)$ obtained in (2.16) does indeed yield the minimum relative information and that it is unique; both facts rest on the strict convexity of $ D(p\vert\vert q)$ in $ p$ over the convex set of densities satisfying (2.12) and (2.13).

The following property gives some examples of the minimum relative information principle, including the discrete case; a numerical check of case (ii) is sketched after the property.

Property 2.18. We have

(i) For the prior distribution
$\displaystyle q_k={n \choose k}\bigg({1\over 2}\bigg)^n, \,k=0,1,2,...,n,$
under the constraints
$\displaystyle p_k\geq 0,\ \sum_{k=0}^n{p_k}=1,$
and
$\displaystyle \sum_{k=0}^n{k\, p_k}=nb,\ 0<b<1,$
the minimizing relative information distribution given by
$\displaystyle p_k={n \choose k}b^k(1-b)^{n-k},\, k=0,1,2,...,n.$
is the Binomial distribution.
(ii) For the prior distribution
$\displaystyle q_k={1\over k!}exp(-1),\, k=0,1,2,...$
under the constraints
$\displaystyle p_k\geq 0,\ \sum_{k=0}^{\infty}{p_k}=1,$
and
$\displaystyle \sum_{k=0}^{\infty}{k\, p_k}=m,$
the minimizing relative information distribution given by 
$\displaystyle p_k ={m^k exp(-m)\over k!},\,k=0,1,2,...$
is the Poisson distribution.
(iii)($ a_1$) For the prior distribution
$\displaystyle q(x)={1\over \sigma\sqrt{2\pi}}exp \Big[-{(x-y)^2\over2\sigma^2}\Big],$
under the constraints:
$\displaystyle \int_{-\infty}^{\infty}{p(x)\, dx}=1,$
and
$\displaystyle \int_{-\infty}^{\infty}{xp(x)\, dx}=b,\ b\neq y,$
the minimizing relative information distribution given by
$\displaystyle p(x)={1\over \sigma\sqrt{2\pi}}exp \Big[-{(x-b)^2\over2\sigma^2}\Big],$
is the normal distribution.
(iii)($ a_2$) For the prior distribution
$\displaystyle q(x)={1\over \sigma\sqrt{2\pi}}exp \Big[-{(x-y)^2\over2\sigma^2}\Big],$
under the constraints:
$\displaystyle \int_{-\infty}^{\infty}{p(x)\, dx}=1,$
and
$\displaystyle \int_{-\infty}^{\infty}{(x-y)^2p(x)\, dx}=a^2,\ a^2\neq\sigma^2,$
the minimizing relative information distribution given by
$\displaystyle p(x)={1\over a\sqrt{2\pi}}exp \Big[-{(x-y)^2\over 2a^2}\Big],$
is the normal distribution.
(iii)($ a_3$) For the prior distribution
$\displaystyle q(x)={1\over \sigma\sqrt{2\pi}}exp \Big[-{(x-y)^2\over2\sigma^2}\Big],$
under the constraints:
$\displaystyle \int_{-\infty}^{\infty}{p(x)\, dx}=1,$
$\displaystyle \int_{-\infty}^{\infty}{xp(x)\, dx}=b,$
and
$\displaystyle \int_{-\infty}^{\infty}{x^2p(x)\, dx}=m^2,\quad$ where $\displaystyle m^2=a^2+b^2,$
the minimizing relative information distribution given by 
$\displaystyle p(x)={1\over a\sqrt{2\pi}}exp \Big[-{(x-b)^2\over2a^2}\Big],$
is the normal distribution.
(iv)($ a_1$) For the prior distribution
$\displaystyle q(x)=a\ exp (-ax),\ 0<x<\infty,$
under the constraints:
$\displaystyle \int_{0}^{\infty}{p(x)\, dx}=1,$
and 
$\displaystyle \int_{0}^{\infty}{xp(x)\, dx}={1\over b},$
the minimizing relative information distribution given by 
$\displaystyle p(x)=b\ exp (-bx),$
is the exponential distribution.
(iv)($ a_2$) For the prior distribution
$\displaystyle q(x)=a\ exp (-ax),\ 0<x<\infty,$
under the constraints
$\displaystyle \int_{0}^{\infty}{p(x)\, dx}=1,$
and 
$\displaystyle \int_{0}^{\infty}{(\ell n\, x)\, p(x)\, dx}= b,$
the minimizing relative information distribution given by 
$\displaystyle p(x)={x^{\alpha -1}\beta^{\alpha} exp(-\beta x)\over \Gamma(\alpha)},$
is the gamma distribution.
(v)($ a_1$) For the prior distribution
$\displaystyle q(x)={x^{\alpha -1}\beta^{\alpha} exp (-\beta x)\over\Gamma(\alpha)},$
under the constraints
$\displaystyle \int_{0}^{\infty}{p(x)\, dx}=1,$
and 
$\displaystyle \int_{0}^{\infty}{(\ell n\, x)\, p(x)\, dx}= b,$
the minimizing relative information distribution given by 
$\displaystyle p(x)={x^{\alpha +\lambda -1}\beta^{\alpha+\lambda} exp (-\beta x)\over \Gamma(\alpha + \lambda)},$
is the gamma distribution.
(v)($ a_2$) For the prior distribution
$\displaystyle q(x)={x^{\alpha -1}\beta^{\alpha} exp (-\beta x)\over\Gamma(\alpha)},$
under the constraints
$\displaystyle \int_{0}^{\infty}{p(x)\, dx}=1,$
and 
$\displaystyle \int_{0}^{\infty}{xp(x)\, dx}={1\over b},$
the minimizing relative information distribution given by 
$\displaystyle p(x)={(\alpha b)^{\alpha}x^{\alpha -1} exp(-\alpha b x)\over \Gamma(\alpha)},$
is the gamma distribution.
(vi) For the prior distribution
$\displaystyle q(x)=mx^{m-1},\ 0<x<1,\ m>0,$
under the constraints
$\displaystyle \int_{0}^{1}{p(x)\, dx}=1,$
and
$\displaystyle \int_{0}^{1}{ \ell n\, (1-x)\ p(x)\, dx}= a,$
the minimizing relative information distribution given by
$\displaystyle p(x)={x^{m-1}(1-x)^{\lambda} \over B(m,\lambda +1)},\ \lambda >-1,$
is the Beta distribution.
(vii) For the prior distribution
$\displaystyle q(x)= \left\{ \begin{array}{ll}{\alpha \over x_0}\Big({x_0\over x}\Big)^{\alpha+1},& x_0 < x <\infty,\ \alpha >0,\\ 0, & \mbox{otherwise,}\end{array} \right.$
under the constraints 
$\displaystyle \int_{x_0}^{\infty}{p(x)\, dx}=1,$
and
$\displaystyle \int_{x_0}^{\infty}{(\ell n x) p(x)\, dx}= m,$
the minimizing relative information distribution given by 
$\displaystyle p(x)={\alpha \over x_0}\Big({x_0\over x}\Big)^{\alpha +1}x^{-\beta}exp(-\lambda),$
is the Pareto distribution.
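
As a quick check of case (ii) above, the following sketch (in Python; the target mean $ m=2.7$ and the truncation of the support at $ k=60$ are illustrative assumptions only) tilts the Poisson prior $ q_k=exp(-1)/k!$ by $ exp(-\lambda k)$ as in (2.16) and verifies that, once the mean constraint is matched, the result is exactly the Poisson distribution with parameter $ m$.

import numpy as np
from scipy.stats import poisson

m = 2.7                     # prescribed mean (illustrative choice)
k = np.arange(60)           # truncated support {0,1,2,...} (illustrative choice)
q = poisson.pmf(k, 1.0)     # prior: Poisson with parameter 1

# Tilted form (2.16): p_k proportional to q_k*exp(-lam*k); matching the
# constraint sum(k*p_k) = m gives exp(-lam) = m.
lam = -np.log(m)
p = q * np.exp(-lam * k)
p /= p.sum()                # renormalization absorbs exp(-lam0)

print(np.allclose(p, poisson.pmf(k, m)))   # True: p is Poisson with parameter m
print(p @ k)                               # approximately 2.7, the imposed average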
