
Multivariate Entropies

Let $X=\{ x_1,x_2,...,x_n \}$ and $Y=\{y_1,y_2,...,y_m \}$ be two discrete finite random variables with joint and individual probability distributions given by

$\displaystyle p(x_i,y_j)=Pr\{ X=x_i,Y=y_j \},\ p(x_i,y_j)\geq 0,\ \sum_{i=1}^n{\sum_{j=1}^m{p(x_i,y_j)}}=1,$

$\displaystyle p(x_i)=Pr\{ X=x_i\},\ p(x_i)\geq 0,\ \sum_{i=1}^n{p(x_i)}=1,$

and

$\displaystyle p(y_j)=Pr\{Y=y_j \},\ p(y_j)\geq 0,\ \sum_{j=1}^m{p(y_j)}=1.$

The conditional probability of $Y=y_j$ given $X=x_i$ is given by

$\displaystyle p(y_j\vert x_i)=Pr\{ Y=y_j\vert X=x_i \},\ p(y_j\vert x_i)\geq 0 \ {\rm for \ each} \ i$ with $\displaystyle \sum_{j=1}^m{p(y_j\vert x_i)}=1, \ \forall\ i=1,2,...,n.$

The conditional probability of $X=x_i$ given $Y=y_j$ is given by

$\displaystyle p(x_i\vert y_j)=Pr\{ X=x_i\vert Y=y_j \},\ p(x_i\vert y_j)\geq 0 \ {\rm for \ each} \ j$ with $\displaystyle \sum_{i=1}^n{p(x_i\vert y_j)}=1, \ \forall\ j=1,2,...,m.$

The following relations are well known in the literature:

$\displaystyle p(x_i,y_j)=p(x_i)p(y_j\vert x_i)=p(y_j)p(x_i\vert y_j),$

$\displaystyle p(x_i)=\sum_{j=1}^m{p(x_i,y_j)},$ and $\displaystyle p(y_j)=\sum_{i=1}^n{p(x_i,y_j)},$ for each $i=1,2,...,n; \ j=1,2,...,m.$

When $X$ and $Y$ are independent, we have

$\displaystyle p(x_i\vert y_j)=p(x_i);\ p(y_j\vert x_i)=p(y_j),$

and

$\displaystyle p(x_i,y_j)=p(x_i)p(y_j),$ for each $i=1,2,...,n; \ j=1,2,...,m.$

Based on the above notation, we now give the joint, individual and conditional measures of uncertainty. The joint measure of uncertainty of $(X,Y)$ is given by

$\displaystyle H(X,Y)=-\sum_{i=1}^n{\sum_{j=1}^m{p(x_i,y_j)\log p(x_i,y_j)}}.$

The individual measures of uncertainty of $X$ and $Y$ are given by

$\displaystyle H(X)=-\sum_{i=1}^n{p(x_i)\log p(x_i)}=-\sum_{i=1}^n{\sum_{j=1}^m{p(x_i,y_j)\log p(x_i)}},$

and

$\displaystyle H(Y)=-\sum_{j=1}^m{p(y_j)\log p(y_j)}=-\sum_{i=1}^n{\sum_{j=1}^m{p(x_i,y_j)\log p(y_j)}},$

respectively.

The conditional uncertainty of $Y$ given $X=x_i$ is given by

$\displaystyle H(Y\vert X=x_i)=-\sum_{j=1}^m{p(y_j\vert x_i) \log p(y_j\vert x_i)},$ for each $i=1,2,...,n.$

The conditional uncertainty of $Y$ given $X$ is the average of the uncertainties $H(Y\vert X=x_i)$ weighted by the probabilities $p(x_i),\ i=1,2,...,n,$ and is given by

$\displaystyle H(Y\vert X)=\sum_{i=1}^n{p(x_i)H(Y\vert X=x_i)}=-\sum_{i=1}^n{\sum_{j=1}^m{p(x_i,y_j)\log p(y_j\vert x_i)}}.$

Similarly, we can write the conditional uncertainty of $X$ given $Y$ as

$\displaystyle H(X\vert Y)=\sum_{j=1}^m{p(y_j)H(X\vert Y=y_j)}=-\sum_{i=1}^n{\sum_{j=1}^m{p(x_i,y_j)\log p(x_i\vert y_j)}}.$

In the case of three random variables $X=\{x_1,...,x_n \}, \ Y=\{ y_1,...,y_m \}$ and $Z=\{ z_1,...,z_l \}$ with their respective probability distributions, we have the following measures of uncertainty:

$\displaystyle H(X,Y,Z) =-\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l}p(x_i,y_j,z_k)\log p(x_i,y_j,z_k),$

$\displaystyle H(X)=-\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l}p(x_i,y_j,z_k)\log p(x_i),$

$\displaystyle H(X\vert Y) =-\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l}p(x_i,y_j,z_k)\log p(x_i\vert y_j),$

$\displaystyle H(X,Y) =-\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l}p(x_i,y_j,z_k)\log p(x_i,y_j),$

$\displaystyle H(X,Y\vert Z) = \sum_{k=1}^{l}p(z_k)H(X,Y\vert Z=z_k)= -\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l}p(x_i,y_j,z_k)\log p(x_i,y_j\vert z_k),$

$\displaystyle H(X\vert Y,Z) =\sum_{j=1}^{m}\sum_{k=1}^{l}p(y_j,z_k)H(X\vert Y=y_j,Z=z_k) =-\sum_{i=1}^{n}\sum_{j=1}^{m} \sum_{k=1}^{l}p(x_i,y_j,z_k)\log p(x_i\vert y_j,z_k),$

etc.
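The two-variable definitions above can be evaluated directly from a joint probability table. The following is a minimal sketch, not part of the original text: the 2×2 table `joint` is a hypothetical example, and the logarithm is taken to base 2 (the text leaves the base unspecified), so all values are in bits.

```python
from math import isclose, log2

# Hypothetical joint distribution p(x_i, y_j): rows are x_i, columns are y_j.
joint = [
    [0.25, 0.25],
    [0.50, 0.00],
]
n, m = len(joint), len(joint[0])

# Marginals p(x_i) = sum_j p(x_i, y_j) and p(y_j) = sum_i p(x_i, y_j).
p_x = [sum(joint[i][j] for j in range(m)) for i in range(n)]
p_y = [sum(joint[i][j] for i in range(n)) for j in range(m)]


def entropy(probs):
    """Shannon entropy in bits, using the convention 0 log 0 = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)


H_XY = entropy(p for row in joint for p in row)
H_X = entropy(p_x)
H_Y = entropy(p_y)

# H(Y|X) = -sum_{i,j} p(x_i, y_j) log p(y_j|x_i), with p(y_j|x_i) = p(x_i, y_j) / p(x_i).
H_Y_given_X = -sum(
    joint[i][j] * log2(joint[i][j] / p_x[i])
    for i in range(n) for j in range(m) if joint[i][j] > 0
)
# H(X|Y) = -sum_{i,j} p(x_i, y_j) log p(x_i|y_j), with p(x_i|y_j) = p(x_i, y_j) / p(y_j).
H_X_given_Y = -sum(
    joint[i][j] * log2(joint[i][j] / p_y[j])
    for i in range(n) for j in range(m) if joint[i][j] > 0
)

print(f"H(X,Y)={H_XY:.4f}  H(X)={H_X:.4f}  H(Y)={H_Y:.4f}")
print(f"H(Y|X)={H_Y_given_X:.4f}  H(X|Y)={H_X_given_Y:.4f}")
```

For this particular table, $H(X,Y)=1.5$ bits, $H(X)=1$ bit and $H(Y\vert X)=0.5$ bits; zero-probability cells are skipped in line with the convention $0\log 0=0$.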

The following properties hold for the above uncertainty measures.

Property 1.38. We have

(i) $H(X) \geq 0, \ H(X,Y) \geq 0,\ H(X,Y,Z) \geq 0.$

(ii) $H(X\vert Y) \geq 0, \ H(X\vert Y,Z) \geq 0, \ H(X,Y\vert Z) \geq 0.$

Property 1.39. We have

(i) $H(X,Y)=H(X)+H(Y\vert X)=H(Y)+H(X\vert Y).$

(ii) $H(X,Y,Z) = H(X)+H(Y,Z\vert X)$ $= H(X,Y)+H(Z\vert X,Y)$ $= H(X)+H(Y\vert X)+H(Z\vert X,Y).$
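The decompositions in Property 1.39(ii) can be checked numerically. The sketch below is illustrative only: the joint distribution `probs` over three binary variables is hypothetical, the helper names `marginal`, `entropy` and `cond_entropy` are introduced here for the example, and logarithms are taken to base 2. Conditional entropies are computed from their definition rather than as differences, so the check is not circular.

```python
from itertools import product
from math import isclose, log2

X, Y, Z = 0, 1, 2  # positions of the variables inside each outcome tuple

# Hypothetical joint distribution p(x, y, z) over three binary variables.
probs = [0.10, 0.05, 0.15, 0.10, 0.20, 0.05, 0.05, 0.30]
joint = dict(zip(product((0, 1), repeat=3), probs))


def marginal(joint, axes):
    """Sum the joint distribution down to the variables listed in axes."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[a] for a in axes)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy(joint, axes):
    """Joint entropy (bits) of the variables in axes."""
    return -sum(p * log2(p) for p in marginal(joint, axes).values() if p > 0)


def cond_entropy(joint, target, given):
    """H(target | given) = -sum p(t, g) log p(t | g), straight from the definition."""
    p_tg = marginal(joint, target + given)
    p_g = marginal(joint, given)
    k = len(target)  # the first k entries of each key belong to the target
    return -sum(p * log2(p / p_g[key[k:]]) for key, p in p_tg.items() if p > 0)


h_xyz = entropy(joint, (X, Y, Z))
rhs_1 = entropy(joint, (X,)) + cond_entropy(joint, (Y, Z), (X,))
rhs_2 = entropy(joint, (X, Y)) + cond_entropy(joint, (Z,), (X, Y))
rhs_3 = (entropy(joint, (X,)) + cond_entropy(joint, (Y,), (X,))
         + cond_entropy(joint, (Z,), (X, Y)))

print(h_xyz, rhs_1, rhs_2, rhs_3)  # the four values agree up to rounding
```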

Property 1.40. We have

(i) $H(X\vert Y) \leq H(X)$, with equality iff $X$ and $Y$ are independent, i.e., $p(x_i,y_j)=p(x_i)p(y_j), \ \forall \,\, i,j.$

(ii) $H(X\vert Y,Z) \leq H(X\vert Z)$, with equality iff $X$ and $Y$ are conditionally independent given $Z$, i.e., $p(x_i,y_j\vert z_k)=p(x_i\vert z_k)p(y_j\vert z_k),\, \forall \,\, i,j$ and each $k.$

(iii) $H(X,Y\vert Z) \leq H(X,Y)$, with equality iff $(X,Y)$ and $Z$ are independent, i.e., $p(x_i,y_j,z_k)=p(x_i,y_j)p(z_k), \, \forall\,\, i,j,k.$

Note 1.2. Since the random variables $Y$ and $Z$ play symmetric roles in $H(X\vert Y,Z)$, it follows from Property 1.40(ii) that

$\displaystyle H(X\vert Y,Z) \leq max \{H(X\vert Y),H(X\vert Z)\}.$
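A small numerical illustration of Property 1.40 and Note 1.2 may help. The sketch below assumes a hypothetical example in which $X$ and $Y$ are independent fair bits and $Z$ is their exclusive-or; the helpers `H` and `H_cond` are introduced here for illustration, entropies are in bits, and conditional entropies are obtained from the chain rule of Property 1.39.

```python
from itertools import product
from math import isclose, log2

X, Y, Z = 0, 1, 2  # positions of the variables inside each outcome tuple

# Hypothetical example: X and Y are independent fair bits and Z = X xor Y.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}


def H(axes):
    """Joint entropy (bits) of the variables in axes."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[a] for a in axes)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def H_cond(target, given):
    """H(target | given) computed as H(target, given) - H(given)."""
    return H(target + given) - H(given)


h_x = H((X,))
h_x_given_y = H_cond((X,), (Y,))
h_x_given_z = H_cond((X,), (Z,))
h_x_given_yz = H_cond((X,), (Y, Z))

print(h_x, h_x_given_y, h_x_given_z, h_x_given_yz)
```

In this example $H(X\vert Y)=H(X)=1$ bit because $X$ and $Y$ are independent, while $H(X\vert Y,Z)=0$ is strictly smaller than $H(X\vert Z)=1$ bit because $X$ and $Y$ are not conditionally independent given $Z$.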

Property 1.41. We have

(i) $H(X,Y) \geq max\{H(X),H(Y)\}.$

(ii) $H(X,Y,Z) \geq max\{H(X),H(Y),H(Z)\}.$

(iii) $H(X,Y,Z) \geq max\{H(X,Y),H(Y,Z),H(Z,X)\}.$

(iv) $H(X,Y\vert Z) \geq max\{H(X\vert Z),H(Y\vert Z)\}.$

Property 1.42. We have

(i) $H(X,Y) \leq H(X)+H(Y)$, with equality iff $X$ and $Y$ are independent, i.e., $\ p(x_i,y_j)=p(x_i)p(y_j), \ \forall\ i,j.$

(ii) $H(X,Y,Z) \leq H(X)+H(Y)+H(Z)$, with equality iff $X,\ Y$ and $Z$ are independent, i.e., iff $p(x_i,y_j,z_k)=p(x_i)p(y_j)p(z_k), \ \forall\ i,j,k.$

(iii) $H(X,Y\vert Z) \leq H(X\vert Z)+H(Y\vert Z)$, with equality iff $X$ and $Y$ are conditionally independent given $Z$, i.e., iff $p(x_i,y_j\vert z_k)=p(x_i\vert z_k)p(y_j\vert z_k),\ \forall\ i,j,k.$
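Property 1.42(i) can be illustrated by comparing a dependent joint distribution with the product of its own marginals. The sketch below is an assumption-laden example: the table `dependent` is hypothetical, the helper names are introduced here, and entropies are measured in bits.

```python
from math import isclose, log2


def entropy(probs):
    """Shannon entropy in bits, using the convention 0 log 0 = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)


def entropies(joint):
    """Return (H(X,Y), H(X), H(Y)) for a joint table with rows x_i and columns y_j."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return entropy(p for row in joint for p in row), entropy(p_x), entropy(p_y)


# Hypothetical dependent joint distribution p(x_i, y_j).
dependent = [
    [0.25, 0.25],
    [0.50, 0.00],
]
# Independent distribution with the same marginals: p(x_i) p(y_j).
p_x = [sum(row) for row in dependent]
p_y = [sum(col) for col in zip(*dependent)]
independent = [[px * py for py in p_y] for px in p_x]

h_xy_dep, h_x, h_y = entropies(dependent)
h_xy_ind, _, _ = entropies(independent)

print(f"dependent:   H(X,Y)={h_xy_dep:.4f} < H(X)+H(Y)={h_x + h_y:.4f}")
print(f"independent: H(X,Y)={h_xy_ind:.4f} = H(X)+H(Y)={h_x + h_y:.4f}")
```

The dependent table gives a strict inequality, while the product distribution attains the bound, which matches the equality condition stated in the property.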

Property 1.43. We have

(i) $H(X\vert Z) \leq H(X\vert Y)+H(Y\vert Z).$

(ii) If $H(\cdot,\cdot) \neq 0$, then

$\displaystyle {H(X\vert Z)\over H(X,Z)} \leq {H(X\vert Y)\over H(X,Y)}+{H(Y\vert Z)\over H(Y,Z)}.$

Property 1.44. For each $k$, define

$\displaystyle A(z_k)=\sum_{i=1}^n{\sum_{j=1}^m{p(y_j)p(z_k\vert x_i,y_j)}}.$

Then

$\displaystyle H(X\vert Y) \leq H(Z)+ \sum_{k=1}^l{p(z_k)\log A(z_k)}.$

Property 1.45. Let $X$ and $Y$ take values in the same set of $n$ elements and let $P_e=Pr\{X\not= Y\}$. Then

$\displaystyle H(X\vert Y)\leq H(P_e)+ P_e \log (n-1),$

where $H(P_e)=-P_e\log P_e-(1-P_e)\log (1-P_e)$ is the entropy of the binary distribution $(P_e,1-P_e)$.

Note 1.3. Property 1.45 is known as Fano's inequality.
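Fano's inequality can be checked on a concrete example. The sketch below is illustrative only: it assumes a hypothetical joint distribution `joint` in which $X$ and $Y$ share an alphabet of $n=3$ symbols, reads $H(P_e)$ as the entropy of the two-point distribution $(P_e, 1-P_e)$, and uses base-2 logarithms.

```python
from math import log2

# Hypothetical joint distribution p(x_i, y_j) on a common alphabet of n = 3 symbols;
# rows are values of X, columns are values of Y.
joint = [
    [0.30, 0.02, 0.01],
    [0.03, 0.25, 0.05],
    [0.02, 0.04, 0.28],
]
n = len(joint)

p_y = [sum(joint[i][j] for i in range(n)) for j in range(n)]

# H(X|Y) = -sum_{i,j} p(x_i, y_j) log p(x_i|y_j), with p(x_i|y_j) = p(x_i, y_j) / p(y_j).
h_x_given_y = -sum(
    joint[i][j] * log2(joint[i][j] / p_y[j])
    for i in range(n) for j in range(n) if joint[i][j] > 0
)

# P_e = Pr{X != Y} is the total off-diagonal probability mass.
p_e = sum(joint[i][j] for i in range(n) for j in range(n) if i != j)


def binary_entropy(p):
    """Entropy (bits) of the two-point distribution (p, 1 - p)."""
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


fano_bound = binary_entropy(p_e) + p_e * log2(n - 1)
print(f"H(X|Y) = {h_x_given_y:.4f} <= {fano_bound:.4f} = H(P_e) + P_e log(n-1)")
```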

For four discrete random variables $X_1,\ X_2,\ X_3{\rm\ and \ } X_4$ the following property holds.

Property 1.46. We have

(i) $H(X_1,X_2,X_3,X_4)=H(X_1,X_2,X_3)+H(X_4\vert X_1,X_2,X_3)$ $=H(X_1,X_2) + H(X_3\vert X_1,X_2) + H(X_4\vert X_1,X_2,X_3)$ $=H(X_1) + H(X_2\vert X_1) + H(X_3\vert X_1,X_2) + H(X_4\vert X_1,X_2,X_3).$

(ii) $H(X_1,X_2\vert X_3,X_4)=H(X_1\vert X_3,X_4)+H(X_2\vert X_1,X_3,X_4).$

(iii) $H(X_1,X_2,X_3\vert X_4)=H(X_1,X_2\vert X_3,X_4)+H(X_3\vert X_4).$

(iv) $H(X_1,X_2,X_3,X_4)\geq max\{H(X_t)\}, t=1,2,3 {\rm\ and \ } 4.$

(v) $H(X_1,X_2,X_3,X_4)\geq max\{H(X_t,X_r)\}, t,r=1,2,3 {\rm\ and \ } 4.$

(vi) $H(X_1,X_2,X_3,X_4)\geq max\{H(X_t,X_r,X_p)\}, t,r,p=1,2,3 \ {\rm and }\ 4.$

(vii) $max\{H(X_1\vert X_2,X_3,X_4),H(X_2\vert X_1,X_3,X_4)\}$ $\leq H(X_1,X_2\vert X_3,X_4)$ $\leq min\{H(X_1,X_2,X_3\vert X_4),H(X_1,X_2,X_4\vert X_3)\}.$

(viii) $H(X_1,X_2\vert X_3,X_4) \leq H(X_1\vert X_3,X_4)+H(X_2\vert X_3,X_4).$
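The inequalities in Property 1.46(vii) and (viii) can be spot-checked numerically. The following sketch assumes a hypothetical joint distribution of four binary variables built from seeded pseudo-random weights; the helpers `H` and `H_cond` are introduced here for illustration, entropies are in bits, and conditional entropies are obtained as differences of joint entropies via the chain rule of Property 1.46(i). A single example does not prove the property; it only shows the quantities involved.

```python
import random
from itertools import product
from math import log2

# Hypothetical joint distribution of four binary variables, built from seeded
# pseudo-random weights so that the run is reproducible.
rng = random.Random(0)
outcomes = list(product((0, 1), repeat=4))
weights = [rng.random() + 0.01 for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}


def H(*variables):
    """Joint entropy (bits) of X_t for the 1-based labels t in variables."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[t - 1] for t in variables)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def H_cond(target, given):
    """H(target | given) computed as H(target, given) - H(given)."""
    return H(*target, *given) - H(*given)


middle = H_cond((1, 2), (3, 4))                                   # H(X1,X2|X3,X4)
lower = max(H_cond((1,), (2, 3, 4)), H_cond((2,), (1, 3, 4)))     # left side of (vii)
upper = min(H_cond((1, 2, 3), (4,)), H_cond((1, 2, 4), (3,)))     # right side of (vii)
subadditive = H_cond((1,), (3, 4)) + H_cond((2,), (3, 4))         # right side of (viii)

print(f"(vii)  {lower:.4f} <= {middle:.4f} <= {upper:.4f}")
print(f"(viii) {middle:.4f} <= {subadditive:.4f}")
```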

21-06-2001

Inder Jeet Taneja

Departamento de Matemática - UFSC

88.040-900 Florianópolis, SC - Brazil