
# Chapter 0

• Introduction

Information theory is a relatively new branch of mathematics that was made mathematically rigorous only in the 1940s. The term 'information theory' does not possess a unique definition. Broadly speaking, information theory deals with the study of problems concerning any system that handles information. This includes information processing, information storage, information retrieval and decision making. In a narrow sense, information theory studies all theoretical problems connected with the transmission of information over communication channels. This includes the study of uncertainty (information) measures and of various practical and economical methods of coding information for transmission.

The first studies in this direction were undertaken by Nyquist in 1924 [75] and 1928 [76] and by Hartley in 1928 [44], who recognized the logarithmic nature of the measure of information. In 1948, Shannon [86] published a remarkable paper on the properties of information sources and of the communication channels used to transmit the outputs of these sources. Around the same time, Wiener (1948) [120] also considered the communication situation and arrived, independently, at results similar to those of Shannon.

Both Shannon and Wiener considered the communication situation as one in which a signal, chosen from a specified class, is to be transmitted through a channel. The output of the channel is described statistically for each permissible input. The basic problem of communication is to reconstruct the input signal as closely as possible after observing the received signal at the output.

However, the approach used by Shannon differs from that of Wiener in the nature of the transmitted signal and in the type of decision made at the receiver. In the Shannon model messages are first encoded and then transmitted, whereas in the Wiener model the signal is communicated directly through the channel without being encoded.

In the past fifty years the literature on information theory has grown voluminous, and apart from communication theory it has found deep applications in many social, physical and biological sciences, for example economics, statistics, accounting, language, psychology, ecology, pattern recognition, computer science, fuzzy sets, etc.

A key feature of Shannon's information theory is that the term "information" can often be given a mathematical meaning as a numerically measurable quantity, on the basis of a probabilistic model, in such a way that the solutions of many important problems of information storage and transmission can be formulated in terms of this measure of the amount of information. This measure has a very concrete operational interpretation: it roughly equals the minimum number of binary digits needed, on average, to encode the message in question. The coding theorems of information theory provide such overwhelming evidence for the adequacy of the Shannon information measure that to look for essentially different measures of information might appear to make no sense at all. Moreover, it has been shown by several authors, starting with Shannon (1948) [86], that the measure of the amount of information is uniquely determined by some rather natural postulates. Still, all the evidence that the Shannon information measure is the only possible one is valid only within the restricted scope of the coding problems considered by Shannon. As pointed out by Rényi (1961) [82] in his fundamental paper on generalized information measures, in other sorts of problems other quantities may serve just as well, or even better, as measures of information. Their use should be supported either by their operational significance or by a set of natural postulates characterizing them, or, preferably, by both. Thus the idea of generalized entropies arises in the literature. It started with Rényi (1961) [82], who characterized a scalar parametric entropy, the entropy of order α, which includes Shannon's entropy as a limiting case.
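As a rough numerical illustration of the two points above (the operational meaning of Shannon's measure and its recovery as a limiting case of Rényi's entropy), the following sketch uses the standard textbook definitions with base-2 logarithms. The three-symbol source, the prefix code and the function names are hypothetical choices made for this example, and the normalization adopted in later chapters may differ.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits of a discrete distribution p (zero terms skipped)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha in bits, for alpha > 0 and alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)


# Hypothetical three-symbol source, chosen so that the arithmetic is simple.
p = [0.5, 0.25, 0.25]

# Codeword lengths of the prefix code {0, 10, 11} assigned to the three symbols.
code_lengths = [1, 2, 2]
average_length = sum(x * n for x, n in zip(p, code_lengths))

print(shannon_entropy(p))        # entropy of the source, in bits
print(average_length)            # average codeword length, in bits per symbol
print(renyi_entropy(p, 1.0001))  # order close to 1: close to the Shannon entropy
print(renyi_entropy(p, 2))       # order 2: smaller than the Shannon entropy here
```

For this particular source the average codeword length coincides with the entropy, which is the best case the operational interpretation allows; for general distributions the optimal average length only approximates the entropy.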

On the other side, Kullback and Leibler in 1951 [65] studied a measure of information from the statistical point of view, involving two probability distributions associated with the same experiment, which they called the discrimination function; later authors named it cross entropy, relative information, etc. At the same time, Kullback and Leibler also studied a divergence measure, which they called divergence, a measure already studied by Jeffreys in 1946 [49]. Kerridge in 1961 [63] studied a different kind of measure, called the inaccuracy measure, again involving two probability distributions. Sibson in 1969 [95] studied another divergence measure involving two probability distributions, called the information radius, using mainly the concavity property of Shannon's entropy. Later, Burbea and Rao in 1982 [18], [19] studied extensively the information radius and its parametric generalization, calling it the Jensen difference divergence measure. Thus Shannon's entropy, Kullback-Leibler's relative information, Kerridge's inaccuracy, Jeffreys' invariant (or divergence) and Sibson's information radius are the five classical measures of information associated with one and two probability distributions. These five classical measures have found deep applications in the areas of information theory and statistics. Over the past years, various measures generalizing them have been introduced in the literature. These generalizations involve one and two scalar parameters. Taneja in 1995 [108] studied a new measure of divergence, and its two parametric generalizations, involving two probability distributions and based on the arithmetic and geometric mean inequality.
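To make the five classical measures concrete, here is a minimal sketch for finite distributions, using their standard textbook forms with base-2 logarithms and, for the information radius, equal weights on the two distributions. The function names and the two example distributions are hypothetical, the code assumes that `q` is positive wherever `p` is, and the conventions used in later chapters may differ.

```python
import math


def shannon_entropy(p):
    """Shannon entropy H(P) in bits (zero terms skipped)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def relative_information(p, q):
    """Kullback-Leibler relative information D(P||Q) in bits."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


def j_divergence(p, q):
    """Jeffreys invariant (J-divergence): D(P||Q) + D(Q||P)."""
    return relative_information(p, q) + relative_information(q, p)


def inaccuracy(p, q):
    """Kerridge inaccuracy: -sum p_i log2 q_i, equal to H(P) + D(P||Q)."""
    return -sum(x * math.log2(y) for x, y in zip(p, q) if x > 0)


def information_radius(p, q):
    """Sibson information radius (Jensen difference) with equal weights."""
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2


# Hypothetical pair of distributions on a two-point set.
P = [0.5, 0.5]
Q = [0.25, 0.75]

print(shannon_entropy(P))
print(relative_information(P, Q))
print(j_divergence(P, Q))
print(inaccuracy(P, Q))
print(information_radius(P, Q))
```

The non-negativity of the information radius is exactly the concavity of Shannon's entropy mentioned above: the entropy of the mixture is at least the average of the two entropies.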

The aim of this book is to study these generalized information and divergence measures in unified forms, and then to apply them to the transmission of information and to statistical concepts. The book is divided into eleven chapters. The first part covers the fundamental concepts of unified information and divergence measures, including entropy-type measures and Shannon's entropy. The application part covers the use of generalized information measures in the transmission of information, including important areas of information theory, viz. noiseless coding, the Huffman procedure, redundancy, channel capacity and coding theorems; statistical aspects of generalized information measures in relation to Fisher's information measure; comparison of experiments; and bounds on the Bayesian probability of error in feature selection problems. Connections with generalized information measures involving several probability distributions are also made. It is still planned to include in this part the applications of generalized information measures in fuzzy set theory.

21-06-2001
Inder Jeet Taneja
Departamento de Matemática - UFSC
88.040-900 Florianópolis, SC - Brazil