Information theory is the study of information, which is formalized via entropy. Information is, roughly, the minimal number of yes/no questions needed to specify the state of a system (more precisely, the minimal expected number of such questions). Mathematically, information, or entropy, is a property of a Probability distribution. Physically, it is a property of a physical system, when that system is modelled with a probability distribution. In the simplest case the probability distribution is uniform, and the entropy is just the logarithm of the number of possible states of the system! This is intuitive: a system holds lots of information if it can be in many possible states.
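A minimal sketch of the uniform case (the function name and numbers here are my own, not from the source):

```python
import math

# For a uniform distribution over n equally likely states, each yes/no
# question can at best halve the remaining possibilities, so pinning down
# one of 8 states takes log2(8) = 3 questions.
def uniform_entropy(n):
    return math.log2(n)

print(uniform_entropy(8))  # 3.0 bits
```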
The philosophy of information gets tricky simply because the philosophy of probability gets tricky, as one is defined in terms of the other. E.g. whether information is subjective or objective is really a question of whether probability is subjective or objective.
Information Theory, Information Theory (CUHK)
Entropy is the number of yes/no questions you expect to need in order to identify the state of the world, under a Model of the world (a Probability distribution over states of the world). I.e. how ignorant I think I am about the world.
If you then for some reason update your model of the world, your expectations change. Because of this, the expected number of yes/no questions using the previously optimal questioning scheme can change. The new number, called Cross entropy, represents how ignorant you now think you *were* about the world.
Relative entropy (aka KL divergence) is the difference between how ignorant I now think I *was* before the update (the Cross entropy) and how ignorant I think I am *now* (the new value of entropy). I.e. how much less ignorant about the world I think I have become after the update: how much information I think I have learned.
I am calling -logP(x) "ignorance". It's more typically called "Surprise".
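A small numerical sketch of these three quantities; the toy distributions are my own example, not from the source:

```python
import math

# Toy distributions over the same four states.
P = [0.5, 0.25, 0.125, 0.125]   # model after the update
Q = [0.25, 0.25, 0.25, 0.25]    # model before the update

def entropy(p):
    # Expected surprise -log2 p(x) under p itself: how ignorant I think I am.
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy(p, q):
    # Expected surprise of the old scheme q, weighted by the new model p:
    # how ignorant I now think I *was*.
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

def kl_divergence(p, q):
    # Relative entropy D(P || Q) = cross entropy minus entropy:
    # how much information I think the update gave me.
    return cross_entropy(p, q) - entropy(p)

print(entropy(P))           # 1.75 bits
print(cross_entropy(P, Q))  # 2.0 bits
print(kl_divergence(P, Q))  # 0.25 bits
```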
Some basic results and quantities:
A code is a representation of information/data.
Coding theory (and/or coding methods) is the study of the properties of codes and their fitness for specific applications. These applications include Data transmission, Data compression, Cryptography, and Network information theory.
See Source-channel separation theorem
The main problem of study in data transmission theory is: for a particular Communication channel, find a code such that the data transmission rate is as high as possible, while the receiver receives the information with negligible probability of error.
The limit on the data transmission rate turns out to be the Channel capacity, as established by the Channel coding theorem.
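As an illustrative sketch, the standard textbook example is the binary symmetric channel, whose capacity is 1 minus the binary entropy of the crossover probability (the particular crossover value and function names here are my own):

```python
import math

def binary_entropy(p):
    # H_b(p) = -p log2 p - (1 - p) log2 (1 - p)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    # Capacity of a binary symmetric channel with crossover probability p:
    # C = 1 - H_b(p) bits per channel use.
    return 1 - binary_entropy(p)

print(bsc_capacity(0.11))  # about 0.5 bits per channel use
```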
Data transmission is part of the broader area of study called Communication theory, which includes consideration of the information source and destination.
Data compression is the study of the theoretical limits and implementation of codes that make the average description length of the value of a random variable as short as possible, whether losslessly or lossily.
The limit on the average length of codewords in a lossless code turns out to be the entropy, as established by the Source coding theorem.
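A small sketch of the bound for a dyadic source, where an optimal prefix code meets the entropy exactly (the distribution and code are my own toy example):

```python
import math

# An optimal prefix code assigns each symbol a codeword of length -log2 p,
# so for this dyadic source the average codeword length equals the entropy.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = {"a": "0", "b": "10", "c": "110", "d": "111"}  # a prefix-free code

entropy = -sum(p * math.log2(p) for p in probs.values())
avg_len = sum(p * len(code[s]) for s, p in probs.items())

print(entropy, avg_len)  # both equal 1.75 bits per symbol
```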
Limits for lossy codes are established in Rate distortion theory.
Kolmogorov complexity: the length of the shortest program that will produce the desired output on a Turing machine. Related to Occam's razor.
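Kolmogorov complexity itself is uncomputable, but compressed size gives a crude, computable stand-in for the same intuition; a sketch using zlib (the data here is my own example, not from the source):

```python
import os
import zlib

# A string generated by a tiny program should have low complexity; a random
# string of the same length should have no short description.
regular = b"01" * 1000      # output of a tiny program: low complexity
random_ = os.urandom(2000)  # incompressible with high probability

print(len(zlib.compress(regular)))  # a few dozen bytes
print(len(zlib.compress(random_)))  # roughly 2000 bytes, no real compression
```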
Shannon - A Mathematical Theory of Communication
General theory of information transfer: Updated
Storing and Transmitting Data: Rudolf Ahlswede’s Lectures on Information ...
Information Theory, Combinatorics, and Search Theory
Theory of ordering (see Entropy reduction)
Back from infinity: a constrained resources approach to information theory