Information theory is the study of information, which is formalized via entropy. Information is, roughly, the minimal number of yes/no questions needed to specify the state of a system (more precisely, the minimal expected number of such questions). Mathematically, information, or entropy, is a property of a Probability distribution. Physically, it is a property of a physical system, when that system is modelled with a probability distribution. In the simplest case the probability distribution is uniform, and the entropy is just the logarithm of the number of possible states of the system! This is intuitive: a system holds lots of information if it can be in many possible states.
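A minimal sketch of the uniform case (the function name and numbers here are my own, not from the source):

```python
import math

# For a uniform distribution over n equally likely states, each yes/no
# question can at best halve the remaining possibilities, so pinning down
# one of 8 states takes log2(8) = 3 questions.
def uniform_entropy(n):
    return math.log2(n)

print(uniform_entropy(8))  # 3.0 bits
```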
The philosophy of information gets tricky simply because the philosophy of probability gets tricky, as one is defined in terms of the other. E.g. whether information is subjective or objective is really a question of whether probability is subjective or objective.
Information Theory, Information Theory (CUHK)
Entropy is the number of yes/no questions you expect to need in order to identify the state of the world, under a Model of the world (a Probability distribution over states of the world). I.e. how ignorant I think I am about the world.
If you then for some reason update your model of the world, your expectations change. Because of this, the expected number of yes/no questions using the previously optimal questioning scheme can change. The new number, called Cross entropy, represents how ignorant you now think you *were* about the world.
Relative entropy (aka KL divergence) is the difference between how ignorant I now think I *was* before the update (the Cross entropy) and how ignorant I think I am *now* (the new value of entropy). I.e. how much less ignorant about the world I think I have become after the update: how much information I think I have learned.
I am calling -logP(x) "ignorance". It's more typically called "Surprise".
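A small numerical sketch of these three quantities; the toy distributions are my own example, not from the source:

```python
import math

# Toy distributions over the same four states.
P = [0.5, 0.25, 0.125, 0.125]   # model after the update
Q = [0.25, 0.25, 0.25, 0.25]    # model before the update

def entropy(p):
    # Expected surprise -log2 p(x) under p itself: how ignorant I think I am.
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy(p, q):
    # Expected surprise of the old scheme q, weighted by the new model p:
    # how ignorant I now think I *was*.
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

def kl_divergence(p, q):
    # Relative entropy D(P || Q) = cross entropy minus entropy:
    # how much information I think the update gave me.
    return cross_entropy(p, q) - entropy(p)

print(entropy(P))           # 1.75 bits
print(cross_entropy(P, Q))  # 2.0 bits
print(kl_divergence(P, Q))  # 0.25 bits
```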
Some basic results and quantities:
A code is a representation of information/data.
Coding theory (and/or coding methods) is the study of the properties of codes and their fitness for specific applications. These applications include Data transmission, Data compression, Cryptography, and Network information theory.
See Source-channel separation theorem
The main problem of study in data transmission theory is: for a particular Communication channel, find a code such that the data transmission rate is as high as possible, while the receiver receives the information with negligible probability of error.
The limit on the data transmission rate turns out to be the Channel capacity, as established by the Channel coding theorem.
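As an illustrative sketch, the standard textbook example is the binary symmetric channel, whose capacity is 1 minus the binary entropy of the crossover probability (the particular crossover value and function names here are my own):

```python
import math

def binary_entropy(p):
    # H_b(p) = -p log2 p - (1 - p) log2 (1 - p)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    # Capacity of a binary symmetric channel with crossover probability p:
    # C = 1 - H_b(p) bits per channel use.
    return 1 - binary_entropy(p)

print(bsc_capacity(0.11))  # about 0.5 bits per channel use
```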
Data transmission is part of the broader area of study called Communication theory, which includes consideration of the information source and destination.
Data compression is the study of the theoretical limits and implementation of codes that make the average description length of the value of a random variable as short as possible, whether losslessly or lossily.
The limit on the average length of codewords in a lossless code turns out to be the entropy, as established by the Source coding theorem.
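A small sketch of the bound for a dyadic source, where an optimal prefix code meets the entropy exactly (the distribution and code are my own toy example):

```python
import math

# An optimal prefix code assigns each symbol a codeword of length -log2 p,
# so for this dyadic source the average codeword length equals the entropy.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = {"a": "0", "b": "10", "c": "110", "d": "111"}  # a prefix-free code

entropy = -sum(p * math.log2(p) for p in probs.values())
avg_len = sum(p * len(code[s]) for s, p in probs.items())

print(entropy, avg_len)  # both equal 1.75 bits per symbol
```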
Limits for lossy codes are established in Rate distortion theory.
Kolmogorov complexity: the length of the shortest program that will produce the desired output on a Turing machine. Related to Occam's razor.
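Kolmogorov complexity itself is uncomputable, but compressed size gives a crude, computable stand-in for the same intuition; a sketch using zlib (the data here is my own example, not from the source):

```python
import os
import zlib

# A string generated by a tiny program should have low complexity; a random
# string of the same length should have no short description.
regular = b"01" * 1000      # output of a tiny program: low complexity
random_ = os.urandom(2000)  # incompressible with high probability

print(len(zlib.compress(regular)))  # a few dozen bytes
print(len(zlib.compress(random_)))  # roughly 2000 bytes, no real compression
```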
Shannon - A Mathematical Theory of Communication
General theory of information transfer: Updated
Storing and Transmitting Data: Rudolf Ahlswede’s Lectures on Information ...
Information Theory, Combinatorics, and Search Theory
Theory of ordering (see Entropy reduction)
Back from infinity: a constrained resources approach to information theory