Cause → Effect
Bayesian networks – Causal inference
Causality refers to structural properties (e.g. Independence relations) of a Bayesian network modeling a system, one which includes the variables relevant to any possible experiment involving that system, especially experiments involving intervention. The issue is that one can't in general infer the whole Bayesian network from a single experiment, or even from many, unless one makes assumptions. A common one is the "effective Free will" assumption: the choice of the experimenter is independent of any relevant variables, which rules out superdeterministic explanations. This assumption is often not quite true, and to correct for that we use methods such as double-blinding in trials, etc.
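A minimal sketch of why intervention matters here, under assumptions not in the text: a hypothetical system with a hidden confounder `U` that influences both the experimenter's "choice" `X` and the outcome `Y`. Conditioning on `X` mixes the causal effect with the confounding path, while intervening (setting `X` independently of `U`, as the "effective Free will" assumption requires) recovers the effect of `X` alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def simulate(intervene=None):
    u = rng.random(n) < 0.5                            # hidden confounder
    if intervene is None:
        x = rng.random(n) < np.where(u, 0.8, 0.2)      # observational: X depends on U
    else:
        x = np.full(n, intervene)                      # do(X = intervene): X set independently of U
    y = rng.random(n) < (0.3 + 0.2 * x + 0.4 * u)      # Y depends on both X and U
    return x, y

# Observational contrast P(Y=1 | X=1) - P(Y=1 | X=0) is biased by U (~0.44 here).
x, y = simulate()
obs_diff = y[x].mean() - y[~x].mean()

# Interventional contrast P(Y=1 | do(X=1)) - P(Y=1 | do(X=0)) isolates X's effect (~0.2).
int_diff = simulate(intervene=True)[1].mean() - simulate(intervene=False)[1].mean()

print(f"observational: {obs_diff:.2f}, interventional: {int_diff:.2f}")
```

Randomizing the treatment, as in a randomized controlled trial, is one way of enforcing exactly this independence between the choice and the confounders.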
Causality: Models, Reasoning, and Inference
http://bayes.cs.ucla.edu/PRIMER/
[Colloquium]: Causal inference and Kolmogorov complexity
Randomized controlled trials are used to infer causal relations.
When the Map Is Better Than the Territory uses Mutual information as a measure of causal influence (he talks about causal Emergence). He finds that the mutual information at higher scales can be higher than at lower scales, which may sound counterintuitive. Mutual information here measures how much information an intervention gives you about the future state of a system, i.e. how much knowing which intervention was performed determines the future of the system, versus not knowing it. Crucially, though, it is defined as the average of this quantity over interventions, and in particular he takes the average under a uniform distribution over interventions.
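A minimal sketch of this averaged quantity, assuming the system is a finite Markov chain given by a transition matrix `tpm` (rows: current state, columns: next-state probabilities). With a uniform distribution over interventions, the mutual information between the intervened state and the next state reduces to the average KL divergence of each row from the average effect distribution; the name `effective_information` is my label, not necessarily the paper's notation.

```python
import numpy as np

def effective_information(tpm: np.ndarray) -> float:
    """Mutual information (bits) between a uniform distribution over
    interventions (rows of `tpm`) and the induced distribution over effects."""
    tpm = np.asarray(tpm, dtype=float)
    effect = tpm.mean(axis=0)  # effect distribution under uniform interventions
    with np.errstate(divide="ignore", invalid="ignore"):
        kl_terms = np.where(tpm > 0, tpm * np.log2(tpm / effect), 0.0)
    # average over interventions of KL(row || effect)
    return kl_terms.sum(axis=1).mean()
```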
So, if we are talking about intervening at a small scale (the "territory"), by his definition we consider all possible interventions with equal probability (note that an intervention is defined as setting the system to a particular state before letting it evolve). If most of these interventions don't determine the future much, then on average we won't determine the future much.
What does intervening at a higher scale (the "map") mean? It means, for example, intervening on a set of macrostates, each of which corresponds to a set of microstates. Crucially, a uniform distribution over macrostates can induce a non-uniform distribution over microstates. In particular, it can put much more weight on those microstates that effectively determine the future, so this macro intervention will have higher causal influence.
This doesn't mean it's impossible to have as much causal influence at the micro scale; we could just choose a suitably non-uniform distribution there. But it means that, in some sense, the macroscale gives us that for free, making the job much easier, and it's thus better in that respect.
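Continuing the sketch above (and reusing `effective_information`): a toy 4-state system in which three microstates shuffle uniformly among themselves and a fourth maps to itself. Grouping the first three into one macrostate gives a deterministic 2-state "map"; a uniform intervention over macrostates then puts non-uniform weight on the microstates and yields a higher effective information. The construction is illustrative, not taken verbatim from the paper.

```python
import numpy as np

micro_tpm = np.array([
    [1/3, 1/3, 1/3, 0.0],   # states 0-2 move uniformly among {0, 1, 2}
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 0.0, 0.0, 1.0],   # state 3 is a fixed point
])

# Macro scale: macrostate A = {0, 1, 2}, macrostate B = {3}.
# Intervene uniformly within each group, then aggregate the columns.
groups = [[0, 1, 2], [3]]
macro_tpm = np.array([
    [micro_tpm[g].mean(axis=0)[h].sum() for h in groups]
    for g in groups
])

print("micro EI:", effective_information(micro_tpm))   # ~0.81 bits
print("macro EI:", effective_information(macro_tpm))   # 1.0 bit
```

The same uniform-over-interventions definition is applied at both scales; the gain comes entirely from the coarse-graining, which is the "for free" point above.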
He makes some connections to Channel capacity, which I haven't looked at in much detail.
Some philosophical points at the end seem to stretch the implications a bit too much, but I do see this possibly giving insight into questions of emergence.