See Simplicity bias in finite state transducers
On the second question, there is actually a stumbling block due to the random FST ensemble I'm using, which consists only of accessible FSTs (of a given size). Accessible means that every state can be reached from the initial state, so there are no 'useless' states. This is in contrast to random unrestricted FSTs, where each of the K_i n out-stubs is connected to a state, independently and uniformly at random. Answering probabilistic questions for the latter is much easier than for accessible FSTs (see attached or http://bit.ly/290fHji). I guess we could simulate random unrestricted FSTs, though I think accessible FSTs are the more interesting ensemble, because you fix the actual number of states in the automaton. Anyway, there may still be some things to say here, because in the article I attach the author finds a way of relating statistics of unrestricted automata to those of accessible automata, but only asymptotically, and with inequalities. There may be other approaches via analytic combinatorics, but they are potentially quite hard.
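In case simulating the two ensembles is useful, here is a minimal Python sketch (the representation and function names are my own choices, not anything from the attached article): draw an unrestricted random FST by wiring each out-stub uniformly at random, and sample the accessible ensemble by rejection.

import random

# fst[state][input_bit] = (next_state, output_bit); states are 0..n-1,
# with state 0 the initial state.
def random_unrestricted_fst(n, k_in=2, k_out=2):
    """Wire each of the k_in*n out-stubs to a uniformly random state,
    with an independent uniformly random output symbol."""
    return [[(random.randrange(n), random.randrange(k_out))
             for _ in range(k_in)]
            for _ in range(n)]

def is_accessible(fst, start=0):
    """True iff every state is reachable from the initial state."""
    seen, stack = {start}, [start]
    while stack:
        for nxt, _ in fst[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(fst)

def random_accessible_fst(n, **kw):
    """Sample the accessible ensemble by rejection from the unrestricted one
    (wasteful for large n, but fine for small simulations)."""
    while True:
        fst = random_unrestricted_fst(n, **kw)
        if is_accessible(fst):
            return fst

Since conditioning a uniform unrestricted FST on accessibility gives a uniform accessible FST, rejection like this lets one compare the two ensembles empirically, at least for small n.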
Regarding the first question, I've been refining my ideas about loops of 'non-coding states' (states whose output symbols are all equal). In particular, looking at the experimental results, I've noticed that bias is associated with 'absorbing regions' that contain at least one non-coding state (approximately absorbing regions also show some bias). An absorbing region is a set of states which you can reach, but which, once entered, you can't leave. Now, I've found two main factors determining the frequency/neutral-set-size/designability (call this NSS) of an output of an FST that contains such a region:
Now, I've found that the NSS is multiplied by 2^(a*m), where m is the number of output bits produced from within the absorbing region and a depends on the structure of the absorbing region (in an interesting combinatorial way). So the NSS is proportional to 2^(a*m). The proportionality constant will depend on the particular string, and on the number of non-coding states it passes through outside the absorbing region (this requires more attention).
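To make 'absorbing region' and NSS concrete, here is a brute-force sketch in the same toy representation as above (naive repeated reachability searches and full enumeration of inputs, so only feasible for small machines and short strings; it's not the combinatorial argument itself, just a way to check it numerically).

from collections import Counter
from itertools import product

def reachable(fst, s):
    """All states reachable from state s (including s itself)."""
    seen, stack = {s}, [s]
    while stack:
        for nxt, _ in fst[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def absorbing_regions(fst, start=0):
    """Sets of states that can be reached from the start but, once entered,
    can never be left (the reachable closed strongly connected components)."""
    regions = []
    for s in reachable(fst, start):
        r = reachable(fst, s)
        # s lies in a closed region iff everything it can reach can reach it back
        if all(s in reachable(fst, t) for t in r) and r not in regions:
            regions.append(r)
    return regions

def neutral_set_sizes(fst, L, start=0):
    """Brute-force NSS: run every length-L binary input through the FST and
    count how many inputs produce each output string."""
    counts = Counter()
    for bits in product((0, 1), repeat=L):
        state, out = start, []
        for b in bits:
            state, o = fst[state][b]
            out.append(o)
        counts[tuple(out)] += 1
    return counts  # frequency of an output is counts[output] / 2**L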
Now, if the m output bits from the absorbing region are composed of a repeating pattern (often the case, though I can think of exceptions), the Lempel-Ziv complexity satisfies C <= n - m + const., where n is the total number of bits in the output and const. is the number of bits in the repeating pattern.
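For concreteness, here is one common way to estimate Lempel-Ziv complexity (the 1976 phrase-counting measure, computed with the standard Kaspar-Schuster scheme). It returns a phrase count rather than a bit count, so whether it is exactly the C in the inequality above depends on normalisation (e.g. rescaling by roughly log2(n)); I'm glossing over that here.

def lz76_complexity(s):
    """Number of phrases in the Lempel-Ziv (1976) parsing of sequence s,
    computed with the Kaspar-Schuster scheme.  s can be a string or a
    tuple/list of symbols."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:          # no earlier match extends: start a new phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

A string that ends in a long run of a short repeating pattern parses into only a few extra phrases beyond its prefix, which is the informal content of C <= n - m + const. above.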
Under these assumptions, one can see that the frequency of an output obeys P = NSS/2^n <= 2^(-a*C + b): rearranging the complexity bound gives m <= n - C + const., so NSS is at most the proportionality constant times 2^(a*(n - C + const.)); dividing by 2^n and, for outputs of a fixed length n, lumping everything that doesn't depend on C into b gives the bound.
The Fibonacci GP map described in the paper on constrained/unconstrained parts is actually an example of the simplest kind of FST with the properties above: it can be implemented as a 3-state FST, with an absorbing region consisting of a single non-coding state and no non-coding states anywhere else (see the sketch at the end for my guess at what such a machine looks like). Thus the arguments above work very cleanly. Unfortunately, general FSTs can show more complicated things, like:
All of these complicate the picture, and should be taken into account more fully to improve the argument above. In any case, it makes sense that the argument above can't be exact (except for simple cases like the Fibonacci GP map), because most FSTs show a complexity/frequency plot which is not perfectly clean, but has some noise.
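For concreteness, here is my guess at a minimal 3-state machine of the kind described above (my own reconstruction, not the construction from the paper): input bits are copied to the output until the two-bit pattern 11 is read, after which the machine sits in a single absorbing non-coding state and outputs 0 forever.

# fib_like_fst[state][input_bit] = (next_state, output_bit)
fib_like_fst = [
    [(0, 0), (1, 1)],   # state 0: copying, no pending 1
    [(0, 0), (2, 1)],   # state 1: just copied a 1; a second 1 ends the coding part
    [(2, 0), (2, 0)],   # state 2: absorbing, non-coding (constant output 0)
]

For this particular machine the last m input bits (those read after entering state 2) are ignored, so an output that enters the absorbing region with m bits still to go should have NSS = 2^m exactly, i.e. a = 1; feeding it to neutral_set_sizes() and lz76_complexity() from the sketches above is one way to check the scaling and the complexity bound on a case where the argument should be clean.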
Hope that wasn't too long. I think also that all this will be easier to understand with pictures...