Note that this is calculated with zlib complexity.
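For reference, here is a toy Python version of what I mean by zlib complexity: the compressed length of a string as a crude complexity proxy. This is just a sketch of the idea, not the actual scoring code.

```python
import random
import zlib

def zlib_complexity(s):
    """Rough complexity proxy: length in bytes of the zlib-compressed string."""
    return len(zlib.compress(s.encode("utf-8"), 9))

if __name__ == "__main__":
    regular = "01" * 500                                       # highly regular, compresses well
    noisy = "".join(random.choice("01") for _ in range(1000))  # incompressible-looking
    print(zlib_complexity(regular), zlib_complexity(noisy))    # regular << noisy
```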
Average bias over 100 samples: 0.74; that is, on average 74% of the outputs account for most of the inputs.
I've also got the code working. Due to the way the libraries I'm using work, it has to be done in 5 steps: (1) generating the FST files, (2) converting them, (3) running the FSTs on random inputs, (4) counting the number of inputs per output, and (5) computing the complexities of the outputs. I'm going to write a bash script that calls these in the right order. I'm also using the (modified) Lempel-Ziv complexity measure that you use, which Chico gave me. At the moment, the random generation of FSTs is done in Python; I think this is fine, as the bulk of the computation is in the "running" and complexity steps, which are C++. However, I found a C++ library that can randomly generate automata (http://regal.univ-mlv.fr/); I haven't yet managed to make it work, but if we do, it might be better to use that one.
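To make the generation step concrete, here's a toy Python sketch of the idea (not the actual code, which writes proper FST files for the library): a random deterministic transducer over a binary alphabet is just a random (next state, output symbol) pair for every (state, input symbol) pair, plus a function to run it on an input string.

```python
import random

def random_transducer(n_states=5, alphabet="01", seed=None):
    """Generate a random deterministic (Mealy-style) transducer.

    Returns a dict mapping (state, input_symbol) -> (next_state, output_symbol);
    state 0 is taken as the initial state.
    """
    rng = random.Random(seed)
    return {
        (q, a): (rng.randrange(n_states), rng.choice(alphabet))
        for q in range(n_states)
        for a in alphabet
    }

def run_transducer(transducer, input_string):
    """Run the transducer on an input string, starting in state 0, and return the output."""
    state, out = 0, []
    for symbol in input_string:
        state, emitted = transducer[(state, symbol)]
        out.append(emitted)
    return "".join(out)

if __name__ == "__main__":
    t = random_transducer(seed=0)
    print(run_transducer(t, "0110100110"))
```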
From preliminary runs, I have indeed found the C++ to be much faster: I could quite quickly run 10^6 input strings on 50 random 5-state transducers. Of those, 11-13 showed clear simplicity bias, while the rest showed a much smaller bias. This was actually using some Python code that has since been ported to C++, so it should now work even better.
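For reference, a toy Python version of the "running" and counting steps, reusing the random_transducer/run_transducer helpers sketched above (the real pipeline does this part in C++, and the exact bias statistic may differ from what this prints):

```python
import random
from collections import Counter

def sample_output_counts(transducer, input_length=20, n_samples=100_000, seed=1):
    """Tally how many random binary input strings map to each output string."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        x = "".join(rng.choice("01") for _ in range(input_length))
        counts[run_transducer(transducer, x)] += 1
    return counts

if __name__ == "__main__":
    t = random_transducer(seed=0)
    counts = sample_output_counts(t)
    # A strong simplicity bias shows up as a few (typically simple) outputs
    # soaking up most of the inputs.
    for output, c in counts.most_common(5):
        print(c, output)
```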
Other statistics and complexity measures that we were talking about are yet to be implemented.