Discussion on finite state complexity and GP map bias

guillefix 4th November 2016 at 2:43pm

See MMathPhys oral presentation

~*Different definition of finite-state complexity here: http://web.mit.edu/cocosci/Papers/complex.pdf (though still not the one we need below)

Using finite-state complexity we can define the complexity of a string produced by a finite transducer. It is effectively the length of the smallest program that describes both a transducer (according to some encoding) and the string itself. This definition is not universal as for Turing machines, because of the non-universality of finite transducers.

This is not what I want, because they don't fix the transducer, while a given GPM would fix the transducer. Assuming we can use the same idea as above, if the length of the input is much larger than the length of the transducer, we are effectively inputing random fixed-length strings to a Turing machine (that we know halt hm..), and by Levins coding theorem (applied here to not asymptotic case..) we expect that "strings with many long descriptions to have a short description too". Furthermore, if we assume that the map is many to one, then each strings would have many long descriptions, so each will have a short description. But if there are many such strings, not all of them can have short descriptions. Thus, the only consistent situation is for a few strings with simple descriptions having many long descriptions too, and a many strings with few long descriptions.

This assumed that the finite transducer is simple (defined by the condition above that the input string to the finite state transducer (FST) be much larger than the FST description). If it isn't, the bias argument above still holds I think, but because the transducer is complex, it's outputs will all be complex, with a complexity dominated by the transducer's.

It seems like the Levin's coding theorem holds for strings of all inputs, which means it works for argument above! However, I don't understand it fully, in particular it's derivation, so I'm not too confident on this. See this book

SEE EMAIL CONVERSATION FOR FOLLOW UP ON THIS. Reasoning above doesn't hold. Kamal's answer:

I read the three papers - thanks for those.

Shallit and Wang (2001) was not super interesting, though obviously relevant in the sense that focus on computable complexities.

Calude (2011) is more interesting. The most interesting result being that a kind of Invariance Theorem holds for finite state transducers (which are weak/weakest type of computation, UTMs being the most powerful because they can compute any algorithm). The Invariance theorm in AIT comes from the fact that any UTM can simulate any other UTM, while their Inv Thm for finite state machines does not invoke this property. Assuming prefix free descriptions of the transducers, this implies a kind of coding theorem for finite state transducers. This is nice because finite state transducers do not have the mystical air that UTMs do (uncomputable). I think it is worth citing this Calude article as a comment, but maybe not making too much of a deal about it.

I also looked Guillermo’s link – just a comment on some reasonsing in there (I know it is just notes):

Furthermore, if we assume that the map is many to one, then each strings would have many long descriptions, so each will have a short [shorter] description. But if there are many such strings, not all of them can have short descriptions[true, but they can all have shorter descriptions]. Thus, the only consistent situation is for a few strings with simple descriptions having many long descriptions too, and a many strings with few long descriptions[hence bias in the map].

The reasoning here is a little rushed – if the map is many to one, then all outputs strings have shorter descriptions. But this does not explain why some outputs have short descriptions and some long (which leads to bias). The central thing to explain in bias is why some outputs will have shorter descriptions than others. The statement "strings with many long descriptions to have a short description too"

does not say anything about how long these short descriptions are, whereas the argument presented assumes that these are short enough to be a problem, in the sense of "not all of them can have short descriptions".

As a trivial but illustrative example, consider the many-to-one map from binary string of length 10 to binary strings of length 5. We can easily construct a uniform distribution for this system. According to the argument above, this system should show bias….but it does not (even though the map is simple).