Simplicity bias in finite-state transducers

cosmos 22nd August 2018 at 4:12pm
Simplicity bias

An example of Simplicity bias in discrete systems. See also this gingko tree

See Redundancy calculations in papers in Lempel-Ziv complexity

See Random automata and Evolving automata

Universal prediction

Numerical experiments on the simplicity bias in finite-state transducers

On the theory/analysis side, I've been thinking about two questions:

  • Understanding the simplicity bias graph, for a particular FST, given its structure.
  • Understanding the statistical/average properties of random FSTs.

Ideas for understanding the simplicity bias in finite state transducers

To have sufficiently high bias, we need a small non-coding loop.

To have varied output, we need loops outside the non-coding regions. This is so that the time spent in non-coding regions can vary for different outputs.
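These two ingredients can be illustrated with a toy two-state transducer (my own construction, not one of the FSTs in these notes): one state is non-coding (it emits `0` whatever the input bit), and a loop through the other state produces varied output. Enumerating all inputs of a fixed length shows the resulting bias toward the simplest output:

```python
from collections import Counter
from itertools import product

# Toy two-state FST (illustrative construction, not an FST from these notes).
# State 'A' is non-coding: it emits '0' regardless of the input bit.
# State 'B' copies its input bit, so the A<->B loop gives varied output.
#   transitions[state][bit] = (next_state, output_symbol)
transitions = {
    'A': {'0': ('A', '0'), '1': ('B', '0')},
    'B': {'0': ('A', '0'), '1': ('A', '1')},
}

def run(bits, state='A'):
    out = []
    for b in bits:
        state, sym = transitions[state][b]
        out.append(sym)
    return ''.join(out)

n = 12
counts = Counter(run(''.join(bits)) for bits in product('01', repeat=n))

# The all-zeros output (the simplest one) is by far the most frequent:
# every input with no '11' substring maps to it (377 of the 4096 inputs).
print(counts['0' * n], max(counts.values()))
```

The non-coding state concentrates probability on the constant output, while the loop through `B` is what lets other outputs appear at all, matching the two requirements above.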

The slope of the designability/complexity plot corresponds approximately to the Topological entropy of the non-coding region, computed using the Determinant of a graph. However, there's also a factor due to the conversion between KC complexity and the number of bits spent in the non-coding region. For the first FST below, for instance, by computing KC for strings like 10000000000000000... and 10011110110000000..., I found that $KC \propto 2.7 m$. Then from the topological entropy, which is $\frac{\log_2 3}{2}$, we find $a \approx (\log_2 3 / 2)/2.7 \approx 0.29$, which is consistent with what I found from the graph, approximately $(\log_2 100)/23 \approx 0.29$.
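As a quick numerical check on the entropy side, the topological entropy of a directed graph can be computed as $\log_2$ of the spectral radius of its adjacency matrix. A minimal sketch (the two-state graph below is purely illustrative, chosen so that its entropy comes out to $\log_2 3 / 2$; it is not the non-coding region of the FST discussed here):

```python
import numpy as np

# Hypothetical adjacency matrix of a non-coding region: state 0 has three
# parallel transitions to state 1, and state 1 has one transition back.
A = np.array([[0, 3],
              [1, 0]])

# Topological entropy = log2 of the spectral radius (largest |eigenvalue|).
rho = max(abs(np.linalg.eigvals(A)))
h = np.log2(rho)
print(h)   # log2(sqrt(3)) = log2(3)/2 ~ 0.792
```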

$LZ < \frac{n-m}{\log_2(n-m)} + c$

$KC \approx LZ \log_2 n < \frac{n-m}{\log_2(n-m)} \log_2 n + c'$

$P \approx 2^{am+b}$

$\log_2 P \approx am + b$

$m \approx \frac{\log_2 P - b}{a}$

$KC < \frac{n - \frac{\log_2 P - b}{a}}{\log_2\left(n - \frac{\log_2 P - b}{a}\right)} \log_2 n + c'$
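The chain above can be put into code: a sketch that inverts $P \approx 2^{am+b}$ to recover $m$ and plugs it into the LZ-based bound. The constants `a`, `b`, `c` and the example values below are placeholders, not fitted values from these notes:

```python
import math

def kc_upper_bound(P, n, a, b, c=0.0):
    """Upper bound on KC as a function of output frequency P.

    Inverts P ~ 2^(a*m + b) to recover m, the bits spent in the
    non-coding region, then applies
    KC < (n - m)/log2(n - m) * log2(n) + c'.
    """
    m = (math.log2(P) - b) / a
    return (n - m) / math.log2(n - m) * math.log2(n) + c

# Example with placeholder constants: n = 40 output bits, a = 0.79, b = 0.
print(kc_upper_bound(P=1e8, n=40, a=0.79, b=0.0))  # roughly 12.7
```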

Now, this $P$ refers to the frequency, which is between $10^{12} \times 10^{-5} \approx 10^7$ and $10^{12} \times 10^{-3} \approx 10^9$ (since $2^{40} \approx 10^{12}$).

$\log_2(40)/\log_2\left(40 - \log_2(10^n)/0.79\right)$

If we average this quantity for $n = 7, 8, 9$, we get $(4.8 + 2 + 1.55)/3 \approx 2.8$, which is close to the $2.7$ found from the estimates above.
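This averaging step can be reproduced directly; a sketch, assuming the quantity above is evaluated at frequencies $P = 10^k$ for $k = 7, 8, 9$ with the slope constant $0.79$ (here `n_bits` is the 40 output bits, and `k` plays the role of the $n$ in the exponent):

```python
import math

# Reproduce the average of log2(40)/log2(40 - log2(10^k)/0.79), where 10^k
# is the output frequency P and 40 is the number of output bits.
def factor(k, n_bits=40, a=0.79):
    m = math.log2(10 ** k) / a          # bits spent in the non-coding region
    return math.log2(n_bits) / math.log2(n_bits - m)

vals = [factor(k) for k in (7, 8, 9)]
print(vals, sum(vals) / 3)   # ~[1.56, 1.99, 4.80], average ~2.8
```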

See this paper about maximum LZ complexity, which goes like $l/\log l$, where $l$ is the length of the string.

See here for the Desmos graph.


From Data compression literature

On point-wise redundancy rate of Bender-Wolf's variant of SWLZ algorithm

On the pointwise redundancy of the LZ78 algorithm

Upper bounds on the probability of sequences emitted by finite-state sources and on the redundancy of the Lempel-Ziv algorithm

They give results similar to Simplicity bias/Coding theorem method, but for Finite-state sources.

Universal redundancy rates do not exist

Redundancy of the Lempel-Ziv codes


Examples of finite-state transducers and their simplicity bias


See related stuff in Descriptional complexity

Finite state channel

Information theory, Coding theory, Algorithmic information theory

Ergodic theory, Topological dynamics, Topological entropy

More resources in simplicity bias in FSTs