Algorithm to compute LZ complexity measure

guillefix 4th November 2016 at 2:43pm

Algorithm to compute LZ complexity measure

written in Python

def KC_LZ(string):
    n=len(string)
    s = '0'+string
    c=1
    l=1
    i=0
    k=1
    k_max=1
    stop=0

    while stop==0:
        if s[i+k] != s[l+k]:
            if k>k_max:
                k_max=k # k_max stores the length of the longest pattern in the LA that has been matched somewhere in the SB

            i=i+1 # we increase i while the bit doesn't match, looking for a previous occurence of a pattern. s[i+k] is scanning the "search buffer" (SB)

            if i==l: # we stop looking when i catches up with the first bit of the "look-ahead" (LA) part.
                c=c+1 # If we were actually compressing, we would add the new token here. here we just count recounstruction STEPs
                l=l+k_max # we move the beginning of the LA to the end of the newly matched pattern.

                if l+1>n: # if the LA surpasses length of string, then we stop.
                    stop=1

                else: #after STEP,
                    i=0 # we reset the searching index to beginning of SB (beginning of string)
                    k=1 # we reset pattern matching index. Note that we are actually matching against the first bit of the string, because we added an extra 0 above, so i+k is the first bit of the string.
                    k_max=1 # and we reset max lenght of matched pattern to k.
            else:
                k=1 #we've finished matching a pattern in the SB, and we reset the matched pattern length counter.

        else: # I increase k as long as the pattern matches, i.e. as long as s[l+k] bit string can be reconstructed by s[i+k] bit string. Note that the matched pattern can "run over" l because the pattern starts copying itself (see LZ 76 paper). This is just what happens when you apply the cloning tool on photoshop to a region where you've already cloned...
            k=k+1

            if l+k>n: # if we reach the end of the string while matching, we need to add that to the tokens, and stop.
                c=c+1
                stop=1



    # a la Lempel and Ziv (IEEE trans inf theory it-22, 75 (1976),
    # h(n)=c(n)/b(n) where c(n) is the kolmogorov complexity
    # and h(n) is a normalised measure of complexity.
    complexity=c;

    #b=n*1.0/np.log2(n)
    #complexity=c/b;