Sequence space

cosmos 22nd March 2019 at 2:38pm
Information theory

A sequence space refers to the Set of all sequences of symbols, of a given length, where the symbols belong to an alphabet (another Set), which may be endowed with some more structure.

More precisely, a sequence is the set of all functions from an index set II to the alphabet set AA. This is the same as the set ×iIAi\times_{i \in I} A_i , where ×\times denotes Cartesian product. The sequence space can be notated AIA^I.

Two common examples of infinite sequence spaces are ANA^{\mathbb{N}}, where the index set is the naturals, and AZA^{\mathbb{Z}}, where the index set are the integers. Members of this latter example are also called bi-infinite sequences.

See this video

Topology

As the sequence set is constructed as a Cartesian product, we can endow it with the Product topology. The alphabet set, if finite can be endowed with the Discrete topology

Under this topology one can show that a set FF is closed iff there is a tree TT (a set of finite sequences, or strings) such that F=TF=T^\infty, where TT^\infty is the set of all paths through TT. See here for details and proof.

But the set of paths through a tree is a union of Cylinder sets!. Therefore, cylinder sets are Clopen sets. As elements of the topology, cylinder sets are by definition open sets. The complement of an open set is a closed set, but the complement of a cylinder set is a union of cylinders, and so cylinder sets are also closed, and are thus clopen.

Measure on sequence spaces

Math 574, Lesson 1-5: Measures on Sequence Spaces Warning: He uses non-standard terminology in the video (he's swapped the definition of "open cylinder" and "cylinder set"!). We'll standard one in here.

Also note, for finite alphabets with discrete topolgy, we may as well consider "basic" versions of open cylinders and cylinder sets (where instead of a product of entire sets and open proper subsets, we have a product of entire sets and singleton subsets), as the non-basic versions are just given by finite unions or intersections of these. We use the basic or non-basic versions interchangeably here.

As the Basic open cylinders generate the Product topology, which in turn generates a Borel sigma-algebra on our space, then if we find the algebra generated by the open cylinders, this algebra will generate the Borel-sigma algebra, and by the Caratheodory extension theorem, by defining a Measure on the sets of this algebra, we define a unique measure on the Borel sigma-algebra.

In fact he shows that the set of finite unions of open cylinders themselves already form an algebra. This is because a finite intersection of open cylinders can be expressed as a finite union of another set of of open cylinders.

Then, it turns out that we can define a unique measure on this algebra if we define the measure on a special subset of all the open sets only, and thus we can define a unique measure on the Borel σ\sigma-algebra of the sequence space. These sets are of the form [σ][\sigma], where this is the set of all sequences that begin with string σ\sigma (these is the cylinder set given by substring σ\sigma). He seems to consider only (basic) cylinder sets given by initial substrings?? These give a subbase, so it works also? As in these generate the algebra.

In summary, the idea here is that if we define a measure on all the initial finite substrings, we can extend it to the whole infinite sequence space.

This measure is simply constructed by using the Measure additivity axiom of countable unions of disjoint (non-overlapping) sets (here applied to finite unions, as it is an algebra), and use some properties of open cylinders under intersections, which convert other arbitrary unions of open cylinders to unions of disjoint sets. This is proved from the property in the following lemma as well as the next lemma. This latter lemma uses μ([σ])=μ([σ0])+μ([σ1])\mu([\sigma]) = \mu([\sigma0])+\mu([\sigma1]), which of course follows from the additivity property of measures.

You also require some normalization condition, like μ([ϵ])=1\mu([\epsilon]) = 1 where ϵ\epsilon is the empty string, and thus [ϵ][\epsilon] is the set of all sequences that begin with the empty string, i.e. the full set.

One can also define a Semimeasure on the sequence space


See also Symbolic dynamics, Shift space, Entropy reduction. See book on permutation entropy.