Exact learning

cosmos 8th February 2017 at 3:58pm
Computational learning theory

Learn a DFA representing a target language LΣL \subseteq \Sigma^* using membership and equivalence queries to a teacher

  • Membership query. selects string and asks whether that string is part of the langauge
  • Equivalence query. learner selects hypothesis DFA H and asks does L(H)=LL(H)=L? If yes, the algo halts. If no, the teacher gives a counterexample.

We want to exactly learn L using # of queries polynomial in the size of a in DFA for L and the length of the longer coutnerexample given by the teacher.

Angulin's LL^*-algorithm

See also Evolving automata.

Target language LL. The learner maintans

  • A set QΣQ \subseteq \Sigma^* of access worlds, with ϵQ\epsilon \in Q.
  • A set of TΣT \subseteq \Sigma^* of test words.

We say that v,wΣv,w \in \Sigma^* are T-equivalent, deonted VTwV\equiv_T w if vuLwuLvu \in L \Leftrightarrow wu \in L for all uT u \in T, i.e. L(vu)=L(wu)L(vu)=L(wu).

(Q,T)(Q,T) is separable if no two distinct words v,wQv,w \in Q are T-equivalent.

(Q,T)(Q,T) is closed if for all qQq\in Q and all aΣa\in\Sigma, there exists some qQq' \in Q s.t. qaTqqa \equiv_T q'.

If (Q,T)(Q,T) are separable and closed then we define a DFA HH as follows;

  • The set of states of HH is QQ.
  • The initial state is ϵ\epsilon.
  • Transition function: Given qQq\in Q and aΣa \in \Sigma, q makes an a-transition to the unique qQq' \in Q s.t. qaTqqa \equiv_T q'.
  • The set of accepting states QLQ \cap L

Algorithm

Invariant: (Q,T(Q,T is separable

  1. Q:={ϵ}Q:=\{\epsilon\}, T:={ϵ}T:=\{\epsilon\}.
  2. If (Q,T)(Q,T) is not closed then expand QQ using membership queries until (Q,T)(Q,T) is closed and separable. Add the necessary qaqa.
  3. Compute hypothesis automaton and ask equivalence query.
  4. If yes, then halt.
  5. If no, then use counterexample to properly expand QQ and TT to again get separable (Q,T)(Q,T).

L={w{a,b}:wb3(mod4)}L=\{w\in \{a,b\}^* : |w|_b \equiv 3 (mod 4)\}

b|\cdot|_b means number of bbs.

(Q,T)(Q,T)HHCounterexample
(ϵ,ϵ)(\epsilon, \epsilon)bbb

...

Prop (for step 5): Suppose (Q,T)(Q,T) is sep. and closed; with H the hyp. automaton. Given a counter example wΣw \in \Sigma^* , using logw\log{|w|} membership queries, one can find qΣQq \in \Sigma^* \setminus Q and tΣt \in\Sigma^* s.t. (Q{q},T{t})(Q\cup \{q\}, T\cup \{t\}) is separable.

Let w=w1w2...wnw=w_1w_2...w_n and let q0,q1,...qnq_0 , q_1, ... q_n be the run of HH on ww. We say that q_i is correct if L(qiwi+1...wn)=L(w)L(q_i w_{i+1}...w_n)=L(w)....

Using membership queries and binary shearch, we find ii s.t. qi1q_{i-1} is correct and qiq_i is not correct, that is,

L(qi1wi...wn)L(qiwi+1...wn)L(q_{i-1}w_i...w_n)\neq L(q_iw_{i+1}...w_n)

Let Q=Q{qi1wi} Q'=Q\cup \{q_{i-1}w_i\}

T=T{wi+1...wn}T'=T\cup\{w_{i+1}...w_n\}.

Note that qiTqi1wiq_i\equiv_T q_{i-1}w_i. But qi\nequivTqi1wiq_i\nequiv_{T'} q_{i-1}w_i, so (Q,T)(Q',T') is separable.

Proving that the algorithm halts means that it is correct. We can show Q is bounded by the size of the minimum automaton for L