Algorithms · FiniteStateTransducers

Closure

FiniteStateTransducers.closure — Function

closure(fst; star=true)

Returns a new WFST cfst that is the closure of fst. If fst transuces labels to olabel with weight w, cfst will be able to transduce repeat(ilabels,n) to repeat(olabels,m) with weight w^n for any integer n.

If star is true, cfst will transduce the empty string to itself with weight one.

source

FiniteStateTransducers.closure! — Function

closure!(fst; star=true)

Same as closure but modifies fst in-place.

source

Composition

Base.:∘ — Function

∘(A::WFST,B::WFST; filter=EpsSeq, connect=true)

Perform composition of the transucers A and B.

julia> A = linearfst(["a","b","c"],["α","β","γ"],ones(3))
WFST #states: 4, #arcs: 3, #isym: 3, #osym: 3
|1/0.0f0|
a:α/1.0f0 → (2)
(2)
b:β/1.0f0 → (3)
(3)
c:γ/1.0f0 → (4)
((4/0.0f0))

julia> B = matrix2wfst(["α","β","γ"],[:w,:x,:z],ones(3,3))
WFST #states: 4, #arcs: 9, #isym: 3, #osym: 3
|1/0.0f0|
α:w/1.0f0 → (2)
β:x/1.0f0 → (2)
γ:z/1.0f0 → (2)
(2)
α:w/1.0f0 → (3)
β:x/1.0f0 → (3)
γ:z/1.0f0 → (3)
(3)
α:w/1.0f0 → (4)
β:x/1.0f0 → (4)
γ:z/1.0f0 → (4)
((4/0.0f0))

julia> A∘B
WFST #states: 4, #arcs: 3, #isym: 3, #osym: 3
|1/0.0f0|
a:w/2.0f0 → (2)
(2)
b:x/2.0f0 → (3)
(3)
c:z/2.0f0 → (4)
((4/0.0f0))

The keyword filter can specify the composition filter to be used, which makes it possible to handle epsilon-transitions. See Allauzen et al. "Filters for Efficient Composition of Weighted Finite-State Transducers".

If connect is set to true after completing the composition the connect algorithm is applied.

source

FiniteStateTransducers.Trivial — Type

Trivial

Simplest composition filter, can be used for epsilon-free WFSTs.

source

FiniteStateTransducers.EpsMatch — Type

EpsMatch

Can be used for WFSTs containing epsilon labels. Avoids redundant epsilon paths and giving priority to those that match epsilon labels.

julia> A = txt2fst("0 1 a a 3.0
       1 2 b <eps> 77.0
       2 3 c <eps> 33.0
       3 4 d d 44.0
       4 90.0
       ",Dict("a"=>1,"b"=>2,"c"=>3,"d"=>4,"e"=>5))
WFST #states: 5, #arcs: 4, #isym: 5, #osym: 5
|1/0.0f0|
a:a/3.0f0 → (2)
(2)
b:ϵ/77.0f0 → (3)
(3)
c:ϵ/33.0f0 → (4)
(4)
d:d/44.0f0 → (5)
((5/90.0f0))

julia> B = txt2fst("0 1 a d 15.0
       1 2 <eps> e 19.0
       2 3 d a 12.0
       3 55.0
       ",Dict("a"=>1,"b"=>2,"c"=>3,"d"=>4,"e"=>5))
WFST #states: 4, #arcs: 3, #isym: 5, #osym: 5
|1/0.0f0|
a:d/15.0f0 → (2)
(2)
ϵ:e/19.0f0 → (3)
(3)
d:a/12.0f0 → (4)
((4/55.0f0))

julia> C = ∘(A,B;filter=EpsMatch)
WFST #states: 5, #arcs: 4, #isym: 5, #osym: 5
|1/0.0f0|
a:d/18.0f0 → (2)
(2)
b:e/96.0f0 → (3)
(3)
c:ϵ/33.0f0 → (4)
(4)
d:a/56.0f0 → 5)
((5/145.0f0))

source

FiniteStateTransducers.EpsSeq — Type

EpsSeq

Can be used for WFSTs containing epsilon labels. Avoids redundant epsilon paths. Gives priority to epsilon paths consisting of epsilon-arcs in A followed by epsilon-arcs in B.

julia> A = txt2fst("0 1 a a 3.0
       1 2 b <eps> 77.0
       2 3 c <eps> 33.0
       3 4 d d 44.0
       4 90.0
       ",Dict("a"=>1,"b"=>2,"c"=>3,"d"=>4,"e"=>5))
WFST #states: 5, #arcs: 4, #isym: 5, #osym: 5
|1/0.0f0|
a:a/3.0f0 → (2)
(2)
b:ϵ/77.0f0 → (3)
(3)
c:ϵ/33.0f0 → (4)
(4)
d:d/44.0f0 → (5)
((5/90.0f0))

julia> B = txt2fst("0 1 a d 15.0
       1 2 <eps> e 19.0
       2 3 d a 12.0
       3 55.0
       ",Dict("a"=>1,"b"=>2,"c"=>3,"d"=>4,"e"=>5))
WFST #states: 4, #arcs: 3, #isym: 5, #osym: 5
|1/0.0f0|
a:d/15.0f0 → (2)
(2)
ϵ:e/19.0f0 → (3)
(3)
d:a/12.0f0 → (4)
((4/55.0f0))

julia> C = ∘(A,B;filter=EpsSeq)
WFST #states: 6, #arcs: 5, #isym: 5, #osym: 5
|1/0.0f0|
a:d/18.0f0 → (2)
(2)
b:ϵ/77.0f0 → (3)
(3)
c:ϵ/33.0f0 → (4)
(4)
ϵ:e/19.0f0 → (5)
(5)
d:a/56.0f0 → (6)
((6/145.0f0))

source

Concatenation

Base.:* — Function

*(A::WFST,B::WFST)

Returns C, the concatenation/product of A and B. If A converts the sequence a_i to a_o with weight wa and B converts the sequence b_i to b_o with weight wb then Cconverts the sequence[ai,bi]to[ao,bo]with weightwa*wb`.

julia> sym = ["a","b","c","d","α","β","γ","δ"];

julia> A = WFST(sym);

julia> add_arc!(A,1=>2,"a"=>"α");

julia> add_arc!(A,2=>3,"b"=>"β");

julia> initial!(A,1); final!(A,3,5)
WFST #states: 3, #arcs: 2, #isym: 8, #osym: 8
|1/0.0f0|
a:α/0.0f0 → (2)
(2)
b:β/0.0f0 → (3)
((3/5.0f0))

julia> B = WFST(sym);

julia> add_arc!(B,1=>2,"c"=>"γ");

julia> add_arc!(B,2=>3,"d"=>"δ");

julia> initial!(B,1); final!(B,3,2)
WFST #states: 3, #arcs: 2, #isym: 8, #osym: 8
|1/0.0f0|
c:γ/0.0f0 → (2)
(2)
d:δ/0.0f0 → (3)
((3/2.0f0))

julia> C = A*B
WFST #states: 6, #arcs: 5, #isym: 8, #osym: 8
|1/0.0f0|
a:α/0.0f0 → (2)
(2)
b:β/0.0f0 → (3)
(3)
ϵ:ϵ/5.0f0 → (4)
(4)
c:γ/0.0f0 → (5)
(5)
d:δ/0.0f0 → (6)
((6/2.0f0))

julia> A(["a","b"])
(["α", "β"], 5.0f0)

julia> B(["c","d"])
(["γ", "δ"], 2.0f0)

julia> C(["a","b","c","d"])
(["α", "β", "γ", "δ"], 7.0f0)

source

Connect

FiniteStateTransducers.connect — Function

connect(fst::WFST, [i = get_initial(fst,single=true)])

Remove states that do not belong to paths that end in final states when starting from i.

source

FiniteStateTransducers.connect! — Function

connect!(fst::WFST, [i = get_initial(fst,single=true)])

Same as connect but modifies fst in-place.

source

Determinize

FiniteStateTransducers.determinize_fsa — Function

determinize_fsa(fsa)

Returns a deterministic finite state acceptor. fsa must be an acceptor (is_acceptor).

Requires the WFST to be defined in a left distributive and weakly divisible semiring.

See M. Mohri, F. Pereira, Fernando, M. Riley, Michael "Speech Recognition with Weighted Finite-State Transducers" in Springer Handb. Speech Process. 2008 for details.

source

Epsilon Removal

FiniteStateTransducers.rm_eps — Function

rm_eps(fst::WFST)

Returns an equivalent WFST where arcs with input and output labels are removed.

source

FiniteStateTransducers.rm_eps! — Function

rm_eps!(fst::WFST)

Same as rm_eps but operates in place.

source

Iterators

FiniteStateTransducers.BFS — Type

BFS(fst,initial)

Returns an iterator that performs a breadth-first search (BFS) of fst starting from the state initial.

At each iteartion the tuple (i,p) is returned, where i is the current state and p a Path.

source

FiniteStateTransducers.get_paths — Function

get_paths(fst::FST, [i = get_initial(fst;single=true)])

Returns an iterator that generates all of the possible paths of fst starting from i.

Notice that the fst must be acyclic for the algorithm to stop (see is_acyclic).

source

FiniteStateTransducers.collectpaths — Function

collectpaths(fst::FST, [i = get_initial(fst;first=true)])

Returns an array containing all of the possible paths of fst starting from i.

Notice that the fst must be acyclic for the algorithm to stop (see is_acyclic).

source

FiniteStateTransducers.DFS — Type

DFS(fst::WFST, initial; filter=arc->true)

Creates an iterator that performs a depth-first search (DFS) of the fst starting from the state initial. The function filter can be specify in order to perform the search ignoring arcs with user-defined properties.

For each iterator the tuple (p,s,n,d,e) is returned, where:

p is the parent parent state (p==0 if there is no parent)
s is the current state
n next state: n==0 if s has no arcs or if completely explored. These will give you standard DFS iterations.
d is a Bool: if true means a new state is discovered, can be used to pick up DFS its (see example below)
e is a Bool, if true means all arcs have been inspected
a inspected arc (empty if d == false)

The following cases/colors are possible:

d == false && n ==0 -> black state, state has been checked completely
d == false && n !=0 -> n is already gray
d == true -> n state is colored gray, i.e. is visited for the first time

Example

julia> fst = WFST(Dict(1=>1));

julia> add_arc!(fst, 1, 2, 1, 1, 1.0);

julia> add_arc!(fst, 1, 5, 1, 1, 1.0);

julia> add_arc!(fst, 1, 3, 1, 1, 1.0);

julia> add_arc!(fst, 2, 6, 1, 1, 1.0);

julia> add_arc!(fst, 2, 4, 1, 1, 1.0);

julia> add_arc!(fst, 5, 4, 1, 1, 1.0);

julia> add_arc!(fst, 3, 4, 1, 1, 1.0);

julia> add_arc!(fst, 3, 7, 1, 1, 1.0);

julia> initial!(fst,1)
WFST #states: 7, #arcs: 8, #isym: 1, #osym: 1
|1/0.0f0|
1:1/1.0f0 → (2)
1:1/1.0f0 → (5)
1:1/1.0f0 → (3)
(2)
1:1/1.0f0 → (6)
1:1/1.0f0 → (4)
(3)
1:1/1.0f0 → (4)
1:1/1.0f0 → (7)
(4)
(5)
1:1/1.0f0 → (4)
(6)
(7)

julia> dfs = FiniteStateTransducers.DFS(fst,1); # DFS starting from state 1

julia> for (p,s,n,d,e,a) in dfs
         println("$p,$s,$n")
         if d == true
           println("visiting first time $n (Gray)")
         else
           if e
             println("completed state $s (Black)")
           else
             println("node $n already visited")
           end
         end
       end
0,1,2
visiting first time 2 (Gray)
1,2,6
visiting first time 6 (Gray)
2,6,0
completed state 6 (Black)
1,2,4
visiting first time 4 (Gray)
2,4,0
completed state 4 (Black)
1,2,0
completed state 2 (Black)
0,1,5
visiting first time 5 (Gray)
1,5,4
node 4 already visited
1,5,0
completed state 5 (Black)
0,1,3
visiting first time 3 (Gray)
1,3,4
node 4 already visited
1,3,7
visiting first time 7 (Gray)
3,7,0
completed state 7 (Black)
1,3,0
completed state 3 (Black)
0,1,0
completed state 1 (Black)

source

FiniteStateTransducers.is_visited — Function

is_visited(fst::WFST, i = get_initial(fst;single=true))

Return an array of booleans indicating if the i-th state of the fst is visted starting form i.

source

Reverse

Base.reverse — Function

reverse(fst::WFST)

Returns rfst, the reversed version of fst. If fst transuces the sequence x to y with weight w, rfst transuces the reverse of x to the reverse of y with weight reverse(w).

source

Shortest Distance

FiniteStateTransducers.shortest_distance — Function

shortest_distance(fst[, s=get_initial(fst); filter=arc->true, reverse=false)

Computes the shortest distance between the state s to every other state. The shortest distance between s and i is defined as the sum of the weights of all paths between these states.

Weights are required to be rigth distributive and k-closed.

If fst has N states a N-long vector is returned where the i-the element contains the shortest distance between the states s and i.

If reversed==true the shortest distance from the final states is computed. In this case s has no effect since a new initial superstate is added to the reversed WFST. Here weights are required to be left distributive and k-closed.

See Mohri "Semiring framework and algorithms for shortest-distance problems", Journal of Automata, Languages and Combinatorics 7(3): 321-350, 2002.

source

Topological sort

FiniteStateTransducers.topsort — Function

topsort(fst::WFST, i = get_initial(fst); filter=arc->true)

Requires fst to be acyclic.

Returns an equivalent WFST to fst which is topologically sorted starting from the i-th state.

Modify the filter function to perform the operation considering only specific arcs.

source

FiniteStateTransducers.topsortperm — Function

topsortperm(fst::WFST, i = get_initial(fst); filter=arc->true)

Requires fst to be acyclic.

Returns the topological permutation of states of fst starting from the i-th state.

Modify the filter function to perform the operation considering only specific arcs.

source

FiniteStateTransducers.get_scc — Function

get_scc(fst::WFST, i = get_initial(fst;single=true); filter=arc->true)

Calculates the strongly connected components of 'fst' using Tarjan's algorithm. Returns a tuple (scc,c,v):

scc are the strongly connected components of fst
c is a boolean array containing the accessibility form the i
v are the visited states of the fst

The function filter can be used to consider only arcs with specific properties.

source

WFST label transition extraction

FiniteStateTransducers.wfst2tr — Function

wfst2tr(fst,Nt; get_label=get_ilabel)

Returns an array time2tr of length Nt-1. The t-th element of time2tr is a dictionary mapping the transition index between input label at time t to the input label at time t+1.

julia> using FiniteStateTransducers

julia> H = WFST(["a1","a2","a3"],["a"]);

julia> add_arc!(H,1,2,"a1","a");

julia> add_arc!(H,2,3,"a2","<eps>");

julia> add_arc!(H,2,2,"a2","<eps>");

julia> add_arc!(H,3,4,"a3","<eps>");

julia> add_arc!(H,3,2,"a3","a");

julia> initial!(H,1);

julia> final!(H,4)
WFST #states: 4, #arcs: 5, #isym: 3, #osym: 1
|1/0.0f0|
a1:a/0.0f0 → (2)
(2)
a2:ϵ/0.0f0 → (3)
a2:ϵ/0.0f0 → (2)
(3)
a3:ϵ/0.0f0 → (4)
a3:a/0.0f0 → (2)
((4/0.0f0))


julia> Nt = 10;

julia> time2tr = wfst2tr(H,Nt)
9-element Array{Dict{Int64,Array{Int64,1}},1}:
 Dict(1 => [2])
 Dict(2 => [2, 3])
 Dict(2 => [2, 3],3 => [2])
 Dict(2 => [2, 3],3 => [2])
 Dict(2 => [2, 3],3 => [2])
 Dict(2 => [2, 3],3 => [2])
 Dict(2 => [2, 3],3 => [2])
 Dict(2 => [2],3 => [2])
 Dict(2 => [3])

source

wfst2tr(fst; get_label=get_ilabel, convert_weight=identity)

Extract transition matrix and initial probabilities from a WFST representing an HMM model.

julia> add_arc!(H,1,2,"a1","a");

julia> add_arc!(H,2,2,"a1","<eps>",-log(0.3));

julia> add_arc!(H,2,3,"a2","<eps>",-log(0.7));

julia> add_arc!(H,3,3,"a2","<eps>",-log(0.3));

julia> add_arc!(H,3,4,"a3","<eps>",-log(0.7));

julia> add_arc!(H,4,4,"a3","<eps>",-log(0.3));

julia> add_arc!(H,4,2,"a1","a",-log(0.7));

julia> initial!(H,1);

julia> final!(H,4)
WFST #states: 4, #arcs: 12, #isym: 3, #osym: 1
|1/0.0f0|
a1:a/0.0f0 → (2)
a1:a/0.0f0 → (2)
(2)
a2:ϵ/0.0f0 → (3)
a2:ϵ/0.0f0 → (2)
a1:ϵ/1.2039728f0 → (2)
a2:ϵ/0.35667494f0 → (3)
(3)
a3:ϵ/0.0f0 → (4)
a3:a/0.0f0 → (2)
a2:ϵ/1.2039728f0 → (3)
a3:ϵ/0.35667494f0 → (4)
((4/0.0f0))
a3:ϵ/1.2039728f0 → (4)
a1:a/0.35667494f0 → (2)

julia> a,A = wfst2tr(H; convert_weight = w -> exp(-get(w)))
(Float32[1.0, 0.0, 0.0], Float32[0.3 0.7 0.0; 0.0 0.3 0.7; 0.7 0.0 0.3])

source