WFSTs internals

Formal definition

Formally a WFSTs over the semiring $\mathbb{W}$ is the tuple $(\mathcal{A},\mathcal{B},Q,I,F,E,\lambda,\rho)$ where:

$\mathcal{A}$ is the input alphabet (set of input labels)
$\mathcal{B}$ is the output alphabet (set of output labels)
$Q$ is a set of states (usually integers)
$I \subseteq Q$ is the set of initial states
$F \subseteq Q$ is the set of final states
$E \subseteq Q \times \mathcal{A} \cup \{ \epsilon \} \times \mathcal{B} \cup \{ \epsilon \} \times \mathbb{W} \times Q$ is a set of arcs (transitions) where an element consist of the tuple (starting state,input label, output label, weigth, destination state)
$\lambda : I \rightarrow \mathbb{W}$ a function that maps initial states to a weight
$\rho : F \rightarrow \mathbb{W}$ a function that maps final states to a weight

Constructors and modifiers

FiniteStateTransducers.WFST — Type

WFST(isym, [osym]; W = TropicalWeight{Float32})

Construct an empty Weighted Finite State Transducer (WFST)

isym input symbol table (can be a dictionary or a vector)
osym output symbol dictionary (optional)
W weight type

If the symbol table are AbstractDict{I,D}, D must be an integer and I the type of the symbol.

source

FiniteStateTransducers.add_arc! — Function

add_arc!(fst, src, dest, ilabel, olabel[, w=one(W)])

add_arc!(fst, srcdest::Pair, ilabelolabel::Pair[, w=one(W)])

Adds an arc from state src to state dest with input label ilabel, output label olabel and weight w. Alternative notation utilizes Pairs.

If w is not provided this defaults to one(W) where W is the weight type of fst.

source

FiniteStateTransducers.add_states! — Function

add_states!(fst,N)

Add N empty states to the FST.

source

FiniteStateTransducers.initial! — Function

initial!(fst::WFST{W},i, [w=one(W)])

Set the i-th state as a initial state. A weight w can also be provided.

source

FiniteStateTransducers.final! — Function

final!(fst::WFST{W},i, [w=one(W)])

Set the i-th state as a final state. A weight w can also be provided.

source

FiniteStateTransducers.rm_final! — Function

rm_final!(fst,i)

Remove final state labeling of the i-th state.

source

FiniteStateTransducers.rm_initial! — Function

rm_initial!(fst,i)

Remove initial state labeling of the state i.

source

Internal access

FiniteStateTransducers.get_isym — Function

get_isym(fst::WFST)

Returns the input symbol table of fst. The table consists of a dictionary where each element goes from the symbol to the corresponding index.

source

FiniteStateTransducers.get_iisym — Function

get_iisym(fst::WFST)

Returns the inverted input symbol table.

source

FiniteStateTransducers.get_osym — Function

get_osym(fst::WFST)

Returns the output symbol table of fst. The table consists of a dictionary where each element goes from the symbol to the corresponding index.

source

FiniteStateTransducers.get_iosym — Function

get_iosym(fst::WFST)

Returns the inverted output symbol table.

source

FiniteStateTransducers.get_states — Function

get_states(fst::WFST)

Returns the states the fst.

source

FiniteStateTransducers.get_initial — Function

get_initial(fst::WFST, [single=false])

Returns the initial state id of the fst. If single is true the first initial state is returned.

source

FiniteStateTransducers.get_initial_weight — Function

get_initial_weight(fst::WFST,i)

Returns the weight of initial i-th state of fst.

source

FiniteStateTransducers.get_final — Function

get_final(fst::WFST, [single=false])

Returns the final state id of the fst. If single is true the first final state is returned.

source

FiniteStateTransducers.get_final_weight — Function

get_final_weight(fst::WFST,i)

Returns the weight of i-th final state of fst.

source

FiniteStateTransducers.get_ialphabet — Function

get_ialphabet(label::fst)

Returns the alphabet of the input labels of fst. The alphabet is defined as the set of symbols composing the input or output labels.

source

FiniteStateTransducers.get_oalphabet — Function

get_oalphabet(label::fst)

Returns the alphabet of the output labels of fst. The alphabet is defined as the set of symbols composing the input or output labels.

source

FiniteStateTransducers.get_arcs — Function

get_arcs(fst::WFST) = ArcIterator(fst)

Returns an iterator that can be used to loop through all of the arcs of fst.

source

Arcs

FiniteStateTransducers.Arc — Type

Arc(ilab::Int,olab::Int,w::W,n::Int)

Constructs an arc with input label ilab, output label olab, weight 'weight' and nextstate n. ilab and olab must be integers consistent with a input/output symbol table of the WFST at use.

Mainly used for internals see add_arc! for a simpler way of adding arcs to a WFST.

source

FiniteStateTransducers.get_ilabel — Function

get_ilabel(A::Arc)

Returns the input label index of the arc A.

source

FiniteStateTransducers.get_olabel — Function

get_olabel(A::Arc)

Returns the input label index of the arc A.

source

FiniteStateTransducers.get_weight — Function

get_weight(A::Arc)

Returns the weight the arc A.

source

get_weight(fst::WFST,i)

Returns the weight of the i-th state of fst. If there is no weight nothing is returned.

source

FiniteStateTransducers.get_nextstate — Function

get_nextstate(A::Arc)

Returns the state that the arc A is pointing to.

source

Paths

FiniteStateTransducers.Path — Type

Path(isym[,osym], iolabel::Pair, w=one(TropicalWeight{Float32}))

Construct a path with input and output symbol table isym and (optional) osym (see WFST).

iolabel is a Pair of vectors.

julia> isym = ["a","b","c","d"];

julia> osym = ["α","β","γ","δ"];

julia> W = ProbabilityWeight{Float32};

julia> p = Path(isym,osym,["a","b","c"] => ["α","β","γ"], one(W))
["a", "b", "c"] → ["α", "β", "γ"], w:1.0f0

The weight of a path of a WFST results from the multiplication ($\otimes$) of the weights of the different arcs that are transversed. See e.g. get_paths to extract paths from a WFST.

julia> p * Path(isym,osym,["d"] => ["δ"], W(0.5))
["a", "b", "c", "d"] → ["α", "β", "γ", "δ"], w:0.5f0

source

FiniteStateTransducers.get_isequence — Function

get_isequence(p::Path)

Returns the input sequence of the path p.

source

FiniteStateTransducers.get_osequence — Function

get_osequence(p::Path)

Returns the output sequence of the path p.

source

Tour of the internals

Let's build a simple WFST and check out its internals:

julia> using FiniteStateTransducers

julia> A = WFST(["hello","world"],[:ciao,:mondo]);

julia> add_arc!(A,1=>2,"hello"=>:ciao);

julia> add_arc!(A,1=>3,"world"=>:mondo);

julia> add_arc!(A,2=>3,"world"=>:mondo);

julia> initial!(A,1);

julia> final!(A,3)
WFST #states: 3, #arcs: 3, #isym: 2, #osym: 2
|1/0.0f0|
hello:ciao/0.0f0 → (2)
world:mondo/0.0f0 → (3)
(2)
world:mondo/0.0f0 → (3)
((3/0.0f0))

For this simple WFST the states consists of an Array of Arrays containing Arc:

julia> get_states(A)
3-element Array{Array{FiniteStateTransducers.Arc{TropicalWeight{Float32},Int64},1},1}:
 [1:1/0.0f0 → (2), 2:2/0.0f0 → (3)]
 [2:2/0.0f0 → (3)]
 []

As it can be seen the first state has two arcs, second state only one and the final state none. A state can also be accessed as follows:

julia> A[2]
1-element Array{FiniteStateTransducers.Arc{TropicalWeight{Float32},Int64},1}:
 2:2/0.0f0 → (3)

Here the arc's input/output labels are displayed as integers. We would expect world:mondo/0.0f0 → (3) instead of 2:2/0.0f0 → (3). This is due to fact that, contrary to the formal definition, labels are not stored directly in the arcs but an index is used instead, which corresponds to the input/output symbol table:

julia> get_isym(A)
Dict{String,Int64} with 2 entries:
  "hello" => 1
  "world" => 2

julia> get_osym(A)
Dict{Symbol,Int64} with 2 entries:
  :mondo => 2
  :ciao  => 1

Another difference is in the definition of $I$, $F$, $\lambda$ and $\rho$. These are also represented by dictionaries that can be accessed using the functions get_initial and get_final.

julia> get_final(A)
Dict{Int64,TropicalWeight{Float32}} with 1 entry:
  3 => 0.0

Epsilon label

FiniteStateTransducers.iseps — Function

iseps(x)

Checks if x is the epsilon label. Defined only for String and Int, where <eps> and 0 are the epsilon symbols. Extend this method if you want to create WFST with a user defined type.

source

FiniteStateTransducers.get_eps — Function

get_eps(x::Type)

Returns the epsilon label for a given Type. Defined only for String and Int, where <eps> and 0 are the epsilon symbols. Extend this method if you want to create WFST with a user defined type.

source

By default 0 indicates the epsilon label which is not present in the symbol table.

Currently the following epsilon symbols are reserved for the following types:

Type	Label
`Char`	`'ϵ'`
`String`	`"<eps>"`
`Int`	`0`

We can define a particular type by extending the functions iseps and get_eps. For example if we want to introduce an epsilon symbol for the type Symbol:

julia> FiniteStateTransducers.iseps(x::Symbol) = x == :eps;

julia> FiniteStateTransducers.get_eps(x::Type{Symbol}) = :eps;

julia> add_arc!(A,3=>3,"world"=>:eps)
WFST #states: 3, #arcs: 5, #isym: 2, #osym: 2
|1/0.0f0|
hello:ciao/0.0f0 → (2)
world:mondo/0.0f0 → (3)
(2)
world:mondo/0.0f0 → (3)
((3/0.0f0))
world:ϵ/0.0f0 → (3)

julia> A[3]
1-element Array{Arc{TropicalWeight{Float32},Int64},1}:
 2:0/0.0f0 → (3)

Properties

Base.length — Function

length(fst::WFST)

Returns the number of states of the fst.

source

Base.size — Function

size(fst::WFST,[i])

Returns number of states and total number of arcs in a tuple. If 'i=1' returns the number of states and if i=2 the number of arcs.

source

FiniteStateTransducers.isinitial — Function

isinitial(fst::WFST, i)

Check if the i-th state is an initial state.

source

FiniteStateTransducers.isfinal — Function

isfinal(fst::WFST, i)

Check if the i-th state is an final state.

source

FiniteStateTransducers.typeofweight — Function

typeofweight(fst_or_path)

Returns the weight type of a WFST or a path.

source

FiniteStateTransducers.has_eps — Function

has_eps([label::Function,] fst::FST)

Check if fst has epsilon transition in either input or output labels. To specify input or output plug in either get_ilabel or get_olabel respectively in label.

source

FiniteStateTransducers.count_eps — Function

count_eps(label::Function, fst::FST)

Returns number of epsilon transition of either input or output labels. To specify input or output plug in ilabel and olabel respectively in label.

source

FiniteStateTransducers.is_acceptor — Function

is_acceptor(fst::WFST)

Returns true if fst is an acceptor, i.e. if all arcs have equal input and output labels.

source

FiniteStateTransducers.is_acyclic — Function

is_acyclic(fst::WFST[,i = get_initial(fst;single=true)]; kwargs...)

Returns the number of states of the fst. For kwargs see DFS.

source

FiniteStateTransducers.is_deterministic — Function

is_deterministic(fst::WFST; get_label=get_ilabel)

Returns true if fst is deterministic in the input labels. A input label deterministic WFST must have a single initial state and arcs leaving any state do not share the same input label.

Change the keyword get_label to get_olabel in order to check determinism in the output labels.

source

Other constructors

FiniteStateTransducers.linearfst — Function

linearfst(ilabels,olabels,w,[isym,[osym=isym]]; W=TropicalWeight{Float32})

Creates a linear WFST. ilabels, olabels and w are the input labels, output labels and weights respectively which must be vectors of the same size. If input and output tables isym and osym are not provided, the symbol tables are derived from ilabels and olabels.

julia> A = linearfst([1,1,3,4],[:a,:b,:c,:d],ones(TropicalWeight{Float64},4))
WFST #states: 5, #arcs: 4, #isym: 3, #osym: 4
|1/0.0|
1:a/0.0 → (2)
(2)
1:b/0.0 → (3)
(3)
3:c/0.0 → (4)
(4)
4:d/0.0 → (5)
((5/0.0))

source

FiniteStateTransducers.matrix2wfst — Function

matrix2wfst(isym,[osym,] X; W = TropicalWeight{Float32})

Creates a WFST with weight type W and input output tables isym and osym using the matrix X. If X is a matrix of dimensions NsxNt, the resulting WFST will have Nt+1 states. Arcs form the t-th state of fst will go only to the t+1-th state and will correspond to the non-zero element of the t-th column of X. State 1 and Nt+1 are labelled as initial and final state, respectively.

julia> matrix2wfst(["a","b","c"],[1 2 3; 1 2 3; 1 2 3])
WFST #states: 4, #arcs: 9, #isym: 3, #osym: 3
|1/0.0f0|
a:a/1.0f0 → (2)
b:b/1.0f0 → (2)
c:c/1.0f0 → (2)
(2)
a:a/2.0f0 → (3)
b:b/2.0f0 → (3)
c:c/2.0f0 → (3)
(3)
a:a/3.0f0 → (4)
b:b/3.0f0 → (4)
c:c/3.0f0 → (4)
((4/0.0f0))

source