WFSTs internals
Formal definition
Formally a WFSTs over the semiring $\mathbb{W}$ is the tuple $(\mathcal{A},\mathcal{B},Q,I,F,E,\lambda,\rho)$ where:
- $\mathcal{A}$ is the input alphabet (set of input labels)
- $\mathcal{B}$ is the output alphabet (set of output labels)
- $Q$ is a set of states (usually integers)
- $I \subseteq Q$ is the set of initial states
- $F \subseteq Q$ is the set of final states
- $E \subseteq Q \times \mathcal{A} \cup \{ \epsilon \} \times \mathcal{B} \cup \{ \epsilon \} \times \mathbb{W} \times Q$ is a set of arcs (transitions) where an element consist of the tuple (starting state,input label, output label, weigth, destination state)
- $\lambda : I \rightarrow \mathbb{W}$ a function that maps initial states to a weight
- $\rho : F \rightarrow \mathbb{W}$ a function that maps final states to a weight
Constructors and modifiers
FiniteStateTransducers.WFST
— TypeWFST(isym, [osym]; W = TropicalWeight{Float32})
Construct an empty Weighted Finite State Transducer (WFST)
isym
input symbol table (can be a dictionary or a vector)osym
output symbol dictionary (optional)W
weight type
If the symbol table are AbstractDict{I,D}
, D
must be an integer and I
the type of the symbol.
FiniteStateTransducers.add_arc!
— Functionadd_arc!(fst, src, dest, ilabel, olabel[, w=one(W)])
add_arc!(fst, srcdest::Pair, ilabelolabel::Pair[, w=one(W)])
Adds an arc from state src
to state dest
with input label ilabel
, output label olabel
and weight w
. Alternative notation utilizes Pair
s.
If w
is not provided this defaults to one(W)
where W
is the weight type of fst
.
FiniteStateTransducers.add_states!
— Functionadd_states!(fst,N)
Add N
empty states to the FST.
FiniteStateTransducers.initial!
— Functioninitial!(fst::WFST{W},i, [w=one(W)])
Set the i
-th state as a initial state. A weight w
can also be provided.
FiniteStateTransducers.final!
— Functionfinal!(fst::WFST{W},i, [w=one(W)])
Set the i
-th state as a final state. A weight w
can also be provided.
FiniteStateTransducers.rm_final!
— Functionrm_final!(fst,i)
Remove final state labeling of the i
-th state.
FiniteStateTransducers.rm_initial!
— Functionrm_initial!(fst,i)
Remove initial state labeling of the state i
.
Internal access
FiniteStateTransducers.get_isym
— Functionget_isym(fst::WFST)
Returns the input symbol table of fst
. The table consists of a dictionary where each element goes from the symbol to the corresponding index.
FiniteStateTransducers.get_iisym
— Functionget_iisym(fst::WFST)
Returns the inverted input symbol table.
FiniteStateTransducers.get_osym
— Functionget_osym(fst::WFST)
Returns the output symbol table of fst
. The table consists of a dictionary where each element goes from the symbol to the corresponding index.
FiniteStateTransducers.get_iosym
— Functionget_iosym(fst::WFST)
Returns the inverted output symbol table.
FiniteStateTransducers.get_states
— Functionget_states(fst::WFST)
Returns the states the fst
.
FiniteStateTransducers.get_initial
— Functionget_initial(fst::WFST, [single=false])
Returns the initial state id of the fst
. If single
is true
the first initial state is returned.
FiniteStateTransducers.get_initial_weight
— Functionget_initial_weight(fst::WFST,i)
Returns the weight of initial i
-th state of fst
.
FiniteStateTransducers.get_final
— Functionget_final(fst::WFST, [single=false])
Returns the final state id of the fst
. If single
is true
the first final state is returned.
FiniteStateTransducers.get_final_weight
— Functionget_final_weight(fst::WFST,i)
Returns the weight of i
-th final state of fst
.
FiniteStateTransducers.get_ialphabet
— Functionget_ialphabet(label::fst)
Returns the alphabet of the input labels of fst
. The alphabet is defined as the set of symbols composing the input or output labels.
FiniteStateTransducers.get_oalphabet
— Functionget_oalphabet(label::fst)
Returns the alphabet of the output labels of fst
. The alphabet is defined as the set of symbols composing the input or output labels.
FiniteStateTransducers.get_arcs
— Functionget_arcs(fst::WFST) = ArcIterator(fst)
Returns an iterator that can be used to loop through all of the arcs of fst
.
Arcs
FiniteStateTransducers.Arc
— TypeArc(ilab::Int,olab::Int,w::W,n::Int)
Constructs an arc with input label ilab
, output label olab
, weight 'weight' and nextstate n
. ilab
and olab
must be integers consistent with a input/output symbol table of the WFST at use.
Mainly used for internals see add_arc!
for a simpler way of adding arcs to a WFST.
FiniteStateTransducers.get_ilabel
— Functionget_ilabel(A::Arc)
Returns the input label index of the arc A
.
FiniteStateTransducers.get_olabel
— Functionget_olabel(A::Arc)
Returns the input label index of the arc A
.
FiniteStateTransducers.get_weight
— Functionget_weight(A::Arc)
Returns the weight the arc A
.
get_weight(fst::WFST,i)
Returns the weight of the i
-th state of fst
. If there is no weight nothing
is returned.
FiniteStateTransducers.get_nextstate
— Functionget_nextstate(A::Arc)
Returns the state that the arc A
is pointing to.
Paths
FiniteStateTransducers.Path
— TypePath(isym[,osym], iolabel::Pair, w=one(TropicalWeight{Float32}))
Construct a path with input and output symbol table isym
and (optional) osym
(see WFST
).
iolabel
is a Pair
of vectors.
julia> isym = ["a","b","c","d"];
julia> osym = ["α","β","γ","δ"];
julia> W = ProbabilityWeight{Float32};
julia> p = Path(isym,osym,["a","b","c"] => ["α","β","γ"], one(W))
["a", "b", "c"] → ["α", "β", "γ"], w:1.0f0
The weight of a path of a WFST results from the multiplication ($\otimes$) of the weights of the different arcs that are transversed. See e.g. get_paths
to extract paths from a WFST.
julia> p * Path(isym,osym,["d"] => ["δ"], W(0.5))
["a", "b", "c", "d"] → ["α", "β", "γ", "δ"], w:0.5f0
FiniteStateTransducers.get_isequence
— Functionget_isequence(p::Path)
Returns the input sequence of the path p
.
FiniteStateTransducers.get_osequence
— Functionget_osequence(p::Path)
Returns the output sequence of the path p
.
Tour of the internals
Let's build a simple WFST and check out its internals:
julia> using FiniteStateTransducers
julia> A = WFST(["hello","world"],[:ciao,:mondo]);
julia> add_arc!(A,1=>2,"hello"=>:ciao);
julia> add_arc!(A,1=>3,"world"=>:mondo);
julia> add_arc!(A,2=>3,"world"=>:mondo);
julia> initial!(A,1);
julia> final!(A,3)
WFST #states: 3, #arcs: 3, #isym: 2, #osym: 2
|1/0.0f0|
hello:ciao/0.0f0 → (2)
world:mondo/0.0f0 → (3)
(2)
world:mondo/0.0f0 → (3)
((3/0.0f0))
For this simple WFST the states consists of an Array
of Array
s containing Arc
:
julia> get_states(A)
3-element Array{Array{FiniteStateTransducers.Arc{TropicalWeight{Float32},Int64},1},1}:
[1:1/0.0f0 → (2), 2:2/0.0f0 → (3)]
[2:2/0.0f0 → (3)]
[]
As it can be seen the first state has two arcs, second state only one and the final state none. A state can also be accessed as follows:
julia> A[2]
1-element Array{FiniteStateTransducers.Arc{TropicalWeight{Float32},Int64},1}:
2:2/0.0f0 → (3)
Here the arc's input/output labels are displayed as integers. We would expect world:mondo/0.0f0 → (3)
instead of 2:2/0.0f0 → (3)
. This is due to fact that, contrary to the formal definition, labels are not stored directly in the arcs but an index is used instead, which corresponds to the input/output symbol table:
julia> get_isym(A)
Dict{String,Int64} with 2 entries:
"hello" => 1
"world" => 2
julia> get_osym(A)
Dict{Symbol,Int64} with 2 entries:
:mondo => 2
:ciao => 1
Another difference is in the definition of $I$, $F$, $\lambda$ and $\rho$. These are also represented by dictionaries that can be accessed using the functions get_initial
and get_final
.
julia> get_final(A)
Dict{Int64,TropicalWeight{Float32}} with 1 entry:
3 => 0.0
Epsilon label
FiniteStateTransducers.iseps
— Functioniseps(x)
Checks if x
is the epsilon label. Defined only for String
and Int
, where <eps>
and 0
are the epsilon symbols. Extend this method if you want to create WFST with a user defined type.
FiniteStateTransducers.get_eps
— Functionget_eps(x::Type)
Returns the epsilon label for a given Type
. Defined only for String
and Int
, where <eps>
and 0
are the epsilon symbols. Extend this method if you want to create WFST with a user defined type.
By default 0
indicates the epsilon label which is not present in the symbol table.
Currently the following epsilon symbols are reserved for the following types:
Type | Label |
---|---|
Char | 'ϵ' |
String | "<eps>" |
Int | 0 |
We can define a particular type by extending the functions iseps
and get_eps
. For example if we want to introduce an epsilon symbol for the type Symbol
:
julia> FiniteStateTransducers.iseps(x::Symbol) = x == :eps;
julia> FiniteStateTransducers.get_eps(x::Type{Symbol}) = :eps;
julia> add_arc!(A,3=>3,"world"=>:eps)
WFST #states: 3, #arcs: 5, #isym: 2, #osym: 2
|1/0.0f0|
hello:ciao/0.0f0 → (2)
world:mondo/0.0f0 → (3)
(2)
world:mondo/0.0f0 → (3)
((3/0.0f0))
world:ϵ/0.0f0 → (3)
julia> A[3]
1-element Array{Arc{TropicalWeight{Float32},Int64},1}:
2:0/0.0f0 → (3)
Properties
Base.length
— Functionlength(fst::WFST)
Returns the number of states of the fst.
Base.size
— Functionsize(fst::WFST,[i])
Returns number of states and total number of arcs in a tuple. If 'i=1' returns the number of states and if i=2
the number of arcs.
FiniteStateTransducers.isinitial
— Functionisinitial(fst::WFST, i)
Check if the i
-th state is an initial state.
FiniteStateTransducers.isfinal
— Functionisfinal(fst::WFST, i)
Check if the i
-th state is an final state.
FiniteStateTransducers.typeofweight
— Functiontypeofweight(fst_or_path)
Returns the weight type of a WFST or a path.
FiniteStateTransducers.has_eps
— Functionhas_eps([label::Function,] fst::FST)
Check if fst
has epsilon transition in either input or output labels. To specify input or output plug in either get_ilabel
or get_olabel
respectively in label
.
FiniteStateTransducers.count_eps
— Functioncount_eps(label::Function, fst::FST)
Returns number of epsilon transition of either input or output labels. To specify input or output plug in ilabel
and olabel
respectively in label
.
FiniteStateTransducers.is_acceptor
— Functionis_acceptor(fst::WFST)
Returns true
if fst
is an acceptor, i.e. if all arcs have equal input and output labels.
FiniteStateTransducers.is_acyclic
— Functionis_acyclic(fst::WFST[,i = get_initial(fst;single=true)]; kwargs...)
Returns the number of states of the fst. For kwargs
see DFS
.
FiniteStateTransducers.is_deterministic
— Functionis_deterministic(fst::WFST; get_label=get_ilabel)
Returns true
if fst
is deterministic in the input labels. A input label deterministic WFST must have a single initial state and arcs leaving any state do not share the same input label.
Change the keyword get_label
to get_olabel
in order to check determinism in the output labels.
Other constructors
FiniteStateTransducers.linearfst
— Functionlinearfst(ilabels,olabels,w,[isym,[osym=isym]]; W=TropicalWeight{Float32})
Creates a linear WFST. ilabels
, olabels
and w
are the input labels, output labels and weights respectively which must be vectors of the same size. If input and output tables isym
and osym
are not provided, the symbol tables are derived from ilabels
and olabels
.
julia> A = linearfst([1,1,3,4],[:a,:b,:c,:d],ones(TropicalWeight{Float64},4))
WFST #states: 5, #arcs: 4, #isym: 3, #osym: 4
|1/0.0|
1:a/0.0 → (2)
(2)
1:b/0.0 → (3)
(3)
3:c/0.0 → (4)
(4)
4:d/0.0 → (5)
((5/0.0))
FiniteStateTransducers.matrix2wfst
— Functionmatrix2wfst(isym,[osym,] X; W = TropicalWeight{Float32})
Creates a WFST with weight type W
and input output tables isym
and osym
using the matrix X
. If X
is a matrix of dimensions Ns
xNt
, the resulting WFST will have Nt+1
states. Arcs form the t
-th state of fst
will go only to the t+1
-th state and will correspond to the non-zero element of the t
-th column of X
. State 1
and Nt+1
are labelled as initial and final state, respectively.
julia> matrix2wfst(["a","b","c"],[1 2 3; 1 2 3; 1 2 3])
WFST #states: 4, #arcs: 9, #isym: 3, #osym: 3
|1/0.0f0|
a:a/1.0f0 → (2)
b:b/1.0f0 → (2)
c:c/1.0f0 → (2)
(2)
a:a/2.0f0 → (3)
b:b/2.0f0 → (3)
c:c/2.0f0 → (3)
(3)
a:a/3.0f0 → (4)
b:b/3.0f0 → (4)
c:c/3.0f0 → (4)
((4/0.0f0))