WFSTs internals
Formal definition
Formally a WFSTs over the semiring is the tuple where:
- is the input alphabet (set of input labels)
- is the output alphabet (set of output labels)
- is a set of states (usually integers)
- is the set of initial states
- is the set of final states
- is a set of arcs (transitions) where an element consist of the tuple (starting state,input label, output label, weigth, destination state)
- a function that maps initial states to a weight
- a function that maps final states to a weight
Constructors and modifiers
FiniteStateTransducers.WFST
— TypeWFST(isym, [osym]; W = TropicalWeight{Float32})
Construct an empty Weighted Finite State Transducer (WFST)
isym
input symbol table (can be a dictionary or a vector)osym
output symbol dictionary (optional)W
weight type
If the symbol table are AbstractDict{I,D}
, D
must be an integer and I
the type of the symbol.
FiniteStateTransducers.add_arc!
— Functionadd_arc!(fst, src, dest, ilabel, olabel[, w=one(W)])
add_arc!(fst, srcdest::Pair, ilabelolabel::Pair[, w=one(W)])
Adds an arc from state src
to state dest
with input label ilabel
, output label olabel
and weight w
. Alternative notation utilizes Pair
s.
If w
is not provided this defaults to one(W)
where W
is the weight type of fst
.
FiniteStateTransducers.add_states!
— Functionadd_states!(fst,N)
Add N
empty states to the FST.
FiniteStateTransducers.initial!
— Functioninitial!(fst::WFST{W},i, [w=one(W)])
Set the i
-th state as a initial state. A weight w
can also be provided.
FiniteStateTransducers.final!
— Functionfinal!(fst::WFST{W},i, [w=one(W)])
Set the i
-th state as a final state. A weight w
can also be provided.
FiniteStateTransducers.rm_final!
— Functionrm_final!(fst,i)
Remove final state labeling of the i
-th state.
FiniteStateTransducers.rm_initial!
— Functionrm_initial!(fst,i)
Remove initial state labeling of the state i
.
Internal access
FiniteStateTransducers.get_isym
— Functionget_isym(fst::WFST)
Returns the input symbol table of fst
. The table consists of a dictionary where each element goes from the symbol to the corresponding index.
FiniteStateTransducers.get_iisym
— Functionget_iisym(fst::WFST)
Returns the inverted input symbol table.
FiniteStateTransducers.get_osym
— Functionget_osym(fst::WFST)
Returns the output symbol table of fst
. The table consists of a dictionary where each element goes from the symbol to the corresponding index.
FiniteStateTransducers.get_iosym
— Functionget_iosym(fst::WFST)
Returns the inverted output symbol table.
FiniteStateTransducers.get_states
— Functionget_states(fst::WFST)
Returns the states the fst
.
FiniteStateTransducers.get_initial
— Functionget_initial(fst::WFST, [single=false])
Returns the initial state id of the fst
. If single
is true
the first initial state is returned.
FiniteStateTransducers.get_initial_weight
— Functionget_initial_weight(fst::WFST,i)
Returns the weight of initial i
-th state of fst
.
FiniteStateTransducers.get_final
— Functionget_final(fst::WFST, [single=false])
Returns the final state id of the fst
. If single
is true
the first final state is returned.
FiniteStateTransducers.get_final_weight
— Functionget_final_weight(fst::WFST,i)
Returns the weight of i
-th final state of fst
.
FiniteStateTransducers.get_ialphabet
— Functionget_ialphabet(label::fst)
Returns the alphabet of the input labels of fst
. The alphabet is defined as the set of symbols composing the input or output labels.
FiniteStateTransducers.get_oalphabet
— Functionget_oalphabet(label::fst)
Returns the alphabet of the output labels of fst
. The alphabet is defined as the set of symbols composing the input or output labels.
FiniteStateTransducers.get_arcs
— Functionget_arcs(fst::WFST) = ArcIterator(fst)
Returns an iterator that can be used to loop through all of the arcs of fst
.
Arcs
FiniteStateTransducers.Arc
— TypeArc(ilab::Int,olab::Int,w::W,n::Int)
Constructs an arc with input label ilab
, output label olab
, weight 'weight' and nextstate n
. ilab
and olab
must be integers consistent with a input/output symbol table of the WFST at use.
Mainly used for internals see add_arc!
for a simpler way of adding arcs to a WFST.
FiniteStateTransducers.get_ilabel
— Functionget_ilabel(A::Arc)
Returns the input label index of the arc A
.
FiniteStateTransducers.get_olabel
— Functionget_olabel(A::Arc)
Returns the input label index of the arc A
.
FiniteStateTransducers.get_weight
— Functionget_weight(A::Arc)
Returns the weight the arc A
.
get_weight(fst::WFST,i)
Returns the weight of the i
-th state of fst
. If there is no weight nothing
is returned.
FiniteStateTransducers.get_nextstate
— Functionget_nextstate(A::Arc)
Returns the state that the arc A
is pointing to.
Paths
FiniteStateTransducers.Path
— TypePath(isym[,osym], iolabel::Pair, w=one(TropicalWeight{Float32}))
Construct a path with input and output symbol table isym
and (optional) osym
(see WFST
).
iolabel
is a Pair
of vectors.
julia> isym = ["a","b","c","d"];
julia> osym = ["α","β","γ","δ"];
julia> W = ProbabilityWeight{Float32};
julia> p = Path(isym,osym,["a","b","c"] => ["α","β","γ"], one(W))
["a", "b", "c"] → ["α", "β", "γ"], w:1.0f0
The weight of a path of a WFST results from the multiplication () of the weights of the different arcs that are transversed. See e.g. get_paths
to extract paths from a WFST.
julia> p * Path(isym,osym,["d"] => ["δ"], W(0.5))
["a", "b", "c", "d"] → ["α", "β", "γ", "δ"], w:0.5f0
FiniteStateTransducers.get_isequence
— Functionget_isequence(p::Path)
Returns the input sequence of the path p
.
FiniteStateTransducers.get_osequence
— Functionget_osequence(p::Path)
Returns the output sequence of the path p
.
Tour of the internals
Let's build a simple WFST and check out its internals:
julia> using FiniteStateTransducers
julia> A = WFST(["hello","world"],[:ciao,:mondo]);
julia> add_arc!(A,1=>2,"hello"=>:ciao);
julia> add_arc!(A,1=>3,"world"=>:mondo);
julia> add_arc!(A,2=>3,"world"=>:mondo);
julia> initial!(A,1);
julia> final!(A,3)
WFST #states: 3, #arcs: 3, #isym: 2, #osym: 2
|1/0.0f0|
hello:ciao/0.0f0 → (2)
world:mondo/0.0f0 → (3)
(2)
world:mondo/0.0f0 → (3)
((3/0.0f0))
For this simple WFST the states consists of an Array
of Array
s containing Arc
:
julia> get_states(A)
3-element Array{Array{FiniteStateTransducers.Arc{TropicalWeight{Float32},Int64},1},1}:
[1:1/0.0f0 → (2), 2:2/0.0f0 → (3)]
[2:2/0.0f0 → (3)]
[]
As it can be seen the first state has two arcs, second state only one and the final state none. A state can also be accessed as follows:
julia> A[2]
1-element Array{FiniteStateTransducers.Arc{TropicalWeight{Float32},Int64},1}:
2:2/0.0f0 → (3)
Here the arc's input/output labels are displayed as integers. We would expect world:mondo/0.0f0 → (3)
instead of 2:2/0.0f0 → (3)
. This is due to fact that, contrary to the formal definition, labels are not stored directly in the arcs but an index is used instead, which corresponds to the input/output symbol table:
julia> get_isym(A)
Dict{String,Int64} with 2 entries:
"hello" => 1
"world" => 2
julia> get_osym(A)
Dict{Symbol,Int64} with 2 entries:
:mondo => 2
:ciao => 1
Another difference is in the definition of , , and . These are also represented by dictionaries that can be accessed using the functions get_initial
and get_final
.
julia> get_final(A)
Dict{Int64,TropicalWeight{Float32}} with 1 entry:
3 => 0.0
Epsilon label
FiniteStateTransducers.iseps
— Functioniseps(x)
Checks if x
is the epsilon label. Defined only for String
and Int
, where <eps>
and 0
are the epsilon symbols. Extend this method if you want to create WFST with a user defined type.
FiniteStateTransducers.get_eps
— Functionget_eps(x::Type)
Returns the epsilon label for a given Type
. Defined only for String
and Int
, where <eps>
and 0
are the epsilon symbols. Extend this method if you want to create WFST with a user defined type.
By default 0
indicates the epsilon label which is not present in the symbol table.
Currently the following epsilon symbols are reserved for the following types:
Type | Label |
---|---|
Char | 'ϵ' |
String | "<eps>" |
Int | 0 |
We can define a particular type by extending the functions iseps
and get_eps
. For example if we want to introduce an epsilon symbol for the type Symbol
:
julia> FiniteStateTransducers.iseps(x::Symbol) = x == :eps;
julia> FiniteStateTransducers.get_eps(x::Type{Symbol}) = :eps;
julia> add_arc!(A,3=>3,"world"=>:eps)
WFST #states: 3, #arcs: 5, #isym: 2, #osym: 2
|1/0.0f0|
hello:ciao/0.0f0 → (2)
world:mondo/0.0f0 → (3)
(2)
world:mondo/0.0f0 → (3)
((3/0.0f0))
world:ϵ/0.0f0 → (3)
julia> A[3]
1-element Array{Arc{TropicalWeight{Float32},Int64},1}:
2:0/0.0f0 → (3)
Properties
Base.length
— Functionlength(fst::WFST)
Returns the number of states of the fst.
Base.size
— Functionsize(fst::WFST,[i])
Returns number of states and total number of arcs in a tuple. If 'i=1' returns the number of states and if i=2
the number of arcs.
FiniteStateTransducers.isinitial
— Functionisinitial(fst::WFST, i)
Check if the i
-th state is an initial state.
FiniteStateTransducers.isfinal
— Functionisfinal(fst::WFST, i)
Check if the i
-th state is an final state.
FiniteStateTransducers.typeofweight
— Functiontypeofweight(fst_or_path)
Returns the weight type of a WFST or a path.
FiniteStateTransducers.has_eps
— Functionhas_eps([label::Function,] fst::FST)
Check if fst
has epsilon transition in either input or output labels. To specify input or output plug in either get_ilabel
or get_olabel
respectively in label
.
FiniteStateTransducers.count_eps
— Functioncount_eps(label::Function, fst::FST)
Returns number of epsilon transition of either input or output labels. To specify input or output plug in ilabel
and olabel
respectively in label
.
FiniteStateTransducers.is_acceptor
— Functionis_acceptor(fst::WFST)
Returns true
if fst
is an acceptor, i.e. if all arcs have equal input and output labels.
FiniteStateTransducers.is_acyclic
— Functionis_acyclic(fst::WFST[,i = get_initial(fst;single=true)]; kwargs...)
Returns the number of states of the fst. For kwargs
see DFS
.
FiniteStateTransducers.is_deterministic
— Functionis_deterministic(fst::WFST; get_label=get_ilabel)
Returns true
if fst
is deterministic in the input labels. A input label deterministic WFST must have a single initial state and arcs leaving any state do not share the same input label.
Change the keyword get_label
to get_olabel
in order to check determinism in the output labels.
Other constructors
FiniteStateTransducers.linearfst
— Functionlinearfst(ilabels,olabels,w,isym,[osym])
Creates a linear WFST. ilabels
, olabels
and w
are the input labels, output labels and weights respectively which must be vectors of the same size. Keyword arguments are the same of WFST
constructor.
FiniteStateTransducers.matrix2wfst
— Functionmatrix2wfst(isym,[osym,] X; W = TropicalWeight{Float32})
Creates a WFST with weight type W
and input output tables isym
and osym
using the matrix X
. If X
is a matrix of dimensions Ns
xNt
, the resulting fst will have Nt+1
states. Arcs form the t
-th state of fst
will go only to the t+1
-th state and will correspond to the non-zero element of the t
-th column of X
. State 1
and Nt+1
are initial and final state, respectively.