WFSTs internals

Formal definition

Formally a WFSTs over the semiring $\mathbb{W}$ is the tuple $(\mathcal{A},\mathcal{B},Q,I,F,E,\lambda,\rho)$ where:

  • $\mathcal{A}$ is the input alphabet (set of input labels)
  • $\mathcal{B}$ is the output alphabet (set of output labels)
  • $Q$ is a set of states (usually integers)
  • $I \subseteq Q$ is the set of initial states
  • $F \subseteq Q$ is the set of final states
  • $E \subseteq Q \times \mathcal{A} \cup \{ \epsilon \} \times \mathcal{B} \cup \{ \epsilon \} \times \mathbb{W} \times Q$ is a set of arcs (transitions) where an element consist of the tuple (starting state,input label, output label, weigth, destination state)
  • $\lambda : I \rightarrow \mathbb{W}$ a function that maps initial states to a weight
  • $\rho : F \rightarrow \mathbb{W}$ a function that maps final states to a weight

Constructors and modifiers

FiniteStateTransducers.WFSTType

WFST(isym, [osym]; W = TropicalWeight{Float32})

Construct an empty Weighted Finite State Transducer (WFST)

  • isym input symbol table (can be a dictionary or a vector)
  • osym output symbol dictionary (optional)
  • W weight type

If the symbol table are AbstractDict{I,D}, D must be an integer and I the type of the symbol.

source
FiniteStateTransducers.add_arc!Function

add_arc!(fst, src, dest, ilabel, olabel[, w=one(W)])

add_arc!(fst, srcdest::Pair, ilabelolabel::Pair[, w=one(W)])

Adds an arc from state src to state dest with input label ilabel, output label olabel and weight w. Alternative notation utilizes Pairs.

If w is not provided this defaults to one(W) where W is the weight type of fst.

source

Internal access

FiniteStateTransducers.get_isymFunction

get_isym(fst::WFST)

Returns the input symbol table of fst. The table consists of a dictionary where each element goes from the symbol to the corresponding index.

source
FiniteStateTransducers.get_osymFunction

get_osym(fst::WFST)

Returns the output symbol table of fst. The table consists of a dictionary where each element goes from the symbol to the corresponding index.

source
FiniteStateTransducers.get_ialphabetFunction

get_ialphabet(label::fst)

Returns the alphabet of the input labels of fst. The alphabet is defined as the set of symbols composing the input or output labels.

source
FiniteStateTransducers.get_oalphabetFunction

get_oalphabet(label::fst)

Returns the alphabet of the output labels of fst. The alphabet is defined as the set of symbols composing the input or output labels.

source

Arcs

FiniteStateTransducers.ArcType

Arc(ilab::Int,olab::Int,w::W,n::Int)

Constructs an arc with input label ilab, output label olab, weight 'weight' and nextstate n. ilab and olab must be integers consistent with a input/output symbol table of the WFST at use.

Mainly used for internals see add_arc! for a simpler way of adding arcs to a WFST.

source

Paths

FiniteStateTransducers.PathType

Path(isym[,osym], iolabel::Pair, w=one(TropicalWeight{Float32}))

Construct a path with input and output symbol table isym and (optional) osym (see WFST).

iolabel is a Pair of vectors.

julia> isym = ["a","b","c","d"];

julia> osym = ["α","β","γ","δ"];

julia> W = ProbabilityWeight{Float32};

julia> p = Path(isym,osym,["a","b","c"] => ["α","β","γ"], one(W))
["a", "b", "c"] → ["α", "β", "γ"], w:1.0f0

The weight of a path of a WFST results from the multiplication ($\otimes$) of the weights of the different arcs that are transversed. See e.g. get_paths to extract paths from a WFST.

julia> p * Path(isym,osym,["d"] => ["δ"], W(0.5))
["a", "b", "c", "d"] → ["α", "β", "γ", "δ"], w:0.5f0
source

Tour of the internals

Let's build a simple WFST and check out its internals:

julia> using FiniteStateTransducers

julia> A = WFST(["hello","world"],[:ciao,:mondo]);

julia> add_arc!(A,1=>2,"hello"=>:ciao);

julia> add_arc!(A,1=>3,"world"=>:mondo);

julia> add_arc!(A,2=>3,"world"=>:mondo);

julia> initial!(A,1);

julia> final!(A,3)
WFST #states: 3, #arcs: 3, #isym: 2, #osym: 2
|1/0.0f0|
hello:ciao/0.0f0 → (2)
world:mondo/0.0f0 → (3)
(2)
world:mondo/0.0f0 → (3)
((3/0.0f0))

For this simple WFST the states consists of an Array of Arrays containing Arc:

julia> get_states(A)
3-element Array{Array{FiniteStateTransducers.Arc{TropicalWeight{Float32},Int64},1},1}:
 [1:1/0.0f0 → (2), 2:2/0.0f0 → (3)]
 [2:2/0.0f0 → (3)]
 []

As it can be seen the first state has two arcs, second state only one and the final state none. A state can also be accessed as follows:

julia> A[2]
1-element Array{FiniteStateTransducers.Arc{TropicalWeight{Float32},Int64},1}:
 2:2/0.0f0 → (3)

Here the arc's input/output labels are displayed as integers. We would expect world:mondo/0.0f0 → (3) instead of 2:2/0.0f0 → (3). This is due to fact that, contrary to the formal definition, labels are not stored directly in the arcs but an index is used instead, which corresponds to the input/output symbol table:

julia> get_isym(A)
Dict{String,Int64} with 2 entries:
  "hello" => 1
  "world" => 2

julia> get_osym(A)
Dict{Symbol,Int64} with 2 entries:
  :mondo => 2
  :ciao  => 1

Another difference is in the definition of $I$, $F$, $\lambda$ and $\rho$. These are also represented by dictionaries that can be accessed using the functions get_initial and get_final.

julia> get_final(A)
Dict{Int64,TropicalWeight{Float32}} with 1 entry:
  3 => 0.0

Epsilon label

FiniteStateTransducers.isepsFunction

iseps(x)

Checks if x is the epsilon label. Defined only for String and Int, where <eps> and 0 are the epsilon symbols. Extend this method if you want to create WFST with a user defined type.

source
FiniteStateTransducers.get_epsFunction

get_eps(x::Type)

Returns the epsilon label for a given Type. Defined only for String and Int, where <eps> and 0 are the epsilon symbols. Extend this method if you want to create WFST with a user defined type.

source

By default 0 indicates the epsilon label which is not present in the symbol table.

Currently the following epsilon symbols are reserved for the following types:

TypeLabel
Char'ϵ'
String"<eps>"
Int0

We can define a particular type by extending the functions iseps and get_eps. For example if we want to introduce an epsilon symbol for the type Symbol:

julia> FiniteStateTransducers.iseps(x::Symbol) = x == :eps;

julia> FiniteStateTransducers.get_eps(x::Type{Symbol}) = :eps;

julia> add_arc!(A,3=>3,"world"=>:eps)
WFST #states: 3, #arcs: 5, #isym: 2, #osym: 2
|1/0.0f0|
hello:ciao/0.0f0 → (2)
world:mondo/0.0f0 → (3)
(2)
world:mondo/0.0f0 → (3)
((3/0.0f0))
world:ϵ/0.0f0 → (3)

julia> A[3]
1-element Array{Arc{TropicalWeight{Float32},Int64},1}:
 2:0/0.0f0 → (3)

Properties

Base.lengthFunction

length(fst::WFST)

Returns the number of states of the fst.

source
Base.sizeFunction

size(fst::WFST,[i])

Returns number of states and total number of arcs in a tuple. If 'i=1' returns the number of states and if i=2 the number of arcs.

source
FiniteStateTransducers.has_epsFunction

has_eps([label::Function,] fst::FST)

Check if fst has epsilon transition in either input or output labels. To specify input or output plug in either get_ilabel or get_olabel respectively in label.

source
FiniteStateTransducers.count_epsFunction

count_eps(label::Function, fst::FST)

Returns number of epsilon transition of either input or output labels. To specify input or output plug in ilabel and olabel respectively in label.

source
FiniteStateTransducers.is_deterministicFunction

is_deterministic(fst::WFST; get_label=get_ilabel)

Returns true if fst is deterministic in the input labels. A input label deterministic WFST must have a single initial state and arcs leaving any state do not share the same input label.

Change the keyword get_label to get_olabel in order to check determinism in the output labels.

source

Other constructors

FiniteStateTransducers.linearfstFunction

linearfst(ilabels,olabels,w,[isym,[osym=isym]]; W=TropicalWeight{Float32})

Creates a linear WFST. ilabels, olabels and w are the input labels, output labels and weights respectively which must be vectors of the same size. If input and output tables isym and osym are not provided, the symbol tables are derived from ilabels and olabels.

julia> A = linearfst([1,1,3,4],[:a,:b,:c,:d],ones(TropicalWeight{Float64},4))
WFST #states: 5, #arcs: 4, #isym: 3, #osym: 4
|1/0.0|
1:a/0.0 → (2)
(2)
1:b/0.0 → (3)
(3)
3:c/0.0 → (4)
(4)
4:d/0.0 → (5)
((5/0.0))
source
FiniteStateTransducers.matrix2wfstFunction

matrix2wfst(isym,[osym,] X; W = TropicalWeight{Float32})

Creates a WFST with weight type W and input output tables isym and osym using the matrix X. If X is a matrix of dimensions NsxNt, the resulting WFST will have Nt+1 states. Arcs form the t-th state of fst will go only to the t+1-th state and will correspond to the non-zero element of the t-th column of X. State 1 and Nt+1 are labelled as initial and final state, respectively.

julia> matrix2wfst(["a","b","c"],[1 2 3; 1 2 3; 1 2 3])
WFST #states: 4, #arcs: 9, #isym: 3, #osym: 3
|1/0.0f0|
a:a/1.0f0 → (2)
b:b/1.0f0 → (2)
c:c/1.0f0 → (2)
(2)
a:a/2.0f0 → (3)
b:b/2.0f0 → (3)
c:c/2.0f0 → (3)
(3)
a:a/3.0f0 → (4)
b:b/3.0f0 → (4)
c:c/3.0f0 → (4)
((4/0.0f0))
source