WFSTs internals

Formal definition

Formally a WFSTs over the semiring $\mathbb{W}$ is the tuple $(\mathcal{A},\mathcal{B},Q,I,F,E,\lambda,\rho)$ where:

  • $\mathcal{A}$ is the input alphabet (set of input labels)
  • $\mathcal{B}$ is the output alphabet (set of output labels)
  • $Q$ is a set of states (usually integers)
  • $I \subseteq Q$ is the set of initial states
  • $F \subseteq Q$ is the set of final states
  • $E \subseteq Q \times \mathcal{A} \cup \{ \epsilon \} \times \mathcal{B} \cup \{ \epsilon \} \times \mathbb{W} \times Q$ is a set of arcs (transitions) where an element consist of the tuple (starting state,input label, output label, weigth, destination state)
  • $\lambda : I \rightarrow \mathbb{W}$ a function that maps initial states to a weight
  • $\rho : F \rightarrow \mathbb{W}$ a function that maps final states to a weight

Constructors and modifiers

FiniteStateTransducers.WFSTType

WFST(isym, [osym]; W = TropicalWeight{Float32})

Construct an empty Weighted Finite State Transducer (WFST)

  • isym input symbol table (can be a dictionary or a vector)
  • osym output symbol dictionary (optional)
  • W weight type

If the symbol table are AbstractDict{I,D}, D must be an integer and I the type of the symbol.

source
FiniteStateTransducers.add_arc!Function

add_arc!(fst, src, dest, ilabel, olabel[, w=one(W)])

add_arc!(fst, srcdest::Pair, ilabelolabel::Pair[, w=one(W)])

Adds an arc from state src to state dest with input label ilabel, output label olabel and weight w. Alternative notation utilizes Pairs.

If w is not provided this defaults to one(W) where W is the weight type of fst.

source

Internal access

FiniteStateTransducers.get_isymFunction

get_isym(fst::WFST)

Returns the input symbol table of fst. The table consists of a dictionary where each element goes from the symbol to the corresponding index.

source
FiniteStateTransducers.get_osymFunction

get_osym(fst::WFST)

Returns the output symbol table of fst. The table consists of a dictionary where each element goes from the symbol to the corresponding index.

source
FiniteStateTransducers.get_ialphabetFunction

get_ialphabet(label::fst)

Returns the alphabet of the input labels of fst. The alphabet is defined as the set of symbols composing the input or output labels.

source
FiniteStateTransducers.get_oalphabetFunction

get_oalphabet(label::fst)

Returns the alphabet of the output labels of fst. The alphabet is defined as the set of symbols composing the input or output labels.

source

Arcs

FiniteStateTransducers.ArcType

Arc(ilab::Int,olab::Int,w::W,n::Int)

Constructs an arc with input label ilab, output label olab, weight 'weight' and nextstate n. ilab and olab must be integers consistent with a input/output symbol table of the WFST at use.

Mainly used for internals see add_arc! for a simpler way of adding arcs to a WFST.

source

Paths

FiniteStateTransducers.PathType

Path(isym[,osym], iolabel::Pair, w=one(TropicalWeight{Float32}))

Construct a path with input and output symbol table isym and (optional) osym (see WFST).

iolabel is a Pair of vectors.

julia> isym = ["a","b","c","d"];

julia> osym = ["α","β","γ","δ"];

julia> W = ProbabilityWeight{Float32};

julia> p = Path(isym,osym,["a","b","c"] => ["α","β","γ"], one(W))
["a", "b", "c"] → ["α", "β", "γ"], w:1.0f0

The weight of a path of a WFST results from the multiplication ($\otimes$) of the weights of the different arcs that are transversed. See e.g. get_paths to extract paths from a WFST.

julia> p * Path(isym,osym,["d"] => ["δ"], W(0.5))
["a", "b", "c", "d"] → ["α", "β", "γ", "δ"], w:0.5f0
source

Tour of the internals

Let's build a simple WFST and check out its internals:

julia> using FiniteStateTransducers

julia> A = WFST(["hello","world"],[:ciao,:mondo]);

julia> add_arc!(A,1=>2,"hello"=>:ciao);

julia> add_arc!(A,1=>3,"world"=>:mondo);

julia> add_arc!(A,2=>3,"world"=>:mondo);

julia> initial!(A,1);

julia> final!(A,3)
WFST #states: 3, #arcs: 3, #isym: 2, #osym: 2
|1/0.0f0|
hello:ciao/0.0f0 → (2)
world:mondo/0.0f0 → (3)
(2)
world:mondo/0.0f0 → (3)
((3/0.0f0))

For this simple WFST the states consists of an Array of Arrays containing Arc:

julia> get_states(A)
3-element Array{Array{FiniteStateTransducers.Arc{TropicalWeight{Float32},Int64},1},1}:
 [1:1/0.0f0 → (2), 2:2/0.0f0 → (3)]
 [2:2/0.0f0 → (3)]
 []

As it can be seen the first state has two arcs, second state only one and the final state none. A state can also be accessed as follows:

julia> A[2]
1-element Array{FiniteStateTransducers.Arc{TropicalWeight{Float32},Int64},1}:
 2:2/0.0f0 → (3)

Here the arc's input/output labels are displayed as integers. We would expect world:mondo/0.0f0 → (3) instead of 2:2/0.0f0 → (3). This is due to fact that, contrary to the formal definition, labels are not stored directly in the arcs but an index is used instead, which corresponds to the input/output symbol table:

julia> get_isym(A)
Dict{String,Int64} with 2 entries:
  "hello" => 1
  "world" => 2

julia> get_osym(A)
Dict{Symbol,Int64} with 2 entries:
  :mondo => 2
  :ciao  => 1

Another difference is in the definition of $I$, $F$, $\lambda$ and $\rho$. These are also represented by dictionaries that can be accessed using the functions get_initial and get_final.

julia> get_final(A)
Dict{Int64,TropicalWeight{Float32}} with 1 entry:
  3 => 0.0

Epsilon label

FiniteStateTransducers.isepsFunction

iseps(x)

Checks if x is the epsilon label. Defined only for String and Int, where <eps> and 0 are the epsilon symbols. Extend this method if you want to create WFST with a user defined type.

source
FiniteStateTransducers.get_epsFunction

get_eps(x::Type)

Returns the epsilon label for a given Type. Defined only for String and Int, where <eps> and 0 are the epsilon symbols. Extend this method if you want to create WFST with a user defined type.

source

By default 0 indicates the epsilon label which is not present in the symbol table.

Currently the following epsilon symbols are reserved for the following types:

TypeLabel
Char'ϵ'
String"<eps>"
Int0

We can define a particular type by extending the functions iseps and get_eps. For example if we want to introduce an epsilon symbol for the type Symbol:

julia> FiniteStateTransducers.iseps(x::Symbol) = x == :eps;

julia> FiniteStateTransducers.get_eps(x::Type{Symbol}) = :eps;

julia> add_arc!(A,3=>3,"world"=>:eps)
WFST #states: 3, #arcs: 5, #isym: 2, #osym: 2
|1/0.0f0|
hello:ciao/0.0f0 → (2)
world:mondo/0.0f0 → (3)
(2)
world:mondo/0.0f0 → (3)
((3/0.0f0))
world:ϵ/0.0f0 → (3)

julia> A[3]
1-element Array{Arc{TropicalWeight{Float32},Int64},1}:
 2:0/0.0f0 → (3)

Properties

Base.lengthFunction

length(fst::WFST)

Returns the number of states of the fst.

source
Base.sizeFunction

size(fst::WFST,[i])

Returns number of states and total number of arcs in a tuple. If 'i=1' returns the number of states and if i=2 the number of arcs.

source
FiniteStateTransducers.has_epsFunction

has_eps([label::Function,] fst::FST)

Check if fst has epsilon transition in either input or output labels. To specify input or output plug in either get_ilabel or get_olabel respectively in label.

source
FiniteStateTransducers.count_epsFunction

count_eps(label::Function, fst::FST)

Returns number of epsilon transition of either input or output labels. To specify input or output plug in ilabel and olabel respectively in label.

source
FiniteStateTransducers.is_deterministicFunction

is_deterministic(fst::WFST; get_label=get_ilabel)

Returns true if fst is deterministic in the input labels. A input label deterministic WFST must have a single initial state and arcs leaving any state do not share the same input label.

Change the keyword get_label to get_olabel in order to check determinism in the output labels.

source

Other constructors

FiniteStateTransducers.linearfstFunction

linearfst(ilabels,olabels,w,isym,[osym])

Creates a linear WFST. ilabels, olabels and w are the input labels, output labels and weights respectively which must be vectors of the same size. Keyword arguments are the same of WFST constructor.

source
FiniteStateTransducers.matrix2wfstFunction

matrix2wfst(isym,[osym,] X; W = TropicalWeight{Float32})

Creates a WFST with weight type W and input output tables isym and osym using the matrix X. If X is a matrix of dimensions NsxNt, the resulting fst will have Nt+1 states. Arcs form the t-th state of fst will go only to the t+1-th state and will correspond to the non-zero element of the t-th column of X. State 1 and Nt+1 are initial and final state, respectively.

source