Computing gradients
This package uses the ChainRulesCore.jl API, extending its rrule
function. Specifically, the following cost functions can be differentiated.
HMMGradients.nlogML — Function

nlogML(Nt,a,A,y)

Computes the negative log Maximum Likelihood (ML) normalized by the sequence length ($-\log(P(\mathbf{X}))/N_t$) of a Hidden Markov Model (HMM) with Ns states, transition matrix A (an Ns × Ns matrix), initial probabilities a (an Ns-vector), and observation probabilities y (an Nt × Ns matrix) for Nt observations. All Arrays must have the same element type.
nlogML(Nt,t2tr,A,y)

Returns the negative log Maximum Likelihood (ML) normalized by the sequence length ($-\log(P(\mathbf{X}))/N_t$) with a constrained path (see forward).
HMMGradients.nlogMLlog — Function

nlogMLlog(Nt,A,a,y)

Same as nlogML but expects A, a, and y to be in the log domain.
For example, if we want to get derivatives for the following model:
julia> using HMMGradients, ChainRulesCore
julia> a = [1.0,0.0];
julia> A = [0.5 0.5; 0.5 0.5];
julia> y = [0.1 0.7; 0.5 0.8; 0.9 0.3];
julia> Ns,Nt = size(A,1), size(y,1);
julia> cost = nlogML(Nt,a,A,y)
1.0813978776174968
all we need to do is use rrule, which returns the cost and the pullback function:
julia> cost, pullback_nlogML = ChainRulesCore.rrule(nlogML, Nt, a, A, y);
julia> _, _, grada, gradA, grady = pullback_nlogML(1.0);
julia> [println(grad) for grad in [grada,gradA,grady]];
[-0.3333333333333333, -2.3333333333333335]
[-0.4487179487179487 -0.4743589743589744; -0.30769230769230776 -0.10256410256410257]
[-3.3333333333333335 -0.0; -0.2564102564102564 -0.2564102564102564; -0.2777777777777778 -0.2777777777777778]
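Since the rule is registered through ChainRulesCore, automatic differentiation frameworks that consume ChainRules rules, such as Zygote, should pick it up without calling rrule by hand. A hypothetical sketch:

julia> using Zygote

julia> za, zA, zy = Zygote.gradient((a, A, y) -> nlogML(Nt, a, A, y), a, A, y); # gradients w.r.t. a, A, y

The results should match the grada, gradA, and grady obtained from the pullback above.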
When only specific paths through the HMM are allowed, only the gradient with respect to $y$ is computed. Here t2tr specifies, for each time step, the allowed transitions as a pair of vectors, pairing the i-th source state with the i-th destination state (see forward).
julia> t2tr = [[1,1]=>[1,2],[1,2]=>[2,2]];
julia> cost, pullback_nlogML = ChainRulesCore.rrule(nlogML, Nt, t2tr, A, y);
julia> _, _, grada, gradA, grady = pullback_nlogML(1.0);
julia> @assert grada == gradA == ChainRulesCore.NoTangent();
julia> grady
3×2 Matrix{Float64}:
-3.33333 -0.0
-0.25641 -0.25641
-0.0 -1.11111
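The zero pattern mirrors the constraint: the gradient is nonzero only for states lying on an allowed path. A quick check (a sketch using Base's findall, not part of the package API):

julia> findall(!iszero, grady)
4-element Vector{CartesianIndex{2}}:
 CartesianIndex(1, 1)
 CartesianIndex(2, 1)
 CartesianIndex(2, 2)
 CartesianIndex(3, 2)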
It is also possible to perform these operations in batches. Say we have Nb sequences of different lengths; then size(yb) == (Nt_max,Ns,Nb), where Nt_max is the maximum sequence length. For example:
julia> Nb = 2; # number of sequences
julia> Nts = [3,4]; # sequences length
julia> ys = [[0.1 0.7; 0.5 0.8; 0.9 0.3], [0.5 0.3; 0.7 0.7; 0.1 0.1; 0.5 0.0]];
julia> Nt_max = maximum(size.(ys,1)); # calculate maximum seq length
julia> yb = zeros(Nt_max,Ns,Nb);
julia> for b in 1:Nb yb[1:Nts[b],:,b] .= ys[b] end; # yb has zeropadded inputs
julia> t2trs = [ [[1,1]=>[1,2],[1,2]=>[2,2]], [[2]=>[2],[2]=>[2],[2]=>[1]] ];
julia> @assert all(length.(t2trs) .== Nts .-1);
julia> cost, pullback_nlogML = ChainRulesCore.rrule(nlogML, Nts, t2trs, A, yb);
julia> _, _, _, _, grady = pullback_nlogML(1.0);
julia> grady
4×2×2 Array{Float64, 3}:
[:, :, 1] =
-3.33333 -0.0
-0.25641 -0.25641
-0.0 -1.11111
-0.0 -0.0
[:, :, 2] =
-0.0 -0.833333
-0.0 -0.357143
-0.0 -2.5
-0.5 -0.0
Batch computation is also available for the unconstrained case. In this case a and A must be of size (Ns,Nb) and (Ns,Ns,Nb), respectively.
The same computations can be performed in the log domain using nlogMLlog after applying log to a, A, and y.
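For example, reusing the variables from the first example (a minimal sketch; the argument order follows the nlogMLlog signature above, and the result should match the linear-domain cost):

julia> cost_log = nlogMLlog(Nt, log.(A), log.(a), log.(y)); # should equal nlogML(Nt,a,A,y)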