I’m going to sketch out the next few things to plan and code for the DSL library. So far I have provided the ability to describe the network, but I haven’t yet provided a way to describe the distributions. For instance, in the HMM model, we ought to be able to say what the distribution of Symbols is, and so on. I’ll start with the ability to pick between two distributions: Dirichlet and Multinomial. We can cover many models with just these two. Once I provide a way to specify which type of distribution each node has, I should be able to change the distributions at will without affecting the network; for instance, using a continuous response as opposed to a discrete response in an HMM.
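To make the pairing concrete, here is a standalone numpy sketch (not the library’s API; the prior shapes and the 3-state/4-symbol sizes are hypothetical) of how a Dirichlet prior generates the Multinomial parameters for an HMM’s nodes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical priors for an HMM with 3 hidden states and 4 output symbols.
transition_prior = np.ones(3)  # symmetric Dirichlet over next-state probabilities
symbol_prior = np.ones(4)      # symmetric Dirichlet over emission probabilities

# Draw one Multinomial parameter vector per hidden state from each prior.
transition_probs = rng.dirichlet(transition_prior, size=3)  # shape (3, 3)
symbol_probs = rng.dirichlet(symbol_prior, size=3)          # shape (3, 4)

# Each row is a valid categorical distribution over states or symbols.
assert np.allclose(transition_probs.sum(axis=1), 1.0)
assert np.allclose(symbol_probs.sum(axis=1), 1.0)
```

Swapping the distribution of a node (say, a Gaussian emission instead of a Multinomial one) would change only these draws, not the network structure, which is the point of keeping the two descriptions separate.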
After this, I will want to create a function in the Gibbs module that can take in a Reader and sample the distributions. In the case of the HMM, this would mean sampling the Transition distributions and the Symbols distributions by reading the network to figure out their priors and support.
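Because the Dirichlet is conjugate to the Multinomial, sampling a distribution given the current state of the network reduces to drawing from a Dirichlet whose parameters are the prior plus the observed counts. A minimal sketch of that step (the function name and count layout are hypothetical, not the Gibbs module’s actual interface):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_conditional(prior, counts, rng):
    """Resample one Multinomial parameter vector per row of `counts`
    from its Dirichlet posterior: Dirichlet(prior + counts[i])."""
    return np.vstack([rng.dirichlet(prior + counts[i])
                      for i in range(counts.shape[0])])

# Hypothetical transition counts: counts[i, j] is how often state i
# was followed by state j in the current assignment of latent variables.
prior = np.ones(3)
counts = np.array([[10, 0, 0],
                   [0, 10, 0],
                   [0, 0, 10]])
transition = sample_conditional(prior, counts, rng)
# Rows with heavy counts concentrate near the observed outcome.
```

Reading the network supplies exactly the two ingredients this function needs: the prior attached to each node and the support over which to tally counts.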
Finally, with the sampled distributions and a Reader, I will write a sampler that produces a new Reader. In the case of the HMM, this means sampling new Topic variables. These steps cover the (uncollapsed) Gibbs sampling technique.
Looking ahead even further, I intend to write a method to compute the density of the observed variables (having marginalized out the latent variables). I will do this using the annealed importance sampling method described in the paper “Evaluation Methods for Topic Models” by Hanna M. Wallach et al. In the case of the HMM, this amounts to computing the probability of the Symbol observations given the Symbols and Transition distributions while marginalizing out the Topic variables.
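The core of annealed importance sampling (Neal’s method, which the Wallach et al. paper applies to topic models) can be shown on a toy problem: estimating the normalizing constant Z of an unnormalized density by annealing from a distribution whose normalizer is known. The sketch below is purely illustrative, with a 1-D Gaussian target (f(x) = exp(−x²/8), so Z = 2√(2π)) standing in for the HMM’s marginal likelihood; the schedule length and chain count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)

def log_f(x):   # unnormalized target: Gaussian with sigma = 2, Z = 2*sqrt(2*pi)
    return -x**2 / 8.0

def log_p0(x):  # normalized starting distribution: standard normal
    return -x**2 / 2.0 - 0.5 * np.log(2 * np.pi)

betas = np.linspace(0.0, 1.0, 100)   # annealing schedule from p0 to f
n_chains = 2000
x = rng.standard_normal(n_chains)    # exact samples from p0
log_w = np.zeros(n_chains)           # log importance weights

for b_prev, b in zip(betas[:-1], betas[1:]):
    # accumulate the incremental weight f_b(x) / f_{b_prev}(x)
    log_w += (b - b_prev) * (log_f(x) - log_p0(x))
    # one Metropolis step targeting the intermediate distribution p0^(1-b) f^b
    prop = x + rng.standard_normal(n_chains)
    log_acc = ((1 - b) * log_p0(prop) + b * log_f(prop)
               - (1 - b) * log_p0(x) - b * log_f(x))
    accept = np.log(rng.random(n_chains)) < log_acc
    x = np.where(accept, prop, x)

# log Z estimate: log of the mean importance weight (log-sum-exp for stability)
log_Z = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()
# true value: log(2 * sqrt(2 * pi)) ≈ 1.612
```

For the HMM, the same machinery would anneal over the Topic variables, with the Symbols and Transition distributions held fixed, to estimate the marginal probability of the observed Symbol sequence.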