Category Archives: statistics
Modeling and Indexing
It has been well tested in the real world, and is generally accepted, that simple models of indexing perform really well. They have no problems scaling or dealing with gigantic vocabularies. The biggest downside to them is that they can only match … Continue reading
Modeling atop a document representation
The paper “DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification” [1] describes a model that not only generates documents but learns them by associating each document with a label. The discrimination of a document is a function of the generative … Continue reading
Entropic Priors
Dirichlet priors (used alone, as a mixture, or in a hierarchy) are by no means the only option for controlling the sparsity of topic mixtures. Entropic priors stand out as an interesting alternative. Given a probability distribution … Continue reading
Optimizing the Dirichlet hyperparameters
One of the things you’ll notice in papers describing generative models of documents with a Dirichlet prior is that they simply fix the Dirichlet hyperparameter that controls the distribution of topic mixtures for each document. This isn’t ideal when you wish … Continue reading
