Category Archives: statistics
Starting Probabilistic Document Retrieval
I want to work through some papers on probabilistic document retrieval, mainly to find out how deeply generative models have penetrated this area. Note that the literature refers to … Continue reading
Posted in modeling, statistics
Tagged documents, modeling, retrieval, statistics, topic
Reservoir Sampling
If you want to uniformly sample a handful of elements from a very large stream of data you probably don’t want to read it all into memory first. It would be ideal if you could sample while streaming the data. … Continue reading
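As a rough illustration of sampling while streaming, here is a minimal sketch of the classic single-pass approach (often called Algorithm R). It is not taken from the full post; the function name `reservoir_sample` and its parameters `stream`, `k`, and `rng` are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return k items drawn uniformly at random from a stream of unknown length.

    Hypothetical helper: only k items are held in memory at any time.
    If the stream has fewer than k items, all of them are returned.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1);
            # randint is inclusive on both ends.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, `reservoir_sample(open("big.log"), 10)` would keep ten lines of a hypothetical file `big.log` without loading it all into memory. Passing a seeded `random.Random` makes the draw reproducible.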
Regression-guided Generative Models
A generative model is pretty pointless on its own unless the generative structure itself holds intrinsic interest. Hence, papers justify their generative models either by comparing their predictive performance against another model or by extending the model to accommodate for … Continue reading
Topic Coherence
Evaluating unsupervised topic models is a tricky business. If the resulting model is not employed in retrieval, classification, or regression, there really is no way of convincing someone of the model’s worth. You may, rightly, say that there is no use … Continue reading
Starting Part-of-Speech Tagging
This is by no means the latest work on probabilistic part-of-speech tagging of documents, but it nevertheless provides a good starting point for looking at the basic model along with training and testing data. This paper [1] takes a … Continue reading
Adding (more Relaxed) Constraints during Model Inference
In the previous post on posterior regularization we saw how to specify constraints during the E-step of expectation maximization that would otherwise be difficult to incorporate into the model itself. The constraints took the following form where we specified our … Continue reading
Adding Constraints during Model Inference
Coming up with a probabilistic model and its inference procedure is only half the work, because a single run of the inference procedure is unlikely to give you a satisfactory answer. Out of the … Continue reading
Modeling and Indexing
It is generally accepted, and well tested in the real world, that simple models of indexing perform really well. They have no problems scaling or dealing with gigantic vocabularies. The biggest downside to them is that they can only match … Continue reading
Modeling atop a document representation
The paper “DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification” [1] describes a model that not only generates documents but learns them by associating each document with a label. The discrimination of a document is a function of the generative … Continue reading
Posted in optimization, statistics
Tagged discriminative, documents, generative, modeling, statistics
Entropic Priors
Dirichlet priors (used alone, as a mixture, or as a hierarchy) are by no means the only option for controlling the sparsity of topic mixtures. Entropic priors stand out as an interesting alternative. Given a probability distribution … Continue reading