One of the things you’ll notice in papers describing generative models of documents with a Dirichlet prior is that they simply fix the Dirichlet hyperparameter $\alpha$ that controls the distribution of topic mixtures for each document. This isn’t ideal when you then wish to compute the probability of an unseen document, because a fixed $\alpha$ encodes no knowledge of the distribution of topic mixtures over documents in the training corpus.

In the appendix of the journal paper by Blei et al. [1], we find a procedure to learn the $\alpha$ that maximizes the following log-likelihood, where the $\theta_d$ are the document-specific topic mixtures and $\alpha$ is the Dirichlet hyperparameter:

$$ L(\alpha) = \sum_{d=1}^{M} \left( \log \Gamma\Big(\sum_{k=1}^{K} \alpha_k\Big) - \sum_{k=1}^{K} \log \Gamma(\alpha_k) + \sum_{k=1}^{K} (\alpha_k - 1) \log \theta_{dk} \right) $$
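This objective is easy to transcribe directly. Here is a minimal Python sketch; the function and argument names are my own, and `thetas` holds one topic-mixture vector per document:

```python
import math

def dirichlet_log_likelihood(alpha, thetas):
    """Sum over documents of the Dirichlet log-density of each
    topic mixture `theta` under hyperparameter vector `alpha`."""
    ll = 0.0
    for theta in thetas:
        ll += math.lgamma(sum(alpha))                  # log Gamma(sum_k alpha_k)
        ll -= sum(math.lgamma(a) for a in alpha)       # - sum_k log Gamma(alpha_k)
        ll += sum((a - 1.0) * math.log(t) for a, t in zip(alpha, theta))
    return ll
```

With $\alpha = (1, 1)$ the density is uniform on the simplex, so the log-likelihood of any single document is exactly zero, which makes a handy sanity check.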

The expression is maximized using the Newton-Raphson method, which requires computing the derivative of $L(\alpha)$ and the Hessian (the matrix of second-order derivatives), both with respect to $\alpha$.

## Derivative of $L(\alpha)$

Making use of the digamma function $\Psi(x) = \frac{d}{dx} \log \Gamma(x)$, the partial derivative with respect to each component $\alpha_k$ of $\alpha$ gives

$$ g_k = \frac{\partial L}{\partial \alpha_k} = M \left( \Psi\Big(\sum_{j=1}^{K} \alpha_j\Big) - \Psi(\alpha_k) \right) + \sum_{d=1}^{M} \log \theta_{dk} $$
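Since the Python standard library has `math.lgamma` but no digamma, a sketch of the gradient has to bring its own; the helper below uses the standard recurrence plus an asymptotic series (all names here are my own, not from [1]):

```python
import math

def digamma(x):
    """Psi(x): shift the argument above 6 with Psi(x) = Psi(x+1) - 1/x,
    then apply the asymptotic expansion of the digamma function."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0/12 - f * (1.0/120 - f / 252))

def gradient(alpha, thetas):
    """The partial derivatives dL/dalpha_k given above."""
    M = len(thetas)
    psi_sum = digamma(sum(alpha))
    return [M * (psi_sum - digamma(a)) + sum(math.log(t[k]) for t in thetas)
            for k, a in enumerate(alpha)]
```

A useful check is to compare this against central finite differences of the log-likelihood itself.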

## The Hessian

The Hessian, component-wise with respect to $\alpha_k$ and $\alpha_j$, is

$$ \frac{\partial^2 L}{\partial \alpha_k \, \partial \alpha_j} = M \left( \Psi'\Big(\sum_{l=1}^{K} \alpha_l\Big) - \delta(k,j)\, \Psi'(\alpha_k) \right) $$

where $\delta(k,j)$ is the Kronecker delta and $\Psi'$ is the trigamma function (the derivative of the digamma). Note that this Hessian matrix can be written in the following form

$$ H = Q + \mathbf{1} z \mathbf{1}^\top $$

where $Q = \operatorname{diag}(-M\Psi'(\alpha_1), \ldots, -M\Psi'(\alpha_K))$ is the diagonal matrix containing the second-order derivatives with respect to the $\alpha_k$s across the diagonal and zeros elsewhere; $z = M\Psi'\big(\sum_{l} \alpha_l\big)$; and $\mathbf{1}$ is a vector of $1$s. The inverse of a matrix of this form is given by the *matrix inversion lemma* (in this rank-one case, the Sherman-Morrison formula), which states

$$ (Q + \mathbf{1} z \mathbf{1}^\top)^{-1} = Q^{-1} - \frac{Q^{-1} \mathbf{1} \mathbf{1}^\top Q^{-1}}{z^{-1} + \mathbf{1}^\top Q^{-1} \mathbf{1}} $$
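The lemma is easy to verify numerically for a diagonal-plus-rank-one matrix. A plain-Python sketch (hypothetical names; a real implementation would use a linear-algebra library):

```python
def diag_plus_rank_one_inverse(q, z):
    """Inverse of Q + 1 z 1^T via the matrix inversion lemma, where the
    diagonal matrix Q is given as the list `q` of its diagonal entries."""
    K = len(q)
    qinv = [1.0 / qk for qk in q]
    denom = 1.0 / z + sum(qinv)          # z^{-1} + 1^T Q^{-1} 1
    # (Q^{-1} 1 1^T Q^{-1})_{kj} = qinv[k] * qinv[j]
    return [[(qinv[k] if k == j else 0.0) - qinv[k] * qinv[j] / denom
             for j in range(K)] for k in range(K)]
```

Multiplying the result back against $Q + \mathbf{1} z \mathbf{1}^\top$ recovers the identity matrix to machine precision.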

In our case, the product $H^{-1} g$ that Newton-Raphson needs is given component-wise by

$$ (H^{-1} g)_k = \frac{g_k - c}{Q_{kk}}, \qquad c = \frac{\sum_{j=1}^{K} g_j / Q_{jj}}{z^{-1} + \sum_{j=1}^{K} Q_{jj}^{-1}} $$

## The upside

We are now ready to compute the new guess for the next iteration in the Newton-Raphson method: $\alpha_{\text{new}} = \alpha_{\text{old}} - H^{-1}(\alpha_{\text{old}})\, g(\alpha_{\text{old}})$. Given the gradients $g_k$ evaluated at the old $\alpha_k$s, the new $\alpha_k$ are given by

$$ \alpha_{\text{new},k} = \alpha_{\text{old},k} - \frac{g_k - c}{Q_{kk}} $$

The reason for the special attention given to the form of the Hessian in this problem is that the update requires computing only the values $Q_{kk}$ and $c$, which amounts to $K + 1$ values, i.e. only *linear* in $K$ (the dimension of $\alpha$), compared to the $O(K^3)$ operations otherwise required for a full-blown matrix inversion of $H$.
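Putting the pieces together, here is a sketch of the full linear-time Newton-Raphson update in stdlib Python. The function names, the iteration cap, and the step-halving guard that keeps every $\alpha_k$ positive are my own additions, not part of the procedure in [1]:

```python
import math

def digamma(x):
    """Psi(x): shift above 6 with the recurrence, then the asymptotic series."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0/12 - f * (1.0/120 - f / 252))

def trigamma(x):
    """Psi'(x): same shift-then-series approach."""
    r = 0.0
    while x < 6.0:
        r += 1.0 / (x * x)
        x += 1.0
    f = 1.0 / (x * x)
    return r + 1.0 / x + 0.5 * f + (f / x) * (1.0/6 - f * (1.0/30 - f / 42))

def newton_update(alpha, thetas):
    """One step alpha_new = alpha_old - H^{-1} g, with H^{-1} g computed
    in O(K) via the diagonal-plus-rank-one structure of the Hessian."""
    M, K = len(thetas), len(alpha)
    s = sum(alpha)
    g = [M * (digamma(s) - digamma(alpha[k]))
         + sum(math.log(t[k]) for t in thetas) for k in range(K)]
    q = [-M * trigamma(alpha[k]) for k in range(K)]   # diagonal entries Q_kk
    z = M * trigamma(s)
    c = sum(g[k] / q[k] for k in range(K)) / (1.0 / z + sum(1.0 / qk for qk in q))
    return [alpha[k] - (g[k] - c) / q[k] for k in range(K)]

def fit_alpha(thetas, alpha0, iters=50):
    """Iterate Newton-Raphson, halving the step whenever the raw update
    would take some alpha_k out of the positive domain."""
    alpha = list(alpha0)
    for _ in range(iters):
        new = newton_update(alpha, thetas)
        t = 1.0
        while any(a + t * (n - a) <= 0.0 for a, n in zip(alpha, new)):
            t *= 0.5
        alpha = [a + t * (n - a) for a, n in zip(alpha, new)]
    return alpha
```

On topic mixtures sampled from a known Dirichlet (e.g. via `random.gammavariate` and normalization), `fit_alpha` recovers the generating hyperparameter to within sampling error in a handful of iterations.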

[1] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. “Latent Dirichlet Allocation.” *Journal of Machine Learning Research* 3: 993–1022.