One thing you'll notice in papers describing generative models of documents with a Dirichlet prior is that they often simply fix the Dirichlet hyperparameter $\alpha$ that controls the distribution of topic mixtures for each document. This isn't ideal when you then wish to compute the probability of an unseen document, because a fixed $\alpha$ encodes no knowledge of the distribution of topic mixtures over documents in the training corpus.
In the appendix of the journal paper by Blei et al. [1], we find a procedure to learn the $\alpha$ that maximizes the following log-likelihood:

$$L(\alpha) = \sum_{d=1}^{M} \left( \log \Gamma\!\Big(\sum_{k=1}^{K} \alpha_k\Big) - \sum_{k=1}^{K} \log \Gamma(\alpha_k) + \sum_{k=1}^{K} (\alpha_k - 1) \log \theta_{d,k} \right)$$

where the $\theta_d$ are the document-specific topic mixtures and $\alpha$ is the Dirichlet hyperparameter.
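To make the objective concrete, here is a minimal sketch of this log-likelihood in Python; the names (log_likelihood, an (M, K) array theta whose rows sum to 1) are my own, not from the paper:

import numpy as np
from scipy.special import gammaln  # log of the Gamma function

def log_likelihood(alpha, theta):
    # theta: (M, K) document-topic mixtures; alpha: length-K vector
    M = theta.shape[0]
    # M * [log Gamma(sum_k alpha_k) - sum_k log Gamma(alpha_k)]
    normalizer = M * (gammaln(alpha.sum()) - gammaln(alpha).sum())
    # sum_d sum_k (alpha_k - 1) * log theta_{d,k}
    return normalizer + ((alpha - 1.0) * np.log(theta)).sum()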
The expression is maximized using the Newton-Raphson method, which requires computing the derivative of $L(\alpha)$ and the Hessian (the matrix of second-order derivatives), both with respect to $\alpha$.
Derivative of $L(\alpha)$
Making use of the digamma function $\Psi$, the partial derivative with respect to each component $\alpha_k$ of $\alpha$ gives

$$\frac{\partial L}{\partial \alpha_k} = M \left( \Psi\!\Big(\sum_{j=1}^{K} \alpha_j\Big) - \Psi(\alpha_k) \right) + \sum_{d=1}^{M} \log \theta_{d,k}$$
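This translates directly into code, assuming the same alpha and theta arrays as in the sketch above; scipy's psi is the digamma function:

from scipy.special import psi  # the digamma function

def gradient(alpha, theta):
    M = theta.shape[0]
    # g_k = M * (Psi(sum_j alpha_j) - Psi(alpha_k)) + sum_d log theta_{d,k}
    return M * (psi(alpha.sum()) - psi(alpha)) + np.log(theta).sum(axis=0)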
The Hessian
The Hessian, component-wise with respect to $\alpha_k$ and $\alpha_j$, is

$$\frac{\partial^2 L}{\partial \alpha_k \, \partial \alpha_j} = M \left( \Psi'\!\Big(\sum_{l=1}^{K} \alpha_l\Big) - \delta(k,j)\, \Psi'(\alpha_k) \right)$$

where $\delta(k,j)$ is the Kronecker delta and $\Psi'$ is the trigamma function. Note that this Hessian matrix can be written in the following form

$$H = \mathrm{diag}(h) + z\, \mathbf{1}\mathbf{1}^T$$

where $\mathrm{diag}(h)$ is the diagonal matrix containing the second-order derivative terms $h_k = -M\, \Psi'(\alpha_k)$ across the diagonal and zero elsewhere; $z = M\, \Psi'\big(\sum_{j=1}^{K} \alpha_j\big)$; and $\mathbf{1}$ is a vector of $1$s.
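In code, $h$ and $z$ come down to a couple of trigamma evaluations; scipy exposes the trigamma function as polygamma(1, x), and the function name below is my own:

from scipy.special import polygamma

def hessian_parts(alpha, M):
    # h_k = -M * Psi'(alpha_k): the diagonal of diag(h)
    h = -M * polygamma(1, alpha)
    # z = M * Psi'(sum_j alpha_j): coefficient of the rank-one 1 1^T term
    z = M * polygamma(1, alpha.sum())
    return h, z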
s. The inverse of a matrix of his form is given by the Matrix Inversion Lemma which states
In our case, with $A = \mathrm{diag}(h)$ and $uv^T = z\,\mathbf{1}\mathbf{1}^T$, the product $H^{-1} g$ needed for the Newton step is given component-wise by

$$(H^{-1} g)_k = \frac{g_k - c}{h_k}, \qquad c = \frac{\sum_{j=1}^{K} g_j / h_j}{z^{-1} + \sum_{j=1}^{K} h_j^{-1}}$$
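The Newton direction therefore reduces to a few vector operations; a sketch under the same assumptions as above:

def newton_direction(g, h, z):
    # c = (sum_j g_j / h_j) / (1/z + sum_j 1/h_j)
    c = (g / h).sum() / (1.0 / z + (1.0 / h).sum())
    # (H^{-1} g)_k = (g_k - c) / h_k
    return (g - c) / h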
The upside
We are now ready to compute the new guess for the next iteration in the Newton-Raphson method: $\alpha_{\text{new}} = \alpha_{\text{old}} - H^{-1}(\alpha_{\text{old}})\, g(\alpha_{\text{old}})$. Given the gradients $g_k$ evaluated at the old $\alpha_k$s, the new $\alpha_k$ are given by

$$\alpha_k^{\text{new}} = \alpha_k^{\text{old}} - \frac{g_k - c}{h_k}$$
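Putting the pieces together, one possible Newton-Raphson loop looks as follows. This is a sketch, not the paper's reference implementation; a practical version would also guard against steps that drive some $\alpha_k$ non-positive, e.g. by halving the step or iterating in log space:

def learn_alpha(alpha0, theta, n_iter=50):
    alpha, M = alpha0.copy(), theta.shape[0]
    for _ in range(n_iter):
        g = gradient(alpha, theta)
        h, z = hessian_parts(alpha, M)
        alpha = alpha - newton_direction(g, h, z)
    return alpha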
The reason for the special attention given to the form of the Hessian in this problem is that the update requires computing only the values $h_k$ and $z$, which amount to $K + 1$ values. This is only linear in $K$ (the dimension of $\alpha$), compared to the otherwise $O(K^2)$ values required for a full-blown matrix inversion of $H$.
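As a quick sanity check of that claim (entirely my own construction), we can build the full K x K Hessian once and confirm that the linear-time direction matches a dense solve:

K, M = 10, 100
rng = np.random.default_rng(0)
alpha = rng.uniform(0.5, 2.0, size=K)
theta = rng.dirichlet(alpha, size=M)
g = gradient(alpha, theta)
h, z = hessian_parts(alpha, M)
H = np.diag(h) + z * np.ones((K, K))
assert np.allclose(newton_direction(g, h, z), np.linalg.solve(H, g))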
[1] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. "Latent Dirichlet Allocation." Journal of Machine Learning Research 3: 993-1022.