Dan Piponi has written up a simple-to-follow derivation of the Expectation-Maximization (EM) algorithm. It gives a very practical derivation of the algorithm which also makes it easy to remember.
What it clarifies for me is the step in the EM algorithm where one introduces auxiliary variables – one for each value the hidden variable can take on – which somehow turn out to be the conditional probability of the hidden variable given everything else. Why this turns out to be the case has always been a little fuzzy to me, and Dan’s post clarifies it greatly. The step that determines the auxiliary variables comes from equating the derivative of the log-likelihood with the derivative of a simpler function involving the auxiliary variables, and solving for them. Please have a read.
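To make that step a little more concrete, here is a rough sketch of the calculation in my own notation (one weight $q_z$ per hidden value $z$; this may not match the notation in Dan’s post). The log-likelihood of the observed data $x$ under parameters $\theta$, and its derivative, are

$$\ell(\theta) = \log \sum_z p(x, z \mid \theta), \qquad \frac{\partial \ell}{\partial \theta} = \frac{\sum_z \frac{\partial}{\partial \theta}\, p(x, z \mid \theta)}{\sum_{z'} p(x, z' \mid \theta)},$$

while the simpler surrogate function and its derivative are

$$Q(\theta) = \sum_z q_z \log p(x, z \mid \theta), \qquad \frac{\partial Q}{\partial \theta} = \sum_z q_z\, \frac{\frac{\partial}{\partial \theta}\, p(x, z \mid \theta)}{p(x, z \mid \theta)}.$$

Matching the two derivatives term by term at the current parameters $\theta_0$ suggests the choice

$$q_z = \frac{p(x, z \mid \theta_0)}{\sum_{z'} p(x, z' \mid \theta_0)} = p(z \mid x, \theta_0),$$

so the auxiliary variables come out as the conditional probability of the hidden variable given the data – which is exactly the E-step.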