Department of Mathematics
"Probabilistic latent semantic analysis"
is a sequel to my talk on factor models for text
collections. In this talk, I will go over a generative probabilistic
model that handles polysemy (that is, words with multiple meanings)
better than its deterministic counterpart. Each document is modeled as
being generated by a mixture of topic-word distributions. Both the
topic-word distributions and the document-topic weights are estimated
by the EM algorithm.
Since the number of topics is smaller than the number of terms, the
documents can be represented by their topic weights in a much smaller
subspace of the term space. Each topic gives different weights to the
same terms, which turns out to be an effective way to address
polysemy.
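
For readers who want to experiment before the seminar, the EM fit
described above can be sketched in a few lines of plain Python. This is
only an illustration under stated assumptions, not the speaker's
implementation: the function names `plsa` and `normalize`, the toy
vocabulary, and the count matrix are all made up for this example, and
the initialization is an arbitrary random one. The model assumed is
P(w|d) = sum over z of P(w|z) P(z|d).

```python
import math
import random


def normalize(values):
    """Scale non-negative numbers so they sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit P(w|z) and P(z|d) to a document-term count matrix with EM.

    Hypothetical helper for illustration; `counts[d][w]` is the number of
    times term w occurs in document d.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])
    # Random positive starting points, normalized into distributions.
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_terms)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    log_liks = []
    for _ in range(n_iter):
        new_w_z = [[0.0] * n_terms for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_terms):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(w|z) P(z|d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                p_w_d = sum(joint)
                log_lik += n * math.log(p_w_d)
                for z in range(n_topics):
                    resp = n * joint[z] / p_w_d
                    # Accumulate expected counts for the M-step.
                    new_w_z[z][w] += resp
                    new_z_d[d][z] += resp
        # M-step: renormalize the expected counts into distributions.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
        # Log-likelihood of the parameters used in this iteration's E-step.
        log_liks.append(log_lik)
    return p_w_z, p_z_d, log_liks


# Toy data (made up): "bank" occurs in both finance and river documents.
terms = ["bank", "money", "loan", "river", "water"]
counts = [
    [3, 4, 2, 0, 0],
    [2, 3, 3, 0, 0],
    [3, 0, 0, 4, 2],
    [2, 0, 0, 3, 4],
]
p_w_z, p_z_d, log_liks = plsa(counts, n_topics=2)
for d, weights in enumerate(p_z_d):
    print("doc", d, [round(x, 3) for x in weights])
```

Each document ends up described by two topic weights instead of five
term counts, and the ambiguous term "bank" can receive nonzero
probability under both topics. The log-likelihood trace in `log_liks`
should never decrease, which is a handy sanity check on any EM
implementation.
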
Date: Thursday, November 10, 2016
Place: Mathematics Seminar, SA-141
Tea and cookies will be served before the seminar.