Faster PLSA: Finding Initial Approximation for Probabilistic Latent Semantic Analysis training.
Probabilistic Latent Semantic Analysis (PLSA) is an effective technique for information retrieval, but it has a serious drawback: training consumes a large amount of computational resources, so it is hard to fit the model on a large collection of documents. The aim of this paper is to improve the time efficiency of the training algorithm. Two approaches are explored: the first is based on efficiently finding an appropriate initial approximation; the second rests on the idea that, for most collections, the topics can be extracted from a relatively small fraction of the data.
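The two ideas can be illustrated together with a minimal sketch. It is an assumption about how they might be combined, not the procedure used in the paper: plain PLSA is fitted with EM on a random subsample of documents, and the resulting word-topic distributions serve as the initial approximation for a few EM iterations on the full collection. All names (`plsa_em`, `train_with_subsample_init`, `sample_fraction`, the iteration counts) are hypothetical and chosen for illustration only.

```python
import random


def normalize(row):
    """Scale a non-negative vector to sum to 1 (uniform if it is all zeros)."""
    total = sum(row)
    if total <= 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def random_dist(n, rng):
    """Random strictly positive probability vector of length n."""
    return normalize([rng.random() + 1e-3 for _ in range(n)])


def plsa_em(counts, n_topics, n_iters, phi=None, seed=0):
    """Plain EM for PLSA.

    counts[d][w] is the count of word w in document d.
    Returns (phi, theta) with phi[t][w] = p(w|t) and theta[d][t] = p(t|d).
    If phi is given, it is used as the initial approximation.
    """
    rng = random.Random(seed)
    n_words = len(counts[0])
    if phi is None:
        phi = [random_dist(n_words, rng) for _ in range(n_topics)]
    else:
        # Smooth slightly so words unseen in the subsample are not locked at 0.
        phi = [normalize([x + 1e-6 for x in row]) for row in phi]
    theta = [random_dist(n_topics, rng) for _ in counts]

    for _ in range(n_iters):
        new_phi = [[0.0] * n_words for _ in range(n_topics)]
        new_theta = [[0.0] * n_topics for _ in counts]
        for d, doc in enumerate(counts):
            for w, n_dw in enumerate(doc):
                if n_dw == 0:
                    continue
                # E-step: p(t|d,w) is proportional to p(w|t) * p(t|d).
                post = normalize([phi[t][w] * theta[d][t] for t in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for t in range(n_topics):
                    new_phi[t][w] += n_dw * post[t]
                    new_theta[d][t] += n_dw * post[t]
        # M-step: renormalize the expected counts.
        phi = [normalize(row) for row in new_phi]
        theta = [normalize(row) for row in new_theta]
    return phi, theta


def train_with_subsample_init(counts, n_topics, sample_fraction=0.25,
                              sample_iters=20, full_iters=5, seed=0):
    """Fit on a random subsample, then refine on the full collection."""
    rng = random.Random(seed)
    k = min(len(counts), max(n_topics, int(len(counts) * sample_fraction)))
    sample = rng.sample(counts, k)
    phi0, _ = plsa_em(sample, n_topics, sample_iters, seed=seed)
    return plsa_em(counts, n_topics, full_iters, phi=phi0, seed=seed)
```

In this sketch the expensive iterations run only on the subsample, and the full collection is visited for a small number of refinement passes. Whether that actually saves time depends on the collection and on how well the subsample covers its topics.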
Proceedings of SYRCoDIS'14: The Tenth Spring Researchers Colloquium on Databases and Information Systems, Velikiy Novgorod, 2014.