Faster PLSA: Finding Initial Approximation for Probabilistic Latent Semantic Analysis training.


Authors

Kozlov I., Avanesov V.

Abstract

Probabilistic Latent Semantic Analysis (PLSA) is an effective technique for information retrieval, but it has a serious drawback: training is computationally expensive, so it is hard to train the model on a large collection of documents. The aim of this paper is to improve the time efficiency of the training algorithm. Two approaches are explored: the first is based on efficiently finding an appropriate initial approximation; the second rests on the idea that, for most collections, topics may be extracted from a relatively small fraction of the data.
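
The abstract does not give the paper's actual procedure, so the following is only an illustrative sketch of the general idea: run standard PLSA EM on a small random sample of documents, then reuse the resulting word-topic distributions as the initial approximation for a few EM iterations on the full collection. All names (`plsa_em`, `train_with_sample_init`), the sampling scheme, the smoothing constant, and the iteration counts are assumptions made for this example, not the authors' method.

```python
import random


def normalize(row):
    """Scale a non-negative vector to sum to 1 (uniform if it is all zeros)."""
    s = sum(row)
    return [x / s for x in row] if s > 0 else [1.0 / len(row)] * len(row)


def plsa_em(counts, n_topics, n_iter, phi=None, seed=0):
    """Plain PLSA EM on a dense count matrix counts[d][w].

    Returns (phi, theta) with phi[t][w] = p(w|t) and theta[d][t] = p(t|d).
    If phi is given, it is used as the initial approximation.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    if phi is None:
        phi = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
               for _ in range(n_topics)]
    else:
        phi = [row[:] for row in phi]
    theta = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    for _ in range(n_iter):
        new_phi = [[0.0] * n_words for _ in range(n_topics)]
        new_theta = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: posterior p(t|d,w), proportional to p(w|t) * p(t|d)
                post = [phi[t][w] * theta[d][t] for t in range(n_topics)]
                z = sum(post)
                if z == 0:
                    continue
                # Accumulate expected counts for the M-step
                for t in range(n_topics):
                    r = n_dw * post[t] / z
                    new_phi[t][w] += r
                    new_theta[d][t] += r
        # M-step: renormalize expected counts into distributions
        phi = [normalize(row) for row in new_phi]
        theta = [normalize(row) for row in new_theta]
    return phi, theta


def train_with_sample_init(counts, n_topics, sample_fraction=0.25,
                           warm_iter=20, full_iter=5, seed=0):
    """Hypothetical two-stage training: EM on a sample, then EM on everything."""
    rng = random.Random(seed)
    k = max(1, int(len(counts) * sample_fraction))
    sample = rng.sample(counts, k)
    phi0, _ = plsa_em(sample, n_topics, warm_iter, seed=seed)
    # Smooth so that words absent from the sample keep non-zero probability
    # (assumption: without this, EM could never assign them to any topic).
    eps = 1e-3
    phi0 = [normalize([p + eps for p in row]) for row in phi0]
    return plsa_em(counts, n_topics, full_iter, phi=phi0, seed=seed)


if __name__ == "__main__":
    toy_counts = [[5, 3, 0, 0], [4, 4, 0, 0], [0, 0, 6, 2], [0, 0, 3, 5]]
    phi, theta = train_with_sample_init(toy_counts, n_topics=2)
    print(phi)
    print(theta)
```

Whether such a warm start actually saves time depends on how well the sample covers the collection's topics; the sketch makes no claim about the speed-ups reported in the paper.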

Keywords

PLSA, topic modeling, initial approximation

Edition

Proceedings of SYRCoDIS'14: The Tenth Spring Researchers Colloquium on Databases and Information Systems, Velikiy Novgorod, 2014.

Research Group

Information Systems
