Proceedings of ISP RAS


Keyterm extraction from microblogs' messages using Wikipedia.

Anton V. Korshunov.

Abstract

The paper describes a method for extracting keyterms from microblog messages. The approach uses information obtained by analyzing the structure and content of Wikipedia. The algorithm computes a “keyphraseness” value for each term, i.e. an estimate of the probability that the term would be selected as a key in a text. An experimental study showed that the proposed technique performs significantly better than comparable methods. To demonstrate a possible application, a prototype of a context-sensitive advertising system was implemented; it retrieves descriptions of goods relevant to the extracted keyterms from the Amazon online store. Several ways are also proposed in which information derived from Twitter messages may be used in auxiliary services.
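
The abstract does not give the exact formula for “keyphraseness”. As an illustration only, the sketch below assumes a link-ratio estimate often used with Wikipedia data: the number of articles in which a term appears as link anchor text, divided by the number of articles in which it appears at all. The counts in `TOY_COUNTS`, the function names, the simple whitespace tokenization, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy statistics (NOT real Wikipedia data). For each term:
# (articles where the term is link anchor text, articles where it occurs at all).
TOY_COUNTS: Dict[str, Tuple[int, int]] = {
    "twitter": (900, 1000),
    "wikipedia": (800, 1000),
    "message": (50, 1000),
    "the": (1, 100000),
}


def keyphraseness(term: str, counts: Dict[str, Tuple[int, int]]) -> float:
    """Estimate the probability that a term is selected as a key.

    Assumed estimate: linked occurrences / total occurrences.
    Terms without statistics get a score of 0.0.
    """
    linked, total = counts.get(term.lower(), (0, 0))
    return linked / total if total else 0.0


def extract_keyterms(
    text: str,
    counts: Dict[str, Tuple[int, int]],
    threshold: float = 0.5,  # assumed cut-off, chosen for illustration
) -> List[str]:
    """Return terms scoring at or above the threshold, best first."""
    # Naive tokenization: split on whitespace and strip common punctuation.
    tokens = [t.strip(".,!?:;#@\"'()").lower() for t in text.split()]
    scored = {t: keyphraseness(t, counts) for t in tokens if t}
    selected = [t for t, score in scored.items() if score >= threshold]
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(selected, key=lambda t: (-scored[t], t))


if __name__ == "__main__":
    print(extract_keyterms("Reading the Wikipedia message about Twitter!", TOY_COUNTS))
```

With these toy counts, frequent but rarely linked words such as "the" or "message" fall below the threshold, while terms that are usually linked when they occur are kept. A real system would derive the counts from a Wikipedia dump and would also need to handle multi-word terms.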

Keywords

Information retrieval; keyterm extraction; natural language processing; text mining; semantic analysis; microblogging; Twitter; Wikipedia; context-sensitive advertising

Edition

Proceedings of the Institute for System Programming, vol. 20, 2011, pp. 269-282.

ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).

Full text of the paper in pdf (in Russian)