Proceedings of ISP RAS


Ranking in keyphrase extraction problem: is it suitable to use statistics of words occurrences?

S.V. Popova, I.A. Khodyrev.

Abstract

The paper addresses the problem of keyphrase extraction from single documents, e.g. scientific abstracts. Keyphrase extraction is an important task, and its results can be used in a variety of applications: data indexing, clustering and classification of documents, meta-information extraction, automatic ontology creation, etc. We discuss an approach to keyphrase extraction whose first step is building candidate phrases, which are then ranked, and the best are selected as keyphrases. The paper focuses on evaluating approaches to weighting candidate phrases in unsupervised extraction methods. A number of in-phrase word weighting procedures are evaluated. Unsuitable weighting approaches are identified. Testing shows that some approaches are equivalent when applied to keyphrase extraction. A feature is proposed that improves the quality of extracted keyphrases and shows better results than the state of the art. Experiments are based on the Inspec dataset.
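The candidate-then-rank pipeline described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: candidates are formed by splitting text at punctuation and at a tiny hypothetical stopword list, each word is weighted by its raw frequency among the candidates, and a phrase is scored as the sum of its word weights. Raw frequency is only one possible weighting; the abstract does not say which weightings the authors found suitable.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "the", "of", "for", "in", "is", "are", "and",
             "to", "on", "with", "by", "as", "which", "then"}


def candidate_phrases(text, max_len=4):
    """Build candidate phrases by splitting at punctuation and stopwords."""
    phrases = []
    # Punctuation ends a fragment; a stopword ends a phrase inside a fragment.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for token in fragment.split():
            if token in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(token)
        if current:
            phrases.append(tuple(current))
    # Assumed length cap: drop overly long candidates.
    return [p for p in phrases if len(p) <= max_len]


def rank_phrases(text, top_k=3):
    """Rank candidates by the summed frequency of their words (assumed weighting)."""
    phrases = candidate_phrases(text)
    word_freq = Counter(word for phrase in phrases for word in phrase)
    scores = {p: sum(word_freq[w] for w in p) for p in set(phrases)}
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return [" ".join(p) for p in ranked[:top_k]]
```

Calling `rank_phrases(document_text, top_k=10)` returns the ten highest-scoring candidates as strings. Swapping the `word_freq` weighting for another in-phrase word weight is a one-line change, which is the kind of comparison the abstract describes.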

Keywords

keyphrase extraction; keyphrase ranking; statistical features for keyphrase ranking; information extraction; scientific abstracts processing

Edition

Proceedings of the Institute for System Programming, vol. 26, issue 4, 2014, pp. 123-136.

ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).

DOI: 10.15514/ISPRAS-2014-26(4)-10

Full text of the paper in pdf (in Russian)