Ivannikov Institute for System Programming of the RAS


Automatic Enrichment of Informal Ontology by Analyzing a Domain-Specific Text Collection.

Authors

Astrakhantsev N.A., Fedorenko D.G., Turdakov D.Y.

Abstract

The core part of an entity linking system, in particular one oriented to wikification, is an ontology, which is often informal and supports semantic relatedness as the only type of relation. Most of these systems suffer from ontology incompleteness. The problem is especially acute for specific domains, where plain text is often the only source of extractable knowledge. This paper formulates the incompleteness problem as the task of enriching an ontology from domain-specific texts and presents a novel approach that combines state-of-the-art terminology enrichment methods, our own ML-based method for homonymy detection, and methods adopted from the related field of relation extraction.

Experimental evaluation shows that the bottleneck is the terminology enrichment step. Its average precision is about 35%, which is too low for fully automatic use, especially given the strict requirements for ontology correctness. However, its recall is high enough to support semi-automatic terminology enrichment.

We also show that the best features for terminology enrichment differ from those for the classic terminology recognition task.


Keywords

ontology enrichment, terminology recognition, terminology enrichment, knowledge base construction, entity linking, wikification

Edition

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (2014) Issue 13, pp. 29-42.

Research Group

Information Systems
