Proceedings of ISP RAS


Automatic Extraction of New Concepts from Domain-Specific Terms

D.G. Fedorenko, N.A. Astrakhantsev

Abstract

Most state-of-the-art approaches to word sense disambiguation (WSD) rely on knowledge bases, or ontologies: databases of terms, their concepts, and the relations between them. One of the long-standing problems of knowledge bases is their incompleteness, i.e. the lack of appropriate concepts for terms occurring in some contexts; the problem is most acute for domain-specific terms. As a consequence, systems produce incorrect results, because existing WSD algorithms simply assign one of the a priori incorrect concepts to such terms. This paper describes a novel approach to recognizing domain-specific terms that exist in the knowledge base but represent new concepts. In contrast to previous approaches, which require formal ontologies with a hierarchical structure and different relation types, our method can be applied to informal knowledge bases; it requires only semantic similarity between concepts and statistics of terms extracted from a domain-specific corpus. We show that our method performs better than existing approaches and achieves 74% precision and 83% recall on a collection of domain-specific terms not fully covered by our knowledge base. Our method also improves the precision of WSD from 52% to 78% for the considered terms.

Keywords

concept extraction; domain-specific terms; knowledge base enrichment; ontology enrichment; informal knowledge base; informal ontology; word sense disambiguation; semantic analysis

Edition

Proceedings of the Institute for System Programming, vol. 25, 2013, pp. 167-178.

ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).

DOI: 10.15514/ISPRAS-2013-25-9

Full text of the paper in PDF (in Russian)