Ivannikov Institute for System Programming of the RAS


Semantic Analysis of Texts using Texterra System

Authors

Turdakov D.Y., Andrianov I.A., Astrakhantsev N.A., Mayorov V.D., Nedumov Y.R., Sysoev A.A., Fedorenko D.G.

Abstract

Texterra delivers a scalable solution for text processing based on novel methods that exploit knowledge extracted from the Web and from collections of domain-specific documents. The paper describes how a semantic model of a text is constructed within Texterra. The system first detects compound terms and annotates each term with a meaning by assigning it an appropriate concept from the knowledge base using a disambiguation algorithm. It then extracts key concepts by detecting the concepts most relevant to the text. Information from Wikipedia serves as the basis for automatic knowledge base construction. In addition, Texterra provides tools for extending the knowledge base with information extracted from websites and collections of domain-specific documents.

Evaluation results cover term extraction, word sense disambiguation, and key concept extraction on Russian and English corpora. At its current level of maturity, the technology can be used to improve the quality of applications such as semantic search, recommender systems, and social network analysis. Demonstrations and an API are available at https://api.ispras.ru.
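
The three stages named in the abstract (term detection, disambiguation against a knowledge base, key concept extraction) can be illustrated with a small toy sketch. It is not Texterra's algorithm or API: the knowledge base contents, the function names, the greedy longest-match term detector, the context-overlap disambiguation rule, and the use of concept frequency as a stand-in for relevance are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy knowledge base: term -> candidate concepts,
# each concept described by a set of typical context words.
KNOWLEDGE_BASE = {
    "python": {
        "Python (programming language)": {"code", "programming", "library"},
        "Python (snake)": {"snake", "reptile", "jungle"},
    },
    "machine learning": {
        "Machine learning": {"model", "data", "programming"},
    },
}


def detect_terms(tokens, kb, max_len=3):
    """Greedy longest-match detection of single-word and compound terms."""
    terms, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in kb:
                terms.append(candidate)
                i += n
                break
        else:
            # No term starts at this token; move on.
            i += 1
    return terms


def disambiguate(term, context, kb):
    """Pick the concept whose context words overlap most with the text."""
    candidates = kb[term]
    return max(candidates, key=lambda c: len(candidates[c] & context))


def key_concepts(text, kb, top_k=2):
    """Return the most frequent disambiguated concepts (assumed relevance proxy)."""
    tokens = text.lower().split()
    context = set(tokens)
    concepts = [disambiguate(t, context, kb) for t in detect_terms(tokens, kb)]
    return [concept for concept, _ in Counter(concepts).most_common(top_k)]


text = ("python is a programming language with a machine learning "
        "library and python code")
print(key_concepts(text, KNOWLEDGE_BASE))
```

In this sketch the ambiguous term "python" resolves to the programming-language concept because the surrounding words overlap with that concept's context set. A production system would draw candidates and context statistics from a Wikipedia-derived knowledge base rather than a hand-written dictionary.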

Full text of the paper in pdf (in Russian)

Keywords

semantic analysis, Wikipedia, knowledge bases, semantic ontologies, Wikification

Edition

Abstracts of the International Conference on Computational Linguistics "Dialogue". 2014.

Research Group

Information Systems
