Joining Dictionaries and Word Embeddings for Ontology Induction

This paper presents an ontology induction approach that extracts structured lexical knowledge from synonym dictionaries and establishes semantic relations within these structures using word embeddings and their projections. The results of preliminary experiments are also reported, showing certain strengths and weaknesses of the proposed approach.


Introduction
A thesaurus, or a lexical ontology, is a lexical database that groups words into sets of synonyms called synsets or concepts, and records a number of semantic relations between these concepts [1]. It is a crucial resource for many natural language processing and artificial intelligence problems that require common sense reasoning. However, the current state of electronic thesauri for the Russian language makes it highly topical to refine and integrate the existing openly available resources to facilitate the development of language technology [2]. In this study, an ontology induction approach that integrates the knowledge represented by synonym dictionaries, enhanced by the methods of distributional semantics, is presented and preliminarily evaluated.

Related Work
Since thesauri are composed of concepts and relations, approaches for acquiring both are briefly reviewed. Unsupervised methods for concept discovery cluster co-occurrence graphs to group words with similar meanings. For instance, the methods proposed by Schütze [3] and Lin & Pantel [4] construct such graphs from large text corpora. Recently, significant attention in lexical semantics has been paid to specialized algorithms such as Chinese Whispers [5], a hard clustering algorithm that assigns a word to at most one cluster at a time, and MaxMax [6], a soft clustering algorithm designed specifically for the word sense induction task [7]. Currently, the most widely used method for detecting hyponymy-hypernymy relations is Hearst patterns [8]. However, these lexico-syntactic patterns offer a sparse representation of words that is less convenient than word embeddings [9]. Fu et al. [10] proposed a projection learning approach to learning hypernyms for the Chinese language. This approach learns a projection matrix such that multiplying it by a hyponym vector produces a hypernym vector. The learning problem is posed as a linear regression problem and solved numerically using stochastic gradient descent. Shwartz et al. developed an integrated method, HypeNET, that combines syntactic parsing features with word embeddings using a long short-term memory network [11]; this recurrent neural network encodes the patterns together with the embeddings.
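The projection learning idea reviewed above can be illustrated with a minimal sketch. It is not the implementation of Fu et al. [10]: the function name `train_projection`, the learning rate, the number of epochs and the toy two-dimensional vectors are all assumptions made for illustration. The sketch fits a square matrix by plain stochastic gradient descent on the squared error between the projected hyponym vector and the hypernym vector.

```python
def train_projection(pairs, dim, learning_rate=0.1, epochs=100):
    """Fit a dim x dim matrix phi so that phi * x is close to y.

    pairs: list of (x, y) tuples, where x is a hyponym vector and y is a
    hypernym vector, both given as lists of floats (hypothetical toy input).
    """
    phi = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        for x, y in pairs:
            # Forward pass: project the hyponym vector.
            predicted = [sum(p * v for p, v in zip(row, x)) for row in phi]
            error = [p - t for p, t in zip(predicted, y)]
            # Gradient of ||phi x - y||^2 with respect to phi[i][j]
            # is 2 * error[i] * x[j].
            for i in range(dim):
                for j in range(dim):
                    phi[i][j] -= learning_rate * 2.0 * error[i] * x[j]
    return phi


# Toy data: the "hypernym" of each vector is its coordinate-swapped copy.
toy_pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.0, 1.0], [1.0, 0.0])]
phi = train_projection(toy_pairs, dim=2)
```

On this toy data the learned matrix approaches the coordinate swap. Real embeddings would call for shuffling, mini-batches and a vectorized implementation.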

Approach
The proposed ontology induction approach, depicted in Fig. 1, uses a synonym dictionary to produce the concepts and a pre-trained word embeddings model to establish the hyponymy-hypernymy relations between the concepts. The goal of the concept discovery step is to yield a set of concepts by grouping the words with similar meanings from a synonym graph. The goal of the relation establishment step is to link these concepts to each other using the hyponymy-hypernymy relation, which is also known as the subsumption relation or the is-a relation.

Concept Discovery
A synonym dictionary is a graph whose vertices are the words and whose edges are the word pairs connected by the synonymy relation. The cliques in such a graph naturally form densely connected sets of synonyms [12] corresponding to the concepts. Given that the clique problem in a graph is NP-complete, the Chinese Whispers [5] graph clustering algorithm has been used to find a global segmentation of the graph. However, the hard clustering property of this algorithm does not handle polysemous words well. To deal with that, a word sense induction procedure has been run to extract the word senses [14], which have then been combined back into a disambiguated word sense graph [15]. Finally, the disambiguated word sense graph has been clustered again using Chinese Whispers to induce the concepts from these disambiguated word senses.
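The label propagation at the core of Chinese Whispers [5] can be sketched as follows. This is a simplified illustration rather than the implementation used in the study. The original algorithm visits the nodes in random order and breaks ties randomly, whereas this sketch defaults to a fixed order and alphabetical tie-breaking to keep the output reproducible. The names `build_graph` and `chinese_whispers` and the toy words are assumptions.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph from (word, word, weight) triples."""
    graph = defaultdict(dict)
    for u, v, weight in edges:
        graph[u][v] = weight
        graph[v][u] = weight
    return dict(graph)


def chinese_whispers(graph, iterations=20, rng=None):
    """Cluster a graph: each node repeatedly adopts the label that has the
    largest total edge weight among its neighbours."""
    labels = {node: node for node in graph}  # every node starts in its own class
    nodes = sorted(graph)
    for _ in range(iterations):
        if rng is not None:
            rng.shuffle(nodes)  # randomized visiting order, as in the original
        changed = False
        for node in nodes:
            scores = defaultdict(float)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            if not scores:
                continue
            # Highest score wins; ties go to the alphabetically first label.
            best = min(scores, key=lambda label: (-scores[label], label))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    clusters = defaultdict(set)
    for node, label in labels.items():
        clusters[label].add(node)
    return sorted(clusters.values(), key=sorted)


# Two disconnected triangles of synonyms (toy data).
toy_edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
             ("x", "y", 1), ("y", "z", 1), ("x", "z", 1)]
clusters = chinese_whispers(build_graph(toy_edges))
```

Because every node ends up with exactly one label, a polysemous word is forced into a single cluster. This is the limitation that the word sense induction step described above is meant to address.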

Relation Establishment
To extend the lexical coverage of the available subsumption word pairs, a projection learning setup [10] has been used to compute a transformation matrix for word embeddings that projects a hyponym vector to its possible hypernym vector. For this purpose, the 100-dimensional word embeddings dataset for Russian has been obtained; it was trained by Arefyev et al. [16] using the skip-gram architecture with a sliding window of 10 words and a minimal word frequency of 100 on a text corpus of 13 billion words (this configuration is among the best of those that participated in the RUSSE study [17] despite its low computational requirements). The subsumption pairs from the Russian Wiktionary [18] supply both the train and test sets for learning the projection matrix, the total number of pairs being 33 885. To avoid lexical overfitting [19], no hyponym from the train set is present in the test set. Since the specificity of the semantic relations differs across regions of the embedding space, the hyponym embeddings have been clustered using the k-means algorithm tuned on a development dataset. With the concepts induced and the subsumptions trained, the relations between the concepts are established as follows. For each concept, every word is projected to its hypernym embedding and the ten nearest neighbours of that projection are obtained. These neighbours jointly form a bag of words for this concept, for which the most similar concept is computed using the cardinality of the set intersection as the similarity measure. If such a concept is found, the former concept is considered a hyponym of the latter.
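The linking procedure described above can be sketched under several simplifying assumptions: a single projection matrix is used instead of one per k-means cluster, the embeddings are a small in-memory dictionary, and the nearest neighbours are found by brute-force cosine similarity. All function names and the toy data are hypothetical.

```python
import math


def project(matrix, vector):
    """Multiply a projection matrix (a list of rows) by a hyponym vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_words(vector, embeddings, k=10):
    """Return the k words whose embeddings are closest to the given vector."""
    ranked = sorted(embeddings,
                    key=lambda word: cosine(vector, embeddings[word]),
                    reverse=True)
    return ranked[:k]


def find_hypernym_concept(concept_id, concepts, embeddings, matrix, k=10):
    """Return the id of the concept sharing the most words with the bag of
    projected neighbours, or None if no concept overlaps with it."""
    bag = set()
    for word in concepts[concept_id]:
        if word in embeddings:
            projected = project(matrix, embeddings[word])
            bag.update(nearest_words(projected, embeddings, k))
    best_id, best_overlap = None, 0
    for other_id, words in concepts.items():
        if other_id == concept_id:
            continue
        overlap = len(bag & words)  # cardinality of the set intersection
        if overlap > best_overlap:
            best_id, best_overlap = other_id, overlap
    return best_id


# Toy data: the matrix swaps coordinates, which maps "pets" towards "animals".
toy_embeddings = {"cat": [1.0, 0.0], "dog": [0.9, 0.1],
                  "animal": [0.0, 1.0], "beast": [0.1, 0.9]}
toy_concepts = {"pets": {"cat", "dog"}, "animals": {"animal", "beast"}}
toy_matrix = [[0.0, 1.0], [1.0, 0.0]]
hypernym = find_hypernym_concept("pets", toy_concepts, toy_embeddings,
                                 toy_matrix, k=2)
```

Set intersection cardinality favours large concepts, which is one reason a different concept similarity measure may be worth trying.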

Experiments
For evaluation purposes, the Russian Wiktionary [18], Abramov's dictionary [20], and the Universal Dictionary of Concepts [21] have been combined into a single graph to benefit from the different lexical coverage provided by the different synonym dictionaries and also to reinforce the jointly observed synonymy relations. The resulting graph has 406 889 edges connecting 74 133 individual words. The RuThes-lite 2.0 lexical ontology, composed of 31.5K concepts, 111.5K lexemes and 130K relations, has been used as the gold standard [1] during these experiments.
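One possible way to combine several synonym dictionaries into a single graph is sketched below. The weighting scheme is an assumption and is not stated in the paper: here each edge weight counts the number of dictionaries that contain the pair, so that jointly observed synonymy relations receive larger weights. The function names and the toy pairs are hypothetical.

```python
from collections import Counter


def combine_dictionaries(dictionaries):
    """Merge lists of synonym pairs into one weighted, undirected edge set.

    The weight of an edge is the number of dictionaries containing the pair
    (an assumed weighting, used here only for illustration).
    """
    weights = Counter()
    for pairs in dictionaries:
        # Count each pair once per dictionary, ignoring order and self-loops.
        unique = {frozenset(pair) for pair in pairs if pair[0] != pair[1]}
        weights.update(unique)
    return weights


def vocabulary(weights):
    """Return the set of all words connected by at least one edge."""
    return set().union(*weights) if weights else set()


# Toy dictionaries: the pair (a, b) is observed in both of them.
first = [("a", "b"), ("b", "c")]
second = [("b", "a"), ("c", "d")]
weights = combine_dictionaries([first, second])
words = vocabulary(weights)
```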

Concept Evaluation
To assess the performance of the described concept discovery method, the same graph has also been processed by two other algorithms: Chinese Whispers [5] and MaxMax [6]. Similarly to the experimental setup used for evaluating the MaxMax clustering algorithm [6], the pairwise precision, recall and F1-measure scores [22] and the V-measure score [23] have been computed (Table 1). According to the concept evaluation results, the described concept discovery method outperformed the other methods on every pairwise score and showed a comparable V-measure, which represents the goodness of the output clustering. As a hard clustering algorithm, Chinese Whispers demonstrated good performance on grouping monosemous words into concepts like {компьютер (computer), ЭВМ (ECM), …}, especially on named entities. As anticipated, its performance degraded on polysemous words, resulting in concepts like {вода (water), акватория (water area), влага (moisture), кислород (oxygen), водород (hydrogen), …}. Surprisingly, MaxMax, despite the existence of a successful case study on the Portuguese language [7], showed poor results in this study, possibly because the graph structure differs from the one it expects. Firstly, unlike the other methods being compared, it emitted a large number of concepts grouping more than 300 words. After investigation, these concepts have been removed from the evaluation as non-relevant. Secondly, a substantial part of the concepts provided by MaxMax grouped words having no obvious synonymy relation, e.g., {прайс (price), бином Ньютона (Newtonian binomial), программный пакет (software package), …}. In contrast, the concepts yielded by the described concept discovery method correctly reflect the polysemy phenomenon, e.g., {пустота (emptiness), бессодержательность (barrenness), бессмысленность (meaninglessness), …} and {вакуум (vacuum), пустота (emptiness), ничто (nihil), …}. Unfortunately, the disambiguation procedure being used [15] tends to miss certain underrepresented word senses, which results in their absence from the disambiguated word sense graph and, therefore, from the output concepts as well.
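The pairwise scores mentioned above can be computed by turning every cluster into the set of unordered word pairs it implies and comparing the predicted pairs with the gold standard pairs. The following sketch shows one straightforward way to do this; the function names and the toy clusters are assumptions, and the V-measure is not covered.

```python
from itertools import combinations


def synonym_pairs(clusters):
    """Return all unordered word pairs that share at least one cluster."""
    pairs = set()
    for cluster in clusters:
        pairs.update(frozenset(pair)
                     for pair in combinations(sorted(cluster), 2))
    return pairs


def pairwise_scores(predicted, gold):
    """Compute pairwise precision, recall and F1 of a predicted clustering."""
    predicted_pairs = synonym_pairs(predicted)
    gold_pairs = synonym_pairs(gold)
    true_positives = len(predicted_pairs & gold_pairs)
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: one predicted pair out of three is also a gold pair.
predicted = [{"a", "b", "c"}]
gold = [{"a", "b"}, {"c", "d"}]
precision, recall, f1 = pairwise_scores(predicted, gold)
```

Since a word may belong to several clusters, the pair sets handle soft clusterings such as the MaxMax output without any changes.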

Relation Evaluation
To assess the performance of the described relation establishment method, it has been applied to each of the 5 984 concepts discovered at the previous step. Each concept has been matched to the most similar RuThes-lite concept using the cardinality of the set intersection as the similarity measure. An established relation has then been considered correct if there exists a directed path from the hyponym concept to the hypernym concept in the gold standard. Also, the performance of the projection learning setup has been compared to using the unmodified subsumption pairs from the Russian Wiktionary without the word embeddings. Table 2 shows the relation evaluation results. According to these results, the projection learning setup increases the number of candidate concepts and relations, but the number of correctly established relations does not increase substantially. However, due to the lack of available subsumption dictionaries, it seems reasonable to try a larger word embeddings dataset, a different learning setup, or a different concept similarity measure.
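The correctness criterion described above can be sketched as a graph search over the gold standard. The data layout is an assumption: gold concepts are given as a mapping from an identifier to a set of words, and the gold hypernymy relation as a mapping from a concept identifier to the identifiers of its direct hypernyms. The sketch also assumes that a path must contain at least one edge, so a concept is not counted as its own hypernym.

```python
from collections import deque


def match_to_gold(concept, gold_concepts):
    """Return the id of the gold concept sharing the most words, or None."""
    best_id, best_overlap = None, 0
    for gold_id, words in gold_concepts.items():
        overlap = len(concept & words)
        if overlap > best_overlap:
            best_id, best_overlap = gold_id, overlap
    return best_id


def has_directed_path(hypernyms, source, target):
    """Breadth-first search for a path of at least one edge."""
    queue = deque(hypernyms.get(source, ()))
    seen = set(queue)
    while queue:
        current = queue.popleft()
        if current == target:
            return True
        for parent in hypernyms.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def is_correct(hyponym, hypernym, gold_concepts, hypernyms):
    """Check an established relation between two induced concepts."""
    source = match_to_gold(hyponym, gold_concepts)
    target = match_to_gold(hypernym, gold_concepts)
    if source is None or target is None:
        return False
    return has_directed_path(hypernyms, source, target)


# Toy gold standard: cat -> mammal -> animal.
toy_gold = {"g_cat": {"cat", "kitty"}, "g_mammal": {"mammal"},
            "g_animal": {"animal", "beast"}}
toy_hypernyms = {"g_cat": ["g_mammal"], "g_mammal": ["g_animal"]}
verdict = is_correct({"cat"}, {"animal"}, toy_gold, toy_hypernyms)
```

Accepting any directed path, rather than only direct edges, means that a relation skipping intermediate gold concepts still counts as correct.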

Conclusion
In this short paper, an ontology induction approach has been described and preliminarily evaluated; it induces a thesaurus structure by integrating synonym dictionaries for discovering concepts and a distributional model for establishing the hyponymy-hypernymy relations between them. The results of this study are openly available under a libre license: https://github.com/dustalov/concept-discovery. Plans for further study include improving the relation establishment step with more sophisticated matching and machine learning techniques, as well as applying crowdsourcing to validate the subsumption candidates.

Table 1. Concept evaluation