The basic problem of text analysis is natural language ambiguity: the same words can have different meanings depending on the context. Understanding context requires knowledge bases that describe real-world concepts. Constructing such knowledge bases (or ontologies) is a very resource- and time-consuming task. Texterra technology provides tools for automatically extracting knowledge bases from partially structured resources, e.g. Wikipedia and Wikidata, and tools for analyzing text semantics on top of these knowledge bases. Texterra technology is actively applied in research and industrial projects of ISP RAS.
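To illustrate how a knowledge base can help resolve ambiguity, here is a minimal sketch that picks the sense of a word whose related concepts overlap most with the surrounding context. It is a simplified overlap heuristic, not Texterra's actual method; the `TOY_KB` data and the `disambiguate` function are hypothetical names invented for this example.

```python
# Toy knowledge base (hypothetical illustration data, not taken from Texterra):
# each sense of an ambiguous word maps to a set of related concept words.
TOY_KB = {
    "bank": {
        "bank (financial institution)": {"money", "loan", "account", "deposit"},
        "bank (river side)": {"river", "water", "shore", "fishing"},
    }
}


def disambiguate(word, context, kb=TOY_KB):
    """Return the sense whose related concepts overlap most with the context.

    Returns None if the word is not in the knowledge base.
    Ties are resolved in favor of the sense listed first.
    """
    senses = kb.get(word.lower())
    if not senses:
        return None
    # Normalize the context into a set of lowercase words without punctuation.
    context_words = {w.strip(".,;:!?").lower() for w in context.split()}
    return max(senses, key=lambda sense: len(senses[sense] & context_words))


if __name__ == "__main__":
    print(disambiguate("bank", "She opened an account at the bank to deposit money."))
    print(disambiguate("bank", "They went fishing on the river bank."))
```

A real system would draw senses and related concepts from a large resource such as Wikipedia or Wikidata rather than a hand-written dictionary, and would use richer similarity measures than word overlap.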
ISP RAS has developed a number of original methods for social analysis, which were combined into a technology called TALISMAN. Unlike most existing solutions for social analytics, TALISMAN was designed from the start to work with large amounts of data. It employs the most promising open-source solutions from the Big Data technology stack, such as Apache Spark, GraphX, and MLlib.
Cloud infrastructure can significantly reduce resource consumption and development time by using hardware more efficiently and shortening the time required to deploy and configure systems. For example, the load on web services with a large number of users can change drastically depending on the time of day, the time of year, and events such as Christmas Day. With elastic balancing of resources in a cloud environment, capacity can follow demand, which saves resources that would otherwise sit idle outside peak periods. The cloud infrastructure of ISP RAS consists of several components based on the most promising technologies that provide virtualization and reliable storage.
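The saving from elastic balancing can be sketched with simple arithmetic. The example below compares instance-hours when capacity follows the load hour by hour against static provisioning for the peak. It is an illustrative model only: the function names, the load figures, and the per-instance capacity are assumptions, not a description of the ISP RAS cloud infrastructure.

```python
import math


def instances_needed(requests_per_sec, capacity_per_instance, min_instances=1):
    """Number of instances required to serve the given load (at least min_instances)."""
    return max(min_instances, math.ceil(requests_per_sec / capacity_per_instance))


def instance_hours(hourly_load, capacity_per_instance):
    """Return (elastic, static) instance-hours for a sequence of hourly loads.

    elastic: capacity is resized every hour to match the load.
    static:  capacity is fixed at the level needed for the peak hour.
    """
    elastic = sum(instances_needed(load, capacity_per_instance) for load in hourly_load)
    static = instances_needed(max(hourly_load), capacity_per_instance) * len(hourly_load)
    return elastic, static


if __name__ == "__main__":
    # Hypothetical load profile in requests per second, one value per hour.
    load = [100, 100, 900, 400]
    elastic, static = instance_hours(load, capacity_per_instance=100)
    print(f"elastic: {elastic} instance-hours, static: {static} instance-hours")
```

The gap between the two numbers grows with the difference between peak and average load, which is why services with strongly seasonal traffic benefit most.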
Noon is an easily extensible framework for semantic search and exploration of domain-specific information.
API Gateway provides more than 25 tools based on natural language processing, social network analysis, and knowledge base utilization.
Sedna is a full-featured native XML DBMS with support for the W3C XQuery language. XML is a standard for storing and exchanging information on the Web. To facilitate work with large amounts of XML data, we developed a dedicated DBMS called Sedna.
Open Source Technologies
ISP RAS participates in the development of Apache Spark and uses it in its own projects.