Talisman: a data processing framework

Talisman is a unified set of tools that automates typical data processing tasks, such as data retrieval, integration, analysis, storage and visualization. It enables rapid development of specialized multi-user analytical systems that combine data from private databases and Internet sources (including social networks) and work with all of it uniformly.

Features and advantages

Talisman brings together the tools needed to work with big data. It builds on two ISP RAS technologies: Dedoc, a system for retrieving document structure, and Texterra, a platform for extracting semantics from text. Talisman is comparable to the world's leading competitors (Palantir Gotham and IBM Watson Content Analytics). Its advantage is that it automates routine analysis processes using state-of-the-art research results, reducing the resources required for manual analysis.

Talisman provides:

  • A rich set of reusable components with APIs for easy management and integration:
    • Data retrieval components. These include a framework for collecting Internet data, including from social media (Facebook, VKontakte, Twitter, Instagram, Odnoklassniki, YouTube, LinkedIn, etc.), blogs, news sites, MediaWiki sites, developer portals, etc. There is also a system for importing data from file storage and databases.
    • Automatic data analysis components. Analysis tools are packaged as Docker containers that are managed via APIs by the Talisman.Flow system (included in the Unified Register of Russian Programs as No. 6045). The output is stored on hard disks or in databases (PostgreSQL, Elasticsearch, Cassandra, etc.). The core services used are the Tesseract OCR system and ISP RAS's own tools.
    • Storage and indexing components. These include a number of databases and search engines that store source data, automatic analysis results, and the results of users' manual work.
  • An easy-to-use web interface that unifies all components requiring user interaction.
  • A flexible modular architecture that allows new features to be added to selected components without changing the others.
  • A scalable architecture that allows more data to be processed and stored simply by adding hardware, without any software changes.
  • Specialized components that monitor system status, manage the event log, and handle deployment, authentication and authorization, access control, and unidirectional data transfer.
  • Tools and methods for training machine learning models and for transferring existing algorithms to other knowledge domains.
  • A configurable knowledge domain schema that users can change while the system is in operation.
  • An on-site deployment option using the customer's existing hardware or new hardware supplied and configured with the framework.
  • Integration with the customer's private systems via the provided component APIs.
  • No closed-license components: Talisman is based on open-source software and ISP RAS's own know-how tools.

Talisman application areas

  • Automated knowledge base construction for a given knowledge domain and continuous monitoring for new information about objects of interest.
  • Competitive intelligence based on open sources (OSINT).
  • Detecting information campaigns aimed at manipulating a target audience, as well as identifying a campaign's target audience.
  • Detecting and analyzing the means used to disseminate information (resources, people, bots), as well as analyzing the communication roles of community members (news source, opinion leader, disseminator, moderator, bot, commentator).
  • Reputation management for individuals and companies, including monitoring relevant news, detecting possible complaints, and monitoring leaks and information disclosures.
  • Optimizing staff management, including efficient recruitment, data verification, detecting hidden activity, and assisting in the development of motivation systems.
  • Objectively evaluating the effectiveness of activities and testing strategies on a target audience to gather feedback.
  • Identifying and managing social tension to detect and prevent conflict escalation.

Supported languages

Talisman supports the languages recognized by the Texterra analyzer, namely Russian and English.

Talisman workflow

[Figure: Talisman workflow]

Developer/Participant: Information Systems
