Synthesys helps you understand your data – discovering unexpected and critical knowledge that may be hidden. Synthesys takes structured and unstructured text as input and then uses named entity recognition and knowledge of time and geographic references, combined with patented relationship analysis, to develop an understanding of the resolved entities in time and space along with their connections and related concepts. Synthesys automates the understanding of complex data sets by eliminating the requirement for an ontology and by combining a number of point solutions into an integrated entity-oriented analytics platform.
Synthesys Key Capabilities
A foundational capability of Synthesys is the recognition, resolution and categorization of entities. This combines traditional entity extraction, categorization, sentence and part-of-speech identification and tagging, as well as fact extraction (subject-predicate-object). This foundational capability makes it possible for Synthesys to extract people, places and organizations without the need for pre-defined data models, taxonomy or ontology while maintaining context – an important differentiation that enables further analysis and deeper understanding in an automated way.
Synthesys extracts and resolves both direct and indirect references to time and location. Direct phrases like “New York City” are extracted and resolved, as are indirect phrases like “50 miles north of the city”. Where possible, these geographic references, along with similar temporal references, are then related to entities based on the context in which they are used in the text.
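To make the idea of resolving an indirect location reference concrete, here is a minimal sketch in Python. It is not how Synthesys does it; it only assumes that the anchor ("the city") has already been resolved to coordinates, and then applies the standard great-circle destination formula. The names `resolve_offset`, `destination`, the regular expression and the coordinates used for New York City are all illustrative assumptions.

```python
import math
import re

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (spherical approximation)

# Compass words mapped to bearings in degrees (hypothetical, minimal vocabulary).
BEARINGS = {"north": 0.0, "east": 90.0, "south": 180.0, "west": 270.0}

# Matches phrases such as "50 miles north of ...".
OFFSET_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s+miles\s+(north|east|south|west)\s+of", re.IGNORECASE
)


def destination(lat, lon, bearing_deg, miles):
    """Return the point reached by travelling `miles` along `bearing_deg`."""
    lat1, lon1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = miles / EARTH_RADIUS_MILES  # angular distance in radians
    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(theta)
    )
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


def resolve_offset(phrase, anchor):
    """Resolve an indirect phrase relative to an already-resolved anchor.

    `anchor` is a (latitude, longitude) pair. Returns None when the phrase
    does not contain a recognizable distance-and-direction offset.
    """
    match = OFFSET_PATTERN.search(phrase)
    if match is None:
        return None
    miles = float(match.group(1))
    bearing = BEARINGS[match.group(2).lower()]
    return destination(anchor[0], anchor[1], bearing, miles)


# Assumed coordinates for the direct reference "New York City".
NEW_YORK_CITY = (40.7128, -74.0060)
point = resolve_offset("50 miles north of the city", NEW_YORK_CITY)
```

In a real pipeline the hard part is deciding which entity "the city" refers to; the sketch takes that as given and only shows the geometric step.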
Unique to Digital Reasoning Systems – due to its patented algorithm – is a mechanism that can actually learn the meaning of words based on how they are used, not based on external reference (taxonomy or ontology). The system develops a “semantic signature” of each word in your data (even when there are millions of unique words and entities) and can intelligently discover which words are most similar, such as variations of people’s names, aliases, or highly related entities and properties. This technology allows the system to adapt to any data, discovering key unknowns and unexpected relationships, and allows you to efficiently structure your domain-specific information on a massive scale.
Besides raising intelligent search to new levels, Synthesys remembers exactly where each concept is mentioned in your data, including its exact context. Whereas other search systems focus on returning the most relevant document, Synthesys provides the context down to the sentence. This saves a lot of time in reviewing search results by summarizing the facts drawn from numerous documents into a compressed and manageable result. Also, by allowing the user to search on specific relationships between concepts and entities in the data, Synthesys quickly narrows down the pertinent facts. Thousands of results become hundreds or even a handful within seconds.
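One way to picture sentence-level context and relationship search is a collection of subject-predicate-object facts, each carrying a pointer back to the document and sentence it came from. The sketch below is an illustration under that assumption only; the `Fact` record, the `find_facts` function and the sample data are hypothetical and do not reflect the Synthesys data model or query language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A subject-predicate-object fact with its sentence-level provenance."""
    subject: str
    predicate: str
    obj: str
    doc_id: str
    sentence_no: int
    sentence: str


def find_facts(facts, subject=None, predicate=None, obj=None):
    """Return facts matching every field that is given; None matches anything."""
    return [
        fact
        for fact in facts
        if (subject is None or fact.subject == subject)
        and (predicate is None or fact.predicate == predicate)
        and (obj is None or fact.obj == obj)
    ]


# Invented sample data for illustration.
FACTS = [
    Fact("Acme", "acquired", "Widgets Inc", "doc-1", 3,
         "Acme acquired Widgets Inc last spring."),
    Fact("Acme", "hired", "Alice", "doc-2", 7,
         "Acme hired Alice as chief engineer."),
    Fact("Bob", "visited", "Paris", "doc-2", 12,
         "Bob visited Paris in June."),
]

# Searching on a relationship narrows three facts down to one, and the hit
# carries the exact sentence rather than just a document identifier.
hits = find_facts(FACTS, subject="Acme", predicate="acquired")
```

Because each hit records `doc_id`, `sentence_no` and the sentence text, a result list can show the supporting sentence directly instead of sending the reader back into the full document.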
Synthesys is able to accurately resolve numerous different references to the same unique entity (or concept) across your entire dataset with no manual intervention. Using its patented associative network technology and best-of-breed entity extraction capabilities, Synthesys is able to leverage context and historic usage to turn many ambiguous and diverse references to the same concept into a global identifier that integrates knowledge and relationships about that given entity. Synthesys goes much farther than any other system in turning extracted entities into globally unique concepts that developers can use in their applications and services.
The Link Analysis engine present in Synthesys generates graphs that help analysts discover structure in data. These graphs represent various types of relationships present in the data. Co-occurrence relationships are extracted from unstructured data and depicted visually in co-occurrence graphs. These show how selected entities are related to each other in the same contexts (i.e. significant “subject – predicate – object” relationships). Structured data has relationships explicitly encoded, and these relationships can be depicted in structured data relationship graphs. See the discussion in the section describing the KBQL Link Analysis queries for more detailed information on how Link Analysis works and the different analytical outputs produced for these types of queries.
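As a rough illustration of what a co-occurrence graph contains, the following sketch counts how often pairs of entities appear in the same context and stores the counts as weighted, undirected edges. It assumes that entities have already been extracted and that a "context" is simply a set of entities found together; the function names and sample data are invented, and the actual Link Analysis engine and KBQL queries are not shown here.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(contexts):
    """Count co-occurring entity pairs.

    `contexts` is an iterable of entity collections, one per context (for
    example, one per sentence). Returns a Counter keyed by sorted pairs, so
    ("Acme", "Alice") and ("Alice", "Acme") are the same undirected edge.
    """
    edges = Counter()
    for entities in contexts:
        for pair in combinations(sorted(set(entities)), 2):
            edges[pair] += 1
    return edges


def neighbors(edges, entity):
    """Return {other_entity: weight} for every edge touching `entity`."""
    linked = {}
    for (left, right), weight in edges.items():
        if left == entity:
            linked[right] = weight
        elif right == entity:
            linked[left] = weight
    return linked


# Invented contexts: each set holds the entities found in one sentence.
contexts = [
    {"Alice", "Acme"},
    {"Alice", "Acme", "Paris"},
    {"Bob", "Paris"},
]
graph = build_cooccurrence_graph(contexts)
alice_links = neighbors(graph, "Alice")
```

The edge weights give a simple measure of how strongly two entities are associated; a visual tool would draw heavier edges for pairs such as Alice and Acme that co-occur more often.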
To the extent possible, components of Synthesys are designed to be language neutral. In cases where that is not possible (or desirable), the software encodes the language-specific detail into a module with a well-defined interface that can be implemented differently for each language. There are two general classes of language dependency present in the system: probabilistic mathematical knowledge, and rule-based or procedural knowledge. Both are treated similarly in that an abstract interface hides the specific implementation, allowing the rest of the Synthesys software to remain independent of the language dependency.
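The design described above, language-specific detail hidden behind an abstract interface, can be sketched as follows. The class names, the single `tokenize` method and the registry are hypothetical and chosen only to show the pattern; the deliberately naive tokenizers stand in for whatever rule-based or probabilistic logic a real language module would contain.

```python
from abc import ABC, abstractmethod


class LanguageModule(ABC):
    """Abstract interface that hides language-specific behaviour."""

    @abstractmethod
    def tokenize(self, text):
        """Split text into tokens according to the language's conventions."""


class EnglishModule(LanguageModule):
    def tokenize(self, text):
        # Naive rule: English words are separated by whitespace.
        return text.split()


class ChineseModule(LanguageModule):
    def tokenize(self, text):
        # Naive rule: treat every non-space character as a token. Real
        # Chinese segmentation is considerably more involved.
        return [ch for ch in text if not ch.isspace()]


# Hypothetical registry keyed by language code.
MODULES = {"en": EnglishModule(), "zh": ChineseModule()}


def get_module(language_code):
    return MODULES[language_code]


def count_tokens(text, language_code):
    """Language-neutral code: it relies only on the abstract interface."""
    return len(get_module(language_code).tokenize(text))


english_tokens = get_module("en").tokenize("New York City")
chinese_tokens = get_module("zh").tokenize("纽约市")
```

Supporting another language then means registering one more `LanguageModule` implementation; callers such as `count_tokens` do not change.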
Synthesys provides support for Chinese (Simplified and Traditional), and a number of additional languages are to be released soon. Synthesys performs all of its functions natively in each language, including the data ingestion functions, the analytics functions, and the data query functions. Synthesys is “trained” to understand the structure of new languages with a guided examination of a small sample set of representative documents in that language. Once this brief training is completed, Synthesys proceeds with its approach of entity-oriented analytics with an awareness of the new language. This approach is not dissimilar to teaching a person a new language based on structure and then letting them understand the meaning of words based on their usage in context.
In order to support a wide spectrum of data ingestion and query performance requirements, the Synthesys architecture combines Cassandra, a distributed data store used to distribute the Knowledge Base, with Hadoop map/reduce processes that perform the data ingestion and analysis. There are several benefits to using these open source technologies to provide Synthesys with distributed processing capabilities. First, Hadoop and Cassandra nodes can be instantiated on commodity hardware. Second, additional nodes can be added to the network without requiring changes to the Synthesys software before it can begin using them: additional Hadoop/Cassandra nodes are integrated into the Synthesys platform by simply changing a few configuration parameters.
In addition to Cassandra, Synthesys supports the use of HBase for providing the distributed Knowledge Base storage capability. Support for other distributed storage technologies, such as Cloudbase, is also planned for future versions of Synthesys.