Repustate today announces the release of its new product: semantic analysis. Combined with sentiment analysis, it gives any organization, from startup to Fortune 50 enterprise, all the tools it needs to conduct in-depth text analytics. For the impatient, head on over to the semantic analysis docs page to get started.
Text analytics: the story so far
Until this point, Repustate has been concerned with analyzing text structurally. Part-of-speech tagging, grammatical analysis, even sentiment analysis are all really about the structure of the text: the order in which words come, and the use of conjunctions, adjectives or adverbs to denote sentiment. All of this is a great first step in understanding the content around you – but it’s just that, a first step.
Today we’re excited to announce Semantic Analysis by Repustate. We consider this the biggest product release in Repustate’s history and the one that we’re most proud of (although Arabic sentiment analysis was a doozy as well!).
Semantic analysis explained
Repustate can determine the subject matter of any piece of text. We know that a tweet saying “I love shooting hoops with my friends” has to do with sports, namely, basketball. Using Repustate’s semantic analysis API you can now determine the theme or subject matter of any tweet, comment or blog post.
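To get a feel for what theme detection means, here is a deliberately tiny sketch. It is not how Repustate works internally, and it does not call the Repustate API; the theme names, the keyword lists and the `guess_theme` function are all made up for illustration. It simply counts how many known keywords for each theme appear in the text.

```python
import re

# Hypothetical keyword lists, invented for this example only.
THEME_KEYWORDS = {
    "sports": {"hoops", "basketball", "goal", "touchdown"},
    "film": {"movie", "film", "director"},
}

def guess_theme(text):
    """Return the theme whose keywords overlap most with the text, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {theme: len(keywords & words)
              for theme, keywords in THEME_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # No keyword matched at all: decline to guess.
    return best if scores[best] > 0 else None

print(guess_theme("I love shooting hoops with my friends"))  # sports
```

A real system needs far more than keyword matching, but the input and output have the same shape: text goes in, a theme comes out.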
But beyond just identifying the subject matter of a piece of text, Repustate can dig deeper and understand each and every key entity in the text and disambiguate based on context.
Named entity recognition
Repustate’s semantic analysis tool extracts each and every one of these entities and tells you the context. Repustate knows that the term “Obama” refers to “Barack Obama”, the President of the United States. Repustate knows that in the sentence “I can’t wait to see the new Scorsese film”, Scorsese refers to “Martin Scorsese”, the director. With very little context (and sometimes no context at all), Repustate can work out what an arbitrary piece of text is talking about. Take the following examples:
- Obama
- Obama is the President.
- Obama is the First Lady.
Here we have three instances where the term “Obama” is being used in different contexts. In the first example, there is no context, just the name ‘Obama’. Repustate uses its internal probability model to determine that the most likely usage of this term is as part of the name ‘Barack Obama’, so an API call will return ‘Barack Obama’. Similarly, in the second example, the word “President” acts as a hint for the single term ‘Obama’ and again, the API call will return ‘Barack Obama’. But what about the third example?
Here, Repustate is smart enough to see the phrase “First Lady”. This tells Repustate to select ‘Michelle Obama’ instead of Barack. Pretty neat, huh?
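The idea of a prior probability nudged by context hints can be sketched in a few lines. This is a toy model under stated assumptions, not Repustate's actual implementation or API: the candidate list, the prior values, the hint words, the scoring formula and the `disambiguate` function are all hypothetical.

```python
import re

# Hypothetical candidates for an ambiguous term. The priors and hint
# words are invented for illustration, not taken from Repustate.
CANDIDATES = {
    "obama": [
        {"entity": "Barack Obama", "prior": 0.9, "hints": {"president"}},
        {"entity": "Michelle Obama", "prior": 0.1, "hints": {"first", "lady"}},
    ],
}

def disambiguate(term, text):
    """Pick the candidate with the highest prior-times-context score."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best_entity, best_score = None, -1.0
    for cand in CANDIDATES.get(term.lower(), []):
        # Each matching hint word boosts the candidate's prior.
        matches = len(cand["hints"] & words)
        score = cand["prior"] * (1 + 10 * matches)
        if score > best_score:
            best_entity, best_score = cand["entity"], score
    return best_entity

for sentence in ["Obama", "Obama is the President.", "Obama is the First Lady."]:
    print(sentence, "->", disambiguate("Obama", sentence))
# Obama -> Barack Obama
# Obama is the President. -> Barack Obama
# Obama is the First Lady. -> Michelle Obama
```

With no context, the prior alone decides. With context, matching hint words can outweigh a lower prior, which is what happens in the “First Lady” case.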
Semantic analysis in your language
As with every other feature Repustate offers, no language takes a back seat, which is why semantic analysis will be available in every language Repustate supports. Currently only English is publicly available, but we’re rolling out every other language in the coming weeks.
Semantic analysis by the numbers
Repustate’s ontology currently has over 5.5 million entities, including people, places, brands, companies and ideas. There are over 500 categorizations of entities, and over 30 themes with which to classify a piece of text’s subject matter. And there are countless ways to use Repustate to transform your text analysis.
Head on over to the Semantic Analysis Tour to see more.