Named entity recognition in Russian

Named entity recognition is now available in Russian via the Repustate API. Combined with Russian sentiment analysis, customers can now do full text analytics in Russian with Repustate.

Repustate is happy to add Russian named entity recognition to the solutions already available in English and Arabic. But like every language, Russian has nuances that make named entity recognition a bit tougher than in, say, English.

Consider the following sentence:
Путин дал Обаме новое ядерное соглашение

In English, this says:
Putin gave Obama the new nuclear agreement

This is how Barack Obama is written in Russian:
Барак Обама

Notice that in our sentence, “Obama” is spelled with a different ending. That’s because in Russian, nouns (even proper nouns) are declined: their endings change according to their grammatical case. In this example, Obama is in what’s called the “dative” case, meaning the noun is the recipient of something. English nouns are not inflected this way; apart from the possessive ’s, English only changes a noun’s ending for pluralization.

So Repustate has to know how to stem proper nouns as well in order to properly identify “Обаме” as Barack Obama during the contextual expansion phase.
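To make the idea concrete, here is a deliberately tiny sketch of matching an inflected proper noun back to a known entity. It is not how Repustate does it; the gazetteer, the list of endings, and the function names (`crude_stem`, `find_entities`) are all assumptions made up for illustration, and a real system would rely on proper morphological analysis rather than suffix stripping.

```python
# Toy illustration only: a two-entry gazetteer and a hand-picked,
# far-from-complete list of Russian case endings.
KNOWN_ENTITIES = {
    "Обама": "Barack Obama",
    "Путин": "Vladimir Putin",
}
CASE_ENDINGS = ("ом", "ой", "а", "е", "у", "ы")  # longer endings first


def crude_stem(word):
    """Strip one known case ending, keeping at least three characters."""
    for ending in CASE_ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= 3:
            return word[: -len(ending)]
    return word


# Index the gazetteer by stem so every case form maps to the same key.
STEM_INDEX = {crude_stem(name): label for name, label in KNOWN_ENTITIES.items()}


def find_entities(sentence):
    """Return (surface form, entity) pairs for tokens whose stem is known."""
    found = []
    for token in sentence.split():
        token = token.strip('.,!?"«»')
        label = STEM_INDEX.get(crude_stem(token))
        if label is not None:
            found.append((token, label))
    return found


matches = find_entities("Путин дал Обаме новое ядерное соглашение")
print(matches)
```

Both “Обама” and “Обаме” reduce to the same stem here, which is the whole trick: the lookup happens on the stem rather than on the surface form.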

These are the sorts of problems we have solved so you don’t have to. Try out the Semantic API and let us know what you think.

SurveyMonkey analytics supercharged with Semantic & Sentiment analysis

SurveyMonkey API + Repustate text analytics = insightful open ended responses

We recently worked with a new entrant into the healthy snack foods business who wanted to understand the market they were getting into. Specifically, they wanted to know the following:

  1. Which foods do people currently eat as their “healthy snack”?
  2. Which brands do consumers think of when they hear the word “snack”?
  3. Was there anything about the current selection of snack foods that consumers didn’t like?
  4. If you were having friends or family over for a casual get together (summer BBQ, catching up etc.) what kinds of snacks would you serve?

With these goals in mind, a survey was created using SurveyMonkey and distributed to the new entrant’s target market via their newsletter (Protip: before you launch a product, build up a mailing list. It works to 1) validate your idea and 2) tighten the feedback loop on product decisions). A telemarketing service was also employed to call people and ask them the same questions. These responses were transcribed and sent to Repustate so the same analysis could be performed.

OK, so that’s the easy part; thousands of responses were collected. But the responses were what the market research industry calls “open ended”, meaning they were free-form text as opposed to a multiple choice list. The reason was that the brand didn’t want to introduce any bias into the survey by prompting the respondents with possible answers. For example, take question #2 from above. If presented with a list of snacks, the respondent might in their head say “Oh yeah, I forgot about brand X” and check it off as a snack they think of, when in reality, that brand’s product had slipped off their radar. Open ended responses test how deeply ingrained a particular idea, concept or, in this case, brand is within a person’s consciousness.

But having open ended responses poses a challenge: how do you data mine them en masse and aggregate the results to come up with something meaningful? If you have a few hundred responses to read, maybe you hire a few interns. But what about when you have tens of thousands? That’s where Repustate comes in.

Use the APIs, Luke

Fortunately, SurveyMonkey has a pretty simple to use API. Combined with Repustate’s even easier to use API, you can go from open ended response to data mined text in seconds. Here’s a code snippet that provides a good blueprint for how one can marry these two APIs together. While some details have been omitted, it should be relatively straightforward to adapt it to suit your needs:
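What follows is a stand-in sketch rather than production code. It only shows the shape of the pipeline: pull the free-text answers, pass each one to an entity extractor, and collect the results for storage. To avoid guessing at either service’s endpoints, the SurveyMonkey and Repustate calls are replaced by placeholder functions (`fake_fetch_responses` and `fake_extract_entities`), and the field names (`respondent_id`, `question_id`, `text`) are assumptions as well. Swap in real HTTP calls once you have checked the current API documentation for both services.

```python
from typing import Callable, Dict, Iterable, List


def analyze_open_ended(
    responses: Iterable[dict],
    extract_entities: Callable[[str], Dict[str, str]],
) -> List[dict]:
    """Run every non-empty free-text answer through an entity extractor."""
    results = []
    for response in responses:
        text = response.get("text", "").strip()
        if not text:
            continue  # skipped questions carry no signal
        results.append(
            {
                "respondent_id": response["respondent_id"],
                "question_id": response["question_id"],
                "text": text,
                "entities": extract_entities(text),
            }
        )
    return results


def fake_fetch_responses() -> List[dict]:
    """Placeholder for the survey API call; returns hard-coded answers."""
    return [
        {
            "respondent_id": 1,
            "question_id": 4,
            "text": "I usually put veggies out, like carrots, celery, "
                    "cucumbers etc. etc. and maybe some dip like hummus "
                    "and crackers.",
        },
        {"respondent_id": 2, "question_id": 4, "text": "   "},
    ]


def fake_extract_entities(text: str) -> Dict[str, str]:
    """Placeholder for the entities API call; a simple keyword lookup."""
    lookup = {"carrots": "food.vegetable", "hummus": "food.other"}
    lowered = text.lower()
    return {word: category for word, category in lookup.items() if word in lowered}


rows = analyze_open_ended(fake_fetch_responses(), fake_extract_entities)
for row in rows:
    print(row["respondent_id"], row["entities"])  # store these in your backend
```

Passing the extractor in as a function keeps the loop itself free of network code, so it can be tested with stubs like the ones above.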

So with very few lines of Python code, we’ve grabbed the open ended responses, processed them through the named entities API call, and can store the results in our backend of choice. Let’s take a look at a sample response and see how Repustate categorized it.

Q: If you were having friends or family over for a casual get together (summer BBQ, catching up etc.) what kinds of snacks would you serve?

A: I usually put veggies out, like carrots, celery, cucumbers etc. etc. and maybe some dip like hummus and crackers.

Running that response through the Repustate API yields this information:

  {
    "themes": [],
    "entities": {
      "celery": "food.vegetable",
      "crackers": "food.other",
      "carrots": "food.vegetable",
      "cucumbers": "food.fruit",
      "hummus": "food.other"
    },
    "status": "OK",
    "expansions": {}
  }

Armed with this analysis, we then aggregated the results to see which categories of food, and which brands were being mentioned the most frequently. This helped our client understand who they were competing against.
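The aggregation step can be as simple as a tally. Here is a minimal sketch, assuming each stored result carries an `entities` mapping of term to category like the sample response in this post. The second response in `sample_results` is made up purely to make the counts less trivial, and `count_categories` is a name invented for this example.

```python
from collections import Counter


def count_categories(results):
    """Tally how often each entity category is mentioned across responses."""
    counts = Counter()
    for result in results:
        counts.update(result["entities"].values())
    return counts


sample_results = [
    {
        "entities": {
            "celery": "food.vegetable",
            "crackers": "food.other",
            "carrots": "food.vegetable",
            "cucumbers": "food.fruit",
            "hummus": "food.other",
        }
    },
    {"entities": {"carrots": "food.vegetable"}},  # made-up second response
]

category_counts = count_categories(sample_results)
for category, mentions in category_counts.most_common():
    print(category, mentions)
```

The same tally works for brands: count the entity terms themselves instead of their categories.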


As it turns out, it was plain old vegetables that were the biggest competition to this new entrant, which is a double edged sword. On the one hand, it means they don’t have to spend the marketing dollars to compete with an entrenched incumbent who dominates most of the shelf space in supermarkets. On the other hand, it’s a troubling place to be in because vegetables are well known, cheap, and viewed as healthy (obviously).

We’re fortunate to be living in a time when so much data is at our disposal, ready to be sliced & diced. We’re also cursed because there’s so much of it! We need the right tools and a clear mind to handle these sorts of problems, but it’s possible.

If you think your company could benefit from this sort of semantic analysis, we’d love to help, so contact us.