Dashboard design: it’s all about communication

Dashboard design is about how best to communicate the stories the underlying data is trying to tell.

Every good B2B product these days has to provide some sort of dashboard to let its customers see into their data from a higher level. Dashboards should provide a narrative of what’s happening under the hood – so why do we make people do all the work? Repustate’s old dashboard for social media analytics flat out sucked. It was clunky, ugly, wasted valuable screen real estate on non-essential items, and had very low engagement levels as a result. So we decided to redesign it.

Thank you, Stephen Few

The first step was to get a copy of Stephen Few’s book, Information Dashboard Design. Published in 2006, it covers a variety of topics related to dashboard design and presenting information. Although some of the screenshots in the book are dated, the same principles that applied then still apply now. Here are the main takeaways we got from this great book:

  1. No clutter – remove any elements that don’t add value
  2. Use colours to communicate a specific idea, not just for fun
  3. Place the most important graphs/numbers/conclusions at the top
  4. Provide feedback where possible – make the dashboard tell a story

If you read those 4 things, you’re probably thinking, “Well, duh, that’s obvious.” And yet, take a look at the dashboards you use every day; they’re loaded with visual noise that is irrelevant, poorly placed, or difficult to make heads or tails of. Here’s a screenshot of Repustate’s new dashboard:

Repustate's new dashboard
Repustate’s new dashboard. Click for larger image.

Here’s a run down of various features and attributes of this new design:

  1. Minimal “chrome” (visual clutter) on the page. The side bar can slide out, but by default is closed and each of the menu items is also a hover menu so you never have to open the menu itself to see what each item means.
  2. The Repustate logo is in the sidebar menu and is visible only when the menu is expanded. At this point, does a user really need to be reminded of what they’re using? They know. Hide your logos.
  3. Top left area of the page provides a common action – download the raw data. No wasted space.
  4. Most important information for our users is sentiment trend over time – put that at the top. We colour coded the “panels” depending on the sentiment and also have a background image (thumb up, thumb down, question mark) depending on the value. The visuals align with the data itself.
  5. Summary statements tell the user exactly what’s happening – don’t make them guess. We have a bunch of rules set up that determine which sentences appear when. Within seconds, our users know exactly what’s going on: their dashboard is communicating with them directly. You wouldn’t believe how much engagement increased as a result.
  6. Easy to read graphs & lists. Minimal clutter and minimal labelling. If your graphs and charts are done properly, the labelling of data points should be unnecessary.
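Point 5’s rule engine can be sketched as a handful of threshold checks. The thresholds and wording below are made up for illustration; the real rules are whatever your data warrants:

```python
def summary_sentences(pos_pct, prev_pos_pct):
    """Pick dashboard summary sentences from simple threshold rules.

    pos_pct: share of positive mentions this period (0-100)
    prev_pos_pct: the same share for the previous period
    """
    sentences = []
    # Rule set 1: overall tone of the period.
    if pos_pct >= 60:
        sentences.append("Overall sentiment is positive.")
    elif pos_pct <= 40:
        sentences.append("Overall sentiment is negative.")
    else:
        sentences.append("Overall sentiment is mixed.")
    # Rule set 2: change versus the previous period.
    delta = pos_pct - prev_pos_pct
    if delta >= 5:
        sentences.append("Sentiment improved since the last period.")
    elif delta <= -5:
        sentences.append("Sentiment declined since the last period.")
    return sentences
```

Each rule maps a numeric condition to a ready-made sentence, which is all a dashboard that “talks” to its users really needs.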

In less than 20 seconds, any executive will be able to glean exactly what’s going on. With more drilling down and clicking some links, they can get deeper insights, but the point is that to get the “executive summary”, they shouldn’t have to. We’re telling them if they’re doing well or not. We’re telling them what’s changed since last time. Ask yourself what information your dashboards can communicate, and then make them do so. Your customers will love you for it.

Moving from freemium to premium

The freemium business model has suited Repustate well to a point, but now it’s time to transition to a fully paid service.

When Repustate launched all that time ago, it was a completely free service. We didn’t take your money even if you offered. The reason was we wanted more customer data to improve our various language models and felt giving software away for free in exchange for the data was a good bargain for both sides.

As our products matured and as our user base grew, it was time to flip the monetary switch and start charging – but still offering a free “Starter” plan as well as a free demo to try out our sentiment & semantic analysis engine.

As we’ve grown and as our SEO has improved, we’ve received a lot more interest from “tire kickers”: people who just want to play around, with no real interest in buying. And that was fine by us because, again, we got their data so we could see how to improve our engines. But recently, the abuse of our Starter plan has gotten to the point where it’s no longer worth our while. People are using 10-minute throwaway email accounts to sign up, activate their accounts, and then use our API.

While one could argue that if people aren’t willing to pay, maybe the product isn’t that good, the extremes people are going to in order to use Repustate’s API for free tell us that we do have a good product and that charging everyone is perfectly reasonable.

As a result, we will be removing the Starter plan. From now on, all accounts must be created with a valid credit card. We’ll probably offer a money-back trial period, say 14 days, but other than that, customers must commit to payment on Day 0. We will also bump up the quotas for all of our plans to make the value proposition all the better.

Any accounts currently on the Starter plan will be allowed to remain on it. If you have any questions about this change and how it affects you, please contact us anytime.

Introducing Semantic Analysis

Repustate is announcing today the release of its new product: semantic analysis. Combined with sentiment analysis, Repustate now provides any organization, from startup to Fortune 50 enterprise, with all the tools it needs to conduct in-depth text analytics. For the impatient, head on over to the semantic analysis docs page to get started.

Text analytics: the story so far

Until this point, Repustate has been concerned with analyzing text structurally. Part-of-speech tagging, grammatical analysis, even sentiment analysis is really all about the structure of the text: the order in which words come, the use of conjunctions, adjectives, or adverbs to denote sentiment. All of this is a great first step in understanding the content around you – but it’s just that, a first step.

Today we’re proud and excited to announce Semantic Analysis by Repustate. We consider this to be the biggest product release in Repustate’s history and the one that we’re most proud of (although Arabic sentiment analysis was a doozy as well!).

Semantic analysis explained

Repustate can determine the subject matter of any piece of text. We know that a tweet saying “I love shooting hoops with my friends” has to do with sports, namely, basketball. Using Repustate’s semantic analysis API you can now determine the theme or subject matter of any tweet, comment or blog post.

But beyond just identifying the subject matter of a piece of text, Repustate can dig deeper and understand each and every key entity in the text and disambiguate based on context.

Named entity recognition

Repustate’s semantic analysis tool extracts each and every named entity from your text and tells you the context. Repustate knows that the term “Obama” refers to “Barack Obama”, the President of the United States. Repustate knows that in the sentence “I can’t wait to see the new Scorsese film”, Scorsese refers to “Martin Scorsese” the director. With very little context (and sometimes no context at all), Repustate knows exactly what an arbitrary piece of text is talking about. Take the following examples:

  1. Obama.
  2. Obama is the President.
  3. Obama is the First Lady.

Here we have three instances where the term “Obama” is being used in different contexts. In the first example, there is no context, just the name ‘Obama’. Repustate will use its internal probability model to determine that the most likely referent for this term is ‘Barack Obama’, hence an API call will return ‘Barack Obama’. Similarly, in the second example, the word “President” acts as a hint for the single term ‘Obama’ and again, the API call will return ‘Barack Obama’. But what about the third example?

Here, Repustate is smart enough to see the phrase “First Lady”. This tells Repustate to select ‘Michelle Obama’ instead of Barack. Pretty neat, huh?
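This hint-based disambiguation can be sketched as a toy scoring function. The candidate table, priors, and hint words below are invented for illustration; Repustate’s actual probability model is, of course, far richer:

```python
# Toy entity disambiguator: each candidate entity carries a prior
# probability and a set of context-hint words. Hint matches outrank the
# prior, which only breaks ties when no hints fire.
CANDIDATES = {
    "obama": [
        ("Barack Obama", 0.9, {"president"}),
        ("Michelle Obama", 0.1, {"first", "lady"}),
    ],
}

def disambiguate(term, sentence):
    # Normalize the sentence into a bag of lowercase words.
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    best_score, best_entity = None, None
    for entity, prior, hints in CANDIDATES[term.lower()]:
        score = (len(words & hints), prior)  # hints first, prior as tiebreak
        if best_score is None or score > best_score:
            best_score, best_entity = score, entity
    return best_entity
```

With no context, the prior wins and ‘Barack Obama’ comes back; the words “First Lady” flip the answer to ‘Michelle Obama’.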

Semantic analysis in your language

As with every other feature Repustate offers, no language takes a back seat: semantic analysis will be available in every language Repustate supports. Currently only English is publicly available, but we’re rolling out every other language in the coming weeks.

Semantic analysis by the numbers

Repustate currently has over 5.5 million entities, including people, places, brands, companies and ideas in its ontology. There are over 500 categorizations of entities, and over 30 themes with which to classify a piece of text’s subject matter. And an infinite number of ways to use Repustate to transform your text analysis.

Head on over to the Semantic Analysis Tour to see more.

 

Social sentiment analysis is missing something

Sentiment analysis by itself doesn’t suffice

An interesting article by Seth Grimes caught our eye this week. Seth is one of the few voices of reason in the world of text analytics that I feel “gets it”. His views on sentiment’s strengths and weaknesses, advantages and shortcomings align quite perfectly with Repustate’s general philosophy.

In the article, Seth states that simply relying on a number denoting sentiment, or a label like “positive” or “negative”, is too coarse a measurement and doesn’t carry any meaning with it. By doing so, you risk overlooking deeper insights that are hidden beneath the high-level sentiment score. We couldn’t agree more, and that’s why Repustate supports categorizations.

Sentiment by itself is meaningless; sentiment analysis scoped to a particular business need or product feature is where the true value lies. Categorizing your social data by features of your service (e.g. price, selection, quality) first and THEN applying sentiment analysis is the way to go. In the article, Seth proceeds to list a few “emotional” categories (promoter/detractor, angry, happy, etc.) that, quite frankly, I would ignore. These categories are too touchy-feely, hard to disambiguate at a machine learning level, and don’t tie closely to actual business processes or features. For instance, if someone is a detractor, what is it that is causing them to be a detractor? Was it the service they received? If so, then customer service is the category you want, and the negative polarity of the text in question gives you invaluable insights. The fact that someone is being negative about your business means, almost by definition, that they are a detractor.

Repustate provides our customers with the ability to create their own categories according to the definitions that they create. Each customer is different, each business is different, hence the need for customized categories. Once you have your categories, sentiment analysis becomes much more insightful and valuable to your business.

Twitter + GeoJSON + d3.js + Rob Ford = Fun with data visualizations

Using Twitter, GeoJSON (via GDAL) and d3 to visualize data about Rob Ford

(Note: Gists turned into links so as to avoid too many roundtrips to github.com which is slow sometimes)

As a citizen of Toronto, the past few weeks (and months) have been interesting to say the least. Our mayor, Rob Ford, has made headlines for his various lewd remarks, football follies, and drunken stupors. With an election coming up in 2014, I was curious as to how the rest of Toronto felt about our mayor.

The Task

Collect data from Twitter, making sure we only use tweets from people in Toronto. Plot those tweets on an electoral map of Toronto’s wards, colour coding the wards based on how they voted in the 2010 Mayoral Election.

The Tools

  1. Repustate’s API (for data mining Twitter and extracting sentiment, of course!)
  2. Shapefiles for the City of Toronto including boundaries for the wards
  3. GDAL
  4. d3.js
  5. jQuery

I’m impatient, just show me the final product …

Alright, here’s the final visualization of Toronto’s opinion of Rob Ford. Take a look and then come back here and find out more about how we accomplished this.

The Process

For those still with me, let’s go through the step-by-step of how this visualization was accomplished.

Step 1: Get the raw data from Twitter

This is a given – if we want to visualize data, first we need to get some! To do so, we used Repustate to create our rules and monitor Twitter for new data. You can monitor social media data using Repustate’s web interface, but this is a project for hackers, so let’s use the Social API to do this. Here’s the code (you first have to get an API key):

https://gist.github.com/mostrovsky/7762449

Alright, so while Repustate gets our data, we can get the geo visualization aspect of this started. We’ll check back in on the data – for now, let’s delve into the world of geographic information systems and analytics.

Step 2: Get the shapefile

What’s a shapefile? Good question! Paraphrasing Wikipedia, a shapefile is a vector file format describing the layout and/or location of geographic features, like cities, states, provinces, rivers, lakes, mountains, or even neighbourhoods. Thanks to a recent movement to open up data to the public, it’s not *too* hard to get your hands on shapefiles anymore. To get the shapefiles for the City of Toronto, we went to the City’s OpenData website, which has quite a bit of cool data to play with. The shapefile we need, the one that shows Toronto divided up into its electoral wards, is right here. Download it!

Step 3: Convert the shapefile to GeoJSON

So we have our shapefile, but it’s in a vector format that’s not exactly ready to be imported into a web page. The next step is to convert this file into a web-friendly format. To do this, we need to convert the data in the shapefile into a JSON-inspired format, called GeoJSON. GeoJSON looks just like normal JSON (it is normal JSON), but it has a spec that defines what a valid GeoJSON object must contain. Luckily, there’s some open source software that will do this for us: the Geospatial Data Abstraction Library or GDAL. GDAL has *tons* of cool stuff in it, take a look when you have time, but for now, we’re just going to use the ogr2ogr command that takes a shapefile in and spits out GeoJSON. Armed with GDAL and our shapefile, let’s convert the data into GeoJSON:

ogr2ogr \
  -f GeoJSON \
  -t_srs EPSG:4326 \
  toronto.json \
  tcl3_icitw.shp

This new file, toronto.json, is a GeoJSON file that contains the co-ordinates for drawing the polygons representing the 44 wards of Toronto. Let’s take a look at that command just so you know what’s going on here:

  1. ogr2ogr is the name of the utility we’re using; it comes with GDAL
  2. -f specifies the format of the final output
  3. -t_srs is the key flag: it makes sure the output is projected into the familiar lat/long co-ordinate system (EPSG:4326). If you omit it (in fact, try omitting it), you’ll see the output requires extra work from us on the front end, which is unnecessary.
  4. toronto.json is the name of the output file
  5. tcl3_icitw.shp is the input shapefile

Alright, GeoJSON file done.

Step 4: Download the data

Assuming enough time has passed, Repustate will have found some data for us and we can retrieve it via the Social API:

https://gist.github.com/mostrovsky/7762900

The data comes down as a CSV. It’s left as an exercise for the reader to store the data in a format that you can query and retrieve later, but let’s just assume we’ve stored our data. Included in this data dump are the latitude and longitude of each tweet(er). How do we know if this tweet(er) is in Toronto? There are a few ways, actually.

We could use PostGIS, an unbelievably amazing extension for PostgreSQL that allows you to do all sorts of interesting queries with geographic data. If we went this route, we could load the original shapefile into a PostGIS enabled database, then for each tweet, use the ST_Contains method to check which tweet(er)s are in Toronto. Another way to do this is to iterate over your dataset and for each one, do a quick point in polygon calculation in the language of your choice. The polygon data can be retrieved from the GeoJSON file we created. Either way you go, it’s not too hard to determine which data originated from Toronto, given specified lat/long co-ordinates.
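The point-in-polygon calculation mentioned above can be done with the classic ray-casting test: cast a horizontal ray from the point and count how many polygon edges it crosses; an odd count means the point is inside. A minimal sketch (the sample ward below is a made-up rectangle, not real Toronto data):

```python
def point_in_polygon(lng, lat, polygon):
    """Ray-casting test against a polygon given as (lng, lat) vertex
    pairs, as found in a GeoJSON ring. Returns True if inside."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does edge (j -> i) straddle the horizontal line at `lat`?
        if (yi > lat) != (yj > lat):
            # Longitude where the edge crosses that horizontal line.
            x_cross = (xj - xi) * (lat - yi) / (yj - yi) + xi
            if lng < x_cross:
                inside = not inside
        j = i
    return inside

# A made-up rectangular "ward" for illustration (lng, lat pairs):
SAMPLE_WARD = [(-79.6, 43.6), (-79.1, 43.6), (-79.1, 43.9), (-79.6, 43.9)]
```

Run each tweet’s co-ordinates through this against every ward ring from toronto.json; whichever ward returns True is where the tweet(er) lives.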

Step 5: Visualize the data

Let’s see how this data looks. To plot the GeoJSON data and each of the data points from our Twitter data source, we use d3.js. For those not familiar with d3.js (Data-Driven Documents), it’s a JavaScript library that helps you bind data to graphical representations in pure HTML/SVG. It’s a great library to use if you want to visualize any data really, but it makes creating unique and customized maps pretty easy. And best of all, since the output is SVG, you’re pretty much guaranteed that your visuals will look identical on all browsers (well, IE 8 and below don’t support SVG) and mobile devices. Let’s dive into d3.js. We need to plot two things:

  1. The map of Toronto, with dividers for each of the wards
  2. The individual tweets, colour coded for sentiment and placed in the correct lat/long location.

The data for the map comes from the GeoJSON file we created earlier. We’re going to tell d3 to load it and for each ward, we’re going to draw a polygon (the “path” node in SVG) by following the path co-ordinates in the GeoJSON file. Here’s the code:

https://gist.github.com/mostrovsky/7776778

Path elements have an attribute, “d”, which describes how the path should be drawn. You can read more about that here. So for each ward, we create a path node and set the “d” attribute to contain the lat/long co-ordinates that, when connected, form the boundaries of the ward. You’ll see we also have custom fill colouring based on the 2010 Mayoral election; that’s just an extra touch we added. You can play around with the thickness of the bordering lines. We’re also adding a mouseover event to show a little tooltip about the ward in question when the user hovers over it.

Time to add the data points. We’re going to separate them by date, so we can add a slider later to show how the data changes over time. We’re also including the sentiment information that Repustate automatically calculates for each piece of social data. Here’s an example of how one complete data point would look:

{
 "lat": 43.788458853157117, 
 "mm": 10,
 "lng": -79.282781549043008,
 "dd": 28,
 "sentiment": "pos"
}

Now to plot these on our map, we again load a JSON file and for each point, we bind a “circle” object, positioned at the correct lat/long co-ordinates. Here’s the code:

https://gist.github.com/mostrovsky/7776799

At this point, if you tried to render the data on screen, you wouldn’t see a thing, just a blank white SVG. That’s because you need to deal with the concept of map projections. If you think about the map of the world, there’s no “true” way to render it on a flat, 2D surface, like your laptop screen or iPad. This has given rise to the concept of projections, which is the process of placing curved data onto a flat plane. There are many ways to do this, and you can see a list of available projections here; for our example, we used the Albers projection. The last thing we have to do is scale and transform our SVG, and we’re done. Here’s the code for projecting, scaling and transforming our map:

https://gist.github.com/mostrovsky/7776853

Those values are not random; they came about through some trial and error. I’m not sure how to derive them properly or algorithmically.

Please take a look at the finished product and of course, if you or your company is looking into doing this sort of thing with your own data set, contact us!

Building a better TripAdvisor

(This is a guest post by Sarah Harmon. Sarah is currently a PhD student at UC Santa Cruz studying artificial intelligence and natural language processing. Her website is NeuroGirl.com)

Click here to see a demo of Sarah’s work in action.

Using Repustate’s API to get the most out of TripAdvisor customer reviews

I’m a big traveler, so I often check online ratings, such as those on TripAdvisor, to decide which local hotel or restaurant is worth my time.  In this post, I use Repustate’s text categorization API to analyze the sentiment of hotel reviews, and – in so doing – work towards a better online hotel rating system.

Five-star scales aren’t good enough

The five-star rating scale is a standard ranking system for many products and services.  It’s a fast way for consumers to check out the quality of something before paying the price, but it’s not always accurate.  Five-star ratings are inherently subjective, don’t let raters say all they want to say, and force a generic labelling for a complex experience.

Asking people to write text reviews solves a few of these problems.  Instead of relying on the five-star scale, we ask people to highlight the most memorable parts of their experience.  Still, who has the time to read hundreds of reviews to get a true sense of what a hotel is like?  Even the small sample of reviews shown on the first page could be unhelpful, unreliable, or cleverly submitted by the hotel itself to make itself look good. What we need is a way to summarize the review content – and ideally, we’d like a summary that’s specific to our own values.

Making a personalized hotel ranking system

Let’s take a look at a website that uses star ratings all the time: TripAdvisor, a travel resource that features an incredible wealth of user-reviewed hotels.

trip advisor front page

TripAdvisor uses AJAX-based pagination, which means that if we’d like to access that wealth of data, normal methods of screen-scraping won’t work.  I decided to use CasperJS, a navigation and scripting utility built on top of PhantomJS, to easily scrape the TripAdvisor website with JavaScript.

Here’s a simplified example of how you can use CasperJS to view the star ratings for every listed hotel from New York:

In a similar fashion, I used Python and CasperJS to retrieve a sample of 100 hotels each from five major locations – Italy, Spain, Thailand, the US, and the UK – and retrieved the top 400 English reviews from each hotel as listed on TripAdvisor.  To ensure that each stored review was in English, I relied on Python’s Natural Language Toolkit. Finally, I called on the Repustate API to analyze each review using six hotel-specific categories: food, price, location, accommodations, amenities, and staff.
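The NLTK-based English check can be approximated with a stopword-ratio heuristic: if enough of a review’s tokens are common English function words, it’s probably English. The sketch below hardcodes a tiny stopword sample in place of NLTK’s corpus and uses an arbitrary threshold; it’s a simplified stand-in, not Sarah’s exact code:

```python
# A tiny sample of English stopwords, standing in for
# nltk.corpus.stopwords.words('english').
ENGLISH_STOPWORDS = {
    "the", "a", "an", "and", "is", "was", "were", "in", "on", "it",
    "of", "to", "we", "our", "this", "that", "very", "with", "for",
}

def looks_english(text, threshold=0.2):
    """Treat text as English if at least `threshold` of its tokens are
    common English stopwords."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in ENGLISH_STOPWORDS)
    return hits / len(tokens) >= threshold
```

English prose is saturated with these function words, so even a short stopword list separates English reviews from, say, Italian ones surprisingly well.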

sample of hotel rater

Check out the results in this “HotelRater” demo.  Select a location and a niche you care about, and you’ll see hotels organized in order of highest sentiment for that category.  To generate those results, the sentiment scores for each hotel across each category were averaged, and then placed on a scale from 1 to 100.  (I chose to take a mean of the sentiment values because it’s a value that’s easy to calculate and understand.)  The TripAdvisor five-star rating is shown for comparison.  You can also click on the hotels listed to see how Repustate categorized each of their reviews.
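Assuming each review’s sentiment score sits on a -1 to 1 scale (an assumption; the actual range isn’t stated here), the averaging and 1-to-100 rescaling described above is a simple linear map:

```python
def hotel_score(review_scores):
    """Average per-review sentiment scores (assumed to be in [-1, 1]),
    then map the mean linearly onto a 1-100 scale:
    -1 -> 1, 0 -> 50.5, +1 -> 100."""
    mean = sum(review_scores) / len(review_scores)
    return round((mean + 1) / 2 * 99 + 1, 1)
```

The mean is easy to calculate and explain, as noted above; a more robust variant might weight by review length or recency.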

When I started putting the app I built into practice, I could suddenly make sense of TripAdvisor’s abundance of data.  While a hotel might have only a four-star review on average, its customers might be generally very happy with key aspects, such as its food, staff, and location.  Desirable hotels popped up in my search results that I might never have seen or considered because of their lower average star rating on TripAdvisor.  The listed sentiment scores also helped to differentiate hotels that I would have previously had trouble sorting through because they all shared the same 4.5-star rating.

This demo isn’t a replacement for TripAdvisor by any means, since there’s hardly enough stored data or options included to assist you in your quest for the perfect hotel on your vacation.  That said, it’s a positive step towards a new ranking system that’s aligned with our individual values.  We can quickly see how people are really feeling, without condensing their more specific thoughts into a blanket statement.

I’d give that five stars any day.

Sentiment Analysis Accuracy – Goal or Distraction?

Everything you need to know about sentiment analysis accuracy

When potential customers approach us, one of the first questions we’re asked is “How accurate is your sentiment analysis engine?” Well, as any good MBA graduate would tell you, the correct answer is “It depends.” That might sound like a cop-out and a way to avoid answering the question, but in fact, it’s the most accurate response one can give.

Let’s see why. Take this sentence:

“It is sunny today.”

Positive or negative? Most would perhaps say positive; Repustate would score this as neutral, since no opinion or intent is being stated, merely a fact. For those who would argue that this sentence is positive, let’s tweak it:

“It is sunny today; my crops are getting dried out and that will cause me to go bankrupt.”

Well, that changes things, doesn’t it? Put into a greater context, the polarity (positive or negative sentiment) of a phrase can change dramatically. That’s why it is difficult to achieve 100% accuracy with sentiment.

Let’s take a look at another example. From a review of a horror movie:

“It will make you quake in fear.”

Positive or negative? Well, for a horror movie, this is a positive because horror movies are supposed to be scary. But if we were talking about watching a graphic video about torture, then the sentiment is negative. Again, context matters here more than the actual words themselves.

What about when we introduce conjunctions into the mix:

I thought the movie was great, but the popcorn was stale and too salty.

The first part of the sentence is positive, but the second part is negative. So what is the sentiment of this sentence? At Repustate, we would say “It depends.” The sentiment for the movie is positive; the sentiment surrounding the popcorn is negative. The question is: which topic are you interested in analyzing?
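That topic-scoped answer can be made concrete with a toy: split the sentence on the conjunction and score only the clause that mentions the topic. The four-word lexicon and naive “but” split below are deliberately simplistic stand-ins for a real engine:

```python
# A 4-word polarity lexicon and a naive clause splitter; both are toys.
POLARITY = {"great": 1, "stale": -1, "salty": -1, "scary": -1}

def topic_sentiment(sentence, topic):
    """Score only the clause that mentions the topic word."""
    for clause in sentence.lower().split(" but "):
        words = [w.strip(".,!?") for w in clause.split()]
        if topic in words:
            return sum(POLARITY.get(w, 0) for w in words)
    return 0  # topic not mentioned
```

Asked about the movie, the sentence above scores positive; asked about the popcorn, negative. Same text, two answers, each correct for its topic.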

There is no one TRUE sentiment engine

As the previous examples have demonstrated, sentiment analysis is tricky and highly contextual. As a result, any benchmarks that companies post have to be taken with a pinch of salt. Repustate’s approach is the following:

  1. Make our global model as flexible as possible to catch as many cases as possible without being too inconsistent
  2. Allow customers the ability to load their own custom, domain-specific, definitions via our API (e.g. Make the phrase “quake in fear” positive for movie reviews)
  3. Allow sentiment to be scored in the context of only one topic, again, via our API

When shopping for sentiment analysis solutions, ask potential vendors about these points. Make sure whichever solution you end up going with can be tailored to your specific domain, because chances are your data has unique attributes and characteristics that might be overlooked or incorrectly accounted for by a one-size-fits-all sentiment engine.

Or you can save yourself a lot of time and use Repustate’s flexible sentiment analysis engine.

The Repustate Promoter Score (RPS) explained

Repustate’s new metric

One of the newest and most popular features of Repustate’s Social Media Monitoring platform has been the addition of a metric we call the Repustate Promoter Score, or RPS for short. The RPS provides a simple score, from 0 to 10, that indicates the overall sentiment people have about a particular topic in question. This could be a person, a brand, a product, anything.

The RPS is calculated as follows: take the total number of positive mentions and the total number of negative mentions, compute the Wilson confidence interval using those two numbers, and then multiply its lower bound by 10 to get a score between 0 and 10. That’s a mouthful, so let’s dig a bit deeper.

How it works

The total number of positive & negative mentions is easy enough to understand, but what’s this about a Wilson confidence interval? Well, it’s a math formula which tries to determine how confident you can be about a particular outcome given the data. To put this in social terms, let’s say we have 1 tweet about the Apple iPhone and that tweet is positive. How confident are we that everyone is positive about the iPhone? Well, not too confident, because we only have 1 piece of data. But let’s say we have 1000 tweets, and 950 are positive. Now our confidence level rises, and that’s the thinking behind using the Wilson confidence interval.

Some examples

Here are some examples of Repustate Promoter Scores given a number of positive and negative mentions:

Positive: 1, Negative: 0, RPS: 2
Positive: 10, Negative: 0, RPS: 7
Positive: 10, Negative: 10, RPS: 3
Positive: 0, Negative: 100, RPS: 0
Positive: 100, Negative: 5, RPS: 9
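All five example scores are reproduced by the lower bound of the Wilson score interval at a 95% confidence level (z ≈ 1.96), scaled to 0-10. The exact confidence level is an assumption on our part, but it matches every example above:

```python
import math

def rps(positive, negative, z=1.96):
    """Promoter score: lower bound of the Wilson score interval for the
    positive share, scaled to 0-10. z = 1.96 is the 95% confidence
    level; that parameter choice is an assumption, not documented."""
    n = positive + negative
    if n == 0:
        return 0
    p = positive / n          # observed positive share
    z2 = z * z
    lower = (p + z2 / (2 * n)
             - z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))) / (1 + z2 / n)
    return round(lower * 10)
```

Note how 1 positive mention with no negatives only scores 2, while 10 positives score 7: more data means a tighter interval and a higher lower bound.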

When you create a new data source, Repustate calculates the RPS on an ongoing basis so you can see how the RPS fluctuates over time in your dashboard.

Using Python to data mine Twitter

With the launch of our Social API a little over a month ago, Repustate now allows its customers to create rules to monitor social media, all without ever leaving the confines of your favourite programming environment. But the piece that was missing was the ability to get various statistics and reports about your data sources. Today, that changes with Repustate’s new visualize API call.

If you’re impatient, here’s the gist of it:
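In outline, the three calls look something like the sketch below. The endpoint paths and parameter names are illustrative guesses, not the documented Social API; the sketch just builds the request URL and form body for each step:

```python
from urllib.parse import urlencode

BASE = "https://api.repustate.com/v2/{key}"  # hypothetical URL layout

def build_call(api_key, endpoint, **params):
    """Build the URL + form body for one API call (no network here)."""
    url = BASE.format(key=api_key) + "/" + endpoint + ".json"
    return url, urlencode(sorted(params.items()))

# 1. Create a new data source.
create = build_call("MY_KEY", "add-datasource", name="iOS7 buzz")
# 2. Add a monitoring rule: iOS7 mentions, no retweets, >= 1,000 followers.
rule = build_call("MY_KEY", "add-rule", datasource_id=1, network="twitter",
                  query="iOS7", exclude_retweets=1, min_followers=1000)
# 3. Ask for a chart: gender breakdown of tweets with positive sentiment.
viz = build_call("MY_KEY", "visualize", datasource_id=1,
                 filter="gender", sentiment="pos")
```

Each pair would then be POSTed (or GETted, for the chart) with your HTTP client of choice.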

If you look through the above gist, you’ll see the steps we take are pretty simple and can be parametrized and automated so you can create graphs and charts as part of your regular reporting process. Our code sample was in Python, but it can easily be ported to Ruby, PHP, Go, or any language of your choosing. To summarize the above code snippet:

  1. We created a new data source
  2. We added a monitoring rule to the source. In this case, we want all mentions of iOS7 on Twitter
  3. We then asked for a graph depicting the gender breakdown for all tweets with positive sentiment for this data source.

With very little code, we’re able to create and visualize complex queries. Neat, huh?
We’ll be adding more visualizations to the API as time passes. And by all means, if you have any suggestions, please let us know!

Data mining Twitter with Python – A little more detail

Now that we’ve skimmed the surface with that code sample up above, let’s dive a little bit deeper and see what’s going on and what the Social API allows us to do.

The first concept we introduced was the data source. A data source is any social network Repustate monitors (currently Twitter & Facebook). Now, you can create multiple instances of data sources, that is to say you can monitor Twitter for all sorts of different things at the same time.

Each data source can have one or more rules. You’ll see in the code above that we define our rule. Our rule says “Get me any mention of iOS7 on Twitter, don’t include any retweets, and only tweets from people with at least 1,000 followers”. The last bit, filtering by number of followers, helps remove any spam bots.

At this point, Repustate has enough info to start fetching your data. It might take a few minutes before any data gets populated but once it does, you can run the visualize API call to create graphs and charts that summarize your data. In the above example, we chose the gender filter type, which displays the breakdown of male to female tweeters in our data source.

Here’s a sample of a graph that Repustate returns using the visualize API call:

Positive category filtered by gender

Let’s do another one. This time, let’s take a look at all devices used to create this content, sorted by those which were most commonly used:

Devices used for this data source

You can see with just a few API calls and just a few more lines of Python, we’re able to create pretty insightful visuals. No need for Excel or your own ETL processes – just Repustate’s Social API.

 

What’s the secret to great customer service? Don’t be a jerk.

One of the most frustrating aspects of dealing with larger companies is the customer service experience. Whether it’s being caught in a game of customer service hot potato (“Please wait while we transfer you for the 8th time”) or being told “I’m sorry we can’t do that, company policy” it can permanently ruin a customer’s relationship with a company. While larger companies perhaps feel strict policies and procedures provide a consistent experience and therefore make their business easier to administer in the long run, Repustate has never adopted a one-size-fits-all approach to customer service. Each customer is different, so let’s treat them that way.

Examples of good customer service

The one rule we do hold constant is this: Don’t be a jerk. It’s kind of our version of Google’s (in)famous “Don’t be Evil” motto. Here are a few examples of “Don’t be a jerk” in practice:

1) I wouldn’t have believed it until I saw it for myself, but every so often, we get customers who sign up for Repustate’s priced plans, use the plan for the month, then contact us and say they’d like a refund for that month. Crazy, I know, and we would be entirely correct and fair if we said “Are you out of your mind? That’s like getting a bucket of chicken from KFC and returning a bucket full of bones and asking for your money back”. But you know what – we always refund their money. It’s just not worth our time to argue and go back and forth. Also, we’re hoping that our leniency and understanding will result in these customers coming back one day. In our experience, about 50% of the time they do come back at a later date as permanent, paying customers.

2) We often get students contacting us wanting to use Repustate for their research papers. The free plan we offer only provides 1000 API calls per month and often these students need hundreds of thousands of calls to complete their project. We always give students a free plan with unlimited usage for a few weeks on condition that they A) cite Repustate in the paper/project and B) send us the final paper or a link to the site. We like to think it keeps Repustate in good standing with the karma gods, but more seriously, these students will graduate one day, go work somewhere and might recommend Repustate to their employers.

3) Prompt email responses. Sounds like a no-brainer, but Repustate does not send out “Thank you for contacting Repustate” emails. I hate getting those because it gets your hopes up that someone replied, and then you check the contents, and you get sad 🙁 We just reply promptly, always same day, usually within the hour.

4) Honour your existing customers. We recently underwent some price plan changes and modified the account quotas. Now, we could have gone to all of our existing customers and said “Too bad, new quotas in place” but we didn’t. We continue to honour some of our older plans because that’s the right thing to do.

To sum up, just be nice, it’s not that hard. It’ll pay off in the long run, if not the short run as well.