Using Python to data mine Twitter

With the launch of our Social API a little over a month ago, Repustate now lets you create rules to monitor social media without ever leaving your favourite programming environment. The missing piece was the ability to get statistics and reports about your data sources. Today, that changes with Repustate’s new visualize API call.

If you’re impatient, here’s the gist of it:

If you look through the above gist, you’ll see the steps we take are pretty simple and can be parametrized and automated so you can create graphs and charts as part of your regular reporting process. Our code sample was in Python, but it can easily be ported to Ruby, PHP, Go, or any language of your choosing. To summarize the above code snippet:

  1. We created a new data source.
  2. We added a monitoring rule to the source. In this case, we want all mentions of iOS7 on Twitter.
  3. We then asked for a graph depicting the gender breakdown of all tweets with positive sentiment in this data source.
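
The three steps above can be sketched roughly as follows. This is not the original gist: the base URL, endpoint paths, and parameter names are placeholders invented for illustration, so check the Social API documentation for the real ones. The sketch only builds the three requests and sends nothing over the network.

```python
from urllib.parse import urlencode

# Hypothetical values: the base URL, paths and parameter names below are
# placeholders for illustration, not Repustate's documented API.
BASE_URL = "https://api.example.com/v2/YOUR_API_KEY"


def build_request(path, **params):
    """Return (url, encoded_body) for a POST to a hypothetical endpoint."""
    url = f"{BASE_URL}/{path}.json"
    body = urlencode(sorted(params.items()))
    return url, body


# 1. Create a new data source.
create_source = build_request("add-datasource", name="ios7", source_type="twitter")

# 2. Add a monitoring rule: all mentions of iOS7 on Twitter.
add_rule = build_request("add-rule", datasource_id="SOURCE_ID", query="iOS7")

# 3. Ask for a graph of the gender breakdown of positive tweets.
visualize = build_request(
    "visualize",
    datasource_id="SOURCE_ID",
    filter_type="gender",
    sentiment="positive",
)

for url, body in (create_source, add_rule, visualize):
    print(url, body)
```

Keeping request construction in one small helper is what makes the flow easy to parametrize: swapping the keyword or the filter type is a one-argument change.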

With very little code, we’re able to create and visualize complex queries. Neat, huh?
We’ll be adding more visualizations to the API as time passes. And by all means, if you have any suggestions, please let us know!

Data mining Twitter with Python – A little more detail

Now that we’ve skimmed the surface with that code sample up above, let’s dive a little bit deeper and see what’s going on and what the Social API allows us to do.

The first concept we introduced was the data source. A data source is any social network Repustate monitors (currently Twitter and Facebook). You can create multiple instances of a data source, so you can monitor Twitter for several different things at the same time.

Each data source can have one or more rules. On lines 14-19 of the code above, we define our rule. It says “Get me any mention of iOS7 on Twitter, don’t include any retweets, and only include tweets from people with at least 1,000 followers”. The last condition, filtering by number of followers, helps remove spam bots.
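
To make the rule's meaning concrete, here is a minimal local sketch of the same three conditions applied to a handful of made-up tweets. Repustate evaluates rules on its own servers; the `Tweet` class, the `RULE` dictionary, and its field names are assumptions made up for this illustration, not Repustate's rule schema.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    is_retweet: bool
    followers: int


# Hypothetical rule representation; the field names are assumptions,
# not Repustate's actual rule schema.
RULE = {"keyword": "ios7", "include_retweets": False, "min_followers": 1000}


def matches(tweet, rule=RULE):
    """Return True if the tweet satisfies every condition in the rule."""
    if rule["keyword"] not in tweet.text.lower():
        return False
    if tweet.is_retweet and not rule["include_retweets"]:
        return False
    return tweet.followers >= rule["min_followers"]


# Made-up tweets, purely for illustration.
samples = [
    Tweet("Loving iOS7 so far", False, 2500),   # kept
    Tweet("RT: iOS7 is out", True, 5000),       # dropped: retweet
    Tweet("iOS7 free giveaway!!!", False, 12),  # dropped: too few followers
    Tweet("Nice weather today", False, 9000),   # dropped: no mention of iOS7
]
kept = [t.text for t in samples if matches(t)]
print(kept)
```

Only the first sample passes all three checks, which is the behaviour the follower threshold is meant to produce for low-follower spam accounts.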

At this point, Repustate has enough info to start fetching your data. It might take a few minutes before any data gets populated but once it does, you can run the visualize API call to create graphs and charts that summarize your data. In the above example, we chose the gender filter type, which displays the breakdown of male to female tweeters in our data source.
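
Because data can take a few minutes to arrive, a script that runs right after creating a rule may want to wait before asking for a graph. The following is a generic polling sketch, not part of Repustate's API: `fetch_count` stands in for whatever call you use to check how many items a data source holds, and is simulated here so the example runs without a network.

```python
import time


def wait_for_data(fetch_count, attempts=10, delay=30.0, sleep=time.sleep):
    """Poll fetch_count() until it reports data or attempts run out.

    fetch_count is a caller-supplied function (hypothetical here) that
    returns how many items the data source currently holds.
    """
    for attempt in range(attempts):
        if fetch_count() > 0:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no need to sleep after the final attempt
    return False


# Stand-in for a real API call: reports no data twice, then 42 items.
_responses = iter([0, 0, 42])
ready = wait_for_data(lambda: next(_responses), delay=0)
print(ready)
```

Capping the number of attempts keeps a scheduled reporting job from hanging forever if a rule never matches anything.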

Here’s a sample of a graph that Repustate returns using the visualize API call:

Positive category filtered by gender

Let’s do another one. This time, let’s take a look at all devices used to create this content, sorted by those which were most commonly used:

Devices used for this data source
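
Repustate does this aggregation for you, but if it helps to see what “sorted by most commonly used” amounts to, here is a small local sketch. It assumes you had each tweet's posting client as a plain string; the sample values are made up.

```python
from collections import Counter

# Made-up client names, purely for illustration.
sources = ["iPhone", "Web", "Android", "iPhone", "iPad", "iPhone", "Web"]

# most_common() returns (device, count) pairs sorted by descending count.
device_counts = Counter(sources).most_common()

for device, count in device_counts:
    print(f"{device:<10}{count}")
```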

You can see with just a few API calls and just a few more lines of Python, we’re able to create pretty insightful visuals. No need for Excel or your own ETL processes – just Repustate’s Social API.

