Social media data categorized automatically

Social media data is being generated faster than we can count, but organizing and analyzing that mountain of data doesn’t have to be hard. Repustate’s newest API call categorizes your social media data, making it easier for you to analyze and report on the data that is important to your company.

Categorize by industry

Repustate’s categorization works within the context of a particular industry. Currently, the supported industries are airlines, hotels, telecommunications, and restaurants. The possible categories differ from one industry to the next because, of course, the industries themselves are quite different. To view the full list of categories available for each industry, take a look at our API docs.

Social media example (and when sentiment alone is not enough)

Let’s say we are a hotel chain, Repustate Hotels, and we have a Twitter account used for getting feedback from our customers. When a customer tweets “@RepustateHotels I loved the beds in your hotel, but the food was terrible”, there are a few pieces of sentiment being stated here. Is the overall sentiment of this sentence positive or negative? I would argue it’s both. The customer liked the beds, but did not like the food. It would be useful to extract these bits of information separately. That’s what Repustate’s categorize API call does automatically.

As a result of passing the above tweet to the categorize API call, you’ll get the following structure back (in JSON):

 {"food": [{"chunk": "the food was terrible", "score": -0.42798}], "accommodations": [{"chunk": "@ RepustateHotels_I loved the beds in your hotel", "score": 0.23237}]}

Now very quickly you can see the various categories within the hotel industry that this text matches up with. Expanding on this idea, you can envision how this would be an invaluable tool to assess customer satisfaction across your entire organization and see which areas need to be improved.
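
To make that concrete, here is a minimal Python sketch of how you might roll the categorized response up into a per-category summary. It works only on the JSON shown above (pasted in as a string), so it makes no assumptions about how you call the API. The function name summarize_categories and the positive/negative/neutral labels are our own illustration, not part of the Repustate API.

```python
import json

# The response from the hotel example above, pasted in as a string.
# In practice this would be the body returned by the categorize call.
response_text = '''{"food": [{"chunk": "the food was terrible", "score": -0.42798}], "accommodations": [{"chunk": "@ RepustateHotels_I loved the beds in your hotel", "score": 0.23237}]}'''


def summarize_categories(text):
    """Return {category: (label, average_score)} for a categorize response.

    Hypothetical helper for illustration; the labels are our own convention.
    """
    data = json.loads(text)
    summary = {}
    for category, chunks in data.items():
        scores = [chunk["score"] for chunk in chunks]
        if not scores:
            continue  # skip categories with no matching chunks
        average = sum(scores) / len(scores)
        if average > 0:
            label = "positive"
        elif average < 0:
            label = "negative"
        else:
            label = "neutral"
        summary[category] = (label, average)
    return summary


summary = summarize_categories(response_text)
for category, (label, average) in sorted(summary.items()):
    print(f"{category}: {label} ({average:+.3f})")
```

For the tweet above, this reports accommodations as positive and food as negative. Run over many tweets, the same roll-up would show which parts of the business customers are happy with and which need attention.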

How to extract images from a web page

Ever wanted to extract images from web pages? Now you can with one simple API call.


Repustate’s clean-html API call has been one of our most popular API calls since Day 1. It hasn’t been touched much as its performance was quite good from the get-go, but that changed recently. Now you can extract images as well as the text from any web page.

We had a customer request to add the ability to extract the main image from a web page as well, similar to how Instapaper or Mobile Safari’s “Reader” feature works.

Now, by default, a call to clean-html returns an image attribute containing the URL of the article’s main image, if one exists.

Let’s take a look at an example. You’ll need a Repustate API key to try this on your own, but it’s free and easy to get one. Let’s take this URL:

and pass it to our API call.

curl -d "url="

And here’s the response:

 {"status": "OK", "text": "To progressive Canadian Catholic ... (shortened for this example)", "image": "", "url": ""}

As you can (kind of) see, there is an ‘image’ key in the JSON response with a URL for the main image of that article.
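
If you are consuming this response in code, a small Python sketch like the one below can pull out the main image. It assumes only the response shape shown above ("status", "text", "image" and "url" keys). The helper name main_image_url and the example.com URLs in the sample are hypothetical placeholders, not real API output.

```python
import json


def main_image_url(response_text):
    """Return the main image URL from a clean-html response, or None.

    Hypothetical helper for illustration. Returns None when the status is
    not "OK" or when the article has no main image.
    """
    data = json.loads(response_text)
    if data.get("status") != "OK":
        return None
    # A missing or empty "image" value means no main image was found.
    return data.get("image") or None


# Sample response in the same shape as the one above; the URLs are placeholders.
sample = json.dumps({
    "status": "OK",
    "text": "Article text goes here",
    "image": "https://example.com/lead-image.jpg",
    "url": "https://example.com/article",
})

image = main_image_url(sample)
if image is not None:
    print("Main image:", image)
else:
    print("No main image found for this article")
```

Because the image only comes back if it exists, it is worth handling the empty case explicitly, as this helper does, rather than assuming every article has a main image.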

With this API call, you can create your own version of Instapaper or Readability for your own purposes.