How to extract images from a web page

Ever wanted to extract images from web pages? Now you can with one simple API call.

 

Repustate’s clean-html API call has been one of our most popular since day one. Its performance was good from the start, so we have rarely needed to touch it, but that changed recently: it can now extract the main image as well as the text from any web page.

A customer asked us to add the ability to extract the main image from a web page, similar to how Instapaper or Mobile Safari’s “Reader” feature works.

Now, by default, a call to clean-html returns an image attribute containing the URL of the article’s main image, if one exists.

Let’s look at an example. You’ll need a Repustate API key to try this yourself, but getting one is free and easy. Take this URL:

http://www.thestar.com/news/insight/2013/02/15/challenging_the_vatican_progressive_catholics_say_reform_must_begin_with_church_governance.html

and pass it to our API call.

curl -d "url=http://www.thestar.com/news/insight/2013/02/15/challenging_the_vatican_progressive_catholics_say_reform_must_begin_with_church_governance.html" http://api.repustate.com/v2/YOUR_API_KEY/clean-html.json
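If you would rather make the call from code than from the command line, here is a minimal Python sketch of the same request, using only the standard library. The endpoint and the url form field come from the curl command above; the function names, the timeout, and the assumption that the response body is UTF-8 are ours for illustration.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

API_BASE = "http://api.repustate.com/v2"  # base URL taken from the curl example


def build_clean_html_request(api_key, page_url):
    """Build the same POST request that the curl command sends."""
    endpoint = f"{API_BASE}/{quote(api_key, safe='')}/clean-html.json"
    # curl -d sends a form-encoded body. urlencode also percent-escapes the
    # page URL, so a query string inside it is not misread as extra fields.
    body = urlencode({"url": page_url}).encode("ascii")
    return Request(endpoint, data=body, method="POST")


def call_clean_html(api_key, page_url, timeout=10):
    """Send the request and return the raw response body as text.

    Assumes the body is UTF-8 encoded; the timeout is an arbitrary choice.
    """
    request = build_clean_html_request(api_key, page_url)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling call_clean_html("YOUR_API_KEY", page_url) should be equivalent to the curl command, with the added benefit that the page URL is escaped for you.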

And here’s the response:

 {"status": "OK", "text": "To progressive Canadian Catholic ... (shortened for this example)", "image": "http://www.thestar.com/content/dam/thestar/news/insight/2013/02/15/challenging_the_vatican_progressive_catholics_say_reform_must_begin_with_church_governance/vatican_lightning.jpg.size.xxlarge.promo.jpg", "url": "http://www.thestar.com/news/insight/2013/02/15/challenging_the_vatican_progressive_catholics_say_reform_must_begin_with_church_governance.html"}

As you can see, the JSON response contains an ‘image’ key holding the URL of that article’s main image.
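To pull the image URL out of that response in code, something like the following Python sketch should do. It relies only on the status and image keys shown in the response above. How the API reports a failure, and whether a missing image shows up as an absent key or an empty value, are assumptions on our part, so the function treats all of those cases as "no image". The sample payload and its example.com URLs are made up for illustration.

```python
import json


def extract_main_image(response_text):
    """Return the main image URL from a clean-html response, or None."""
    payload = json.loads(response_text)
    # Assumption: any status other than "OK" means the call failed.
    if payload.get("status") != "OK":
        return None
    # Assumption: a page without a main image yields a missing, null,
    # or empty "image" value; all three are mapped to None here.
    return payload.get("image") or None


# Hypothetical response, shaped like the one shown above.
sample = json.dumps({
    "status": "OK",
    "text": "Article text goes here",
    "image": "http://example.com/lead.jpg",
    "url": "http://example.com/article.html",
})

main_image = extract_main_image(sample)
if main_image is None:
    print("No main image found")
else:
    print("Main image:", main_image)
```

Returning None instead of raising keeps the calling code simple when an article has no lead image, which the API allows for.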

With this API call, you can build your own version of Instapaper or Readability.
