Pitchfork.com Album Rating 'API'
14 Feb 2012
Skip straight to a JSFiddle demo if words bore you.
My initial process went as follows:
- Use the album and artist from the Rdio URL to get the URL of the Pitchfork review page
- Scrape the review page for the score
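As a rough sketch of where the first step starts, here is how the artist and album might be pulled out of an Rdio URL in TypeScript. The URL shape (an /artist/Bon_Iver/album/Bon_Iver/ path with underscores standing in for spaces) and the parseRdioUrl name are assumptions for illustration, not code from the demo.

```typescript
// Search terms extracted from an Rdio album URL.
interface AlbumQuery {
  artist: string;
  album: string;
}

// ASSUMPTION: Rdio album URLs look like
//   http://www.rdio.com/#/artist/Bon_Iver/album/Bon_Iver/
// with underscores standing in for spaces. Returns null when the URL
// has no /artist/.../album/... path.
function parseRdioUrl(url: string): AlbumQuery | null {
  const match = /\/artist\/([^\/]+)\/album\/([^\/]+)/.exec(url);
  if (match === null) {
    return null;
  }
  // "Bon_Iver" -> "Bon Iver", also undoing any percent-encoding.
  const clean = (segment: string): string =>
    decodeURIComponent(segment).replace(/_/g, " ");
  return { artist: clean(match[1]), album: clean(match[2]) };
}
```

The two resulting strings are the search terms that the rest of the process feeds to Pitchfork.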
Getting the Pitchfork Album Review URL
At first glance, generating review URLs from the album name seemed out of the question: every Pitchfork album review page has a numeric ID in its URL, such as http://pitchfork.com/reviews/albums/15551-bon-iver/. The next best thing I could think of was to generate a search URL, since the syntax was simple (http://pitchfork.com/search/?query=bon+iver), and then scrape that page. However, after investigating a little further I saw that Pitchfork has an autocomplete 'API' that returns JSON.
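To make the two URL formats above concrete, here is a small TypeScript sketch: one function builds a search URL in the query=bon+iver style, and the other splits a review URL into its numeric ID and slug, which is the part that can't be guessed from the album name. Both function names are hypothetical; the only grounded facts are the two example URLs.

```typescript
// Builds a search URL in the format of
// http://pitchfork.com/search/?query=bon+iver
function buildSearchUrl(term: string): string {
  // Percent-encode each word, then join the words with "+" (form-style spaces).
  const query = term
    .trim()
    .split(/\s+/)
    .map((word) => encodeURIComponent(word))
    .join("+");
  return "http://pitchfork.com/search/?query=" + query;
}

// Splits a review URL such as
// http://pitchfork.com/reviews/albums/15551-bon-iver/
// into its numeric ID and slug. The ID is why review URLs can't be
// generated from the album name alone.
function parseReviewUrl(url: string): { id: number; slug: string } | null {
  const match = /\/reviews\/albums\/(\d+)-([^\/]+)\/?$/.exec(url);
  return match === null ? null : { id: Number(match[1]), slug: match[2] };
}
```

Collapsing runs of whitespace before joining keeps stray spaces in the search box from turning into empty "+" segments.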
Because the browser's same-origin policy stops a page on another domain from reading that JSON directly, I used a Yahoo Pipe to convert the JSON to JSONP. I wanted to keep this demo on the client side as much as I could, so anything that spared me from setting up a server-side proxy was a win. With the Yahoo Pipe returning JSONP, entering a search term gave me a list of review page URLs.
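For anyone unfamiliar with JSONP, here is a minimal TypeScript sketch of the mechanism, not the demo's actual code. Script tags are exempt from the same-origin policy, so the page registers a uniquely named global function and loads the URL with that name in a callback parameter; the server answers with a script that calls the function with the data. The requestJsonp and ScriptLoader names are hypothetical, and the callback parameter name is passed in because each service chooses its own.

```typescript
// Loads a URL as a script. Injected so the sketch runs outside a browser.
type ScriptLoader = (src: string) => void;

let jsonpCounter = 0;

// Minimal JSONP request: register a uniquely named global callback, then
// load the URL with that name in the callback parameter. The response is
// a script like __jsonp0({...}), which invokes the callback with the data.
function requestJsonp<T>(
  url: string,
  callbackParam: string,
  onData: (data: T) => void,
  loadScript: ScriptLoader
): void {
  const name = "__jsonp" + jsonpCounter++;
  const globals = globalThis as unknown as Record<string, unknown>;
  globals[name] = (data: T): void => {
    delete globals[name]; // each callback handles exactly one response
    onData(data);
  };
  const separator = url.indexOf("?") === -1 ? "?" : "&";
  loadScript(url + separator + encodeURIComponent(callbackParam) + "=" + name);
}
```

In a browser, loadScript would create a script element, set its src to the given URL, and append it to the page; injecting it here just keeps the sketch testable.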
Getting the Album Score
Staying in the client-side spirit, I decided to use YQL and based my code on James Padolsey's 2009 blog post on the subject. YQL made it relatively simple to take the review URL, parse the review score out of the HTML with XPath, and return JSONP. I'm not an expert in XPath, so I suspect my query is pretty fragile. If you have any suggestions to make it better, I'd love to hear them. Fragile or not, the response data was JSON that contained the album's score. Success!
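Here is a TypeScript sketch of what such a YQL request might look like. It assumes the public YQL endpoint and its html table (select * from html where url=... and xpath=...) with format=json and a callback parameter for JSONP, as the service worked around 2012. The XPath below is a made-up stand-in for the real query, and buildYqlUrl and parseScore are hypothetical names.

```typescript
// ASSUMPTION: public YQL endpoint as it existed circa 2012.
const YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql";

// HYPOTHETICAL XPath for the score element. Any change to Pitchfork's
// markup breaks a query like this, which is what makes it fragile.
const SCORE_XPATH = "//span[@class='score']";

// Builds a YQL request that fetches reviewUrl, applies the XPath, and
// returns the matched nodes as JSON wrapped in callbackName (JSONP).
function buildYqlUrl(reviewUrl: string, xpath: string, callbackName: string): string {
  const yql =
    'select * from html where url="' + reviewUrl + '" and xpath="' + xpath + '"';
  return (
    YQL_ENDPOINT +
    "?q=" + encodeURIComponent(yql) +
    "&format=json&callback=" + encodeURIComponent(callbackName)
  );
}

// Turns scraped text such as " 8.5 " into a number, rejecting anything
// outside Pitchfork's 0.0 to 10.0 scale.
function parseScore(text: string): number | null {
  const score = parseFloat(text.trim());
  return isNaN(score) || score < 0 || score > 10 ? null : score;
}
```

Validating the parsed number is a cheap guard against a fragile XPath: if the markup changes and the query matches the wrong element, the sketch returns null instead of a bogus score.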
The demo currently pulls the artist and album search terms from the #album spans. Click the + button to edit the fiddle and try some of your own search terms.