Ionrock Dot Org

by Eric Larson

My Weblog

MySpace Hacking

This weekend, I took a couple of minutes to start making an actual MySpace "parser" of sorts. Mainly I am just scraping for the show list of bands, but I have seen other scrapers that have gotten things like a comment feed and a friend feed. My goal is to create a nice page that lists all the tour dates of bands I like and allows others to sign up and do the same. I will also try to support customized RSS/Atom feeds, but no one uses those just yet, so there is no hurry on that front.

I am using BeautifulSoup to parse the page. This is a really great library because it takes care of crappy HTML and makes it somewhat manageable. The problem with MySpace is that the HTML is a nightmare. By the way, when I say it is a nightmare, I really mean I dreamed about the nastiness of their crappy code.
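To give a rough idea of what pulling a show list out of sloppy markup involves, here is a minimal sketch. It deliberately swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra installs; BeautifulSoup would make the same job shorter. The sample markup, the `ShowListParser` class, the `parse_shows` helper, and the date/venue/city column order are all assumptions made up for illustration, not MySpace's actual page structure.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately sloppy markup: unclosed <td>/<tr> tags and a
# stray <font> tag. Real MySpace pages were laid out differently.
SAMPLE_HTML = """
<table>
<tr><td>Aug 04 2006<td><font color=red>Emo's</font><td>Austin, TX
<tr><td>Aug 05 2006<td>Rubber Gloves<td>Denton, TX
</table>
"""


class ShowListParser(HTMLParser):
    """Collect the text of each table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()   # a new <tr> implicitly ends the previous row
            self._row = []
        elif tag == "td":
            self._close_cell()  # likewise for an unclosed <td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td":
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_shows(html):
    """Return one dict per table row, assuming date/venue/city columns."""
    parser = ShowListParser()
    parser.feed(html)
    parser.close()
    return [dict(zip(("date", "venue", "city"), row)) for row in parser.rows]


if __name__ == "__main__":
    for show in parse_shows(SAMPLE_HTML):
        print(show["date"], "|", show["venue"], "|", show["city"])
```

The parser never relies on closing tags: each new `<tr>` or `<td>` ends whatever came before it, which is the kind of forgiveness that makes scraping badly formed pages bearable.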

In related news, I read this article on being "Web 2.0" that seemed to only egg on my frustrations with MySpace. The author points out that in addition to interesting uses of AJAX, Web 2.0 has really been about APIs. I think this is right on. The concept of web services has always been of interest to me, but the implementation always feels very kludgy. Screen scraping, on the other hand, is simple, even if it does force a lesson or two in regular expressions.

The new RESTful services make getting at the data even easier than screen scraping, and I really like them. Designing an interface is not about writing a WSDL file and registering it with a UDDI service. It becomes simply a matter of creating a nice URL that you can depend on and writing the app to hand out data in many formats. TurboGears does this nicely with its JSON filter, and there are a million different XML examples. After talking more with Jesse on the subject, I think I will definitely be looking into how to keep all my "design" in an XSLT template and leave my application to simply produce consistent XML content.
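The idea of one dependable resource handed out in several formats can be sketched in a few lines. This is only an illustration using the Python standard library, not how TurboGears implements its JSON output; the `render_shows` function, the element names, and the sample data are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def render_shows(shows, fmt="json"):
    """Serialize the same list of show dicts as JSON or XML."""
    if fmt == "json":
        return json.dumps({"shows": shows}, sort_keys=True)
    if fmt == "xml":
        root = ET.Element("shows")
        for show in shows:
            node = ET.SubElement(root, "show")
            # One child element per field, in a stable order.
            for key, value in sorted(show.items()):
                ET.SubElement(node, key).text = value
        return ET.tostring(root, encoding="unicode")
    raise ValueError("unsupported format: %r" % fmt)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    shows = [{"date": "Aug 04 2006", "venue": "Emo's", "city": "Austin, TX"}]
    print(render_shows(shows, "json"))
    print(render_shows(shows, "xml"))
```

The application only ever builds the plain list of dicts; the format is chosen at the edge. The XML form is also the kind of consistent output an XSLT template could then style.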

I know none of this is really new, but I think it suggests a new way of thinking that has been brought about by the web. Previously, application developers tried to handle every case inside the application, and the result was that things got very complex. Now, developers recognize that handling every case requires more work than is profitable. The result is effective APIs that each handle a simple task. Essentially, this is like taking the Unix philosophy to the web through services.

I think it is rather silly that MySpace is one of the most popular sites on the internet when it was written in ColdFusion and cannot produce decent HTML to save its life. It just goes to show that good technology means nothing if no one uses it.

Posted Sun Jul 30 19:47:04 2006 by Eric Larson