Ionrock Dot Org

by Eric Larson

My Weblog

The Music Industry on the Web

I've started checking out more music blogs as of late (thanks Dave) and it is pretty interesting to see the conversations going on. Some blogs are primarily news. Personally, this gets old rather quickly since most writers (if these are actually people at all) are simply posting press releases or small paraphrases. Stereogum is a good example of someone doing the news in a relatively interesting way. Many others feel like bots. Other blogs make an effort to decipher the music industry and how musicians can actually make a career out of music. In theory, these are of interest to me, but the reality is they are usually a little too career focused. There is nothing wrong with this of course, but I'm not interested in playing in a cover band on weeknights to pay the bills or writing songs for others to record. Again, it is totally cool if folks want to do this, but it is not for me. The rest of the music blogs are essentially just typical blogs. Kanye West's blog, for example, is essentially a stream of consciousness that seems to be aiming at defining what's cool. These obviously have a professional bent to them in that they are more consistent than one would imagine. Picking on Kanye again, I can't imagine he personally finds the 10 or so different pictures and videos that seem to show up on his blog daily. I could be wrong here and Kanye might have an unquenchable desire to find interesting and stylish furniture, but my bet is that he has an organization that helps him out.

Another theme I've noticed is the obvious frustration of major labels. As a programmer and generally geeky guy, I'm interested in most technical subjects. The web as a platform is more than the next generation of Windows.Forms for the world. It is a platform for content. The music industry was not prepared for the influx of shelf space the web provides. Before, with the inherent limitations of traditional distribution, the music industry could effectively act as a filter for popular music and the audience was more or less OK with it all. But now, things are different. There is a massive amount of music alongside an enormous marketplace. Major labels don't have the bandwidth to filter the music, and programmers are doing everything in their power to make the future a place where labels will never have the influence they once had. I say "programmers" specifically because these are the people who have consistently started web sites dedicated to publishing free music over and over again. What's more, programmers are the ones who pioneered podcasting, filesharing and blogging. All these advances have moved beyond programmers' command lines to joe user, who now can easily rip and share every CD he has ever owned. In the end, it has made running a label nearly impossible.

As a musician, the tone of the industry is frustrating as well. There was a time when a great band could actually get paid! Oftentimes it didn't work out, but there were still plenty of "rock stars" out there, so making it seemed attainable. Now, we see indie artists blowing up and there is still a doubt they make any money at all, much less enough to have a decent lifestyle when no one cares about their music anymore. There are people still making money, but it is clear from the music industry bloggers that labels are really hurting, which in the long run also hurts artists.

The silver lining to all this is that as a musician it's very humbling. Ironically, in the world of drugs, sex and rock'n'roll, humility is becoming the pathway to success on the web. The trend of social networking sites has become a hot topic because it is clear there is a mountain of possibility that is not being tapped. Social networking gives artists a way to directly, yet less intimately, interact with fans. The problem is that the lead singer telling the crowd to f*** off may seem cool on stage, but saying the same thing on Twitter just seems rude. It is starting to look like the rock stars of the future will be the musicians who remember the name of the kid who saw them play six months ago.

Finally, the last theme that seems to be hidden beneath all the conversations is how people connect with music. The current answer is through relationships, which is consistent with how the music industry obviously works. It's who you know. Yet, there is an acknowledgment within the industry that, above all, you have to write good music. At this point folks don't necessarily seem to believe there are too many tragedies out there where great bands don't make a name for themselves. I think I disagree, but that is probably more a function of my own hopes that we'll make it some day. What is pretty terrible is that the cream of the crop (those that are doing pretty well) is so large that many people may never connect with bands they may love. Regardless of whether or not some band is getting listened to, audiences don't have a way to filter effectively. As I said before, labels used to do this, but no longer. There are attempts such as Pandora, but at the end of the day, they seem to fall short. The technical question might not be how to find music for people, but rather how to organize people in a way that filters effectively. It might be more online radio, podcasts and blogs, or Facebook groups. But my bet is that there is a subtle context that we've been silently missing. Twitter, for example, championed communicating short ideas, much like a cocktail party. We need a Twitter for discovering music.

Posted Thu Feb 5 08:21:00 2009 by Eric Larson

A Contextmanager Based Connection Pool

At work we have a whole variety of services that rely on some concept of a connection. Some of the services are RESTful, while others use Pyro and straight sockets. Seeing as these are so common, our makeshift web framework has a simple pool implementation that allows you to reuse the connections in a threadsafe way. One of these services is called Faststore (and yes, this name is pretty bad). Faststore is a storage tool that aims at making writes extremely fast. It uses a bsddb underneath and handles a massive amount of data for us already. We've also written a CouchDB-like app on top of Faststore called Ottoman (which I thought was a very clever name). In both cases, there is a pool that can be made available via CherryPy tools that allows you to use and reuse connections to these data storage services.

Seeing as our goal is to eventually make these apps open source, I've started playing around with them in my free time to see what it would take to remove a few of the more coupled aspects, which led me to needing a pool implementation. I could have simply taken the pool from our framework, but it seemed like a good opportunity to learn something new, and I had read this article from Jesse Noller on context managers that seemed applicable. The result is a simple connection pool using context managers. I call it "poodle" because it kind of sounds like "pool".

from __future__ import with_statement
import thread, threading
import contextlib
import random
import time


class Pool(object):
    def __init__(self, factory, args=None, kwargs=None, cleanup=None, min=5):
        self.pool = []
        self.swimmers = {}  # thread id -> connection currently checked out
        self.args = args or tuple()
        self.kwargs = kwargs or {}
        self.factory = factory
        self.cleanup = cleanup
        self._lock = threading.Semaphore()
        # pre-fill the pool with `min` connections
        for i in xrange(min):
            self.pool.append(self._create())

    def _create(self):
        return self.factory(*self.args, **self.kwargs)

    def _get(self):
        return self.swimmers[thread.get_ident()]

    @contextlib.contextmanager
    def get(self):
        id = thread.get_ident()
        with self._lock:
            if id not in self.swimmers:
                if self.pool:
                    self.swimmers[id] = self.pool.pop()
                else:
                    self.swimmers[id] = self._create()
        # yield outside the lock so other threads can check out
        # connections while this one is busy
        try:
            yield self.swimmers[id]
        finally:
            # runs even when the body of the with block raises
            with self._lock:
                if self.swimmers.get(id):
                    if self.cleanup:
                        self.cleanup(self.swimmers[id])
                    else:
                        self.pool.append(self.swimmers[id])
                        del self.swimmers[id]


class MockThread(threading.Thread):
    def __init__(self, pool, name, indent=0):
        threading.Thread.__init__(self)
        self.pool = pool
        self.name = name
        self.indent = '\t'.join(['|' for i in xrange(0, indent)])

    def m(self, *s):
        print '%s %s' % (str(self.indent), ''.join(map(str, s)))

    def run(self):
        with self.pool.get() as conn:
            waiting = random.randint(1, 3)
            self.m('got connection')
            self.m('using connection in ', self.name)
            time.sleep(waiting)
            conn(self.indent, ' hello world')
        return


class MockConn(object):
    def __init__(self, name):
        self.name = name

    def __call__(self, *args):
        print ''.join(args)


if __name__ == '__main__':
    tp = Pool(MockConn, ['eric'], min=0)
    workers = [MockThread(tp, x, x) for x in range(0, 10)]
    for i, w in enumerate(workers):
        w.start()

The win here is that by using context managers, I've eliminated the need to check to see if threads are using a connection. This is traditionally necessary because an exception somewhere can be missed, which in turn never releases the lock on the connection. The context managers should automatically release the lock no matter what, which simplifies the code quite a bit. That said, I have no idea if there are blocking issues I'm missing or anything so feel free to comment and set me straight!
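To make the "released no matter what" point concrete, here is a minimal sketch in modern Python 3. It is not poodle itself: SketchPool, its size argument and idle_count are names I made up for illustration, and a plain dict stands in for a real connection. It assumes the connection only has to go back on the idle list when the with block ends, whether or not the block raised.

```python
import contextlib
import threading


class SketchPool:
    """Hypothetical pool: hand out a connection and always take it back."""

    def __init__(self, factory, size=2):
        self._factory = factory
        self._idle = [factory() for _ in range(size)]
        self._lock = threading.Lock()

    @contextlib.contextmanager
    def get(self):
        # Hold the lock only while touching the idle list,
        # not while the caller is using the connection.
        with self._lock:
            conn = self._idle.pop() if self._idle else self._factory()
        try:
            yield conn
        finally:
            # Runs on normal exit and when the with block raises.
            with self._lock:
                self._idle.append(conn)

    def idle_count(self):
        with self._lock:
            return len(self._idle)


pool = SketchPool(factory=dict, size=2)

try:
    with pool.get() as conn:
        conn["used"] = True
        raise RuntimeError("simulated failure while using the connection")
except RuntimeError:
    pass

idle_after_failure = pool.idle_count()
```

The try/finally around the yield is the piece doing the work: without it, an exception inside the with block would skip the code that hands the connection back.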

Posted Thu Feb 5 18:09:00 2009 by Eric Larson

Python Indexing with BerkeleyDB

I've been somewhat enamored with CouchDB lately. When I began, I had concerns about the speed. I had written similar stores based on XML and while they had their advantages, speed was never one of them. My understanding is that CouchDB manages to stay relatively speedy by using a BTree for storing its views. I looked around for a BTree implementation in Python and found that bsddb has a built-in BTree storage option. It seemed like it would be trivial to make a simple index using bsddb for persistence and caching views, and in fact, it was pretty easy.

import bsddb
from collections import defaultdict
from itertools import ifilter, imap
from UserDict import UserDict
from multiprocessing import Pool


class RawIndex(UserDict):
    def __init__(self, fn):
        self.fn = fn
        self._open()
        self._cache = set()

    def _open(self):
        self.data = bsddb.btopen(self.fn)

    def sync(self):
        self.data.sync()
        self.data.close()
        self._open()


class MappedIndex(RawIndex):
    def __init__(self, fn, map_func=None):
        RawIndex.__init__(self, fn)
        self._cache = {}
        self._map = map_func

    def add_map(self, map_func):
        self._map = map_func

    def _apply(self, func, args, itr):
        # call func(key, value, *args) per item and drop falsy results
        def wrapper(v):
            return func(*(v + args))
        return ifilter(None, imap(wrapper, itr))

    def build_cache(self):
        for k in self._cache.iterkeys():
            self.search(k)

    def search(self, arg_tuple=None, reduce=None):
        arg_tuple = arg_tuple or tuple()
        if not arg_tuple in self._cache:
            # note: this caches an iterator, which can only be consumed once
            self._cache[arg_tuple] = self._apply(
                self._map, arg_tuple, self.data.iteritems()
            )
        if reduce:
            # was self.apply(reduce, [], ...): the method is _apply, and the
            # extra args must be a tuple to concatenate with each item
            return self._apply(reduce, tuple(), self._cache[arg_tuple])
        return self._cache[arg_tuple]


class ProcessFunction(object):
    def __init__(self, func, args):
        self.f = func
        self.args = args

    def __call__(self, value):
        return self.f(*(value + self.args))


class ProcMappedIndex(MappedIndex):
    def _apply(self, func, args, itr):
        p = Pool()
        f = ProcessFunction(func, args)
        return ifilter(None, p.map(f, itr))

There are three different types of indexes here. The RawIndex is basically a super thin wrapper around a typical bsddb object. The MappedIndex allows searching the keys using a MapReduce-ish pattern. The idea is you pass in a callable that takes the key and value along with any other arguments you want to pass and returns an iterator. The ProcMappedIndex allows you to run the map and reduce functions in parallel using the multiprocessing module.
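As a rough illustration of the map side of that pattern, here is a small Python 3 sketch. It swaps in a plain dict for the bsddb file (bsddb is no longer in the standard library) and caches a list rather than an iterator so a repeated search returns the same results. DictMappedIndex and name_starts_with are hypothetical names used only for this example.

```python
class DictMappedIndex:
    """Hypothetical stand-in for MappedIndex, backed by a plain dict."""

    def __init__(self, data, map_func):
        self.data = data
        self._map = map_func
        self._cache = {}  # argument tuple -> list of map results

    def search(self, arg_tuple=()):
        if arg_tuple not in self._cache:
            # Call map_func(key, value, *extra_args) for every item
            # and keep only the truthy results.
            results = (
                self._map(key, value, *arg_tuple)
                for key, value in self.data.items()
            )
            self._cache[arg_tuple] = [r for r in results if r]
        return self._cache[arg_tuple]


def name_starts_with(key, value, prefix):
    # Map function: return the pair on a match, None otherwise.
    return (key, value) if value.startswith(prefix) else None


people = {"1": "eric", "2": "dave", "3": "erin"}
index = DictMappedIndex(people, name_starts_with)
matches = index.search(("er",))
```

The argument tuple doubles as the cache key, which is why it has to be hashable.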

I experimented with using another bsddb for the in memory cache, but that didn't allow me to store the iterators or the argument tuple as easily. I could pickle them or something similar, but it seemed pointless since the searches seem like they'd never really hit the performance limits of a Python dictionary (which is fast). That said, I think my next toy might be a memcache type of application that uses an in memory bsddb for keeping key/value pairs.

I do want to point out that the multiprocessing module is really pretty awesome. I know there are some limitations on Windows and it is a relatively new module (bugs?), but the fact that it makes things like performing map/imap operations on an iterator so easy via its built-in Pool is simply awesome. One thing though is that when I did try to use the imap function it would hang. I think there must have been a deadlock or something that I didn't understand, so if there was an obvious reason please let me know.

 

Posted Fri Feb 6 09:00:53 2009 by Eric Larson

Transplanting with Mercurial

At work we use Mercurial. I don't know that we will keep using it as we are a rather global company and some of the other teams don't have the time to adopt a new VCS that is much more complicated than existing systems. Despite Mercurial's mixed reviews among the team, I'm becoming more of a fan. I can't say I'm really a fan of Mercurial per se, but it is becoming clear how a DVCS is beneficial in a more intimate way. There are the traditional arguments surrounding things like "commit on a plane" and "branching made easy", but I don't think people totally see the impact until they really have to work with a tool like Mercurial for an extended period of time. It doesn't mean it's easy by any means, but after a while there are definite advantages.

One of the benefits of a DVCS is the ability to take a set of changes and place them in another branch. This is not as simple as it sounds. There is a suite of things to consider and even more potential data to keep track of. Where did the patch come from? Can you revert the changes to a different version that existed before the current version was added? If it is a set of patches or changesets, do you get to revert specific changes or is it an all or nothing kind of operation? Is there now a permanent link between the two branches/tags/heads after copying over the changes? How would that even work?!

When you start limiting things a bit, the idea becomes manageable. Mercurial has a plugin called transplant that makes some of these decisions for you. You don't necessarily get massive amounts of information, which makes it relatively simple to move changesets around without much hassle. It also moves the changeset around as an atomic entity, which means that after you've transplanted, you don't need to commit or add a message saying you transplanted things. All in all, it is pretty easy once you get the hang of it.

To do a transplant, first you need a repo. We are going to do everything in place, which means we are not going to clone to another directory (which would implicitly create a new branch).

elarson $ echo 'print "hello world!"' > hello.py
elarson $ hg add hello.py
elarson $ hg branch 1.0
marked working directory as branch 1.0
elarson $ echo 'print "goodbye world!"' >> hello.py
elarson $ hg st
A hello.py
elarson $ hg ci -m 'ended'
elarson $ hg id
020db5c02665 (1.0) tip
elarson $ hg branch 2.0
marked working directory as branch 2.0
elarson $ echo 'print "wait... ah nvmd"' >> hello.py
elarson $ hg ci -m 'nvmd'
elarson $ hg up 1.0
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
elarson $ echo 'print "talk to you again later"' >> hello.py
elarson $ hg st
M hello.py
elarson $ hg ci -m 'tty'
created new head
elarson $ hg id
65a5be09f306 (1.0) tip
elarson $ hg heads
changeset:   2:65a5be09f306
branch:      1.0
tag:         tip
parent:      0:020db5c02665
user:        Eric Larson <eric@ionrock.org>
date:        Mon Feb 09 21:10:41 2009 -0600
summary:     tty

changeset:   1:93127fc79160
branch:      2.0
user:        Eric Larson <eric@ionrock.org>
date:        Mon Feb 09 21:09:38 2009 -0600
summary:     nvmd

elarson $ hg up 2.0
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
elarson $ hg transplant -b 1.0 2
applying 65a5be09f306
65a5be09f306 transplanted to 9cbf35e4f623
elarson $

In the example we made two branches and both had some work in them. In the real world, this is as if you are working on a new release (2.0) and you fixed a bug in a previous release (1.0) that you need to forward port to your new release branch. If there were a series of commits you'd just do something like hg transplant -b 1.0 3:5 7. That would transplant the changesets 3 through 5 and also changeset 7. If there are conflicts you will get to merge as usual. For example, if you use Emacs (really, what else would you use?) ediff should come up with the merge interface and you can move along.
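If the revision arguments look cryptic, this little Python sketch spells out which changesets an argument list like 3:5 7 refers to. It is purely an illustration: expand_revs is a made-up helper, it only handles ascending numeric ranges and single revision numbers, and it has nothing to do with how Mercurial actually parses revisions.

```python
def expand_revs(args):
    """Expand arguments like ["3:5", "7"] into individual revision numbers."""
    revs = []
    for arg in args:
        if ":" in arg:
            # "3:5" means revisions 3 through 5, both ends included
            start, end = (int(part) for part in arg.split(":"))
            revs.extend(range(start, end + 1))
        else:
            revs.append(int(arg))
    return revs


selected = expand_revs(["3:5", "7"])
print(selected)
```

Changeset 6 is skipped, which is the whole point: you pick the fixes you want and leave the rest behind.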

Also, I should mention there is something to be said for being able to work in the same directory all the time. As a Python developer, I use virtualenv, but for day-to-day development, it can be much easier to just keep your system somewhat bleeding edge and only use virtualenvs for specific projects or sandboxes. It is nice to have your server running, hg up to some branch to test and see your server restart and be ready to go. It is a small issue, but once you get used to it, it is pretty convenient.

If you are using Mercurial I hope you spend some time trying to learn the more detailed aspects of it. The concept of heads, while trying, is pretty helpful at times. There are also a host of plugins that can be helpful. For example, Mercurial Queues is one that consistently comes up in comparisons with Git and rebasing. I've found queues to be extremely confusing, but transplanting has worked for me. There are also other plugins like Local Branch that seem pretty nice. A DVCS raises the complexity bar in terms of possible work flows, and there is a pretty good chance that whatever DVCS you choose, there will be a way to make it work. For me, transplant works.

Posted Tue Feb 10 03:35:40 2009 by Eric Larson

Discovery People are People!

Kanye West did an interview and posted it on his blog. He made a really poignant comment about how, as he has grown as an artist, he has also grown as a person, realizing that everyone is a person and deserves to be respected as such. Well said Kanye!


Kanye visits Big Boy's Neighborhood (Talks about Chris Brown,Rihanna,and Gay rumors) from qdeezy on Vimeo.

Posted Thu Feb 12 22:15:00 2009 by Eric Larson

Emacs: For Those Not Command Line Inclined

At work most folks use Vim. I use Emacs. We have fun debates all the time about whose editor is better. Nothing serious, just fun. One of the things I've realized is that I do most of my work within Emacs, which is a pretty common sentiment for Emacs users. I run servers, ipython, diffs, IRC, tweet, etc. all in Emacs. I'm realizing part of the reason Emacs appeals to me in such a way is that it effectively has become my desktop environment through the shell. To put it another way, my command line skills suck, so Emacs is my GUI.

This became extremely clear to me when I was going to apply a diff. I had started working on a feature around the same time we were going to make a release. I was bouncing back and forth between branches, making fixes and moving forward a new feature. At one point, I realized I was working in the wrong branch. I needed to move my uncommitted work to the default branch and push my release branch changes. I didn't want to commit and then transplant the changes. That seemed like a bad idea because that would be a commit and then revert of the code in the log. It seemed like a good time to try and learn how to really work with diffs and patches so I did a quick "hg diff > release_to_default.diff" and was on my way.

After pushing my release changes, I needed to figure out how to apply the patches in the diff. I started looking at the patch command. Back in the day, I used to try and do things like patch my kernel, but the whole patching experience was never very fluid to me. I didn't do nearly enough to become proficient. Realizing my desire not to actually use the patch command, I went ahead and opened the diff file in Emacs. Sure enough there was syntax highlighting for diff output along with a diff-mode menu. I took a quick look at the menu and saw that C-c C-a applies a hunk to its file. One keyboard command later, I had applied part of the diff file. A few more commands later and I was done! Very nice.

I'm obviously avoiding some potentially important information here by choosing the Emacs way over learning something about the patch command. I can't say I'd recommend the same steps. At the same time, it was so easy! Some day I'm really hoping I'll get my head wrapped around using a shell like a real guru, but for the time being, I don't mind a little magic in my editor... Even if the Vim folks make fun of me for it!

Posted Fri Feb 13 16:36:11 2009 by Eric Larson
Created using Python, jQuery and Emacs