At work we use IRC. While I’m not sure of the original purpose, we managed to spawn an IRC bot that has become something of an icon in our corporate culture. It does the normal things like logging specific channels and remembering the last time someone said something. It can also tell you the weather for different areas, stock prices and quite a few other web service mashup type functions. What is the most fun, though, is the karma system. You can motivate someone (!m or !motivate $username), which will provide a nice “you’re doing good work foo” message and add to their karma. There is a whole set of karma incrementors and decrementors that help release a little steam or provide a nice pick-me-up when you’re doing something right. In addition to karma, there are also quotes. Some come from websites and different characters, but many are from our discussions.
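To make the karma idea concrete, here is a minimal sketch of how a motivate command could be handled. It is not the actual bot's code: the `handle_message` function, the in-memory `karma` dictionary and the exact reply text are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical command pattern: "!m alice" or "!motivate alice".
MOTIVATE = re.compile(r"^!(?:m|motivate)\s+(\S+)\s*$")

# In-memory karma store (username -> score); a real bot would persist this.
karma = defaultdict(int)


def handle_message(text):
    """Return the bot's reply to a chat line, or None if it is not a command."""
    match = MOTIVATE.match(text.strip())
    if match is None:
        return None
    user = match.group(1)
    karma[user] += 1
    return f"You're doing good work, {user}! (karma: {karma[user]})"
```

A decrementor would look the same with a different trigger word and `karma[user] -= 1`.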
While I wouldn’t say our bot is necessary, it honestly makes working remotely a much nicer experience and generally is a lot of fun. For example, I have a virtual machine that has the same name as my username. When I would mention it in the chat room it was a little confusing, so I made a function that lets people know that when I say my username, I actually mean my VM (elarson – The Machine!!!). It is kind of silly, but I really enjoy it.
With that in mind, it seemed like it would be a lot of fun to have a similar bot for Twitter: an account that you could ask questions or tell to do things for you. Personally, I didn’t have too many specific use cases, but my hope is that others will customize it to suit their needs. The project/library is called TwitterBot and is written in Python.
The basic idea is that you start up the TwitterBot with a configuration and you can add your own functions that run either periodically or in response to the content of tweets sent to the bot’s username. It comes with two example plugins. One is a delicious tool that lets you tweet URLs to your TwitterBot and it will post them to delicious for you. The format is really simple. You just tweet a message, a URL and any tags in the form of ((tags: foo bar)). I stole the tags format from Posterous, so hopefully it will be somewhat familiar. It will save the tweet without the URL and tags as the note and make an effort to find the title of the page. It will also expand any URLs that have been minified, so things like edited RTs should be pretty easy to move from something like a twitter favorite to delicious.
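As a rough illustration of the tweet format described above, here is a small sketch that splits a tweet into the note, the URL and the tags. This is not TwitterBot's actual parser. The `parse_tweet` name and both regular expressions are assumptions, and it only handles the first URL it finds.

```python
import re

# The ((tags: foo bar)) marker described above.
TAGS = re.compile(r"\(\(tags:\s*(.*?)\)\)")
# A deliberately simple URL pattern (an assumption, not a full URL grammar).
URL = re.compile(r"https?://\S+")


def parse_tweet(text):
    """Split a tweet into (note, url, tags)."""
    tags = []
    tag_match = TAGS.search(text)
    if tag_match:
        tags = tag_match.group(1).split()
        # Remove the tag marker first so it cannot be swallowed by the URL pattern.
        text = TAGS.sub("", text)
    url_match = URL.search(text)
    url = url_match.group(0) if url_match else None
    if url:
        text = text.replace(url, "")
    # Whatever is left, with whitespace collapsed, becomes the note.
    note = " ".join(text.split())
    return note, url, tags
```

Expanding minified URLs and looking up the page title would need network calls, so they are left out of this sketch.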
The other tool I added was a periodic check for new followers. It saves your follower list in a sqlite database and if there are any new followers, it will follow each one and send them a direct message with a simple “Thanks for following!” message. This is really more to be used as an example of a periodic check. For example, if you wanted to tweet when some page gets updated or send reminders for events, this would provide a simple format.
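The follower check boils down to comparing the current follower list with the one stored last time. Here is a minimal sketch using the standard library's `sqlite3` module. The `new_followers` function and the `followers` table layout are assumptions, and fetching the follower list from Twitter is left to the caller.

```python
import sqlite3


def new_followers(conn, current_followers):
    """Store the current follower names and return the ones not seen before."""
    conn.execute("CREATE TABLE IF NOT EXISTS followers (name TEXT PRIMARY KEY)")
    known = {row[0] for row in conn.execute("SELECT name FROM followers")}
    fresh = sorted(set(current_followers) - known)
    conn.executemany(
        "INSERT INTO followers (name) VALUES (?)", [(name,) for name in fresh]
    )
    conn.commit()
    return fresh
```

A periodic job would call this with the latest follower list, then follow and message each returned name.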
I haven’t put it up on the PyPI just yet, but I will do so shortly. Until then, you can check out the code on bitbucket. To run it you can do:
python -m twitterbot path/to/config.yaml
The config file is just a YAML file and there is an example in the package. I’m happy to accept any patches or extensions to help make it more interesting. That said, if you are interested in making something that annoys folks, it is probably best to keep it to yourself. For any questions just leave a comment, send me a message on bitbucket or email.
Thanks!
Last night we saw Crazy Heart. It is a pretty typical story about a drunk country singer who makes an effort to change late in life. While I can’t say that the story was really all that original, it has some great acting that really helps you relate to the characters. The musical side of things also felt very real. The way he wrote songs and traveled around in an old Suburban, sleeping in dumpy motel rooms and playing terrible places, was all right in line with real life. It was obvious he had some success at some point in his life, but that was well behind him.
The movie portrays a rather realistic view of a working musician. What’s more, it shows a rather realistic perspective of song writing in general. If you write a good song, it should sound and feel like other songs, even though it might be completely different. That is not to say you should rip off other songs, but rather that you should consider the audience. The older I get the more I realize how important it is to consider your audience when communicating. It is a huge challenge to put yourself in someone else’s shoes and consider their influences and perspectives on different topics and use that information to make your own point.
Songs are most definitely a form of communication both through the lyrics and with the music. If you listen to music, there is a huge segment of popular music that uses a basic back beat. This is where the snare usually hits on the second and fourth beat of a measure or phrase. It is such a popular beat that it shows up in a huge number of songs. My guess is that it will continue to be a critical aspect in music for years to come, simply because it gives listeners something to immediately feel comfortable with.
I’d definitely recommend Crazy Heart. Not so much because the story is so enduring, but simply because it clearly communicates the difficult life of a working musician. It shows the long drives through beautiful country only to end up at a dirty bar (or bowling alley in the movie), with the appreciation of one or two fans making it all worthwhile. I think part of the appeal of music is that you have that opportunity to communicate on such a different level where the ideas and concepts are less important than transferring some set of emotions through sound. Even though it can feel pretty thankless at times, it also doesn’t take much to know you’ve made that connection.
It is rare that I answer my door anymore at the house. The vast majority of the time it is nothing I want to deal with. One time my neighbor came to my door and it ended in a rather nice conversation. That is probably the last time I can remember being really glad someone unexpected came to my door. Most of the time they are selling something. Some of the products people have tried to sell are arts and crafts, monthly meat subscriptions and lawn services. Yet, of all the things folks try to sell, magazines seem to be the most popular. Needless to say, I’ve never bought any.
The whole concept of selling magazines door to door just doesn’t make sense. First off, I have the internet. I can get whatever news and information I need. Magazines quickly become clutter, something I just don’t need. In addition, I get the opportunity to publish my own thoughts and opinions as well as become part of a community. Secondly, why would I give a random person off the street a personal check, cash or credit card number for anything? What is the difference between some kid off the street trying to “succeed” selling me a subscription to Time and some bit of spam telling me to buy some pills? I didn’t ask for either and therefore I don’t really feel confident sharing personal information with these parties. This goes the same for sales calls.
It is nothing personal. Honestly, the kid that just came to my door this afternoon seemed like a nice enough guy. He was working the streets in a shirt and tie in the rain. His hand was freaking freezing when I shook it. Part of me felt sorry for him, but at the same time, I was not about to risk giving out my banking information or credit card just because this person felt selling magazines was a worthwhile endeavor. While I doubt he was planning on swindling me or stealing my identity, there is no way I’m going to take that risk with my family just so this guy can get a cheap cruise or get the opportunity to sell the job of selling subscriptions to other kids.
Some might say I’m just being paranoid and I’ll admit that the chances of me getting burned are pretty slim. But, in addition to avoiding an unnecessary risk, I’m not really a fan of selling door to door as a good career for a kid. When I was in high school and looking for work, there were a few times where I found out the job was actually a sales position. One was selling knives and I quickly said no. It required me forking out $300 to get my demo kit and that was not going to happen. Another time, I went door to door selling coupon cards. A company would offer a card for $20-$30 that had discounts and free services. If you actually used it, it wasn’t a terrible deal. They would pick up everyone at a Perkins and drive us out to some neighborhood where we would roam the streets knocking on doors. I’m not a very good sales person, so it was easy to see how as a source of income it just didn’t scale. The people who were relatively successful were not exactly breaking the bank either. A friend of mine, who turned out to be an excellent sales person, got stiffed on some checks.
The point is not so much that selling door to door is bad as much as it just isn’t a good career. As a society, culture or community, there come times when transitions happen. It can be difficult, but usually it means a better deal for the majority. In my mind, selling magazines door to door falls into that category. There will always be people who buy into the prepackaged sales businesses, but at some point I hope they stop being folks knocking on my door in the rain trying to sell me paper magazines. In other words, look at Snuggies, ShamWow and even spammers for better models of selling relatively useless goods. Then I don’t have to be frustrated when I go to the door knowing I’ll be saying no to buying an outdated medium.
I read this blog from one of the Miro developers that was a response to this article on how open source projects can find more contributors. I commented on the latter and realized that it might also warrant a blog post.
I’ve written a few times on the difficulty of working on other people’s code. I read a blog the other day about the mythical 10x productive developer that claimed that the developer was really productive because that person had originally written the code. While I have no statistics on the issue, I bet that guy was more or less correct. It is so much different diving back into code you wrote versus taking a look at some unknown bits. Recently, I’ve been trying to do more work in public with my TwitterBot and Dragoman (RESTful gettext). Part of what I’m realizing is that I’m not that bad a programmer. At work there are moments where I wonder if I picked the wrong profession. Maybe my mind just isn’t bent the right way to be a programmer. What I’m realizing now is that working on other people’s code is really, really hard! You may never fully understand what is happening under the hood and sometimes you don’t really even need to know.
I think this contrast between working on someone else’s code and new code has a direct relationship to how Open Source projects find contributors. If a project allows people to write extensions or plugins easily, it greatly increases the chances of that person eventually contributing to the core code. A plugin gives a programmer something that is small enough to do and in most cases, allows that person to scratch their own itch without feeling any sort of liability or responsibility to the community. The code works and that is all that matters. Over time though, that person might try to make that code nicer and, the next thing you know, that person has dived into the core code to see why their pet feature is acting odd. What’s more, they are doing so in the context of their own plugin/extension, which they fully understand. It provides a really nice transition from specific modular knowledge of some codebase to understanding the core.
I would argue that most successful projects manage to make writing extensions easy. jQuery is a great example of a project that has a massive community and a much larger plugin community than core group. Firefox is another situation where the core C++ application is developed by a small group, while there are a huge number of extension authors. Eclipse, Apache, and Rails are all further examples where much of the community actually lies in its extensions. In the Python world, WSGI created a huge entry point for developers to write their own middleware and framework tools. In fact, I might go so far as to say that the Python web world would not have the community it has if it weren’t for the modular aspect WSGI brought to Python web developers.
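To show how small that entry point is, here is a minimal sketch of a WSGI application and a piece of middleware wrapping it. The `hello_app` and `UppercaseMiddleware` names are made up for illustration. The only real contract is the WSGI calling convention: a callable that takes `environ` and `start_response` and returns an iterable of bytes.

```python
def hello_app(environ, start_response):
    """A minimal WSGI application that always answers with plain text."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


class UppercaseMiddleware:
    """Hypothetical middleware: wraps any WSGI app and upper-cases its body."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Delegate to the wrapped app, then transform each chunk of the body.
        return [chunk.upper() for chunk in self.app(environ, start_response)]


# The wrapped result is itself a WSGI app, so middleware can be stacked.
app = UppercaseMiddleware(hello_app)
```

Because the middleware only relies on the calling convention, it can wrap an application from any WSGI framework without knowing anything about its internals.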
Where this pluggability becomes difficult is when there is a user interface. Most projects generally have someone who is a leader and things like the UI end up being something restricted to the core group. That said, I think the Eclipse project (along with Emacs, Vim and TextMate) has done an excellent job exposing extensibility, even at the UI level.
If you want to run a successful open source project, I honestly can’t tell you how to do it. But, if you make it easy to write plugins or extensions, then you give developers a means of scratching their own itch. Nothing motivates better than one’s own selfish desires and extensibility speaks to this aspect of human nature. There are definitely cases where systems can become too modular, but that is what separates the great projects from the others. They have found that sweet spot for letting people get involved and become effective without sacrificing the core aspects of the application/library.
Before the Super Bowl, we went to the store. Honestly, this was not the best plan as it was a mad house. As I stood in the extremely long checkout line, I had a moment to really look around at my surroundings. I noticed a game machine where you pick up a stuffed animal. The sign on it was pretty terrible. It looked like someone threw it together in Print Shop. While I’m sure there is some interesting tech in gaming machines, I couldn’t help but wonder why something so generic and tacky still manages to get people to spend money on it.
This got me looking around at other logos and advertisements and I realized how most were less than appealing. It was clear that the “good enough” for advertising was pretty horrible. It also was clear that they were somewhat successful. I’m sure there were plenty of brands that I glanced at that will never be seen again, but there were plenty that were relatively successful. The whole experience made me think that, as a culture, we’ve become so accustomed to seeing ads that even though their message is displayed with a huge lack of quality, we don’t really care.
It is surprising that a dumb game machine, which has to compete with guarantees, can even be around when it looks so shady. If I learned the hook couldn’t possibly pick up 99% of the items, it wouldn’t surprise me. Yet, even though the machine screams ripoff, it is still there and making money. The reason we are a culture of ads is that we spend our money on the products.
There is nothing wrong with spending money on something like a silly game picking up a stuffed animal. But, there is something wrong with a culture almost addicted to its own convincing. It might be that we feel desired when we see advertisements, that someone wants something of us. We feel needed. Personally, I don’t feel that way, but who knows what goes on in someone’s subconscious.
The Super Bowl and its tradition of “good” ads only supports the addiction. The people who say they watch the game for the commercials I think are lying. I watched the commercials and for every interesting gag, there were more regular commercials just showing produce and toilet paper.
My observation isn’t anything new, but it was something I hadn’t personally realized. It honestly angered me to some degree that we’ve let ourselves become such consumers. Our senses are no longer focused on keeping out of danger or finding food, but instead they are interfaces for companies to give their pitch. Fortunately, we all have the choice whether or not to buy. And what’s more, the important aspect of that choice is that it’s something you as a single person do. No one chooses for you. The best way to combat the ad addiction is to simply exercise that choice and choose not to listen.
I used to be a big fan of XML and I’ve realized that while it definitely has its place, it has been misused the vast majority of the time. What is interesting is that when I did work with XML, one area where it seemed to excel in my mind was document type content. I don’t mean documents as in JSON and CouchDB, but actual documents like blog posts, articles, books, etc.
It would make sense then that I would think HTML is a good arena for XML-ish content, but I’m getting to the point where I’m not sure this is the case. Recently, I took a look at HAML and SASS. I didn’t try using them, but simply went through the docs and got a better idea what the actual markup might look like. The inherent link to Ruby was actually appealing because using it in Rails made so much sense. Getting rid of the angle brackets in the template language (ERB I think) really improved the templates and made things feel like Ruby. Being a Pythonista, I wondered where the Python version was.
While I didn’t find a port, I did find SHPAML. It seemed really similar, so I tried writing a few examples. My first impression was that the syntax would make for a great template library. After talking to some folks about this, it became clear integrating it into another template library would make sense. Lo and behold Mako had a preprocessor argument that let you do things like run a function on the template content before passing it to the Mako processor. Adding that argument let me immediately integrate SHPAML and Mako. It was way too easy.
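Here is a rough sketch of that preprocessor hook. Mako's `Template` accepts a `preprocessor` callable that receives the raw template source and returns the source Mako should parse. The `toy_preprocess` function below is only a stand-in written for this example: it handles nothing but one-line `tag | text` entries, and a real setup would pass SHPAML's own conversion function instead. The `render` helper is also hypothetical and needs Mako installed.

```python
import re

# Toy stand-in for SHPAML: matches one-line "tag | text" entries only.
LINE = re.compile(r"^(\s*)(\w+)\s*\|\s*(.*)$")


def toy_preprocess(source):
    """Rewrite 'tag | text' lines as '<tag>text</tag>'; leave other lines alone."""
    out = []
    for line in source.splitlines():
        match = LINE.match(line)
        if match:
            indent, tag, text = match.groups()
            line = f"{indent}<{tag}>{text}</{tag}>"
        out.append(line)
    return "\n".join(out)


def render(source, **context):
    """Render a template with Mako, running the preprocessor on the source first."""
    from mako.template import Template  # third-party: requires Mako

    return Template(source, preprocessor=toy_preprocess).render(**context)
```

Since the preprocessor runs before Mako parses anything, expressions like `${name}` pass through untouched and are evaluated as usual afterwards.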
The templates looked nice. They did need a bit of getting used to, but overall, they were really simple. Today I saw a blog about how HAML is bad for content. It makes a ton of sense. In fact, that is why I write my blogs in Emacs using webblogger.el. I get to write like I’m writing an email, yet I don’t have any of the email client to HTML cruft that always seems to be a problem.
I still think XML has its place, but honestly that place is becoming harder and harder for me to find. At this point, I’m going to suggest it can make for a good interchange format, but even then I’m not sure where something like JSON and conventions wouldn’t be a better place to start. What is making a ton of sense is optimizing templates for the purpose of the markup. I know HTML is not going anywhere any time soon and HTML makes for an OK output format. But, it doesn’t mean you need to author in HTML. This has been common for actual content, but I think for templates you can get some advantages using something like HAML or SHPAML. Likewise, using something like Markdown or reStructuredText for actual content is another way to optimize the document formats.
The gain is subtle, but important. It is just a little nicer writing a SHPAML file. It is not so large a difference that you never want to write HTML again, but it is enough that the code makes a little more sense. While it increases the complexity of the tool chain, it reduces the complexity of the actual code being written. In this case, I think the hidden complexity is worth it if you can understand the templates faster. C did the same thing with machine code, so it is the same thing here. If you’ve looked at these sorts of tools before and dismissed them, I’d suggest taking another look and actually trying them. While it is a question of taste, I believe more people might enjoy the markup tools more than they expect.
I’ve always been a theoretical proponent of Test Driven Development (TDD). Part of it is simple pragmatism. It is nice to imagine a world where you can simply write code, switch to a terminal, run the tests and pretty much know things are working. This is especially heavenly when compared to testing Javascript across browsers.
Yet, when I have really looked into TDD as a structured practice it becomes less inviting. There are some ideas that promote fixing code so tests pass, regardless of whether or not you really solve the problem. I’m positive there is a rigor that helps to make this a successful means of writing code, but for me it is just a bit too abstract and scholastic. I’ll admit that I don’t know what the heck I’m talking about. But, since first impressions are important, it seems relevant to point out that TDD as a documented practice (much like Extreme Programming) doesn’t make as much sense.
With that in mind here is my description of how to do TDD.
1. Write something you think should work.
The first thing you need before you test is something to test. Consider it a hypothesis where you had an idea for something and made a first pass at implementing it. I have a real problem writing tests first. It is difficult to imagine what you should really test. I have enough problems thinking about how to design and write a program, so doing the same for tests before there is any code just doesn’t make sense. Even if you invest the time to come up with decent tests, you could have easily invested the time in the initial design and in both cases you would have gained the benefits of thinking through a problem. So, why waste the time on the tests? Go ahead and learn more about your actual code and, usually, the domain you’re programming in.
2. Write something to prove that it basically works.
If you write Python, there is the "if __name__ == '__main__'" pattern. This is the kind of thing I’m talking about. You can write a small bit of code to make sure things look OK. If it is an API you get a minute to think about what it looks like to code to it. You also get a sanity check for catching compiler type errors such as spelling mistakes or broken imports. The idea here is not to write a test per se, but rather just create a small bit of code to help get the basics working.
3. Write a test that encompasses your basic proof.
Eventually, your basic code is going to start getting complicated. I don’t have a rule for this and neither should you. If you find you’ve written the same thing twice, then that is a good time to try moving it to some tests. At this point you need to make sure your test environment is really working for you. From here on out you are going to actually be doing things like writing tests as you add features, so you want to be sure that your bases are covered. The goal is to have a command to run that tells you your code is working. You have some code that more or less works and you need a way to automate more complicated input to that code. Hopefully this subtle difference is clear.
4. Write a test for each feature/bug.
At this point you have a command to run your tests and individual tests. This is probably using something like nose or py.test. Now you can really get on the ball and start doing more TDD-ish practices like writing tests before code. The reason that it works at this point is because you can start introducing different input in a structured way. You’re not firing blind into the dark without any idea what is even out there. You’ve written your code and found something that sort of works. You’ve put together a means of testing it and covering more use cases. Now you can think in terms of more use cases that need tests.
5. Refactor the code.
This last step is to point out that after you test, you should be prepared to potentially refactor your code. This is kind of the point. Tests give you confidence that you haven’t backtracked and broken things. Refactoring is where you can improve your communication skills by making the code clearer. I’d lump in adding comments and documentation to this step as well since documenting is a good way to realize flaws in your design. When you realize that you just gave 14 steps for connecting to your simple RESTful service, then you’ll probably see there are some details that still need work.
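The first four steps can be sketched in a single file. The `slugify` function is a made-up example, and the tests use the standard library's `unittest` rather than nose or py.test just to keep the sketch self-contained.

```python
import unittest


def slugify(title):
    """Step 1: something you think should work (a hypothetical example)."""
    # Replace anything that is not a letter or digit with a space, then join words.
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    return "-".join(words)


class SlugifyTests(unittest.TestCase):
    """Steps 3 and 4: the sanity check promoted to tests, one per feature or bug."""

    def test_basic_title(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_punctuation_is_dropped(self):
        self.assertEqual(slugify("TDD: a (simplified) take!"), "tdd-a-simplified-take")


if __name__ == "__main__":
    # Step 2: a quick check that the basics work when the file is run directly.
    print(slugify("Hello World"))
```

Running the file directly gives the quick sanity check, while `python -m unittest` pointed at the module runs the structured tests.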
That is pretty much it. I’m a proponent of writing tests first, but only when you really have a grasp on what you’re doing. One of the draws of TDD (and most development practices) is that you are forced to think about the problem. Most people do poorly at the waterfall method of project management because they don’t have the discipline to do that thinking up front. In the case of TDD, moving that process to the test writing phase presents the same discipline requirement with the same lack of understanding. I think things like agile project management don’t necessarily skirt the planning issues as much as they point out that failing fast and learning is what really works. In this case my simplified version of TDD is there to help fail fast and learn. What’s more, you can potentially learn more about your users rather than your own ideas about your users. Your users won’t run your tests, they will be running your code, so focus on what solves users’ problems.
I haven’t tried Google Buzz. Since I moved to using Emacs for email and have tried to focus on using my own domain name, using Gmail proper isn’t really happening. But! From what I’ve read I think the ideas (that I’ve gathered) are really interesting.
Email is a really great platform. Not technically, but from a social perspective. Email is more ubiquitous than the web. There are no client issues when it comes to core functionality and everyone knows how to use it. What’s more, many folks are expert users that do complex tasks like create folders and filters. This is complicated compared to opening up your email and only reading and deleting.
With a world of people using email every day, it is a no-brainer to consider how to create social software with it. Posterous is a great example of an application that exploits the power of email to help people publish. Google Buzz, being based on email, is doing the exact same thing.
I have a facebook account that I don’t really use. I try to use twitter as a publishing platform, but ideas are hard to come by at times. Email I use every day. It is something that is now a requirement for getting things done in the world. A tool that helps graph a social network based on email is ideal because it is yet another helpful tool based on email to improve its place as the ultimate web communication tool.
Now, since I really don’t know the specifics of what Google Buzz is doing, my “analysis” is going to stop right there. But, I do have a wish list for the future of email that is based off of tools like Gmail.
1. Create an API – Oh how I wish Gmail had a really good API that wasn’t IMAP. Personally, I’m totally fine with IMAP, but it is not going to scale email to where it needs to be. Gmail’s labels are powerful and help to provide more options for organizing email. The search in Gmail is also really great in terms of the UI. It is trivial to see all your unread items in any set of folders that contain someone’s name in the subject. I’ll also bring up the spam filtering that I never notice, which is just proof that I appreciate it.
2. Build an Open Source Email Server – Yeah I know there are tons already, but I’m talking about one that speaks HTTP and uses a RESTful API like the one mentioned above. Something anyone could implement that doesn’t have the cruft of IMAP even though it might be supported.
3. Create More Organizational Paradigms – I think email is really important because of the audience. Everyone uses email, period. Yet, overloading it has always been a problem thanks to spammers. That is OK, as I think spam is essentially licked. With spam out of the way, we can start to consider other ways of organizing our mail. If you take a look at Gmail Labs, many of them provide extra tools for helping to organize your life in email. There are todo lists, extra inboxes, emblems and more that are all focused on improving your mail experience. This needs to continue if we want to continue to overload email with the social web.
So that is it. I hope Google is listening (the company, not the search bot) and considers the future. What is funny is that on a personal level, I’m not really a fan of email. It is not something I enjoy or find that interesting. Yet, when you think of ways to get people to use an API, email always wins. I used email for mimicking a text message platform. Why? All the carriers support sending text messages by email. Mailman, the mailing list software, has allowed managing lists via email forever. Email marketing (not spamming) as a market is still rather vibrant with new companies showing up all the time improving the experience for readers. Like the web browser, email is not going anywhere and it is extremely pervasive. Email as the interface to the social web seems like the logical next step.
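The text message trick amounts to addressing an ordinary email to the phone number at the carrier's gateway. Here is a minimal sketch that only builds the message. The `build_sms_email` function is hypothetical, the gateway domain has to be looked up for the real carrier, and the 160 character cut assumes a single traditional SMS.

```python
from email.message import EmailMessage


def build_sms_email(sender, phone_number, gateway_domain, body):
    """Build (but do not send) an email addressed to a carrier's SMS gateway."""
    msg = EmailMessage()
    msg["From"] = sender
    # The recipient is simply the phone number at the carrier's gateway domain.
    msg["To"] = f"{phone_number}@{gateway_domain}"
    # Assumption: keep the body to one SMS-sized chunk of 160 characters.
    msg.set_content(body[:160])
    return msg
```

Sending the result is then a matter of handing it to `smtplib.SMTP.send_message` with whatever mail server you normally use.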
So, now that I’ve actually used Google Buzz a bit, it seems important to look back on my assumptions I raised before actually trying it out.
The basic assumption I was making is that Google Buzz had to do with email because it is used with Gmail. This was pretty much wildly wrong. The reality is it just takes twitter/facebook and gives you a page for it. That’s it. I’m not saying that it isn’t useful, but it is not what I hoped.
Some nice things about it have been how easy it is to integrate other services. Twitter is my micro-blogging-whatever-you-call-140-characters tool. My blog is for larger ideas and in depth publishing. Delicious and Google Reader, in terms of sharing, are both simply link feeds that can be very twitter like or similar to a blog. All these tools seem to integrate seamlessly with Google Buzz. And by seamlessly, I mean I didn’t have to configure an actual application (facebook) or use OAuth for anything. The benefit is that I get to use the tools I want and people following get to do what they want. Perfect.
That said, what’s the point? Who really cares if there is yet another way to share your status? I think twitter does a good job because it has a constraint. Google Buzz partially removes that constraint, but that is not a good thing. Take MySpace for example. They were very strict in terms of what sort of data showed up on profiles. You had a UI that essentially had a basic database behind it and it put that information on the page. Where they let you go crazy was with themes, which I believe was a side effect. Now look at Facebook. Those guys don’t let you customize anything. It is still the basic info, only you get status messages. There really isn’t much that has changed and the big difference between the two in my mind is that one lets you visually communicate who you are online while the other doesn’t. My facebook usage is by no means authoritative, but the fact that people now use facebook instead of myspace suggests that not having an option to customize the look and feel is a good constraint.
Google Buzz, on the other hand, is only introducing fewer constraints. This is not a bad thing, but when they don’t offer better ways to keep in touch then it doesn’t matter much. Facebook has the timeline/status update that allows people to keep up with others. It is a feed reader for people. Instead of checking out a person’s profile and seeing their animated gif background, they get updates on their own profile page. People are challenged to post interesting status updates because that is how they say who they are and communicate with others.
My thought is that if Google Buzz was using email for this, then we’d be talking. You could pull those folks that only use email into the social web. This might be grandpa and grandma or just your friend that doesn’t have any interest in computers. In addition, email could work with non-browser based clients. This is huge as it means phones could use it out of the box (ahem like another service we’ve heard of… twitter). Alas, that was not the case.
The one potential innovation is that you have the concept of a Profile that is a little more ubiquitous than before. Facebook is a walled garden. Your Google Profile has more legs. You probably search with Google, check email with Google, read your feed reader with Google and possibly use a large set of other services (Picasa, YouTube, Blogger, etc.). Having a single provider for this is nice. It is also rather worrisome. The other nice idea about email and Google Buzz is that they don’t have to hold all the cards. Anyone can use email and anyone can create an email server. This is nice and good for the web. Everyone using their Google Profile means big brother. Maybe not at first but it doesn’t seem very far fetched that state schools might eventually require students have a Google Profile for working with certain software. The next thing you know Google gets involved in the DMV. After a while you realize that Google TV you bought is telling you to do your exercises in your living room and calling you specifically by your Gmail account name to go all the way to floor on your toe touches. Call me crazy, but it we don’t find a way to really distribute identity and become owners of our virtual property, then we are in some serious trouble.
With PyCon a couple days away I’m trying to get ready. Partially this is doing things like laundry. But, mostly, I’m trying to get my mind really focused on technology. One would think that since I’m a programmer that it is really easy to get into geek mode and just geek out on all the cool things going on at a conference like PyCon. The reality is that it is a little harder than that.
In addition to being a programmer, I’m a musician. The thing that stinks about PyCon is that it usually happens pretty close to SXSW. The result is while I’m ramping up for a serious geekout, my mind is also thinking of a grueling 5 days of shows, meeting people and working towards creating opportunities for the band. Since PyCon is a rather low key event, it is easy to put more focus on my SXSW concerns than my geek prep.
This year is a little different. For one, I started thinking about PyCon earlier. This was actually really helpful as I have some concrete plans as to things I’d like to work on for PyCon. I’m still searching for a theme in terms of sessions, but that shouldn’t be too hard once I get my schedule card in my hands. Another difference this year is that PyCon is happening before SXSW. Last year it was afterwards and I was very much still in SXSW mode. This meant thinking about things like record labels, managers and booking agents more so than the multiprocessing module and NoSQL databases. I still managed to learn a great deal and come away with some great experiences, but this year I’m able to make a much better effort to keep focused on geeky topics.
I’m sure other people have had similar experiences where it is difficult to get out of a certain mode. I know when I’ve looked for a job in the past, that mode of searching for work, selling your experience and skills and generally trying to network becomes addictive. It is the same sort of thing at SXSW. You have high hopes important people will come see you play, so you give them every opportunity possible. It is a ton of work and rarely is that successful, but it seems that every year there is potential so you still work hard and go for it. Besides the hard work, it can be a lot of fun. You can meet some pretty interesting people and see some great bands. Some shows are really hard to get into, so it can be fun to find someone to get you in or to find your own way to slip past the person at the door. There is always a lot of free alcohol, although when you play all the time, you don’t really get that many chances to settle down for a cold beer. Likewise, you never know who you’ll meet at a show or in the street, so it is best to stay reasonably prepared.
All in all though, I’m really excited about both PyCon and SXSW. Both events feel familiar and more comfortable. This is probably the first year I’ve felt this way. I’m not taking it easy by any stretch, but my expectations are realistic and achievable. So, if you’re going to be at either (or both!) please drop me a line and let’s grab a beer!
I’m almost done with the first day of PyCon and it has been a really good time. One thing that is kind of nice is that I don’t have a really strong topical focus. It has been allowing me to attend talks that fall on the edges of my interests, which means topics that don’t seem to reflect directly on my day to day work. Fortunately, these kinds of talks often have an impact through exposure to different ideas or technology. A great example was an open space on MongoDB. They were talking about some pretty serious low level stuff, but at the same time, there were some great overviews of how things work.
Culturally it is a pretty interesting experience. This past year has pushed me to split my focus on music in addition to technology. It has been a good thing to get a better balanced life and appreciate the time I do spend on technology. The converse is that the overwhelming geekiness of the crowd is rather stark. This is far from a bad thing. It is really inspiring to see people who are entirely engulfed in not just technology but the community surrounding it. While the initial social interactions can be tough for all involved (myself definitely included), once the commonality of Python comes to the forefront, lasting friendships quickly develop.
It is definitely a good time to spend the days with such an interesting and intelligent group of people. There are immensely smart people all in one place and while there are plenty of quirks, there is a mass of kindness that is rather refreshing. I’m really looking forward to tomorrow!
One theme that I took away from PyCon this year was how truly bad the GIL is. For those that don’t know, the GIL is the Global Interpreter Lock. When you are programming and you want to do two things at the same time, you usually use something like a thread. Threads can access resources, so often times it is important to lock them while you use them. This prevents the situation where things get overwritten or corrupted due to two different threads accessing the same resources.
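To make the locking idea concrete, here is a minimal sketch of several threads updating one shared value. The function name and the counter are made up for illustration. Note that the lock here is an ordinary application-level lock from the threading module, not the GIL itself.

```python
import threading


def run_counter(num_threads=4, increments=10_000):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            # Only one thread at a time may update the shared value.
            with lock:
                counter["value"] += 1

    # Start all the threads, then wait for every one of them to finish.
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    return counter["value"]
```

With the lock in place, the final count is always the number of threads times the number of increments, no matter how the threads get interleaved.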
When you have one processor, the result is that while things are presumably threaded, the operating system is actually just acting as though it can do two things at one time. So, in Python, when you are using threads on a one processor or single core machine, your threads are effectively just as good as those in any other language that uses the operating system’s threads. The problem is that when you move to more than one core, the operating system can run threads on different cores at the same time, while Python’s GIL only lets one thread execute Python code at any given moment. The threads still only mimic parallel operations.
The reality is that most of the time it really doesn’t matter. I’ve avoided using threads in my design for the majority of my programming career and my findings are that it was a good idea to do so. But, now that machines are getting more cores, avoiding parallel operations is in fact more limiting.
Again, most people won’t need it. Python is pretty quick and there are instances where Python’s threads really can help. The problem is not where we are now, but where we want to be in the future. There was definitely an asynchronous theme at PyCon as well that effectively addresses some of the problems with threads, but it considers it in terms of connections. When programs communicate there is a limit to how many connections can be made. Threads are one option where each thread is a connection, but threads have a cost in terms of memory and operating system resources that make using 10k (for example) threads impossible. The async model instead changes the concepts around, so instead of having a thread constantly listening, you utilize the operating system to notify you of events. This means having 10k connections is trivial because each connection is truly efficient in that it won’t do any work without being notified.
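As a rough sketch of the event driven model, here is what many idle "connections" look like using asyncio from the standard library of a modern Python. This is my own illustration, not something shown at PyCon: the function names are hypothetical and the sleep simply stands in for waiting on network data.

```python
import asyncio


async def handle_connection(conn_id):
    # Stand-in for waiting on network data. While this "connection" is
    # idle, control goes back to the event loop and no thread is blocked.
    await asyncio.sleep(0.01)
    return conn_id


async def serve_many(count):
    # Every connection is a lightweight coroutine, not an OS thread.
    waiting = [handle_connection(i) for i in range(count)]
    return await asyncio.gather(*waiting)


def run(count=10_000):
    return asyncio.run(serve_many(count))
```

Calling `run()` waits on ten thousand of these at once in a single thread, which is the kind of number that gets painful with one thread per connection.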
While async helps with the aspect of connections, it doesn’t help at all in terms of utilization of the hardware. The current option then is to simply use more than one process and make sure your state is not held within your applications. This is a good practice generally with both threads and async. Unfortunately, not all applications are focused on connections like web applications are.
The whole issue that I personally have with the GIL is not so much the actual implications but rather that we need new models and that even if we get them, without the GIL issues fixed, they won’t help as much as they could. The async model is pretty tough. The libraries don’t usually work with it because they can block the main loop. This makes them rather impractical then. Threading is becoming pretty well understood but it is still difficult because threading implies a shared state that is a bad idea. There are still some required changes in how we program that are needed to make threading more scalable from a programmer perspective. The design patterns are coming along, but again, with the GIL, it just doesn’t make that much sense when you’re talking about something like a 16 core machine.
Even though I personally don’t have a specific use case that is a problem today, it shouldn’t really matter. Python is a great language with a ton of “batteries” included along with a battery store close by. If we can’t figure out a way to make the GIL go away or work across cores, then we really limit the gains we can get from the history in the Python community. It limits rather deeply where Python can be used.
One solution I see is taking something like the multiprocessing module and using that as the basis for implementing higher level concurrency models. This already works pretty well on Linux but there are some definite corner cases that cause problems. That said, it seems reasonable that a higher level library could restrict the use to a certain model in such a way that the corner cases are never hit.
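Here is a small sketch of what such a restricted model might look like. It assumes a Python that ships concurrent.futures, whose ProcessPoolExecutor is itself layered on the multiprocessing module. The `parallel_map` helper is a hypothetical name of mine. The restriction is that you only hand it a plain importable function and plain values, so there is no shared state to trip over.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def parallel_map(func, items, processes=2):
    """Apply func to each item in worker processes, keeping the input order."""
    # Each worker is a separate interpreter with its own GIL, so CPU bound
    # calls can run on different cores at the same time.
    with ProcessPoolExecutor(max_workers=processes) as pool:
        return list(pool.map(func, items))


if __name__ == "__main__":
    # func and the items must be picklable, e.g. a module level function.
    print(parallel_map(math.sqrt, [1, 4, 9, 16]))
```

The `__main__` guard matters on platforms that start workers by re-importing the main module, which is one of those corner cases a higher level library would want to hide.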
Lately I’ve been really making efforts to become a better tester. I’ve heard that tests make code better just about as much as I’ve heard that there isn’t really a difference. After working on a large code base for a while, my conclusion is that tests do help isolate code, which is a definite positive for maintainability.
The issue though is a catch 22 when you have a code base that managed to get a little out of control. This tendency is entirely normal of course. If you can stop a user from hurting with a few lines of code then that is a big win. Users are important and in the end the code is not.
That said, at some point there becomes the requirement to start really nailing down corner cases. This is where bugs occur in really sneaky parts of the code that you may never have considered or because of slow changes in use cases. It challenges you to reconsider seemingly stable code, in which case your tests are really the only way to have some confidence things are working correctly. If you originally wrote the code you probably feel more assured things can work, but when the code is foreign, you need the tests to verify in real terms what is happening.
The question then is how the heck to tear apart the old code to improve the testability? To be perfectly frank, I have no idea. Part of the problem is simply finding an easy way to iterate on the problem. It is easy to bite off more than you can chew, so there needs to be ways to roll things back. Likewise, it is also difficult to know you’re making improvements since you don’t have tests there to reassure yourself things didn’t get way worse in some unknown manner.
There is obviously not a single answer to this, but I would like to find some resources that define some techniques or at the very least ideas. For example, DVCS does help because in theory breaking off a “cleanup” branch is trivial. Unfortunately, that is only the beginning since you also have to consider any sort of environment that needs to be setup.
No matter what techniques are out there it can’t hurt to dive in. If you fail, remove the branch and try again. Eventually you’ll hit on something. There have been many times where I started learning some language or library only to become extremely frustrated. Eventually when I revisit it, things are clearer and it all makes sense. This is the same kind of thing, so step one is just doing it even if you fail.
I’ve heard plenty of times from coders that comments are pointless. Some folks are even avidly against them. In the past my own thoughts most definitely tended towards code telling the algorithmic story. Lately though, my perceptions have changed. My recent goals have been to understand what really makes code maintainable in the real world. There are plenty of blogs out there that can offer small examples of how something could be made more readable and assume that it negates the need for comments, but I don’t believe them at this point. It is not that the refactoring and naming aren’t important, but rather that they are not enough in the real world. A blog post will get a few paragraphs to explain the context, give you the small snippet and walk you through the changes. At the end you feel like the code is so obvious it hurts. Unfortunately, the difference is that in the real world, that snippet is wrapped by a few hundred lines on either side. And, in Python, you have no clue what some argument’s type possibly is, so looking at the original object just doesn’t cut it. It just isn’t that easy.
What is ironic then is that those folks trying to make code more readable are also making a strong case for commenting. The blog usually does a really good job of explaining the context and then how that context applies to the actual code. Unfortunately we’ve all seen the massively commented code complete with paragraphs for getters and setters and found it extremely difficult to gain anything from the extreme verbosity that is probably incorrect anyway. Still, there is a need to provide context.
Lately my code has drawn me into the CherryPy internals a bit to see how things work. CherryPy has always been a framework (or library really) that just gets out of the way. It has done such a good job at this that I’ve been trying to use more of it! In looking at the code, there are some really consistent practices that have made understanding what is going on much easier. Here are my summations.
1. All code should be in chunks.
If you look at the CherryPy code you quickly notice that each piece of structure is actually created from smaller bits of structure defined by spaces and short comments. Most I would say are between 4 and 8 lines long. They also have a really short one line comment describing what is happening. This leads me to my next point.
2. Comment length should be relative to the size of the code.
If you have most of your code in 4 to 8 line segments, a one line comment is perfect. If you have something that takes up most of the page in your editor, the comments should either be a single larger comment before the algorithm/section or should follow the logic as it goes through the code.
I should mention that when I say “segments” these are not language constructs. This is simply putting a blank line between things.
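To show what I mean, here is a small made up function written in that style. It is not taken from the CherryPy source, and `parse_headers` is just a hypothetical example. Each segment is a few lines long, set off by a blank line and a one line comment.

```python
def parse_headers(raw):
    """Parse 'Name: value' lines into a dict keyed by lowercased name."""
    # Split the raw text into stripped lines and drop the blank ones.
    lines = [line.strip() for line in raw.splitlines()]
    lines = [line for line in lines if line]

    # Build the header dict, skipping any line without a separator.
    headers = {}
    for line in lines:
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        headers[name.strip().lower()] = value.strip()

    return headers
```

Neither comment says anything the code doesn’t, but they let you skim the function and decide which segment is worth reading closely.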
Commenting in general is one of those things that is tough to do right. It is easy to skirt the issue, but I’m going to go ahead and say if you’re not commenting your code, you’re really doing it wrong. Comments share the context of your well written clean code. It is not your fault that code is meant for machines, not people, so don’t feel bad that your code is easier to read because you added a little comment. Likewise, when you commit your code, you are really publishing it. Someone else will read it at some point, so consider your audience.
I’d encourage anyone to take a look at the CherryPy source and try to understand it. It is surprisingly simple to follow along. It doesn’t make you fluent in the code, but it does let you see what is happening in a way that diving deeper is easier.
I should also mention these suggestions are somewhat limited to languages like Python and Ruby where the types are dynamic. In C#/Java for example you usually have a very clear picture of what is happening and I’d say only in rare cases do comments really help. Python doesn’t have this, so adding a comment or two here and there gives the reader a context for digging deeper.