Ionrock Dot Org

by Eric Larson

My Weblog

How To Decouple Code

I’ve been stuck on an entire code base for way too long now. The problem is simple to state: it is really hard to test. It could be worse though. It could be hard to test and totally broken! But that is not the case. The code is wildly successful, but the problem is nothing is getting easier. That is to be expected in the sense that the code will inevitably become more complicated. At the same time, the refactoring has been slow in coming and is difficult to do in small atomic bits.

The question then is how do you decouple things? They tell you in school to write decoupled, yet cohesive code, but what happens when you have some code that managed to get coupled pretty extensively? How do you take a huge amount of code and make things more isolated and in turn testable?

I should mention there are already tests. I should also mention, again, that the code works extremely well. We are not talking about code that doesn’t get the job done. We’re talking about code that gets the job done very well thank you very much.

The real problem here is that it is not just the code that needs refreshing. It is my own experience as a programmer. I’m becoming more attuned to the value of real object oriented programming. Testing is clearly becoming more and more important in my day to day life. I want to write better code and I’m learning things to help me do just that. The problem though is that the code I stare at all day long isn’t helping me use all this newfound knowledge.

My only tactic is to dive in and start tearing apart the code line by line. I’m trying to simply organize things differently and see how it looks. There is obviously a reason they call them text editors after all. I’m not writing new code. It is just editing what is there.

I should also mention that none of this bothers me in the slightest. This kind of struggle is what programming is all about. Some people need to solve huge scaling issues for the first time while others need to take already scalable code and improve it via an updated version of some library or tool. It is all a part of the job. This is also not a rant against the original author! I’ve mentioned, repeatedly and on purpose, that the code works and scales really well. The code is not the “problem”. The problem is I’m still learning how to improve the code and make it truly better.

It all might be a pipe dream and the fact of code is that it gets uglier with age. If that is the case, so be it. But at least I’d like to prove it. If not mathematically (like that would even happen!), then at least to myself. I’m perfectly willing to accept something is a hard problem and that is just the way it is. At this point though, I’m pretty sure it is just me that needs to learn a few more tricks. The title of this post is really a question. I hope folks consider offering some advice in the comments.


Posted Tue May 4 08:32:09 2010 by Eric Larson

A Serious TDD Attempt

Designing a public API is very different than designing a language level API. The former involves a language agnostic view of a problem. There are broad assumptions that are made to help with interoperability, but you don’t have to contend with language specific features. At the language level, things are much different. You are thinking in terms of language features and available constructs alongside best practices that could very well define how useful a library is. While I can say I’ve designed quite a few public-type APIs, designing language level systems is still pretty new to me.

At my job, I’ve recently been rather frustrated because there is a ton of code that has become tough to maintain, yet refactoring and testing have been really difficult. Part of the difficulty has been my lack of design skills when it comes to language level APIs. What better way to refactor and improve the testability of the code than to try writing a more usable API on top of the existing code? Here is where TDD is coming in handy.

It seems like a really helpful exercise to consider how I’d like to work with the code and write tests in order to figure out a good plan for refactoring. TDD fits the bill nicely because it forces me to write (and rewrite) the ideal code. The biggest confusion now is not necessarily the API, but rather how it integrates with the underlying systems and existing code. TDD examples are all pretty primitive and don’t necessarily make it clear how to consider integration testing and more complicated scenarios. My theory is that there are some good techniques out there already. One thing that seems to be consistent is that tests should be fast, so figuring out how to test functionality without actually requiring services to be running might be key. We’ll see how it turns out!


Posted Thu May 6 22:41:35 2010 by Eric Larson

TDD and Creating Contracts with Dingus

In my quest to become a better tester and generally write better code, I’ve started looking at TDD. One of the things about TDD that has always stumped me is how you make sure that some piece of code will work when it is not isolated. Seeing as Dingus has played a critical role in my interest in testing, the concept of isolation has felt important before I even understood what it really meant.

One ideal in TDD (and this is an assumption based on what little I’ve read) is that tests should be really fast. This is simply a user interface kind of feature that makes testing more usable. If you know you can run 500+ tests in less than a second, then you are more likely to run those tests. In fact, I’d say it is critical to TDD that tests run quickly. It is no fun to write a test first that takes a few seconds to run before writing any working code. You can trust me on this one as I’ve experienced it first hand.

With that in mind, how do you speed up the tests? From my perspective, it is critical to remove I/O. Specifically, you need to remove real connections to real sources of data. That means any RESTful web service, file system reads/writes and databases. You can do this with Dingus and I’d say it is really what Dingus excels at. The worry you generally have is that you will write the wrong interface or API when mocking these services. This is where a contract comes into play.

Let’s say we have a small session library. We’re talking a really simple key/value kind of store using MongoDB for storing session information in a web application. Here is the object:


class Session(object):
    def __init__(self, conn):
        self.conn = conn

I’m leaving out any methods because we’re TDD’ing here!

So, the basic thing we want to do is test that sessions can be saved. What does that mean? Well it means that we take some dictionary and store it in MongoDB in a way we can get it back. I’m not going to go into details about things like IDs or how to make sure your session keys are unique. The point here is to show how to provide a contract for saving and retrieving the data.

So here are two tests:


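from dingus import Dingus

# ID and now() are assumed to be defined elsewhere in the test module.

class TestSession(object):
    def test_saving_session(self):
        mock_conn = Dingus()
        sess = Session(mock_conn)
        # Capture the timestamp once so the saved and expected data match.
        ts = now()
        sess.save(ID, {'foo': 'bar', 'ts': ts})
        expected_data = {
            'sess_id': ID,
            'data': {'foo': 'bar', 'ts': ts}
        }
        assert mock_conn.calls('insert', expected_data)
        assert mock_conn.calls('find_one', {'sess_id': ID})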

    def test_loading_session(self):
        mock_conn = Dingus()
        sess = Session(mock_conn)
        sess.load(ID)
        assert mock_conn.calls('find_one', {'sess_id': ID})

So, what you see is that we are testing that when the session object tries to save a session it will do some series of operations using the fake MongoDB connection.
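For what it is worth, here is a minimal sketch of a Session that would satisfy the contract those two tests describe. It is an assumption, not the real implementation: it uses the standard library's unittest.mock in place of Dingus so it runs with nothing installed, the 'abc123' ID is made up, and the order of the find_one and insert calls inside save is my guess.

```python
from unittest.mock import Mock, call


class Session(object):
    def __init__(self, conn):
        self.conn = conn

    def save(self, sess_id, data):
        # Assumed behavior: look for an existing session, then insert.
        self.conn.find_one({'sess_id': sess_id})
        self.conn.insert({'sess_id': sess_id, 'data': data})

    def load(self, sess_id):
        doc = self.conn.find_one({'sess_id': sess_id})
        return doc['data'] if doc else None


def test_saving_session():
    mock_conn = Mock()
    mock_conn.find_one.return_value = None  # pretend nothing is stored yet
    Session(mock_conn).save('abc123', {'foo': 'bar'})
    # The recorded calls are the contract between Session and the connection.
    assert mock_conn.mock_calls == [
        call.find_one({'sess_id': 'abc123'}),
        call.insert({'sess_id': 'abc123', 'data': {'foo': 'bar'}}),
    ]


def test_loading_session():
    mock_conn = Mock()
    mock_conn.find_one.return_value = {
        'sess_id': 'abc123', 'data': {'foo': 'bar'}}
    assert Session(mock_conn).load('abc123') == {'foo': 'bar'}
    mock_conn.find_one.assert_called_once_with({'sess_id': 'abc123'})


test_saving_session()
test_loading_session()
```

Neither test touches a database, so both run in a fraction of a millisecond.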

Here is another way you could test it with a live MongoDB connection.


def test_saving_session(self):
    test_conn = get_test_conn()
    sess = Session(test_conn)
    sess.save(ID, {'foo': 'bar'})
    loaded = sess.load(ID)
    assert loaded['foo'] == 'bar'

This seems pretty rock solid. One problem though is that your test needs to run a MongoDB instance. That is a pain, although nothing major. More importantly though, what happens when the MongoDB API changes or you need to change how you do inserts? MongoDB, for example, supports things called upserts, where if the object doesn't exist it will create it, or else it updates the objects that it finds. That might be a better way to go about it, but the live connection doesn't reveal that subtlety, which is actually a really important distinction.
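To make that concrete, here is a sketch of how the recorded contract would change if saving switched to an upsert. It assumes the update_one(filter, update, upsert=True) call from current PyMongo and again uses unittest.mock rather than Dingus; UpsertSession is a hypothetical name.

```python
from unittest.mock import Mock


class UpsertSession(object):
    def __init__(self, conn):
        self.conn = conn

    def save(self, sess_id, data):
        # One call: create the document if missing, otherwise update it.
        self.conn.update_one(
            {'sess_id': sess_id},
            {'$set': {'data': data}},
            upsert=True,
        )


mock_conn = Mock()
UpsertSession(mock_conn).save('abc123', {'foo': 'bar'})

# The mock makes the switch visible: an upsert happened and no insert did.
mock_conn.update_one.assert_called_once_with(
    {'sess_id': 'abc123'}, {'$set': {'data': {'foo': 'bar'}}}, upsert=True)
mock_conn.insert.assert_not_called()
```

A save-then-load test against a live database would pass for both the insert and the upsert version, which is exactly the subtlety it hides.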

When I first started looking at this, the idea of mapping out the API calls on the Dingus object seemed fragile and daunting. But, as I started to think about it, the idea of mocking a service is to create that contract. Contracts in this situation are between an external library and your code, but they can very well go deeper. In the above example, I might have a SessionStore abstraction that works like a file-like object. In that case, I'd want to verify that doing something like "session.save()" ends up doing an open, write and close call. Likewise, if that still ended up using MongoDB on the back end, I'd then test the SessionStore for calls similar to the above. In doing so, I'm making contracts all the way down.
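A rough sketch of that layered contract might look like the following. Everything here is hypothetical: the StoreBackedSession name, the store's open(sess_id) signature and the string payload are assumptions, and unittest.mock stands in for Dingus.

```python
from unittest.mock import Mock, call


class StoreBackedSession(object):
    def __init__(self, store):
        self.store = store

    def save(self, sess_id, payload):
        # The session only knows the store's file-like interface.
        handle = self.store.open(sess_id)
        handle.write(payload)
        handle.close()


mock_store = Mock()
StoreBackedSession(mock_store).save('abc123', 'foo=bar')

# First half of the contract: the store was opened for this session.
mock_store.open.assert_called_once_with('abc123')

# Second half: the handle saw a write followed by a close, in that order.
handle = mock_store.open.return_value
assert handle.mock_calls == [call.write('foo=bar'), call.close()]
```

The SessionStore itself would then get its own tests against a mocked MongoDB connection, one contract per layer.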

Likewise, it is important to understand that we are using TDD. When there is a bug, we write a test first. In our session example, if we needed to test whether a session is expired, then we can effectively add that to the contract of calls on the mock connection object to verify that we check for an expiry date. We run the fast tests, see things are broken, and go about fixing it. Writing the code first wouldn't enforce the same outcome because we haven't established that a certain set of operations describes what it means to be "saved". Instead, by recording the operations performed, we've created that description. Likewise it provides metrics for complexity. If you have a single test that has tons of assertions, it might be time to try and refactor.
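As a sketch of what adding expiry to the contract could look like: the 'expires' field, the injected clock and the ExpiringSession name are all assumptions for illustration ($gt is MongoDB's "greater than" query operator), and unittest.mock again stands in for Dingus.

```python
from unittest.mock import Mock


class ExpiringSession(object):
    def __init__(self, conn, clock):
        self.conn = conn
        self.clock = clock  # injected so tests control "now"

    def load(self, sess_id):
        # Only ask for sessions whose expiry is still in the future.
        doc = self.conn.find_one({
            'sess_id': sess_id,
            'expires': {'$gt': self.clock()},
        })
        return doc['data'] if doc else None


def test_load_checks_expiry():
    mock_conn = Mock()
    mock_conn.find_one.return_value = None  # simulate an expired session
    sess = ExpiringSession(mock_conn, clock=lambda: 1000)
    assert sess.load('abc123') is None
    # The contract now says the query must include the expiry check.
    mock_conn.find_one.assert_called_once_with(
        {'sess_id': 'abc123', 'expires': {'$gt': 1000}})


test_load_checks_expiry()
```

Written first, this test fails against a load() that ignores expiry, which is the point of the exercise.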

The key here really is practicing the process. At work it is clear that the project I'm working on is ready for a rather large restructuring. I've started that restructuring through tests and it seems to be a rather reasonable way to go about it. It seems obvious that with practice the process will only become easier as well as more effective.


Posted Fri May 7 20:20:11 2010 by Eric Larson

Recording Technologies

We are officially back in the studio, which means lots of coffee and intense focus. At this point in my musical career I feel confident enough in my studio expertise to comment on the technology and process. Recording music really is a process.

If you have never recorded music you might be surprised to know how inexact a science it is. You go in trying to find sounds that are vague ideas to say the least. The parts of the songs that you thought would sound really great end up being completely wrong. Even though you practiced like crazy and you tune before every song, you still end up being sharp because when you play with one finger you’re actually pulling it sharp. Don’t even get me started on trying to get actual sounds.

In terms of technology, recording is surprisingly simple. The most important pieces of equipment are the compressors. This is not limited to actual rack compressors, but includes anything that squashes sound. A nice tube mic going to tape sounds good because they are two sources of fantastic analog compression.

Compressors are also the reason why digital recording is a seriously viable option. Without good natural compression before writing the bits, you get super clean sounds that just don’t sound very good. The compressors manage to take the sounds and make things listenable.

The analog vs. digital debates, in my opinion, come down to compression. First off, a compressor does just what its name describes: it takes the sound wave and sets limits on it. Sometimes the limits are based on volume, and that limit can be relative or absolute. Compression gives you distortion and overdriven tone. The twang of a guitar sounds the way it sounds thanks to compression. It is how acoustic sound becomes digital and why old records feel warm.
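For the programmers reading, the relative and absolute limits can be sketched in a few lines. This is a deliberately simplified illustration and not how any real compressor is built: it works on a single linear sample value between -1 and 1, ignores attack and release times, and the threshold, ratio and ceiling values are made up.

```python
def compress(sample, threshold=0.5, ratio=4.0):
    """Relative limit: shrink only the part of the level above the threshold."""
    level = abs(sample)
    if level <= threshold:
        return sample  # quiet material passes through untouched
    # Above the threshold, every `ratio` units of input yield one of output.
    squashed = threshold + (level - threshold) / ratio
    return squashed if sample >= 0 else -squashed


def hard_limit(sample, ceiling=1.0):
    """Absolute limit: nothing gets louder than the ceiling."""
    return max(-ceiling, min(ceiling, sample))


quiet, loud = 0.25, 0.9
print(compress(quiet), compress(loud), hard_limit(1.5))
```

The loud sample comes out much closer to the quiet one than it went in, which is the squashing described above.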

It is interesting to know that it is often the sum of small changes that makes things sound good. At the same time, starting from extremes is usually beneficial. People say a tube amp sounds best when loud, but I would go so far as to say that for recording, the first step is to turn everything up all the way. It can feel pretty scary pushing an amp and speaker hard, but it also allows for the most natural sounds. The physical constraints of the material become important at extreme levels, so the overdrive sound you can get is actually the natural limits. In some ways I think it is a carnal sound that actually taps into whatever part of a person is still wild and wanting to survive.

The quest to find that carnal greatness is what recording is really about. The general idea is to capture what makes the music enjoyable. Likewise, the goal should also be to remove that which is distracting. All the little details in a song add up one way or another. A single drum hit or lag in an instrument can ruin a song. While the tones and sounds need to be carnal, the actual performance needs to be flawless. When I say flawless, it doesn’t mean every beat is exactly the same. Instead, it means that the way everything works together is correct. Transitions lead the listener and the song flows.

In terms of making things flawless, this is where technology excels. It is insanely time consuming to get everyone in a band to play a perfect take at the same time. There is the execution, where no one makes mistakes, but there is also the performance, which is when everyone plays together just right. In lieu of perfect performances, software can make all the difference. You can push or pull the drum hits to match perfectly. If the second verse is played better than the first, just copy and paste. There are automated tools for syncing a track to a grid, but the more subtle editing can make a good enough performance make the cut. The time saved is literally money saved.

The thing about recording now is that it takes so little to try it out. Garage Band is a great piece of software. We demoed our tracks at almost every practice and would record things when experimenting. Paying attention to the mic placement gave us halfway decent results. Garage Band even seems to do a good job mimicking the compression of tape, so when the volume gets too loud, it breaks really nicely.

Ok, I’m done brain vomiting. I need to head back to concentrating on every single note my wife is shredding on guitar!


Posted Mon May 17 03:19:10 2010 by Eric Larson

The Problem with Dynamic Typing

In my quest to become a better tester, I’ve been looking at how we use logs and error messages to find and fix bugs. This is obviously a really important part of systems since it is often times the primary means of finding how a system is failing in production. The glaring detail that seems to show up is the importance of types within a system. This might partially be because the system I’m personally working on includes an entire language with its own types. That said, my gut is telling me the problem is more general.

This past year my focus has been on maintenance. It is a huge part of programming and has been a struggle for me lately. My perception of the issue has gone from personal doubts to blame and back again. I’m beginning to wonder if the issue is not necessarily the actual code in question, but the platform itself. Specifically, the idea of dynamic typing is starting to feel somewhat dangerous. This opinion is partly because of some experience in a large C# code base. It was a big code base that had to run on a variety of systems (browsers, .net runtime, operating system), yet had almost no automated testing. Despite the size and breadth of the code base, it took only a few weeks to become relatively proficient at fixing bugs and adding features, a benefit I attribute to the impact of static typing.

While I’m sure there are plenty of proponents that would argue with me here, I’d like to make it clear that a simple yes/no regarding dynamic typing is not what I’m talking about. The real problem is the paradigm that current dynamically typed systems utilize to express the code.

Let’s talk about Python for a moment. Python is an object oriented language that has some functional features, but by and large, is built around objects. At the same time it is dynamically typed. In fact it is considered “pythonic” to avoid using types and depend on duck typing. For example, it is better to use something like the “in” operator when testing whether a variable contains some value since it can be applied to a dict’s keys, a list or a set. This sort of system can feel invigorating because you have very few limits when writing code.
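A tiny sketch of that "in" example, with a hypothetical has_value helper that never looks at the type of what it is given:

```python
def has_value(container, value):
    # No isinstance checks: anything that supports `in` will do.
    return value in container


assert has_value({'foo': 1, 'bar': 2}, 'foo')   # dict: checks the keys
assert has_value(['foo', 'bar'], 'foo')         # list
assert has_value({'foo', 'bar'}, 'foo')         # set
assert not has_value({'foo': 1}, 1)             # dict values are not searched
```

The last line hints at the catch: the same expression quietly means something slightly different for each container.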

As I’ve delved further into TDD, something smells of a cover up. The freedom that was available with duck typing starts to feel rather inhibiting. This is because the tests should help establish and enforce the contract. What we’ve done by removing the typing and adding the tests is establish the contract within the tests instead of the type. Some would argue this is a good thing because your contract can be more detailed and robust since you have the full language to work with. The problem though is that you’ve lost the language level constraint provided by keeping contracts in the types.

Taking a larger look at the picture, the idea really doesn’t depend on types as much as where you keep your contracts. Programming works because there is agreement. Code promises other code to work a certain way and in doing so we have (mostly) working systems. Things like objects and classes all help to enforce this idea by providing the semantic keywords to encapsulate the ideas, thus providing more abstraction. The problem then is that when you make a language like Python dynamically typed, the idea of a class or object providing a contract becomes weakened. When you can’t depend on the language to enforce the contract then other measures will have to be taken in order to make sure things work. Writing tests is the current trend for solving this problem. This might very well be a good thing since testing your code is a good idea. Yet, my theory is that there really is a better way.

That better way comes through Haskell. While I’m almost positive the pure functional nature of Haskell is all but a death sentence in terms of adoption and general usefulness, the thing that Haskell excels at is its concept of types. In Haskell a type is what you use to determine a code path. Your function doesn’t just get called, but instead is considered for matching against the arguments, which have types. Because of this reliance you immediately get well written and valuable contracts between components. The other nice aspect of Haskell’s type system is that it is largely implicit: the compiler infers most types for you. While I was able to get up to speed on a C# code base quickly, it wasn’t very fun. Typing boilerplate code all the time is cumbersome. Yet, it allows the compiler to catch quite a few mistakes. It is this kind of early check that helps to reduce bugs to real deal logic errors and not simple issues where the wrong data type was used.
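The closest thing I know of in Python's standard library is functools.singledispatch, which picks a code path from the type of the first argument. This is only a loose analogue offered as a sketch: the describe function is made up, and the dispatch happens at runtime with no compiler checking anything, which is the very gap being described.

```python
from functools import singledispatch


@singledispatch
def describe(value):
    # Fallback for any type without a registered handler.
    return 'something else'


@describe.register(int)
def _(value):
    return 'an integer'


@describe.register(list)
def _(value):
    return 'a list of %d items' % len(value)


print(describe(3))
print(describe([1, 2]))
print(describe('hello'))
```

Each call lands in a different function body purely because of the argument's type.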

While I admire Haskell’s type system from a distance I’ve yet to really do any large projects with it. My guess is that there is some cumbersome aspect that I’ve yet to encounter that might convince me otherwise. Still, it is clear that dynamic typing and relying on duck typing becomes difficult when a system gets large. Likewise, relying on tests can work, but unless you are diligent in the beginning, it is very difficult getting a large code base to a good place in terms of tests. In any case it is clear that in terms of concrete solutions, nothing is a panacea.

Considering that most folks are not able to completely change platforms, there must be strategies for overcoming the obstacles associated with languages. Testing is one example that I personally hope proves to be more valuable than it has been. Another idea is to consider how to utilize types more effectively. This is somewhat analogous to effective object oriented code with the major difference being what sort of goals you are trying for. The Smalltalk idea of object oriented is based around message passing, although in this case, the idea of types feels like a better fit even though it is like adding the cruft of Java. It is clear others have taken similar tracks. Zope interfaces are a good example of someone handling the contract problem. While they seem unpythonic, that might very well be a good thing. Being pythonic might have an upper bound where being enterprisey really does make more sense. Databases have seen NoSQL destroy normalized data for great wins, so it wouldn’t surprise me if tools to enforce types on a dynamic language turned out to be extremely beneficial as well in terms of reliability and a maintainable platform.
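As a sketch of what a contract kept in a type can look like in plain Python, here is the standard library's abc module standing in for Zope interfaces (it is a different tool with a similar goal). SessionStore, MemoryStore and BrokenStore are hypothetical names.

```python
from abc import ABC, abstractmethod


class SessionStore(ABC):
    """The contract: anything calling itself a store must save and load."""

    @abstractmethod
    def save(self, sess_id, data):
        ...

    @abstractmethod
    def load(self, sess_id):
        ...


class MemoryStore(SessionStore):
    def __init__(self):
        self._data = {}

    def save(self, sess_id, data):
        self._data[sess_id] = data

    def load(self, sess_id):
        return self._data.get(sess_id)


class BrokenStore(SessionStore):
    def save(self, sess_id, data):
        pass  # forgot to implement load()


store = MemoryStore()
store.save('abc123', {'foo': 'bar'})

try:
    BrokenStore()
except TypeError as exc:
    print('contract violated:', exc)
```

The check only fires when the class is instantiated, not at import time, so it is a weaker guarantee than a compiler gives, but it fails earlier and louder than a missing method would on its own.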

At this point I’m pretty much just mumbling, but I do hope it is clear there is still work to be done. I love Python and all that it makes possible, but it has become clear that maintaining code with few limits and a test suite that doesn’t necessarily build contracts is difficult business. Outside of any architectural concepts, one of the things that makes the whole scenario difficult is not knowing when you’re wasting time. I’ve recently spent an inordinate amount of time writing and rewriting tests according to some theories about what is really beneficial. It has been a learning experience, but at what expense? Am I headed in the right direction or is this a wild goose chase? Hopefully some really great mentors can come and set me straight or at least make it clear I’m not alone in these thoughts. While a lot of this is probably just rambling, at the very least it is rambling for a cause. Programmers should know that when things are tough and difficult there are other people who have asked the same questions and looked for answers. Likewise, those programmers who managed to successfully emerge from huge code bases might very well see the flares asking for help and offer some wise words. One thing is for sure, at least I’m not coding in Java ;)


Posted Thu May 27 18:23:19 2010 by Eric Larson

Getting Started with JVM Languages is Hard

I watched this video on integrating Scala. Now, I’m not a Java developer. I used it in college and had to mess with small bits of it throughout the years, but generally, it is something I’ve avoided. Scala is a language built on the JVM that is (as I naively understand) well suited for concurrency thanks to its support of Actors and is relatively friendly to the functional style of programming. It also is statically typed, which is what initially got me interested in looking at it.

Seeing as I’m a web developer, my first search on a new language is always a quick review of the web frameworks. Web frameworks usually do a good job getting you up and running quickly, which is really beneficial for a non-Java developer like myself. Unfortunately, this is also usually where I end up losing steam in JVM based languages. The problem is Java and the JVM make it difficult to get started.

Part of this evaluation is pretty unfair. I have no doubt Scala and other JVM languages (Clojure is another I had a limited interest in) can be really excellent. The problem is you have to understand the JVM. Am I whining? Probably. Does it matter? Nope.

When learning a language, friction is a killer. Understanding the programming constructs and ideals of the language is usually the least of your worries. It is when you have to find dependencies and simply get started that you have a problem. The Java classpath is a great example of pure stopping power when learning a language. The concept makes sense. You create a path for searching for libraries and necessary files. It is just like the operating system’s PATH only for code. Got it. Unfortunately, it just doesn’t work that way. There is almost always a stack trace that reflects javac not being able to find some aspect of the program to make things work. When I start hitting these kinds of problems, my eyes glaze over, I start closing buffers in Emacs and start doing something else. It is a pain in the neck and it doesn’t make sense why it is so hard.

This pain also goes for things like Maven, Ant and all the surrounding Java ecosystem tools. Let’s take a look at getting started with Lift:


mvn archetype:generate -U \
  -DarchetypeGroupId=net.liftweb \
  -DarchetypeArtifactId=lift-archetype-blank \
  -DarchetypeVersion=1.0 \
  -DremoteRepositories=http://scala-tools.org/repo-releases \
  -DgroupId=demo.helloworld \
  -DartifactId=helloworld \
  -Dversion=1.0-SNAPSHOT

Now to a Java developer this might not be a big deal. This might even be elegant. To me, it is a huge blinking light that says I’ll be doing a ton of configuration, dealing with obtuse XML and generally wasting my time on the unimportant details. If that is how much I have to write in order to get a Hello World application running, then the serious complexity of a real world application is going to be a nightmare. Even if it is not, I’ve lost my will to find out. Game over.

The reason I mention it at all is that there seems to be tons of cool stuff happening on the JVM. Scala is one language I’d like to try out. Clojure is another I was really excited about getting my hands dirty with. Even Jython seems like an interesting tool to have in one’s programming tool belt. Yet, even though the concept of an interesting language on top of the production proven JVM is really promising, the reality is the interface is a nightmare. That fact is too bad because it means that most JVM based languages that don’t make an effort to hide the JVM-ism are somewhat limited in scope to Java friendly developers.

The one JVM based language that I’ve used without having much trouble getting started was Rhino. It was relatively trivial to get up and running and I never even thought about the classpath. At a minimum, that is what I’d hope to find from other JVM based languages, especially if their features are based more around the language than integrating with existing Java applications. For example, I understand that Scala is a language that seems to pride itself on how easily it integrates in a Java application. My point is that it would be really helpful to be able to use JVM based languages without having to know I’m using a JVM based language.


Posted Fri May 28 11:51:12 2010 by Eric Larson
Created using Python, jQuery and Emacs