Ionrock Dot Org

Programming and Music

Concurrency Transitions

Glyph, the creator of Twisted, wrote an interesting article discussing the intrinsic flaws of using threads. The essential idea is that unless you know explicitly when you are switching contexts, it is extremely difficult to effectively reason about concurrency in code.

I agree that this is one way to handle concurrency. Glyph also provides a clear perspective into the underlying constraints of concurrent programming. The biggest constraint is that you need a way to guarantee a set of statements happens atomically. He suggests an event-driven paradigm as the best way to do this. In a typical async system, the work is built up using small procedures that run atomically, yielding control back to the main loop as they finish. The async model works so well because you eliminate all CPU-based concurrency and allow work to happen while waiting for I/O.

There are other valid ways to achieve a similar effect. The key in all these methods, async included, is to know when you transition from atomic sequential operations to potentially concurrent, and often parallel, operations.

A great example of this mindset is found in functional programming, and specifically, in monads. A monad is essentially a guarantee that some set of operations will happen atomically. In a functional language, functions are considered “pure” meaning they don’t introduce any “side effects”, or more specifically, they do not change any state. Monads allow functional languages a way to interact with the outside world by providing a logical interface that the underlying system can use to do any necessary work to make the operation safe. Clojure, for example, uses a Software Transactional Memory system to safely apply changes to state. Another approach might be to use locking and mutexes. No matter the methodology, the goal is to provide a safe way to change state by allowing the developer an explicit way to identify portions of code that change external state.

Here is a classic example in Python of where mutable state can cause problems.
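
A minimal sketch of the kind of problem meant here, assuming the classic mutable default argument (the function name `add_item` is made up for illustration):

```python
def add_item(item, items=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `items` shares the same list object.
    items.append(item)
    return items


first = add_item('a')
second = add_item('b')

print(first)            # ['a', 'b']
print(first is second)  # True
```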

In Python, and the vast majority of languages, it is assumed that a function can act on a variable of a larger scope. This is possible thanks to mutable data structures. In the example above, calling the function multiple times doesn’t re-initialize the argument to an empty list. It is a mutable data structure that exists as state. When the function is called that state changes, and that change of state is considered a “side effect” in functional programming. This sort of issue is even more difficult in threaded programming because your state can cross threads in addition to lexical boundaries.

If we generalize the purpose of monads and Clojure’s reference types, we can establish that concurrent systems need to be able to manage the transitions between pure functionality (no state manipulation) and operations that affect state.

One methodology that I have found to be effective for managing this transition is to use queues. More generally, this might be called message passing, but I don’t believe message passing guarantees the system understands when state changes. In the case of a queue, you have an obvious entrance and exit point for the transition between purity and side effects to take place.

The way to implement this sort of system is to consider each consumer of a queue as a different process. By considering consumers / producers as processes we ensure there is a clear boundary between them that protects shared memory, and more generally shared state. The queue then acts as a bridge to cross the “physical” border. The queue also provides control over the transition between pure functionality and side effects.

To relate this back to Glyph’s async perspective, when state is pushed onto the queue it is similar to yielding to the reactor in an async system. When state is popped off the queue into a process, it can be acted upon without worry of causing side effects that could affect other operations.
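
As a rough sketch of this boundary, the following uses the standard library’s `multiprocessing` module. The names `consumer` and `run_once` are hypothetical, and it assumes a POSIX system where the `fork` start method is available. The state that crosses the queue is copied, so the consumer can change it freely without touching the producer’s copy.

```python
import multiprocessing as mp


def consumer(inbox, outbox):
    """Runs in its own process and only sees what arrives on the queue."""
    while True:
        state = inbox.get()
        if state is None:                  # sentinel: the producer is done
            break
        state.append('seen by consumer')   # mutates the consumer's copy only
        outbox.put(state)


def run_once(state):
    # Assumption: the 'fork' start method is available (Linux, macOS).
    ctx = mp.get_context('fork')
    inbox, outbox = ctx.Queue(), ctx.Queue()
    proc = ctx.Process(target=consumer, args=(inbox, outbox))
    proc.start()
    inbox.put(state)          # entrance: the state is pickled and copied
    result = outbox.get()     # exit: a new object comes back
    inbox.put(None)
    proc.join()
    return result


original = ['created by producer']
returned = run_once(original)
print(original)   # ['created by producer']
print(returned)   # ['created by producer', 'seen by consumer']
```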

Glyph brought up the scenario where a function might yield multiple times in order to pass back control to the managing reactor. This becomes less necessary in the queue and process system I describe because there is no chance of a context switch interrupting an operation or slowing down the reactor. In a typical async framework, the job of the reactor is to order each bit of work. The work still happens in series. Therefore, if one operation takes a long time, it stops all other work from happening, assuming that work is not doing I/O. The queue and process system doesn’t have this same limitation as it is able to yield control to the queue at the correct logical point in the algorithm. Also, in terms of Python, the GIL is mitigated by using processes. The result is that you can program in a sequential manner for your algorithms, while still tackling problems concurrently.

Like anything, this queue and process model is not a panacea. If your data is large, you often need to pass around references to the data and where it can be retrieved. If that resource is not something that tries to handle concurrent connections, the file system for example, you may still run into concurrency issues when accessing it. It also can be difficult to reason about failures in a queue based system. How full is too full? You can limit the queue size, but that might cause blocking issues that may be unreasonable.
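
A small sketch of that trade-off with the standard library’s `queue.Queue`: a bounded queue either blocks the producer or forces it to decide what to do when the queue is full. The size of 2 and the `try_enqueue` helper are arbitrary choices for illustration.

```python
import queue


def try_enqueue(q, item):
    """Return True if the item was queued, False if the queue is full."""
    try:
        q.put_nowait(item)   # a plain q.put(item) would block here instead
        return True
    except queue.Full:
        return False


work = queue.Queue(maxsize=2)
accepted = [try_enqueue(work, n) for n in range(3)]
print(accepted)   # [True, True, False]
```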

There is no silver bullet, but if you understand the significance of transitions between pure functionality and side effects, you have a good chance of producing a reasonable system no matter what concurrency model you use.

A Sample CherryPy App Stub

In many full stack frameworks, there is a facility to create a new application via some command. In Django, for example, you use django-admin.py startproject foo. The startproject command will create some directories and files to help you get started.

CherryPy tries very hard to avoid making decisions for you. Instead, CherryPy allows you to set up and configure the layout of your code however you wish. Unfortunately, if you are unfamiliar with CherryPy, it can feel a bit daunting setting up a new application.

Here is how I would set up a CherryPy application that is meant to serve a basic site with static resources and some handlers.

The File System

Here is what the file system looks like.

├── myproj
│   ├── __init__.py
│   ├── config.py *
│   ├── controllers.py
│   ├── models.py
│   ├── server.py
│   ├── static
│   ├── lib *
│   └── views
│       └── base.tmpl
├── setup.py
└── tests

First off, it is a python package with a setup.py. If you’ve never created a python package before, here is a good tutorial.

Next up is the project directory. This is where all your code lives. Inside this directory we have a few files and directories.

  • config.py : Practically every application is going to need some configuration and a way to load it. I put that code in config.py and typically import it when necessary. You can leave this out until you need it.
  • controllers.py : MVC is a pretty good design pattern to follow. The controllers.py is where you put your objects that will be mounted on the cherrypy.tree.
  • models.py : Applications typically need to talk to a database or some other service for storing persistent data. I highly recommend SQLAlchemy for this. You can configure the models referred to in the SQLAlchemy docs here, in the models.py file.
  • server.py : CherryPy comes with a production ready web server that works really well behind a load balancing proxy such as Nginx. This web server should be used for development as well. I’ll provide a simple example of what might go in your server.py file.
  • static : This is where your css, images, etc. will go.
  • lib : CherryPy does a good job allowing you to write plain python. Once the controllers start becoming more complex, I try to move some of that functionality to well organized classes / functions in the lib directory.
  • views : Here is where you keep your template files. Jinja2 is a popular choice if you don’t already have a preference.

Lastly, I added a tests directory for adding unit and functional tests. If you’ve never done any testing in Python, I highly recommend looking at pytest to get started.

Hooking Things Together

Now that we have a bunch of files and directories, we can start to write our app. We’ll start with the Hello World example on the CherryPy homepage.

In our controllers.py we’ll add our HelloWorld class:

# controllers.py
import cherrypy


class HelloWorld(object):
    def index(self):
        return 'Hello World!'
    index.exposed = True

Our server.py is where we will hook up our controller with the webserver. The server.py is also how we’ll run our code in development and potentially in production:

# server.py
import os

import cherrypy

# if you have a config, import it here
# from myproj import config

from myproj.controllers import HelloWorld

HERE = os.path.dirname(os.path.abspath(__file__))


def get_app_config():
    return {
        '/static': {
            'tools.staticdir.on': True,
            'tools.staticdir.dir': os.path.join(HERE, 'static'),
        }
    }


def get_app(config=None):
    config = config or get_app_config()
    cherrypy.tree.mount(HelloWorld(), '/', config=config)
    return cherrypy.tree


def start():
    get_app()
    cherrypy.engine.signals.subscribe()
    cherrypy.engine.start()
    cherrypy.engine.block()

if __name__ == '__main__':
    start()

Obviously, this looks more complicated than the example on the cherrypy homepage. I’ll walk you through it to let you know why it is a little more complex.

First off, if you have a config.py that sets up a configuration object or anything similar, we import that first. Feel free to leave that out until you have a specific need.

Next up we import our controller from our controllers.py file.

After our imports we set up a variable HERE that will be used to configure any paths. The static resources are the obvious example.

At this point we start defining a few functions. The get_app_config function returns a configuration for the application. In the config, we set up the staticdir tool to point to our static folder. The default configuration is to expose these files via /static.

This default configuration is defined in a function to make it easier to test. As your application grows, you will end up needing to merge different configuration details together depending on configuration passed into the application. Starting off by making your config come from a function will help to make your application easier to test because it makes changing your config for tests much easier.
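
As a rough illustration of why a function helps, here is one way such merging might look. The `merge_config` helper and the paths are hypothetical and not part of CherryPy; the sketch only assumes the config is a dict of per-path dicts like the one returned by get_app_config.

```python
import copy


def merge_config(base, overrides):
    """Return a new config with per-path settings from overrides applied."""
    merged = copy.deepcopy(base)   # never mutate the defaults
    for path, settings in overrides.items():
        merged.setdefault(path, {}).update(settings)
    return merged


defaults = {'/static': {'tools.staticdir.on': True,
                        'tools.staticdir.dir': '/srv/myproj/static'}}
test_config = merge_config(defaults, {
    '/static': {'tools.staticdir.dir': '/tmp/test-static'},
})

print(test_config['/static']['tools.staticdir.dir'])  # /tmp/test-static
print(defaults['/static']['tools.staticdir.dir'])     # /srv/myproj/static
```

In a test, the merged dict could then be handed to get_app(config=...) instead of the defaults.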

In the same way we’ve constructed our config behind a function, we also have our application available behind a function. When you call get_app it has the side effect of mounting the HelloWorld controller on the cherrypy.tree, making it available when the server starts. The get_app function also returns the cherrypy.tree. The reason for this is, once again, to allow easier testing with tools such as webtest. Webtest allows you to take a WSGI application and make requests against it, asserting against the response. It does this without requiring you to start up a server. I’ll provide an example in a moment.

Finally we have our start function. It calls get_app to mount our application and then calls the necessary functions to start the server. The quickstart method used in the homepage tutorial does this under the hood, in addition to doing the mounting and adding the config. The quickstart can become less helpful as your application grows because it assumes you are mounting a single object at the root. If you prefer to use quickstart you certainly can. Just be aware that it can be easy to clobber your configuration when mixing it with cherrypy.tree.mount.

One thing I haven’t addressed here is the database connection. That is outside the scope of this post, but for a good example of how to configure SQLAlchemy and CherryPy, take a look at the example application, Twiseless. Specifically you can see how to set up the models and connections. I’ve chosen to provide a file system organization that is a little closer to other frameworks like Django, but please take liberally from Twiseless to fill in the gaps I’ve left here.

Testing

In full stack frameworks like Django, testing is part of the full package. While many venture outside the confines of whatever the defaults are (using pytest vs. django’s unittest based test runner), it is generally easy to test things like requests to the web framework.

CherryPy does not take any steps to make this easier, but fortunately, this default app configuration lends itself to relatively easy testing.

Let’s say we want to test our HelloWorld controller. First off, we should set up an environment to develop with. For this we’ll use virtualenv. I like to use a directory called venv. In the project directory:

$ virtualenv venv

Virtualenv comes bundled with pip. Pip has a helpful feature where you can define requirements in a single text file. Assuming you’ve already filled in your setup.py file with information about your package, we’ll create a dev_requirements.txt to make it easy to get our environment set up.

# dev_requirements.txt

-e .  # install our package

# test requirements
pytest
webtest

Then we can install these into our virtualenv by doing the following in the shell:

$ source venv/bin/activate
(venv) $ pip install -r dev_requirements.txt

Once the requirements are all installed, we can add our test.

We’ll create a file in tests called test_controller_hello_world.py. Here is what it will look like:

import pytest
import webtest

from myproj.server import get_app


@pytest.fixture(scope='module')
def http():
    return webtest.TestApp(get_app())


class TestHelloWorld(object):

    def test_hello_world_request(self, http):
        resp = http.get('/')
        assert resp.status_int == 200
        assert 'Hello World!' in resp

In the example, we are using a pytest fixture to inject WebTest into our test. WebTest allows you to perform requests against a WSGI application without having to start up a server. The http.get call in our test is then the same as if we had started up the server and made the request in our web browser. The resulting response can be used to make assertions.
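
To give a feel for why no server is needed, here is a bare-bones sketch that calls a WSGI application directly using only the standard library. This is not how WebTest is implemented, and WebTest does a great deal more; `hello_app` and `call_app` are hypothetical names.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A tiny stand-in WSGI application."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello World!']


def call_app(app, path='/'):
    """Make a fake GET request against a WSGI app; return (status, body)."""
    environ = {'SCRIPT_NAME': '', 'PATH_INFO': path, 'REQUEST_METHOD': 'GET'}
    setup_testing_defaults(environ)   # fills in the remaining WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status

    body = b''.join(app(environ, start_response))
    return captured['status'], body


status, body = call_app(hello_app)
print(status, body)   # 200 OK b'Hello World!'
```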

We can run the tests via the py.test command:

(venv) $ py.test tests/

It should be noted that we could also test the response by simply instantiating our HelloWorld class and asserting the result of the index method is correct. For example:

from myproj.controllers import HelloWorld


def test_hello_world_index():
    controller = HelloWorld()
    assert controller.index() == 'Hello World!'

The problem with directly using the controller objects is that when you use more of CherryPy’s features, you end up using more of cherrypy.request and other cherrypy objects. This progression is perfectly natural, but it makes it difficult to test the handler methods without also patching much of the cherrypy framework using a library like mock. Mock is a great library and I recommend it, but when testing controllers, using WebTest to handle assertions on responses is preferred.

Similarly, I’ve found pytest fixtures to be a powerful way to introduce external services into tests. You are free to use any other method you’d like to utilize WebTest in your tests.

Conclusions

CherryPy is truly an unopinionated framework. The purpose of CherryPy is to create a simple gateway between HTTP and plain Python code. The result is that there are often many questions of how to do common tasks as there are few constraints. Hopefully the above folder layout alongside the excellent Twiseless example provides a good jumping off point for getting the job done.

Also, if you don’t like the layout mentioned above, you are free to change it however you like! That is the beauty of CherryPy. It allows you to organize and structure your application the way you want it structured. You can feel free to be creative and customize your app to your own needs without fear of working against the framework.

Queues

Ian Bicking has said goodbye. Paste and WSGI played a huge part of my journey as a Python programmer. After reading Ian’s post, I can definitely relate. Web frameworks are becoming more and more stripped down as we move to better JS frameworks like AngularJS. Databases have become rather boring as Postgres seems to do practically anything and MongoDB finally feels slightly stable. Where I think there is still room to grow is in actual data, which is where queues come in.

Recently I’ve been dragged into the wild world of Django. If you plan on doing anything outside of the typical request / response cycle, you will quickly run into Celery. Celery defines itself as a distributed task queue. The way it works is that you run Celery worker processes that use the same code as your application. These workers listen to a queue (like RabbitMQ for example) for task events that a worker will execute. There are some other pieces that are provided such as scheduling, but generally, this is the model.

The powerful abstraction here is the queue. We’ve recently seen the acceptance of async models in programming. On the database front, eventual consistency has become more and more accepted as fact for big data systems. Browsers have adopted data storage models to help protect user data while that consistency gets replicated to a central system. Mobile devices with flaky connectivity provide the obvious use case for storing data temporarily on the client. All these technologies present a queue-like architecture where data is sent through a series of queues where workers are waiting to act on the data.

The model is similar to functional programming. In a functional programming language you use functions to describe a set of operations that will happen on a specific type or set of data. Here is a simple example in Clojure:

(defn handle-event [evt]
  (add-to-index (map split-by-id (parse (:data evt)))))

Here we are handling some evt data structure that has a data key. The data might be a string that gets parsed by the parse function. The result of the parsing is passed to a map operation that also returns an iterable that is consumed by the add-to-index function.

Now, say we wanted to implement something similar using queues in Python.

def parse(data, output):
    # some parsing...
    for part in parts:
        output.push(split_by_id(part))


def add_to_index(input):
    while True:
        doc = input.get()
        db.write(doc)


def POST(doc):
    id = gen_id()
    indexing_queue.push((id, doc))
    return {'message': 'added to index queue',
            'redirect': '/indexed/%s' % id}

Even though this is a bit more verbose, it presents a similar model as the functional paradigm. Each step happens on an immutable value. Once the function receives the value from the queue, it doesn’t have to be concerned with it changing as it does its operation. What’s more, the processing can be on the same machine or across a cluster of machines, mitigating the effect of the GIL.
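
For something that actually runs, here is a small single-machine sketch of the same pipeline using `queue.Queue` and threads from the standard library. Threads stand in for separate worker processes or machines, the `index` dict stands in for the database, and the parsing rule (comma-separated `id:text` pairs) is invented for the example.

```python
import queue
import threading

DONE = object()   # sentinel marking the end of the stream


def parse(raw_docs, parsed_docs):
    """Stage 1: split 'id:text,id:text' strings into (id, text) tuples."""
    while True:
        data = raw_docs.get()
        if data is DONE:
            parsed_docs.put(DONE)   # pass the shutdown signal downstream
            break
        for part in data.split(','):
            doc_id, text = part.split(':', 1)
            parsed_docs.put((doc_id, text))


def add_to_index(parsed_docs, index):
    """Stage 2: write each parsed document into the index."""
    while True:
        doc = parsed_docs.get()
        if doc is DONE:
            break
        doc_id, text = doc
        index[doc_id] = text


raw_docs, parsed_docs, index = queue.Queue(), queue.Queue(), {}
workers = [
    threading.Thread(target=parse, args=(raw_docs, parsed_docs)),
    threading.Thread(target=add_to_index, args=(parsed_docs, index)),
]
for worker in workers:
    worker.start()

raw_docs.put('1:hello,2:world')
raw_docs.put(DONE)
for worker in workers:
    worker.join()

print(index)   # {'1': 'hello', '2': 'world'}
```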

This isn’t a new idea. It is very similar to the actor model and other concurrency paradigms. Async programming effectively does the same thing in that the main loop is waiting for I/O, at which time it sends the I/O to the respective listener. In theory, a celery worker could queue up a task on another celery queue in order to get a similar effect.

What is interesting is that we don’t currently have a good way to do this sort of programming. There is a lot of infrastructure and tooling that would be helpful. There are questions as to how to deploy new nodes, keep code up to date and what happens when the queue gets backed up? Also, what happens when Python isn’t fast enough? How do you utilize a faster system? How do you do backfills of the data? Can you just re-queue old data?

I obviously don’t have all the answers, but I believe the model could work to make processing streamable data more powerful. What makes the queue model possible is an API and document format for using the queue. If all users of the queue understand the content on the queue, then it is possible for any system that connects to the queue to participate in the application.

Again, I’m sure others have built systems like this, but as there is no framework available for python, I suspect it is not a popular paradigm. One example of the pattern (sans a typical queue) is Mongrel2 with its use of ZeroMQ. Unfortunately, with the web providing things like streaming responses and the like, I don’t believe this model is very helpful for a generic web server.

Where I believe it could excel is when you have a lot of data coming that requires flexible analysis by many different systems, such that a single data store cannot provide the flexibility required. For example, if you wanted to process all facebook likes based on the URLs, users and times, it would require a rather robust database that could effectively query each facet and establish a reasonably fast means of calculating results. Often this is not possible. Using queues and a streaming model, you could listen to each like as it happens and have different workers process the data and create their own data sources customized for the specific queries.
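
A toy sketch of that idea: each incoming like is published to every subscribed queue, and each worker keeps its own small data source shaped for one kind of query. The event fields and the names `publish` and `count_by` are made up for illustration; a real system would put a message broker between the pieces.

```python
import queue
from collections import Counter


def publish(event, subscribers):
    """Fan an event out to every subscribed queue."""
    for q in subscribers:
        q.put(event)


def count_by(field, inbox):
    """Worker: drain a queue and build a Counter keyed on one field."""
    counts = Counter()
    while not inbox.empty():
        counts[inbox.get()[field]] += 1
    return counts


url_queue, user_queue = queue.Queue(), queue.Queue()
likes = [
    {'url': '/a', 'user': 'ann'},
    {'url': '/a', 'user': 'bob'},
    {'url': '/b', 'user': 'ann'},
]
for like in likes:
    publish(like, [url_queue, user_queue])

likes_per_url = count_by('url', url_queue)
likes_per_user = count_by('user', user_queue)
print(likes_per_url['/a'], likes_per_user['ann'])   # 2 2
```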

I still enjoy writing python and at this point I feel I know the language reasonably well. At the same time I can relate to the feeling that it isn’t as exciting as it used to be. While JavaScript could be a lot of fun, I think there is still something to be done with big data that makes sense for Python. Furthermore, I’d hope the queue model I mentioned above could help leave room to integrate more languages and systems such that if another platform does make sense, it is trivial to switch where needed.

Have others written similar systems? Are there problems that I’m missing?

Immutability

One thing about functional programming languages that is a source of frustration is immutable data structures. In Clojure there are a host of data structures that allow you to change the data in place. This is possible because the operation is wrapped in a transaction of sorts that will guarantee it will work or everything will be reverted.

One description that might be helpful is that Clojure uses locks by default. Any data you store is immutable and therefore locked. You can always make a copy efficiently and you are provided some tools to unlock the data when necessary.
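
For readers coming from Python, a loose analogy (not how Clojure implements it) is a frozen dataclass: the value cannot be changed in place, but a modified copy can be made on request. The `Point` class is a made-up example.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Point:
    x: int
    y: int


origin = Point(0, 0)

try:
    origin.x = 5                # "locked": assignment is refused
except FrozenInstanceError:
    locked = True
else:
    locked = False

moved = replace(origin, x=5)    # a new value; the original is untouched
print(locked, origin, moved)    # True Point(x=0, y=0) Point(x=5, y=0)
```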

I’m definitely not used to this model by any stretch, but it seems the transactional paradigm along with efficient copies makes for a good balance of functional values alongside practical requirements.

My First Clojure Experience

Clojure is a LISP built on top of the JVM. As a huge fan of Emacs, it shouldn’t be surprising that there is a soft spot in my heart for LISP as well as functional programming. The problem is lisp is a rather lonely language. There are few easily googleable tutorials and a rather fractured community. You have a ton of options (Guile, Scheme, Racket, CL, etc.) with none of them providing much proof that a strong, long lasting community exists. It can be rather daunting to spend time trying to learn a language based on a limited set of docs knowing that it is unlikely you will have many chances to actually use it.

Of course, learning a lisp (and functional programming) does make you a better programmer. Learning a lisp is definitely time well spent. That said, this reality of actually using lisp in the real world has always been a deterrent for me personally.

Clojure, on the other hand, is a little different.

Clojure, being built on the JVM, manages to provide a lisp that is contextualized by Java. Clojure doesn’t try to completely hide the JVM and instead provides clear points of interoperability and communicates its features in terms of Java. Rich Hickey does a great job explaining his perspective on Clojure, and more importantly, what the goal is. This all happens using Java colored glasses. The result is a creator that is able to present a practical lisp built from lessons learned programming in typical object oriented paradigms.

Idealism aside, what is programming in Clojure really like?

Well, as a python programmer with limited Java experience, it is a little rough to get started. The most difficult part of learning a lisp is how to correctly access data. In Python (and any OO language) it is extremely easy to create a data structure and get what you need. For example, if you have a nested dictionary of data, you can always provide a couple keys and go directly to the data you want. Lisp does not take the same approach.

It would be really great if I were to tell you how best to map data in python data structures into Clojure data structures, but I really don’t know. And that is really frustrating! But, it is frustrating because I can see how effective the platform and constructs would be if only I could wrap my head around dealing with data.

Fortunately, Rich gives us some tips by way of Hammock Driven Development that seem promising. A common concept within the world of lisp is that your program is really just data. Cascalog, a popular hadoop map reduce framework, provides a practical example of this through its logic engine. Here is a decent video that shows a declarative form, where your program really is just data used by a logic engine. Eventually, I’m sure my brain will figure out how to effectively use data in Clojure.

Another thing that is commonly frustrating with a JVM for a Python programmer is dealing with the overwhelming ecosystem. Clojure manages to make this aspect almost trivial thanks to Leiningen. Imagine virtualenv/pip merged with manage.py in Django and you start to see how powerful a tool it is.

Finally, Clojure development is really nice in Emacs. The key is the inferior lisp process. If you’ve ever wanted a Python IDE you’ll find that the only way to reliably get features like autocomplete to work with knowledge of the project is to make sure the project is configured with the specific virtualenv. Emacs makes this sort of interaction trivial in Clojure because of tools like Cider that jack into the inferior lisp process to compile a specific function, run tests or play around in a repl.

I highly recommend checking out Clojure. Like a musical instrument, parens may take a little practice. But, once you get used to them, the language becomes really elegant. At a practical level you get a similar dynamism as you see in Python. You also get the benefits of a platform that is fast and takes advantage of multiple cores. Most importantly, there is a vibrant and helpful community.

Even if you don’t give Clojure a try, I encourage you to watch some of Rich Hickey’s talks online. They are easy to watch and take an interesting perspective on OO based languages. I’ve become a fan.

Code by Line

I saw this tweet:

Limiting lines to 80 characters is a great way to ensure that variable names remain cryptically short while lines break in confusing places.

It makes some sense. For example, suppose I had something like:

put_to_s3(project_bucket, resultant_keyname, use_multipart=True, overwrite=False, confirm=True)

One way to a shorter line would be to make some variable names a bit shorter:

put_to_s3(bucket, key, use_multipart=True, overwrite=False, confirm=True)

Unfortunately, this doesn’t quite do the trick. A better tack, one that has benefits that go beyond 80 characters, is to utilize vertical space. Or in simpler terms, code by lines rather than variables. For example, I would have refactored the original code like this.

put_to_s3(
    project_bucket,
    resultant_keyname,
    use_multipart=True,
    overwrite=False,
    confirm=True
)

I get to keep my more descriptive names and when the signature of the function changes or I have to add another keyword argument, the diff / patch will be much clearer. Also, and this is obviously subjective, if the vertical listing seems to grow large, you have a more obvious “smell” to the code when you are browsing the codebase.

It is understandable to assume that limiting line size could result in cryptic variable names, but more often than not, longer lines end up being more difficult to read and decode. More importantly, you end up fighting the endless suite of line based tools we utilize in version control. The next time you feel limited by the line length, consider the vertical space you have and whether that might allow you to have your descriptive variable names alongside your line based coding tools.

Announcing CacheControl 0.9.2

I’ve just released CacheControl 0.9.2! As requests now supports response pickling out of the box, CacheControl won’t try to patch the Response object unless it is necessary.

Also, I’ve heard that CacheControl is being used successfully in production! It has helped us replace httplib2 in our core application, which has pretty decent traffic.

Download the release over at pypi and check out the docs.

Hiding Complexity vs. Too Many Layers

If you’ve ever tried TDD there is a decent chance you’ve written some code like this:

from mock import patch


@patch('foo.uploader.upload_client')
def test_upload_foo(upload_client):
    do_upload()

    upload_client.upload.assert_called_with(new_filename())

In this example, what is happening is we are testing some code that uploads a file somewhere like S3. We patch the actual upload layer to make sure we don’t have to upload anything. We then are asserting that we are uploading the file using the right filename, which is the result of the new_filename function.

The code might look something like this:

from mypkg.uploader import upload_client


def new_filename():
    return some_hash() + request.path


def do_upload():
    upload_client.upload(new_filename())

The nice thing about this code is that it is pretty reasonable to test. But, in this simplified state, it doesn’t reflect what happens when you have a more complex situation with multiple layers.
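
For reference, here is a self-contained variation of the same idea that runs as a single file using `unittest.mock` from the standard library. The `UploadClient` class and the fixed filename are stand-ins invented for the sketch.

```python
from unittest import mock


class UploadClient(object):
    def upload(self, filename):
        raise RuntimeError('would talk to the network')


upload_client = UploadClient()


def new_filename():
    return 'abc123/some/path'   # stand-in for some_hash() + request.path


def do_upload():
    upload_client.upload(new_filename())


def test_do_upload():
    # Replace only the upload layer for the duration of the test.
    with mock.patch.object(upload_client, 'upload') as fake_upload:
        do_upload()
    fake_upload.assert_called_once_with(new_filename())


test_do_upload()
```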

For example, here is an object that creates a gzipped CSV writer based on some parameters and the current time.

import csv
import gzip
import os
from datetime import datetime


class Foo(object):

    basedir = '/'

    def __init__(self, bar, baz, now=None):
        self.bar = bar
        self.baz = baz
        self._now = now
        self._file_handle = None

    @property
    def now(self):
        if not self._now:
            self._now = datetime.now().strftime('%Y-%m-%d')
        return self._now

    def fname(self):
        return '%s.gz' % os.path.join(self.basedir, self.now,
                                      self.bar, self.baz)

    @property
    def file_handle(self):
        if not self._file_handle:
            # open for writing in text mode so csv.writer can use it (Python 3)
            self._file_handle = gzip.open(self.fname(), 'wt')
        return self._file_handle

    def writer(self):
        return csv.writer(self.file_handle)

The essence of this functionality could all be condensed down to a single method:

def get_writer(self):
    now = self._now
    if not now:
        now = datetime.now().strftime('%Y-%m-%d')

    fname = '%s.gz' % os.path.join(self.basedir, now,
                                   self.bar, self.baz)

    # NOTE: We have to keep this handle around to close it and
    #       actually save the data.
    self.file_handle = gzip.open(fname, 'wt')
    return csv.writer(self.file_handle)

The single method is pretty easy to understand, but testing becomes more difficult.

Even though the code is relatively easy to read, I believe it is better to lean towards the more testable code and I’ll tell you why.

Tests Automate Understanding

The goal of readable code and tests is to help those that have to work on the code after you’ve moved on. This person could be you! The code you pushed might have seemed perfectly readable when you originally sent it upstream. Unfortunately, that readability can only be measured by the reader. The developer might be new to the project, new to the programming language or, conversely, be an author that predates you! In each of these cases, what seems easy to understand to you is rarely going to be nearly as clear to the next developer reading your code.

Tests on the other hand provide the next developer with confidence because they have an automated platform on which to build. Rather than simply reading the code in order to gain understanding, the next developer can play with it and confirm his or her understanding authoritatively. In this way, tests automate your understanding of the code.

Be Cautious of Layers!

Even though hiding complexity behind layers makes things easier to test and lets you automate understanding, layers still impose a real cognitive load. Nested objects that hide complexity can become difficult to keep track of, especially in a dynamic language such as Python. In static languages like Java, you have the ability to create tools to help navigate the layers of complexity. Oftentimes in dynamic languages, similar tools are not the norm.

Obviously, there are no hard and fast rules. The best course of action is to try to find a balance. We have rough rules of thumb that help us make sure our code is somewhat readable. It is a good idea to apply similar rules to your tests. If some code is reasonably easy to read but you find it difficult to confirm an isolated detail with a test, then it is probably worth factoring out that code and testing it directly. The same goes when you find yourself writing tons of tests to cover all the code paths.

About the Example

I came up with the example because it was some actual code I had to write. I found that I wanted to be able to test each bit separately. I had a base class that would create the file handles, but the file naming was different depending on the specific class that inherited from it. By breaking out the naming patterns, I was able to test the naming directly and quickly fix the naming bugs I ran into. What’s more, it gave me confidence when I needed to use those file names later and wanted to be sure they were correct. I didn’t have to rewrite any code that created the names because there was an obvious property that was tested.
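
The shape of that code was roughly as follows. This is a hedged sketch rather than the real code: BaseExport, UserExport, OrderExport, name_parts and the '/data' directory are all hypothetical names I made up to illustrate a base class whose subclasses only differ in how they name files.

```python
import os


class BaseExport(object):
    """Hypothetical base class that owns the shared path logic."""

    basedir = '/data'

    def __init__(self, day):
        self.day = day

    def name_parts(self):
        # Each subclass supplies its own naming pattern here.
        raise NotImplementedError

    def fname(self):
        return '%s.gz' % os.path.join(self.basedir, self.day,
                                      *self.name_parts())


class UserExport(BaseExport):
    def name_parts(self):
        return ['users', 'all']


class OrderExport(BaseExport):
    def __init__(self, day, region):
        super(OrderExport, self).__init__(day)
        self.region = region

    def name_parts(self):
        return ['orders', self.region]


def test_user_export_name():
    expected = os.path.join('/data', '2014-01-01', 'users', 'all') + '.gz'
    assert UserExport('2014-01-01').fname() == expected


def test_order_export_name():
    expected = os.path.join('/data', '2014-01-01', 'orders', 'eu') + '.gz'
    assert OrderExport('2014-01-01', 'eu').fname() == expected
```

Because the naming lives in its own method, each subclass gets a small test that never opens a file.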

It did make the code slightly uglier. But I was willing to accept that ugliness because I had tests that made sure when someone else needed to touch the code, they would have the same guarantees that I found helpful.

Tests Are NOT Documentation

Lastly, tests are not a replacement for readable code, docs or comments. Code is meant for computers to read and understand, not people. Therefore it is in our best interest to take our surrounding tools and use them to the best of our abilities in order to convey as clearly as possible what the computer will be doing with our text. Tests offer a way to automate understanding. Tests are not a replacement for understanding.

Finally, it should be clear that my preference for tests and more layers is because I value maintainable code. By maintainable, I mean code that lives for years (5-10) and is updated by teams of developers. In other words, my assumption is that maintenance of the code is, by far, the largest cost. Other projects don’t have the same requirements, in which case well-commented code with fewer isolated tests may work just fine.

Announcing CacheControl 0.8.2! Now with ETag and Vary Support

I’ve released CacheControl 0.8.2. Thanks to tow for submitting the ETag support!

Take a look at the docs and specifically the way etags are handled. I believe it is a subtle improvement over httplib2’s behavior.

Lastly, I’ve also confirmed the test suite is running under Python 3.2+. This is my first foray into the brave new world of 3.x, so please open tickets for any issues or suggestions.

Milk the Cat?

I finally read Dune. It was more of a fantasy story than pure sci-fi. The picture Dune paints reminded me of something you’d see in Heavy Metal or some other comic, so it was a pretty fun read.

Then I made the mistake of watching the movie.

First off, the positive. The movie really tries to fit in as much of the book as possible. It uses a narrator to fill in a lot of gaps and includes the internal dialog prevalent in the book. It ends up being pretty cheesy though and reminded me of The Wonder Years.

Now, obviously the book is better than the movie. What is funny is where the movie took liberties. Generally, the “powers” of the different characters feel more magical than in the book. This isn’t a huge deal, but it cheeses the movie out.

The worst and most ridiculous is the milking of the cat.

In the book there is a character that is captured by the antagonist camp. He is given a poison that requires a daily dosage in order to keep the mortal effect at bay. In the book they give it to him in his food.

What do you think they did in the movie?

That’s right. They brought in a totally stupid contraption built around an annoyed white cat that had a rat on its back and told this character that he had to milk the cat every day in order to keep the poison at bay. I have no idea...

The worst part of it all was that the movie made the book feel cheesy. It was so bad that the imagery and story the book painted started to feel like a cheesy 80s B sci-fi flick. It was kind of a bummer.

Bask in the awfulness.