Announcing Withenv

I wrote a tool to help sanely manage environment variables. Environment Variables (env vars) are a great way to pass data to programs because it works practically everywhere with no set up. It is a lowest common denominator that almost all systems support all the way from dev to production.

The problem with env vars is that they can be sticky. You are in a shell (zsh, bash, fish, etc...) and you set an environment variable. It exists and is available to every command from then on. If an env var contains an important secret such as a cloud account key, you could silently delete production nodes by mistake. Someone else could use your computer and do the same thing, with or without malicious intent.

Another difficulty with env vars is that they are a global key value store. Writing small shell scripts to export environment variables can be error prone. Copying and pasting or commenting out env vars in order to configure a script is easy to screw up. The fact these env vars are long lasting only makes it more difficult to automate reliably.

Withenv tries to improve this situation by providing some helpful features:

  • Setup the environment for each command without it leaking into your shell
  • Organization of your environment via YAML files
  • Cascading of your environment files in order to override specific values
  • Debugging the environment variables

Here is how it works.

Lets say we have a script that starts up some servers. It uses some environment variables to choose how many servers to spin, what cloud account to use and what role to configure them with (via Chef or Ansible or Salt, etc.). The script isn’t important, so we’ll just assume make create does all the work.

Lets organize our environment YAML files. We’ll create a envs folder that we can use to populate our environment. It will have some directories to help build up an environment.

├─ env
│  ├─ dev
│  └─ prod
└─ roles
   ├─ app-foo
   └─ app-bar

Now we’ll add some YAML files. For example, lets create a YAML file in the envs/env/dev that connects to a development account.

# envs/env/dev/rax_creds.yml
  - USERNAME: eric
  - API_KEY: 02aloksjdfp;aoidjf;aosdijf

You’ll notice that we used a nested data structure as well as lists. Using lists ensure we get an explicit ordering. We could have used a normal dictionary as well if the order doesn’t matter. The nesting ensures that each child entry will use the correct prefix. For example, the YAML above is equivalent to the following bash script.

export RACKSPACE_API_KEY=02aloksjdfp;aoidjf;aosdijf

Now, lets create another file for defining some object storage info.

# envs/env/dev/cloud_storage.yml
- STORAGE_BUCKET: devstore

You’ll notice that the STORAGE_PREFIX uses the value of the STORAGE_BUCKET. You can do normal dollar prefixed replacements like you would do normally in an shell. This includes any variables currently defined in your environment such as $HOME or $USER that are typically set. Also, by using a list (as defined by the -), we ensure that we apply the variables in order and the STORAGE_BUCKET exists for use within the STORAGE_PREFIX value.

With our environment YAML in place, we can now use the we command withenv provides in order to set up the environment before calling a command.

$ we -e envs/common.yml -d envs/env/dev -d envs/role/app-foo make create

The -e flag lets you point to a specific YAML file, while the -d flag points to a directory of YAML files. The ordering of the flags is important because the last entry will take precedence. In the command above, we might have configured common.yml with a personal dev account along with our defaults. The envs/env/dev/ folder contains a rax_creds.yml file that overrides the default cloud account with shared development account, leaving the other defaults alone.

The one limitation is that you cannot use the output from commands as a value to an env var. For example, the following wouldn’t work to set a directory path.

CONFIG_PATH: `pwd`/etc/foo/

This might be fixed in the future, but at the moment it is not supported.

If you don’t pass any argument to the we command it will output he environment as a bash script using export to set variables.

Withenv is available on pypi. Please let me know if you give it a try.

Log Buffering

Have you ever had code that needed to do some logging, but your logging configuration hadn’t been loaded? While it is a best practice to set up logging as early as possible, logging is still code that needs to be executed. The Python runtime will still do some setup (ie import everything) that MUST come before ANY code is executed, including your logging code.

One solution would be to jump through some hoops to make that code evaluated more lazily. For example, say you wanted to apply a decorator from some other package if it is installed. The first time the function is called, you could apply the decorator. This would get pretty complex pretty quickly.

class LazyDecorator(object):
    def __init__(self, entry_point):
        self.entry_point = entry_point
        self.func = None

    def find_decorator(self):
        # find our decorator...

    def __call__(self, f):
        self.original_func = f
        def lazy_wrapper(*args, **kw):
            if not self.func:
                self.func = self.find_decorator()
            return self.func(*args, **kw)
        return lazy_wrapper

I haven’t tried the code above, but it does rub me the wrong way. The reason being is that we’re jumping through hoops just to do some logging. Function calls are expensive in Python, which means if you decorated a ton of functions, the result could end up as a lot of overhead for a feature that only effects start up.

Instead, we can just buffer the log output until after we’ve loaded our logging config.

import logging

class LazyLogger(object):

    LVLS = dict(

    def __init__(self):
        self.messages = []

    def replay(self, logger=None):
        logger = logging.getLogger(__name__)
        for level, msg, args, kw in self.messages:
            logger.log(level, msg, *args, **kw)

    __call__ = replay

    def capture(self, lvl, msg, *args, **kw):
        self.messages.append((lvl, msg, args, kw))

    def __getattr__(self, name):
        if name in self.LVLS:
            return functools.partial(self.capture, self.LVLS[name])

We can use this as our logging object in our code that needs to log before logging has been configured. Then, when we can replay our log when it is appropriate by importing the logger, and calling the replay method. We could even keep a registry of lazy loggers and call them all after configuring logging.

The benefit of this tactic is that you avoid adding runtime complexity, while supporting the same logging patterns at startup / import time.

DevOps System Calls

One thing I’ve found when looking at DevOps is the adherance to specific tools. For example, if an organization uses chef, then it is expected that chef be responsible for all tasks. It is understandable to reuse knowledge gained in a system, but at the same time, all systems have pros and cons.

More importantly, each tool adheres to its own philosophies for how a system should be defined. Some are declarative while others are iterative and almost all systems define their own (clever at times) verbage for what the different elements of a system should be.

What the DevOps ecosystem really needs is a low level suite of common primitives we can build off of. A set of DevOps System Calls, if you will, we can use to build higher order systems. The reason is to gain the ability to have some gaurantees we can start to assume will work.

For example, in Python, when I write tests, I assume the standard library functions such as open or the socket module work as expected. You don’t see tests such as:

def test_open():
    with open('test_file.txt') as fh:

    assert open('test_file.txt').read() = 'foo'

We have similar expectations regarding much of the TCP/IP stack. We assume the bits are read correctly on the network hardware and passed to the OS, eventually landing in our program correctly. We take it for granted that the HTTP request becomes something like request.headers[‘Content-Type’] in our language of choice.

These assumptions let us consider our program in higher level terms that are portable across languages and systems. Every programmer understands what it means to open file, connect to a database or make a HTTP request within our programs because our level of abstraction is reasonably high.

DevOps could use a similar standard and the implementation doesn’t matter. A machine might be created with Ansible, but configured via Chef. That part doesn’t matter. What matters is we can write simple code that manages our operations.

For example, lets say I want to spin up a machine to run an app and a DB. Here is some psuedo code that might get the job done.

machine = cloud.create(flavor=provider.FLAVOR_COMPUTE)
app = packages.find('my-app')

This would compile to a suite of commands that trigger some DevOps tools do the work necessary to build the machines. The configuration of what provider, available flavors, and repository locations would all live in OS level config like you see for your OS networking, auth and everything else in /etc.

The key is that we can assume the calls will work or throw an error. The process is ecapsulated in such a way that we don’t need to think about the provider, setting API keys in an environment, bootstrapping the node for our configuration managment and every other tiny detail that needs to be performed and validated in order to consider the “recipe” or “playbook” as done.

Obviously, this is not trivial. But, if we consider where our tools excel and begin the process of encapsulating the tools behind some higher order concepts, we can begin to create a glossary and shared expectations. The result is a true Cloud OS.

Playing with Repose

At work we use a proxy called repose in front of most services in order to make common tasks such as auth, rate limiting, etc. consistent. In python, this type of function might also be accomplished via WSGI middleware, but by using a separate proxy, you get two benefits.

  1. The service can be written in any language that understands HTTP.
  2. The service gets to avoid many orthogonal concerns.

While the reasoning for repose makes a lot of sense, for someone not familiar with Java, it can be a little daunting to play with. Fortunately, the repose folks have provided some packages to make playing with repose pretty easy.

We’ll start with a docker container to run repose. The repose docs has an example we can use as a template. But first lets make a directory to play in.

$ mkdir repose-playground
$ cd repose-playground

Now lets create our Dockerfile:

FROM ubuntu

RUN apt-get install -y wget

RUN wget -O - | apt-key add - && echo "deb stable main" > /etc/apt/sources.list.d/openrepose.list

RUN apt-get update && apt-get install -y \
  repose-valve \
  repose-filter-bundle \

CMD ["java", "-jar", "/usr/share/repose/repose-valve.jar"]

The next step will be to start up our container and grab the default config files. This makes it much easier to experiment since we have decent defaults.

$ docker build -t repose-playground .
$ mkdir etc
$ docker run -it -v `pwd`/etc:/code repose-playground cp -r /etc/repose /code

Now we have our config in ./etc/repose, we can try something out. Lets change our default endpoint to point to a different website.

<?xml version="1.0" encoding="UTF-8"?>

<!-- To configure Repose see: -->
<system-model xmlns="">
    <repose-cluster id="repose">
            <node id="repose_node1" hostname="localhost" http-port="8080"/>
            <endpoint id="open_repose" protocol="http"
                      <!-- redirect to! -->
                      root-path="/" port="80"

Now we’ll run repose from our container, using our local config instead of the config in the container.

$ docker run -it -v `pwd`/etc/repose:/etc/repose -p 8080:8080

If you’re using boot2docker, you can use boot2docker ip to find the IP of your VM.

$ export REPOSE_HOST=`boot2docker ip`
$ curl "http://$REPOSE_HOST:8080"

You should see the homepage HTML from!

Once you have repose running, you can leave it up and change the config as needed. Repose will periodically pick up any changes without restarting.

I’ve gone ahead and automated the steps in this repose-playground repo. While it can be tricky to get started with repose, especially if you’re not familiar with Java, it is worth taking a look at repose for implementing orthogonal requirements that make the essential application code more complex. This especially true if you’re using a micro services model where the less code the better. Just run repose on the same node, proxying requests to your service, which only listens on localhost and you’re good to go.

Docker vs. Provisioning

Lately, I’ve been playing around with Docker as I’ve moved back to OS X for development. At the same time, I’ve been getting acquainted with Chef in a reasonably complex production environment. As both systems have a decent level of overlap, it might be helpful to compare and contrast the different methodologies of these two deployment tactics.

What does Docker actually do?

Docker wraps up the container functionality built into the Linux kernel. Basically, it lets a process use the machine’s hardware in a very specific manner, using a predefined filesystem. When you use docker, it feels like starting up a tiny VM to run a process. But, what really happens, the container’s filesystem is used along with the hardware provided by the kernel in order to run the process in an isolated environment.

When you use Docker, you typically start from an “image”. The image is just an initial filesystem you’ll be starting from. From there, you might install some packages and add some files in order to run some process. When you are ready to run the process, you use docker run and it will get the filesystem ready and run the process using the computer’s hardware.

Where this differs from VM is that you only start one process. While you might create a container that has installed Postgres, RabbitMQ and your own app, when you run docker run myimage myapp, no other processes are running. The container only provides the filesystem. It is up to the caller how the underlying hardware is accessed and utilized. This includes everything from the disk to the network.

What does a Provisioner do?

A provisioner, like Chef, configures a machine in a certain state. Like Docker, this means getting the file system sorted out, including installing packages, adding configuration, adding users, etc. A provisioner also can start processes on the machine as part of the provisioning process.

A provisioner usually starts from a known image. In this case, I’m using “image” in the more common VM context, where it is a snapshot of the OS. With that in mind, a provisioner doesn’t require a specific image, but rather, the set of required resources necessary to consider the provisioned machine as complete. For example, there is no reason you couldn’t use a provisioner to create user directories across variety of unices, including OS X and the BSDs.

Different Deployment Strategies

The key difference when using Docker or a provisioner is the strategy used for deployment. How do you take your infrastructure and configure it to run your applications consistently?

Docker takes the approach of deploying containers. The concept of a container is that it is self contained. The OS doesn’t matter, assuming it supports docker. Your deployment then involves getting the container image and running the processes supported by the container.

From a development perspective, the deliverable artifact of the build process would be a container (or set of containers) to run your different processes. From there, you would configure your infrastructure accordingly, configuring the resources the processes can use at run time.

A provisioner takes a more generalized route. The provisioner configures the machine, therefore, it can use any number of deliverables to get your processes running. You can create system packages, programming language environments or even containers to get your system up and running.

The key difference from the devops perspective (the intersection of development and sysops), is development within constraints of the system must be coordinated with the provisioner. In other words, developers can’t simply choose some platform or application. All dependencies must be integrated into the provisioning system. A docker container, on the other hand, can be based on any image and use any resource available within the image’s filesystem.

What do you want to do?

The question of whether to use Docker or a provisioning system is not an either or proposition. If you choose to use Docker containers as your deployment artifact, the actual machines may still need to be configured. There are options that avoid the need to use a provisioning system, but generally, you may still use something like Chef to maintain and provision the servers that will be running your containers.

One vector to make a decision on what strategy to use is the level of consistency across your infrastructure. If you are fine with developers creating containers that may use different operating systems and tooling, docker is an excellent choice. If you have hard requirements as to how your OS should be configured, using a provisioning system might be better suited for you.

Another thing to consider is development resources. It can be a blessing and a curse to provision a development system, no matter what system you use. Your team might be more than happy to take on managing containers efficiently, while other teams would be better off leaving most system decisions to the provisioning system. The ecosystem surrounding each platform is another consideration.


I don’t imagine that docker (and containers generally) will completely supplant provisioning services. But, I do believe the model does aid in producing more consistent deployment artifacts over time. Testing a container locally is a reasonably reliable means of ensuring it should run in production. That said, containers require that many resources must be configured (network, disk, etc.) in order to work correctly. This is a non-trivial step and making it work in development, especially when you consider devs using tools like boot2docker, can be a difficult process. It can much easier to simply spin up a Vagrant VM with the necessary processes and be done with it. Fortunately, there tools like docker compose and docker machine that seem to be addressing this shortcoming.

Some Thoughts on Agile

I’ve recently started working under in an “agile” environment. There are stories, story points, a board, etc. I couldn’t tell you whether it was pure scrum or some other flavor of “agile”, but I can say it is definitely meant to be an “agile” system of software development. It is not perfect, but that is sort of the point, to roll with the punches of the real world and do your best with a reasonable about of data.

Some folks argue agile is nonsense, but after using agile techniques, the detractors typically don’t consider using agile techniques as a tool and consider it a concrete set of rules. No project management technique perfectly compartmentalizes all problems into easily solvable units. The best we can do is utilize techniques in order to improve our chances of success writing software.

There are two benefits that I’ve noticed of using an agile technique.

  1. You must communicate and record what is happening
  2. You may change things according to your needs

The requirement to communicate and record what’s happening is important because it forces developers to make information public. The process of writing a good story and updating the story with comments (I’m assuming some bug tracking software is being used) helps guard against problems going unnoticed. It also provides history that can be learned from. It holds people accountable.

Allowing change is also critical. Something like Scrum is extremely specific and detailed, yet as an organization, you have the option and priviledge to adapt and change vanilla Scrum to your own requirements. For example, some organizations should use story points for estimating work and establishing velocity, while others would be better suited for using more specific time estimates. Both estimation methods have their place and you have can choose the method that best meets your needs.

When adopting an agile practice it is a good idea to try out the vanilla version. But, just like your software, you should iterate and try things to optimize the process for your needs. It is OK to create stories that establish specifications. It is OK to use 1-10 for estimating work. It is OK to write “stories” more like traditional bug reports.

It is not OK to skip the communication and recording of what is going on and it is not OK to ignore the needs of your organization in order to adhere to the tenents of your chosen agile methodology.

Announcing rdo

At work I’ve been using Vagrant for development. The thing that bothers me most about using Vagrant or any remote development machine is the disconnect it presents with local tools. You are forced to either login into the machine and run command or jump through hoops to run the commands from the local machine, most often times losing the file system context that make the local tools possible.

Local Tools

What I mean by local tools are things like IDEs or build code that performs tasks on your repository. IDEs assume you are developing locally and expect a local executable for certain tasks in order to work. Build code can be platform specific, which is likely why you are using Vagrant in the first place.

My answer to this is rdo.

Why rdo

I have a similar project called xe that you can configure to sort out your path when in a specific project. For example, if I have a virtualenv venv, in my cloned repo, I can use xe python to run the correct python without having to activate the virtualenv or manually include the path to the python executable.

rdo works in a similar way, the difference being that instead of adjust the path, it configures the command to run on a remote machine, such as a Vagrant VM.

Using rdo

For example, lets assume you have a Makefile in your project repo. You’ve written a bootstrap task that will download any system dependencies for your app.

     sudo apt-get install -y python-pip python-lxml

Obviously if you are on OS X or RHEL, you don’t use apt for package management, and therefore use a Vagrant VM. Rather than having to ssh into the VM, you can use rdo.

The first step is to create a config file in your repo.

driver = vagrant
directory = /vagrant

That assumes you’re Vagrantfile is mounting your repo at /vagrant. You can change it as needed.

From there you can use rdo to run commands.

$ rdo make bootstrap

That will compose a command to connect to the vagrant VM, cd to the correct directory and run your command.


I hope you give it a try and report back any issues. At the moment it extremely basic in that it doesn’t do anything terribly smart as far as escaping goes. I hope to remedy that as well as support generic ssh connections as well.

Virtual Machine Development

I’ve recently started developing on OS X again on software that will be run on Linux. The solution I’ve used has been to use a Vagrant VM, but I’m not entirely happy with it. Here are a few other things I’ve tried.

Docker / Fig

On OS X, boot2docker makes it possible to use docker for running processes in containers. Fig lets you orchestrate and connect containers.


Fig is deprecated and will be replaced with Docker Compose, but I found that Docker Compose didn’t work for me on OS X.

The idea is that you’d run MySQL, RabbitMQ, etc. in containers and expose those processes’ ports and hosts to your app container. Here is an example:

  image: mysql:5.5

  build: path/to/myapp/  # A Dockerfile must be here
    - mysql

The app container then can access mysql as a host in order to get to the container running MySQL.

While I think this pattern could work, I found that it needs a bit too much hand holding. For example, you explicitly need to make sure volumes are set up for each service that needs persistence. Doing the typical database sync ended up being problematic because it wasn’t trivial to connect. I’m sure I was doing it wrong along the way, but it seems that you have to constantly tweak any tutorial because you have to use boot2docker.

Docker Machine

Another tactic I used was docker-machine. This is basically how boot2docker works. It can start a machine, configured by docker, and provide you commands so you can run things on that machine via the normal docker command line. This seemed promising, but in the end, it was pretty much the same as using Vagrant, only a lot less convenient.

I also considered using it with my Rackspace account, but, for whatever reason, the client couldn’t destroy machines, which made it much less enticing.


One thing that was frustrating with Vagrant is that if you use a virtualenv that is on part of the file system that is mounted from the host (ie OS X), doing any sort of package loading is really slow. I have no clue why this is. I ended up just installing things as root, but I think a better tactic might be to use virtualenvwrapper, which should install it to the home directory, while code still lives in /vagrant/*.

One thing that I did do initially was to write a Makefile for working with Vagrant. Here is a snippet:

VCMD=vagrant ssh -c

     $(VCMD) 'virtualenv $(VENV)'
     $(VCMD) 'cd $(SRC) && $(VENV)/bin/pip install -r requirements.txt -r test-requirements.txt'
     $(VCMD) 'cd $(SRC) && $(VENV)/bin/python develop'

     $(VCMD) 'cd $(SRC) && $(VENV)/bin/tox'

It is kind of ugly, but it more or less works. I also tried some other options such as using my xe package to use paramiko or fabric, but both tactics made it too hard to simply do things like:

$ xe tox -e py27

And make xe figure out what needs to happen to run the commands correctly on the remote host. What is frustrating is that docker managed to essentially do this aspect rather well.


OS X is not Linux. There are more than enough differences that make developing locally really difficult. Also, most code is not meant to be portable. I’m on the fence as to whether this is a real problem or just a fact of life with more work being done on servers in the cloud. Finally, virtualization and containers still need tons of work. It feels a little like the wild west in that there are really no rules and very few best practices. The potential is obvious, but the path is far from paved. Of the two, virtualization definitely feels like a better tactic for development. With that in mind, it would be even better if you could simply do with Vagrant what you can do with docker. Time will tell!

Even though I didn’t manage to make major strides into a better dev story for OS X, I did learn quite a bit about the different options out there. Did it make me miss using Linux? Yes. But I haven’t given up yet!

tsf and randstr

I wrote a couple really small tools the other day that I packaged up. I hope someone else finds them useful!


tsf directs stdin to a timestamped file or directory. For example:

$ curl | tsf homepage.html

The output from the cURL request goes into a file 20150326123412-homepage.html. You can also specify that a directory should be used.

$ curl | tsf -d homepage.html

That will create a homepage.html directory with the timestamped files in the directory.

Why is this helpful?

I often debug things by writing small scripts that automate repetitive actions. It is common that I’ll keep around output for this sort of thing so I can examine it. Using tsf, it is a little less likely that I’ll overwrite a version that I needed to keep.

Another place it can be helpful is if you periodically run a script and you need to keep the result in a time stamped file. It does that too.


randstr is a function creates a random string.

from randstr import randstr


randstr provides some globals containing different sets of characters you can pass in to the call in order to get different varieties of random strings. You can also set the length.

Why is this helpful?

I’ve written this function a ton of times so I figured it was worth making a package out of it. It is helpful for testing b/c you can easily create random identifiers. For example:

from randstr import randstr

batch_records = [{'name': randstr()} for i in range(1000)]

I’m sure there are other tools out there that do similar or exactly the same thing, but these are mine and I like them. I hope you do too.

Automate Everything

I’ve found that it has become increasingly difficult to simply hack something together without formally automating the process. Something inside me just can’t handle the idea of repeating steps that could be automated. My solution has been to become faster at the process of formal automation. Most of the steps are small, easy to do and don’t take much time. Rather than feeling guilty that I’m wasting time by writing small a library or script, I work to make the process faster and am able to re-use these scripts and snippets in the future.

A nice side effect is that writing code has become much more fluid. I get more practice using essential libraries and tools where over time they’ve become second nature. It also can be helpful getting in the flow because taking the extra steps of writing to files and setting up a small package feels like a warm up of sorts.

One thing that has been difficult is navigating the wealth of options. For example, I’ve gone back to OS X for development. I’ve had to use VMs for running processes and tests. I’ve been playing with Vagrant and Docker. These can be configured with chef, ansible, or puppet in addition to writing a Vagrantfile or Dockerfile. Does chef set up a docker container? Do you call chef-apply in your Dockerfile? On OS X you have to use boot2docker, which seems to be a wrapper around docker machine. Even though I know the process can be configured to be completely automated, it is tough to feel as though you’re doing it right.

Obviously, there is a balance. It can be easy to become caught in a quagmire of automation, especially when you’re trying to automate something that was never intended to be driven programaticaly. At some point, even though it hurts, I have to just bear down and type the commands or click the mouse over and over again.

That is until I break down and start writing soem elisp to do it for me ;)