Virtual Machine Development

I’ve recently started developing on OS X again for software that will run on Linux. The solution I’ve used has been a Vagrant VM, but I’m not entirely happy with it. Here are a few other things I’ve tried.

Docker / Fig

On OS X, boot2docker makes it possible to use docker for running processes in containers. Fig lets you orchestrate and connect containers.

Note

Fig is deprecated and is being replaced by Docker Compose, but I found that Docker Compose didn’t work for me on OS X.

The idea is that you’d run MySQL, RabbitMQ, etc. in containers and expose those processes’ ports and hosts to your app container. Here is an example:

mysql:
  image: mysql:5.5

app:
  build: path/to/myapp/  # A Dockerfile must be here
  links:
    - mysql

The app container can then use mysql as a hostname in order to reach the container running MySQL.
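To make that concrete, the app code could connect using the link alias as the hostname. This is just a sketch; pymysql and the credentials here are placeholders I made up, not anything defined in the fig.yml above.

import pymysql  # placeholder driver; any MySQL client works the same way

# 'mysql' resolves to the linked container because fig adds the alias
# to the app container's /etc/hosts.
conn = pymysql.connect(
    host='mysql',
    user='root',
    password='my-secret-pw',  # hypothetical credentials
    database='myapp',
)
with conn.cursor() as cursor:
    cursor.execute('SELECT VERSION()')
    print(cursor.fetchone())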

While I think this pattern could work, I found that it needs a bit too much hand-holding. For example, you explicitly need to make sure volumes are set up for each service that needs persistence. Doing the typical database sync ended up being problematic because it wasn’t trivial to connect. I’m sure I was doing something wrong along the way, but it seems you constantly have to tweak any tutorial because you have to use boot2docker.

Docker Machine

Another tactic I used was docker-machine. This is basically how boot2docker works. It can start a machine, configure it for docker, and provide you with commands so you can run things on that machine via the normal docker command line. This seemed promising, but in the end it was pretty much the same as using Vagrant, only a lot less convenient.

I also considered using it with my Rackspace account, but, for whatever reason, the client couldn’t destroy machines, which made it much less enticing.

Vagrant

One thing that was frustrating with Vagrant is that if you use a virtualenv that lives on a part of the file system mounted from the host (i.e. OS X), any sort of package loading is really slow. I have no clue why this is. I ended up just installing things as root, but I think a better tactic might be to use virtualenvwrapper, which installs environments in the home directory, while the code still lives in /vagrant/*.

One thing that I did do initially was to write a Makefile for working with Vagrant. Here is a snippet:

SRC=/vagrant/designate
VENV=/usr/local
SPHINX=$(VENV)/bin/sphinx-build
VCMD=vagrant ssh -c

bootstrap:
     $(VCMD) 'virtualenv $(VENV)'
     $(VCMD) 'cd $(SRC) && $(VENV)/bin/pip install -r requirements.txt -r test-requirements.txt'
     $(VCMD) 'cd $(SRC) && $(VENV)/bin/python setup.py develop'

tests:
     $(VCMD) 'cd $(SRC) && $(VENV)/bin/tox'

It is kind of ugly, but it more or less works. I also tried some other options such as using my xe package to use paramiko or fabric, but both tactics made it too hard to simply do things like:

$ xe tox -e py27

and have xe figure out what needs to happen to run the commands correctly on the remote host. What is frustrating is that docker managed to handle this aspect rather well.

Conclusions

OS X is not Linux. There are more than enough differences that make developing locally really difficult. Also, most code is not meant to be portable. I’m on the fence as to whether this is a real problem or just a fact of life with more work being done on servers in the cloud. Finally, virtualization and containers still need tons of work. It feels a little like the wild west in that there are really no rules and very few best practices. The potential is obvious, but the path is far from paved. Of the two, virtualization definitely feels like a better tactic for development. With that in mind, it would be even better if you could simply do with Vagrant what you can do with docker. Time will tell!

Even though I didn’t manage to make major strides into a better dev story for OS X, I did learn quite a bit about the different options out there. Did it make me miss using Linux? Yes. But I haven’t given up yet!

tsf and randstr

I wrote a couple really small tools the other day that I packaged up. I hope someone else finds them useful!

tsf

tsf directs stdin to a timestamped file or directory. For example:

$ curl http://ionrock.org | tsf homepage.html

The output from the cURL request goes into a file 20150326123412-homepage.html. You can also specify that a directory should be used.

$ curl http://ionrock.org | tsf -d homepage.html

That will create a homepage.html directory with the timestamped files inside it.

Why is this helpful?

I often debug things by writing small scripts that automate repetitive actions. It is common that I’ll keep around output for this sort of thing so I can examine it. Using tsf, it is a little less likely that I’ll overwrite a version that I needed to keep.

Another place it can be helpful is if you periodically run a script and need to keep the result in a timestamped file. It does that too.
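If you are curious how the naming works, here is a rough sketch of the idea in Python. It is not the actual implementation, just the gist of prefixing the output with a timestamp.

import os
import sys
from datetime import datetime


def tsf(name, use_dir=False):
    # e.g. 20150326123412
    stamp = datetime.now().strftime('%Y%m%d%H%M%S')
    if use_dir:
        # homepage.html/20150326123412
        os.makedirs(name, exist_ok=True)
        path = os.path.join(name, stamp)
    else:
        # 20150326123412-homepage.html
        path = '%s-%s' % (stamp, name)
    with open(path, 'wb') as fh:
        fh.write(sys.stdin.buffer.read())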

randstr

randstr is a function that creates a random string.

from randstr import randstr

print(randstr())

randstr provides some globals containing different sets of characters you can pass in to the call in order to get different varieties of random strings. You can also set the length.
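A call might look something like this; the length keyword here is a guess on my part, so check the package for the real API.

from randstr import randstr

# a longer random string (argument name assumed)
token = randstr(length=32)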

Why is this helpful?

I’ve written this function a ton of times, so I figured it was worth making a package out of it. It is helpful for testing because you can easily create random identifiers. For example:

from randstr import randstr

batch_records = [{'name': randstr()} for i in range(1000)]

I’m sure there are other tools out there that do similar or exactly the same thing, but these are mine and I like them. I hope you do too.

Automate Everything

I’ve found that it has become increasingly difficult to simply hack something together without formally automating the process. Something inside me just can’t handle the idea of repeating steps that could be automated. My solution has been to become faster at the process of formal automation. Most of the steps are small, easy to do and don’t take much time. Rather than feeling guilty that I’m wasting time by writing a small library or script, I work to make the process faster and am able to re-use these scripts and snippets in the future.

A nice side effect is that writing code has become much more fluid. I get more practice using essential libraries and tools, and over time they’ve become second nature. It also helps with getting into the flow, because taking the extra steps of writing to files and setting up a small package feels like a warm-up of sorts.

One thing that has been difficult is navigating the wealth of options. For example, I’ve gone back to OS X for development. I’ve had to use VMs for running processes and tests. I’ve been playing with Vagrant and Docker. These can be configured with chef, ansible, or puppet in addition to writing a Vagrantfile or Dockerfile. Does chef set up a docker container? Do you call chef-apply in your Dockerfile? On OS X you have to use boot2docker, which seems to be a wrapper around docker machine. Even though I know the process can be configured to be completely automated, it is tough to feel as though you’re doing it right.

Obviously, there is a balance. It can be easy to become caught in a quagmire of automation, especially when you’re trying to automate something that was never intended to be driven programmatically. At some point, even though it hurts, I have to just bear down and type the commands or click the mouse over and over again.

That is, until I break down and start writing some elisp to do it for me ;)

Server Buffer Names in Circe

Circe is an IRC client for Emacs. If you are dying to try out Emacs for your IRC-ing needs, it already comes with two other clients, ERC and rcirc. Both work just fine. Personally, I’ve found circe to be a great mix of helpful features alongside simple configuration.

One thing that was always bugging me was the server buffer names. I use an IRC bouncer that keeps me connected to the different IRC networks I use. At work, I connect to each network using a different username over a port forwarded by ssh. The result is that I get 3 buffers with extremely descriptive names such as localhost:6668<2>. I’d love to have names like *irc-freenode* instead, so here is what I came up with.

First off, I wrote a small function to connect to each network that looks like this:

(defun my-start-ircs ()
  (interactive)
  (start-freenode-irc)
  (start-oftc-irc)
  (start-work-irc)
  ;; Tell circe not to show mode changes as they are pretty noisy
  (circe-set-display-handler "MODE" (lambda (&rest ignored) nil)))

Then, for each IRC network, I make the normal circe call. The circe call returns the server buffer, so in order to rename the buffer, I can do the following:

(defun start-freenode-irc ()
  (interactive)
  (with-current-buffer (circe "localhost"
                              :port 6689
                              :nick "elarson"
                              :user "eric_on_freenode"
                              :password (my-irc-pw))
    (rename-buffer "*irc-freenode*")))

Bingo! I get a nice server buffer name. I suspect this could work with ERC and rcirc, but I haven’t tried it. Hope it helps someone else out!

Hello Rackspace!

After quite a while at YouGov, I’ve started a new position at Rackspace working on the Cloud DNS team! Specifically, I’m working on Designate, a DNS as a Service project that is in incubation for OpenStack. I’ve had an interest in OpenStack for a while now, so I feel extremely lucky I have the opportunity to join the community with one of the founding organizations.

One thing that has been interesting is the idea of the Managed Cloud. AWS focuses on Infrastructure as a Service (IaaS). Rackspace also offers IaaS, but takes it a step further by providing support. For example, if you need a DB as a Service, AWS provides services like Redshift or SimpleDB, and it is up to the users to figure out how to optimize queries and tune the database for their specific needs. In the Managed Cloud, you can ask for help and know that an experienced expert will understand what you need to do and help make it happen, even at the application level.

While this support feels expensive, it can be much cheaper than you think when you consider the amount of time developers spend becoming pseudo experts at a huge breadth of technology that doesn’t offer any actual value to a business. Just think of the time spent on sharding a database, maintaining a CI service, learning the latest / greatest container technology, building your own VM images, maintaining your own configuration management systems, etc. Then imagine having a company that will help you set it up and maintain it over time. That is what the managed cloud is all about.

It doesn’t stop at software either. You can mix and match physical hardware with traditional cloud infrastructure as needed. If your database server needs real hardware that is integrated with your cloud storage and cloud servers, Rackspace can do it. If you have strict security compliance requirements that prevent you from using traditional IaaS providers, Rackspace can help there too. If you need to use VMWare or Windows as well as Open Stack cloud technologies, Rackspace has you covered.

I just got back from orientation, so I’m still full of the Kool-Aid.

That said, Fanatical Support truly is ingrained in the culture here. It started when the founders were challenged. They hired someone to get a handle on support and he proposed Fanatical Support. His argument was simple: if we offer a product and don’t support it, we are lying to our customers. The service they are buying is not what they are getting, so don’t be a liar and give users Fanatical Support.

I’m extremely excited to work on great technology at an extremely large scale, but more importantly, I’m ecstatic to be working at a company that ingrains integrity and treats its customers and employees with the utmost respect.

Setting up magit-gerrit in Emacs

I recently started working on OpenStack and, being an avid Emacs user, I hoped to find a more integrated workflow with my editor of choice. Of the options out there, I settled on magit-gerrit.

OpenStack uses git for source control and gerrit for code review. Code gets merged into OpenStack through code review in gerrit. In a nutshell, you create a branch, write some code, submit a code review, and after that code is reviewed and approved, it is merged upstream. The key is ensuring the code review process is thorough and convenient.

As developers with specific environments, we need to be able to quickly download a patch and play around with the code. For example, running the tests locally or playing around with a new endpoint is important when approving a review. Fortunately, magit-gerrit makes this process really easy.

First off, you need to install the git-review tool. This is available via pip.

$ pip install git-review

Next up, you can check out a repo. We’ll use the Designate repo because that is what I’m working on!

$ git clone https://github.com/openstack/designate.git
$ cd designate

With a repo in place, we can start setting up magit-gerrit. Assuming you’ve set up Melpa, you can install it via M-x package-install RET magit-gerrit. Then add this to your emacs init file:

(require 'magit-gerrit)

The magit-gerrit docs suggest setting two variables.

;; if remote url is not using the default gerrit port and
;; ssh scheme, need to manually set this variable
(setq-default magit-gerrit-ssh-creds "myid@gerrithost.org")

;; if necessary, use an alternative remote instead of 'origin'
(setq-default magit-gerrit-remote "gerrit")

The magit-gerrit package can infer the magit-gerrit-ssh-creds from the magit-gerrit-remote. This makes it easy to configure your repo via a .dir-locals.el file.

((magit-mode
  (magit-gerrit-remote . "ssh://eric@review.openstack.org:29418/openstack/designate")))

Once you have your repo configured, you can open it in magit via M-x magit-status. You should also see a message saying “Detected magit-gerrit-ssh-creds” that shows the credentials used to log into the gerrit server. These are simple ssh credentials, so if you can’t ssh into the gerrit server using them, you need to adjust your settings accordingly.

If everything is configured correctly, there should be an entry in the status page that lists any reviews for the project. The listing shows the summary of the review. You can navigate to the review and press T to get a list of options. From there, you can download the patchset as a branch or simply view the diff. You can also browse to the review in gerrit. From what I can tell, you can’t comment on a review, but you can give a +/- for a review.

I’ve just started using gerrit and magit-gerrit, so I’m sure there are some features that I don’t fully understand. For example, I’ve yet to understand how to re-run git review in order to update a patch after getting feedback. Assuming that isn’t supported, I’m sure it shouldn’t be too hard to add.

Feel free to ping me if you try it and have questions or tips!

Todo Lists

With a baby on the way, I’ve started reconsidering how I keep track of my TODO list.

Seeing as I spend an inordinate amount of time in Emacs, org-mode is my go-to tool when it comes to keeping track of my life. I can keep notes, track time and customize my environment to support a workflow that works for me. The problem is life happens outside of Emacs. Shocking, I know.

So, my goal is to have a system that integrates well with org-mode and Emacs, while still allowing me to use my phone, calendar, etc. when I’m away from my computer. Also, seeing as Emacs doesn’t provide an obvious, effective calendaring solution like GMail does, I want to be sure I’m able to schedule things so I do them at specific times and have reminders.

With that in mind, I started looking at org-mobile. It seems like the perfect solution. It is basically a way to edit an org file on my phone and will (supposedly) sync deadlines and scheduled items to my calendar as one would expect. Sure, the UI could use some work, and having to type the date rather than pick it from a slick dialog on my phone is a bit clunky, but it seemed like a reasonable trade-off...

Unfortunately, it didn’t work. I had one event sync’d to my google calendar, but that was the end of that. It didn’t seem to add anything to my calendar no matter the settings. That is a deal breaker.

I’m currently starting to play with org-trello instead. I’m confident I can make this work for two reasons.

  1. The mobile app is nice
  2. The sync’ing in Emacs is nice

What doesn’t work (yet?) is adding deadlines or scheduling to my calendar, but seeing as this new year I’m resolving to slow down, I’m going to stop trying to over optimize and just add stuff to my calendar. It is a true revelation, I know.

One thing I did consider was just skipping the computer all together and using a physical planner. The problem with a planner for me is,

  1. My handwriting is atrocious
  2. It doesn’t sync to my calendar

In addition to trying to understand my handwriting, I’d have to develop a habit to always look at it. I can’t see it happening when I can get my phone to annoy me as needed.

This effort is part of a larger plan to use some of the tactics in GTD to get tasks off of my mental plate and put them somewhere useful. So far, it has been sticking more than it ever has, so I’m hopeful this could be the time it becomes a real habit. Wish me luck, as I’m sure I’ll need it!

CacheControl 0.11.0 Released

Last night I released CacheControl 0.11.0. The big change this time around is using compressed JSON to store the cache response rather than a pickled dictionary. Using pickle caused some problems if a cache was going to be read by both Python 3.x and Python 2.x clients.
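The general idea looks something like this. This is not CacheControl’s actual serializer, just a sketch of dumping the response data to JSON and compressing it before it goes into the cache store.

import json
import zlib


def dumps(data):
    # data is a plain dict of response attributes; a binary body would
    # need to be base64 encoded before it can go into JSON
    return zlib.compress(json.dumps(data).encode('utf-8'))


def loads(blob):
    return json.loads(zlib.decompress(blob).decode('utf-8'))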

Another benefit is that avoiding pickle also avoids executing potentially dangerous code. It is not unreasonable that someone could include a header that could cause problems. This hasn’t happened yet, but it wouldn’t surprise me if it were feasible.

Finally, the size of the cached object should be a little smaller. Generally, responses are not going to be that large, but it should help if you use storage that keeps hot keys in memory. MongoDB comes to mind, along with Memcached and, probably, Redis. It could also be valuable if you are avoiding caching large objects. For example, a large sparse CSV response might compress well enough to make caching it reasonable.

I haven’t done any conclusive tests regarding the actual size impact of compression, so these are just my theories. Take them with a grain of salt or let me know your experiences.

Huge thanks to Donald Stufft for sending in the compressed JSON patches, as well as all the folks who have submitted other suggestions and pull requests.

The Future

I’ve avoided making any major changes to CacheControl as it has been reasonably stable and flexible. There are some features that others have requested that have been too low on my own personal priorities to take time to implement.

One thing I’ve been tempted to do is add more storage implementations. For example, I started working on a SQLite store. My argument, to myself at least, was that the standard library supports SQLite, which makes it a reasonable target.

I decided to stop that work as it didn’t really seem very helpful. What did seem enticing was the idea of a cache store becoming queryable. Unfortunately, since the cache store API only gets a key and a blob for the value, it would require any cache store author to unpack the blob in order to read any values it is interested in.

In the future I’ll be adding a hook system to let a cache store author have access to the requests.Response object in order to create extra arguments for setting the value.

For example, in Redis, you can set an expires time that the DB will use to expire the response automatically. The cache store then might have an extra method that looks like this.

class RedisCache(BaseCache):

    def on_cache_set(self, response):
        kwargs = {}
        if 'expires' not in response.headers:
            return kwargs

        return {'expires': response.headers['expires']}

    def set(self, key, value, expires=None):
        # Set the value accordingly, passing expires through so Redis
        # can expire the key automatically.
        pass

I’m not crazy about this API as it is a little confusing to communicate that creating an on_cache_set hook is really a way to edit the arguments sent to the set method. Maybe calling it a hook is really the wrong term. Maybe it should be called prepare and it should explicitly call set. If anyone has thoughts, please let me know!
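For what it is worth, the way I picture the caching layer using the hook is something like this (hypothetical, since the hook does not exist yet):

# The controller asks the store for extra kwargs before storing the value.
extra = cache.on_cache_set(response)
cache.set(cache_url, serialized_response, **extra)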

The reasoning is that I’d like to remove the Redis cache and start a new project for CacheControl stores that includes common cache store implementations. At the very least, I’d like to find some good implementations that I can link to from the docs to help folks find a path from a local file cache to using a real database when the time comes.

Lastly, there are a couple spec related details that could use some attention that I’ll be looking at in the meantime.

Replacing Monitors

I just read a quick article on Microsoft’s new VR goggles. The idea of layering virtual interfaces on top of the real world seems really cool and even practical in some use cases. What seems really difficult is how an application will understand the infinite number of visual environments in order to effectively and accurately visualize the interfaces. Hopefully the SDK for the device includes a library that provides for different real world elements, like placeOnTop(vObject, x, y, z), where it can recognize some object in the room and make that object available as a platform. In other words, it sees the coffee table and makes it available as an object that you can put something on top of.

The thing is, what I’d love to see is VR replacing monitors! Every year TVs and monitors get upgrades in size and quality, yet the prices rarely drop very radically. Right now I have a laptop and an external monitor that I use at home. I’d love to get rid of both tiny screens and just look into space and see my windows on a huge, virtual, flat surface.

An actual computer would still be required and there wouldn’t necessarily need to be huge changes to the OS at first. The goggles would just be another big screen that would take the rasterized screen and convert it to something seen in the analog world. Sure, this would probably ruin a few eyes at first, but having a huge monitor measured in feet vs. inches is extremely enticing.

DevOps

I finally realized why DevOps is an idea. Up until this point, I felt DevOps was just a term for a developer who was also responsible for the infrastructure. In other words, I never associated DevOps with an actual strategy or idea; instead, it was simply something that happened. Well, no more!

DevOps is providing developers keys [1] to operations. In a small organization these keys never have a chance to leave the hands of the small team of developers who have nothing to concern themselves with except getting things done. As an organization grows, there becomes a dedicated person (and soon after, a group of people) responsible for maintaining the infrastructure. What happens then is that the keys developers had to log into any server or install a new database are taken away and given to operations to manage. DevOps is a trend where operations and developers share the keys.

Because developers and operations both have access to change the infrastructure, there is a meeting of the minds that has to happen. Developers and Ops are forced to communicate the what, where, when, why and how of changes to the infrastructure. Since folks in DevOps are familiar with code, version control becomes a place of communication and cohesion.

The reason I now understand this paradigm more clearly is that when a developer doesn’t have access to the infrastructure, it is a huge waste of time. When code doesn’t work, we need to be able to debug it. It is important to come up with theories about why things don’t work and iteratively test those theories until we find the reason for the failure. While it is possible to debug bugs that only show up in production, it can be slow, frustrating and difficult when access to the infrastructure isn’t available.

I say all this with a huge grain of salt. I’m not a sysadmin and I’ve never been a true sysadmin. While I understand the hand wavy idea that if all developers have ssh keys to the hosts in a datacenter, there are more vectors for attack. What I don’t understand is why a developer with ssh keys is any more dangerous than a sysadmin having ssh keys. Obviously, a sysadmin may have a more stringent outlook on what is acceptable, but at the same time, anyone can be duped. After all, if you trust your developers to write code that writes to your database millions (or billions!) of times a day, I’m sure you can trust them to keep an ssh key safe or avoid exposing services that are meant to remain private.

I’m now a full on fan of DevOps. Developers and Ops working together and applying software engineering techniques to everything they work with seems like a great idea. Providing Developers keys to the infrastructure and pushing Ops to communicate the important security and/or hardware concerns is only a good thing. The more cohesion between Ops and Dev, the better.

[1] I’m not talking about ssh keys, but rather the idea of having “keys to the castle”. One facet of this could be making sure developer keys and accounts are available on the servers, but that is not the only way to give devs access to the infrastructure.