Software Project Structure and Ecosystems

Code can smell, but just like our own human noses, everyone has his/her own perspective on what stinks. Also, just because something smells bad now, it doesn’t mean you can’t get used to the smell and eventually enjoy it. The same applies for how software projects are organized and configured.

Most languages have the concept a package. Your source code repository is organized to support building some a package that can be distributed via your language’s ecosystem. Python has setuptools/pip, Perl has CPAN, JavaScript has NPM, Ruby has gems, etc. Even compiled languages provide packages by way of providing builds for different operating system packaging systems.

In each package use case, there are common idioms and best practices that the community supports. There are tools that promote these ideals that end up part of the language and community packing ecosystem. This ecosystem often ends up permeating not just the project layout, but the project itself. As developers, we want to be participants in the ecosystem and benefit from everything that the ecosystem provides.

As we become accustomed to our language ecosystem and its project tendencies, we develop an appreciation for the aroma of the code. Our sense of code smell can be tainted to the point of feeling that other project structures smell bad and are some how “wrong” in how they work.

If you’ve ever gone from disliking some cuisine to making it an integral part of your diet, you will quickly see the problem with associating a community’s ecosystem with sound software development practices. By disregarding different patterns as “smelly”, you also lose the opportunity to glean from positive aspects of other ecosystems.

A great example is Python’s virtualenv infrastructure compared to Go’s complete lack of packages. In Python, you create source based packages (most of the time) that are extracted and built within a virtual Python environment. If the package requires other packages, the environment uses the same system (usually pip) to download and build the sub-requirements. Go, on the other hand, requires that you specify libraries and other programs you use in your build step. These requirements are not “packaged” at all and typically use source repositories directly when defining requirements. These requirements become a part of your distributed executable. The executable is a single file that can be copied to another machine and run without requiring any libraries or tools on the target machine.

Both of these systems are extremely powerful, yet radically different. Python might benefit a great deal by building in the ability to package up a virtualenv and specific command as a single file that could be run on another system. Similarly, Go could benefit from a more formalized package repository system that ensures higher security standards. It is easy to look at either system and feel the lack of the others packing features is detrimental, when in fact, they are just different design trade offs. Python has become very successful on systems such as Heroku where it is relatively easy to take a project from source code to a release because the ecosystem promotes building software during deployment. Go, on the other hand, has become an ideal system management tool because it is trivial to copy a Go program to another machine and run it without requiring OS specific libraries or dependencies.

While packaging is one area we see programmers develop a nose for coding conventions, it doesn’t stop there.

The language ecosystem also prescribes coding standards. Some languages are perfectly happy to have a huge single file while others require a new file for each class / function. Each methodology has its own penalties and rewards. Many a Python / Vimmer has wretched at the number of files produced when writing Java, while the Java developer stares in shock as the vimmer uses search and replace to refactor code in a project. Again, these design decisions with their own set of trade offs.

One area where code smell becomes especially problematic is when it involves a coding paradigm that is different from the ecosystem. Up until recently, The Twisted and stackless ecosystems in Python felt strange and isolated. Many developers felt that things like deferred and greenlets felt like code smell when compared to the tried and true threads. Yet, as we discovered the need for more socket connections to be open at the same time and as we needed to read more I/O concurrently, it became clear that asynchronous programming models should be a first class citizen in the Python ecosystem at large. Prior to the ecosystem’s acceptance of async, the best practices felt very different and quite smelly.

Fortunately, in the case of async, Python managed to find a way to be more inclusive. The async paradigm still has far reaching requirements (there needs to be a main loop somewhere...), but the community has begun the process of integrating the ideas as seamless, natural, fragrant additions to the language.

Unfortunately, other paradigms have not been as well integrated. Functional programming, while having many champions and excellent libraries, still has not managed to break into the ecosystem at the project level. If we have a packages that have very few classes, nested function calls, LRU caches and tons of (cy)func|itertool(s|z) it feels smelly.

As programmers we need to make a conscious effort to expand our code olfaction to include as wide a bouquet as possible. When we see code that seems stinky, rather than assuming it is poorly designed or dangerous, take a big whiff and see what you find out. Make an effort to understand it and appreciate its fragrance. Over time, you’ll eventually understand the difference between pungent code, that is powerful and efficient, vs. stinky code that is actually rancid and should be thrown out.