Speeding Up Python

Oct 21, 2025

One theme at Dagster is speed. Python, the lingua franca of the data engineering world, is arguably slow. We use Python for everything and consistently see areas where it could be faster. Historically, Python advocates have always said you can rewrite slow code in C. While I’m sure there are a few folks who take this path, I think the more common solution is to simply spin up more infrastructure. I am excited about free threading in Python 3.14. I think finding ways to leverage all the CPUs on a machine will end up being really helpful. I interview a lot of younger engineers who grew up in a serverless-enabled world where threads are mysteries. It is easier to just assume one thread and add more machines to the mix. I don’t disagree with this mindset, but at the same time, after using Go for a while and seeing its performance, the one-thread-per-process approach feels wasteful. I’m hopeful that we’ll see more multi-threaded tools that offer room to scale vertically now that we have a way around the GIL!
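To make the difference concrete, here’s a minimal sketch (my own example, not anything from Dagster) of CPU-bound work fanned out across threads. On a GIL build the threads serialize, so this takes roughly as long as running serially; on a free-threaded build (e.g. a `python3.14t` binary) the workers can run on separate cores:

```python
from concurrent.futures import ThreadPoolExecutor

def count_primes(limit: int) -> int:
    """Naive, deliberately CPU-bound prime count below `limit`."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

# Four identical CPU-bound tasks; with the GIL they run one at a time,
# without it they can run in parallel on four cores.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(count_primes, [20_000] * 4))
```

The nice part is that the code is identical either way; only the interpreter build changes whether the threads actually run in parallel.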

I did take a minute to try to implement goroutines in Python. The result was interesting, but generally not ideal. Go, being a compiled language, can more efficiently break work apart under the hood, balancing async I/O and CPU-bound work in a way that Python can’t do easily without better-defined hooks. If gevent worked on free threading, I think this concept might have legs, but as it stands, you need to mix in asyncio (or something similar, like Trio) to get the behavior I was looking for, and that made it too challenging to keep the async side feeling like magic.
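The kind of mixing I mean looks roughly like this with asyncio. This is an illustrative sketch, not my actual experiment: `cpu_work` and `io_work` are made-up names, and `asyncio.to_thread` stands in for the scheduling Go’s runtime does for you automatically:

```python
import asyncio

def cpu_work(n: int) -> int:
    # CPU-bound: no await points, so run inline it would block the event loop
    return sum(i * i for i in range(n))

async def io_work() -> str:
    # I/O-bound: yields to the event loop while "waiting"
    await asyncio.sleep(0.01)
    return "io done"

async def main():
    # Run both concurrently: CPU work offloaded to a worker thread,
    # I/O handled on the event loop.
    return await asyncio.gather(asyncio.to_thread(cpu_work, 100_000), io_work())

cpu_result, io_result = asyncio.run(main())
```

It works, but the seam is visible: you have to decide per call which side of the CPU/I/O split each function belongs on, which is exactly the bookkeeping a goroutine hides.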

When I think about making Python faster, I usually think about how I can move work out of Python. I mentioned writing C extensions; Cython seems promising for folks who don’t know C very well. Another classic tactic from web development is to push more work into the database. This got me thinking about DuckDB and how I could move work there. DuckDB has tons of great features already, and if you need to optimize reading and writing data, including doing computations, it is a great fit. So, thinking of a recent problem I had around downloading S3 files, I made an extension to, hopefully, do it faster than Python can.

I started with a simple CLI, s3dl, that would spin up a bunch of threads to download some parquet files. Ideally, I’d do whatever uv does, but since I had to use C++ (Rust is still only experimentally supported for DuckDB extensions), I stuck with minimal dependencies and just used threads. After I got my script working, I created my ducks3 extension. The extension didn’t leverage anything in DuckDB specifically; it just downloaded the files to a folder and output the results of the download. But I do think the idea wasn’t too far off from what I hoped to achieve. My goal was to do work outside Python, yet still from Python, leveraging other tools without too great a burden (i.e. learning to write C extensions).
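For comparison, the thread-pool download pattern looks like this in Python. This is a sketch, not the actual s3dl code: `download_one` is a stub standing in for a real S3 GET (e.g. a boto3 call) so the example runs offline, and the file names are invented:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
import tempfile

def download_one(key: str, dest_dir: Path) -> Path:
    # Placeholder for an S3 GET; writes fake bytes so the sketch
    # runs without network access or credentials.
    dest = dest_dir / key
    dest.write_bytes(b"parquet-bytes")
    return dest

keys = ["a.parquet", "b.parquet", "c.parquet"]
dest_dir = Path(tempfile.mkdtemp())

# Threads work fine here even under the GIL: each worker spends most
# of its time blocked on (simulated) network I/O, not Python bytecode.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(download_one, k, dest_dir) for k in keys]
    paths = [f.result() for f in as_completed(futures)]
```

This pattern is why downloads are the friendly case for Python threads; the extension’s win is more about keeping the bytes out of Python entirely.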

The thing I like about this DuckDB solution is that we can load data into an environment where we can manipulate it quickly. I think this is especially valuable in the data engineering space because one step that will always be somewhat slow is converting data into Python objects to perform analysis. Polars is a good solution since it performs a similar role, but for folks who aren’t as familiar with data frames, maybe DuckDB is a better fit.

I’m hopeful free threading helps unlock vertical scaling for Python. Where that won’t work, I’m excited about tools like DuckDB that offer a direct optimization path that may be easier than writing your own C/Rust extensions. Also, as I vibe coded these projects, it is pretty cool that someone who hasn’t written C/C++ since college can get something of reasonable complexity working.