Docker Gripes
September 9, 2021 | 10 min. read
At my internship over the last few months, I've been working a lot on CI/CD pipelines and revamping dev environments. A lot of this work has been utilizing Docker for containerization, and I've actually really enjoyed it so far. Docker is a really neat piece of technology, and allows for super disposable and scalable applications. I've even started to use it in a lot of my personal projects, just to keep things easy to work on across my desktop and laptop.
However, it hasn't been a totally perfect experience. As I've learned more about the technology and applied it to more use cases, a few things have really started to bug me about it. Some of these are a matter of "wrong tool for the job" that I think the docs and community should make clearer, but others are more core to Docker itself.
It's Not a Tool For Reproducibility
Every time someone says something along the lines of, "When you build a Docker image, it's the same application every time!!", a kitten dies. Docker is not a tool for reproducible packages or applications, and it has never been. (I'll add a caveat to that: I like to think that Docker is meant for reproducible processes, not reproducible programs. Docker is great for the kind of scenario where you want to kill an app, restart it, scale it out to a dozen containers, etc., and have it behave the same every time and across any system with a Docker runtime, given they all use the same image.) Yes, you can get a modicum more reproducibility than you would from using your OS package manager everywhere (e.g., using a certain version of the node image to make sure you have the right NodeJS version), but that's not strictly reproducible.
In a huge majority of Dockerfiles I've looked at, there's always something along the lines of:
RUN apt update && apt upgrade -y
RUN apt install -y python
This makes any kind of effort toward reproducibility null and void. What's in your Docker image now depends on what's in the package repositories at the time the image is built; it's now totally unpredictable what python version you're on, whether it has any new dependencies, or whether there have been any breaking changes among any of those dependencies. I can't count how many messes I've had to clean up because a CI/CD pipeline using a Dockerfile with a line just like the above hadn't been run in a while, and it magically refuses to work after being dusted off despite there being no changes in the code. Even if you personally avoid writing the above offense, there's no guarantee you'll have a reproducible build. Since Docker allows you to derive images from other images, it's almost a guarantee that somewhere, deep down the stack, someone is pulling in something unpredictable.
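You can claw back some determinism within Docker itself by pinning as hard as the tooling allows. A rough sketch - the digest and the version number here are placeholders, not real values:
FROM debian:bullseye-slim@sha256:<digest-you-actually-tested>
# Pin the exact package version instead of taking whatever is current
RUN apt update && apt install -y --no-install-recommends python3=3.9.2-3
Even this isn't airtight: apt update still fetches today's package index, and pinned versions can disappear from the repositories entirely.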
If you're looking for actual reproducibility, then you have to adopt a pretty different mindset. Your build process needs to be a pure function of code - code goes in, application comes out. It doesn't depend on any extraneous or hidden state like the time it was compiled or the contents of files, and all dependencies must be explicitly accounted for and treated like parameters of the build function. I won't go too much into depth on this, but I really can't recommend Nix enough for this purpose.
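To give a taste: nixpkgs ships a dockerTools library that builds OCI images as pure Nix derivations. A minimal sketch, where the hello package is just a stand-in and you'd want to pin nixpkgs itself for full reproducibility:
$ nix-build -E 'with import <nixpkgs> {}; dockerTools.buildLayeredImage { name = "hello"; contents = [ hello ]; }'
$ docker load < result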
It's a Security Nightmare
It's really telling that a lot of third-party Docker tools have "doesn't need to use the Docker Engine" as a main selling point. This is for good reason: if you're on Linux, any user added to the docker group (read: any user that can do anything useful with Docker) basically has root privileges.
Let's do a little exercise to demonstrate. Let's make a secret file that only the root user can access:
$ touch secretfile.txt
$ echo "here's a secret!" >> secretfile.txt
$ chmod 000 secretfile.txt
Now, if we try to read the file:
$ cat secretfile.txt
/usr/bin/cat: secretfile.txt: Permission denied
Now, let's try and access those sweet secrets via Docker:
$ docker run --rm -it -v $(pwd):/home/me alpine:latest
/ # cd /home/me
/home/me # cat secretfile.txt
here's a secret!
Since the default user of this Docker container is root (and if we don't have an image where that's the case, it's trivial to write such a Dockerfile), and since we can mount files via Docker volumes willy-nilly, we can basically access everything on the host as if we had sudo privileges.
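You don't even need to aim carefully; mounting the host's root filesystem hands over everything at once, shadow passwords included:
$ docker run --rm -v /:/host alpine:latest cat /host/etc/shadow
$ docker run --rm -it -v /:/host alpine:latest chroot /host
The second command is effectively a root shell over the host's entire filesystem.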
This mostly stems from the fact that the Docker daemon runs as root by default. There are ways to run the Docker daemon as a non-root user, but that has a million pain-in-the-ass edge cases of its own, and I've rarely come across it in the wild. If you have a careless sysadmin or are just trying out Docker, it's far too easy to run apt install docker, add everyone dev-adjacent to the docker group, and think the job is done.
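All it takes is something like the following (the package name varies by distro, and alice stands in for every dev on the team):
$ sudo apt install docker.io
$ sudo usermod -aG docker alice
From that point on, alice is root in everything but name.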
This isn't even mentioning Docker's networking model. The inner workings of Linux networking are still very high on my Things I Need To Learn list, but here's how I understand it: Docker uses iptables to manage its own networking features, and creates its own DOCKER and DOCKER-USER chains. Any Docker container that's using a network driver other than host will use these chains, totally ignore the others, and blast through any firewalling/VPN you might have set up. As I said, I'm not a networking expert, so this blog post is a much more in-depth gripe.
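You can at least see the machinery for yourself. As I understand it, traffic to published ports traverses the FORWARD chain and Docker's own chains rather than INPUT, which is why INPUT-based firewalls like ufw never see it; rules you want Docker to respect have to go into DOCKER-USER. A sketch, where eth0 and the port number are assumptions:
$ sudo iptables -L DOCKER -n
$ sudo iptables -L DOCKER-USER -n
$ sudo iptables -I DOCKER-USER -i eth0 -p tcp -m conntrack --ctorigdstport 8080 -j DROP
The conntrack match is needed because, by the time packets reach DOCKER-USER, they've already been DNATed to the container's internal port.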
Constant Churn
Easily my biggest meta-gripe with Docker is how often the tool reinvents itself for no discernible reason. I usually have a pretty high tolerance for changing technologies (I love to learn new things anyway), and it kind of rubs me the wrong way when people dismiss new tech as little more than "fads" or "trends." (Maybe it's just because I do webdev, and it seems like anytime there's an advance in web technology, useful or otherwise, people shit their pants and start crying about JavaScript frameworks.) There's a threshold to constant revolutionizing, though, and Docker really toes the line.
Probably the most recent example is the new docker compose API. If you hadn't seen, instead of the old-fashioned and antiquated docker-compose, the hot new style is the docker compose command. What's the difference? Nothing! All the commands and functionality are the same, except your terminal will now yell at you about deprecation when you use docker-compose. And good luck trying to get any Google results for docker compose as opposed to docker-compose. That SEO has been locked in for years now!
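For the record, here's the entire migration in action:
$ docker-compose up -d
$ docker compose up -d
Same subcommands, same flags, same YAML; the first is the old standalone binary, the second is the new plugin bundled into the Docker CLI.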
There are even more (previously fundamental) features that have been totally deprecated just in the past few years. Container links are the first that come to mind; they've since been replaced by the pretty complex networking features.
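If you ever wired containers together like the first line below, you now get deprecation warnings; the blessed replacement is a user-defined network (myapp is a hypothetical image):
$ docker run -d --name app --link db:db myapp
$ docker network create mynet
$ docker run -d --name db --network mynet postgres
$ docker run -d --name app --network mynet myapp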
This isn't to say that there haven't been welcome additions. Multistage Dockerfiles are fantastic, and I use them all the time. However, for every useful update there's a ton that are just annoying or user-hostile.
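For the unfamiliar, the trick is to compile with the full toolchain in one stage and ship only the artifact in another. A sketch with a hypothetical Go app:
FROM golang:1.17 AS build
WORKDIR /src
COPY . .
# Static binary, so it runs on a libc-less base image
RUN CGO_ENABLED=0 go build -o /out/app .

FROM alpine:latest
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]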
Conclusion
Like I mentioned in the introduction, I still think Docker is a really neat piece of technology. Obviously containerization existed before via Linux namespaces/cgroups and BSD jails, but Docker finally made containers actually accessible and easy for non-sysadmins to work on.
However, the more I learn about Docker and container technology, the more I wonder what Docker actually is. At first I thought Docker was basically the entire container stack, but it turns out that's wrong. Is Docker the container runtime that launches and sandboxes the underlying Linux process? Nope, turns out that's runc. We call them "Docker images" in common parlance, so is Docker the image format? Nope, turns out it just uses the standard Open Container Initiative image format.
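You can even ask the daemon which runtime it's actually delegating to; on a stock install, this should print runc:
$ docker info --format '{{.DefaultRuntime}}'
runc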
As far as I can tell, The Tool Formerly Known as Docker is just the Dockerfile-to-OCI-image build system, the Docker Container Engine that relays API and CLI calls to the underlying container runtime, and the glue that holds it all together. I think this problem of fuzzy nomenclature is really something people need to communicate better. Apparently even professional sysadmins/devops engineers have a hard time grokking this, as evidenced by the general panic when the Kubernetes team announced they were no longer supporting "Docker" - k8s still supports Docker/OCI images, it just no longer uses the Docker container engine.
This is a little amusing, since
those two things are the exact features that I dislike the most about
Docker - as stated previously, the build system is a major footgun and the
container engine requires giving up the keys to the kingdom.
The solutions that I can foresee are twofold. As alluded to above, I'm a big fan of Nix and its philosophies around build systems. It has a few features for building OCI-compliant images, and I'd suggest anyone keen on reproducible and stable builds give that a try. I've also been seeing more and more about Podman, and it looks pretty interesting - it has all of Docker's features and commands (the tutorial even recommends just aliasing docker to podman), and it does not require a daemon running with root access. I have yet to use it seriously, but I foresee myself joining that cargo cult in the future.
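If you want a taste, the swap really is as mechanical as the tutorial suggests:
$ alias docker=podman
$ docker run --rm -it alpine:latest sh
No daemon gets started, and the container runs as your own user rather than root.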