Docker Gripes
September 9, 2021 | 10 min. read
At my internship over the last few months, I've been working a lot on CI/CD pipelines and revamping dev environments. A lot of this work has been utilizing Docker for containerization, and I've actually really enjoyed it so far. Docker is a really neat piece of technology, and allows for super disposable and scalable applications. I've even started to use it in a lot of my personal projects, just to keep things easy to work on across my desktop and laptop.
However, it hasn't been a totally perfect experience. As I've learned more about the technology and applied it to more use cases, a few things have really started to bug me about it. Some of these are a matter of "wrong tool for the job" that I think the docs and community should make clearer, but others are more core to Docker itself.
It's Not a Tool For Reproducibility
Every time someone says something along the lines of, "When you build a Docker image, it's the same application every time!!", a kitten dies. Docker is not a tool for reproducible packages or applications, and it has never been. (I'll add a caveat to that: I like to think that Docker is meant for reproducible processes, not reproducible programs. Docker is great for the kind of scenario where you want to kill an app, restart it, scale it out to a dozen containers, etc., and have it behave the same every time and across any system with a Docker runtime, given they all use the same image.) Yes, you can get a modicum more reproducibility than you would from using your OS package manager everywhere (e.g., using a certain version of the node image to make sure you have the right NodeJS version), but that's not strictly reproducible.
In a huge majority of Dockerfiles I've looked at, there's always something along the lines of:
RUN apt update && apt upgrade -y
RUN apt install -y python
This makes any kind of effort toward reproducibility null and void. What's in your Docker image now depends on what's in the package repositories at the time the image is built; it's now totally unpredictable what python version you're on, whether it has any new dependencies, or whether there have been any breaking changes among any of those dependencies. I can't count how many messes I've had to clean up because a CI/CD pipeline using a Dockerfile with a line just like the above hadn't been run in a while, and it magically refuses to work after being dusted off despite there being no changes in the code. Even if you personally avoid writing the above offense, there's no guarantee you'll have a reproducible build. Since Docker allows you to derive images from other images, it's almost a guarantee that somewhere, deep down the stack, someone is pulling in something unpredictable.
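You can claw back some determinism within Docker itself by pinning as hard as the tooling allows. A rough sketch - the digest and the version number here are placeholders, not real values:
FROM debian:bullseye-slim@sha256:<digest-you-actually-tested>
# Pin the exact package version instead of taking whatever is current
RUN apt update && apt install -y --no-install-recommends python3=3.9.2-3
Even this isn't airtight: apt update still fetches today's package index, and pinned versions can disappear from the repositories entirely.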
If you're looking for actual reproducibility, then you have to adopt a pretty different mindset. Your build process needs to be a pure function of code - code goes in, application comes out. It doesn't depend on any extraneous or hidden state like the time it was compiled or the contents of files, and all dependencies must be explicitly accounted for and treated like parameters of the build function. I won't go too much into depth on this, but I really can't recommend Nix enough for this purpose.
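To give a taste: nixpkgs ships a dockerTools library that builds OCI images as pure Nix derivations. A minimal sketch, where the hello package is just a stand-in and you'd want to pin nixpkgs itself for full reproducibility:
$ nix-build -E 'with import <nixpkgs> {}; dockerTools.buildLayeredImage { name = "hello"; contents = [ hello ]; }'
$ docker load < result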
It's a Security Nightmare
It's really telling that a lot of third-party Docker tools have "doesn't need to use the Docker Engine" as a main selling point. This is for good reason: if you're on Linux, any user added to the docker group (read: any user that can do anything useful with Docker) basically has root privileges.
Let's do a little exercise to demonstrate. Let's make a secret file that only the root user can access:
$ touch secretfile.txt
$ echo "here's a secret!" >> secretfile.txt
$ chmod 000 secretfile.txt
Now, if we try to read the file:
$ cat secretfile.txt
/usr/bin/cat: secretfile.txt: Permission denied
Now, let's try and access those sweet secrets via Docker:
$ docker run --rm -it -v $(pwd):/home/me alpine:latest
/ # cd /home/me
/home/me # cat secretfile.txt
here's a secret!
Since the default user of this Docker container is root (and if we don't have an image where that's the case, it's trivial to write such a Dockerfile), and since we can mount files via Docker volumes willy-nilly, we can basically access everything on the host as if we had sudo privileges.
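You don't even need to aim carefully; mounting the host's root filesystem hands over everything at once, shadow passwords included:
$ docker run --rm -v /:/host alpine:latest cat /host/etc/shadow
$ docker run --rm -it -v /:/host alpine:latest chroot /host
The second command is effectively a root shell over the host's entire filesystem.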
This mostly stems from the fact that the Docker daemon runs as root by default. There are ways to run the Docker daemon as a non-root user, but that has a million pain-in-the-ass edge cases of its own, and I've rarely come across it in the wild. If you have a careless sysadmin or are just trying out Docker, it's far too easy to run apt install docker, add everyone dev-adjacent to the docker group, and think the job is done.
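All it takes is something like the following (the package name varies by distro, and alice stands in for every dev on the team):
$ sudo apt install docker.io
$ sudo usermod -aG docker alice
From that point on, alice is root in everything but name.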
This isn't even mentioning Docker's networking model. The inner workings of Linux networking are still very high on my Things I Need To Learn list, but here's how I understand it: Docker uses iptables to manage its own networking features, and creates its own DOCKER and DOCKER-USER chains. Any Docker container that's using a network driver other than host will use these chains, totally ignore the others, and blast through any firewalling/VPN you might have set up. As I said, I'm not a networking expert, so this blog post is a much more in-depth gripe.
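You can at least see the machinery for yourself. As I understand it, traffic to published ports traverses the FORWARD chain and Docker's own chains rather than INPUT, which is why INPUT-based firewalls like ufw never see it; rules you want Docker to respect have to go into DOCKER-USER. A sketch, where eth0 and the port number are assumptions:
$ sudo iptables -L DOCKER -n
$ sudo iptables -L DOCKER-USER -n
$ sudo iptables -I DOCKER-USER -i eth0 -p tcp -m conntrack --ctorigdstport 8080 -j DROP
The conntrack match is needed because, by the time packets reach DOCKER-USER, they've already been DNATed to the container's internal port.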
Constant Churn
Easily my biggest meta-gripe with Docker is how often the tool reinvents itself for no discernible reason. I usually have a pretty high tolerance for changing technologies (I love to learn new things anyway), and it kind of rubs me the wrong way when people dismiss new tech as little more than "fads" or "trends." (Maybe it's just because I do webdev, and it seems like anytime there's an advance in web technology, useful or otherwise, people shit their pants and start crying about JavaScript frameworks.) There's a threshold to constant revolutionizing, though, and Docker really toes the line.
Probably the most recent example is the new docker compose API. If you hadn't seen, instead of the old-fashioned and antiquated docker-compose, the hot new style is the docker compose command. What's the difference? Nothing! All the commands and functionality are the same, except your terminal will now yell at you about deprecation when you use docker-compose. And good luck trying to get any Google results for docker compose as opposed to docker-compose. That SEO has been locked in for years now!
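For the record, here's the entire migration in action:
$ docker-compose up -d
$ docker compose up -d
Same subcommands, same flags, same YAML; the first is the old standalone binary, the second is the new plugin bundled into the Docker CLI.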
There are even more (previously fundamental) features that have been totally deprecated just in the past few years. Container links are the first that come to mind; they've since been replaced by the pretty complex networking features.
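If you ever wired containers together like the first line below, you now get deprecation warnings; the blessed replacement is a user-defined network (myapp is a hypothetical image):
$ docker run -d --name app --link db:db myapp
$ docker network create mynet
$ docker run -d --name db --network mynet postgres
$ docker run -d --name app --network mynet myapp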
This isn't to say that there haven't been welcome additions. Multistage Dockerfiles are fantastic, and I use them all the time. However, for every useful update there's a ton that are just annoying or user-hostile.
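For the unfamiliar, the trick is to compile with the full toolchain in one stage and ship only the artifact in another. A sketch with a hypothetical Go app:
FROM golang:1.17 AS build
WORKDIR /src
COPY . .
# Static binary, so it runs on a libc-less base image
RUN CGO_ENABLED=0 go build -o /out/app .

FROM alpine:latest
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]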
Conclusion
Like I mentioned in the introduction, I still think Docker is a really neat piece of technology. Obviously containerization existed before via Linux namespaces/cgroups and BSD jails, but Docker finally made containers actually accessible and easy for non-sysadmins to work on.
However, the more I learn about Docker and container technology, the more I wonder what Docker actually is. At first I thought Docker was basically the entire container stack, but it turns out that's wrong. Is Docker the container runtime that launches and sandboxes the underlying Linux process? Nope, turns out that's runc. We call them "Docker images" in common parlance, so is Docker the image format? Nope, turns out it just uses the standard Open Container Initiative image format.
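You can even ask the daemon which runtime it's actually delegating to; on a stock install, this should print runc:
$ docker info --format '{{.DefaultRuntime}}'
runc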
As far as I can tell, The Tool Formerly Known as Docker is just the Dockerfile-to-OCI-image build system, the Docker Container Engine that relays API and CLI calls to the underlying container runtime, and the glue that holds it all together. I think this problem of fuzzy nomenclature is really something people need to communicate better. Apparently even professional sysadmins/devops engineers have a hard time grokking this, as evidenced by the general panic when the Kubernetes team announced they were no longer supporting "Docker" - k8s still supports Docker/OCI images, it just no longer uses the Docker container engine.
This is a little amusing, since
those two things are the exact features that I dislike the most about
Docker - as stated previously, the build system is a major footgun and the
container engine requires giving up the keys to the kingdom.
The solutions that I can foresee are twofold. As alluded to above, I'm a big fan of Nix and its philosophies around build systems. It has a few features for building OCI-compliant images, and I'd suggest anyone keen on reproducible and stable builds give that a try. I've also been seeing more and more about Podman, and it looks pretty interesting - it has all of Docker's features and commands (the tutorial even recommends just aliasing docker to podman), and it does not require a daemon running with root access. I have yet to use it seriously, but I foresee myself joining that cargo cult in the future.
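If you want a taste, the swap really is as mechanical as the tutorial suggests:
$ alias docker=podman
$ docker run --rm -it alpine:latest sh
No daemon gets started, and the container runs as your own user rather than root.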