Docker — Structuring Dockerfiles For Productivity
Recently, we had a request to add a parameter to our image build pipeline to allow a user-configurable timeout on the build. We were planning to configure a 10-minute default timeout on builds, and some users of the pipeline had asked to be allowed to override it because some of their image builds were taking up to 60 minutes to complete.
At first, I wondered if I was becoming hard of hearing. After all, I’m not as young as I used to be. Up to 60 minutes?!?! Surely they meant 60 seconds… Alas, my hearing was fine…
To make matters worse, the application was under active development, so multiple builds were being run each day.
I’ve trimmed out a bunch of the content for readability, but the Dockerfile looked something like this:
FROM ubuntu:focal-20210119

RUN apt-get -y update && \
    apt-get -y upgrade && \
    apt-get install -y --no-install-recommends \
        dos2unix \
        jq \
        libpython3.10 \
        python3-pip \
        software-properties-common \
        tar \
        unzip \
        wget \
        zip && \
    echo "Cleaning up" && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get clean

RUN pip3 install boto3 flask

RUN echo "Installing AWS CLIv2" && \
    TMPDIR=$(mktemp -d) && \
    wget -P $TMPDIR --no-check-certificate "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" && \
    unzip $TMPDIR/awscli-exe-linux-x86_64.zip -d $TMPDIR && \
    $TMPDIR/aws/install && \
    rm -rf /usr/local/aws-cli/v2/dist/awscli/examples/ && \
    rm -rf $TMPDIR

RUN echo "Installing kubectl" && \
    wget -P /usr/bin/ --no-check-certificate https://storage.googleapis.com/kubernetes-release/release/$(wget --no-check-certificate -O - https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl && \
    chmod +x /usr/bin/kubectl

# Install the app
COPY dummyapp.py /app/
The focus of the rebuilds was application changes, not changes to the underlying dependencies.
Even trimmed down, this Dockerfile takes about 5 minutes to build. But seriously, that’s still a long time: 5 minutes… for what could (usually) be just a few seconds.
Here’s the abbreviated build output for my build:
# time docker build --no-cache --progress=plain -t test:test .
Sending build context to Docker daemon 3.072kB
Step 1/5 : FROM ubuntu:focal-20210119
---> f63181f19b2f
Step 2/5 : RUN apt-get -y update && apt-get -y upgrade && apt-get install -y --no-install-recommends dos2unix jq libpython3.10 python3-pip software-properties-common tar unzip wget zip && echo "Cleaning up" && rm -rf /var/lib/apt/lists/* && apt-get clean
---> Running in 37bff266446e
Get:1 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:2 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [1470 kB]
<snip>
45400K .......... .......... .......... .......... .......... 99% 2.31M 0s
45450K .......... .......... .......... .......... ...... 100% 8.05M=18s

2022-01-28 19:37:06 (2.40 MB/s) - '/usr/bin/kubectl' saved [46587904/46587904]

Removing intermediate container 86223b438cef
---> b8f9a2cc1d9a
Step 6/6 : COPY dummyapp.py /app/
---> b95d22cdca6f
Successfully built b95d22cdca6f
Successfully tagged test:test

real 5m11.679s
user 0m1.248s
sys 0m1.961s
How can we make this build go faster?
Every time we build this Dockerfile, we repeat a lot of processing whose outcome is unlikely to change very often. We:
- update the Ubuntu package list
- upgrade the Ubuntu packages
- install some additional packages
- use pip3 to install some python packages
- install the AWS CLI
- install kubectl
and then finally:
- install the application
We are also starting from a fairly old Ubuntu build, so the upgrade step takes longer than necessary: more packages need to be upgraded than if we had simply built the image FROM a newer base image.
The purpose of most rebuilds of this image (and, I suspect, of most rebuilds of your images as well) is to incorporate application changes, which often touch only the last line of the Dockerfile. The most obvious improvement, then, is to split this Dockerfile into two (or more) Dockerfiles, and at the same time change the FROM statement to use the latest Ubuntu base image. The first Dockerfile might look something like this:
FROM ubuntu:latest

RUN apt-get -y update && \
    apt-get -y upgrade && \
    apt-get install -y --no-install-recommends \
        dos2unix \
        jq \
        libpython3.10 \
        python3-pip \
        software-properties-common \
        tar \
        unzip \
        wget \
        zip && \
    echo "Cleaning up" && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get clean

RUN pip3 install boto3 flask

RUN echo "Installing AWS CLIv2" && \
    TMPDIR=$(mktemp -d) && \
    wget -P $TMPDIR --no-check-certificate "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" && \
    unzip $TMPDIR/awscli-exe-linux-x86_64.zip -d $TMPDIR && \
    $TMPDIR/aws/install && \
    rm -rf /usr/local/aws-cli/v2/dist/awscli/examples/ && \
    rm -rf $TMPDIR

RUN echo "Installing kubectl" && \
    wget -P /usr/bin/ --no-check-certificate https://storage.googleapis.com/kubernetes-release/release/$(wget --no-check-certificate -O - https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl && \
    chmod +x /usr/bin/kubectl
The second Dockerfile might look something like this:
FROM dummyapp-dependencies:latest

# Install the app
COPY dummyapp.py /app/
Building the first Dockerfile doesn’t save us any time… It still takes approximately 5 minutes… However, we only need to build that first Dockerfile relatively infrequently (Once a week? Once a month?).
Building the second Dockerfile, however, is dramatically better and only takes about 2 seconds:
# time docker build --no-cache --progress=plain -t test:test -f Dockerfile.app .
Sending build context to Docker daemon 5.12kB
Step 1/2 : FROM dummyapp:1.0.0
pull access denied for dummyapp, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

real 0m2.125s
user 0m0.043s
sys 0m0.030s
Given that the app changes much more frequently than the underlying dependencies, we’ve just saved ourselves a whole bunch of time. Score!
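To make the split routine in a pipeline, the two builds can be wrapped in a pair of shell functions. The slow dependencies build runs on a schedule, and the fast app build runs on every application change. This is a minimal sketch under stated assumptions. The dependencies Dockerfile is assumed to be saved as Dockerfile.deps, a hypothetical name, since the post never names that file. A hypothetical DOCKER variable is also added so the commands can be dry-run with `echo`. The image tags and Dockerfile.app come from the examples above, and `--pull`, `--no-cache`, `-t` and `-f` are standard `docker build` flags.

```shell
# Minimal sketch of the two-image workflow (hypothetical helpers, not the actual pipeline).
# Hypothetical: the dependencies Dockerfile is saved as Dockerfile.deps, and DOCKER can be
# overridden (for example DOCKER=echo) to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"
DEPS_IMAGE="dummyapp-dependencies:latest"  # must match the FROM line in Dockerfile.app
APP_IMAGE="test:test"                      # tag used in the build commands above

# Slow path (about 5 minutes): run on a schedule, e.g. weekly or monthly.
# --pull always attempts to pull a newer ubuntu:latest; --no-cache makes
# apt-get update/upgrade run again instead of reusing a cached layer.
build_deps() {
  "$DOCKER" build --pull --no-cache -t "$DEPS_IMAGE" -f Dockerfile.deps .
}

# Fast path (seconds): only the COPY of the application runs on top of the deps image.
build_app() {
  "$DOCKER" build -t "$APP_IMAGE" -f Dockerfile.app .
}
```

After sourcing the file (for example `. ./build-helpers.sh`, another hypothetical name), a job triggered on each commit would call only `build_app`. A scheduled weekly or monthly job would call `build_deps` followed by `build_app`. `build_deps` uses `--no-cache` because a cached rebuild would reuse the old `apt-get update`/`upgrade` layer and pick up no new packages. `build_app` omits `--pull` because, in this sketch, `dummyapp-dependencies:latest` exists only as a locally built image.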