This lesson is being piloted (Beta version)

Writing Dockerfiles and Building Images

Overview

Teaching: 20 min
Exercises: 10 min
Questions
  • How are Dockerfiles written?

  • How are Docker images built?

Objectives
  • Write simple Dockerfiles

  • Build a Docker image from a Dockerfile

Docker images are built by the Docker engine, which reads the instructions in a Dockerfile. These text-based documents provide the instructions through an API similar to Linux operating system commands, which are executed during the build. The Dockerfile for the example image used here is a simple extension of the official Python 3.9 Docker image based on the Debian Bullseye OS (python:3.9-bullseye).

As a very simple example of extending the example image into a new image, create a Dockerfile on your local machine

touch Dockerfile

and then write in it the Docker engine instructions to add cowsay and scikit-learn to the environment

# Dockerfile
FROM matthewfeickert/intro-to-docker:latest

USER root

RUN apt-get -qq -y update && \
    apt-get -qq -y upgrade && \
    apt-get -qq -y install cowsay && \
    apt-get -y autoclean && \
    apt-get -y autoremove && \
    rm -rf /var/lib/apt/lists/* && \
    ln -s /usr/games/cowsay /usr/bin/cowsay

RUN python -m pip install --no-cache-dir --quiet scikit-learn

USER docker

Dockerfile layers

Each RUN command in a Dockerfile (as well as each COPY and ADD) creates a new layer in the Docker image. In general, each layer should try to do one job, and the fewer layers in an image the easier it is to compress. When uploading and downloading images on demand, the smaller the size the better.
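
To make the point about layers concrete, here is a hedged sketch (a fragment, not a complete Dockerfile) contrasting two ways of writing the same install step. It reuses the cowsay install from the example above; the split version is hypothetical and shown only for comparison. Because each layer stores the files it adds, cleaning up in a later RUN does not shrink the layer created before it.

```dockerfile
# Hypothetical split version: two layers.
# The apt package lists are stored in the first layer,
# so deleting them in the second layer does not make the image smaller.
RUN apt-get -qq -y update && apt-get -qq -y install cowsay
RUN rm -rf /var/lib/apt/lists/*

# Chained version: one layer.
# The package lists are removed before the layer is committed.
RUN apt-get -qq -y update && \
    apt-get -qq -y install cowsay && \
    rm -rf /var/lib/apt/lists/*
```

This is why the example Dockerfile chains the install and the cleanup with && inside a single RUN.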

Don’t run as root

By default, Docker containers run as root. This is a bad idea and a security concern. Instead, set up a default user (like docker in the example) and, if needed, give the user greater privileges.

Then build an image from the Dockerfile and tag it with a human-readable name

docker build . -f Dockerfile -t extend-example:latest

You can now run the image as a container and verify for yourself that your additions exist

docker run --rm -ti extend-example:latest /bin/bash
which cowsay
cowsay "Hello from Docker"
pip list | grep scikit
python -c 'import sklearn as sk; print(sk)'

/usr/bin/cowsay
 ___________________
< Hello from Docker >
 -------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

scikit-learn        1.0
<module 'sklearn' from '/usr/local/lib/python3.9/site-packages/sklearn/__init__.py'>

Build context matters!

In the docker build command the first argument we passed, ., was the “build context”. The build context is the set of files that docker build is aware of when it starts the build. The . meant that the build context was set to the contents of the current working directory. Importantly, the entire build context is sent to the Docker daemon at build time, which means that any and all files in the build context will be copied and sent. Given this, make sure that your build context doesn’t contain any unnecessary large files, or use a .dockerignore file to hide them.
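
As a minimal sketch of the .dockerignore approach, the following creates such a file in the current directory. The entries are hypothetical (a .git directory, a data directory, and .tar.gz archives); replace them with whatever large or irrelevant files actually sit next to your Dockerfile.

```shell
# Hypothetical patterns: adjust them to the contents of your own build directory
cat > .dockerignore <<'EOF'
# Version control metadata is not needed inside the image
.git
# Large local data directory (hypothetical name)
data
# Archives at any depth below the build context root
**/*.tar.gz
EOF

# Show what will be excluded from the build context
cat .dockerignore
```

Lines starting with # are treated as comments, and the patterns are matched relative to the root of the build context.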

Beyond the basics

ARGs and ENVs

Even though the Dockerfile is a set of specific build instructions to the Docker engine, it can still be scripted for greater flexibility: the Dockerfile ARG instruction and the --build-arg flag of docker build define build-time variables that are evaluated during the build. For example, consider a Dockerfile with a configurable FROM image

# Dockerfile.arg-py3
# Make the base image configurable
ARG BASE_IMAGE=python:3.9-bullseye
FROM ${BASE_IMAGE}

USER root

RUN apt-get -qq -y update && \
    apt-get -qq -y upgrade && \
    apt-get -y autoclean && \
    apt-get -y autoremove && \
    rm -rf /var/lib/apt/lists/*

RUN python -m pip --no-cache-dir install --upgrade pip setuptools wheel && \
    python -m pip --no-cache-dir install --quiet scikit-learn

# Create user "docker"
RUN useradd -m docker && \
    cp /root/.bashrc /home/docker/ && \
    mkdir /home/docker/data && \
    chown -R --from=root docker /home/docker

WORKDIR /home/docker/data
USER docker

and then build it using the python:3.8 image as the FROM image

docker build -f Dockerfile.arg-py3 --build-arg BASE_IMAGE=python:3.8 -t arg-example:py-38 .

which you can check has Python 3.8 and not Python 3.9

docker run --rm -ti arg-example:py-38 /bin/bash
which python
python --version

/usr/local/bin/python

Python 3.8.11

Default ARG values

Setting the value of the ARG inside of the Dockerfile allows for default values to be used if no --build-arg is passed to docker build.
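
One detail worth keeping in mind: an ARG declared before FROM can be used in FROM lines, but it is not visible to instructions after FROM unless it is redeclared inside the build stage. The fragment below is a sketch of that pattern; the final RUN line is purely illustrative and not part of the lesson's Dockerfile.

```dockerfile
# Declared before FROM: usable in FROM lines only
ARG BASE_IMAGE=python:3.9-bullseye
FROM ${BASE_IMAGE}

# Redeclare without a value to bring the default (or the --build-arg value)
# into this build stage
ARG BASE_IMAGE

# Illustrative only: the variable is now available to later instructions
RUN echo "Built from ${BASE_IMAGE}"
```

Without the second ARG line, ${BASE_IMAGE} would expand to an empty string in the RUN instruction.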

ENV variables are similar to ARG variables, except that they persist past the build stage and are still accessible in the container runtime. Think of using ENV in a similar manner to how you would use export in Bash.
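
The export comparison can be tried directly in a shell. This is only an analogy and does not involve Docker: the variable names are made up, a plain shell variable stands in for an ARG, an exported one stands in for an ENV, and a child shell plays the role of the running container.

```shell
# Analogy only: these names are hypothetical and unrelated to Docker itself
BUILD_ONLY="set but not exported"   # behaves like ARG: known only to this shell
export PERSISTENT="exported"        # behaves like ENV: inherited by child processes

# The child shell stands in for the container runtime
child_view="$(sh -c 'echo "BUILD_ONLY=[${BUILD_ONLY}] PERSISTENT=[${PERSISTENT}]"')"
echo "${child_view}"
# prints: BUILD_ONLY=[] PERSISTENT=[exported]
```

The non-exported variable is empty in the child, just as an ARG is gone once the build has finished.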

# Dockerfile.arg-py3
# Make the base image configurable
ARG BASE_IMAGE=python:3.9-bullseye
FROM ${BASE_IMAGE}

USER root

RUN apt-get -qq -y update && \
    apt-get -qq -y upgrade && \
    apt-get -y autoclean && \
    apt-get -y autoremove && \
    rm -rf /var/lib/apt/lists/*

RUN python -m pip --no-cache-dir install --upgrade pip setuptools wheel && \
    python -m pip --no-cache-dir install --quiet scikit-learn

# Create user "docker"
RUN useradd -m docker && \
    cp /root/.bashrc /home/docker/ && \
    mkdir /home/docker/data && \
    chown -R --from=root docker /home/docker

# Use C.UTF-8 locale to avoid issues with ASCII encoding
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

ENV HOME=/home/docker
WORKDIR ${HOME}/data
USER docker

docker build -f Dockerfile.arg-py3 --build-arg BASE_IMAGE=python:3.8 -t arg-example:latest .
docker run --rm -ti arg-example:latest /bin/bash
echo $HOME
echo $LC_ALL

/home/docker

C.UTF-8

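whereas for the earlier arg-example:py-38 image, which was built before the ENV instructions were added, LC_ALL is not set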

docker run --rm -ti arg-example:py-38 /bin/bash
echo $HOME
echo $LC_ALL

/home/docker


Tags

In the examples so far the built image has been tagged with a single tag (e.g. latest). However, tags are simply arbitrary labels meant to help identify images and images can have multiple tags. New tags can be specified in the docker build command by giving the -t flag multiple times or they can be specified after an image is built by using docker tag.

docker tag <SOURCE_IMAGE[:TAG]> <TARGET_IMAGE[:TAG]>

Add your own tag

Using docker tag, add a new tag to the image you built.

Solution

docker images extend-example
docker tag extend-example:latest extend-example:my-tag
docker images extend-example

REPOSITORY          TAG                 IMAGE ID            CREATED            SIZE
extend-example      latest              b571a34f63b9        t seconds ago      1.59GB

REPOSITORY          TAG                 IMAGE ID            CREATED            SIZE
extend-example      latest              b571a34f63b9        t seconds ago      1.59GB
extend-example      my-tag              b571a34f63b9        t seconds ago      1.59GB

Tags are labels

Note how the image ID didn’t change for the two tags: they are the same object. Tags are simply convenient human-readable labels.

COPY

Docker also gives you the ability to copy external files into a Docker image during the build with the COPY Dockerfile instruction, which copies a target file from the host file system into the Docker image file system

COPY <path on host> <path in Docker image>

For example, if there is a file called requirements.txt in the same directory as the build is executed from

touch requirements.txt

with contents

cat requirements.txt
scikit-learn==1.0

then this could be copied into the Docker image of the previous example during the build and then used (and then removed as it is no longer needed) with the following

# Dockerfile.arg-py3
# Make the base image configurable
ARG BASE_IMAGE=python:3.9-bullseye
FROM ${BASE_IMAGE}

USER root

RUN apt-get -qq -y update && \
    apt-get -qq -y upgrade && \
    apt-get -y autoclean && \
    apt-get -y autoremove && \
    rm -rf /var/lib/apt/lists/*

COPY requirements.txt requirements.txt

RUN python -m pip --no-cache-dir install --upgrade pip setuptools wheel && \
    python -m pip --no-cache-dir install --requirement requirements.txt && \
    rm requirements.txt

# Create user "docker"
RUN useradd -m docker && \
    cp /root/.bashrc /home/docker/ && \
    mkdir /home/docker/data && \
    chown -R --from=root docker /home/docker

# Use C.UTF-8 locale to avoid issues with ASCII encoding
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

ENV HOME=/home/docker
WORKDIR ${HOME}/data
USER docker

docker build -f Dockerfile.arg-py3 --build-arg BASE_IMAGE=python:3.8 -t arg-example:latest .

For very complex scripts, or for files that have first been downloaded from some remote location into the build context, COPY offers a straightforward way to bring them into the Docker build.

Using COPY

Write a Bash script that installs the tree utility, use COPY to add it to your image, and then run it during the build.

Solution

touch install_tree.sh

#!/usr/bin/env bash
# install_tree.sh

set -e

apt-get update -y
apt-get install -y tree

# Dockerfile.arg-py3
# Make the base image configurable
ARG BASE_IMAGE=python:3.9-bullseye
FROM ${BASE_IMAGE}

USER root

RUN apt-get -qq -y update && \
    apt-get -qq -y upgrade && \
    apt-get -y autoclean && \
    apt-get -y autoremove && \
    rm -rf /var/lib/apt/lists/*

COPY requirements.txt requirements.txt

RUN python -m pip --no-cache-dir install --upgrade pip setuptools wheel && \
    python -m pip --no-cache-dir install --requirement requirements.txt && \
    rm requirements.txt

COPY install_tree.sh install_tree.sh
RUN bash install_tree.sh && \
    rm install_tree.sh

# Create user "docker"
RUN useradd -m docker && \
    cp /root/.bashrc /home/docker/ && \
    mkdir /home/docker/data && \
    chown -R --from=root docker /home/docker

# Use C.UTF-8 locale to avoid issues with ASCII encoding
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

ENV HOME=/home/docker
WORKDIR ${HOME}/data
USER docker

docker build -f Dockerfile.arg-py3 --build-arg BASE_IMAGE=python:3.8 -t arg-example:latest .
docker run --rm -ti arg-example:latest /bin/bash
cd
which tree
tree

/usr/bin/tree

.
└── data

1 directory, 0 files

Key Points

  • Dockerfiles are text files containing instructions for the Docker engine

  • Docker images are built with docker build

  • Build-time variables can be defined with ARG and set with --build-arg

  • ENV arguments persist into the container runtime

  • Docker images can have multiple tags associated with them

  • Docker images can use COPY to copy files into them during build