🔗Revisiting The Basics
In my earlier post, Getting Started with Docker, I covered building a basic Dockerfile using the FROM, COPY, RUN, and CMD instructions and how to use a .dockerignore file to keep unnecessary files out of your images and containers. If you haven't read that post, go check it out to learn the basics of building Docker images. In this post, I'll cover some more advanced techniques for building container images. In addition, I recently published a post exploring advanced Docker CLI usage. I recommend giving it a read, too, if you aren't already a CLI pro.
Starting FROM an official image for your language or framework will get you a long way, but many applications require a system dependency that isn't included in the FROM image. For example, many applications use ImageMagick to process image uploads, but it isn't included by default in the Debian images that most language images are based on. You can use apt-get to install missing dependencies.
    FROM node:15

    RUN apt-get update
    RUN apt-get install -y imagemagick

    WORKDIR /usr/src/app

    COPY package*.json ./
    RUN npm install

    COPY . .

    EXPOSE 3000

    # Use the start script defined in package.json to start the application
    CMD ["npm", "start"]
We started the Dockerfile just like the example from my earlier post, using the official Node.js 15 image, but then we added two steps to install ImageMagick with apt-get. To keep the base image small, the Debian image does not ship with the package index that apt-get needs in order to find and install packages, so we run apt-get update first to download it. Then we use apt-get install -y imagemagick to install ImageMagick. The -y option automatically answers "yes" when apt-get prompts you to confirm the installation.
🔗RUN vs CMD (vs ENTRYPOINT)
By now you've probably noticed that there are two different instructions that run commands in your containers, RUN and CMD. While both run commands, they're used in very different contexts. As we've seen in previous examples, RUN is used exclusively during the build to run commands that modify the image as needed. CMD is different because it specifies the command that the container runs when you launch it with docker run. You can have as many RUN instructions as you need, but only one CMD takes effect; if a Dockerfile contains several, the last one wins. If you need to run a different command at runtime, you can pass it as an argument when you launch the container with docker run (check out my Docker CLI Deep Dive post).
Additionally, Docker provides the ENTRYPOINT instruction. When an ENTRYPOINT is set, it is the command that runs, and whatever you provide to CMD is passed to it as arguments. If you do not provide an ENTRYPOINT, the CMD itself is the command that runs. A CMD written as a plain string is executed through /bin/sh -c, which runs it in a basic Unix shell environment. Leaving ENTRYPOINT unset will satisfy most use cases. You can override a container's CMD at runtime simply by passing arguments to docker run; overriding its ENTRYPOINT is also possible, but requires the explicit --entrypoint flag. Docker's own ENTRYPOINT documentation goes into more detail about how it can be used.
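As a rough sketch of how ENTRYPOINT and CMD combine, the Dockerfile below sets npm as the ENTRYPOINT and supplies start as the default argument through CMD. It assumes the same Node.js application as the earlier example, with a start script defined in package.json.

```dockerfile
FROM node:15

WORKDIR /usr/src/app
COPY package*.json ./
RUN npm install
COPY . .

# ENTRYPOINT is the executable that runs when the container starts
ENTRYPOINT ["npm"]

# CMD supplies the default arguments appended to ENTRYPOINT,
# so a plain `docker run <image>` executes `npm start`
CMD ["start"]
```

With an image built from this file, docker run my-image runs npm start, while docker run my-image test replaces only the CMD portion and runs npm test. Here my-image is a placeholder tag, and the second command assumes a test script exists in package.json.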
In the example Dockerfile above, you probably noticed that commands are passed to RUN differently than they are to CMD. Typically, you provide commands to RUN using shell syntax, and to CMD (and ENTRYPOINT) using exec syntax, but each instruction accepts either form. With shell syntax, the command is run by a shell, so shell expressions within it are resolved: you can use shell variables, operators like output pipes (|) and redirects (>>), and boolean operators (&& and ||) to join commands. Exec syntax is much more straightforward. Each string within the bracketed array is passed to the executable as a separate argument, and the command runs exactly as provided, with no shell processing.
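To make the difference between the two syntaxes concrete, here is a small illustrative Dockerfile. The file path /tmp/build-info.txt and the echoed text are arbitrary choices for the demonstration, not part of the original example.

```dockerfile
FROM node:15

# Shell form: the string is handed to /bin/sh -c, so the shell expands
# $HOME and interprets the >> redirect and the && operator
RUN echo "home is $HOME" >> /tmp/build-info.txt && cat /tmp/build-info.txt

# Exec form: a JSON array whose elements are passed straight to the
# executable; no shell is involved, so "$HOME" is printed literally
RUN ["echo", "home is $HOME"]

# Exec form is the usual choice for the runtime command
CMD ["node", "--version"]
```

Note that exec form is parsed as a JSON array, so its elements must use double quotes rather than single quotes.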
🔗Layers and Caching
Each instruction in your Dockerfile adds a new layer to your image (strictly, RUN, COPY, and ADD add filesystem layers, while the other instructions only add metadata). For performance reasons, it's considered a best practice to limit the total number of layers that make up your finished image. There are a number of ways to do this. The simplest is to combine lines where RUN or COPY instructions are used in close proximity to each other. Consider the example above where we installed ImageMagick; instead of using two separate RUN instructions, we can combine them using the shell's && operator:
    FROM node:15
    RUN apt-get update && apt-get install -y imagemagick
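For longer package lists, a combined instruction like this is often split across lines with backslashes. The sketch below also deletes the apt package index within the same RUN instruction so that it never ends up in the layer; that cleanup step is not part of the original example and is shown only as a common, optional refinement.

```dockerfile
FROM node:15

# One RUN instruction produces one layer: download the package index,
# install ImageMagick, then delete the index before the layer is saved
RUN apt-get update \
    && apt-get install -y imagemagick \
    && rm -rf /var/lib/apt/lists/*
```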
Combining COPY instructions is a bit easier. The COPY instruction takes two or more arguments: every argument except the last is interpreted as a file to copy, and the last argument is the location to copy those files to. When you list more than one source, the destination must be a directory and end with a slash. You can also use * as a wildcard character, as I did in the first example when copying the package.json and package-lock.json files to the image.
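As a brief illustration, both COPY instructions below copy the two npm manifest files into the working directory. The sketch assumes package.json and package-lock.json sit at the root of the build context and that no other files match package*.json.

```dockerfile
FROM node:15
WORKDIR /usr/src/app

# Two explicit sources followed by the destination directory
# (note the trailing slash on the destination)
COPY package.json package-lock.json ./

# The same copy expressed with a wildcard that matches both files
COPY package*.json ./
```

In a real Dockerfile you would keep only one of the two forms; they are shown together here for comparison.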
Another thing to consider when thinking about how your image layers are composed is caching. When Docker processes your Dockerfile to build your image, it runs each of the instructions in order to create the layers of your image. Before running an instruction, Docker checks its cache to determine whether an identical layer already exists. For RUN instructions, Docker looks for a cached layer that was built on the same parent layer using the exact same command and reuses it instead of rebuilding the layer. For COPY and ADD instructions, it examines the files to be copied and looks for a previously built layer that has the exact same file contents. As soon as any instruction requires its layer to be rebuilt, every instruction after it is rebuilt as well. Optimizing your Dockerfile to take advantage of the layer cache can greatly reduce the time it takes to build your image. Organize your Dockerfile so that the layers least likely to change are processed first (e.g., installing dependencies) and those more likely to change (e.g., copying application code) are processed later.
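Applied to the running example, that ordering might look like the sketch below. The comments describe how often each group of layers is expected to change for a typical Node.js project, which is an assumption about the application rather than something Docker enforces.

```dockerfile
FROM node:15

# Rarely changes: system packages are installed first, so this layer
# is almost always served from the cache
RUN apt-get update && apt-get install -y imagemagick

WORKDIR /usr/src/app

# Changes occasionally: these layers are rebuilt only when the
# package manifests change
COPY package*.json ./
RUN npm install

# Changes often: editing application code invalidates the cache from
# this instruction onward, leaving the layers above intact
COPY . .

EXPOSE 3000
CMD ["npm", "start"]
```

If COPY . . came before RUN npm install, every code change would force the dependencies to be reinstalled, because the cache is invalidated for all instructions that follow a changed layer.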
These techniques will help you create more advanced container images and, hopefully, optimize them. However, I've only covered a small slice of the options available to you when building container images. If you dig deeper into the official Dockerfile reference, you'll find information about all of the available instructions, along with more advanced concepts and use cases.