Behind the Scenes at Jungle Disk - Docker Multi-Stage Builds

By Ryan Veazey
May 1, 2017

Background

Docker is a containerization platform that allows you to wrap up all the parts needed to run a piece of software on a system. These containers are described in a simple, declarative way that can also serve as requirement documentation.

One of the advantages of container images over something like a virtual machine is how much more efficiently they use resources; one server (or VM) can run many containers. However, some platforms need a complicated or heavy build toolchain to build the software, but not to run it.

The Problem

For example, the base container images for Ruby or Go can be several hundred MB or more. There are often alternative tags of these images with smaller sizes, usually based on Alpine Linux, a minimal distribution. But for compiled languages, or whenever the result is some sort of build artifact, the build tools aren’t needed at runtime at all.

One common solution has been to have one container for building artifacts, which are exposed on the host machine and later picked up in the build of a second, different container. While this works, it requires yet another step and another dependency on the machine running Docker. It also makes it more difficult to use systems that automatically build Docker images from source repositories like GitHub.

The current edge release of Docker includes support for a solution called multi-stage builds. In essence, it’s not much different from the idea of copying files from one container into another, but it’s now built into the system.
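To make the idea concrete for a compiled language, here is a minimal sketch of a multi-stage Dockerfile for a hypothetical Go program. The project layout, the stage name builder, the image tags, and the /app output path are assumptions chosen for illustration, not something we use ourselves.

```dockerfile
# Stage 1: compile with the full Go toolchain (a large image).
FROM golang:1.8 AS builder
# Assumed layout: a single main package at the root of the build context.
WORKDIR /go/src/app
COPY . .
# Disable cgo so the binary is statically linked and runs on Alpine.
RUN CGO_ENABLED=0 go build -o /app .

# Stage 2: the final image holds only the compiled binary.
FROM alpine:3.5
COPY --from=builder /app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```

Only the last stage ends up in the tagged image, so the Go toolchain never ships with the application.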

Jungle Disk Site

If you’ve been reading technical posts on the Jungle Disk Blog, you may have seen it mentioned that we use Jekyll to build our retail website and blog. Jekyll is a tool that can build complicated static websites from templates and content written in HTML, Markdown and other formats. It’s a great way to speed up a website, since all the processing happens when the site is built rather than when someone visits it. We then host the content with Nginx running in Docker containers (since we do need some routing and redirecting logic), which sit behind a content delivery network.

As the site has grown, we’ve had to add multiple tools to the build process. We need an environment with Ruby, of course, but we also need a JavaScript runtime, such as Node.js, to minify and compress some of our resources. Some of the Ruby gems may also need to build extensions in C, which requires a C build toolchain. Then, of course, we need Nginx. These can add up to a pretty bloated image.

We started out just putting all the requirements in one container, to make it as easy as possible for everyone to contribute to the site.
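For contrast, a single-image setup along those lines might look roughly like the sketch below. This is a hypothetical reconstruction rather than our original Dockerfile: the /site directory, the Debian nginx package, and an nginx.conf that serves /site/_site are all assumptions.

```dockerfile
# Hypothetical single-stage layout: build tools and web server share one image.
FROM ruby:2.4.1
RUN apt-get update && apt-get install -y build-essential nodejs nginx
WORKDIR /site
COPY Gemfile Gemfile.lock ./
RUN gem install bundler && bundle install
COPY . ./
RUN JEKYLL_ENV=production bundle exec jekyll build
# Assumption: this nginx.conf defines a server block rooted at /site/_site.
# The Debian package's default site is removed so it does not shadow ours.
RUN rm -f /etc/nginx/sites-enabled/default
COPY nginx.conf /etc/nginx/conf.d/default.conf
CMD ["nginx", "-g", "daemon off;"]
```

Everything installed to build the site (Ruby, the gems, Node.js, the C compiler) stays in the image that ends up serving it.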

Now, with multi-stage builds, we start from the official Ruby image to build the site, then copy the result into a fresh Nginx image.

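# Stage 1: build the static site with the full Ruby and Node toolchain
FROM ruby:2.4.1 as build-env
# Build in a dedicated directory rather than the filesystem root
WORKDIR /site
RUN apt-get update && apt-get install -y build-essential ruby-dev nodejs
RUN gem install bundler
# Copy the Gemfiles first so the bundle install layer stays cached until they change
COPY Gemfile Gemfile.lock ./
RUN bundle install
COPY . ./
RUN JEKYLL_ENV=production bundle exec jekyll build --verbose

# Stage 2: the image we ship contains only Nginx and the generated site
FROM nginx:1.13.0-alpine
COPY --from=build-env /site/_site _site
COPY nginx.conf /etc/nginx/conf.d/default.conf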

After we name the first stage with as build-env, the rest of that stage is written as usual. Then, when we want to create our actual web server image, we simply add another FROM instruction to start from a fresh Nginx image. In the COPY instruction, we use --from=build-env to specify that we’re copying from the file system of the build-env stage rather than from the build context on the host.
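Naming a stage is optional. Stages can also be referenced by their zero-based position in the Dockerfile, as in this minimal sketch. The project here is hypothetical (a plain Jekyll site with a Gemfile and no Node.js step), and the /site path is an assumption; /usr/share/nginx/html is the default document root of the official Nginx image.

```dockerfile
# Stage 0: unnamed, so later stages can only refer to it by index.
FROM ruby:2.4.1
WORKDIR /site
COPY . ./
RUN bundle install && JEKYLL_ENV=production bundle exec jekyll build

# Stage 1: copy the generated site out of stage 0.
FROM nginx:1.13.0-alpine
COPY --from=0 /site/_site /usr/share/nginx/html
```

Named stages tend to be easier to maintain, because an index silently points at a different stage if the stages are ever reordered.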

Now we still have a single Dockerfile, with no new dependencies, but we’ve shrunk the image from around 670 MB to a little over 100 MB.

Getting Multi-Stage Build Support

If you’re interested in using multi-stage builds, you’ll need Docker version 17.05.0 or later, which, as of this writing, has not yet been released as stable but is available as an edge release. The resulting images shouldn’t have this dependency, so you only need to upgrade your local and CI machines.