Optimizing Docker Images for Maven Projects
I recently wrote a post on optimizing Docker images for Rust projects, which turned out to be quite popular, so I figured I'd follow it up with another for those Java developers out there who have to painfully sit through slow Maven builds inside Docker. Just like the other post, this is derived from a project at work where I quickly tired of sitting through long builds and attempted to optimize both the size and speed of the build.
Base Dockerfile
To begin with, we'll take a really basic Dockerfile that a developer might use to ship a Maven project. For my example project I'm going to use a work project (so I can't share too much about it), but the project is relatively large and has several dependencies (which all have their own wonderful tree of dependencies). For the sake of simplicity I'm going to work against JDK 8 for the time being. A very basic (yet functional) Dockerfile for this project could look something like this:
# select image
FROM maven:3.5-jdk-8
# copy your source tree
COPY ./ ./
# build for release
RUN mvn package
# set the startup command to run your binary
CMD ["java", "-jar", "./target/my-project.jar"]
It's quite straightforward and works well; your project is quickly copied across and packaged using Maven. Note that I'm not going to cover disabling things like tests in the lifecycle as it's different for everyone, but you may wish to do so (in my case using -DskipTests). Let's look at how a build like this fares (assuming you have the base image maven:3.5-jdk-8 already downloaded):
Maven build time: 123s
Total time taken: 134s
Final image size: 794MB
That's not too bad really: a build of ~2 minutes and an image size of just under 800MB. However, there's definite room for improvement here, and now we have a baseline to compare against! Let's see what improvements we can make.
Optimizing Build Times
There are several reasons why builds done the way shown above are so slow; the first is that we're not taking advantage of the Docker cache. The Docker cache is easily the most valuable tool when it comes to speeding up Docker builds, so we need to make sure we play nice with it. In our case, we're invalidating the Docker cache every time we copy across our working directory. If anything at all in that directory has changed, the COPY layer and every layer after it (including the Maven step that downloads all of our dependencies) has to be rebuilt from scratch! Definitely not what we want.
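To make the invalidation concrete, here is the base Dockerfile from above again, annotated with what Docker can reuse on a rebuild after a single source file changes. The annotations are a sketch of Docker's general layer-caching behaviour rather than output from a real build:

```dockerfile
# reused: the base image is already downloaded and has not changed
FROM maven:3.5-jdk-8
# cache miss: one changed file anywhere in the copied tree changes this layer
COPY ./ ./
# rebuilt: every step after a cache miss runs again, so Maven
# re-downloads all dependencies and repackages the project
RUN mvn package
# rebuilt as well (cheap, as it only records metadata)
CMD ["java", "-jar", "./target/my-project.jar"]
```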
Our next step is to avoid that cache invalidation. This can be done by copying your project components in stages, in such a way that the most frequently changed files are copied across last. On its own this wouldn't really help, because splitting up the copy gains nothing unless an expensive step sits between the stages. Fortunately for us, Maven allows you to pull in dependencies before actually building anything. We can use this to trick the Docker cache into holding onto our Maven dependencies:
# select image
FROM maven:3.5-jdk-8
# copy the project files
COPY ./pom.xml ./pom.xml
# build all dependencies for offline use
RUN mvn dependency:go-offline -B
# copy your other files
COPY ./src ./src
# build for release
RUN mvn package
# set the startup command to run your binary
CMD ["java", "-jar", "./target/my-project.jar"]
The Dockerfile above is optimized in such a way that most of your dependencies will be cached until the next time you change your pom.xml - and as that changes infrequently, you'll often take advantage of this cache. Let's run a build and see if this changes anything:
Maven build time: 33.6s
Total time taken: 134s
Final image size: 794MB
Very little has changed, aside from the Maven build time (as it's now separate to the download of dependencies). The total build time is roughly the same. This is expected though, because this is the first build with our new Dockerfile, and so nothing was cached. If we run it again, we should see an improvement (but make sure to change a file to correctly simulate changing source code):
Maven build time: 31.4s
Total time taken: 36.7s
Final image size: 794MB
Ok, now we see some actual improvement. The build time is cut by approx. 70% due to hitting the cache. This is probably near the optimal build time for this project that I can gain at this point (at least, it's where I've gotten to so far), due to most of it being related to shading.
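If your Docker version has BuildKit enabled, a cache mount is another way to keep dependencies around between builds, and unlike the layer cache it survives changes to pom.xml. This isn't the approach measured in this post, so treat it as an untested sketch; it assumes BuildKit is available and that Maven's local repository lives under /root/.m2 (the default when Maven runs as root):

```dockerfile
# syntax=docker/dockerfile:1
# select image
FROM maven:3.5-jdk-8
# copy the project files and sources
COPY ./pom.xml ./pom.xml
COPY ./src ./src
# build for release; the cache mount keeps /root/.m2 between builds,
# outside of the image layers (assumption: BuildKit is enabled)
RUN --mount=type=cache,target=/root/.m2 mvn package -B
# set the startup command to run your binary
CMD ["java", "-jar", "./target/my-project.jar"]
```

The trade-off is that the cache lives on the build host rather than in a layer, so a fresh machine (or a CI runner without persistent storage) still pays for the full download.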
Optimizing Build Sizes
Even with the speed increase, the image is still far too large. There are several options here; an easy one being to just flip the base image to maven:3.5-jdk-8-alpine for around a 60% reduction, but the result is still around 336MB.
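As a sketch, that swap is a one-line change to the cached Dockerfile from the previous section; everything after the FROM line stays as it was:

```dockerfile
# select the Alpine-based variant of the same Maven/JDK image
FROM maven:3.5-jdk-8-alpine
# copy the project files
COPY ./pom.xml ./pom.xml
# build all dependencies for offline use
RUN mvn dependency:go-offline -B
# copy your other files
COPY ./src ./src
# build for release
RUN mvn package
# set the startup command to run your binary
CMD ["java", "-jar", "./target/my-project.jar"]
```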
A better approach is to use a multi-stage Docker build, and only actually use a JDK based image for the build stages. For the actual runtime, you only require a JRE compatible with the JDK version you built with. This is fairly easy to change, as follows:
# our base build image
FROM maven:3.5-jdk-8 as maven
# copy the project files
COPY ./pom.xml ./pom.xml
# build all dependencies
RUN mvn dependency:go-offline -B
# copy your other files
COPY ./src ./src
# build for release
RUN mvn package
# our final base image
FROM openjdk:8u171-jre-alpine
# set deployment directory
WORKDIR /my-project
# copy over the built artifact from the maven image, giving it a fixed name
# (assumes the wildcard matches exactly one jar)
COPY --from=maven target/my-project-*.jar ./my-project.jar
# set the startup command to run your binary
CMD ["java", "-jar", "./my-project.jar"]
The start of the Dockerfile is basically the same; the only difference is that we add another FROM instruction to define our final image, and then copy across the built my-project-*.jar file, which we can then run in an image which contains just a JRE. We select the -alpine variant because its base is even smaller, to optimize our sizes further. This leaves us with images which are much smaller:
REPOSITORY    TAG     IMAGE ID        CREATED           SIZE
my-project    dev     a5f008918590    15 seconds ago    89MB
my-project    base    a3aecaaa8c8c    40 minutes ago    794MB
This is about as far as we can go; the base JRE image is 83MB in size, with the built artifact being 6MB in size.
Optimization Results
Our first Dockerfile may have been straightforward, but it didn't make good use of developer time, or space on a drive. Below is a final comparison of the base and final build times/sizes:
# base statistics
Maven build time: 123s
Total time taken: 134s
Final image size: 794MB
# optimized statistics
Maven build time: 31s (~25.2%)
Total time taken: 36s (~26.9%)
Final image size: 89MB (~11.2%)
This is a pretty solid improvement in both speed and size; we still have the Maven build of our actual source code taking a fair amount of time, but at least the dependencies are cached properly. The image size is much better, at around 1/10 of the original size, without losing any relevant functionality.
Just like the last post, we're not building anything new here; we're just making good use of the existing Docker tooling. This blog post took around an hour to work through, so it goes to show that it's worth it (for reference, at roughly 97 seconds saved per build, that hour is paid back after around 40 builds).
Please reach out if you have any questions, or if anything needs clarification. If you have build improvements, please also reach out - although I think the remaining headroom is probably quite small, given that the image has to include a JRE.