Fixing DIND Builds That Stall When Using GitLab and Kubernetes

Under some conditions, you may find that your Docker in Docker (DIND) builds hang or stall, especially when you combine DIND-based builds with Kubernetes. The fix isn't always obvious because the problem doesn't announce itself. After a bit of searching, I came across a post that describes the issue in great detail at https://medium.com/@liejuntao001/fix-docker-in-docker-network-issue-in-kubernetes-cc18c229d9e5.

As described there, the issue comes down to the MTU the DIND service uses when it starts. By default, it uses 1500. Unfortunately, many Kubernetes overlay networks set a smaller MTU, often around 1450. Because DIND runs as a service on the overlay network, it needs an MTU equal to or smaller than the overlay network's in order to work properly. If your build sends or receives packets larger than the overlay's Maximum Transmission Unit, for example while downloading a large file, those packets can be dropped and the build waits indefinitely for data that never arrives. This is because DIND, and the app using it, thinks the MTU is 1500 when it is actually 1450.
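If you want to check what MTU your pods actually get before changing anything, a throwaway job along these lines can print it. This is only a sketch: the job name and the alpine image are my own illustrative choices, and it assumes the job container sees the pod's network interfaces under /sys/class/net, which is the usual case on Linux.

```yaml
# Hypothetical diagnostic job; the job name and image are illustrative.
Check Pod MTU:
  image: alpine
  script:
    # Print the MTU of every network interface visible in the job container.
    - |
      for iface in /sys/class/net/*; do
        echo "$(basename "$iface") mtu=$(cat "$iface/mtu")"
      done
```

If the pod's main interface (often eth0) reports something below 1500, the value you give DIND should be equal to or smaller than that number.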

Anyway, this post isn't about what MTU is or how it works. It's about how to configure a GitLab job that uses the DIND service so that it runs with a smaller MTU. Thankfully, it's easy to do.

In your .gitlab-ci.yml file, where you enable the dind service, add a command entry so that GitLab passes the MTU flag to the Docker daemon, like this:

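Build Image:
  image: docker
  services:
    - name: docker:dind
      # Arguments passed to the Docker daemon inside the dind service.
      # Keep the MTU equal to or smaller than your overlay network's MTU.
      command: ["--mtu=1000"]
  variables:
    DOCKER_DRIVER: overlay2
    # An empty value disables TLS, so the daemon listens on plain TCP port 2375.
    DOCKER_TLS_CERTDIR: ""
    DOCKER_HOST: tcp://localhost:2375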

This example works if you are using a Kubernetes-based GitLab Runner. With this added, you should find that your build stalls go away and everything works as expected.
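To confirm the daemon picked up the setting, one option is to ask it for the MTU recorded on its default bridge network. The job below is a sketch, not something from the original setup: the job name is made up, the flag is written in the `--mtu=1000` form, and it assumes the daemon exposes the value under the `com.docker.network.driver.mtu` option of the `bridge` network, which is how Docker normally reports it.

```yaml
# Hypothetical verification job; the job name is illustrative.
Verify DIND MTU:
  image: docker
  services:
    - name: docker:dind
      command: ["--mtu=1000"]
  variables:
    DOCKER_TLS_CERTDIR: ""
    DOCKER_HOST: tcp://localhost:2375
  script:
    # Print the MTU option recorded on the daemon's default bridge network.
    - |
      docker network inspect bridge --format '{{ index .Options "com.docker.network.driver.mtu" }}'
```

If the printed value matches what you passed to the service, the daemon is using the smaller MTU for containers on its default bridge.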