Determining the Docker registries, namespaces and images you most depend on

How news of the Docker Free Tier being sunset in March 2023 led to organisations wanting to understand their dependence on different namespaces or images on the public Docker Hub.

(Note: adapted from the blog post Working out which Docker namespaces and images you most depend on)

Related read: Case study: Determining how the Docker Free Tier sunset affects you.

Context

Similar to the situation noted in Case study: Determining how the Docker Free Tier sunset affects you, every company I've worked at has at one point wondered "who's using Docker images that aren't internally hosted?"

It can be useful to understand where you use Docker images from external sources, for instance Amazon's public Elastic Container Registry (ECR), or images produced by GitHub or GitLab repositories and stored on their respective container registries, as well as where you use images from your various internal container registries.

It can also be convenient to know whether there are any namespaces you use heavily, for instance if internal.registry/java is now deprecated and you want to move folks to internal.registry/jvm.

It is also worth seeing whether there are Docker images that are heavily depended on, especially if they're not internally managed, as this could be a good opportunity to build an internal alternative.

Additionally, as part of these checks, you can discover if there are uses of non-approved images, for instance if there is an (unenforced) requirement for production services to use internally-hosted-or-proxied images.

Problem

Data

Let's say that we have the following data in the renovate table:

| platform | organisation | repo | package_name | version | current_version | package_manager | package_file_path | datasource | dep_types |
|---|---|---|---|---|---|---|---|---|---|
| gitlab | technottingham | Hack24-API | mongo | 3.4.3 | 3.4.3 | docker-compose | docker-compose.yml | docker | [] |
| gitlab | jamietanna | annadodson | monachus/hugo | | | gitlabci | .gitlab-ci.yml | docker | ["image"] |
| github | co-cddo | api-catalogue | ruby | 3.3.0-alpine | 3.3.0-alpine | dockerfile | Dockerfile | docker | ["final"] |
| github | elastic | beats | busybox | | | docker-compose | .ci/jobs/docker-compose.yml | docker | [] |
| github | incident-io | catalog-importer | alpine | 20230329 | 20230329 | dockerfile | Dockerfile | docker | ["final"] |
| github | thechangelog | changelog.com | ghcr.io/thechangelog/changelog-runtime | elixir-v1.14.5-erlang-v26.2-nodejs-v20.10.0 | | docker-compose | .devcontainer/docker-compose.yml | docker | [] |
| github | cloud-custodian | cloud-custodian | cloudcustodian/c7n | latest | | helm-values | tools/ops/azure/container-host/chart/values.yaml | docker | [] |
| github | hashicorp | consul | docker.mirror.hashicorp.services/alpine | 3.18 | 3.18 | dockerfile | Dockerfile | docker | ["stage"] |
| gitlab | jamietanna | content-negotiation | openjdk | 11 | 11 | gitlabci | .gitlab-ci.yml | docker | ["image"] |
| gitlab | jamietanna | content-negotiation-go | golang | 1.18 | 1.18 | gitlabci | .gitlab-ci.yml | docker | ["image"] |
| gitlab | jamietanna | cucumber-reporting-plugin | openjdk | 11 | 11 | gitlabci | .gitlab-ci.yml | docker | ["image"] |
| github | wiremock | wiremock-graphql-extension | maven | 3.6.3-jdk-11-slim | 3.6.3-jdk-11-slim | dockerfile | wiremock-graphql-extension/Dockerfile | docker | ["stage"] |

(Note: this is a subset of the available data)

Query

The dmd CLI has an inbuilt query that produces the following output:

$ dmd report mostPopularDockerImages --db dmd.db
Renovate
+----------------------------------+-----+
| REGISTRY                         |   # |
+----------------------------------+-----+
| docker.io                        | 651 |
| ghcr.io                          |  24 |
| docker.mirror.hashicorp.services |  20 |
| gcr.io                           |  19 |
| docker.elastic.co                |  17 |
| public.ecr.aws                   |  12 |
| registry1.dsop.io                |  11 |
| mcr.microsoft.com                |   8 |
| registry.gitlab.com              |   8 |
| registry.access.redhat.com       |   3 |
| quay.io                          |   3 |
| registry1.dso.mil                |   2 |
+----------------------------------+-----+
+----------------------------------+-----+
| NAMESPACE                        |   # |
+----------------------------------+-----+
| library                          | 499 |
| ghcr.io/gravitational            |  18 |
| dockersamples                    |  12 |
| gcr.io/distroless                |  11 |
| public.ecr.aws/gravitational     |  10 |
| registry1.dsop.io/redhat/ubi     |  10 |
| docker.mirror.hashicorp.services |  10 |
| wiremock                         |   8 |
| docker                           |   8 |
| docker.elastic.co/elasticsearch  |   7 |
| cimg                             |   6 |
| hashicorpdev                     |   6 |
+----------------------------------+-----+
+---------+----+
| IMAGE   |  # |
+---------+----+
| alpine  | 57 |
| golang  | 53 |
| node    | 40 |
| docker  | 38 |
| python  | 25 |
| nginx   | 24 |
| ruby    | 23 |
| debian  | 22 |
| ubuntu  | 22 |
| redis   | 20 |
| openjdk | 18 |
| busybox | 14 |
+---------+----+

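Note that this isn't straightforward to do with an SQL statement on its own: each package_name first has to be split into its registry, namespace and image, with defaults such as docker.io and library filled in for short names like alpine, before the results can be counted.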
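To illustrate the kind of parsing involved, here is a minimal sketch in Python. It is not how dmd implements the report, and the function names split_image and count_usage are hypothetical. It assumes the usual Docker convention that the first path component is a registry only if it contains a "." or ":" or is "localhost", and that bare names live under docker.io/library. Qualifying non-Docker Hub namespaces with their registry is an assumption made to resemble the report output above, as is counting images by their final path component.

```python
from collections import Counter

DEFAULT_REGISTRY = "docker.io"
DEFAULT_NAMESPACE = "library"


def split_image(package_name):
    """Split an image name into (registry, namespace, image)."""
    # Drop any digest, then any tag. Only a colon after the last slash
    # is a tag separator; an earlier colon is a registry port.
    name = package_name.split("@", 1)[0]
    colon = name.find(":", name.rfind("/") + 1)
    if colon != -1:
        name = name[:colon]

    parts = name.split("/")
    first = parts[0]
    if len(parts) > 1 and ("." in first or ":" in first or first == "localhost"):
        registry, rest = first, parts[1:]
    else:
        registry, rest = DEFAULT_REGISTRY, parts

    image = rest[-1]
    namespace = "/".join(rest[:-1])
    if not namespace and registry == DEFAULT_REGISTRY:
        namespace = DEFAULT_NAMESPACE
    return registry, namespace, image


def count_usage(package_names):
    """Count how often each registry, namespace and image appears."""
    registries, namespaces, images = Counter(), Counter(), Counter()
    for package_name in package_names:
        registry, namespace, image = split_image(package_name)
        registries[registry] += 1
        if registry == DEFAULT_REGISTRY:
            namespace_key = namespace
        else:
            # Assumption: prefix the registry so namespaces from
            # different registries are not merged together.
            namespace_key = "/".join(p for p in (registry, namespace) if p)
        namespaces[namespace_key] += 1
        images[image] += 1
    return registries, namespaces, images


# The package_name values from the sample data above.
package_names = [
    "mongo",
    "monachus/hugo",
    "ruby",
    "busybox",
    "alpine",
    "ghcr.io/thechangelog/changelog-runtime",
    "cloudcustodian/c7n",
    "docker.mirror.hashicorp.services/alpine",
    "openjdk",
    "golang",
    "openjdk",
    "maven",
]

registries, namespaces, images = count_usage(package_names)
for title, counter in (("REGISTRY", registries), ("NAMESPACE", namespaces), ("IMAGE", images)):
    print(title)
    for key, count in counter.most_common():
        print(f"  {key}: {count}")
```

In practice the package_name values would come from the renovate table (for rows where datasource is docker) rather than a hard-coded list, but the splitting and defaulting logic is the part that is awkward to express in SQL alone.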