Using Podman and Docker Health Checks to Prevent CI Pipeline Failures

I recently experimented with Continuous Integration (CI) pipelines in GitLab, and I needed a way to detect the availability of an Oracle AI Database 26ai Free instance running in Podman. Unless the database instance is available, any connection attempt will fail, causing the CI pipeline to abort unnecessarily.

This article is based on Gerald Venzl’s container images. The images’ documentation has instructions on how to use them in your projects.

The general idea is to repeatedly query podman (or docker) inspect until the status of the container reports healthy. There’s a caveat: you have to provide a healthcheck script, or else the container engine has no way of knowing whether the database is up. The usage of said healthcheck script is documented for Gerald Venzl’s image. Note that this isn’t the same image as the one available from container-registry.oracle.com!

Prerequisites

The following stack was used for this article:

  • Podman (I used version 5.6.0 as shipped by Oracle Linux 10)
  • Optionally Docker
  • Gerald Venzl’s Oracle Database Free image
  • jq (optional but highly recommended)

Let’s look at the details.

Starting the container

Apart from the usual {podman,docker} run options, you need to supply a few additional ones. There are lots of potential candidates (just run podman run --help | grep health for the full list).

The following are the ones that worked for me. I use Oracle Linux 10, which shipped podman 5.6.0 at the time of writing. Here’s my podman run command:

podman run --detach \
--name "oraclefree" \
--publish 1521:1521 \
--env ORACLE_PASSWORD="$ORACLE_PASSWORD" \
--env APP_USER="$APP_USER" \
--env APP_USER_PASSWORD="$APP_USER_PASSWORD" \
--healthcheck-command="CMD-SHELL /opt/oracle/healthcheck.sh" \
--healthcheck-interval="10s" \
--health-retries=5 \
ghcr.io/gvenzl/oracle-free:23.26.2

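Almost the same command works for Docker: replace podman with docker and use Docker’s spelling of the health options (--health-cmd and --health-interval instead of --healthcheck-command and --healthcheck-interval), and you are off to the races.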
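To avoid maintaining two variants, the run command can be written once with Docker’s option names. The following is only a sketch: the helper name start_oracle_free and its engine argument are made up for illustration, and it assumes that your Podman version accepts --health-cmd, --health-interval and --health-retries and that both engines take a plain command string (without the CMD-SHELL prefix) for --health-cmd. Verify both against your engine’s --help output.

```shell
# Hypothetical helper: start the database with either engine.
# Usage: start_oracle_free podman   (or: start_oracle_free docker)
start_oracle_free() {
    engine="${1:-podman}"

    # --health-cmd/--health-interval/--health-retries are Docker's option
    # names; assumption: the installed Podman version accepts them as well.
    "$engine" run --detach \
        --name "oraclefree" \
        --publish 1521:1521 \
        --env ORACLE_PASSWORD="$ORACLE_PASSWORD" \
        --env APP_USER="$APP_USER" \
        --env APP_USER_PASSWORD="$APP_USER_PASSWORD" \
        --health-cmd "/opt/oracle/healthcheck.sh" \
        --health-interval "10s" \
        --health-retries 5 \
        ghcr.io/gvenzl/oracle-free:23.26.2
}
```

Passing the engine as an argument keeps the rest of the pipeline identical regardless of which engine the runner provides.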

Querying the database’s health

So far so good, now it’s time to check the database’s health. A container reaching the running state only means the container’s entrypoint process has started. For databases such as Oracle, additional initialization work may still be in progress before client connections can be accepted.

Later steps in the CI pipeline require a database connection. While the database doesn’t accept connections (mainly during the bootstrap phase), the app may fail and not restart, and the entire deployment goes sideways. Here’s a snippet I added to the pipeline to make sure the database is up before proceeding.

# determine whether a healthcheck has been configured
has_healthcheck="$(podman inspect oraclefree | jq -r '.[0].Config.Healthcheck != null')"
# wait for 10 times 15 seconds for the database to start
for attempt in $(seq 1 10); do
    status="$(podman inspect oraclefree | jq -r '.[0].State.Health.Status // .[0].State.Status')"
    echo "attempt ${attempt}: ${status}"
    if [ "$status" = "healthy" ]; then
        # health is reported by the healthcheck command and the database is now up
        break
    fi
    if [ "$status" = "exited" ] || [ "$status" = "dead" ]; then
        # if everything failed, print the logs
        podman logs oraclefree
        exit 1
    fi
    # if a healthcheck exists, healthy is required before the timeout expires
    # if no healthcheck command was provided all you can do is hope the database
    # is up after the 10th attempt
    if [ "$has_healthcheck" = "true" ] && [ "$attempt" -eq 10 ]; then
        echo "Container failed to become healthy within the timeout period"
        podman logs oraclefree
        exit 2
    fi
    sleep 15
done

This works like a dream:

attempt 1: starting
attempt 2: starting
attempt 3: starting
attempt 4: unhealthy
attempt 5: unhealthy
attempt 6: healthy

Depending on startup timing, the health check may temporarily report unhealthy while the database is still initializing. As long as the status eventually transitions to healthy, this is expected.

It also works when no healthcheck command has been provided, except that it waits the full 10 x 15 seconds. In the absence of a healthcheck command, the JSON output doesn’t contain the Health object and the jq expression falls back to the generic Status. Unfortunately, this status tells you nothing about what is going on inside the container.
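The fallback is easy to try without a running container. The sketch below feeds two hand-written, heavily trimmed JSON documents (not real inspect output) through the same jq expression; extract_status is a hypothetical helper name.

```shell
# Hypothetical helper: read "inspect" JSON on stdin and print the health
# status, or the generic container status if there is no Health object.
extract_status() {
    jq -r '.[0].State.Health.Status // .[0].State.Status'
}

# Hand-written sample documents, reduced to the fields the expression reads
with_health='[{"State":{"Status":"running","Health":{"Status":"starting"}}}]'
without_health='[{"State":{"Status":"running"}}]'

# Usage:
#   printf '%s' "$with_health" | extract_status      # prints: starting
#   printf '%s' "$without_health" | extract_status   # prints: running
```

jq’s // operator returns its right-hand side only when the left-hand side is null or false, which is why a missing Health object leads to the generic status.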

If you don’t have jq available, the following command is your second-best choice. Replace the status assignment inside the polling loop with this:

status="$(podman inspect --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}{{.State.Status}}{{end}}' oraclefree)"
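The has_healthcheck assignment at the top of the polling snippet relies on jq as well. A jq-free equivalent might look like the following sketch; has_healthcheck_nojq is a hypothetical helper name, and the template assumes that .Config.Healthcheck is exposed to Go templates under the same name as in the JSON output.

```shell
# Hypothetical helper: print "true" if the container has a healthcheck
# configured, "false" otherwise, without relying on jq.
has_healthcheck_nojq() {
    podman inspect \
        --format '{{if .Config.Healthcheck}}true{{else}}false{{end}}' \
        "$1"
}

# In the polling snippet this would replace the jq-based assignment:
#   has_healthcheck="$(has_healthcheck_nojq oraclefree)"
```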

Another option is to use podman’s native waiting functionality (podman wait --condition=healthy) where available, but querying inspect works consistently in both Podman and Docker and is easy to integrate into shell scripts. The wait command doesn’t offer a timeout yet either, so you have to implement one yourself.
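If you prefer podman wait, one way to add the missing timeout is to wrap the call in timeout from GNU coreutils. This is a minimal sketch, assuming timeout is installed on the runner and that your Podman version supports the healthy condition; wait_until_healthy is a hypothetical helper name.

```shell
# Hypothetical helper: block until the container is healthy, but give up
# after a time limit. Usage: wait_until_healthy <container> <seconds>
wait_until_healthy() {
    container="$1"
    limit_seconds="$2"

    rc=0
    # timeout(1) exits with status 124 if the time limit was reached
    timeout "$limit_seconds" \
        podman wait --condition=healthy "$container" >/dev/null || rc=$?

    if [ "$rc" -eq 124 ]; then
        echo "Container ${container} did not become healthy within ${limit_seconds}s" >&2
        return 2
    fi
    return "$rc"
}

# Usage in a pipeline step:
#   wait_until_healthy oraclefree 150 || { podman logs oraclefree; exit 2; }
```

Unlike the polling loop, this variant prints no progress messages, so keeping the podman logs call in the failure branch is useful for debugging.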

Summary

Sometimes your CI pipeline must wait for a database to be up and running. With a healthcheck command and its related options, the container engine records the health status in the podman inspect output, where your pipeline can query it.

By the way, the same thing is possible in compose: have a look at my GitHub for a working example where Oracle REST Data Services (ORDS) waits for the database to be up and running.