docker – creating a monitoring stack

I think it is time for the first docker story on our techblog!

Since we started our docker enablement, we were really excited to get our hands on a “real” project.
And we did not have to wait long for it.
It was a project to containerize the monitoring environment of a customer. So let’s give our “monitoring stack” a go and walk through the story and some issues we stumbled upon!

The initial setup

The customer had an existing Nagios monitoring setup in place. Due to an expiring license, he wanted to freshen some things up.
He has multiple locations all over the world which need to be managed centrally.

The idea

The idea was to keep working with the already configured NRPE clients while making use of Docker technology.

The solution

After some initial communication and coordination, we managed to find a solution that fitted the customer.
The monitoring stack would use the following software technologies:

  • icinga (master with multiple slaves)
  • thruk (web front-end)
  • mysql
  • graphite
  • grafana
  • nagvis
  • some easy to use way of generating icinga configurations

As the customer did not use any container technology in production yet, he decided against a (more complex) container orchestration tool (such as Kubernetes) to keep things rather simple for the start.
So we went with a pure Docker swarm setup.

The implementation

As we started with the realization of the project, some very basic structural questions arose. The first of them: how and where do we develop our monitoring stack?
Since the customer did not have his own development environment (yet), we started off by creating our own:

  • Bitbucket – for the code
  • Nexus – as docker image repository
  • a local docker swarm setup for deployment

With this simple setup we could get into development pretty fast and make some quick progress. That brought us right to the next challenge: how do we manage the stack / container initialization?
Our quick and simple development environment, and the fact that we were not using any orchestration tool, had us thinking about the right way to do the initial setup of the containers.
The software used in the monitoring stack depended heavily on the right setup order. To give you a quick breakdown:

  1. setup MySQL
  2. create databases and tables for icinga, thruk and grafana
  3. setup the icinga master node
  4. setup icinga slave nodes
  5. accept icinga slaves on icinga master
  6. load some default configs for icinga
  7. setup graphite, grafana, nagvis
  8. setup “some easy to use way of generating icinga configurations”


The ordering of things

Some of these have no dependencies on other services during setup (graphite, grafana and nagvis, for example). But especially the icinga master / slave setup is a back and forth between both.
So to keep it simple (which is always a good thing), we started creating some bash scripts which would handle the initial setup and ordering for us. These needed to be executed on a docker swarm master node.
Unfortunately, as you may or may not know, docker swarm YAML files are unable to handle ordering in a good way.
There once was an option for configuring basic dependencies between services (“depends_on”), but it is ignored since config version 3 for docker stack deployments (“docker stack” is the command used to deploy a stack in swarm mode).
So we used the “docker-compose” command in our bash scripts to start single services from our YAML file and initialize the data on those services. The workflow would then look like this:

  1. using docker-compose, initialize all services one after the other and run specific commands on the containers (with the persistent volumes being mounted)
  2. stop all services via docker-compose
  3. using docker stack deploy, start the docker stack on the docker swarm, using the now pre-configured persistent volumes containing the necessary data.
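
The three steps above can be sketched as a small script. This is only a sketch, assuming hypothetical service names (mysql, icinga-master, icinga-slave), file names and init commands:

```shell
#!/usr/bin/env bash
# Sketch of the global init script; service names, file names and
# init commands are assumptions, not the project's actual setup.
set -u

COMPOSE_FILE="docker-compose.yml"
STACK_NAME="monitoring"

init_services() {
  # 1. bring up services one after the other and run their init
  #    commands, with the persistent volumes mounted
  docker-compose -f "$COMPOSE_FILE" up -d mysql
  docker-compose -f "$COMPOSE_FILE" exec mysql /init/create-databases.sh
  docker-compose -f "$COMPOSE_FILE" up -d icinga-master
  docker-compose -f "$COMPOSE_FILE" up -d icinga-slave
  # ... accept slaves on the master, load default configs, etc.
}

teardown_services() {
  # 2. stop everything again; the volumes keep the initialized data
  docker-compose -f "$COMPOSE_FILE" down
}

deploy_stack() {
  # 3. deploy to the swarm, reusing the pre-configured volumes
  docker stack deploy -c "$COMPOSE_FILE" "$STACK_NAME"
}

# Only run when explicitly requested, so the functions can be sourced.
if [ "${RUN_INIT:-0}" = "1" ]; then
  init_services
  teardown_services
  deploy_stack
fi
```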

These steps helped especially when deploying a completely new environment (which happened a lot during the early development phase).
If you are now thinking: well, in my opinion that is not a production-ready way of implementing initialization – you are probably right, and I can assure you that we changed it. More on that later.

The creation of something new

After handling the init part, we stumbled upon the question of which images to use for our setup.
We quickly realized that we needed to create our own images for the special needs of this project. We ended up creating custom images for six of the eight services used in the monitoring stack.
During image creation we came to a point where we somewhat worked against the idea of docker containers: we needed to run multiple processes (or rather services) in one container.
Let’s take the icinga master image as an example: we needed to run the icinga process, a cronjob realizing the “some easy way to generate icinga configurations” part, the NRPE process to be able to monitor the container itself, and a postfix service for mail relaying.
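
Such an image could be built roughly like this. This is a sketch only: the base image, package names and paths are assumptions, not the project's actual Dockerfile:

```dockerfile
# Sketch of an icinga master image (Debian-style base assumed).
FROM debian:stable

# icinga itself, NRPE for self-monitoring, postfix for mail relaying,
# cron for the config-generation job, supervisor to manage them all
RUN apt-get update && apt-get install -y --no-install-recommends \
        icinga2 nagios-nrpe-server postfix cron supervisor \
    && rm -rf /var/lib/apt/lists/*

COPY supervisord.conf /etc/supervisor/conf.d/monitoring.conf

# supervisord runs in the foreground as PID 1 and starts the rest
CMD ["/usr/bin/supervisord", "-n", "-c", "/etc/supervisor/supervisord.conf"]
```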

Usually you configure the process you need to run in the foreground of the container, or define some init script which “does some things” and then hands over to the process you need.
Then you have a clean PID 1 of either the process itself or the init script.
So we needed some software to manage all of our processes. We ended up configuring supervisord on the images. Supervisord starts each process with the configured options as a specified user and can, as a bonus, consolidate the log file output into STDOUT – which in turn is readable via the docker engine (“docker logs”).
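
A supervisord config for such a container could look roughly like this (program names, users and paths are assumptions; the stdout forwarding shown is supervisord's documented way of logging to a stream):

```ini
; Sketch of a supervisord config for the icinga master container.
[supervisord]
nodaemon=true                 ; keep supervisord in the foreground (PID 1)

[program:icinga2]
command=/usr/sbin/icinga2 daemon
user=nagios
stdout_logfile=/dev/stdout    ; forward logs to the container's STDOUT
stdout_logfile_maxbytes=0     ; required when logging to a stream
redirect_stderr=true

[program:nrpe]
command=/usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -f
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
redirect_stderr=true

[program:cron]
command=/usr/sbin/cron -f
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
redirect_stderr=true
```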

It is to note that these processes CAN be separated, but we decided to stick to our “keep it simple” strategy. Separating processes into multiple images and configuring them to still work together is, unfortunately, not that simple using pure docker. This was one situation where we definitely missed the pods architecture of Kubernetes (where multiple containers can easily share the same namespaces / volumes).

After creating these images, we needed some place to store them. Having a running Nexus server in our network, we quickly decided to make use of the Nexus docker registry plugin. It was a very easy way of implementing user authentication against a docker repository. To top it off, it was easy to configure our reverse proxy to forward access from external networks to this docker registry plugin.
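
Pushing an image to such a registry then works like pushing to any private docker registry. A sketch, where the registry host, repository path and tag are assumptions:

```shell
# Sketch: pushing a custom image to the Nexus docker registry.
# Registry host, repository path and tag are assumptions.
REGISTRY="nexus.example.com:5000"
IMAGE="$REGISTRY/monitoring/icinga-master:1.0"
echo "image reference: $IMAGE"

# The actual push is guarded so the sketch stays side-effect free:
if [ "${DO_PUSH:-0}" = "1" ]; then
  docker login "$REGISTRY"          # authenticate against Nexus
  docker tag icinga-master:1.0 "$IMAGE"
  docker push "$IMAGE"
fi
```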

The replicas problem

Let’s take a quick step back to talk about the basic setup of icinga. You have one icinga master running, which is responsible for maintaining the configuration of all zones and checks. Then you have one or more icinga slaves, which connect to the icinga master and receive the configuration via push from the master. The icinga slaves in turn connect to the different clients, which run the NRPE process. This is especially helpful if you are running nodes in different geographical locations (e.g. one in North America and one in Europe). The clients report back to the icinga slaves, which then push the consolidated reports to the icinga master.
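
In icinga2 terms, this topology maps to a zones.conf on the master along these lines. The endpoint names, zone names and addresses here are pure assumptions for illustration:

```
// Sketch of the master's zones.conf for a master/slave layout.
object Endpoint "icinga-master" { }

object Endpoint "icinga-slave-1" {
  host = "10.1.0.10"   // slave in North America
}

object Endpoint "icinga-slave-2" {
  host = "10.2.0.10"   // slave in Europe
}

object Zone "master" {
  endpoints = [ "icinga-master" ]
}

object Zone "north-america" {
  endpoints = [ "icinga-slave-1" ]
  parent = "master"
}

object Zone "europe" {
  endpoints = [ "icinga-slave-2" ]
  parent = "master"
}
```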

Since we were using the docker swarm mechanics, we wanted to make use of the replicas key inside the deployment specification of a service. That way we could define just one service for all icinga slaves.
The problem with this setup is that we somehow need to differentiate between the slaves, and each slave should have its own independent configuration on a persistent volume.
After some searching we found the Go template “{{.Task.Slot}}”, which gives access to the specific deployment number of each slave node and can be used for container hostnames and named volumes.

  ## icinga2 slave service (excerpt from the stack YAML)
  services:
    icinga-slave:
      hostname: icinga-slave-{{.Task.Slot}}
      volumes:
        - icinga2-slave-config:/etc/icinga2:rw
      deploy:
        replicas: 6

  ## volumes for the icinga slaves
  volumes:
    icinga2-slave-config:
      name: icinga2-slave{{.Task.Slot}}-config

So we ended up with slaves “icinga-slave-1”, “icinga-slave-2”, “icinga-slave-3” and so on, each having its own persistent volume and therefore its own configuration.
To actually place the slaves on the right docker nodes (which are located in the different geographical locations), we ended up using labels on the docker nodes together with the placement option of the deploy key.
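
Node labels plus a placement constraint could look like this (the label name and value are assumptions):

```yaml
# Sketch: pin a service to a labeled node.
# First, label the node on the swarm manager:
#   docker node update --label-add location=eu node-eu-1
services:
  icinga-slave-eu:
    deploy:
      placement:
        constraints:
          - node.labels.location == eu
```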

The return of the ordering

Now to come back to the initialization topic. After we made progress on the stack itself and the software was running, we looked into a replacement for the global bash init script.
We discussed two different possibilities which would work with the pure docker swarm setup:

  1. creating an init docker image for the complete environment (basically just move the global init script into its own image)
  2. creating init scripts on each docker image (where we would need to init something).

Since we had already created our own images for most of the services, we went with the second option. Most of the images already had custom init scripts in place anyway.
So we would script our way through the init process:

  1. check if the init was already done for this container (e.g. is config file X in place; is this node set up as Y; etc.)
  2. if necessary, wait for dependent services to come online (e.g. via a port scan)
  3. do some other stuff (e.g. change default passwords, etc.)
  4. init the actual service (e.g. supervisord, etc.)
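
A per-image init script following those four steps could be sketched like this. The marker path, dependency host/port and final service command are assumptions:

```shell
#!/usr/bin/env bash
# Sketch of a per-image init script; marker path, dependency host/port
# and the final service command are assumptions.
set -u

# In the real image the marker would live on the persistent volume,
# e.g. /etc/icinga2/.init-done; a temp path keeps this sketch runnable.
MARKER="${MARKER:-/tmp/.init-done-sketch}"

# step 2 helper: wait for a dependent service via a simple port scan
wait_for() {  # usage: wait_for HOST PORT TIMEOUT_SECONDS
  local host=$1 port=$2 timeout=$3 i
  for ((i = 0; i < timeout; i++)); do
    (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null && return 0
    sleep 1
  done
  return 1
}

if [ ! -f "$MARKER" ]; then
  # step 1: init has not run yet for this container
  # wait_for mysql 3306 60 || exit 1   # step 2: block until MySQL is up
  # step 3: change default passwords, write configs, ...
  touch "$MARKER"                      # remember that init is done
fi

# step 4: hand over to the actual service manager
# exec supervisord -n -c /etc/supervisor/supervisord.conf
```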

The MySQL image was a special case. We did not want to create our own image here just because of some init steps. Luckily we could make use of the “/docker-entrypoint-initdb.d” directory provided by the MySQL image maintainers (see “Initializing a fresh instance” in the image documentation). So we just mount our custom init script into this directory via docker configs, and it is automatically executed during container startup.
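
Such an init script can be as simple as a few SQL statements, matching the databases from the setup breakdown earlier. Database names, user names and the placeholder password are assumptions:

```sql
-- Sketch of a custom init script for /docker-entrypoint-initdb.d
CREATE DATABASE IF NOT EXISTS icinga;
CREATE DATABASE IF NOT EXISTS thruk;
CREATE DATABASE IF NOT EXISTS grafana;

-- placeholder credentials; replace 'changeme' in any real setup
CREATE USER IF NOT EXISTS 'icinga'@'%' IDENTIFIED BY 'changeme';
GRANT ALL PRIVILEGES ON icinga.* TO 'icinga'@'%';
FLUSH PRIVILEGES;
```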

The summary

So what do we take away from all this? It was not an easy project, but a pretty good one to get our hands on. There were a lot of challenges but – and that is the really nice part about docker – there always seems to be a way to make it work in the end. And I don’t mean that in a compromised kind of way, but in one that is production-ready and good to maintain.
