Grafana with podman kube

January 19, 2023

Background

For a long time I have been using a containerized version of Munin to monitor some of my devices. But there are many things I can’t do with it, and for many new tasks I would have to write code first, while Prometheus and Grafana often already have more than enough ready-made solutions.

So far I have only used Prometheus and Grafana for demos of openSUSE Kubic and openSUSE MicroOS at conferences and trade shows (like SUSECon19).

So it is time to modernize the monitoring in my home lab and use a containerized Grafana solution on Linux :)

The Linux OS: openSUSE MicroOS

All my servers run [openSUSE MicroOS](https://microos.opensuse.org/), a fast, small environment designed for hosting container workloads with automated management and patching. As a rolling release distribution, the software is always up to date. Automatic updates are handled by transactional-update, which depends on btrfs subvolumes. This makes it possible to have a read-only root filesystem that is updated in the background, so that running processes do not notice it. If something goes wrong during the update, the new snapshot is deleted and it looks like nothing happened. If the updates were installed without errors, the system boots into the new snapshot on the next reboot. If that boot fails, a rollback to the last working snapshot is performed automatically. So there is no need to spend hours repairing the system after a faulty update 😃, which has already saved me several hours of work. And a special feature of transactional-update: all changes to /etc are also undone! On other systems with atomic updates this is usually ignored, because this directory is not part of the read-only OS image.

Another advantage of btrfs is that only the actual changes of an update are stored on disk, which is very space efficient. So you can keep many old snapshots for a rollback, or diff snapshots to see what has really changed. There is no massive waste of space like with an A/B partitioning scheme, and no hard limit on the number of old snapshots just because only 2 or 3 partitions are available for them. The limiting factor is only the size of the disk and the size of the updates. I usually have about 20 snapshots on my system. And when they take up too much space, the system automatically cleans up and removes enough old snapshots.

Podman kube

I need Podman as the container runtime for various features. Podman comes with a very nice feature: podman pod and podman kube, which work with Kubernetes yaml files, at least as long as those don’t use too advanced features.

There are many docker-compose.yaml files out there which start Prometheus and Grafana, but I don’t want python on my OS (in my opinion python is fine for applications running inside a container, but a really bad choice for system tasks on the OS itself), so I couldn’t use them. On the other hand, documentation on how to write yaml files for podman kube play barely exists, and what does exist is partly very complicated. But in the end I managed to create a working yaml file:

# Save the output of this file and use kubectl create -f
# to import it into Kubernetes.
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: monitoring
  name: monitoring
spec:
  containers:
  - name: prometheus
    image: docker.io/prom/prometheus:latest
    ports:
    - containerPort: 9090
      hostIP: 127.0.0.1
      hostPort: 9090
    resources: {}
    securityContext:
      capabilities:
        drop:
        - CAP_MKNOD
        - CAP_NET_RAW
        - CAP_AUDIT_WRITE
    volumeMounts:
    - mountPath: /etc/prometheus
      name: srv-prometheus-etc-host-0
    - mountPath: /prometheus
      name: srv-prometheus-data-host-0
  - name: grafana
    image: docker.io/grafana/grafana:latest
    ports:
    - containerPort: 3000
      hostIP: <your external host IP>
      hostPort: 3000
    resources: {}
    securityContext:
      capabilities:
        drop:
        - CAP_MKNOD
        - CAP_NET_RAW
        - CAP_AUDIT_WRITE
      privileged: false
    volumeMounts:
    - mountPath: /var/lib/grafana
      name: srv-grafana-data-host-0
  restartPolicy: Always
  volumes:
  - hostPath:
      path: /srv/prometheus/etc
      type: Directory
    name: srv-prometheus-etc-host-0
  - hostPath:
      path: /srv/prometheus/data
      type: Directory
    name: srv-prometheus-data-host-0
  - hostPath:
      path: /srv/grafana/data
      type: Directory
    name: srv-grafana-data-host-0
status: {}

Replace <your external host IP> with the IP address of the host on which the grafana dashboard should later be accessible, or with localhost (127.0.0.1) if it should not be reachable via the network.
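The substitution can also be scripted. The following is only a minimal sketch and not part of my original setup: set_grafana_host_ip is a hypothetical helper name, and the file name and address in the usage line are assumptions (192.0.2.10 is a documentation address standing in for your real host IP).

```shell
# Hypothetical helper: replace the "<your external host IP>" placeholder
# in a pod yaml file with a concrete address.
# Usage: set_grafana_host_ip <yaml file> <ip address>
set_grafana_host_ip() {
    file=$1
    ip=$2
    tmp=$(mktemp) || return 1
    # Write to a temporary file first, so a failing sed never truncates the original.
    if sed "s/<your external host IP>/$ip/" "$file" > "$tmp"; then
        # Copy the content back instead of moving the file,
        # so the original keeps its owner and permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example (assumed file name and address):
# set_grafana_host_ip monitoring.yaml 192.0.2.10
```

Only the placeholder is replaced; the hostIP of the prometheus container stays at 127.0.0.1.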

This yaml file starts three containers:

  • *-infra - this is a podman helper container
  • monitoring-prometheus - this is the prometheus container
  • monitoring-grafana - this is the grafana container

Important to know: inside the pod, localhost is identical for all containers. So grafana in one container can reach the prometheus container via http://localhost:9090, and every process on the host can access e.g. prometheus via http://localhost:9090, too. But this does not hold for other pods, or in general for containers outside the pod: as they have their own localhost, they can only reach these services via the external network interface of the host, or via a more complicated network setup. The latter is still on my TODO list, but since podman announced that it will discontinue the currently used CNI network stack and switch to its own, I’m waiting for that.

Directory layout, Configuration files and Permissions

But before we can start the containers, we need to create the necessary directories to store the configuration files and persistent data. The directory layout looks like this:

/srv/
  ├── prometheus/etc/prometheus.yml -> the prometheus configuration file
  ├── prometheus/data/ -> for the persistent database
  └── grafana/data/ -> for the persistent grafana data
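Creating this layout is a single mkdir call. As a minimal sketch: create_monitoring_layout is a hypothetical helper name, and it takes the base directory as an argument so that it can be tried out somewhere else than /srv first.

```shell
# Hypothetical helper: create the directory layout below a base directory.
# Usage: create_monitoring_layout <base directory>
create_monitoring_layout() {
    base=$1
    # -p creates missing parent directories and does not fail
    # if a directory already exists.
    mkdir -p "$base/prometheus/etc" \
             "$base/prometheus/data" \
             "$base/grafana/data"
}

# On the real host, run as root:
# create_monitoring_layout /srv
```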

Besides the existence of the directories, their ownership is just as important. Prometheus writes as user nobody:nobody into /srv/prometheus/data, so this user needs write permissions. This is already a very big security design flaw: the user nobody should never have write permissions anywhere. So make sure that really no one else is allowed to look into the /srv/prometheus directory structure. The best way is to run:

chmod 700 /srv/prometheus

Grafana is better here: it runs by default with the user ID 472. So make sure that nothing else is using this ID on your system and create a grafana system user for the data ownership:

useradd -r -u 472 -d /srv/grafana/data grafana

This command creates a system account grafana with the user and group ID 472. Afterwards we make sure that the grafana data directory is owned by this user, so that the grafana process can write into it:

chown grafana:grafana /srv/grafana/data
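Wrong ownership is the most likely reason for a container failing to start, so a quick check can save some debugging. This is a minimal sketch: owner_uid_is is a hypothetical helper name, and the second example assumes that the nobody user on the host has the same numeric ID as the one inside the prometheus container.

```shell
# Hypothetical helper: succeed only if <directory> is owned by numeric <uid>.
# Usage: owner_uid_is <directory> <uid>
owner_uid_is() {
    # ls -n prints numeric IDs; the third field is the owner's user ID.
    actual=$(ls -ldn "$1" | awk '{print $3}')
    [ "$actual" = "$2" ]
}

# Example checks before starting the pod:
# owner_uid_is /srv/grafana/data 472 || echo "grafana data dir has the wrong owner"
# owner_uid_is /srv/prometheus/data "$(id -u nobody)" || echo "prometheus data dir has the wrong owner"
```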

Finally, we need a configuration file for prometheus. To start with, we monitor prometheus itself:

global:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

This config (stored as /srv/prometheus/etc/prometheus.yml) is very simple: we don’t change any global defaults and tell prometheus to scrape localhost:9090 (so itself) every 60 seconds (which is the default) and to store the data under the job name prometheus.
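If you prefer not to rely on the default, the interval can also be written out. A minimal sketch: write_prometheus_config is a hypothetical helper name, and it writes the same config as above with scrape_interval set explicitly to 1m, the same 60 seconds as the default.

```shell
# Hypothetical helper: write a prometheus.yml that scrapes prometheus itself.
# Usage: write_prometheus_config <config directory>
write_prometheus_config() {
    # The quoted 'EOF' keeps the shell from expanding anything in the yaml.
    cat > "$1/prometheus.yml" <<'EOF'
global:
  scrape_interval: 1m   # explicit, same value as the default

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
EOF
}

# On the real host (this overwrites an existing prometheus.yml):
# write_prometheus_config /srv/prometheus/etc
```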

Done!

Run Containers

Now we just need to start the containers:

podman kube play monitoring.yaml

The command podman pod ps should show you one pod:

POD ID        NAME        STATUS      CREATED         INFRA ID      # OF CONTAINERS
63b242775da6  monitoring  Running     39 minutes ago  0e6192eb180c  3

The command podman ps will show you the three containers:

CONTAINER ID  IMAGE                                    COMMAND               CREATED         STATUS             PORTS                                             NAMES
0e6192eb180c  localhost/podman-pause:4.3.1-1669075200                        39 minutes ago  Up 39 minutes ago  127.0.0.1:9090->9090/tcp, 0.0.0.0:3000->3000/tcp  63b242775da6-infra
d9bbdaf48eb8  docker.io/prom/prometheus:latest         --config.file=/et...  39 minutes ago  Up 33 minutes ago  127.0.0.1:9090->9090/tcp, 0.0.0.0:3000->3000/tcp  monitoring-prometheus
96a3e9486239  docker.io/grafana/grafana:latest                               39 minutes ago  Up 35 minutes ago  127.0.0.1:9090->9090/tcp, 0.0.0.0:3000->3000/tcp  monitoring-grafana

Start containers on every boot

While the containers are now running, we need to make sure that they will be started after the next reboot, too. For this, podman comes with a very nice and handy systemd service: podman-kube@.service. This service not only starts the pod, but also makes sure that the containers are current and updates them if necessary.

The configuration file is passed as argument, with its complete path. The path needs to be escaped, but systemd-escape takes care of that.

So the final command to enable the systemd service would be:

systemctl enable "podman-kube@$(systemd-escape /<path>/monitoring.yaml).service"
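The escaped unit name is also what you need later for commands like systemctl status or journalctl -u. A minimal sketch to print it: kube_unit_name is a hypothetical helper name and /srv/monitoring.yaml is an assumed path, so use the real location of your yaml file.

```shell
# Hypothetical helper: print the podman-kube unit name for a yaml file.
# Usage: kube_unit_name <absolute path to yaml file>
kube_unit_name() {
    printf 'podman-kube@%s.service\n' "$(systemd-escape "$1")"
}

# Only run the example where systemd-escape is installed.
if command -v systemd-escape >/dev/null 2>&1; then
    kube_unit_name /srv/monitoring.yaml   # assumed path
fi
```

Since systemd-escape turns every / into a -, this prints podman-kube@-srv-monitoring.yaml.service for the assumed path.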

Setting up Grafana

Now we can connect to the grafana dashboard using the URL http://<hostname>:3000.

The grafana container comes with a default login:

  • User: admin
  • Password: admin

For this reason, Grafana enforces a password change at the first login.

Before we can see the first data, there are two important steps to do:

  1. Configure prometheus as Datasource
  2. Create a dashboard for prometheus

Adding prometheus as datasource is simple:

  • Select DATA SOURCES and afterwards Prometheus.
  • Enter http://localhost:9090 in the HTTP URL field. Remember, these are the hostIP and hostPort values of the prometheus container in the yaml file, and both containers share the same localhost inside the pod.
  • Go to the end of the page and select Save & test.

This should give you a “Data source is working” message.

Select the grafana logo in the upper left corner to get back to the main screen; now we need to add a dashboard. Since we want to use an existing one instead of creating a new one, we do not select DASHBOARDS, but the icon with the four squares on the left side. In the menu we go down and select Import. We enter the ID 3662 in the Import via grafana.com box and click on Load. This loads the Prometheus 2.0 Overview dashboard. But before we can finally import it, we need to select the datasource: on the next page, in the prometheus dropdown box, select Prometheus (default) and click on Import.

Now you should see your first dashboard.

Deploying Node Exporter

Monitoring Prometheus itself is not the most exciting task; much more important is monitoring your servers. The most common tool for this is Node exporter, which is a prometheus sub-project and is also available as a container.

Podman kube yaml file

First, we need a new yaml file. We could add the container to the existing monitoring.yaml file, but since we want to install it on several servers, it makes sense to create a standalone version of it:

# Save the output of this file and use kubectl create -f
# to import it into Kubernetes.
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: node-exporter
  name: node-exporter
spec:
  containers:
  - args:
    - --path.rootfs=/host
    image: docker.io/prom/node-exporter:latest
    name: node_exporter
    securityContext:
      capabilities:
        drop:
        - CAP_MKNOD
        - CAP_NET_RAW
        - CAP_AUDIT_WRITE
    volumeMounts:
    - mountPath: /host
      name: root-host-0
      readOnly: true
  enableServiceLinks: false
  hostNetwork: true
  volumes:
  - hostPath:
      path: /
      type: Directory
    name: root-host-0

Start Node Exporter

This time we don’t need to create any directories or configuration files, so the command to enable and start the node-exporter is just:

systemctl enable --now "podman-kube@$(systemd-escape /<path>/node_exporter.yaml).service"

Add Node Exporter to Prometheus

To teach Prometheus about Node Exporter, we need to add the following lines at the end of the /srv/prometheus/etc/prometheus.yml file, i.e. in the scrape_configs section:

  - job_name: 'node'
    static_configs:
    - targets: ['server1.example.com:9100']
    - targets: ['server2.example.com:9100']

We will not create a separate job entry for every host; instead there is one node job, and every host to scrape metrics from is added as a target.
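With more than a handful of servers it is easier to generate this block than to type it. A minimal sketch: node_job is a hypothetical helper name, and it prints one node job with a single targets list, which is just another way to write the same set of targets.

```shell
# Hypothetical helper: print a 'node' scrape job with one target per host.
# Usage: node_job <host> [<host> ...]
node_job() {
    printf "  - job_name: 'node'\n"
    printf "    static_configs:\n"
    printf "    - targets:\n"
    for host in "$@"; do
        # 9100 is the node exporter port used in the targets above.
        printf "        - '%s:9100'\n" "$host"
    done
}

node_job server1.example.com server2.example.com
```

The output can be appended to /srv/prometheus/etc/prometheus.yml with >>, as long as scrape_configs is the last section in that file.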

We only need to restart prometheus to make it aware of the config file change:

podman restart monitoring-prometheus

Add Node Exporter Dashboard to Grafana

Now we import the grafana dashboard with the ID 13978 in the same way as the first one, and you should see the metrics of your servers.

Completed

That’s it for now.

In the next posts of this series I will explain how to monitor the Fritz!Box including the Smarthome devices (which I use to monitor my balcony power plant) and various IoT devices (power metering and temperature) with Prometheus, InfluxDB and an MQTT broker.