Monitoring – How to install Prometheus/Grafana on arm – Raspberry PI/Rock64

  • dHENRY
  • 26/12/2018
  • (Reading time : 14 mn)

(**) Translated with www.DeepL.com/Translator

Update 2020-01-16: an additional article covers storing the data in an InfluxDB database.

This procedure can easily be adapted to x86 and amd64 environments (bare metal).

I used Nagios a lot, later combined with Centreon, to monitor the infrastructures I had to manage. These are heavy, resource-intensive solutions that become stressful whenever the latest update has to be applied. My priority was a lightweight solution. I preferred to take a step back and read a lot before choosing a solution for MytinyDC; I even thought about developing…
My research led me to the website of Prometheus, a solution developed by SoundCloud and later released as open source.
On prometheus.io, I discovered a whole ecosystem and, above all, programs compiled for ARM processors. The lead was a serious one.
From the first test phase, I quickly saw that this would be the solution: obviously simple to set up and, above all, a minimal load on the small units I use.
This solution is often presented as a duo: Prometheus/Grafana. But what is Grafana? As indicated on its Wikipedia page:

... allows the visualization and formatting of metric data...

https://fr.wikipedia.org/wiki/Grafana

Grafana is also available for ARM processors.

To summarize, you need to install one or more programs that export metrics on the client servers (depending on the type of monitoring desired), a program on the server that collects the data, a program on the server that displays the dashboards and, finally, a program on the server responsible for sending alerts. The result is very satisfactory:

Prometheus/Grafana - Monitoring for MytinyDC

Prometheus installation

I will start by installing a metrics exporter on a client server. The Prometheus project provides a module, “node_exporter”, which reads system metrics (CPU load, network, etc.) and is compiled for armv7 and arm64. After installation and start-up, it is very easy to check that it is working properly.
I will continue with the installation of the monitoring system itself, i.e. the service responsible for collecting metrics from the client servers, called “Prometheus”.
Then comes the installation of the “Grafana” server, which displays the collected metrics in the form of dashboards.
That is not all: we will then set up the alert system, “Alertmanager”, without which our monitoring system is of little use.
To close this post, we will install an additional exporter, “haproxy_exporter” for example, and integrate the alerts into instant messaging systems:

  • Matrix/Synapse/Element.io (the internal messaging system of the French government)
  • Rocketchat, a competitor.
    These two messaging systems are open source and decentralized (your data stays at home), and of the same quality as WhatsApp, Hangouts or Messenger, which makes them very serious competitors. Personally, after testing both, I prefer Matrix, which is very complete, robust, “ultra” fast, ergonomic, and runs on the small arm64 unit that makes up MytinyDC.

Topology Monitoring - MytinyDC

Downloads

Prometheus-node_exporter installation on a client

Log in as root on your client server and go to the /opt directory:

cd /opt

and download the “node_exporter” archive corresponding to your platform:

  • Raspberry Pi: Operating system: Linux - Architecture: armv7
  • Rock64: Operating system: Linux - Architecture: arm64

Use the wget command, for example for version 0.17 on the Raspberry Pi:

wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-armv7.tar.gz
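
If you are unsure which archive matches a machine, the choice can be derived from the output of “uname -m”. The following is only a sketch: release_arch and release_url are hypothetical helpers (not part of Prometheus), the URL pattern is simply the one visible in the wget example, and only the architectures mentioned in this post are mapped.

```shell
# Hypothetical helper: map the output of `uname -m` to the architecture
# name used in the release archive names.
release_arch() {
  case "$1" in
    armv7l)  echo "armv7" ;;   # e.g. Raspberry Pi running a 32-bit OS
    aarch64) echo "arm64" ;;   # e.g. Rock64
    x86_64)  echo "amd64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: build the download URL, following the pattern of
# the wget example. $1 = project, $2 = version, $3 = architecture
release_url() {
  echo "https://github.com/prometheus/${1}/releases/download/v${2}/${1}-${2}.linux-${3}.tar.gz"
}
```

Possible use: wget "$(release_url node_exporter 0.17.0 "$(release_arch "$(uname -m)")")"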

Extract the data from the archive, example for the previous download:

tar xfz node_exporter-0.17.0.linux-armv7.tar.gz
rm node_exporter-0.17.0.linux-armv7.tar.gz
cd node_exporter-0.17.0.linux-armv7
# run the exporter in detached mode
nohup ./node_exporter &

The exporter is now working and listening on the port: TCP/9100

Let’s add the INPUT rules to the firewall of this server:

iptables -A INPUT -i eth0 -p tcp -m tcp --dport 9100 -j ACCEPT
iptables -A OUTPUT -o eth0 -p tcp -m tcp --sport 9100 -m state --state RELATED,ESTABLISHED -j ACCEPT

To test the proper functioning of this service, use a browser with the url: http://[client server IP address]:9100/metrics

Impressive, without any particular configuration, the exporter already provides the server system information.
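
The same check can be made from the command line. A minimal sketch: metric_value is a hypothetical helper that reads the text returned by /metrics on its standard input and prints the value of the first sample whose name matches exactly, so it only suits metrics without labels, such as node_load1.

```shell
# Hypothetical helper: print the value of an un-labelled metric.
# $1 = metric name; the /metrics text is read from standard input.
metric_value() {
  # Comment lines start with "#", so their first field never equals the name.
  awk -v name="$1" '$1 == name { print $2; exit }'
}
```

Possible use, with the client address used later in this post: curl -s http://172.28.0.4:9100/metrics | metric_value node_load1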

Prometheus server installation

Log in as root on the server that will be responsible for collecting data from the different clients and go to the /opt directory:

cd /opt

and download the “Prometheus” archive corresponding to your platform:

  • Raspberry Pi: Operating system: Linux - Architecture: armv7
  • Rock64: Operating system: Linux - Architecture: arm64

Use the wget command, for example for version 2.6.0 on the Raspberry Pi:

wget https://github.com/prometheus/prometheus/releases/download/v2.6.0/prometheus-2.6.0.linux-armv7.tar.gz

Extract the data from the archive, example for the previous download:

tar xfz prometheus-2.6.0.linux-armv7.tar.gz
rm prometheus-2.6.0.linux-armv7.tar.gz
cd prometheus-2.6.0.linux-armv7/

Collector configuration - yaml file

Here is a basic configuration file for the case of one Prometheus server (IP 172.28.0.3) and one client (IP 172.28.0.4):

global:
  # No scrape_interval is set here, so the default applies (every 1 minute).
scrape_configs:
  # The Prometheus server itself, listening on TCP/9090
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'
    static_configs:
      - targets: ['localhost:9090']

  # Exporters: client servers
  - job_name: 'nodes'
    scrape_interval: 1m # Override the default global interval for this job
    scrape_timeout: 10s # Override the default global timeout for this job
    static_configs:
      - targets: ['172.28.0.4:9100']

Adapt this configuration to your situation; in my case the client is represented by the IP address 172.28.0.4.

Start the collector in detached mode:

nohup ./prometheus &

The collector is working and is now listening on the port: TCP/9090

Let’s add the INPUT rules to the firewall of this server so that it accepts connections on port TCP/9090:

iptables -A INPUT -i eth0 -p tcp -m tcp --dport 9090 -j ACCEPT
iptables -A OUTPUT -o eth0 -p tcp -m tcp --sport 9090 -m state --state RELATED,ESTABLISHED -j ACCEPT

Prometheus is a collector, which means that it connects to port TCP/9100 of the client server(s), as indicated in its configuration. Let’s also add the OUTPUT rules:

iptables -A OUTPUT -o eth0 -p tcp -m tcp --dport 9100 -j ACCEPT
iptables -A INPUT -i eth0 -p tcp -m tcp --sport 9100 -m state --state RELATED,ESTABLISHED -j ACCEPT
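
The same two pairs of rules come back for every service in this post, and only the port changes. As a sketch, two hypothetical helpers (inbound_rules and outbound_rules, names of my own) can print them for a given interface and port. They only print the commands, so nothing is modified until you pipe the output to sh.

```shell
# Hypothetical helper: rules for a service listening locally.
# $1 = network interface, $2 = TCP port of the local service
inbound_rules() {
  echo "iptables -A INPUT -i $1 -p tcp -m tcp --dport $2 -j ACCEPT"
  echo "iptables -A OUTPUT -o $1 -p tcp -m tcp --sport $2 -m state --state RELATED,ESTABLISHED -j ACCEPT"
}

# Hypothetical helper: rules for connecting to a remote service.
# $1 = network interface, $2 = TCP port of the remote service
outbound_rules() {
  echo "iptables -A OUTPUT -o $1 -p tcp -m tcp --dport $2 -j ACCEPT"
  echo "iptables -A INPUT -i $1 -p tcp -m tcp --sport $2 -m state --state RELATED,ESTABLISHED -j ACCEPT"
}
```

Possible use on the Prometheus server: inbound_rules eth0 9090 | sh, then outbound_rules eth0 9100 | sh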

To test the proper functioning of this service, use a browser with the url: http://[IP address of prometheus server]:9090/targets
which allows you to view the list of exporters registered in the monitoring process:

In my case, we see 6 servers registered in the “node” job responsible for collecting system information, the “Prometheus” job which is the monitoring server, and the “haproxy” responsible for collecting the metrics of a loadbalancer.

At this point, you can already view the data collected using Prometheus’ internal viewer:
http://[IP address of prometheus server]:9090/graph

This minimalist tool does not let you build dashboards, only display a temporary graph.

In less than 20 minutes, the collection system is in place and operational.

Reload the Prometheus server configuration

You can either restart the server, by stopping it with the kill command and running the program again, or reload the configuration by sending the HUP signal to the process. A third option, an API call, exists, but you will have to start Prometheus with the “--web.enable-lifecycle” parameter, which the documentation does not recommend, and make the call with the curl command:

curl -X POST http://[address of the prometheus server]:9090/-/reload

The recommended method is to send the HUP signal to the process:

ps -ef | grep prometheus   # note the PID
kill -HUP [PID of the prometheus process]
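
A typo in prometheus.yml is only discovered at reload time, so it can help to validate the file first. The sketch below assumes that promtool (delivered in the same archive as prometheus) sits next to prometheus.yml in the installation directory, and that the process is named prometheus; check_then_reload is a hypothetical helper.

```shell
# Hypothetical helper: validate the configuration, then ask Prometheus to reload it.
# $1 = Prometheus installation directory
check_then_reload() {
  dir="$1"
  if [ ! -x "$dir/promtool" ]; then
    echo "promtool not found in $dir" >&2
    return 2
  fi
  # Refuse to reload when the configuration does not validate.
  "$dir/promtool" check config "$dir/prometheus.yml" || return 1
  # Send HUP to the process whose name is exactly "prometheus".
  pkill -HUP -x prometheus
}
```

Possible use: check_then_reload /opt/prometheus-2.6.0.linux-armv7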

Grafana installation

Referring to the topology above, the Grafana server retrieves the metrics from the Prometheus server in order to display them in its interface.

Server installation

Log in as root on the server that will host Grafana and go to the /opt directory:

cd /opt

and download the “Grafana” package corresponding to your platform (link above):

  • Raspberry Pi: Ubuntu & Debian (ARMv7)
  • Rock64: Ubuntu & Debian (ARM64)

Use the wget command, for example for the Raspberry Pi version:

wget https://dl.grafana.com/oss/release/grafana_5.4.2_armhf.deb

Install this Debian package with the command “dpkg”:

dpkg -i grafana_5.4.2_armhf.deb

If this command reports missing dependencies, run immediately:

apt -f install

Remove the downloaded package file:

rm grafana_5.4.2_armhf.deb

The installation program will give you instructions for final setup and start-up.

Once started, the Grafana server listens on port TCP/3000.
Let’s add the INPUT rules to the firewall of this server:

iptables -A INPUT -i eth0 -p tcp -m tcp --dport 3000 -j ACCEPT
iptables -A OUTPUT -o eth0 -p tcp -m tcp --sport 3000 -m state --state RELATED,ESTABLISHED -j ACCEPT

Configuration

Using a browser, connect to the Grafana server: http://[Grafana server IP address]:3000
The account and password are: admin/admin
Grafana will ask you to change this password.
The first configuration step is to create the data source, which will be the Prometheus server.
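
The data source can also be declared in a file instead of through the web interface, since Grafana 5 and later read provisioning files at start-up. A sketch, assuming the layout of the Debian package (/etc/grafana/provisioning/datasources/) and the Prometheus server address used in this post; datasource_yaml is a hypothetical helper and the file name is my own choice.

```shell
# Hypothetical helper: print a Grafana provisioning file declaring a
# Prometheus data source.
# $1 = URL of the Prometheus server as seen from the Grafana server
datasource_yaml() {
  cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $1
    isDefault: true
EOF
}
```

Possible use: datasource_yaml http://172.28.0.3:9090 > /etc/grafana/provisioning/datasources/prometheus.yaml, followed by systemctl restart grafana-server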

The data is now available, so we can set up a dashboard or, to start with, import an existing one. Dashboards can easily be exported and reused by others: since Prometheus always returns the same data schema, dashboards are distributable. To start, I advise you to import the dashboard with the ID 1860 (https://grafana.com/dashboards/1860)

Specify the dashboard ID and click on Load

Select the data source

Click on Import

Click on the dashboard selector and select Node Exporter Full

The result is immediate

Add a client to the monitoring system

I want to add the server 172.28.0.2 to the monitoring. Follow the procedure above to install the exporter on it, then add the server to the Prometheus server configuration (job: nodes):

global:
  # No scrape_interval is set here, so the default applies (every 1 minute).
scrape_configs:
  # The Prometheus server itself, listening on TCP/9090
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'
    static_configs:
      - targets: ['localhost:9090']

  # Exporters: client servers
  - job_name: 'nodes'
    scrape_interval: 1m # Override the default global interval for this job
    scrape_timeout: 10s # Override the default global timeout for this job
    static_configs:
      - targets: ['172.28.0.4:9100']
      - targets: ['172.28.0.2:9100']

Adapt the added target (the last line) to your situation.

Reload the Prometheus configuration (see above)

Alerts

Prometheus has several connectors (alert distribution channels): various pagers, incident management tools and email, which is the one we will discuss here.

Alert server installation

The architecture is “decentralized”: this service can be placed on a different server from the Prometheus monitoring server.

Log in as root on your alert server and go to the /opt directory:

cd /opt

And download the “alertmanager” archive corresponding to your platform:

  • Raspberry Pi: Operating system: Linux - Architecture: armv7
  • Rock64: Operating system: Linux - Architecture: arm64

Use the wget command, for example for version 0.16 on the Raspberry Pi:

wget https://github.com/prometheus/alertmanager/releases/download/v0.16.0-alpha.0/alertmanager-0.16.0-alpha.0.linux-armv7.tar.gz

Extract the data from the archive, example for the previous download:

tar xfz alertmanager-0.16.0-alpha.0.linux-armv7.tar.gz
rm alertmanager-0.16.0-alpha.0.linux-armv7.tar.gz
cd alertmanager-0.16.0-alpha.0.linux-armv7/

Configuration

The server is configured by means of a file in YAML format: alertmanager.yml. The prerequisite is a working mail system. My SMTP configuration: localhost, without authentication.

global:
templates:
  - './templates/*.tmpl'
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 5h
  receiver: 'email' # A default receiver
  routes:
    - match:
        severity: 'warning'
      receiver: 'email'
    - match:
        severity: 'critical'
      receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - send_resolved: true
        to: '[email destination]'
        smarthost: 'localhost:25'
        from: '[sender email]'
        headers:
          From: '[sender email]'
          Subject: '{{ template "email.default.subject" . }}'
          To: '[email destination]'
        html: '{{ template "email.default.html" . }}'
        require_tls: false

Adjust this configuration according to your situation (the values between []).

Run alertmanager in detached mode:

nohup ./alertmanager &

Alertmanager is now working and listening on port TCP/9093.

Let’s add the INPUT rules to the firewall of this server:

iptables -A INPUT -i eth0 -p tcp -m tcp --dport 9093 -j ACCEPT
iptables -A OUTPUT -o eth0 -p tcp -m tcp --sport 9093 -m state --state RELATED,ESTABLISHED -j ACCEPT

To test the proper functioning of this service, use a browser with the URL: http://[alert server IP address]:9093

Alertmanager in operation

Prometheus server settings

Go to the Prometheus server directory and add to the prometheus.yml file:

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'
rule_files:
  - './rules.yml'

Adapt the target address to your situation.

Create the file rules.yml in the Prometheus directory :

groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

Reload the Prometheus configuration (see above)

If the Alertmanager service is not hosted on the Prometheus server, add the firewall rules on the Prometheus server:

iptables -A OUTPUT -o eth0 -p tcp -m tcp --dport 9093 -j ACCEPT
iptables -A INPUT -i eth0 -p tcp -m tcp --sport 9093 -m state --state RELATED,ESTABLISHED -j ACCEPT

Generation of an alert

The easiest way to do this is to deny access to port 9100 of a client server by removing the corresponding firewall rule, or simply stop the node_exporter process on the server.
Prometheus will consider it as: “down”.

Alert message flow

First connect to the web interface of the prometheus server, menu “Alerts”. No alerts at the moment

We wait for the next scrape of this server (web interface of the Prometheus server, “Status/Targets” menu)

Switch to the “Alerts” menu

The alert is pending

The alert is triggered and therefore sent to Alertmanager

Connect to the web interface of the alertmanager

The alert is published

The alert email has been sent, check your mailbox.

Content of the alert email

Our system is operational.

Put back the firewall rules giving access to port TCP/9100 on the failed unit, or restart the node_exporter process, and wait for the next scrape. An email is sent, indicating that the fault has cleared:

This graph (Grafana) clearly shows all the interruptions that took place during the test phase to write this post (and coffee breaks…)

Add an exporter

In this example, I will add the “haproxy_exporter” exporter. For this purpose I have a server running HAProxy, configured for Layer 7 load balancing.

Log in as root on your haproxy server and go to the /opt directory:

cd /opt

And download the “haproxy_exporter” archive corresponding to your platform:

  • Raspberry Pi: Operating system: Linux - Architecture: armv7
  • Rock64: Operating system: Linux - Architecture: arm64

Use the wget command, for example for version 0.9 on the Raspberry Pi:

wget https://github.com/prometheus/haproxy_exporter/releases/download/v0.9.0/haproxy_exporter-0.9.0.linux-armv7.tar.gz

Extract the data from the archive, example for the previous download:

tar xfz haproxy_exporter-0.9.0.linux-armv7.tar.gz
rm haproxy_exporter-0.9.0.linux-armv7.tar.gz
cd haproxy_exporter-0.9.0.linux-armv7/

To use this exporter you will need to enable the haproxy statistics. Add the following lines to your /etc/haproxy/haproxy.cfg file:

frontend stats-haproxy
    mode http
    log global
    maxconn 2
    stats uri /
    stats realm Haproxy\ Statistics
    ##stats auth xxxxxxx:xxxx
    bind [IP address stats]:[port stats] ssl crt [certificate file name.pem]
    # or without SSL support: bind [IP address stats]:[port stats]
    enabled

For the purposes of this post I have disabled authentication of access to HAPROXY statistics

Reload haproxy:

systemctl reload haproxy

Run the exporter in detached mode and give it the parameters for accessing the HAPROXY statistics (the URI is quoted because of the “;” character, which the shell would otherwise interpret as the end of the command):

nohup ./haproxy_exporter --haproxy.scrape-uri="https://[IP address stats]:[port stats]/?stats;csv" --no-haproxy.ssl-verify &

or without SSL support:

nohup ./haproxy_exporter --haproxy.scrape-uri="http://[IP address stats]:[port stats]/?stats;csv" &

The exporter is now working and listening on the port: TCP/9101

Let’s add the INPUT rules to the firewall of this server:

iptables -A INPUT -i eth0 -p tcp -m tcp --dport 9101 -j ACCEPT
iptables -A OUTPUT -o eth0 -p tcp -m tcp --sport 9101 -m state --state RELATED,ESTABLISHED -j ACCEPT

To test the proper functioning of this service, use a browser with the url: http://[client server IP address]:9101/metrics

Prometheus server configuration

Go to the installation directory of the Prometheus service and add these lines to the prometheus.yml file, under scrape_configs:

  - job_name: 'haproxy'
    static_configs:
      - targets: ['172.28.0.1:9101']

Adjust the target address to your situation.

Reload the Prometheus configuration (see above)

Now go to the Grafana server as administrator and add the dashboard whose ID is 2428 (https://grafana.com/dashboards/2428)
Generate some traffic through the HAProxy load balancer and consult the Grafana dashboard:

Matrix/Synapse/Element.io integration

update of 25/04/2019
Please refer to these articles:

RocketChat integration

Follow the procedure: github.com/pavel-kazhavets/AlertmanagerRocketChat

class Script {
    process_incoming_request({
        request
    }) {
        //console.log(request.content);
        var alertColor = "warning";
        if (request.content.status === "resolved") {
            alertColor = "good";
        } else if (request.content.status === "firing") {
            alertColor = "danger";
        }
        let finFields = [];
        for (let i = 0; i < request.content.alerts.length; i++) {
            var endVal = request.content.alerts[i];
            var elem = {
                title: "alertname:" + endVal.labels.alertname,
                value: "_instance:_" + endVal.labels.instance,
                short: false
            };
            finFields.push(elem);
            if (!!endVal.annotations.summary) {
                finFields.push({
                    title: "summary",
                    value: endVal.annotations.summary
                });
            }
            if (!!endVal.annotations.severity) {
                finFields.push({
                    title: "severity",
                    value: endVal.annotations.severity
                });
            }
            if (!!endVal.annotations.description) {
                finFields.push({
                    title: "description",
                    value: endVal.annotations.description
                });
            }
        }
        return {
            content: {
                username: "monitoring",
                attachments: [{
                    color: alertColor,
                    title_link: request.content.externalURL,
                    title: "Prometheus notification [" + request.content.status +"]",
                    fields: finFields
                }]
            }
        };
    }
}

I made a few changes to show the alert status in the title of the alert received in Rocket.Chat, and commented out the console.log instruction.

Going further

  • Preparing the servers so that the services run automatically at startup: creating unit files for “systemd”.
  • Filtering the collected metrics (service configuration files)
  • Communication with an incident management tool (VictorOps, Opsgenie, Trudesk,…)
  • Scalability, high availability,…
  • Securing the whole thing: refining the firewall rules, implementing authentication at the Prometheus server level
  • Note that the Prometheus console displays dates in UTC. While Grafana is configurable, Prometheus is not (it is a choice of the developers).
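
For the first point, here is a minimal sketch of a systemd unit. unit_for is a hypothetical helper that prints a unit for a given binary; the unit name, the Restart=on-failure policy and running as root (as in the rest of this post) are assumptions to adapt. The working directory is set because prometheus and alertmanager, started as shown in this post, read their configuration file from the current directory.

```shell
# Hypothetical helper: print a systemd unit for one of the services of this post.
# $1 = description, $2 = absolute path of the binary, $3 = working directory
unit_for() {
  cat <<EOF
[Unit]
Description=$1
After=network-online.target
Wants=network-online.target

[Service]
WorkingDirectory=$3
ExecStart=$2
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

Possible use for the exporter installed at the beginning of this post: unit_for "Prometheus node_exporter" /opt/node_exporter-0.17.0.linux-armv7/node_exporter /opt/node_exporter-0.17.0.linux-armv7 > /etc/systemd/system/node_exporter.service, then systemctl daemon-reload and systemctl enable --now node_exporter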

Document licence : Creative Commons (CC BY-NC-ND 4.0)

THIS DOCUMENTATION IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND AND DISTRIBUTED FOR EDUCATIONAL PURPOSES ONLY. THE AUTHOR, CONTRIBUTORS TO THIS DOCUMENTATION OR ©MYTINYDC.COM SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT OR INDIRECT DAMAGE THAT MAY RESULT FROM THE APPLICATION OF THE PROCEDURES IMPLEMENTED IN THIS DOCUMENTATION, OR FROM THE INCORRECT INTERPRETATION OF THIS DOCUMENT.