Alerting for Cloud-native Applications with Prometheus

[Figure: Different components in the monitoring system]

Creating Prometheus Alerts

Based on the metrics created in your application, you can configure different alerts in Prometheus to fulfill your business requirements. Using the metric created in the previous application, let's create one alert that is triggered when your service is down or cannot be scraped by the Prometheus server, and another that is triggered when the total API requests exceed a predetermined value.

Prometheus configuration (prometheus.yml):

    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - localhost:9093

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      - 'alerts*.yml'

Alert rules (a file matched by rule_files, for example alerts.yml):

    groups:
      - name: ExampleAlertGroup
        rules:
          - alert: YourServiceDown
            expr: up{job="your_service"} == 0
            for: 1m
            labels:
              severity: "critical"
              type: "service"
              environment: "production"
            annotations:
              description: "Your Service {{ $labels.job }} instance {{ $labels.instance }} down"
              summary: "Your service is down."
          - alert: RequestLimit
            # sum() cannot aggregate a range vector directly, so rate() first converts the
            # counter into a per-second rate; "by (instance)" keeps the label used below.
            expr: sum by (instance) (rate(api_request_total[1m])) > 10
            for: 1m
            labels:
              severity: warning
              type: "service"
              environment: "production"
            annotations:
              summary: "Total request count limit exceeded (instance {{ $labels.instance }})"
              description: "Total request count exceeded the limit on node (> 10 / s) VALUE = {{ $value }} LABELS: {{ $labels }}"
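
The "for: 1m" clause in both rules delays notification: an alert whose expression becomes true is first pending, and it only starts firing once the expression has stayed true for the whole duration. The sketch below is a simplified Python model of that state machine, not Prometheus code; the class name ForClause and its method are hypothetical, and evaluation times are passed in as plain seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForClause:
    """Simplified model of an alerting rule's `for:` hold duration."""

    hold_seconds: float                   # e.g. 60 for `for: 1m`
    active_since: Optional[float] = None  # when the expression first became true

    def evaluate(self, condition: bool, now: float) -> str:
        """Return the alert state after one rule evaluation at time `now`."""
        if not condition:
            # The expression no longer holds: the hold timer is reset.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.hold_seconds:
            return "firing"
        return "pending"
```

With the 15s evaluation_interval configured earlier and ideal timing, a rule with "for: 1m" is pending for the first four evaluations in which the expression holds and fires on the fifth, roughly 60 seconds after the first.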

Alertmanager configuration (alertmanager.yml):

    global:
      smtp_smarthost: email-smtp.amazonaws.com:587

    route:
      receiver: 'default-receiver'
      group_by: ['alertname', 'cluster']
      group_wait: 30s
      group_interval: 30m
      repeat_interval: 5h
      # All the above attributes are inherited by all child routes and can be overwritten on each.
      routes:
        - match_re:
            service: ^.*
          receiver: 'prometheus-msteams'
          continue: true
          routes:
            - match:
                severity: critical
              receiver: 'default-receiver'
              continue: true

    receivers:
      - name: 'default-receiver'
        email_configs:
          - send_resolved: false
            to: 'mymail@abc.com'
            from: 'alerts@abc.com'
            auth_username: "XXXXXXXXXXXXX"
            auth_identity: "XXXXXXXXXXXXX"
            auth_password: "XXXXXXXXXXXXX"
      - name: prometheus-msteams
        webhook_configs:
          - url: "http://prom2teams-server:8089/v2/Connector"
            send_resolved: true

    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'cluster']
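
The inhibit_rules entry at the end of this configuration suppresses notifications for a warning alert while a critical alert with the same alertname and cluster labels is firing. The following Python sketch models that check under simplifying assumptions (exact-match comparison only, missing labels treated as empty strings); the function name is_inhibited and the constants are hypothetical and are not part of Alertmanager.

```python
from typing import Dict, Iterable, List

Labels = Dict[str, str]

# Values taken from the inhibit rule in the configuration.
SOURCE_MATCH: Labels = {"severity": "critical"}
TARGET_MATCH: Labels = {"severity": "warning"}
EQUAL: List[str] = ["alertname", "cluster"]


def _matches(labels: Labels, matcher: Labels) -> bool:
    """True if every matcher label has exactly the expected value."""
    return all(labels.get(name, "") == value for name, value in matcher.items())


def is_inhibited(
    target: Labels,
    firing: Iterable[Labels],
    source_match: Labels = SOURCE_MATCH,
    target_match: Labels = TARGET_MATCH,
    equal: List[str] = EQUAL,
) -> bool:
    """Return True if some firing source alert mutes the target alert."""
    if not _matches(target, target_match):
        return False
    for source in firing:
        if source is target:
            continue  # an alert is not compared against itself
        if not _matches(source, source_match):
            continue
        # Every label listed in `equal` must carry the same value on both alerts.
        if all(source.get(name, "") == target.get(name, "") for name in equal):
            return True
    return False
```

Note that the two example rules use different alert names, so this rule would not make YourServiceDown mute RequestLimit; it only applies to alerts that share both alertname and cluster.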

Prometheus to MS Teams Integration

Integrating the alerts generated by the Alertmanager with MS Teams is not straightforward. We need an intermediate service, Prom2teams, which receives the Alertmanager's webhook notifications and forwards them to an MS Teams webhook connector.
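
To make the role of the intermediate service concrete, here is a minimal Python sketch of the translation step: it takes a notification in Alertmanager's webhook JSON format and builds a legacy MessageCard payload of the kind an Office 365 webhook connector accepts. This is not Prom2teams' actual code or template; the function name to_teams_card, the colour choices, and the selection of fields are assumptions made for illustration.

```python
from typing import Any, Dict, List


def to_teams_card(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an Alertmanager webhook payload into a MessageCard dict."""
    status = payload.get("status", "unknown")
    alerts: List[Dict[str, Any]] = payload.get("alerts", [])

    sections = []
    for alert in alerts:
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        sections.append({
            "activityTitle": "[{}] {}".format(
                alert.get("status", status), labels.get("alertname", "unknown")
            ),
            "text": annotations.get("description", ""),
            # One fact per label, sorted for a stable layout.
            "facts": [{"name": k, "value": v} for k, v in sorted(labels.items())],
        })

    return {
        "@type": "MessageCard",
        "@context": "http://schema.org/extensions",
        # Illustrative colours: red while firing, green otherwise.
        "themeColor": "FF0000" if status == "firing" else "2DC72D",
        "summary": "{} alert(s) {}".format(len(alerts), status),
        "sections": sections,
    }
```

A real service would additionally expose an HTTP endpoint for the Alertmanager to call and POST the resulting JSON to the connector URL; both are left out here to keep the sketch focused on the mapping.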

[Figure: Webhook connector creation in MS Teams]

Prom2teams configuration:

    [HTTP Server]
    Host:
    Port: 8089
    [Microsoft Teams]
    Connector: https://outlook.office.com/webhook/1231232232323
    [Group Alerts]
    Field:
    [Log]
    Level: INFO
    [Template]
    Path: /opt/prom2teams/helmconfig/teams.j2

[Figure: Sample email alert]

[Figure: Sample MS Teams alert]

Dead Man’s Switch

A dead man's switch is a device or service that triggers an action when an expected signal stops arriving. In our case, we use another service as a dead man's switch to raise an alert if Prometheus or the Alertmanager itself fails. To achieve that, we configure an always-firing watchdog alert, as given below, and route it to the dead man's switch endpoint configured in the Alertmanager.

Watchdog alert rule:

    groups:
      - name: meta
        rules:
          - alert: WatchdogAlert
            expr: vector(1)
            labels:
              severity: "critical"
              environment: "Production"
            annotations:
              description: This is a Watchdog alert to ensure that the entire Alerting pipeline is functional.
              summary: Watchdog Alerting

Alertmanager route (placed under the top-level route) and receiver for the dead man's switch:

    routes:
      - match_re:
          alertname: WatchdogAlert
        receiver: 'cole'
        group_interval: 10s
        repeat_interval: 4m
        continue: false

    receivers:
      - name: cole
        webhook_configs:
          - url: "http://deadman-switch:8080/ping/bpbn2earafu3t25o2900"
            send_resolved: false
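
On the receiving side, the dead man's switch only has to remember when the last ping arrived and raise its own alarm once pings stop. The Python sketch below illustrates that idea; it is not the implementation of any particular service, and the class name DeadMansSwitch, the 10-minute default timeout, and the injectable clock are assumptions for illustration.

```python
import time
from typing import Callable, Optional


class DeadMansSwitch:
    """Trips when no ping has been received within `timeout_seconds`."""

    def __init__(
        self,
        timeout_seconds: float = 600.0,  # hypothetical: 10 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._last_ping: Optional[float] = None

    def ping(self) -> None:
        """Record a heartbeat, e.g. when the Alertmanager webhook is called."""
        self._last_ping = self._clock()

    def is_tripped(self) -> bool:
        """True once the heartbeat is overdue; False until the first ping arms it."""
        if self._last_ping is None:
            return False
        return self._clock() - self._last_ping > self.timeout_seconds
```

Because the route repeats the WatchdogAlert notification roughly every 4 minutes, a timeout of more than twice that interval tolerates a single missed ping without raising a false alarm.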
