Alerting for Cloud-native Applications with Prometheus

Different components in the monitoring system

Creating Prometheus Alerts

# Prometheus configuration (typically prometheus.yml)
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - 'alerts*.yml'

# Alert rules (a file matched by the 'alerts*.yml' pattern)
groups:
  - name: ExampleAlertGroup
    rules:
      - alert: YourServiceDown
        expr: up{job="your_service"} == 0
        for: 1m
        labels:
          severity: "critical"
          type: "service"
          environment: "production"
        annotations:
          description: "Your Service {{ $labels.job }} instance {{ $labels.instance }} down"
          summary: "your service is down."
      - alert: RequestLimit
        # sum() cannot aggregate a range vector directly; rate() first converts the
        # counter into a per-second rate, and "by (instance)" keeps the label used below.
        expr: sum by (instance) (rate(api_request_total[1m])) > 10
        for: 1m
        labels:
          severity: warning
          type: "service"
          environment: "production"
        annotations:
          summary: "Request rate limit exceeded (instance {{ $labels.instance }})"
          description: "Request rate exceeded the limit (> 10 / s) VALUE = {{ $value }} LABELS: {{ $labels }}"
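
Before deploying rules like these, it can help to check that they fire as expected. The following is a minimal sketch of a rule unit test for `promtool test rules`, assuming the rule group above is saved in a file named alerts.yml that promtool can parse. The file name and the instance value localhost:8080 are assumptions for illustration, not part of the original setup.

```yaml
# alerts_test.yml (hypothetical file name); run with: promtool test rules alerts_test.yml
rule_files:
  - alerts.yml                 # assumed name of the rule file under test

evaluation_interval: 1m

tests:
  - interval: 1m
    # Simulate a target that reports up == 0 for five consecutive minutes.
    input_series:
      - series: 'up{job="your_service", instance="localhost:8080"}'
        values: '0 0 0 0 0'
    alert_rule_test:
      # The rule has "for: 1m", so by the 3 minute mark it should be firing.
      - eval_time: 3m
        alertname: YourServiceDown
        exp_alerts:
          - exp_labels:
              severity: critical
              type: service
              environment: production
              job: your_service
              instance: 'localhost:8080'
            exp_annotations:
              description: "Your Service your_service instance localhost:8080 down"
              summary: "your service is down."
```

The expected labels and annotations must match the rendered alert exactly, so a test like this also catches template mistakes in the annotations.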

# Alertmanager configuration (typically alertmanager.yml)
global:
  smtp_smarthost: 'email-smtp.amazonaws.com:587'

route:
  receiver: 'default-receiver'
  group_by: ['alertname', 'cluster']
  group_wait: 30s
  group_interval: 30m
  repeat_interval: 5h
  # All the above attributes are inherited by all child routes and can be overwritten on each.
  routes:
    - match_re:
        service: ^.*
      receiver: 'prometheus-msteams'
      continue: true
      routes:
        - match:
            severity: critical
          receiver: 'default-receiver'
          continue: true

receivers:
  - name: 'default-receiver'
    email_configs:
      - send_resolved: false
        to: 'mymail@abc.com'
        from: 'alerts@abc.com'
        auth_username: "XXXXXXXXXXXXX"
        auth_identity: "XXXXXXXXXXXXX"
        auth_password: "XXXXXXXXXXXXX"
  - name: prometheus-msteams
    webhook_configs:
      - url: "http://prom2teams-server:8089/v2/Connector"
        send_resolved: true

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'cluster']
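
Alertmanager 0.22 and later deprecate `match`, `match_re`, `source_match` and `target_match` in favour of the `matchers` syntax. As a rough sketch, assuming a recent Alertmanager version, the critical-severity route and the inhibit rule above could be written as follows. This is a fragment showing only the affected keys, not a complete configuration.

```yaml
route:
  receiver: 'default-receiver'
  routes:
    - matchers:
        - severity = "critical"      # replaces match: {severity: critical}
      receiver: 'default-receiver'
      continue: true

inhibit_rules:
  - source_matchers:
      - severity = "critical"        # replaces source_match
    target_matchers:
      - severity = "warning"         # replaces target_match
    equal: ['alertname', 'cluster']
```

Regex matches use the `=~` operator in the same list, for example `service =~ ".*"`.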

Prometheus to MS Teams Integration

Webhook connector creation in MS Teams
[HTTP Server]
Host:
Port: 8089
[Microsoft Teams]
Connector: https://outlook.office.com/webhook/1231232232323
[Group Alerts]
Field:
[Log]
Level: INFO
[Template]
Path: /opt/prom2teams/helmconfig/teams.j2

Sample email alert

Sample MS Teams Alert

Dead Man’s Switch

# Prometheus alert rule: always fires, acting as a heartbeat for the alerting pipeline
groups:
  - name: meta
    rules:
      - alert: WatchdogAlert
        expr: vector(1)
        labels:
          severity: "critical"
          environment: "Production"
        annotations:
          description: This is a Watchdog alert to ensure that the entire Alerting pipeline is functional.
          summary: Watchdog Alerting
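
# Alertmanager fragment: the routes entry belongs under the top-level 'route:' key,
# and the receiver is added to the top-level 'receivers:' list.
routes:
  - match_re:
      alertname: WatchdogAlert
    receiver: 'cole'
    group_interval: 10s
    repeat_interval: 4m
    continue: false
receivers:
  - name: cole
    webhook_configs:
      - url: "http://deadman-switch:8080/ping/bpbn2earafu3t25o2900"
        send_resolved: false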
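
Where the watchdog route sits in the routing tree matters. Here is one possible arrangement, assuming it is merged into the same Alertmanager configuration as the 'default-receiver' and 'prometheus-msteams' receivers shown earlier. The ordering is a suggestion, not something the original configuration states.

```yaml
route:
  receiver: 'default-receiver'
  routes:
    # Watchdog first: with continue: false, routing stops at this entry, so the
    # always-firing heartbeat never reaches the Teams or email receivers.
    - match_re:
        alertname: WatchdogAlert
      receiver: 'cole'
      group_interval: 10s
      repeat_interval: 4m
      continue: false
    # Everything else falls through to the regular notification routes.
    - match_re:
        service: ^.*
      receiver: 'prometheus-msteams'
      continue: true
```

With this ordering, the dead man's switch endpoint gets a ping roughly every repeat_interval while the pipeline is healthy, and the external service can raise an alarm when the pings stop.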

Danuka Praneeth


Senior Software Engineer | BSc (Hons) Engineering | CIMA | Autodidact | Knowledge-Seeker