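Monitoring Data
Prometheus lab on microservice monitoring and alerting, from the Geek Time (极客时间) course on microservice architecture in practice: https://github.com/spring2go/prom_lab
Configuration file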
# my global config
global:
  scrape_interval: 5s # Set the scrape interval to every 5 seconds. Default is every 1 minute.
  scrape_timeout: 5s
  evaluation_interval: 5s # Evaluate rules every 5 seconds. The default is every 1 minute.

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'http-simulator'
    metrics_path: /prometheus
    static_configs:
      - targets: ['localhost:8080']
Verify that http-simulator is up (value 1)
up{job="http-simulator"}
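The same check can be scripted against the JSON that Prometheus returns from its /api/v1/query endpoint. The sketch below only parses a response body; the sample payload (timestamp and instance value) is made up for illustration, and the helper name targets_up is an assumption, not part of the lab.

```python
import json

# Illustrative response body in the shape returned by Prometheus's
# /api/v1/query endpoint for the instant query up{job="http-simulator"}.
# The timestamp and label values are invented for this example.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {"__name__": "up", "instance": "localhost:8080", "job": "http-simulator"},
        "value": [1700000000.0, "1"]
      }
    ]
  }
}
"""


def targets_up(response_body: str) -> dict:
    """Map each instance label to True when its `up` sample equals 1."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    status_by_instance = {}
    for series in payload["data"]["result"]:
        # Each instant-vector sample is [unix_timestamp, "value as string"].
        _timestamp, value = series["value"]
        status_by_instance[series["metric"]["instance"]] = float(value) == 1.0
    return status_by_instance


print(targets_up(SAMPLE_RESPONSE))
```

A value of 0 would mean the last scrape of that target failed.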
Query the number of HTTP requests
http_requests_total{job="http-simulator"}
Query the number of successful login requests
http_requests_total{job="http-simulator", status="200", endpoint="/login"}
Query the number of successful requests, broken down by endpoint
http_requests_total{job="http-simulator", status="200"}
Query the total number of successful requests
sum(http_requests_total{job="http-simulator", status="200"})
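To make the difference between the per-endpoint query and the summed query concrete, here is a rough Python model of label matching and sum aggregation. The sample values and the /users endpoint are hypothetical, and select and sum_by are illustrative helpers, not Prometheus APIs.

```python
# Hypothetical instant-vector samples: (labels, counter value).
samples = [
    ({"job": "http-simulator", "status": "200", "endpoint": "/login"}, 120.0),
    ({"job": "http-simulator", "status": "200", "endpoint": "/users"}, 300.0),
    ({"job": "http-simulator", "status": "500", "endpoint": "/login"}, 7.0),
]


def select(series, **matchers):
    """Keep only series whose labels equal every given matcher (like {k="v"})."""
    return [
        (labels, value)
        for labels, value in series
        if all(labels.get(name) == want for name, want in matchers.items())
    ]


def sum_by(series, *by):
    """Sum values, grouping by the listed label names (no names: one total)."""
    totals = {}
    for labels, value in series:
        key = tuple(labels.get(name, "") for name in by)
        totals[key] = totals.get(key, 0.0) + value
    return totals


ok = select(samples, job="http-simulator", status="200")
print(sum_by(ok))              # one total across all endpoints
print(sum_by(ok, "endpoint"))  # one total per endpoint
```

Without sum, the selector returns one series per remaining label combination, which is why the unaggregated query already distinguishes endpoints.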
Query the successful request rate, broken down by endpoint
rate(http_requests_total{job="http-simulator", status="200"}[5m])
Query the total successful request rate
sum(rate(http_requests_total{job="http-simulator", status="200"}[5m]))
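As a rough illustration of what rate(...[5m]) computes, the sketch below derives a per-second rate from raw counter samples and treats any drop in value as a counter reset. It is a simplification: Prometheus's rate() also extrapolates to the edges of the window, which is omitted here. The function name simple_rate and the sample data are assumptions.

```python
def simple_rate(samples):
    """Per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value), sorted by time,
    with distinct timestamps. Returns None with fewer than two samples.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    previous = samples[0][1]
    for _timestamp, value in samples[1:]:
        if value < previous:
            # Counter reset: the process restarted and counted up from 0,
            # so the whole new value is treated as increase.
            increase += value
        else:
            increase += value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical samples scraped every 5 seconds, with a reset at t=15.
print(simple_rate([(0, 100), (5, 110), (10, 125), (15, 5), (20, 15)]))
```

Because the result is a per-second rate per series, summing the rates gives the overall successful requests per second.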
4. Latency distribution queries
Query the latency distribution of http-simulator
http_request_duration_milliseconds_bucket{job="http-simulator"}
Query the latency distribution of successful login requests
http_request_duration_milliseconds_bucket{job="http-simulator", status="200", endpoint="/login"}
Fraction of successful login requests with latency of at most 200 ms
sum(http_request_duration_milliseconds_bucket{job="http-simulator", status="200", endpoint="/login", le="200.0"}) / sum(http_request_duration_milliseconds_count{job="http-simulator", status="200", endpoint="/login"})
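The ratio works because histogram buckets are cumulative: the le="200.0" bucket counts every observation of 200 ms or less, and the _count series counts all observations. A small worked example with hypothetical bucket values:

```python
# Hypothetical cumulative bucket counts, keyed by the `le` upper bound (ms).
# Each bucket includes everything counted in the smaller buckets.
buckets = {"100.0": 40.0, "200.0": 90.0, "400.0": 98.0, "+Inf": 100.0}

# The _count series matches the +Inf bucket: every observation falls in it.
total_count = buckets["+Inf"]


def fraction_within(cumulative_buckets, total, le):
    """Share of observations at or below the `le` bucket boundary."""
    return cumulative_buckets[le] / total


print(fraction_within(buckets, total_count, "200.0"))
```

The threshold has to be one of the configured bucket boundaries; a boundary that was never configured has no bucket series to divide.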
99th percentile latency of successful login requests
histogram_quantile(0.99, rate(http_request_duration_milliseconds_bucket{job="http-simulator", status="200", endpoint="/login"}[5m]))
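histogram_quantile estimates a quantile by finding the bucket that contains the target rank and interpolating linearly inside it. The sketch below is a simplified model for classic cumulative buckets, assuming the lowest bucket starts at 0; the bucket values are hypothetical, and the real function operates on the per-second bucket rates rather than raw counts.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
    with +Inf as the last upper bound. Assumes 0 < q <= 1.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if math.isinf(upper):
        # The quantile falls beyond the last finite boundary, so the best
        # available answer is that boundary itself.
        return buckets[-2][0]
    lower = buckets[index - 1][0] if index > 0 else 0.0
    count_below = buckets[index - 1][1] if index > 0 else 0.0
    # Linear interpolation inside the bucket.
    return lower + (upper - lower) * (rank - count_below) / (count - count_below)


# Hypothetical cumulative counts for bounds 100, 200, 400 ms and +Inf.
example = [(100.0, 40.0), (200.0, 90.0), (400.0, 98.0), (math.inf, 100.0)]
print(histogram_quantile(0.5, example))
print(histogram_quantile(0.99, example))
```

With these numbers the 99th percentile lands in the +Inf bucket, so the estimate is capped at the largest finite boundary. This is why bucket boundaries should extend past the latencies of interest.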