Prometheus

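Prometheus is an open-source monitoring and alerting system hosted by the CNCF, known for its multi-dimensional time series data model and its query language, PromQL. It uses a pull model: the server periodically scrapes each target's /metrics endpoint over HTTP.

Good fit for: infrastructure and application metrics monitoring and alert rule configuration. Visualization is usually handled by Grafana.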

Data model

<metric_name>{<label_name>=<label_value>, ...} <value> [<timestamp>]

# Example
http_requests_total{method="GET", status="200", handler="/api/users"} 1027
jvm_memory_used_bytes{area="heap", id="G1 Eden Space"} 1.234e+08
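
To make the line format above concrete, here is a minimal sketch that renders one sample as text. The class `ExpositionLine` and its methods are hypothetical names used only for illustration; they are not part of any Prometheus client library, and the sketch ignores timestamps and metric-name validation.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Renders one sample as metric_name{label="value",...} value (illustrative only). */
final class ExpositionLine {
    private ExpositionLine() {}

    /** Escapes backslash, double quote and newline inside a label value. */
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    /** Builds the text line; labels are written in the map's iteration order. */
    static String render(String name, Map<String, String> labels, double value) {
        if (labels.isEmpty()) {
            return name + " " + value;
        }
        StringJoiner joined = new StringJoiner(",", "{", "}");
        labels.forEach((k, v) -> joined.add(k + "=\"" + escape(v) + "\""));
        return name + joined + " " + value;
    }
}
```

Each distinct combination of label values identifies a separate time series, so a label with unbounded values (user IDs, raw URLs) creates an unbounded number of series.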

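The four metric types:

Type	Description	Typical use
Counter	Monotonically increasing count; resets to zero when the process restarts	Total requests, total errors
Gauge	Point-in-time value that can go up or down	Memory in use, online users
Histogram	Observations counted into buckets (exposes `_bucket`, `_sum`, `_count`)	Request latency distribution, response sizes
Summary	Quantiles computed on the client (exposes `{quantile="..."}` series plus `_sum` and `_count`)	P99 latency (accurate for a single instance only)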
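
As an illustration of how the three Histogram series in the table relate to each other, the sketch below keeps per-bucket counts and exposes them cumulatively, the way `_bucket{le="..."}` series are reported. `SimpleHistogram` is a hypothetical class written for this note, not the Micrometer or Prometheus client implementation, and it is not thread-safe.

```java
import java.util.Arrays;

/** Toy histogram: cumulative buckets plus running sum and count (illustrative only). */
final class SimpleHistogram {
    private final double[] upperBounds; // finite bounds, sorted; the +Inf bucket is implicit
    private final long[] counts;        // per-bucket counts; the last slot is the +Inf bucket
    private double sum;
    private long count;

    SimpleHistogram(double... upperBounds) {
        this.upperBounds = upperBounds.clone();
        Arrays.sort(this.upperBounds);
        this.counts = new long[upperBounds.length + 1];
    }

    /** Records one observation in the first bucket whose bound is >= value. */
    void observe(double value) {
        int i = 0;
        while (i < upperBounds.length && value > upperBounds[i]) {
            i++;
        }
        counts[i]++;
        sum += value;
        count++;
    }

    /** Cumulative counts as reported by _bucket; the last entry is le="+Inf". */
    long[] cumulativeBuckets() {
        long[] out = new long[counts.length];
        long running = 0;
        for (int i = 0; i < counts.length; i++) {
            running += counts[i];
            out[i] = running;
        }
        return out;
    }

    /** Value reported by _sum. */
    double sum() {
        return sum;
    }

    /** Value reported by _count; always equals the le="+Inf" bucket. */
    long count() {
        return count;
    }
}
```

Because the buckets are plain counters, they can be summed across instances before a quantile is estimated on the server, which is what a Summary's precomputed quantiles do not allow.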

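Architecture

Exporters / App /metrics
        │  (HTTP pull)
        ▼
Prometheus Server
   ├── TSDB (local time series database)
   ├── Retrieval (scheduled scraping)
   ├── PromQL engine
   └── Rule evaluation (pushes firing alerts)
        │
        ▼
   Alertmanager (separate service: alert routing)
        │
        ▼
   Email / PagerDuty / Webhook / DingTalk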
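
The pull side of the diagram can be sketched from the target's point of view: the application only has to answer HTTP GET requests on /metrics with plain text. The example below uses the JDK's built-in `com.sun.net.httpserver.HttpServer`; the class `MetricsEndpoint` and the metric name `demo_scrapes_total` are made up for illustration. A real service would normally rely on a client library or Micrometer rather than hand-written output.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

/** Minimal scrape target that exposes one counter on /metrics (illustrative only). */
final class MetricsEndpoint {
    static final AtomicLong SCRAPES = new AtomicLong();

    private MetricsEndpoint() {}

    /** Renders the current counter value in the text exposition format. */
    static String body() {
        return "# TYPE demo_scrapes_total counter\n"
             + "demo_scrapes_total " + SCRAPES.get() + "\n";
    }

    /** Starts an HTTP server on the given port; the caller stops it with server.stop(0). */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            SCRAPES.incrementAndGet(); // count this scrape before rendering
            byte[] payload = body().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders()
                    .add("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
            exchange.sendResponseHeaders(200, payload.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(payload);
            }
        });
        server.start();
        return server;
    }
}
```

Calling `MetricsEndpoint.start(8080)` and pointing a scrape job at that port would be enough for Prometheus to collect the counter on every scrape interval.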

Quick start (Docker Compose)

services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

# prometheus.yml
global:
  scrape_interval: 15s
 
scrape_configs:
  - job_name: "spring-app"
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ["app:8080"]

Spring Boot integration

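<!-- Actuator provides the /actuator/prometheus endpoint -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>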

management:
  endpoints:
    web:
      exposure:
        include: prometheus,health,info
  metrics:
    tags:
      application: ${spring.application.name}

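// Custom business metrics
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;
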
@Component
public class OrderMetrics {
    private final Counter orderCounter;
    private final Timer orderTimer;
 
    public OrderMetrics(MeterRegistry registry) {
        this.orderCounter = Counter.builder("orders.created.total")
            .description("Total orders created")
            .tag("channel", "web")
            .register(registry);
        this.orderTimer = Timer.builder("orders.process.duration")
            .description("Order processing time")
            .register(registry);
    }
 
    public void recordOrder() {
        orderCounter.increment();
    }
 
    public void recordProcessTime(Runnable task) {
        orderTimer.record(task);
    }
}

Common PromQL queries

# Per-second HTTP request rate (QPS), averaged over the last 5 minutes
rate(http_requests_total[5m])
 
# Share of requests that returned a 5xx status
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))
 
# P99 request latency (Histogram)
histogram_quantile(0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
 
# JVM heap usage ratio, per memory pool
jvm_memory_used_bytes{area="heap"}
  / jvm_memory_max_bytes{area="heap"}
 
# Whether the most recent scrape of each instance succeeded (1 = up, 0 = down)
up{job="spring-app"}
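
The P99 query above estimates a quantile from bucket counts rather than from raw observations. The sketch below shows the idea under simplifying assumptions: it finds the bucket that contains the target rank and interpolates linearly inside it, taking 0 as the lower edge of the first bucket. `HistogramQuantile` is a hypothetical helper, not Prometheus source code; the real function works on per-bucket rates and handles more edge cases.

```java
/** Linear-interpolation quantile estimate over cumulative buckets (illustrative only). */
final class HistogramQuantile {
    private HistogramQuantile() {}

    /**
     * @param q           quantile in (0, 1]
     * @param upperBounds finite bucket bounds, sorted ascending, at least one entry
     * @param cumulative  cumulative counts, one per finite bound plus a final +Inf entry
     */
    static double quantile(double q, double[] upperBounds, double[] cumulative) {
        if (q <= 0 || q > 1) {
            throw new IllegalArgumentException("q must be in (0, 1]");
        }
        if (upperBounds.length == 0 || cumulative.length != upperBounds.length + 1) {
            throw new IllegalArgumentException("need n finite bounds and n + 1 counts");
        }
        double total = cumulative[cumulative.length - 1];
        if (total == 0) {
            return Double.NaN; // no observations at all
        }
        double rank = q * total;
        int b = 0;
        while (b < cumulative.length - 1 && cumulative[b] < rank) {
            b++;
        }
        if (b == cumulative.length - 1) {
            // The rank falls in the +Inf bucket: report the highest finite bound.
            return upperBounds[upperBounds.length - 1];
        }
        double start = (b == 0) ? 0.0 : upperBounds[b - 1];
        double below = (b == 0) ? 0.0 : cumulative[b - 1];
        double inBucket = cumulative[b] - below;
        return start + (upperBounds[b] - start) * ((rank - below) / inBucket);
    }
}
```

The estimate can never be more precise than the bucket layout: a quantile that lands above the highest finite bound is reported as that bound, so buckets should bracket the latencies that matter.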

Alerting rules (Alertmanager)

# alerts/rules.yml
groups:
  - name: app
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5% (current: {{ $value | humanizePercentage }})"
 
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
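
The `for:` field in the rules above delays firing until the expression has stayed true for the whole period. A minimal sketch of that behavior, assuming one state object per alert and one call per rule evaluation, is shown below. `AlertState` is a hypothetical class for illustration and leaves out label handling, resolved notifications, and state restoration after a restart.

```java
import java.time.Duration;
import java.time.Instant;

/** Tracks inactive -> pending -> firing for one alert (illustrative only). */
final class AlertState {
    enum State { INACTIVE, PENDING, FIRING }

    private final Duration holdFor; // corresponds to the rule's "for:" value
    private Instant activeSince;    // null while the expression is false

    AlertState(Duration holdFor) {
        this.holdFor = holdFor;
    }

    /** Call once per rule evaluation with the current result of the expression. */
    State evaluate(boolean conditionTrue, Instant now) {
        if (!conditionTrue) {
            activeSince = null; // any false evaluation resets the timer
            return State.INACTIVE;
        }
        if (activeSince == null) {
            activeSince = now;
        }
        Duration active = Duration.between(activeSince, now);
        return active.compareTo(holdFor) >= 0 ? State.FIRING : State.PENDING;
    }
}
```

With `for: 2m`, a brief error spike that clears before two minutes have passed never leaves the pending state, which is why the field is commonly used to suppress flapping alerts.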

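Storage and high availability

Option	Description
Local TSDB	15-day retention by default; suited to a single node
Remote Write	Ships samples to Thanos / Cortex / VictoriaMetrics
Thanos	Federates multiple Prometheus instances and adds long-term storage and a global query view
VictoriaMetrics	Single binary, high compression ratio, PromQL-compatible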
