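Log Aggregation

In a multi-instance deployment, logs are scattered across machines, and troubleshooting means logging in to each host in turn. Log aggregation collects the logs of all services centrally, stores them in one place, and makes them searchable through a visual interface.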

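Comparison of mainstream solutions

| Solution | Collection | Storage | Visualization | Characteristics |
|---|---|---|---|---|
| ELK | Logstash / Filebeat | Elasticsearch | Kibana | Full-featured, strong full-text search, high resource usage |
| EFK | Filebeat | Elasticsearch | Kibana | Filebeat replaces Logstash; lightweight |
| Loki + Grafana | Promtail / Alloy | Loki | Grafana | Label-based index, low storage cost, first choice for cloud-native environments |
| Cloud vendor | Agent | Cloud storage | Console | Alibaba Cloud SLS, AWS CloudWatch; no infrastructure to operate |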

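Prerequisite: structured log output

Log aggregation builds on JSON output, which lets the backend parse fields (level, traceId, userId, and so on) directly instead of pattern-matching free text.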

// build.gradle
implementation 'net.logstash.logback:logstash-logback-encoder:7.4'

<!-- logback-spring.xml -->
<appender name="JSON" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>logs/app-json.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <fileNamePattern>logs/app-json-%d{yyyy-MM-dd}.log.gz</fileNamePattern>
        <maxHistory>7</maxHistory>
    </rollingPolicy>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
        <!-- Custom fields appended to every event -->
        <customFields>{"service":"order-service","env":"prod"}</customFields>
        <!-- Fields pulled from the MDC -->
        <includeMdcKeyName>traceId</includeMdcKeyName>
        <includeMdcKeyName>spanId</includeMdcKeyName>
        <includeMdcKeyName>userId</includeMdcKeyName>
    </encoder>
</appender>

Sample output:

{
  "@timestamp": "2025-04-27T10:23:45.123Z",
  "level": "INFO",
  "logger_name": "com.example.OrderService",
  "message": "Order created, orderId=1001",
  "thread_name": "http-nio-8080-exec-1",
  "traceId": "abc123def456",
  "spanId": "001122",
  "userId": "42",
  "service": "order-service",
  "env": "prod"
}
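
Because each log line is a self-contained JSON object, a downstream consumer can pick out fields without any regex parsing. The following is a minimal sketch using only the Python standard library; the helper name `extract_fields` and the list of fields are illustrative assumptions, not part of any collector's API.

```python
import json

# Fields a log backend would typically want to filter on. This list is an
# assumption based on the sample output, not a fixed schema.
INDEXED_FIELDS = ("level", "traceId", "userId", "service")


def extract_fields(line: str) -> dict:
    """Parse one JSON log line and return the fields of interest.

    Missing fields come back as None so callers can tell them apart
    from empty strings.
    """
    event = json.loads(line)
    return {name: event.get(name) for name in INDEXED_FIELDS}


sample = (
    '{"@timestamp":"2025-04-27T10:23:45.123Z","level":"INFO",'
    '"message":"Order created, orderId=1001","traceId":"abc123def456",'
    '"userId":"42","service":"order-service","env":"prod"}'
)
fields = extract_fields(sample)
print(fields)
```

The same lookup against a plain-text log line would need a format-specific pattern for every field, which is the main reason to emit JSON at the source.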

For logging configuration, see Logging; for traceId injection, see Distributed Tracing.


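Option 1: EFK (Filebeat + Elasticsearch + Kibana)

Architecture

Spring Boot application
  │ writes JSON log files
  ▼
Filebeat (lightweight collection agent, deployed on every machine)
  │ pushes
  ▼
Elasticsearch (storage + indexing)
  │
  ▼
Kibana (visualization + search)

Filebeat configuration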

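# filebeat.yml
filebeat.inputs:
  - type: log                       # deprecated since Filebeat 7.16 in favor of filestream
    enabled: true
    paths:
      - /opt/app/logs/*-json.log
    json.keys_under_root: true      # lift decoded JSON fields to the top level of the event
    json.add_error_key: true        # add an error key when a line fails to decode
    fields:
      service: order-service
      env: prod
    fields_under_root: true

output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]
  index: "app-logs-%{+yyyy.MM.dd}"   # one index per day

# Filebeat requires both settings when the index name is customized
setup.template.name: "app-logs"
setup.template.pattern: "app-logs-*"

setup.kibana:
  host: "kibana:5601"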
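
The `index` setting produces one index per day. As a rough sketch of how that name is derived, the function below formats a date suffix from the event timestamp. It assumes the date is taken from the event's `@timestamp` in UTC, which is worth verifying for your Filebeat version; `daily_index` is a hypothetical helper, not a Filebeat API.

```python
from datetime import datetime, timedelta, timezone


def daily_index(event_time: datetime, prefix: str = "app-logs") -> str:
    """Return the index name for an event, mirroring 'app-logs-%{+yyyy.MM.dd}'.

    The timestamp is converted to UTC first, so an event logged in the
    morning in UTC+8 can still land in the previous day's index.
    """
    utc_time = event_time.astimezone(timezone.utc)
    return f"{prefix}-{utc_time:%Y.%m.%d}"


utc_event = datetime(2025, 4, 27, 10, 23, 45, tzinfo=timezone.utc)
cst = timezone(timedelta(hours=8))
local_event = datetime(2025, 4, 28, 7, 30, tzinfo=cst)  # 23:30 on Apr 27 in UTC

print(daily_index(utc_event))
print(daily_index(local_event))
```

Under that assumption, index boundaries follow UTC days rather than local days, which matters when deleting or querying indices by date.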

Docker Compose quick start

services:
  elasticsearch:
    image: elasticsearch:8.13.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false   # disables authentication and TLS; local testing only
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
    ports:
      - "9200:9200"
    volumes:
      - es_data:/usr/share/elasticsearch/data
 
  kibana:
    image: kibana:8.13.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
 
  filebeat:
    image: elastic/filebeat:8.13.0
    user: root
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
      - /opt/app/logs:/opt/app/logs:ro
    depends_on:
      - elasticsearch
 
volumes:
  es_data:

Logstash appender (direct push, no Filebeat)

Suitable when log volume is modest and you would rather not maintain a Filebeat agent:

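// build.gradle
implementation 'net.logstash.logback:logstash-logback-encoder:7.4'

<appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
    <!-- Must point at a Logstash tcp input using the json_lines codec; 5044 is conventionally the Beats input port -->
    <destination>logstash:5044</destination>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
    <!-- Wait before reconnecting after a dropped connection, and the socket write buffer size in bytes -->
    <reconnectionDelay>10 seconds</reconnectionDelay>
    <writeBufferSize>8192</writeBufferSize>
</appender>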

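Drawback of direct push: pending events are held only in the appender's in-memory buffer, so logs can be lost when Logstash restarts or stays unreachable long enough for the buffer to fill. With Filebeat, the log file on disk acts as the buffer and Filebeat records how far it has read, which makes delivery more reliable.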
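
The reliability difference comes down to where unsent data lives. A file-based shipper can persist the byte offset it has read up to and resume from there after a restart or an outage. The sketch below illustrates that idea only; `read_from_offset` is a hypothetical helper and does not reflect Filebeat's actual registry implementation.

```python
import os
import tempfile


def read_from_offset(path: str, offset: int) -> tuple[list[str], int]:
    """Return complete lines appended after `offset`, plus the new offset.

    A trailing partial line (no newline yet) is left for the next call.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # 0 when no complete line is available
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


# Demo: the shipper "restarts" between the two reads but keeps its offset.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app-json.log")
    with open(log_path, "wb") as f:
        f.write(b'{"message":"first"}\n')
    first_batch, saved_offset = read_from_offset(log_path, 0)

    with open(log_path, "ab") as f:
        # The second appended line is still being written (no newline yet).
        f.write(b'{"message":"second"}\n{"message":"par')
    second_batch, saved_offset = read_from_offset(log_path, saved_offset)

print(first_batch, second_batch, saved_offset)
```

Nothing is lost while the downstream is unavailable, because the data stays in the file until it has been read and acknowledged. An in-process appender has no equivalent once its memory buffer overflows or the application exits.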


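Option 2: Loki + Grafana (recommended for cloud-native environments)

Loki does not build a full-text index over log content. It indexes only labels (service, level, env), so its storage cost is far lower than Elasticsearch's (a reduction of 10x or more is often cited, though the actual figure depends on log volume and retention), which makes it a good fit for K8s environments. The trade-off is that filters on log content scan the selected streams at query time.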
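
A toy model can make the label-only index concrete: streams are identified by their label set, only the labels are indexed, and any filter on log content is a scan over the selected streams. This is a conceptual sketch and not Loki's storage engine; the class `TinyLabelStore` and its methods are invented for illustration.

```python
from collections import defaultdict


class TinyLabelStore:
    """Toy log store that indexes labels only, never the log text."""

    def __init__(self) -> None:
        # Stream key (sorted label pairs) -> log lines in arrival order.
        self._streams: dict[tuple, list[str]] = defaultdict(list)

    def push(self, labels: dict[str, str], line: str) -> None:
        self._streams[tuple(sorted(labels.items()))].append(line)

    def query(self, selector: dict[str, str], contains: str = "") -> list[str]:
        """Select streams by exact label match, then scan lines for `contains`."""
        wanted = set(selector.items())
        hits = []
        for key, lines in self._streams.items():
            if wanted <= set(key):  # cheap step: compare labels only
                # expensive step: brute-force scan of the stream's content
                hits.extend(line for line in lines if contains in line)
        return hits


store = TinyLabelStore()
store.push({"service": "order-service", "env": "prod"}, "order 1001 created")
store.push({"service": "order-service", "env": "prod"}, "order 1002 failed: timeout")
store.push({"service": "payment-service", "env": "prod"}, "payment timeout")

print(store.query({"service": "order-service"}, contains="timeout"))
```

The index stays small because it never grows with the log text, but narrow label selectors matter: the fewer streams a query selects, the less content has to be scanned.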

Architecture

Spring Boot application
  │ writes JSON log files
  ▼
Promtail / Grafana Alloy (collection agent)
  │ pushes (HTTP)
  ▼
Loki (storage, label index)
  │
  ▼
Grafana (LogQL queries + visualization)

Promtail configuration

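# promtail-config.yml
server:
  http_listen_port: 9080

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: spring-boot-logs
    static_configs:
      - targets:
          - localhost
        labels:
          service: order-service
          env: prod
          __path__: /opt/app/logs/*-json.log

    pipeline_stages:
      - json:
          expressions:
            level: level
      # Promote only low-cardinality fields to labels. traceId is unique per
      # request and would create one stream per trace, so filter on it at
      # query time with `| json` instead.
      - labels:
          level:
      # No output stage: the stored line stays JSON, so `| json` filters and
      # the Derived Fields regex can still see traceId.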
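
Which extracted fields are promoted to labels determines how many streams Loki has to track, since every distinct label combination is a separate stream. The sketch below shows the effect on made-up events; `count_streams` is a hypothetical helper and the event data is synthetic.

```python
import json


def count_streams(lines: list[str], label_keys: tuple[str, ...]) -> int:
    """Count distinct label combinations produced by promoting `label_keys`."""
    streams = set()
    for line in lines:
        event = json.loads(line)
        streams.add(tuple(event.get(key) for key in label_keys))
    return len(streams)


# 1,000 synthetic events: three levels, each request with its own traceId.
levels = ("INFO", "WARN", "ERROR")
events = [
    json.dumps({"level": levels[i % 3], "traceId": f"trace-{i}", "message": "ok"})
    for i in range(1000)
]

level_only = count_streams(events, ("level",))
with_trace_id = count_streams(events, ("level", "traceId"))
print(level_only, with_trace_id)
```

Low-cardinality fields such as level make reasonable labels. Per-request values such as traceId are usually better left in the log line and filtered at query time with `| json`.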

Loki Docker Compose

services:
  loki:
    image: grafana/loki:2.9.5
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
 
  promtail:
    image: grafana/promtail:2.9.5
    volumes:
      - ./promtail-config.yml:/etc/promtail/config.yml
      - /opt/app/logs:/opt/app/logs:ro
    command: -config.file=/etc/promtail/config.yml
 
  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

LogQL query examples

# All ERROR logs
{service="order-service"} | json | level="ERROR"

# Correlate logs across services by traceId
{service=~"order-service|payment-service"} | json | traceId="abc123"

# Number of errors per minute
sum by (service) (count_over_time({service="order-service"} | json | level="ERROR" [1m]))

# Slow requests (> 1 s); assumes the log line carries a numeric duration field in milliseconds
{service="order-service"} | json | duration > 1000
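
The same queries can be run outside Grafana through Loki's HTTP API. The sketch below only builds the request URL for the `query_range` endpoint; the host `loki:3100` matches the Compose file in this note, while the helper name `build_query_range_url` and the time window are illustrative assumptions.

```python
from urllib.parse import urlencode


def build_query_range_url(base_url: str, logql: str, start_ns: int, end_ns: int,
                          limit: int = 100) -> str:
    """Build a URL for Loki's GET /loki/api/v1/query_range endpoint."""
    params = {
        "query": logql,
        "start": start_ns,        # Unix epoch in nanoseconds
        "end": end_ns,
        "limit": limit,
        "direction": "backward",  # newest entries first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


url = build_query_range_url(
    "http://loki:3100",
    '{service="order-service"} | json | level="ERROR"',
    start_ns=1_745_748_000_000_000_000,  # 2025-04-27T10:00:00Z
    end_ns=1_745_751_600_000_000_000,    # 2025-04-27T11:00:00Z
)
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) should return a JSON document whose `data.result` array holds the matching streams, which is convenient for scripts and alert tooling.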

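Linking logs to traces via traceId

When configuring the Loki data source in Grafana, enable Derived Fields so that the traceId in each log line becomes a link; clicking it opens the corresponding trace in Jaeger / Tempo:

# Grafana data source settings (data source provisioning file or the UI)
# Data Sources → Loki → Derived Fields
# In a provisioning file, write $${__value.raw} so the dollar sign is not treated as an environment variable
name: TraceID
matcherRegex: '"traceId":"(\w+)"'
url: http://jaeger:16686/trace/${__value.raw}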
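
To see what the derived field does to a log line, the sketch below applies the same regex and URL template in plain Python. It mimics the behavior for illustration only; `derive_trace_link` is a hypothetical helper, not Grafana code.

```python
import re
from typing import Optional

# Same pattern and URL template as the Derived Fields settings above.
MATCHER = re.compile(r'"traceId":"(\w+)"')
URL_TEMPLATE = "http://jaeger:16686/trace/${__value.raw}"


def derive_trace_link(log_line: str) -> Optional[str]:
    """Return the trace URL for a log line, or None if no traceId matches."""
    match = MATCHER.search(log_line)
    if match is None:
        return None
    return URL_TEMPLATE.replace("${__value.raw}", match.group(1))


compact = '{"level":"INFO","traceId":"abc123def456","spanId":"001122"}'
spaced = '{"level": "INFO", "traceId": "abc123def456"}'

print(derive_trace_link(compact))
print(derive_trace_link(spaced))
```

The pattern expects compact JSON with no whitespace after the colon. If the stored log line is pretty-printed or is no longer JSON, the regex finds nothing and no link appears.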

For distributed tracing, see Distributed Tracing; for end-to-end observability with OpenTelemetry, see OpenTelemetry.


K8s log collection

In K8s, deploy Filebeat / Promtail as a DaemonSet so that one agent per node automatically collects the logs of every Pod on that node:

# filebeat DaemonSet (simplified)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat        # must match spec.selector.matchLabels
    spec:
      containers:
        - name: filebeat
          image: elastic/filebeat:8.13.0
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        # Docker runtime only; containerd-based nodes keep container logs under /var/log/pods
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers

For deployment details, see K8s Deployment.


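Related links

  • Logging: structured JSON log output, logback-spring.xml configuration
  • Distributed Tracing: injecting traceId / spanId into the MDC and correlating them with logs
  • OpenTelemetry: unified reporting of logs, metrics, and traces over OTLP
  • Metrics Collection: Prometheus metrics shown alongside logs in Grafana
  • K8s Deployment: DaemonSet log collection and Pod log labels
  • Docker Deployment: container logging drivers (json-file / fluentd)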