Log Aggregation
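Once a service runs as multiple instances, its logs are scattered across many machines, and troubleshooting means logging in to each one in turn. Log aggregation collects the logs of all services in one place, stores them centrally, and makes them searchable through a UI.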
Comparison of mainstream solutions
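| Solution | Collection | Storage | Visualization | Characteristics |
|---|---|---|---|---|
| ELK | Logstash / Filebeat | Elasticsearch | Kibana | Full-featured, powerful full-text search, resource-hungry |
| EFK | Filebeat | Elasticsearch | Kibana | Lightweight Filebeat replaces Logstash (the "F" in EFK more commonly stands for Fluentd / Fluent Bit) |
| Loki + Grafana | Promtail / Alloy | Loki | Grafana | Label-only indexing, low storage cost, first choice for cloud native |
| Cloud vendor | Agent | Cloud storage | Console | e.g. Alibaba Cloud SLS, AWS CloudWatch; no infrastructure to operate |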
Prerequisite: structured log output
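Log aggregation relies on applications emitting JSON, so that the backend can parse individual fields such as level, traceId and userId instead of matching free text.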
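The encoder's handling of the MDC can be pictured with a small pure-JDK sketch. Each thread carries its own key/value context. Only two things are merged into the JSON event: the whitelisted keys (the `includeMdcKeyName` entries in the configuration below) and the static `customFields`. `MdcJsonSketch`, its `MDC` field and `toJsonLine` are hypothetical stand-ins. A real application calls SLF4J's `MDC.put(...)` and lets LogstashEncoder render the event. The escaping here only handles quotes and backslashes, so this is not a complete JSON encoder.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MdcJsonSketch {
    // Hypothetical stand-in for SLF4J's MDC: one context map per thread.
    static final ThreadLocal<Map<String, String>> MDC =
            ThreadLocal.withInitial(LinkedHashMap::new);

    // Minimal escaping for quotes and backslashes only (not a full JSON encoder).
    static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds one JSON log line: core fields, then whitelisted MDC keys, then static fields.
    static String toJsonLine(String level, String message,
                             List<String> includeMdcKeys, Map<String, String> customFields) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("level", level);
        fields.put("message", message);
        Map<String, String> ctx = MDC.get();
        for (String key : includeMdcKeys) {
            if (ctx.containsKey(key)) {          // keys absent from the MDC are skipped
                fields.put(key, ctx.get(key));
            }
        }
        fields.putAll(customFields);
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 1) sb.append(',');  // comma before every field but the first
            sb.append('"').append(esc(e.getKey())).append("\":\"")
              .append(esc(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        MDC.get().put("traceId", "abc123def456");
        MDC.get().put("sessionId", "s-9");       // not whitelisted, so not emitted
        String line = toJsonLine("INFO", "Order created, orderId=1001",
                List.of("traceId", "spanId", "userId"),
                Map.of("service", "order-service"));
        System.out.println(line);
        MDC.remove();                              // clear per-request context
    }
}
```

Clearing the context at the end of each request, as `MDC.remove()` does here, matters on pooled server threads. Otherwise a stale traceId leaks into the next request handled by the same thread.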
// build.gradle
implementation 'net.logstash.logback:logstash-logback-encoder:7.4'
<!-- logback-spring.xml -->
<configuration>
  <appender name="JSON" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>logs/app-json.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <!-- roll daily, gzip old files, keep 7 days -->
      <fileNamePattern>logs/app-json-%d{yyyy-MM-dd}.log.gz</fileNamePattern>
      <maxHistory>7</maxHistory>
    </rollingPolicy>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <!-- static fields added to every event -->
      <customFields>{"service":"order-service","env":"prod"}</customFields>
      <!-- fields copied from the MDC -->
      <includeMdcKeyName>traceId</includeMdcKeyName>
      <includeMdcKeyName>spanId</includeMdcKeyName>
      <includeMdcKeyName>userId</includeMdcKeyName>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="JSON"/>
  </root>
</configuration>
Sample output (pretty-printed here; the encoder writes one JSON object per line):
{
  "@timestamp": "2025-04-27T10:23:45.123Z",
  "level": "INFO",
  "logger_name": "com.example.OrderService",
  "message": "Order created, orderId=1001",
  "thread_name": "http-nio-8080-exec-1",
  "traceId": "abc123def456",
  "spanId": "001122",
  "userId": "42",
  "service": "order-service",
  "env": "prod"
}
Option 1: EFK (Filebeat + Elasticsearch + Kibana)
Architecture
Spring Boot application
│ writes JSON log files
▼
Filebeat (lightweight collection agent, one per host)
│ ships events
▼
Elasticsearch (storage + indexing)
│
▼
Kibana (visualization + search)
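In this pipeline, Filebeat picks the target index for every event. The configuration that follows uses a daily pattern, `app-logs-%{+yyyy.MM.dd}`. Beats formats that date from the event's `@timestamp`, which is kept in UTC. On a server running at UTC+8, anything logged before 08:00 local time therefore lands in the previous day's index. Below is a minimal pure-JDK sketch of that naming rule, assuming UTC formatting. `DailyIndexSketch` and `indexFor` are hypothetical names, not a Filebeat API.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DailyIndexSketch {
    // Mirrors an "app-logs-%{+yyyy.MM.dd}" pattern, assuming the date is
    // formatted from the event timestamp in UTC.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy.MM.dd").withZone(ZoneOffset.UTC);

    static String indexFor(String prefix, Instant timestamp) {
        return prefix + DAY.format(timestamp);
    }

    public static void main(String[] args) {
        // 07:30 on 2025-04-28 at UTC+8 is still 2025-04-27 in UTC.
        Instant ts = OffsetDateTime.parse("2025-04-28T07:30:00+08:00").toInstant();
        System.out.println(indexFor("app-logs-", ts));
    }
}
```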
Filebeat configuration
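# filebeat.yml
filebeat.inputs:
  # the "log" input still works in 8.x but is deprecated; "filestream" replaces it
  - type: log
    enabled: true
    paths:
      - /opt/app/logs/*-json.log
    json.keys_under_root: true   # lift decoded JSON fields to the top level of the event
    json.add_error_key: true     # add an error key when a line is not valid JSON
    fields:
      service: order-service
      env: prod
    fields_under_root: true
output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]
  index: "app-logs-%{+yyyy.MM.dd}"   # one index per day
# a custom index name needs a matching template name/pattern; ILM off so the index setting applies
setup.ilm.enabled: false
setup.template.name: "app-logs"
setup.template.pattern: "app-logs-*"
setup.kibana:
  host: "kibana:5601"
Docker Compose quick start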
services:
  elasticsearch:
    image: elasticsearch:8.13.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false   # demo only: no authentication
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
    ports:
      - "9200:9200"
    volumes:
      - es_data:/usr/share/elasticsearch/data
  kibana:
    image: kibana:8.13.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    depends_on:
      - elasticsearch
  filebeat:
    image: elastic/filebeat:8.13.0
    user: root
    # a bind-mounted config is usually not owned by root, so relax Filebeat's permission check
    command: ["filebeat", "-e", "--strict.perms=false"]
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
      - /opt/app/logs:/opt/app/logs:ro
    depends_on:
      - elasticsearch
volumes:
  es_data:
Logstash Appender (direct push, no Filebeat)
Suitable when log volume is modest and you would rather not maintain a Filebeat agent:
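// build.gradle
implementation 'net.logstash.logback:logstash-logback-encoder:7.4'
<appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
  <!-- must point at a Logstash "tcp" input with codec json_lines, not a "beats" input -->
  <destination>logstash:5044</destination>
  <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  <!-- wait 10 s between reconnection attempts; meanwhile events queue in memory -->
  <reconnectionDelay>10 seconds</reconnectionDelay>
  <writeBufferSize>8192</writeBufferSize>
</appender>
Drawback of direct push: logs can be lost while Logstash is down or restarting. The appender only buffers events in memory and drops them once that buffer is full. With Filebeat, the log files on disk act as the buffer, and Filebeat resumes from its recorded read offset, so that approach is more reliable.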
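The loss scenario can be illustrated with a pure-JDK sketch. While the receiver is unreachable, a direct-push appender can only hold events in a bounded in-memory buffer. Anything beyond its capacity is discarded rather than blocking application threads. `DropOnFullSketch` is hypothetical and deliberately simplified; logstash-logback-encoder's TCP appender uses a ring buffer sized by `ringBufferSize`, not an `ArrayBlockingQueue`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class DropOnFullSketch {
    private final BlockingQueue<String> buffer;
    private final AtomicLong dropped = new AtomicLong();

    DropOnFullSketch(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking enqueue: when the buffer is full the event is discarded,
    // so the logging thread is never stalled by an absent receiver.
    void append(String event) {
        if (!buffer.offer(event)) {
            dropped.incrementAndGet();
        }
    }

    // Called once the connection is back: drains everything still buffered.
    List<String> flush() {
        List<String> out = new ArrayList<>();
        buffer.drainTo(out);
        return out;
    }

    long dropped() {
        return dropped.get();
    }

    public static void main(String[] args) {
        DropOnFullSketch appender = new DropOnFullSketch(3);
        // Receiver is down: five events arrive but only three fit in memory.
        for (int i = 1; i <= 5; i++) {
            appender.append("event-" + i);
        }
        System.out.println(appender.flush() + " dropped=" + appender.dropped());
    }
}
```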
Option 2: Loki + Grafana (recommended for cloud native)
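Loki does not build a full-text index of log content. It indexes only labels (service, level, env) and stores the log lines compressed. Its storage cost is therefore much lower than Elasticsearch's, often quoted at 10x or more, though the gap depends on log volume and query patterns. This makes it a good fit for K8s environments.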
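Every distinct combination of label values forms its own stream, so labels need to stay low-cardinality. The small pure-JDK sketch below shows what happens to the stream count when a per-request value such as traceId is promoted to a label. The `countStreams` helper is hypothetical, not Loki code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class StreamCardinalitySketch {
    // A stream is identified by its full label set; each distinct set counts once.
    static int countStreams(List<Map<String, String>> events, List<String> labelKeys) {
        Set<Map<String, String>> streams = new HashSet<>();
        for (Map<String, String> event : events) {
            Map<String, String> labels = new TreeMap<>();
            for (String key : labelKeys) {
                String value = event.get(key);
                if (value != null) labels.put(key, value);
            }
            streams.add(labels);  // Map equality is content-based
        }
        return streams.size();
    }

    public static void main(String[] args) {
        List<Map<String, String>> events = List.of(
            Map.of("service", "order-service", "level", "INFO",  "traceId", "t1"),
            Map.of("service", "order-service", "level", "INFO",  "traceId", "t2"),
            Map.of("service", "order-service", "level", "ERROR", "traceId", "t3"),
            Map.of("service", "order-service", "level", "INFO",  "traceId", "t4"));
        // Low-cardinality labels: 2 streams (INFO, ERROR).
        System.out.println(countStreams(events, List.of("service", "level")));
        // With traceId as a label: one stream per trace, 4 streams for 4 events.
        System.out.println(countStreams(events, List.of("service", "level", "traceId")));
    }
}
```

Per-request identifiers are better kept inside the log line and filtered at query time, so the number of streams stays bounded by services and levels rather than by traffic.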
Architecture
Spring Boot application
│ writes JSON log files
▼
Promtail / Grafana Alloy (collection agent)
│ pushes (HTTP)
▼
Loki (storage, label index)
│
▼
Grafana (LogQL queries + visualization)
Promtail configuration
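# promtail-config.yml
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml   # records how far each file has been read
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: spring-boot-logs
    static_configs:
      - targets:
          - localhost
        labels:
          service: order-service
          env: prod
          __path__: /opt/app/logs/*-json.log
    pipeline_stages:
      # parse the JSON line and extract "level"
      - json:
          expressions:
            level: level
      # promote only low-cardinality fields to labels; traceId stays in the line
      # and is filtered at query time with "| json | traceId=..."
      - labels:
          level:
      # no "output" stage: the full JSON line is stored so LogQL's "| json" keeps working
Loki Docker Compose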
services:
  loki:
    image: grafana/loki:2.9.5
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
  promtail:
    image: grafana/promtail:2.9.5
    volumes:
      - ./promtail-config.yml:/etc/promtail/config.yml
      - /opt/app/logs:/opt/app/logs:ro
    command: -config.file=/etc/promtail/config.yml
    depends_on:
      - loki
  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin   # demo only
LogQL query examples
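LogQL's `| json` stage parses each stored line at query time, so these queries assume Loki keeps the full JSON line. The per-minute error query can be sketched in plain Java: keep ERROR events and count them per one-minute bucket. That is roughly what `count_over_time(...[1m])` returns at a 1-minute query step. `rate(...[1m])` instead divides the same count by 60 and reports errors per second. `ErrorsPerMinuteSketch` and its methods are hypothetical.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ErrorsPerMinuteSketch {
    record LogEvent(Instant timestamp, String level) {}

    // Counts ERROR events per aligned one-minute bucket.
    static Map<Instant, Long> errorsPerMinute(List<LogEvent> events) {
        Map<Instant, Long> counts = new TreeMap<>();
        for (LogEvent e : events) {
            if (!"ERROR".equals(e.level())) continue;   // the level="ERROR" filter
            Instant bucket = e.timestamp().truncatedTo(ChronoUnit.MINUTES);
            counts.merge(bucket, 1L, Long::sum);
        }
        return counts;
    }

    // A rate-style value: events per second over the window instead of a count.
    static double perSecondRate(long countInWindow, long windowSeconds) {
        return (double) countInWindow / windowSeconds;
    }

    public static void main(String[] args) {
        List<LogEvent> events = List.of(
            new LogEvent(Instant.parse("2025-04-27T10:23:05Z"), "ERROR"),
            new LogEvent(Instant.parse("2025-04-27T10:23:40Z"), "INFO"),
            new LogEvent(Instant.parse("2025-04-27T10:23:59Z"), "ERROR"),
            new LogEvent(Instant.parse("2025-04-27T10:24:10Z"), "ERROR"));
        Map<Instant, Long> counts = errorsPerMinute(events);
        System.out.println(counts);
        System.out.println(perSecondRate(counts.get(Instant.parse("2025-04-27T10:23:00Z")), 60));
    }
}
```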
# all ERROR logs
{service="order-service"} | json | level="ERROR"
# correlate logs across services by traceId
{service=~"order-service|payment-service"} | json | traceId="abc123"
# error count per minute
sum by (service) (count_over_time({service="order-service"} | json | level="ERROR" [1m]))
# slow requests (> 1 s); assumes the app logs a numeric "duration" field in milliseconds
{service="order-service"} | json | duration > 1000
Correlating logs and traces via traceId
When configuring the Loki data source in Grafana, add a Derived Field so that the traceId in each log line becomes a link. Clicking it opens the corresponding trace in Jaeger / Tempo:
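# Grafana Loki data source: set in the UI, or under jsonData.derivedFields in a provisioning file ("$" must be written "$$" there)
# Data Sources → Loki → Derived fields
name: TraceID
matcherRegex: '"traceId":"(\w+)"'
url: http://jaeger:16686/trace/${__value.raw}
For tracing details see Distributed Tracing; for end-to-end observability with OpenTelemetry see OpenTelemetry.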
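The derived-field regex is applied to the raw log line, so it only matches if the JSON is serialized the way the pattern expects. The pure-JDK check below uses a hypothetical `traceLink`, with `java.util.regex` standing in for Grafana's matcher; the two behave the same for this simple pattern. It shows that the pattern matches compact single-line JSON, as LogstashEncoder writes by default, but not JSON with a space after the colon.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DerivedFieldSketch {
    // Same expression as the matcherRegex above; group 1 plays the role of ${__value.raw}.
    private static final Pattern TRACE_ID = Pattern.compile("\"traceId\":\"(\\w+)\"");

    static Optional<String> traceLink(String logLine, String urlPrefix) {
        Matcher m = TRACE_ID.matcher(logLine);
        return m.find() ? Optional.of(urlPrefix + m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String compact = "{\"level\":\"INFO\",\"traceId\":\"abc123def456\"}";
        String pretty  = "{ \"level\": \"INFO\", \"traceId\": \"abc123def456\" }";
        String prefix = "http://jaeger:16686/trace/";
        System.out.println(traceLink(compact, prefix)); // matches
        System.out.println(traceLink(pretty, prefix));  // no match: no space allowed after the colon
    }
}
```

If the line may be reformatted somewhere along the pipeline, a more tolerant pattern such as `"traceId":\s*"(\w+)"` avoids a silent mismatch.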
Log collection on K8s
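On K8s, the recommended approach is to run Filebeat / Promtail as a DaemonSet, so one agent per node automatically collects the logs of every Pod on that node: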
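# Filebeat DaemonSet (simplified: ConfigMap, ServiceAccount and RBAC omitted)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat   # must match spec.selector.matchLabels
    spec:
      containers:
        - name: filebeat
          image: elastic/filebeat:8.13.0
          volumeMounts:
            # /var/log/containers and /var/log/pods hold the Pod log files
            - name: varlog
              mountPath: /var/log
              readOnly: true
            # only needed on nodes that use the Docker runtime
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
For K8s deployment details, see K8s Deployment.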
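Node-level agents can rely on the log file name for part of the Pod metadata. The kubelet links each container's log into `/var/log/containers/` as `<pod>_<namespace>_<container>-<container id>.log`. Below is a pure-JDK sketch of that parsing, assuming this naming and a 64-character hex container ID. `ContainerLogPathSketch` is hypothetical; Filebeat's `add_kubernetes_metadata` processor and Promtail's `kubernetes_sd_configs` have their own implementations.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContainerLogPathSketch {
    record PodLogInfo(String pod, String namespace, String container, String containerId) {}

    // Assumed naming: <pod>_<namespace>_<container>-<64 hex chars>.log
    // Pod and namespace names cannot contain "_", so "_" is a safe separator;
    // the container name may contain "-", so the ID is anchored at the end.
    private static final Pattern NAME =
            Pattern.compile("^([^_/]+)_([^_/]+)_(.+)-([0-9a-f]{64})\\.log$");

    static Optional<PodLogInfo> parse(String path) {
        String file = path.substring(path.lastIndexOf('/') + 1);
        Matcher m = NAME.matcher(file);
        if (!m.matches()) return Optional.empty();
        return Optional.of(new PodLogInfo(m.group(1), m.group(2), m.group(3), m.group(4)));
    }

    public static void main(String[] args) {
        String id = "a".repeat(64);
        System.out.println(parse("/var/log/containers/order-service-7d9f_prod_app-" + id + ".log"));
    }
}
```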