OpenTelemetry
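OpenTelemetry (OTel) is the CNCF-hosted observability standard. It unifies three signals (traces, metrics, and logs) under one set of APIs, SDKs, and a wire protocol (OTLP), which greatly reduces the fragmentation of observability tooling.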
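The three signals
Observability
│
├── Traces   The complete call chain of one request across multiple services
├── Metrics  Aggregatable numbers such as QPS, latency, and error rate
└── Logs     Timestamped, structured text events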
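In OTel the three signals are correlated through the trace context: spans and log records carry the same TraceId, and metrics can point to sample traces through exemplars. This closes the troubleshooting loop: metric alert → locate the trace → read the logs.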
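The correlation above depends on every signal carrying the same trace identifier. As a minimal sketch (plain JDK code, not the OTel SDK), the following parses a W3C Trace Context `traceparent` header, which is the default propagation format in OTel, and uses the extracted trace ID to select matching log lines. The class `TraceParentDemo`, the helper `logsForTrace`, and the `trace_id=` log layout are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TraceParentDemo {
    // traceparent layout: version-traceid-spanid-flags, all lowercase hex.
    private static final Pattern TRACEPARENT = Pattern.compile(
            "^[0-9a-f]{2}-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}$");

    /** Returns the 32-character trace ID, or null if the header is malformed. */
    static String traceId(String header) {
        if (header == null) {
            return null;
        }
        Matcher m = TRACEPARENT.matcher(header.trim());
        if (!m.matches()) {
            return null;
        }
        String id = m.group(1);
        // An all-zero trace ID is defined as invalid.
        boolean allZero = id.chars().allMatch(c -> c == '0');
        return allZero ? null : id;
    }

    /** Hypothetical helper: keeps log lines tagged with the given trace ID. */
    static List<String> logsForTrace(List<String> logLines, String traceId) {
        List<String> matches = new ArrayList<>();
        for (String line : logLines) {
            if (line.contains("trace_id=" + traceId)) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
        List<String> logs = List.of(
                "level=ERROR trace_id=4bf92f3577b34da6a3ce929d0e0e4736 msg=charge-failed",
                "level=INFO trace_id=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa msg=ok");
        System.out.println(logsForTrace(logs, traceId(header)));
    }
}
```

In a real deployment the SDK or agent injects the trace ID into log records, and the backend performs this join.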
Architecture overview
Application process
┌──────────────────────────────────────────────┐
│ OTel SDK / Java Agent (auto-instrumentation) │
│                                              │
│ Traces  → OTLP Exporter ───────────────────► │
│ Metrics → OTLP Exporter ───────────────────► │──► OTel Collector
│ Logs    → OTLP Exporter ───────────────────► │
└──────────────────────────────────────────────┘
OTel Collector
┌──────────────────────────────────────────────┐
│ Receiver (OTLP gRPC/HTTP)                    │
│ Processor (batch, sample, filter attributes) │
│ Exporter                                     │
│  ├── Jaeger / Zipkin (Traces)                │
│  ├── Prometheus / Thanos (Metrics)           │
│  └── Loki / Elasticsearch (Logs)             │
└──────────────────────────────────────────────┘
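The sampling step in the Processor stage can decide after a whole trace has been seen (tail sampling). The sketch below illustrates only the decision itself, under assumed policies: keep every trace that contains an error, keep every trace slower than a latency threshold, and keep a fixed percentage of the rest. The class `TailSamplingSketch`, its parameters, and the hash-based bucketing are illustrative assumptions; the real Collector processor buffers spans per trace and is configured declaratively rather than coded like this.

```java
final class TailSamplingSketch {
    /**
     * Decides whether a completed trace should be kept.
     * Policies are combined with OR: any matching policy keeps the trace.
     */
    static boolean keep(String traceId, boolean hasErrorSpan, long traceDurationMs,
                        long latencyThresholdMs, int keepPercent) {
        if (hasErrorSpan) {
            return true; // error policy
        }
        if (traceDurationMs > latencyThresholdMs) {
            return true; // latency policy
        }
        // Probabilistic policy: map the trace ID to a stable bucket in [0, 100),
        // so the same trace always gets the same decision.
        int bucket = Math.floorMod(traceId.hashCode(), 100);
        return bucket < keepPercent;
    }

    public static void main(String[] args) {
        String id = "4bf92f3577b34da6a3ce929d0e0e4736";
        System.out.println(keep(id, true, 20, 1000, 10));   // kept: contains an error
        System.out.println(keep(id, false, 1500, 1000, 10)); // kept: slower than 1000 ms
        System.out.println(keep(id, false, 20, 1000, 10));   // depends on the bucket
    }
}
```

Because the decision needs the finished trace, every span of a trace has to reach the same Collector instance before the policy can run.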
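Java Agent auto-instrumentation (zero code changes)
The OTel Java Agent uses bytecode instrumentation to create spans automatically for dozens of frameworks and libraries, including Spring MVC, JDBC, Redis clients, Kafka, and gRPC:
# Download the agent
curl -L https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar \
  -o opentelemetry-javaagent.jar
# Attach the agent at startup (no code changes)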
java \
-javaagent:opentelemetry-javaagent.jar \
-Dotel.service.name=order-service \
-Dotel.exporter.otlp.endpoint=http://otel-collector:4318 \
-Dotel.exporter.otlp.protocol=http/protobuf \
-Dotel.logs.exporter=otlp \
-Dotel.metrics.exporter=otlp \
-Dotel.traces.exporter=otlp \
-Dotel.resource.attributes=env=prod,version=1.2.0 \
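-jar app.jar

# Equivalent environment variables for Docker / K8s
OTEL_SERVICE_NAME: order-service
OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318
OTEL_EXPORTER_OTLP_PROTOCOL: http/protobuf
OTEL_LOGS_EXPORTER: otlp
OTEL_METRICS_EXPORTER: otlp
OTEL_TRACES_EXPORTER: otlp
OTEL_RESOURCE_ATTRIBUTES: env=prod,version=1.2.0

For the list of frameworks supported by auto-instrumentation, see the official opentelemetry-java-instrumentation documentation.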
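The two configuration styles above follow one naming rule: an agent system property becomes an environment variable by upper-casing it and replacing `.` and `-` with `_`. A minimal sketch of that mapping, where the class `OtelConfigNames` and the method `toEnvVar` are hypothetical names and not part of any OTel library:

```java
import java.util.Locale;

final class OtelConfigNames {
    /** Converts a system property name such as otel.service.name to OTEL_SERVICE_NAME. */
    static String toEnvVar(String systemProperty) {
        return systemProperty
                .replace('.', '_')
                .replace('-', '_')
                .toUpperCase(Locale.ROOT); // Locale.ROOT avoids locale-specific casing
    }

    public static void main(String[] args) {
        System.out.println(toEnvVar("otel.service.name"));
        System.out.println(toEnvVar("otel.exporter.otlp.endpoint"));
    }
}
```

This makes it straightforward to move a working `-D` command line into a container manifest without looking up each variable.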
Spring Boot 3.x integration (Micrometer bridge)
Spring Boot 3.x integrates through Micrometer Tracing plus the OTel bridge, so the Java Agent is not required:
// build.gradle (assumes spring-boot-starter-actuator is already a dependency)
implementation 'io.micrometer:micrometer-tracing-bridge-otel'
implementation 'io.opentelemetry:opentelemetry-exporter-otlp'
implementation 'io.micrometer:micrometer-registry-otlp' // metrics also go over OTLP

# application.yml
management:
  tracing:
    sampling:
      probability: 1.0 # sample every trace; lower this for high-traffic production services
  otlp:
    tracing:
      endpoint: http://otel-collector:4318/v1/traces
    metrics:
      export:
        url: http://otel-collector:4318/v1/metrics
        step: 30s
spring:
  application:
    name: order-service

For Micrometer Tracing, see Distributed tracing; for metrics, see Metrics collection.
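The sampling probability configured above is a head-sampling ratio: the decision is made when the trace starts, and it should be derived from the trace ID so that every service reaches the same verdict for the same trace. The sketch below shows one way such a ratio check can work. It is similar in spirit to trace-ID-ratio samplers but is not the SDK's exact algorithm; `RatioSamplerSketch` and `shouldSample` are hypothetical names, and the input is assumed to be a 32-character hex trace ID.

```java
final class RatioSamplerSketch {
    /** Returns true if a trace with this ID falls inside the sampled fraction. */
    static boolean shouldSample(String traceIdHex, double probability) {
        if (probability >= 1.0) {
            return true;  // probability 1.0 keeps every trace
        }
        if (probability <= 0.0) {
            return false; // probability 0.0 drops every trace
        }
        // Treat the low 64 bits (last 16 hex characters) as the random source.
        String low = traceIdHex.substring(traceIdHex.length() - 16);
        // Shift right by one bit to get a non-negative 63-bit value.
        long value = Long.parseUnsignedLong(low, 16) >>> 1;
        long bound = (long) (probability * Long.MAX_VALUE);
        return value < bound;
    }

    public static void main(String[] args) {
        String id = "4bf92f3577b34da6a3ce929d0e0e4736";
        System.out.println(shouldSample(id, 1.0));
        System.out.println(shouldSample(id, 0.1));
    }
}
```

Because the result depends only on the trace ID and the ratio, a trace is either kept end to end or dropped end to end, provided all services use the same ratio.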
Manual instrumentation (custom spans / attributes)
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.*;
import io.opentelemetry.context.Scope;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {
    private final Tracer tracer = GlobalOpenTelemetry.getTracer("payment-service");

    public PayResult pay(PayRequest req) {
        // Start a span; it becomes a child of whatever span is current
        Span span = tracer.spanBuilder("payment.process")
                .setSpanKind(SpanKind.INTERNAL)
                .startSpan();
        try (Scope scope = span.makeCurrent()) {
            span.setAttribute("payment.method", req.getMethod());
            span.setAttribute("payment.amount", req.getAmount());
            PayResult result = doCharge(req);
            span.setAttribute("payment.txnId", result.getTxnId());
            return result;
        } catch (PaymentException e) {
            // Mark the span as failed and attach the exception as an event
            span.setStatus(StatusCode.ERROR, e.getMessage());
            span.recordException(e);
            throw e;
        } finally {
            // Always end the span, otherwise it is never exported
            span.end();
        }
    }
}

Simplifying with annotations (Spring AOP + Micrometer)
// Annotations come from io.micrometer.tracing.annotation; they take effect only
// when Micrometer's tracing aspects are registered with Spring AOP.
@NewSpan("inventory.check")
public boolean checkInventory(@SpanTag("productId") Long productId) {
    return inventoryRepo.hasStock(productId);
}

OTel Collector deployment
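The Collector sits between applications and storage backends and decouples the two. It supports fan-out to multiple backends, sampling, and data transformation:
# otel-collector-config.yml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s # required by the processor; must be greater than zero
    limit_mib: 512
  # Tail sampling: decides after the whole trace has arrived, so errors and slow
  # traces can always be kept, which head sampling cannot guarantee
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: error-policy
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-policy
        type: latency
        latency: { threshold_ms: 1000 }
      - name: sample-rest
        type: probabilistic
        probabilistic: { sampling_percentage: 10 }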
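exporters:
  # Traces → Jaeger over OTLP. The dedicated jaeger exporter has been removed from
  # recent Collector releases (including the 0.99.0 image used here); Jaeger
  # accepts OTLP natively on port 4317.
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  # Traces → Zipkin (several backends can receive the same data)
  zipkin:
    endpoint: http://zipkin:9411/api/v2/spans
  # Metrics → Prometheus (pull mode: Prometheus scrapes this endpoint)
  prometheus:
    endpoint: 0.0.0.0:8889
  # Logs → Loki
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
  # All signals → OTLP (upstream Collector or SaaS).
  # Defined here but inactive until it is added to a pipeline below.
  otlp:
    endpoint: https://api.honeycomb.io:443
    headers:
      x-honeycomb-team: ${env:HONEYCOMB_API_KEY}
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp/jaeger, zipkin]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loki]

# docker-compose.yml (minimal observability stack)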
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.99.0
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
      - "8889:8889" # Prometheus scrape endpoint
    volumes:
      # The contrib image reads its config from /etc/otelcol-contrib/ by default
      - ./otel-collector-config.yml:/etc/otelcol-contrib/config.yaml
  jaeger:
    image: jaegertracing/all-in-one:1.56
    ports:
      - "16686:16686" # Jaeger UI
      - "14250:14250" # Jaeger gRPC (legacy collector port)
  prometheus:
    image: prom/prometheus:v2.51.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3000:3000"

OTLP protocol
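OTLP (OpenTelemetry Protocol) is the transport protocol defined by OTel. Its payloads are defined with Protocol Buffers:
| Transport | Default port | Typical use |
|---|---|---|
| gRPC | 4317 | Low latency, high throughput, service-to-service traffic |
| HTTP/protobuf | 4318 | Firewall-friendly, general purpose |
| HTTP/JSON | 4318 | Debugging, simple integrations |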
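Over HTTP, each signal is posted to its own path under the base endpoint: `/v1/traces`, `/v1/metrics`, and `/v1/logs`. gRPC uses the bare host and port instead. The sketch below derives the per-signal URL from a base endpoint; `OtlpHttpPaths` and `signalUrl` are hypothetical names for illustration, not part of an OTel library.

```java
final class OtlpHttpPaths {
    /** Builds the OTLP/HTTP URL for one signal: "traces", "metrics", or "logs". */
    static String signalUrl(String baseEndpoint, String signal) {
        switch (signal) {
            case "traces":
            case "metrics":
            case "logs":
                break;
            default:
                throw new IllegalArgumentException("unknown signal: " + signal);
        }
        // Avoid a double slash when the base endpoint already ends with one.
        String base = baseEndpoint.endsWith("/") ? baseEndpoint : baseEndpoint + "/";
        return base + "v1/" + signal;
    }

    public static void main(String[] args) {
        String base = "http://otel-collector:4318";
        System.out.println(signalUrl(base, "traces"));
        System.out.println(signalUrl(base, "metrics"));
        System.out.println(signalUrl(base, "logs"));
    }
}
```

This is why a base endpoint such as `http://otel-collector:4318` is enough when the SDK appends the path itself, while settings that take a full URL need the `/v1/...` suffix spelled out.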
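SaaS observability platforms
If you would rather not self-host the Collector and backends, you can send data directly to a commercial SaaS:
| Platform | Supported signals | Notes |
|---|---|---|
| Grafana Cloud | Traces / Metrics / Logs | Free tier; Tempo + Mimir + Loki |
| Datadog | All signals | Broad feature set, relatively expensive |
| Honeycomb | Traces | Strong high-cardinality analysis |
| New Relic | All signals | 100 GB/month free |
| Alibaba Cloud ARMS | All signals | Low latency within mainland China; integrates with the Alibaba Cloud ecosystem |
Only the exporter endpoint and the authentication header need to change; application code stays the same.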
Relationship to the Spring Boot observability stack
Spring Boot application
│
├── Micrometer Tracing ──► OTel Bridge ──► OTel SDK ──► OTLP ──► Collector
├── Micrometer Metrics ──► OTel Exporter ──────────────────────► Collector
└── SLF4J + Logback ──► logstash-logback-encoder ──► file ──► Promtail ──► Loki
                        or OTel Log Appender ──────────────────────────► Collector
- Tracing: Distributed tracing (Micrometer Tracing)
- Metrics: Metrics collection (Micrometer Metrics)
- Log output: Logging (SLF4J + Logback)
- Log aggregation: Log aggregation (ELK / Loki)
Related links
Architecture
- Observability (architecture)
- Cloud native overview
- Kubernetes (Collector deployment)
- System overview