Resilience4j

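Resilience4j is a lightweight fault-tolerance library designed for Java 8+ and functional programming. It is the commonly recommended replacement for Hystrix: Netflix's Hystrix README points to it, and Spring Cloud Circuit Breaker ships a Resilience4j implementation. Note that the 2.x line used in this document, including resilience4j-spring-boot3, requires Java 17. The library is modular, so you include only the modules you need, and its core features are layered onto any function through the decorator pattern.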


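Core modules

Module          Dependency artifactId          Function
CircuitBreaker  resilience4j-circuitbreaker    Circuit breaking
RateLimiter     resilience4j-ratelimiter       Rate limiting
Bulkhead        resilience4j-bulkhead          Bulkhead isolation (concurrency control)
Retry           resilience4j-retry             Automatic retry
TimeLimiter     resilience4j-timelimiter       Timeout control
Cache           resilience4j-cache             Result caching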

Spring Boot integration

<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
    <version>2.2.0</version>
</dependency>
<!-- The annotations rely on Spring AOP -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>

Circuit breaker (CircuitBreaker)

State machine

CLOSED ──(failure rate / slow-call rate exceeds threshold)──▶ OPEN
  ▲                                                            │
  │                                          wait waitDurationInOpenState
  │                                                            │
  └──(half-open probes pass)──── HALF_OPEN ◀───────────────────┘
                                     │
                            (probes fail)→ OPEN
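
The transitions in the diagram can be written down as a small transition function. This is only an illustrative sketch: the class, enum, and event names are hypothetical and are not part of the Resilience4j API, which tracks these states internally.

```java
/** Illustrative transition table for the diagram above; all names are hypothetical. */
final class BreakerStates {
    enum State { CLOSED, OPEN, HALF_OPEN }

    enum Event { THRESHOLD_EXCEEDED, WAIT_ELAPSED, PROBES_PASSED, PROBES_FAILED }

    /** Returns the next state, or the current state if the event does not apply to it. */
    static State next(State current, Event event) {
        switch (current) {
            case CLOSED:
                // Failure rate or slow-call rate went over its threshold.
                return event == Event.THRESHOLD_EXCEEDED ? State.OPEN : current;
            case OPEN:
                // waitDurationInOpenState has passed.
                return event == Event.WAIT_ELAPSED ? State.HALF_OPEN : current;
            case HALF_OPEN:
                if (event == Event.PROBES_PASSED) return State.CLOSED;
                if (event == Event.PROBES_FAILED) return State.OPEN;
                return current;
            default:
                throw new IllegalStateException("unknown state " + current);
        }
    }
}
```

Events that do not apply to the current state (for example WAIT_ELAPSED while CLOSED) leave the state unchanged, which matches the absence of such arrows in the diagram.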

Configuration

resilience4j:
  circuitbreaker:
    instances:
      orderService:
        # Sliding window: COUNT_BASED (last N calls) or TIME_BASED (last N seconds)
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 10            # evaluate the last 10 calls
        failureRateThreshold: 50         # open when the failure rate is >= 50%
        slowCallRateThreshold: 80        # open when the slow-call rate is >= 80%
        slowCallDurationThreshold: 2s    # calls longer than 2s count as slow
        waitDurationInOpenState: 10s     # stay OPEN for 10s before moving to HALF_OPEN
        permittedNumberOfCallsInHalfOpenState: 3  # allow 3 probe calls while HALF_OPEN
        minimumNumberOfCalls: 5          # rates are only calculated after at least 5 calls
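
To make the interaction between slidingWindowSize, minimumNumberOfCalls, and failureRateThreshold concrete, here is a minimal sketch of a count-based window built on a ring buffer. It is an assumption-laden illustration, not Resilience4j's implementation: the class and method names are hypothetical, slow calls are ignored, and it is not thread-safe.

```java
/** Illustrative count-based sliding window; not Resilience4j's implementation. */
final class CountWindow {
    private final boolean[] failed;        // ring buffer holding the last N outcomes
    private final int minimumCalls;        // plays the role of minimumNumberOfCalls
    private final double thresholdPercent; // plays the role of failureRateThreshold
    private int recorded;                  // total outcomes recorded so far
    private int failuresInWindow;          // failures currently inside the window

    CountWindow(int windowSize, int minimumCalls, double thresholdPercent) {
        this.failed = new boolean[windowSize];
        this.minimumCalls = minimumCalls;
        this.thresholdPercent = thresholdPercent;
    }

    /** Records one outcome and reports whether the failure rate now reaches the threshold. */
    boolean recordAndCheck(boolean callFailed) {
        int slot = recorded % failed.length;
        // Once the buffer is full, the slot being overwritten holds the oldest outcome.
        if (recorded >= failed.length && failed[slot]) {
            failuresInWindow--;
        }
        failed[slot] = callFailed;
        if (callFailed) {
            failuresInWindow++;
        }
        recorded++;

        int callsInWindow = Math.min(recorded, failed.length);
        if (callsInWindow < minimumCalls) {
            return false; // too few samples to judge
        }
        return 100.0 * failuresInWindow / callsInWindow >= thresholdPercent;
    }
}
```

With the values from the configuration above (window 10, minimum 5, threshold 50), four consecutive failures do not trip the check because fewer than five calls have been seen; the fifth call does, whether it fails or not, since the rate is then at least 80%.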

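Usage

@CircuitBreaker(name = "orderService", fallbackMethod = "getOrderFallback")
public OrderDTO getOrder(Long id) {
    return orderClient.getById(id);
}

// Fallbacks take the original arguments plus the exception; the overload with the
// most specific matching exception type is the one invoked.
public OrderDTO getOrderFallback(Long id, CallNotPermittedException ex) {
    return OrderDTO.empty();   // degraded response while the breaker is open
}

public OrderDTO getOrderFallback(Long id, Exception ex) {
    log.error("Order service error", ex);
    return OrderDTO.empty();   // degraded response for any other exception, e.g. a business error
}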

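Rate limiter (RateLimiter)

Limits the number of requests per unit of time. The default implementation splits time into cycles of limitRefreshPeriod and makes limitForPeriod permits available in each cycle, which behaves much like a token bucket refilled once per cycle: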

resilience4j:
  ratelimiter:
    instances:
      orderApi:
        limitForPeriod: 100          # allow 100 requests per cycle
        limitRefreshPeriod: 1s       # permits are refreshed every 1s (i.e. about 100 QPS)
        timeoutDuration: 500ms       # maximum time a caller waits for a permit

@RateLimiter(name = "orderApi", fallbackMethod = "rateLimitFallback")
public OrderDTO createOrder(OrderDTO dto) {
    return orderService.create(dto);
}

// Invoked when no permit was obtained within timeoutDuration.
public OrderDTO rateLimitFallback(OrderDTO dto, RequestNotPermitted ex) {
    throw new TooManyRequestsException("Service is busy, please try again later");
}
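
The meaning of limitForPeriod and limitRefreshPeriod can be illustrated with a small cycle-based permit counter. This is a simplified sketch under stated assumptions, not the library's algorithm: the class name is hypothetical, the caller passes in the current time, and waiting for a permit (timeoutDuration) is left out, so a request is either admitted immediately or rejected.

```java
/** Illustrative cycle-based permit counter; not Resilience4j's implementation. */
final class CycleLimiter {
    private final int limitForPeriod;  // permits granted per cycle
    private final long periodNanos;    // cycle length, like limitRefreshPeriod
    private long currentCycle = -1;    // index of the cycle the counter belongs to
    private int remaining;             // permits left in the current cycle

    CycleLimiter(int limitForPeriod, long periodNanos) {
        this.limitForPeriod = limitForPeriod;
        this.periodNanos = periodNanos;
    }

    /** Tries to take one permit at the given (non-negative) timestamp without waiting. */
    synchronized boolean tryAcquire(long nowNanos) {
        long cycle = nowNanos / periodNanos;
        if (cycle != currentCycle) {
            // A new cycle has started: refill all permits.
            currentCycle = cycle;
            remaining = limitForPeriod;
        }
        if (remaining == 0) {
            return false;
        }
        remaining--;
        return true;
    }
}
```

Passing the timestamp explicitly keeps the sketch deterministic; a real caller would typically supply System.nanoTime().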

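Bulkhead

Limits the number of concurrent calls so that a single slow service cannot exhaust thread resources.

Semaphore bulkhead (runs on the calling thread and only limits concurrency):

resilience4j:
  bulkhead:
    instances:
      orderService:
        maxConcurrentCalls: 20        # at most 20 concurrent calls
        maxWaitDuration: 100ms        # when full, wait up to 100ms for a permit

Thread-pool bulkhead (dedicated thread pool, full isolation):

resilience4j:
  thread-pool-bulkhead:
    instances:
      orderService:
        maxThreadPoolSize: 10
        coreThreadPoolSize: 5
        queueCapacity: 20

@Bulkhead(name = "orderService", type = Bulkhead.Type.SEMAPHORE)
public OrderDTO getOrder(Long id) { ... }

// The thread-pool variant requires a CompletableFuture return type.
@Bulkhead(name = "orderService", type = Bulkhead.Type.THREADPOOL)
public CompletableFuture<OrderDTO> getOrderAsync(Long id) { ... }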
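
The semaphore variant can be approximated with java.util.concurrent.Semaphore. The sketch below is an illustration of the idea only: the class name is hypothetical, and it signals rejection with IllegalStateException rather than the exception type Resilience4j uses.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrative semaphore bulkhead; not Resilience4j's implementation. */
final class SemaphoreBulkheadSketch {
    private final Semaphore permits;
    private final long maxWaitMillis;

    SemaphoreBulkheadSketch(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls, true);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Runs the call on the current thread if a permit is obtained within maxWaitMillis. */
    <T> T execute(Supplier<T> call) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a permit", e);
        }
        if (!acquired) {
            throw new IllegalStateException("bulkhead is full");
        }
        try {
            return call.get();
        } finally {
            permits.release(); // always hand the permit back, even if the call throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Because the call runs on the caller's thread, this only caps concurrency; it does not protect the caller from a call that never returns, which is what the thread-pool variant and a timeout are for.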

Retry

resilience4j:
  retry:
    instances:
      orderService:
        maxAttempts: 3               # at most 3 attempts (including the first call)
        waitDuration: 500ms          # initial wait between attempts: 500ms
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2   # each wait is 2x the previous one
        retryExceptions:
          - java.io.IOException
          - feign.RetryableException
        ignoreExceptions:
          - com.example.BusinessException  # business exceptions are not retried

@Retry(name = "orderService", fallbackMethod = "retryFallback")
public OrderDTO getOrder(Long id) { ... }
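
As a rough illustration of what maxAttempts, waitDuration, and exponentialBackoffMultiplier mean together, the sketch below retries a Supplier with exponentially growing waits. It assumes the wait before retry n is waitDuration × multiplier^(n−1); the class and method names are hypothetical, and unlike the configuration above it retries on every RuntimeException instead of filtering by exception type.

```java
import java.util.function.Supplier;

/** Illustrative retry loop with exponential backoff; not Resilience4j's implementation. */
final class BackoffRetrySketch {

    /** Wait before retry number retryNumber (1-based): initial * multiplier^(retryNumber - 1). */
    static long waitMillis(long initialMillis, double multiplier, int retryNumber) {
        return (long) (initialMillis * Math.pow(multiplier, retryNumber - 1));
    }

    /** Calls the action up to maxAttempts times, sleeping between failed attempts. */
    static <T> T call(Supplier<T> action, int maxAttempts, long initialMillis, double multiplier) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no attempts left, give up
                }
                sleep(waitMillis(initialMillis, multiplier, attempt));
            }
        }
        throw last;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during backoff", e);
        }
    }
}
```

Under that assumption, the configuration above (3 attempts, 500ms, multiplier 2) waits 500ms before the second attempt and 1000ms before the third.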

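Composing decorators (programmatic)

Multiple fault-tolerance patterns can be stacked; the outermost decorator runs first and each one wraps the next:

// Order: CircuitBreaker → RateLimiter → Retry → business call
CircuitBreaker cb = circuitBreakerRegistry.circuitBreaker("order");
RateLimiter rl = rateLimiterRegistry.rateLimiter("order");
Retry retry = retryRegistry.retry("order");
// Not applied below: TimeLimiter decorates Future/CompletionStage suppliers,
// so it would have to wrap an asynchronous version of the call.
TimeLimiter tl = timeLimiterRegistry.timeLimiter("order");

Supplier<OrderDTO> supplier = CircuitBreaker.decorateSupplier(cb,
    RateLimiter.decorateSupplier(rl,
        Retry.decorateSupplier(retry,
            () -> orderClient.getById(id))));

// Try comes from Vavr (io.vavr.control.Try); add Vavr explicitly if it is not already on the classpath.
Try<OrderDTO> result = Try.ofSupplier(supplier)
    .recover(CallNotPermittedException.class, e -> OrderDTO.empty())
    .recover(RequestNotPermitted.class, e -> OrderDTO.empty());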
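
The outside-in execution order of nested decorators can be checked with plain Suppliers. In this sketch the layer helper and the layer names are hypothetical stand-ins for the real decorators; each layer only records when it is entered and left.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Illustrative nesting of decorators; the layers only trace their execution order. */
final class DecoratorOrderSketch {

    /** Wraps inner so that entering and leaving this layer is appended to trace. */
    static <T> Supplier<T> layer(String name, List<String> trace, Supplier<T> inner) {
        return () -> {
            trace.add("enter " + name);
            try {
                return inner.get();
            } finally {
                trace.add("exit " + name);
            }
        };
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        // Same nesting shape as the CircuitBreaker/RateLimiter/Retry example.
        Supplier<String> call = layer("circuitBreaker", trace,
            layer("rateLimiter", trace,
                layer("retry", trace, () -> "order-42")));

        System.out.println(call.get());
        System.out.println(trace);
    }
}
```

The trace shows circuitBreaker entered first and exited last, with retry closest to the business call. One consequence of this shape is that retries happen inside the circuit breaker, so the breaker sees one outcome per decorated call rather than one per attempt.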

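Monitoring (Micrometer)

Once resilience4j-micrometer is on the classpath, state and call metrics are published through Micrometer. With Spring Boot Actuator and a Prometheus registry (micrometer-registry-prometheus) in place, they appear on the Prometheus endpoint, for example: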

resilience4j_circuitbreaker_state{name="orderService",state="open"}   # one series per state: 1 while the breaker is in that state, otherwise 0
resilience4j_circuitbreaker_calls_seconds_count{kind="successful"}    # calls are recorded as a timer, hence the _seconds_count suffix
resilience4j_ratelimiter_available_permissions{name="orderApi"}
resilience4j_retry_calls_total{kind="successful_without_retry"}

Resilience4j vs Sentinel

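Feature                 Resilience4j                                     Sentinel
Ecosystem               Spring/Netflix                                   Spring Cloud Alibaba
Circuit breaking        Full state machine                               Full state machine
Rate limiting           Permits per refresh cycle (token-bucket-like)    QPS + concurrent threads + hot-spot parameters
Hot-spot limiting       Not supported                                    Supported
Console                 None (relies on Grafana)                         Standalone dashboard
Rule delivery           Configuration files / code                       Dynamic push via Nacos/Apollo
Programming model       Functional decorators (plus Spring annotations)  Annotations / try-with-resources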
