Fault Tolerance and Degradation: A Comparison

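Overview

In a microservice architecture, services depend on one another in complex ways, and the failure of any single service can cascade into an avalanche of failures. Fault tolerance and degradation mechanisms, such as circuit breakers, rate limiting, and fallbacks, keep the system stable when something goes wrong.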

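Fault Tolerance and Degradation Patterns

Circuit Breaker Pattern

How It Works

A circuit breaker has three states: closed, open, and half-open. While closed, calls pass through normally; while open, calls fail fast; while half-open, trial calls probe whether the dependency has recovered.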

java
import java.util.function.Supplier;

public class CircuitBreakerExample {
    public enum State {
        CLOSED,    // closed: calls pass through normally
        OPEN,      // open: calls fail fast
        HALF_OPEN  // half-open: a trial call probes for recovery
    }

    private State state = State.CLOSED;
    private int failureCount = 0;
    private long lastFailureTime = 0;
    private final int failureThreshold = 5;
    private final long timeout = 60000; // 60 seconds

    public <T> T call(Supplier<T> operation, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get(); // fail fast
        }

        try {
            T result = operation.get();
            onSuccess();
            return result;
        } catch (Exception e) {
            onFailure();
            return fallback.get();
        }
    }

    // Moves OPEN to HALF_OPEN once the timeout has elapsed since the last failure.
    private synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - lastFailureTime <= timeout) {
                return false;
            }
            state = State.HALF_OPEN;
        }
        return true;
    }

    private synchronized void onSuccess() {
        failureCount = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        failureCount++;
        lastFailureTime = System.currentTimeMillis();

        // A failed trial call in HALF_OPEN reopens the breaker immediately.
        if (state == State.HALF_OPEN || failureCount >= failureThreshold) {
            state = State.OPEN;
        }
    }
}

Rate Limiting Pattern

Token Bucket Algorithm

java
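public class TokenBucket {
    private final long capacity;   // maximum number of tokens the bucket can hold
    private final long refillRate; // tokens added per second
    private long tokens;
    private long lastRefillTime;

    public TokenBucket(long capacity, long refillRate) {
        if (capacity <= 0 || refillRate <= 0) {
            throw new IllegalArgumentException("capacity and refillRate must be positive");
        }
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity; // start with a full bucket
        this.lastRefillTime = System.currentTimeMillis();
    }

    public synchronized boolean tryConsume(long tokensRequested) {
        refill();

        if (tokens >= tokensRequested) {
            tokens -= tokensRequested;
            return true;
        }

        return false;
    }

    private void refill() {
        long now = System.currentTimeMillis();
        long tokensToAdd = (now - lastRefillTime) * refillRate / 1000;
        if (tokensToAdd > 0) {
            tokens = Math.min(capacity, tokens + tokensToAdd);
            // Advance only by the time that yielded whole tokens, so the fractional
            // remainder is not lost when calls arrive more often than tokens accrue.
            lastRefillTime += tokensToAdd * 1000 / refillRate;
        }
    }
}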
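
To check the refill arithmetic without waiting on the wall clock, the same idea can be sketched with an injected time source. The class name ClockedTokenBucket, the LongSupplier clock, and the availableTokens method are assumptions introduced for illustration only; the sketch also carries the fractional remainder over between refills.

```java
import java.util.function.LongSupplier;

/** Token bucket with an injected clock, so refill behavior can be checked deterministically. */
final class ClockedTokenBucket {
    private final long capacity;
    private final long refillPerSecond;
    private final LongSupplier clockMillis; // hypothetical time source, e.g. System::currentTimeMillis
    private long tokens;
    private long lastRefillMillis;

    ClockedTokenBucket(long capacity, long refillPerSecond, LongSupplier clockMillis) {
        if (capacity <= 0 || refillPerSecond <= 0) {
            throw new IllegalArgumentException("capacity and refillPerSecond must be positive");
        }
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.clockMillis = clockMillis;
        this.tokens = capacity; // start with a full bucket
        this.lastRefillMillis = clockMillis.getAsLong();
    }

    /** Takes the requested tokens if they are available; otherwise leaves the bucket unchanged. */
    synchronized boolean tryConsume(long requested) {
        refill();
        if (requested <= tokens) {
            tokens -= requested;
            return true;
        }
        return false;
    }

    synchronized long availableTokens() {
        refill();
        return tokens;
    }

    private void refill() {
        long now = clockMillis.getAsLong();
        long tokensToAdd = (now - lastRefillMillis) * refillPerSecond / 1000;
        if (tokensToAdd > 0) {
            tokens = Math.min(capacity, tokens + tokensToAdd);
            // Advance only by the time that produced whole tokens, keeping the remainder.
            lastRefillMillis += tokensToAdd * 1000 / refillPerSecond;
        }
    }
}
```

With a capacity of 2 and a rate of 1 token per second, a burst of two requests is admitted at once, a third is rejected, and one more request is admitted after a full second has passed on the injected clock.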

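Mainstream Fault Tolerance Frameworks

Hystrix (in maintenance mode)

Features

  • Open-sourced by Netflix
  • Implements the circuit breaker pattern
  • Thread pool isolation
  • Real-time monitoring

Usage Example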

java
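import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

@Service
public class UserService {

    private static final Logger log = LoggerFactory.getLogger(UserService.class);

    private final UserServiceClient userServiceClient;

    public UserService(UserServiceClient userServiceClient) {
        this.userServiceClient = userServiceClient;
    }

    @HystrixCommand(
        fallbackMethod = "getUserFallback",
        commandProperties = {
            @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "10"),
            @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "50"),
            @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", value = "5000")
        },
        threadPoolProperties = {
            @HystrixProperty(name = "coreSize", value = "10"),
            @HystrixProperty(name = "maxQueueSize", value = "100")
        }
    )
    public User getUser(Long id) {
        // Remote call that may fail
        return userServiceClient.getUser(id);
    }

    // Fallback with the same parameters as the command method
    public User getUserFallback(Long id) {
        return new User(id, "Unknown", "[email protected]");
    }

    // Overload that also receives the exception that triggered the fallback
    public User getUserFallback(Long id, Throwable throwable) {
        log.error("Failed to get user: {}", id, throwable);
        return new User(id, "Error", "[email protected]");
    }
}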

Monitoring Dashboard

java
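import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;
import org.springframework.cloud.netflix.hystrix.dashboard.EnableHystrixDashboard;

@SpringBootApplication
@EnableHystrixDashboard
@EnableCircuitBreaker
public class HystrixDashboardApplication {
    public static void main(String[] args) {
        SpringApplication.run(HystrixDashboardApplication.class, args);
    }
}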

Resilience4j

Features

  • Lightweight library
  • Functional programming style
  • Modular design
  • Minimal external dependencies

Circuit Breaker Usage

java
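import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.retry.annotation.Retry;
import io.github.resilience4j.timelimiter.annotation.TimeLimiter;
import java.util.concurrent.CompletableFuture;
import org.springframework.stereotype.Service;

@Service
public class UserService {

    private final UserServiceClient userServiceClient;

    public UserService(UserServiceClient userServiceClient) {
        this.userServiceClient = userServiceClient;
    }

    // The annotations look up the "userService" instances by name, so no
    // programmatic CircuitBreaker field is needed (its type would also clash
    // with the annotation of the same simple name).
    @CircuitBreaker(name = "userService", fallbackMethod = "getUserFallback")
    @Retry(name = "userService")
    @TimeLimiter(name = "userService")
    public CompletableFuture<User> getUser(Long id) {
        return CompletableFuture.supplyAsync(() -> userServiceClient.getUser(id));
    }

    // Same signature as getUser, plus the exception as the last parameter
    public CompletableFuture<User> getUserFallback(Long id, Exception ex) {
        return CompletableFuture.completedFuture(
            new User(id, "Fallback", "[email protected]"));
    }
}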

Configuration

yaml
resilience4j:
  circuitbreaker:
    instances:
      userService:
        sliding-window-size: 10
        minimum-number-of-calls: 5
        failure-rate-threshold: 50
        wait-duration-in-open-state: 30s
        permitted-number-of-calls-in-half-open-state: 3
  retry:
    instances:
      userService:
        max-attempts: 3
        wait-duration: 1s
        # The multiplier only takes effect when exponential backoff is enabled
        enable-exponential-backoff: true
        exponential-backoff-multiplier: 2
  timelimiter:
    instances:
      userService:
        timeout-duration: 3s
  ratelimiter:
    instances:
      userService:
        limit-for-period: 100
        limit-refresh-period: 1s
        timeout-duration: 0s
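
To make the interplay of sliding-window-size, minimum-number-of-calls, and failure-rate-threshold concrete, here is a minimal sketch of a count-based sliding window. It is a simplified model written for this article, not Resilience4j's implementation, and the class name FailureRateWindow and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified count-based sliding window; an illustration, not Resilience4j's implementation. */
final class FailureRateWindow {
    private final int windowSize;          // cf. sliding-window-size
    private final int minimumCalls;        // cf. minimum-number-of-calls
    private final double thresholdPercent; // cf. failure-rate-threshold
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private int failures = 0;

    FailureRateWindow(int windowSize, int minimumCalls, double thresholdPercent) {
        if (windowSize <= 0 || minimumCalls <= 0) {
            throw new IllegalArgumentException("windowSize and minimumCalls must be positive");
        }
        this.windowSize = windowSize;
        this.minimumCalls = minimumCalls;
        this.thresholdPercent = thresholdPercent;
    }

    /** Records one call outcome, evicting the oldest outcome once the window is full. */
    synchronized void record(boolean failed) {
        if (outcomes.size() == windowSize && outcomes.removeFirst()) {
            failures--; // the evicted outcome was a failure
        }
        outcomes.addLast(failed);
        if (failed) {
            failures++;
        }
    }

    /** Failure rate over the calls currently in the window, in percent. */
    synchronized double failureRatePercent() {
        return outcomes.isEmpty() ? 0.0 : 100.0 * failures / outcomes.size();
    }

    /** True only after enough calls were recorded and the rate has reached the threshold. */
    synchronized boolean shouldOpen() {
        return outcomes.size() >= minimumCalls && failureRatePercent() >= thresholdPercent;
    }
}
```

With the values from the configuration (10, 5, 50), four failures in a row do not open the breaker because fewer than five calls have been recorded; a fifth call brings the window to the minimum, and the 80% failure rate then exceeds the threshold.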

Programmatic Usage

java
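import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.decorators.Decorators;
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.retry.Retry;
import java.util.Arrays;
import java.util.function.Supplier;
import org.springframework.stereotype.Service;

@Service
public class UserServiceProgrammatic {

    private final UserServiceClient userServiceClient;
    private final CircuitBreaker circuitBreaker;
    private final Retry retry;
    private final RateLimiter rateLimiter;

    public UserServiceProgrammatic(UserServiceClient userServiceClient) {
        this.userServiceClient = userServiceClient;
        // Default settings; real projects would usually obtain these from registries.
        this.circuitBreaker = CircuitBreaker.ofDefaults("userService");
        this.retry = Retry.ofDefaults("userService");
        this.rateLimiter = RateLimiter.ofDefaults("userService");
    }

    public User getUser(Long id) {
        Supplier<User> decoratedSupplier = Decorators.ofSupplier(() -> userServiceClient.getUser(id))
            .withCircuitBreaker(circuitBreaker)
            .withRetry(retry)
            .withRateLimiter(rateLimiter)
            .withFallback(Arrays.asList(Exception.class),
                throwable -> new User(id, "Fallback", "[email protected]"))
            .decorate(); // build the decorated Supplier

        return decoratedSupplier.get();
    }
}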
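
Conceptually, each decorator in such a chain wraps a supplier in another supplier. The sketch below hand-rolls two wrappers, a retry with exponential backoff and a fallback, in plain Java. It is not the Resilience4j API: the names SupplierDecorators, retrying, fallingBack, and the sleeper callback are hypothetical, and the sleeper is injected so that the waits can be observed instead of actually slept.

```java
import java.util.function.Function;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

/** Hand-rolled supplier decorators; a conceptual sketch, not the Resilience4j API. */
final class SupplierDecorators {
    private SupplierDecorators() {}

    /**
     * Calls the delegate up to maxAttempts times. Between attempts it waits
     * initialWaitMillis, then multiplies the wait by the given factor each time.
     */
    static <T> Supplier<T> retrying(Supplier<T> delegate, int maxAttempts,
                                    long initialWaitMillis, double multiplier,
                                    LongConsumer sleeper) {
        return () -> {
            long wait = initialWaitMillis;
            for (int attempt = 1; ; attempt++) {
                try {
                    return delegate.get();
                } catch (RuntimeException e) {
                    if (attempt >= maxAttempts) {
                        throw e; // attempts exhausted; the outer decorator decides what happens
                    }
                    sleeper.accept(wait);
                    wait = Math.round(wait * multiplier);
                }
            }
        };
    }

    /** Returns the fallback's value whenever the delegate throws. */
    static <T> Supplier<T> fallingBack(Supplier<T> delegate,
                                       Function<RuntimeException, T> fallback) {
        return () -> {
            try {
                return delegate.get();
            } catch (RuntimeException e) {
                return fallback.apply(e);
            }
        };
    }
}
```

Wrapping a supplier that always fails as fallingBack(retrying(call, 3, 1000, 2.0, sleeper), e -> defaultValue) invokes it three times, waits 1000 ms and then 2000 ms between attempts, and finally returns the default. Nesting order matters: with the fallback on the inside, the retry would never see a failure.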

Sentinel

Features

  • Open-sourced by Alibaba
  • Real-time flow control
  • Rich degradation rules
  • Visual monitoring

Usage Example

java
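import com.alibaba.csp.sentinel.annotation.SentinelResource;
import com.alibaba.csp.sentinel.slots.block.BlockException;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class UserController {

    private final UserService userService;

    public UserController(UserService userService) {
        this.userService = userService;
    }

    @GetMapping("/users/{id}")
    @SentinelResource(
        value = "getUser",
        fallback = "getUserFallback",
        blockHandler = "getUserBlocked"
    )
    public User getUser(@PathVariable Long id) {
        return userService.getUser(id);
    }

    // Fallback: invoked when the method throws an exception
    public User getUserFallback(Long id, Throwable throwable) {
        return new User(id, "Fallback", "[email protected]");
    }

    // Block handler: invoked when a Sentinel rule (e.g. flow control) blocks the call
    public User getUserBlocked(Long id, BlockException ex) {
        return new User(id, "Blocked", "[email protected]");
    }
}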

Rule Configuration

java
@PostConstruct
public void initFlowRules() {
    List<FlowRule> rules = new ArrayList<>();

    // Flow control rule: limit the "getUser" resource to 20 QPS
    FlowRule flowRule = new FlowRule();
    flowRule.setResource("getUser");
    flowRule.setGrade(RuleConstant.FLOW_GRADE_QPS);
    flowRule.setCount(20);
    rules.add(flowRule);

    // Degrade rule: trip the circuit based on the exception ratio
    DegradeRule degradeRule = new DegradeRule();
    degradeRule.setResource("getUser");
    degradeRule.setGrade(RuleConstant.DEGRADE_GRADE_EXCEPTION_RATIO);
    degradeRule.setCount(0.5); // exception ratio threshold (50%)
    degradeRule.setTimeWindow(10); // seconds the circuit stays open before recovery is attempted

    FlowRuleManager.loadRules(rules);
    DegradeRuleManager.loadRules(Arrays.asList(degradeRule));
}

Dynamic Rule Configuration

java
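@Component
public class SentinelRuleConfig {

    @PostConstruct
    public void initRules() {
        // Read flow rules from Nacos; remoteAddress, groupId, and dataId are
        // assumed to be defined elsewhere (for example, injected from configuration).
        ReadableDataSource<String, List<FlowRule>> flowRuleDataSource =
            new NacosDataSource<>(remoteAddress, groupId, dataId,
                source -> JSON.parseObject(source, new TypeReference<List<FlowRule>>() {}));

        // Rules are reloaded whenever the Nacos configuration changes
        FlowRuleManager.register2Property(flowRuleDataSource.getProperty());
    }
}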

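Framework Comparison

Feature Comparison

Feature    Hystrix    Resilience4j    Sentinel
Circuit breaker
Rate limiting
Retry
Timeout
Bulkhead isolation
Real-time monitoring
Dynamic rule configuration

Performance Comparison

Metric    Hystrix    Resilience4j    Sentinel
Performance overhead    Medium
Memory footprint    Medium
Response latency    Medium
Throughput    Medium

Ease-of-Use Comparison

Aspect    Hystrix    Resilience4j    Sentinel
Learning curve    Medium    Medium
Configuration complexity    Medium
Integration difficulty    Medium    Medium
Documentation quality    Medium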

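Selection Guidance

Scenario Analysis

1. New projects

Recommended: Resilience4j

  • Lightweight, with good performance
  • Functional programming style
  • Modular design

2. Existing Hystrix projects

Recommended: migrate to Resilience4j

  • Hystrix is in maintenance mode
  • Migration cost is relatively low

3. Alibaba Cloud environments

Recommended: Sentinel

  • Integrates well with Alibaba Cloud
  • Feature-rich
  • Dynamic rule configuration

4. Complex flow control requirements

Recommended: Sentinel

  • Rich flow control rules
  • Powerful real-time monitoring
  • Supports hot-parameter rate limiting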
Migration Strategy

Migrating from Hystrix to Resilience4j

java
// Hystrix
@HystrixCommand(fallbackMethod = "fallback")
public String getData() {
    return externalService.getData();
}

// Resilience4j: the fallback method must also accept the exception as its last parameter
@CircuitBreaker(name = "service", fallbackMethod = "fallback")
@Retry(name = "service")
public String getData() {
    return externalService.getData();
}

Best Practices

1. Fallback strategy design

  • Fail fast vs. fail silently
  • Return a default value vs. return cached data
  • Design of the fallback chain
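
One way to picture a fallback chain is as an ordered list of sources, tried from most to least accurate: the live service, then cached data, then a static default. The sketch below is one possible shape under that assumption; the class name FallbackChain and its API are hypothetical and not taken from any of the frameworks discussed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Tries each source in order and returns the first value obtained; a sketch of a fallback chain. */
final class FallbackChain<T> {
    private final List<Supplier<Optional<T>>> sources;
    private final T defaultValue;

    FallbackChain(List<Supplier<Optional<T>>> sources, T defaultValue) {
        this.sources = new ArrayList<>(sources); // defensive copy
        this.defaultValue = defaultValue;
    }

    T get() {
        for (Supplier<Optional<T>> source : sources) {
            try {
                Optional<T> value = source.get();
                if (value.isPresent()) {
                    return value.get();
                }
            } catch (RuntimeException e) {
                // This level failed; fall through to the next level in the chain.
            }
        }
        return defaultValue; // last resort: a static default
    }
}
```

A chain built from a remote call that throws and a cache that holds a value returns the cached value; if the cache is empty as well, the static default is returned.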

2. Monitoring and alerting

  • Circuit breaker state
  • How often fallbacks are triggered
  • Overall system health

3. Testing and verification

  • Fault injection tests
  • Load tests
  • Evaluation of fallback behavior
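
Fault injection can start very small. As a minimal sketch, a test can wrap a dependency in a supplier that fails a fixed number of times before recovering, then check that the fallback path behaves as intended. The class name FaultInjectingSupplier is hypothetical and intended for test code only.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Makes the first N calls throw, then delegates normally; intended for tests only. */
final class FaultInjectingSupplier<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private final AtomicInteger remainingFailures;

    FaultInjectingSupplier(Supplier<T> delegate, int failuresToInject) {
        this.delegate = delegate;
        this.remainingFailures = new AtomicInteger(failuresToInject);
    }

    @Override
    public T get() {
        // Atomically consume one pending failure, if any are left.
        int before = remainingFailures.getAndUpdate(n -> n > 0 ? n - 1 : 0);
        if (before > 0) {
            throw new IllegalStateException("injected fault");
        }
        return delegate.get();
    }
}
```

Injecting exactly as many failures as a breaker's threshold makes it possible to assert both that the breaker opens and that calls succeed again once the dependency recovers.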

4. Configuration management

  • Dynamic rule adjustment
  • Per-environment configuration
  • Auditing of configuration changes

Real-World Example

Fault Tolerance Design for an E-Commerce System

java
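import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.ratelimiter.annotation.RateLimiter;
import io.github.resilience4j.retry.annotation.Retry;
import org.springframework.stereotype.Service;

// inventoryService, paymentService, userService, and log are assumed to be
// fields defined elsewhere in this class.
@Service
public class OrderService {

    // Inventory service: degrade via circuit breaker
    @CircuitBreaker(name = "inventory", fallbackMethod = "checkInventoryFallback")
    public boolean checkInventory(Long productId, int quantity) {
        return inventoryService.checkStock(productId, quantity);
    }

    public boolean checkInventoryFallback(Long productId, int quantity, Exception ex) {
        // Fallback strategy: accept the order now and verify stock asynchronously later
        log.warn("Inventory service unavailable, allowing order: {}", productId);
        return true;
    }

    // Payment service: rate limiting
    @RateLimiter(name = "payment")
    public PaymentResult processPayment(PaymentRequest request) {
        return paymentService.process(request);
    }

    // User service: retry
    @Retry(name = "user")
    public User getUser(Long userId) {
        return userService.getUser(userId);
    }
}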

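Summary

Fault tolerance and degradation are essential safeguards in a microservice architecture. Hystrix is feature-complete but in maintenance mode; Resilience4j, being lightweight and fast, is the first choice for new projects; Sentinel stands out in flow control and monitoring. When choosing, weigh the project's requirements, the team's technology stack, and its operational capacity, so that the chosen solution actually protects system stability.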