Redis Sentinel

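Overview

Redis Sentinel is the high-availability solution for Redis. It keeps a master-replica deployment available through four functions: monitoring, notification, automatic failover, and configuration provision. A Sentinel deployment consists of several Sentinel processes that cooperate to monitor the Redis instances.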

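Sentinel features

Core features

1. Monitoring

  • Continuously checks whether the master and replica instances are working as expected
  • Monitors the state of the other Sentinel instances

2. Notification

  • Sends a notification when a monitored Redis instance has a problem
  • Alerts administrators or other programs through an API

3. Automatic failover

  • Starts a failover automatically when the master stops working
  • Promotes one of the replicas to be the new master

4. Configuration provider

  • Clients connect to Sentinel to obtain the address of the current master
  • After a failover, Sentinel returns the address of the new master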
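
The configuration-provider role can be sketched from the client's side: ask each known Sentinel in turn and use the first master address that comes back. The sketch below is a minimal illustration, not how any particular client library does it. `MasterResolver` and the `query` function are hypothetical names; `query` stands in for sending `SENTINEL get-master-addr-by-name` to one Sentinel, so the example runs without a Redis server.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Sketch of client-side master discovery through Sentinel (all names are hypothetical). */
final class MasterResolver {

    /**
     * Asks each Sentinel in turn and returns the first master address obtained, as "host:port".
     * The query function returns an empty Optional when a Sentinel is unreachable
     * or does not know the master.
     */
    static Optional<String> resolve(List<String> sentinels,
                                    Function<String, Optional<List<String>>> query) {
        for (String sentinel : sentinels) {
            Optional<List<String>> reply = query.apply(sentinel);
            // A valid reply is a two-element list: [host, port].
            if (reply.isPresent() && reply.get().size() == 2) {
                return Optional.of(reply.get().get(0) + ":" + reply.get().get(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> sentinels = List.of("127.0.0.1:26379", "127.0.0.1:26380", "127.0.0.1:26381");
        // Simulated replies: the first Sentinel is unreachable, the others answer.
        Function<String, Optional<List<String>>> fakeQuery = s -> {
            if (s.endsWith(":26379")) {
                return Optional.empty();
            }
            return Optional.of(List.of("127.0.0.1", "6380"));
        };
        System.out.println(resolve(sentinels, fakeQuery)); // prints Optional[127.0.0.1:6380]
    }
}
```

Trying the Sentinels one after another is what lets a client keep working when a single Sentinel is down.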

Sentinel architecture

Deployment architecture

bash
# Typical setup: 3 Sentinel nodes + 1 master and 2 replicas
Sentinel1(26379) ─┐
Sentinel2(26380) ─┼─ monitor ─ Redis Master(6379)
Sentinel3(26381) ─┘                 │
                                    ├─ Redis Slave1(6380)
                                    └─ Redis Slave2(6381)
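
Why three Sentinels rather than two comes down to simple arithmetic. The quorum only decides when a master is flagged as objectively down; starting the failover additionally needs a majority of all Sentinels to authorize a leader. The following is a small sketch of that arithmetic. `SentinelMath` and its method names are made up for illustration, and it assumes every reachable Sentinel agrees that the master is down.

```java
/** Sketch of the quorum and majority arithmetic for a Sentinel group (hypothetical names). */
final class SentinelMath {

    /** Votes needed to authorize a failover: a strict majority of all known Sentinels. */
    static int majority(int totalSentinels) {
        return totalSentinels / 2 + 1;
    }

    /** ODOWN is reached once at least `quorum` Sentinels report the master as down. */
    static boolean canFlagObjectivelyDown(int reachableSentinels, int quorum) {
        return reachableSentinels >= quorum;
    }

    /** A failover can start only if ODOWN is reached and a majority can elect a leader. */
    static boolean canFailover(int totalSentinels, int reachableSentinels, int quorum) {
        return canFlagObjectivelyDown(reachableSentinels, quorum)
                && reachableSentinels >= majority(totalSentinels);
    }

    public static void main(String[] args) {
        // 3 Sentinels, quorum 2: one Sentinel may fail and failover still works.
        System.out.println(canFailover(3, 2, 2)); // prints true
        // With two Sentinels down, the survivor reaches neither quorum nor majority.
        System.out.println(canFailover(3, 1, 2)); // prints false
        // 5 Sentinels, quorum 2: ODOWN is possible with 2, but failover needs 3.
        System.out.println(canFailover(5, 2, 2)); // prints false
    }
}
```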

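How it works

java
// Illustrative sketch of the Sentinel workflow; this is not Sentinel's actual source code.
// The individual steps are abstract hooks so that the class compiles on its own.
public abstract class SentinelWorkflow {

    public void monitoringProcess() throws InterruptedException {
        // 1. Discovery phase
        discoverMasterAndSlaves();
        discoverOtherSentinels();

        // 2. Monitoring phase
        while (true) {
            // Send PING to the master and its replicas
            pingRedisInstances();

            // Send PING to the other Sentinels
            pingOtherSentinels();

            // Flag instances that should enter the subjectively down (SDOWN) state
            checkSubjectiveDown();

            // Flag masters that should enter the objectively down (ODOWN) state
            checkObjectiveDown();

            // Start a failover if one is needed
            if (shouldStartFailover()) {
                startFailover();
            }

            Thread.sleep(1000); // check once per second
        }
    }

    protected abstract void discoverMasterAndSlaves();
    protected abstract void discoverOtherSentinels();
    protected abstract void pingRedisInstances();
    protected abstract void pingOtherSentinels();
    protected abstract void checkSubjectiveDown();
    protected abstract void checkObjectiveDown();
    protected abstract boolean shouldStartFailover();
    protected abstract void startFailover();
}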

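Sentinel configuration

Sentinel configuration file

bash
# sentinel.conf
port 26379
daemonize yes
pidfile /var/run/redis-sentinel.pid
logfile /var/log/redis/sentinel.log
dir /tmp

# Monitor the master named mymaster; the final argument (2) is the quorum,
# the number of Sentinels that must agree the master is down
sentinel monitor mymaster 127.0.0.1 6379 2

# Time without a valid reply before an instance is considered subjectively down (ms)
sentinel down-after-milliseconds mymaster 30000

# Failover timeout (ms)
sentinel failover-timeout mymaster 180000

# Number of replicas that may resynchronize with the new master at the same time
sentinel parallel-syncs mymaster 1

# Password used to authenticate with the master and its replicas
sentinel auth-pass mymaster yourpassword

# Notification script
sentinel notification-script mymaster /var/redis/notify.sh

# Client reconfiguration script, run when the master address changes after a failover
sentinel client-reconfig-script mymaster /var/redis/reconfig.sh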

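Redis master-replica configuration

bash
# Master: redis-master.conf
port 6379
daemonize yes
pidfile /var/run/redis_6379.pid
logfile /var/log/redis/redis-master.log
save 900 1
save 300 10
save 60 10000
requirepass yourpassword
# Also needed on the master: after a failover it rejoins as a replica
masterauth yourpassword

# Replica: redis-slave1.conf
# (for redis-slave2.conf, change the port to 6381 and use separate pid and log files)
port 6380
daemonize yes
pidfile /var/run/redis_6380.pid
logfile /var/log/redis/redis-slave.log
# Called replicaof in Redis 5 and later; slaveof remains available as an alias
slaveof 127.0.0.1 6379
masterauth yourpassword
# Use the same password on every instance, since any replica can be promoted
requirepass yourpassword
slave-read-only yes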

Starting Sentinel

bash
# Start the Redis instances
redis-server redis-master.conf
redis-server redis-slave1.conf
redis-server redis-slave2.conf

# Start the Sentinel instances
redis-sentinel sentinel1.conf
redis-sentinel sentinel2.conf
redis-sentinel sentinel3.conf

# Alternatively, start Sentinel through redis-server
redis-server sentinel1.conf --sentinel

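Client integration

Java client (Jedis)

java
import java.util.HashSet;
import java.util.Set;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPoolConfig;
import redis.clients.jedis.JedisSentinelPool;

@Configuration
public class RedisSentinelConfig {

    @Bean
    public JedisSentinelPool jedisSentinelPool() {
        // Addresses of the Sentinels, not of the Redis data nodes
        Set<String> sentinels = new HashSet<>();
        sentinels.add("127.0.0.1:26379");
        sentinels.add("127.0.0.1:26380");
        sentinels.add("127.0.0.1:26381");

        JedisPoolConfig poolConfig = new JedisPoolConfig();
        poolConfig.setMaxTotal(100);
        poolConfig.setMaxIdle(10);
        poolConfig.setMinIdle(5);
        poolConfig.setTestOnBorrow(true);

        // "mymaster" must match the name used in sentinel.conf
        return new JedisSentinelPool("mymaster", sentinels, poolConfig, "yourpassword");
    }
}

@Service
public class RedisService {

    @Autowired
    private JedisSentinelPool sentinelPool;

    public void set(String key, String value) {
        // try-with-resources returns the connection to the pool
        try (Jedis jedis = sentinelPool.getResource()) {
            jedis.set(key, value);
        }
    }

    public String get(String key) {
        try (Jedis jedis = sentinelPool.getResource()) {
            return jedis.get(key);
        }
    }

    // Address of the master the pool is currently connected to
    public HostAndPort getCurrentMaster() {
        return sentinelPool.getCurrentHostMaster();
    }
}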

Spring Data Redis

java
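import java.time.Duration;

import io.lettuce.core.event.connection.ConnectedEvent;
import io.lettuce.core.event.connection.DisconnectedEvent;
import io.lettuce.core.resource.ClientResources;
import io.lettuce.core.resource.DefaultClientResources;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisSentinelConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.StringRedisSerializer;
import org.springframework.stereotype.Component;

// Named LettuceSentinelConfig so that it does not shadow Spring's RedisSentinelConfiguration
@Configuration
public class LettuceSentinelConfig {

    // Shared Lettuce resources; they also expose the event bus used further down
    @Bean(destroyMethod = "shutdown")
    public ClientResources clientResources() {
        return DefaultClientResources.create();
    }

    @Bean
    public LettuceConnectionFactory redisConnectionFactory(ClientResources clientResources) {
        RedisSentinelConfiguration sentinelConfig = new RedisSentinelConfiguration()
            .master("mymaster")
            .sentinel("127.0.0.1", 26379)
            .sentinel("127.0.0.1", 26380)
            .sentinel("127.0.0.1", 26381);

        sentinelConfig.setPassword("yourpassword");

        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
            .clientResources(clientResources)
            .commandTimeout(Duration.ofSeconds(2))
            .shutdownTimeout(Duration.ZERO)
            .build();

        return new LettuceConnectionFactory(sentinelConfig, clientConfig);
    }

    @Bean
    public RedisTemplate<String, Object> redisTemplate(LettuceConnectionFactory redisConnectionFactory) {
        RedisTemplate<String, Object> template = new RedisTemplate<>();
        template.setConnectionFactory(redisConnectionFactory);
        template.setKeySerializer(new StringRedisSerializer());
        template.setValueSerializer(new GenericJackson2JsonRedisSerializer());
        return template;
    }
}

// Logs connection loss and re-establishment, which is what a client observes during
// a master switch. The events come from the Lettuce event bus.
@Component
public class RedisMasterSwitchListener {

    private static final Logger log = LoggerFactory.getLogger(RedisMasterSwitchListener.class);

    public RedisMasterSwitchListener(ClientResources clientResources) {
        clientResources.eventBus().get().subscribe(event -> {
            if (event instanceof DisconnectedEvent) {
                log.warn("Redis connection lost: {}", event);
            } else if (event instanceof ConnectedEvent) {
                log.info("Redis connection established: {}", event);
            }
        });
    }
}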

Failover mechanism

Subjectively down (SDOWN)

java
// Subjective down detection (illustrative; RedisInstance is a hypothetical type)
public class SubjectiveDownDetection {

    private final long downAfterMilliseconds = 30000; // 30 seconds

    public boolean isSubjectiveDown(RedisInstance instance) {
        long lastPingReply = instance.getLastPingReply();
        long currentTime = System.currentTimeMillis();

        // No valid reply within the configured window: flag as subjectively down
        return (currentTime - lastPingReply) > downAfterMilliseconds;
    }
}

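Objectively down (ODOWN)

java
import java.util.List;

// Objective down detection (illustrative; RedisInstance and Sentinel are hypothetical types)
public class ObjectiveDownDetection {

    private final int quorum = 2; // 2 Sentinels must agree

    public boolean isObjectiveDown(RedisInstance master, List<Sentinel> sentinels) {
        int sdownCount = 0;

        // Count the Sentinels that currently see the master as subjectively down
        for (Sentinel sentinel : sentinels) {
            if (sentinel.isMarkedAsSubjectiveDown(master)) {
                sdownCount++;
            }
        }

        // Quorum reached: flag as objectively down
        return sdownCount >= quorum;
    }
}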

Selecting the new master

java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Selection of the new master (illustrative; RedisInstance is a hypothetical type)
public class MasterElection {

    public RedisInstance electNewMaster(List<RedisInstance> slaves) {
        // 1. Filter out replicas that do not qualify
        List<RedisInstance> candidates = slaves.stream()
            .filter(slave -> !slave.isDown())
            .filter(slave -> slave.getPriority() != 0)            // priority 0 is never promoted
            .filter(slave -> slave.getLastAvailableTime() < 5000) // replied within the last 5 seconds
            .filter(slave -> slave.getReplicationOffset() > 0)    // has a replication offset
            .collect(Collectors.toList());

        if (candidates.isEmpty()) {
            return null;
        }

        // 2. Sort the candidates; the first one after sorting wins
        candidates.sort(
            // Lower priority value first
            Comparator.comparingInt(RedisInstance::getPriority)
                // then the larger replication offset (most up-to-date data)
                .thenComparing(Comparator.comparingLong(RedisInstance::getReplicationOffset).reversed())
                // then the lexicographically smaller run ID
                .thenComparing(RedisInstance::getRunId));

        return candidates.get(0);
    }
}
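
To make the ordering concrete, here is a self-contained worked example with made-up replicas. It is a sketch under the assumptions stated in its comments: `ReplicaSelectionDemo`, `Replica`, and the sample values are hypothetical, and only the priority, offset, and run ID rules are modeled.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Worked example of the replica ordering (types and sample values are made up). */
final class ReplicaSelectionDemo {

    static final class Replica {
        final String runId;
        final int priority; // lower value wins; 0 means "never promote"
        final long offset;  // replication offset; higher means more up-to-date data

        Replica(String runId, int priority, long offset) {
            this.runId = runId;
            this.priority = priority;
            this.offset = offset;
        }
    }

    /** Picks the best candidate: lowest priority, then highest offset, then smallest run ID. */
    static Optional<Replica> pick(List<Replica> replicas) {
        return replicas.stream()
                .filter(r -> r.priority != 0)
                .min(Comparator.comparingInt((Replica r) -> r.priority)
                        .thenComparing(Comparator.comparingLong((Replica r) -> r.offset).reversed())
                        .thenComparing((Replica r) -> r.runId));
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
                new Replica("c3", 100, 5000),
                new Replica("a1", 100, 7000), // same priority as c3, larger offset
                new Replica("b2", 0, 9000));  // excluded: priority 0
        System.out.println(pick(replicas).map(r -> r.runId).orElse("none")); // prints a1
    }
}
```

Replica `b2` has the most data but is skipped because of its priority of 0, and `a1` beats `c3` on replication offset.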

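Failover process

bash
# Failover steps
# 1. The master is found to be objectively down
# 2. The Sentinels elect a leader
# 3. The leader Sentinel performs the failover:
#    a. Selects the new master from the replicas
#    b. Sends SLAVEOF NO ONE to the new master
#    c. Sends SLAVEOF to the other replicas so that they replicate from the new master
#    d. Updates the configuration and gives clients the new master address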
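
Steps 3b and 3c can be sketched as a function that builds the list of commands the leader would send. This is only an illustration of the sequence, not Sentinel's implementation; `FailoverPlan` and the "target -> command" output format are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the reconfiguration commands a leader Sentinel issues (hypothetical helper). */
final class FailoverPlan {

    /**
     * Returns "target -> command" lines: the promoted replica stops replicating,
     * and every other replica is pointed at it. Addresses are "host:port" strings.
     */
    static List<String> commands(String promoted, List<String> replicas) {
        String[] hostPort = promoted.split(":");
        List<String> plan = new ArrayList<>();
        plan.add(promoted + " -> SLAVEOF NO ONE");
        for (String replica : replicas) {
            if (!replica.equals(promoted)) {
                plan.add(replica + " -> SLAVEOF " + hostPort[0] + " " + hostPort[1]);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        commands("127.0.0.1:6380", List.of("127.0.0.1:6380", "127.0.0.1:6381"))
                .forEach(System.out::println);
        // prints:
        // 127.0.0.1:6380 -> SLAVEOF NO ONE
        // 127.0.0.1:6381 -> SLAVEOF 127.0.0.1 6380
    }
}
```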

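Monitoring and management

Sentinel commands

bash
# Connect to Sentinel
redis-cli -p 26379

# List the monitored masters
SENTINEL masters

# List the replicas of a given master
SENTINEL slaves mymaster

# List the other Sentinels
SENTINEL sentinels mymaster

# Get the address of the current master
SENTINEL get-master-addr-by-name mymaster

# Trigger a manual failover
SENTINEL failover mymaster

# Reset the state of the master
SENTINEL reset mymaster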
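
In redis-cli, commands such as `SENTINEL masters` print each instance as a flat list of alternating field names and values. A client that receives the reply in that shape has to pair the elements up before it can read, say, the `flags` field. The sketch below shows one way to do that; `SentinelReply` and its methods are hypothetical helpers, and the sample reply is invented.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: turn a flat field/value reply into a map and read the flags (hypothetical helper). */
final class SentinelReply {

    /** Pairs up alternating field names and values: [k1, v1, k2, v2, ...]. */
    static Map<String, String> toMap(List<String> flat) {
        if (flat.size() % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of elements");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < flat.size(); i += 2) {
            fields.put(flat.get(i), flat.get(i + 1));
        }
        return fields;
    }

    /** Treats an instance as down when its comma-separated flags contain s_down or o_down. */
    static boolean isDown(Map<String, String> fields) {
        List<String> flags = Arrays.asList(fields.getOrDefault("flags", "").split(","));
        return flags.contains("s_down") || flags.contains("o_down");
    }

    public static void main(String[] args) {
        Map<String, String> master = toMap(List.of(
                "name", "mymaster", "ip", "127.0.0.1", "port", "6379",
                "flags", "master,s_down,o_down"));
        System.out.println(master.get("name") + " down=" + isDown(master)); // prints mymaster down=true
    }
}
```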

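Monitoring metrics

java
import java.util.List;
import java.util.Map;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import redis.clients.jedis.Jedis;

@Component
public class SentinelMonitor {

    private static final Logger log = LoggerFactory.getLogger(SentinelMonitor.class);

    @Scheduled(fixedDelay = 30000)
    public void monitorSentinel() {
        // Connect to a Sentinel directly, not to a Redis data node
        try (Jedis jedis = new Jedis("127.0.0.1", 26379)) {
            // Master information
            List<Map<String, String>> masters = jedis.sentinelMasters();
            for (Map<String, String> master : masters) {
                String flags = master.get("flags");
                int numSlaves = Integer.parseInt(master.get("num-slaves"));
                // num-other-sentinels excludes the Sentinel being queried, hence the + 1
                int numSentinels = Integer.parseInt(master.get("num-other-sentinels")) + 1;

                // The flags of a master always contain "master", so check the down flags instead
                boolean masterUp = !flags.contains("s_down") && !flags.contains("o_down");

                // Publish the metrics
                sendMetric("sentinel.master.status", masterUp ? 1 : 0);
                sendMetric("sentinel.slaves.count", numSlaves);
                sendMetric("sentinel.sentinels.count", numSentinels);
            }

            // Replica information
            List<Map<String, String>> slaves = jedis.sentinelSlaves("mymaster");
            int healthySlaves = 0;
            for (Map<String, String> slave : slaves) {
                if (!slave.get("flags").contains("down")) {
                    healthySlaves++;
                }
            }
            sendMetric("sentinel.healthy.slaves", healthySlaves);

        } catch (Exception e) {
            log.error("Sentinel monitoring failed", e);
        }
    }

    // Placeholder: forward the value to whatever metrics system is in use
    private void sendMetric(String name, int value) {
        log.info("{}={}", name, value);
    }
}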

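Configuration tuning

Performance tuning

bash
# Tuned sentinel.conf
# Keep comments on their own lines: Redis config files do not support trailing comments

# Failure detection: a lower value detects failures sooner,
# but makes false positives from network jitter more likely
# Consider an instance down after 5 seconds without a valid reply
sentinel down-after-milliseconds mymaster 5000
# Failover timeout of 60 seconds
sentinel failover-timeout mymaster 60000

# Network
# TCP keepalive interval
tcp-keepalive 60
# Client idle timeout (0 disables it)
timeout 0

# Concurrency
# Number of replicas resynchronizing at the same time
sentinel parallel-syncs mymaster 1

# Logging
loglevel notice
# Also send log output to syslog
syslog-enabled yes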

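Notification script

bash
#!/bin/bash
# notify.sh - failure notification script
# Sentinel calls it with two arguments: the event type and the event description

EVENT_TYPE=$1
EVENT_DESCRIPTION=$2
# Recipient address: replace with a real one
ALERT_EMAIL="[email protected]"

case "$EVENT_TYPE" in
    "+sdown")
        echo "Instance is subjectively down: $EVENT_DESCRIPTION" | mail -s "Redis Alert" "$ALERT_EMAIL"
        ;;
    "+odown")
        echo "Master is objectively down: $EVENT_DESCRIPTION" | mail -s "Redis Alert" "$ALERT_EMAIL"
        ;;
    "+failover-end")
        echo "Failover completed: $EVENT_DESCRIPTION" | mail -s "Redis Alert" "$ALERT_EMAIL"
        ;;
    "+switch-master")
        # Description: <master-name> <old-ip> <old-port> <new-ip> <new-port>
        echo "Master address changed: $EVENT_DESCRIPTION" | mail -s "Redis Alert" "$ALERT_EMAIL"
        ;;
esac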
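
Sentinel reports the address of the new master in the `+switch-master` event, whose description has the form `<master-name> <old-ip> <old-port> <new-ip> <new-port>`. An application that consumes these events needs to split that description into its parts. The following is a minimal sketch of such a parser; the class `SwitchMasterEvent` and its field names are hypothetical.

```java
/** Sketch: parse the description of a +switch-master event (hypothetical class). */
final class SwitchMasterEvent {
    final String masterName;
    final String oldAddress;
    final String newAddress;

    private SwitchMasterEvent(String masterName, String oldAddress, String newAddress) {
        this.masterName = masterName;
        this.oldAddress = oldAddress;
        this.newAddress = newAddress;
    }

    /** Expects "<master-name> <old-ip> <old-port> <new-ip> <new-port>". */
    static SwitchMasterEvent parse(String description) {
        String[] parts = description.trim().split("\\s+");
        if (parts.length != 5) {
            throw new IllegalArgumentException("unexpected +switch-master description: " + description);
        }
        return new SwitchMasterEvent(parts[0], parts[1] + ":" + parts[2], parts[3] + ":" + parts[4]);
    }

    public static void main(String[] args) {
        SwitchMasterEvent event = parse("mymaster 127.0.0.1 6379 127.0.0.1 6380");
        System.out.println(event.masterName + ": " + event.oldAddress + " -> " + event.newAddress);
        // prints mymaster: 127.0.0.1:6379 -> 127.0.0.1:6380
    }
}
```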

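Best practices

1. Deployment

  • Odd number of Sentinels: deploy 3 or 5 Sentinel instances so that a majority can always be formed
  • Distributed deployment: run the Sentinel instances on different physical machines
  • Network isolation: avoid placing all Sentinels in the same network segment as Redis, so that a single network failure does not isolate them together

2. Configuration

  • Set timeouts sensibly: avoid false positives caused by network jitter
  • Configure a notification script: learn about failures promptly
  • Back up the configuration regularly: keep copies of the Sentinel configuration files

3. Monitoring

  • Monitor Sentinel state: make sure the Sentinel group is working
  • Monitor failovers: record how often they happen and why
  • Monitor network latency: make sure the Sentinels can communicate with each other

4. Operations

  • Regular drills: rehearse failovers periodically
  • Version consistency: keep Redis and Sentinel on the same version
  • Documentation: maintain detailed operations documentation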

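Summary

Redis Sentinel provides a dependable high-availability solution: automatic failure detection and failover keep the Redis service running when a master fails. Configuring and monitoring the Sentinel deployment correctly improves the availability and stability of Redis considerably. In production, combine it with monitoring, alerting, and operations automation to build a complete high-availability architecture.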