Redis Sentinel
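Overview
Redis Sentinel is the high-availability solution for Redis. It keeps a master-replica deployment available through four functions: monitoring, notification, automatic failover, and configuration provision. A Sentinel system consists of multiple Sentinel processes that cooperate to monitor the Redis instances.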
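Sentinel Features
Core Features
1. Monitoring
- Continuously checks whether the master and replica instances are working as expected
- Monitors the state of the other Sentinel instances
2. Notification
- Sends a notification when a monitored Redis instance runs into a problem
- Alerts administrators or applications through an API
3. Automatic Failover
- Starts a failover automatically when the master stops working as expected
- Promotes one of the replicas to be the new master
4. Configuration Provider
- Clients connect to Sentinel to obtain the address of the current master
- After a failover, Sentinel returns the address of the new master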
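The configuration-provider role can be sketched from the client side. The example below is a minimal sketch, not how any particular client library implements discovery: it asks each Sentinel in turn for the master address and uses the first usable answer. `MasterResolver`, `Address`, and the `queryMaster` function are hypothetical names introduced for illustration; `queryMaster` stands in for a real network call that sends `SENTINEL get-master-addr-by-name` to one Sentinel and returns its two-element reply (IP and port).

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Resolves the current master by asking each Sentinel in turn (illustrative sketch). */
final class MasterResolver {

    /** Host and port of a Redis or Sentinel instance. */
    static final class Address {
        final String host;
        final int port;

        Address(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    /**
     * @param sentinels   Sentinel addresses to try, in order
     * @param queryMaster hypothetical transport; returns the [ip, port] reply,
     *                    or null if the Sentinel is unreachable or does not know the master
     */
    static Optional<Address> resolve(List<Address> sentinels,
                                     Function<Address, List<String>> queryMaster) {
        for (Address sentinel : sentinels) {
            List<String> reply;
            try {
                reply = queryMaster.apply(sentinel);
            } catch (RuntimeException e) {
                continue; // Treat a failed query like an unreachable Sentinel
            }
            if (reply != null && reply.size() == 2) {
                return Optional.of(new Address(reply.get(0), Integer.parseInt(reply.get(1))));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Address> sentinels = List.of(
            new Address("127.0.0.1", 26379),
            new Address("127.0.0.1", 26380),
            new Address("127.0.0.1", 26381));
        // Simulate the first Sentinel being down and the others reporting a promoted replica
        Function<Address, List<String>> fakeQuery =
            s -> s.port == 26379 ? null : List.of("127.0.0.1", "6380");
        System.out.println(resolve(sentinels, fakeQuery).orElse(null)); // prints 127.0.0.1:6380
    }
}
```

Passing the transport in as a function keeps the sketch independent of any client library; in a real application a Sentinel-aware client such as Jedis or Lettuce performs this lookup internally.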
Sentinel Architecture
Deployment Architecture
bash
# Typical deployment: 3 Sentinels + 1 master and 2 replicas
Sentinel1(26379) ─┐
Sentinel2(26380) ─┼─ monitor ─ Redis Master(6379)
Sentinel3(26381) ─┘                 │
                                    ├─ Redis Slave1(6380)
                                    └─ Redis Slave2(6381)
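How It Works
java
// Illustrative sketch of the Sentinel workflow (not the actual Redis implementation)
public abstract class SentinelWorkflow {
    public void monitoringProcess() throws InterruptedException {
        // 1. Discovery phase
        discoverMasterAndSlaves();
        discoverOtherSentinels();
        // 2. Monitoring phase
        while (true) {
            // Send PING to the master and its replicas
            pingRedisInstances();
            // Send PING to the other Sentinels
            pingOtherSentinels();
            // Check whether any instance should be marked subjectively down (SDOWN)
            checkSubjectiveDown();
            // Check whether the master should be marked objectively down (ODOWN)
            checkObjectiveDown();
            // Start a failover if one is needed
            if (shouldStartFailover()) {
                startFailover();
            }
            Thread.sleep(1000); // Check once per second
        }
    }

    // The individual steps are left abstract in this sketch
    protected abstract void discoverMasterAndSlaves();
    protected abstract void discoverOtherSentinels();
    protected abstract void pingRedisInstances();
    protected abstract void pingOtherSentinels();
    protected abstract void checkSubjectiveDown();
    protected abstract void checkObjectiveDown();
    protected abstract boolean shouldStartFailover();
    protected abstract void startFailover();
}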
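Sentinel Configuration
Sentinel Configuration File
bash
# sentinel.conf
port 26379
daemonize yes
pidfile /var/run/redis-sentinel.pid
logfile /var/log/redis/sentinel.log
dir /tmp
# Master to monitor: name, IP, port, and quorum (number of Sentinels that must agree it is down)
sentinel monitor mymaster 127.0.0.1 6379 2
# Time without a valid reply before an instance is marked subjectively down (milliseconds)
sentinel down-after-milliseconds mymaster 30000
# Failover timeout (milliseconds)
sentinel failover-timeout mymaster 180000
# Number of replicas that may resync with the new master at the same time
sentinel parallel-syncs mymaster 1
# Password used to authenticate with the master and its replicas
sentinel auth-pass mymaster yourpassword
# Notification script
sentinel notification-script mymaster /var/redis/notify.sh
# Client reconfiguration script, run after a failover
sentinel client-reconfig-script mymaster /var/redis/reconfig.sh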
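Redis Master and Replica Configuration
bash
# Master: redis-master.conf
port 6379
daemonize yes
pidfile /var/run/redis_6379.pid
logfile /var/log/redis/redis-master.log
save 900 1
save 300 10
save 60 10000
requirepass yourpassword
# Needed once this node is demoted to a replica after a failover
masterauth yourpassword
# Replica: redis-slave.conf
port 6380
daemonize yes
pidfile /var/run/redis_6380.pid
logfile /var/log/redis/redis-slave.log
# slaveof is named replicaof since Redis 5; the old name is still accepted
slaveof 127.0.0.1 6379
masterauth yourpassword
# Same password as the master, so clients and Sentinel can still authenticate after a promotion
requirepass yourpassword
slave-read-only yes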
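Starting Sentinel
bash
# Start the Redis instances (one config file per replica, each with its own port, pidfile, and logfile)
redis-server redis-master.conf
redis-server redis-slave1.conf
redis-server redis-slave2.conf
# Start the Sentinel instances
redis-sentinel sentinel1.conf
redis-sentinel sentinel2.conf
redis-sentinel sentinel3.conf
# Alternatively, start a Sentinel through redis-server
redis-server sentinel1.conf --sentinel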
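Client Integration
Java Client (Jedis)
java
import java.util.HashSet;
import java.util.Set;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPoolConfig;
import redis.clients.jedis.JedisSentinelPool;

@Configuration
public class RedisSentinelConfig {
    @Bean
    public JedisSentinelPool jedisSentinelPool() {
        // Addresses of the Sentinels, not of the Redis instances
        Set<String> sentinels = new HashSet<>();
        sentinels.add("127.0.0.1:26379");
        sentinels.add("127.0.0.1:26380");
        sentinels.add("127.0.0.1:26381");
        JedisPoolConfig poolConfig = new JedisPoolConfig();
        poolConfig.setMaxTotal(100);
        poolConfig.setMaxIdle(10);
        poolConfig.setMinIdle(5);
        poolConfig.setTestOnBorrow(true);
        return new JedisSentinelPool("mymaster", sentinels, poolConfig, "yourpassword");
    }
}

@Service
public class RedisService {
    @Autowired
    private JedisSentinelPool sentinelPool;

    public void set(String key, String value) {
        try (Jedis jedis = sentinelPool.getResource()) {
            jedis.set(key, value);
        }
    }

    public String get(String key) {
        try (Jedis jedis = sentinelPool.getResource()) {
            return jedis.get(key);
        }
    }

    // Return the address of the current master
    public HostAndPort getCurrentMaster() {
        return sentinelPool.getCurrentHostMaster();
    }
}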
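Spring Data Redis
java
import java.time.Duration;
import io.lettuce.core.event.connection.ConnectedEvent;
import io.lettuce.core.event.connection.DisconnectedEvent;
import io.lettuce.core.resource.ClientResources;
import io.lettuce.core.resource.DefaultClientResources;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisSentinelConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.StringRedisSerializer;
import org.springframework.stereotype.Component;

// Named LettuceSentinelConfig so it does not shadow Spring's RedisSentinelConfiguration
@Configuration
public class LettuceSentinelConfig {
    @Bean(destroyMethod = "shutdown")
    public ClientResources clientResources() {
        return DefaultClientResources.create();
    }

    @Bean
    public LettuceConnectionFactory redisConnectionFactory(ClientResources clientResources) {
        RedisSentinelConfiguration sentinelConfig = new RedisSentinelConfiguration()
            .master("mymaster")
            .sentinel("127.0.0.1", 26379)
            .sentinel("127.0.0.1", 26380)
            .sentinel("127.0.0.1", 26381);
        sentinelConfig.setPassword("yourpassword");
        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
            .clientResources(clientResources)
            .commandTimeout(Duration.ofSeconds(2))
            .shutdownTimeout(Duration.ZERO)
            .build();
        return new LettuceConnectionFactory(sentinelConfig, clientConfig);
    }

    @Bean
    public RedisTemplate<String, Object> redisTemplate(LettuceConnectionFactory connectionFactory) {
        RedisTemplate<String, Object> template = new RedisTemplate<>();
        template.setConnectionFactory(connectionFactory);
        template.setKeySerializer(new StringRedisSerializer());
        template.setValueSerializer(new GenericJackson2JsonRedisSerializer());
        return template;
    }
}

// Log connection loss and recovery, which typically accompany a master switch.
// Spring Data Redis publishes no connection-failure events, so this uses the Lettuce event bus.
@Component
public class RedisConnectionEventLogger {
    private static final Logger log = LoggerFactory.getLogger(RedisConnectionEventLogger.class);

    public RedisConnectionEventLogger(ClientResources clientResources) {
        clientResources.eventBus().get().subscribe(event -> {
            if (event instanceof DisconnectedEvent) {
                log.warn("Redis connection lost: {}", event);
            } else if (event instanceof ConnectedEvent) {
                log.info("Redis connection established: {}", event);
            }
        });
    }
}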
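Failover Mechanism
Subjective Down (SDOWN)
java
// Subjective-down detection (illustrative sketch; RedisInstance is a hypothetical type)
public class SubjectiveDownDetection {
    private long downAfterMilliseconds = 30000; // 30 seconds

    public boolean isSubjectiveDown(RedisInstance instance) {
        long lastPingReply = instance.getLastPingReply();
        long currentTime = System.currentTimeMillis();
        // No valid reply within the configured window: mark the instance subjectively down
        return (currentTime - lastPingReply) > downAfterMilliseconds;
    }

    /** Minimal view of a monitored instance, assumed for this sketch. */
    public interface RedisInstance {
        long getLastPingReply(); // Timestamp of the last valid PING reply, in milliseconds
    }
}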
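Objective Down (ODOWN)
java
import java.util.List;

// Objective-down detection (illustrative sketch; RedisInstance and Sentinel are hypothetical types)
public class ObjectiveDownDetection {
    private int quorum = 2; // Two Sentinels must agree

    public boolean isObjectiveDown(RedisInstance master, List<Sentinel> sentinels) {
        int sdownCount = 0;
        for (Sentinel sentinel : sentinels) {
            if (sentinel.isMarkedAsSubjectiveDown(master)) {
                sdownCount++;
            }
        }
        // Quorum reached: mark the master objectively down
        return sdownCount >= quorum;
    }

    /** Minimal types assumed for this sketch. */
    public interface RedisInstance {}
    public interface Sentinel {
        boolean isMarkedAsSubjectiveDown(RedisInstance instance);
    }
}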
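The quorum only governs when a master is flagged ODOWN. Actually starting a failover additionally requires authorization from a majority of all Sentinels, which is why the number of Sentinels matters as much as the quorum value. The arithmetic can be sketched as follows; `SentinelQuorumMath` and its method names are hypothetical, and the sketch simplifies by assuming every reachable Sentinel agrees that the master is down.

```java
/** Arithmetic behind Sentinel sizing: quorum versus majority (illustrative sketch). */
final class SentinelQuorumMath {

    /** Smallest number of Sentinels that forms a strict majority of {@code total}. */
    static int majority(int total) {
        return total / 2 + 1;
    }

    /** The master can be flagged ODOWN once {@code agreeing} Sentinels report it down. */
    static boolean canMarkObjectivelyDown(int agreeing, int quorum) {
        return agreeing >= quorum;
    }

    /**
     * A failover can proceed only if ODOWN is reached and a majority of all
     * Sentinels is reachable to authorize it.
     */
    static boolean canFailover(int total, int reachable, int quorum) {
        return canMarkObjectivelyDown(reachable, quorum) && reachable >= majority(total);
    }

    /** Number of Sentinels that may fail while a failover remains possible. */
    static int toleratedSentinelFailures(int total, int quorum) {
        return total - Math.max(quorum, majority(total));
    }

    public static void main(String[] args) {
        System.out.println(majority(3));                     // prints 2
        System.out.println(toleratedSentinelFailures(3, 2)); // prints 1
        System.out.println(toleratedSentinelFailures(4, 2)); // prints 1: a fourth Sentinel adds no tolerance
        System.out.println(canFailover(5, 2, 2));            // prints false: quorum met, but 2 of 5 is no majority
    }
}
```

With three Sentinels and a quorum of 2, one Sentinel can be lost and a failover is still possible; going from three to four Sentinels does not raise that number.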
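Selecting the New Master
java
import java.util.List;
import java.util.stream.Collectors;

// Selection of the replica to promote (illustrative sketch; RedisInstance is a hypothetical type)
public class MasterElection {
    public RedisInstance electNewMaster(List<RedisInstance> slaves) {
        // 1. Filter out replicas that are not eligible
        List<RedisInstance> candidates = slaves.stream()
            .filter(slave -> !slave.isDown())
            .filter(slave -> slave.getPriority() != 0) // Priority 0 means "never promote"
            .filter(slave -> slave.getLastAvailableTime() < 5000) // Replied within the last 5 seconds
            .filter(slave -> slave.getReplicationOffset() > 0) // Has a replication offset
            .collect(Collectors.toList());
        if (candidates.isEmpty()) {
            return null;
        }
        // 2. Sort by preference
        candidates.sort((s1, s2) -> {
            // A lower priority value is preferred
            int priorityCompare = Integer.compare(s1.getPriority(), s2.getPriority());
            if (priorityCompare != 0) {
                return priorityCompare;
            }
            // Then the larger replication offset (the most up-to-date data)
            int offsetCompare = Long.compare(s2.getReplicationOffset(), s1.getReplicationOffset());
            if (offsetCompare != 0) {
                return offsetCompare;
            }
            // Then the lexicographically smaller run ID
            return s1.getRunId().compareTo(s2.getRunId());
        });
        return candidates.get(0);
    }

    /** Minimal view of a replica, assumed for this sketch. */
    public interface RedisInstance {
        boolean isDown();
        int getPriority();
        long getLastAvailableTime(); // Milliseconds since the last valid reply
        long getReplicationOffset();
        String getRunId();
    }
}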
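Failover Process
bash
# Failover steps
# 1. The master is found to be objectively down
# 2. The Sentinels elect a leader, which requires votes from a majority of Sentinels
# 3. The leader Sentinel performs the failover:
#    a. Selects the new master from among the replicas
#    b. Sends SLAVEOF NO ONE to the new master
#    c. Sends SLAVEOF to the other replicas so that they replicate from the new master
#    d. Updates the configuration and announces the new master address to clients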
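The topology change performed in step 3 can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions, not Sentinel's implementation: `FailoverSimulation` and its methods are hypothetical names, instances are plain address strings, and the old master is repointed immediately, whereas Sentinel reconfigures it as a replica only once it is reachable again.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of the topology change a failover performs (illustrative sketch). */
final class FailoverSimulation {
    /** Maps each instance to the master it replicates from; the master itself maps to null. */
    private final Map<String, String> replicatesFrom = new LinkedHashMap<>();

    FailoverSimulation(String master, List<String> replicas) {
        replicatesFrom.put(master, null);
        for (String replica : replicas) {
            replicatesFrom.put(replica, master);
        }
    }

    /** Promotes one replica and repoints every other instance at it. */
    void failover(String promoted) {
        if (replicatesFrom.get(promoted) == null) {
            throw new IllegalArgumentException(promoted + " is not a known replica");
        }
        for (String instance : new ArrayList<>(replicatesFrom.keySet())) {
            // Models SLAVEOF NO ONE for the promoted replica and SLAVEOF <new master> for the rest
            replicatesFrom.put(instance, instance.equals(promoted) ? null : promoted);
        }
    }

    /** Returns the instance that currently replicates from nobody. */
    String currentMaster() {
        for (Map.Entry<String, String> entry : replicatesFrom.entrySet()) {
            if (entry.getValue() == null) {
                return entry.getKey();
            }
        }
        throw new IllegalStateException("topology has no master");
    }

    /** Returns the master of the given instance, or null if it is the master. */
    String masterOf(String instance) {
        return replicatesFrom.get(instance);
    }

    public static void main(String[] args) {
        FailoverSimulation topology = new FailoverSimulation(
            "127.0.0.1:6379", List.of("127.0.0.1:6380", "127.0.0.1:6381"));
        topology.failover("127.0.0.1:6380");
        System.out.println(topology.currentMaster());            // prints 127.0.0.1:6380
        System.out.println(topology.masterOf("127.0.0.1:6381")); // prints 127.0.0.1:6380
    }
}
```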
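Monitoring and Management
Sentinel Commands
bash
# Connect to Sentinel
redis-cli -p 26379
# List the monitored masters
SENTINEL masters
# List the replicas of a given master
SENTINEL slaves mymaster
# List the other Sentinels monitoring that master
SENTINEL sentinels mymaster
# Get the address of the current master
SENTINEL get-master-addr-by-name mymaster
# Trigger a manual failover
SENTINEL failover mymaster
# Reset the state of the masters whose name matches the pattern
SENTINEL reset mymaster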
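Monitoring Metrics
java
import java.util.List;
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import redis.clients.jedis.Jedis;

@Component
public class SentinelMonitor {
    private static final Logger log = LoggerFactory.getLogger(SentinelMonitor.class);

    @Scheduled(fixedDelay = 30000)
    public void monitorSentinel() {
        try (Jedis jedis = new Jedis("127.0.0.1", 26379)) {
            // Fetch information about the monitored masters
            List<Map<String, String>> masters = jedis.sentinelMasters();
            for (Map<String, String> master : masters) {
                String status = master.get("flags");
                int numSlaves = Integer.parseInt(master.get("num-slaves"));
                // Other Sentinels known for this master (excludes the one being queried)
                int numSentinels = Integer.parseInt(master.get("num-other-sentinels"));
                // Report metrics; s_down or o_down in the flags means the master is considered down
                sendMetric("sentinel.master.status", status.contains("down") ? 0 : 1);
                sendMetric("sentinel.slaves.count", numSlaves);
                sendMetric("sentinel.sentinels.count", numSentinels);
            }
            // Fetch information about the replicas
            List<Map<String, String>> slaves = jedis.sentinelSlaves("mymaster");
            int healthySlaves = 0;
            for (Map<String, String> slave : slaves) {
                String flags = slave.get("flags");
                if (!flags.contains("down") && !flags.contains("disconnected")) {
                    healthySlaves++;
                }
            }
            sendMetric("sentinel.healthy.slaves", healthySlaves);
        } catch (Exception e) {
            log.error("Sentinel monitoring failed", e);
        }
    }

    // Hypothetical hook: forward the value to whatever metrics system is in use
    private void sendMetric(String name, long value) {
        log.debug("metric {}={}", name, value);
    }
}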
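Configuration Tuning
Performance Tuning
bash
# Tuned sentinel.conf settings
# Redis config files do not support trailing comments, so each comment sits on its own line
# Detection speed: a lower value detects failures faster but raises the risk of false positives
# Mark an instance down after 5 seconds without a valid reply
sentinel down-after-milliseconds mymaster 5000
# Failover timeout of 60 seconds
sentinel failover-timeout mymaster 60000
# Network settings
# TCP keepalive interval (seconds)
tcp-keepalive 60
# Idle client timeout (0 disables it)
timeout 0
# Concurrency control
# Number of replicas resyncing at the same time
sentinel parallel-syncs mymaster 1
# Logging
# Log level
loglevel notice
# Enable syslog
syslog-enabled yes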
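Notification Script
bash
#!/bin/bash
# notify.sh - failure notification script
# Sentinel invokes the script with two arguments: the event type and the event description
EVENT_TYPE=$1
EVENT_DESC=$2
# Placeholder recipient (the address in the source was obfuscated); replace it with a real one
ALERT_EMAIL="admin@example.com"
case "$EVENT_TYPE" in
  "+sdown")
    echo "Instance is subjectively down: $EVENT_DESC" | mail -s "Redis Alert" "$ALERT_EMAIL"
    ;;
  "+odown")
    echo "Master is objectively down: $EVENT_DESC" | mail -s "Redis Alert" "$ALERT_EMAIL"
    ;;
  "+switch-master")
    # Description format: <master-name> <old-ip> <old-port> <new-ip> <new-port>
    read -r NAME OLD_IP OLD_PORT NEW_IP NEW_PORT <<< "$EVENT_DESC"
    echo "Failover completed for $NAME, new master: $NEW_IP:$NEW_PORT" | mail -s "Redis Alert" "$ALERT_EMAIL"
    ;;
esac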
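Applications that consume Sentinel events directly, rather than through a shell script, need to parse the event payload themselves. Sentinel announces a completed master change with a `+switch-master` event whose payload has the form `<master-name> <old-ip> <old-port> <new-ip> <new-port>`. The class below is a minimal parsing sketch; `SwitchMasterEvent` is a hypothetical name, and how the payload is received (for example via Pub/Sub on a Sentinel) is left out.

```java
/** Parsed payload of Sentinel's +switch-master event (illustrative sketch). */
final class SwitchMasterEvent {
    final String masterName;
    final String oldHost;
    final int oldPort;
    final String newHost;
    final int newPort;

    private SwitchMasterEvent(String masterName, String oldHost, int oldPort,
                              String newHost, int newPort) {
        this.masterName = masterName;
        this.oldHost = oldHost;
        this.oldPort = oldPort;
        this.newHost = newHost;
        this.newPort = newPort;
    }

    /** Expects: <master-name> <old-ip> <old-port> <new-ip> <new-port> */
    static SwitchMasterEvent parse(String payload) {
        String[] parts = payload.trim().split("\\s+");
        if (parts.length != 5) {
            throw new IllegalArgumentException("unexpected +switch-master payload: " + payload);
        }
        return new SwitchMasterEvent(
            parts[0],
            parts[1], Integer.parseInt(parts[2]),
            parts[3], Integer.parseInt(parts[4]));
    }

    public static void main(String[] args) {
        SwitchMasterEvent event = parse("mymaster 127.0.0.1 6379 127.0.0.1 6380");
        System.out.println(event.masterName + " moved to " + event.newHost + ":" + event.newPort);
        // prints mymaster moved to 127.0.0.1:6380
    }
}
```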
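Best Practices
1. Deployment
- Odd number of Sentinels: deploy 3 or 5 Sentinel instances
- Distributed deployment: run the Sentinel instances on different physical machines
- Failure isolation: avoid placing all Sentinels in the same network partition as the Redis master, so that a single network failure cannot cut them all off together
2. Configuration
- Choose timeouts carefully: avoid false positives caused by network jitter
- Configure a notification script: learn about failures promptly
- Back up the configuration regularly: keep copies of the Sentinel configuration files
3. Monitoring
- Monitor Sentinel state: make sure the Sentinel group itself is working
- Monitor failovers: record how often they happen and why
- Monitor network latency: make sure the Sentinels can communicate with each other
4. Operations
- Regular drills: rehearse failovers on a schedule
- Version consistency: keep Redis and Sentinel on the same version
- Documentation: maintain detailed operations documentation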
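Summary
Redis Sentinel provides high availability for Redis: it detects failures and fails over automatically, which keeps the service running when a master goes down. Configuring and monitoring the Sentinel system correctly improves the availability and stability of a Redis deployment. In production, combine Sentinel with monitoring, alerting, and automated operations tooling to build a complete high-availability architecture.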