Wednesday 1 February 2017

Redis - How to take individual nodes in a Redis cluster down for maintenance

Redis is a very fast caching server that keeps all its data in RAM and can serve requests with sub-millisecond latency.
Because all the data lives in memory, when we need to keep more data than fits in a single machine's memory, we create a Redis cluster.
In a Redis cluster, the data is distributed among N masters, where N >= 3. It is also recommended to replicate the data, so that each master node has a backup (slave) node in case it goes down. The same backup node can be promoted to master when we have to bring the master node down for maintenance.

The main command used for this is CLUSTER FAILOVER.
It should always be executed on a slave node: it promotes that slave to master, and the old master becomes its slave.

Below, we will show how to bring the master nodes of a Redis cluster down one by one for maintenance (or for other activities).
Let's say you have the cluster running with 3 masters and 3 slaves.

for port in {7000..7005}; do ../src/redis-server "$port/redis.conf"; done

I have modified the conf files so that the command above starts the 6 Redis instances in cluster mode, as background daemons, with appropriate logging.
After that, the cluster is created using

../src/redis-trib.rb create --replicas 1 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005

The cluster can be verified using the cluster nodes or cluster info command.

../src/redis-cli -p 7001 cluster nodes

cluster-test $ ../src/redis-cli -p 7001 cluster nodes
daabc7bdc914f73682e07ed40769766ecee06c8f 127.0.0.1:7000 master - 0 1485859005956 1 connected 0-5460
276ad9dbd675f7e3183ab808549f1436d91e56b3 127.0.0.1:7001 myself,master - 0 0 2 connected 5461-10922
2a2e22e2e03bbc934d972a247b4a9c5c0b341747 127.0.0.1:7005 slave 5c0585a57362901380c392b4f3041777e169adec 0 1485859005653 6 connected
5c0585a57362901380c392b4f3041777e169adec 127.0.0.1:7002 master - 0 1485859006463 3 connected 10923-16383
ad7b5e40d14c4f63b2e1b7a901a3c258920d8898 127.0.0.1:7003 slave daabc7bdc914f73682e07ed40769766ecee06c8f 0 1485859005451 4 connected
ffc3155ca303da41825d42db3d109b67525e1f5a 127.0.0.1:7004 slave 276ad9dbd675f7e3183ab808549f1436d91e56b3 0 1485859005653 5 connected
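
To see at a glance which slave backs which master, the cluster nodes output can be post-processed. The following is only a minimal sketch: the function name replica_pairs is hypothetical, and it assumes the field layout shown above (node id, address, flags, master id, ...).

```shell
# Print one "master=ADDR slave=ADDR" line per slave found in
# `cluster nodes` output, which is read from stdin.
# Assumes the field order shown in the listing above.
replica_pairs() {
  awk '
    { addr[$1] = $2 }                     # node id -> ip:port
    $3 ~ /slave/ { master_of[$2] = $4 }   # slave address -> master node id
    END {
      for (s in master_of)
        print "master=" addr[master_of[s]] " slave=" s
    }
  ' | sort
}
```

It can be fed directly from redis-cli, for example: ../src/redis-cli -p 7001 cluster nodes | replica_pairs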

../src/redis-cli -p 7001 cluster info

cluster-test $ ../src/redis-cli -p 7001 cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:2
cluster_stats_messages_sent:9298
cluster_stats_messages_received:9081
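
Before and after each maintenance step it helps to confirm that the cluster is still healthy. A small sketch of such a check, assuming that "healthy" means cluster_state is ok and all 16384 slots are assigned; the function name cluster_healthy is hypothetical.

```shell
# Succeed (exit status 0) only if `cluster info` output, read from stdin,
# reports cluster_state:ok and all 16384 hash slots assigned.
# redis-cli prints these lines with CRLF endings, so strip the \r first.
cluster_healthy() {
  tr -d '\r' | awk -F: '
    $1 == "cluster_state"          { state = $2 }
    $1 == "cluster_slots_assigned" { slots = $2 }
    END { exit !(state == "ok" && slots == 16384) }
  '
}
```

For example: ../src/redis-cli -p 7001 cluster info | cluster_healthy && echo "cluster looks fine"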

The general rules are these.

  1. A slave can be brought down without a problem.
  2. A master should be first converted to a slave before bringing it down.
  3. After converting a master to a slave, always verify it.


To bring down a slave, like the instance running on 7005, we can simply stop Redis, either with shutdown or by killing the process (ideally, shutdown should be a renamed command).
../src/redis-cli -p 7005 shutdown

After the maintenance activity, we can start Redis again using
../src/redis-server 7005/redis.conf
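
Once the slave is back, it is worth checking that it has reconnected to its master before touching the next node. A minimal sketch, assuming the role and master_link_status fields that info replication reports on a slave; the function name slave_link_up is hypothetical.

```shell
# Succeed only if `info replication` output, read from stdin, comes from
# a slave whose replication link to its master is up.
slave_link_up() {
  tr -d '\r' | awk -F: '
    $1 == "role"               { role = $2 }
    $1 == "master_link_status" { link = $2 }
    END { exit !(role == "slave" && link == "up") }
  '
}
```

For example: ../src/redis-cli -p 7005 info replication | slave_link_up && echo "7005 is replicating again"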

To bring down a master (like the instance on 7002), we first need to convert it into a slave and then stop the Redis server.
This is done by running the cluster failover command on the slave of that master.
To find out which instance is the slave of the 7002 instance, we can execute the info replication command on the 7002 instance.

cluster-test $ redis-cli -p 7002 info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=7005,state=online,offset=5729,lag=0
master_repl_offset:5729
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:5716
repl_backlog_histlen:14
cluster-test $ 

From the info replication output above, we can see that the slave of the instance on port 7002 is the instance on 7005.
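
Reading the slave's port off the output by eye works, but it can also be extracted in a script. A minimal sketch, assuming the slaveN:ip=...,port=... line format shown above; the function name slave_ports is hypothetical.

```shell
# Print the port of every connected slave listed in `info replication`
# output (read from stdin), one port per line.
slave_ports() {
  tr -d '\r' | sed -n 's/^slave[0-9][0-9]*:.*port=\([0-9][0-9]*\).*/\1/p'
}
```

For example: redis-cli -p 7002 info replication | slave_ports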

So, we run the cluster failover command on the slave (7005), so that it becomes the master and 7002 becomes the slave.

cluster-test $ ../src/redis-cli -p 7005 cluster failover
OK
cluster-test $ ../src/redis-cli -p 7005 cluster failover
(error) ERR You should send CLUSTER FAILOVER to a slave

Note that an "OK" reply only means the failover request was accepted. The promotion itself happens asynchronously and can still fail to complete, for example when the slave cannot get in sync with the master in time. So it is strongly recommended to run the command again: once 7005 has become the master, the second run fails with the error shown above.

We can also check it by executing the info replication command.

cluster-test $ redis-cli -p 7005 info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=7002,state=online,offset=16762,lag=1
master_repl_offset:16762
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:16693
repl_backlog_histlen:70
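
Since the promotion is not instantaneous, a script should poll for role:master rather than check once. A minimal sketch: the function name wait_for_master, the REDIS_CLI variable, and the default of 10 attempts one second apart are all assumptions made for illustration.

```shell
# Path to redis-cli; defaults to the location used in this post.
REDIS_CLI=${REDIS_CLI:-../src/redis-cli}

# Poll `info replication` on the given port until it reports role:master.
# Usage: wait_for_master PORT [TRIES]
# Returns 0 once the node is a master, 1 if it still is not after TRIES
# attempts (one second apart).
wait_for_master() {
  port=$1
  tries=${2:-10}
  while [ "$tries" -gt 0 ]; do
    if "$REDIS_CLI" -p "$port" info replication | tr -d '\r' | grep -q '^role:master$'; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

A typical use would be to run cluster failover on 7005, then call wait_for_master 7005, and only shut down 7002 if it returns successfully.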

After that, since the 7002 instance is now a slave, it can be brought down with the usual shutdown command (or by killing the Redis process, or by stopping the service).

../src/redis-cli -p 7002 shutdown

:)

Normally, shutdown is one of the dangerous commands that should be disabled, or at least renamed, on Redis servers to avoid misuse.


