Thursday, 22 December 2016

Redis Cluster Monitoring - part 2, global cluster monitoring script.

In the first part of this series on how to effectively monitor Redis cluster stacks here, we saw the first of the four types of monitoring scripts that are useful for us: a shell script running on every system that monitors the health of Redis on that particular system.

So, let's say we have the Redis instance monitoring script (described in part 1) running on a particular Redis server, scheduled by crontab to check every 2 minutes that the Redis server process is up on that machine.
But what happens if the machine itself is down? In that case cron will not run, and we will not get any notification about the Redis nodes on that machine. :(

In this part, we will see how a single global script can fix the above problem.
The script uses the "cluster info" command to find out whether the cluster is in the OK state or not.

#!/bin/bash

erroringStacks=''

# Expects five arguments: 1st is the cluster identifier,
# 2nd and 3rd are the IP and port of a master in the cluster,
# 4th and 5th are the IP and port of a slave of that master.
checkCluster() {
    msg="Redis Cluster $1 $2 $3 $4 $5 "
    # Try to get the cluster state using the first IP/port combination.
    val=$(redis-cli -c -h "$2" -p "$3" cluster info 2> /dev/null | grep 'cluster_state')

    if [[ $val == *"cluster_state:ok"* ]]
    then
        msg="$msg OK"
    else
        # If the first IP/port did not give an ok response,
        # try to get the cluster state from the second IP/port combination.
        val=$(redis-cli -c -h "$4" -p "$5" cluster info 2> /dev/null | grep 'cluster_state')

        if [[ $val == *"cluster_state:ok"* ]]
        then
            msg="$msg OK"
        else
            msg="$msg down"
            erroringStacks="$erroringStacks $1"
        fi
    fi
    echo "$msg"
}

# Expects 3 arguments: 1st is the stack identifier, 2nd and 3rd are the IP and port.
checkStandalone() {
    msg="Redis Standalone $1 $2 $3 "
    # Try to set a simple value in the standalone instance, and check the reply.
    val=$(redis-cli -h "$2" -p "$3" set healthcheck:abc def 2> /dev/null)
    if [[ $val == *"OK"* ]]
    then
        msg="$msg OK"
    else
        msg="$msg down, not able to insert data"
        erroringStacks="$erroringStacks $1"
    fi
    echo "$msg"
}

checkCluster C1 127.0.0.1 7000 127.0.0.1 7001
checkStandalone S1 127.0.0.1 6380
checkCluster C2 127.0.0.1 7010 127.0.0.1 7011


if [[ "$erroringStacks" == "" ]]
then
    echo "all well" 
else
    echo "The following stacks are erroring out: $erroringStacks " 
    # send an email to the concerned team.
fi


The above script checks whether the various stacks are up or not. 

For clusters, it gets the state from the redis-cli "cluster info" command. If the specified master does not return an ok state (for example because it is down), the script asks the specified slave of that master instead. If neither the master nor the slave reports OK, the cluster is almost certainly down and the stack is recorded as erroring. Note that this method will not work for a cluster that has cluster-require-full-coverage set to 'no' in redis.conf, because such a cluster keeps reporting an ok state even when some of its slots are not being served.
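The cluster check above looks only at cluster_state. As a minimal sketch (not part of the original script), the function below reads "cluster info" output on stdin and also requires cluster_slots_ok to equal 16384, the total number of hash slots, which may help catch the partial-coverage case just described. The name clusterInfoHealthy is hypothetical.

```shell
#!/bin/bash
# Reads CLUSTER INFO output on stdin. Returns 0 only if cluster_state is ok
# AND all 16384 hash slots are reported as ok (cluster_slots_ok), so that a
# cluster with cluster-require-full-coverage no is still flagged when some
# slots are not served. Hypothetical helper, not from the original script.
clusterInfoHealthy() {
    # CLUSTER INFO lines end with \r\n, so strip the \r before splitting on ':'.
    tr -d '\r' | awk -F ':' '
        $1 == "cluster_state"    { state = $2 }
        $1 == "cluster_slots_ok" { slotsOk = $2 + 0 }
        END {
            # Exit status 0 means healthy, 1 means unhealthy.
            if (state == "ok" && slotsOk == 16384) exit 0
            exit 1
        }'
}
```

It could be used as `redis-cli -c -h 127.0.0.1 -p 7000 cluster info 2> /dev/null | clusterInfoHealthy && echo OK`. Slots served by a node that is only suspected to be failing (PFAIL) are not counted in cluster_slots_ok, so this stricter check may alert a little earlier than cluster_state alone.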

For a standalone instance, it checks the state by inserting a dummy key-value pair into Redis. If the key-value pair cannot be inserted, the stack is reported as erroring.
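One side effect of the standalone check is that the healthcheck:abc key stays in Redis forever. A minimal sketch, assuming Redis 2.6.12 or later (where SET gained the EX option), writes the key with a 60-second expiry instead. checkStandaloneWrite and REDIS_CLI are hypothetical names; REDIS_CLI exists only so the function can be exercised without a live server.

```shell
#!/bin/bash
# Hypothetical override hook: which redis-cli to call (a stub can be
# substituted for testing). Defaults to the real redis-cli binary.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# Returns 0 if a test key can be written to the standalone instance at $1:$2.
checkStandaloneWrite() {
    local host=$1 port=$2 reply
    # EX 60 makes the health-check key expire after 60 seconds, so repeated
    # runs do not leave a permanent key behind.
    reply=$("$REDIS_CLI" -h "$host" -p "$port" SET healthcheck:abc def EX 60 2> /dev/null)
    # A successful SET prints OK; strip a possible trailing \r before comparing.
    [[ ${reply%$'\r'} == "OK" ]]
}
```

In checkStandalone, `if checkStandaloneWrite "$2" "$3"` could take the place of the `[[ $val == *"OK"* ]]` test.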

The same code can check multiple stacks, each identified by a stack name. Here the cluster stacks are named C1 and C2, and the standalone stack is S1.

If an error is found, an appropriate alert can be generated.

This script can be scheduled in crontab to run at a fixed interval, like every 5 minutes.

*/5 * * * * bash /redis/monitorGlobalRedisStacks.sh

Also, ideally this script should run on 2-3 different machines, at least one of which is guaranteed to be up.

This concludes the second part of this series. In the third part, we will see how to have an hourly script that checks important parameters of the Redis servers against thresholds, and sends a mail when a memory, connection or replication threshold is reached.

:)

Wednesday, 21 December 2016

Redis Cluster monitoring - part 1 - node monitoring script

Redis is an in-memory database used for caching, which provides very high performance and can run uninterrupted for months. Since Redis stores all the data in memory, if our data size is more than the memory of a single machine, we have to distribute the data across several machines. This is where Redis Cluster comes in: it provides a way to distribute the data across different machines, add or remove machines, etc.

However, once we have a large number of nodes in a Redis cluster, the chances of failure of one or more nodes increase, so it becomes imperative to continuously monitor the state and health of each Redis cluster node/system.

Also, it helps to have automatic scripts which monitor the Redis nodes and raise an alert when they sense an error or when some memory/connection threshold has been reached.

We should have the following types of monitoring for a cluster:
  1. monitoring individual nodes.
  2. monitoring overall cluster health.
  3. monitoring stats on redis nodes.
  4. viewing the redis stats over time on some graph

Monitoring individual nodes basically involves monitoring the Redis process running on each node and restarting it if it stops.

Monitoring overall cluster health is required because one of the machines may be down, in which case the monitoring script running on it cannot send an alert. In this case, the global Redis monitoring script should try to do a basic insert in each cluster stack, and if that fails, it should trigger an email alert.

Monitoring stats on Redis nodes is required so that we don't have to wait for things to go bad in Redis, and can identify whenever a threshold is reached for various metrics in Redis. This involves automatic monitoring of individual Redis nodes for connections, memory, replication lag etc.

Finally, we need to have the stats of various redis nodes represented in terms of graphs. This is required to identify uneven patterns in the data usage/access and to have a global view of how the redis stack is used.

In this part, we will go through how we can monitor individual nodes.

Individual nodes can be monitored by a shell script that runs every 30 seconds or so and checks whether a Redis server is running on each of the defined ports. If the Redis server is not running on one of those ports, the script restarts it.

This can be achieved using a simple shell script as below.

#!/bin/bash

START_PORT=7000
END_PORT=7003

error=0
ports=''
checkRedis(){
        # Count redis-server processes whose command line contains ":<port>".
        count=$(ps -ef | grep "redis-server" | grep ":$1" | wc -l)
        if [[ $count -ne 1 ]]
        then
                error=1
                ports="$ports $1"
                echo "starting redis on port $1"
                # Start redis either by redis-server or by service if redis is installed as a service:
                # service redis-$1 start
                # The paths below are relative to the directory the script runs from, and
                # redis.conf should set "daemonize yes" so that this call returns.
                src/redis-server "cluster-test/$1/redis.conf"
        fi
}

for ((i=START_PORT;i<=END_PORT;i++)); do
    checkRedis "$i"
done

if [[ $error -eq 1 ]]
then
        echo "need to send mail that redis was started on ports $ports"
fi


The above script should run on each machine having one or more Redis nodes. If a machine runs Redis on different ports, they can be specified through START_PORT and END_PORT. In the above script, Redis is expected to run on ports 7000, 7001, 7002 and 7003.
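The `grep ":$1"` in checkRedis is a substring match, so port 700, for example, would also match a process listening on 7000. A minimal sketch of a stricter count, assuming the redis-server process title contains "host:port" (as in "redis-server *:7000 [cluster]"), reads ps output on stdin and accepts the port only when it is followed by a space or the end of the line. countRedisOnPort is a hypothetical name.

```shell
#!/bin/bash
# Counts redis-server processes listening on exactly port $1, reading
# ps output (one command line per line) on stdin. ":PORT" must be followed by
# a space or the end of the line, so port 700 does not match ":7000".
countRedisOnPort() {
    awk -v port="$1" '
        # Only consider redis-server processes; the title normally looks
        # like "redis-server *:7000 [cluster]".
        /redis-server/ && ($0 ~ (":" port "( |$)")) { n++ }
        END { print n + 0 }'
}
```

Inside checkRedis it could replace the grep pipeline as `count=$(ps -eo args | countRedisOnPort "$1")`.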

The above script can be saved in a file like 'monitorIndividualNodes.sh' and can be run every 2 minutes in crontab using

*/2 * * * * bash /redis/monitorIndividualNodes.sh

The script can be configured to run at any interval, like every minute, through crontab or any other trustworthy scheduling service, and it will check whether Redis is running on the predefined ports of that machine. If it is not, the script starts it. Optionally, it should also send an email to alert the concerned team.

Also, even after a system restart, cron will run the script and all the Redis instances will be started again.

Since Redis is very stable and rarely stops unless the machine restarts, we don't have to worry about receiving too many emails. :)

In the next part, we will see the script to monitor the overall health of the cluster. This is useful when one or more machines are down, as a result of which the individual node monitoring script cannot run on them and no alert is generated for them.

Happy redising. :)

Tuesday, 6 December 2016

handling and killing idle clients in redis

Since Redis is single-threaded, it is best for an application to maintain a single connection and use it to query Redis, because Redis processes all requests one by one anyway.

Many times we have to kill idle connections in Redis. This is useful when an application creates a number of connections but does not close them.

There are some ways to avoid the idle connections in redis.

One way is to set an idle timeout in Redis.
This is done by setting the "timeout" value (in seconds) in redis.conf, which requires a Redis restart to take effect. Below is the entry in redis.conf:

#Close the connection after a client is idle for N seconds (0 to disable)
timeout 3600

The timeout can also be set without a Redis restart, through CONFIG SET, by executing the command below.

src/redis-cli CONFIG SET timeout 3600

Since a value set using method 2 above (CONFIG SET) is lost when Redis restarts, it is important to set it both in redis.conf (method 1) and through CONFIG SET (method 2): Redis then does not require a restart, and the value persists after a restart.
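A minimal sketch of the redis.conf half (method 1) is below: it replaces an existing timeout line or appends one if none exists. setConfTimeout is a hypothetical helper. Redis 2.8 and later also provide CONFIG REWRITE, which writes the running configuration back to the config file the server was started with, and may be preferable where available.

```shell
#!/bin/bash
# Sets the "timeout" directive in the redis.conf file $1 to $2 seconds:
# an existing "timeout" line is replaced, otherwise one is appended.
# This only edits the file; the running server still needs CONFIG SET.
setConfTimeout() {
    local conf=$1 seconds=$2
    if grep -q '^timeout ' "$conf"; then
        # Rewrite the existing directive in place, keeping a .bak copy.
        sed -i.bak "s/^timeout .*/timeout $seconds/" "$conf"
    else
        echo "timeout $seconds" >> "$conf"
    fi
}
```

Used together with method 2, for example `setConfTimeout /etc/redis/redis.conf 3600` followed by `src/redis-cli CONFIG SET timeout 3600` (the path is only an example).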

Note that by default the value of timeout is 0, i.e. idle connections are never closed by the Redis server.

The Catch
The catch here is that the timeout applies only to normal clients, not to pub-sub ones, so a pub-sub client will not time out even if its idle time exceeds the configured timeout. This is because pub-sub clients are expected to sit idle while waiting for events. This is mentioned here.


Killing clients of a particular type in redis

If you want to kill all the clients of a particular type, the commands below may come in handy.

# kills all normal clients
src/redis-cli CLIENT KILL TYPE normal

# kills all pub-sub clients
src/redis-cli CLIENT KILL TYPE pubsub

# kills all slave clients
src/redis-cli CLIENT KILL TYPE slave

It is important to note that the above commands should be executed only after due consideration. If our Redis clients (like the Jedis/Lettuce clients in Java) reconnect automatically, the command will kill all matching connections, the valid ones will reconnect, and the application will keep working.

However, if our clients don't reconnect, we need to manually identify the clients/IP addresses which are stale or need to be terminated, and kill only those clients.
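To identify stale clients manually, the idle field of CLIENT LIST (seconds since the client last sent a command) can be used. A minimal sketch, with the hypothetical name idleClientAddrs, reads CLIENT LIST output on stdin and prints the addresses of clients idle for longer than a given number of seconds. It skips pub-sub subscribers (non-zero sub/psub or flag P) and master/slave links (flags M/S), which should not be killed this way.

```shell
#!/bin/bash
# Reads CLIENT LIST output on stdin and prints the addr (ip:port) of every
# non-pub-sub, non-replication client idle for more than $1 seconds.
idleClientAddrs() {
    tr -d '\r' | awk -v maxIdle="$1" '{
        # Split each "key=value" field into the array f (cleared per line).
        split("", f)
        for (i = 1; i <= NF; i++) {
            eq = index($i, "=")
            if (eq > 0) f[substr($i, 1, eq - 1)] = substr($i, eq + 1)
        }
        # Skip pub-sub subscribers and master/slave connections.
        if (f["sub"] + 0 > 0 || f["psub"] + 0 > 0) next
        if (f["flags"] ~ /[SMP]/) next
        if (f["idle"] + 0 > maxIdle + 0) print f["addr"]
    }'
}
```

After reviewing the list, the clients could be killed with `redis-cli -p 6379 CLIENT LIST | idleClientAddrs 3600 | while read -r addr; do redis-cli -p 6379 CLIENT KILL ADDR "$addr"; done`.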


Killing redis connections from an ip address

If we want to kill connections of a particular type from a particular ip address, then we can identify those connections using the CLIENT LIST command in redis, and kill them using the CLIENT KILL command in redis.

The below command will kill all pub-sub connections of a given ip address '10.150.20.30'

redis-cli -h 127.0.0.1 -p 6379 CLIENT LIST | grep -E ' p?sub=[1-9]' | grep ' addr=10.150.20.30:' | awk '{print $2}' | awk -F "=" '{print "CLIENT KILL ADDR " $2}' | redis-cli -h 127.0.0.1 -p 6379

In the above, "redis-cli -h 127.0.0.1 -p 6379 CLIENT LIST | grep -E ' p?sub=[1-9]' | grep ' addr=10.150.20.30:'" gets the pub-sub connections (clients subscribed to at least one channel or pattern) from that IP address out of the CLIENT LIST output. Then awk '{print $2}' extracts the "addr=10.150.20.30:port" field, the second awk turns it into "CLIENT KILL ADDR 10.150.20.30:port", and these commands are piped to redis-cli, which executes them.

This ensures that all the connections satisfying the criteria are killed.
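The pipeline above relies on addr being the second field of each CLIENT LIST line. A minimal sketch that looks fields up by name instead (pubsubKillCommands is a hypothetical name) prints the same CLIENT KILL ADDR commands, and compares the IP exactly so that 10.150.20.3 does not also match 10.150.20.30.

```shell
#!/bin/bash
# Reads CLIENT LIST output on stdin and prints "CLIENT KILL ADDR ip:port" for
# every pub-sub client (sub or psub greater than 0) connected from IP $1.
# Fields are looked up by name, so the column order does not matter.
pubsubKillCommands() {
    tr -d '\r' | awk -v ip="$1" '{
        split("", f)
        for (i = 1; i <= NF; i++) {
            eq = index($i, "=")
            if (eq > 0) f[substr($i, 1, eq - 1)] = substr($i, eq + 1)
        }
        # addr is "ip:port"; drop the trailing ":port" before comparing.
        host = f["addr"]
        sub(/:[0-9]+$/, "", host)
        if (host == ip && (f["sub"] + 0 > 0 || f["psub"] + 0 > 0))
            print "CLIENT KILL ADDR " f["addr"]
    }'
}
```

It plugs into the same pipeline: `redis-cli -h 127.0.0.1 -p 6379 CLIENT LIST | pubsubKillCommands 10.150.20.30 | redis-cli -h 127.0.0.1 -p 6379`.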

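Similar to the above, the command below can also be used. It is a bit slower, since it sends a CLIENT KILL for every connection from that IP address and lets the TYPE pubsub filter decide which ones to kill.

redis-cli -h 127.0.0.1 -p 6379 CLIENT LIST | grep ' addr=10.150.20.30:' | awk '{print $2}' | awk -F "=" '{print "CLIENT KILL TYPE pubsub ADDR " $2}' | redis-cli -h 127.0.0.1 -p 6379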

The complete details of the redis client kill command can be found here.

:)



Friday, 2 December 2016

How to get the rdb location, and config file location in redis.

Sometimes when we log in to a Redis server, we can see Redis running on it, but we don't know where its configuration files and RDB backup files are. In this post, we will see how to find out the location of the various configuration files in Redis.

The important files in Redis are:


  1. redis.conf       => main configuration file in redis
  2. dump.rdb         => backup file generated by background save in redis
  3. sentinel.conf    => sentinel configuration file, required only while running sentinel
  4. nodes.conf       => cluster configuration file, auto-generated by running redis in cluster mode.

Location of redis.conf
redis.conf is the main Redis configuration file, with lots of very useful info and lots of highly configurable parameters.
Whether you are a developer who wants to dive deeper into Redis or a Redis admin, it is very important to read this file and try to understand its configuration parameters. See the sample here.

To find out the location of redis.conf, we will use the INFO command.
The INFO command gives a lot of very useful information about the Redis server, one piece of which is the location of the config file.

So, the command below gives us the location of the redis.conf file:

src/redis-cli -p 6379 info | grep 'config_file'

Location of sentinel.conf file
The sentinel.conf file is only required if you are using Sentinel with Redis. See the sample sentinel.conf file here. It is normally present in the same directory as redis.conf.
To find the location of sentinel.conf, we can execute the same command on the Sentinel (note that Sentinel runs on port 26379 by default, while Redis runs on port 6379):

src/redis-cli -p 26379 info | grep 'config_file'
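Since INFO lines have the form name:value and end with \r\n, a small helper (the hypothetical name infoField) can extract a single field such as config_file from either the Redis or the Sentinel INFO output. config_file is typically empty when the server was started without a configuration file.

```shell
#!/bin/bash
# Prints the value of field $1 (for example config_file) from INFO output
# read on stdin. INFO lines look like "config_file:/etc/redis/redis.conf\r".
infoField() {
    tr -d '\r' | awk -v key="$1" '
        # Match only lines that start with "key:" and print what follows.
        index($0, key ":") == 1 { print substr($0, length(key) + 2) }'
}
```

For example, `src/redis-cli -p 6379 info server | infoField config_file` for Redis, or `src/redis-cli -p 26379 info server | infoField config_file` for Sentinel.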

Location of dump.rdb file
dump.rdb file is the default file in which redis will save the data to disk if you enable rdb based persistence in the redis.conf file.
The location of dump.rdb can be obtained in either of two ways:

  • reading the "dir" value in the redis.conf file (found as shown above)
           grep "^dir " redis.conf
  • getting it from the Redis config
           Sometimes the "dir" value in redis.conf is just the current directory ("./"), which is relative to the directory Redis was started from.
           In this case, it is better to get the actual directory from the running Redis by executing the command below.

           src/redis-cli -p 6379 CONFIG GET dir

Note that the second method only works if the CONFIG command has not been renamed or disabled (via rename-command in redis.conf).
If it has been renamed, you will have to find out the new name and use it instead of CONFIG.
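The RDB file is not always called dump.rdb: its name comes from the dbfilename directive. A minimal sketch (rdbPathFromConfig is a hypothetical name) combines the outputs of CONFIG GET dir and CONFIG GET dbfilename, which redis-cli prints as a name line followed by a value line, into the full path. It has the same limitation regarding a renamed CONFIG command.

```shell
#!/bin/bash
# Joins CONFIG GET output for "dir" ($1) and "dbfilename" ($2) into the full
# RDB path. redis-cli prints each CONFIG GET reply as two lines, e.g.
#   dir
#   /var/lib/redis
rdbPathFromConfig() {
    local dir file
    # The value is on the second line of each reply.
    dir=$(printf '%s\n' "$1" | tr -d '\r' | sed -n '2p')
    file=$(printf '%s\n' "$2" | tr -d '\r' | sed -n '2p')
    # Drop a trailing slash from dir so the result has a single separator.
    printf '%s/%s\n' "${dir%/}" "$file"
}
```

For example: `rdbPathFromConfig "$(src/redis-cli -p 6379 CONFIG GET dir)" "$(src/redis-cli -p 6379 CONFIG GET dbfilename)"`.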

Location of nodes.conf file
In a Redis cluster, each cluster node/instance automatically generates a nodes.conf file, in which the view of the cluster with respect to that node is stored.
For every node in the cluster, it stores:
  1. the node ID, its IP:port, and whether it is the current node and a master or a slave.
  2. the master it replicates (if it is a slave).
  3. the slot ranges for which it holds data (if it is a master).
  4. its configuration epoch.
The nodes.conf file is stored in the working directory (the "dir" setting), i.e. in the same location as dump.rdb. Its file name can be changed with the cluster-config-file directive.
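Each line of nodes.conf uses the same format as the CLUSTER NODES output: node ID, ip:port (with @cluster-bus-port since Redis 4.0), flags, master ID (or - for a master), ping sent, pong received, config epoch, link state and slots, followed by a final vars line. A minimal sketch (summarizeNodesConf is a hypothetical name) that prints a short summary of such a file:

```shell
#!/bin/bash
# Summarizes a cluster nodes.conf read on stdin: short node ID, address,
# role, config epoch, and either the slot ranges or the master it replicates.
summarizeNodesConf() {
    awk '
        # Skip the trailing "vars currentEpoch ... lastVoteEpoch ..." line.
        $1 == "vars" { next }
        NF >= 8 {
            if ($3 ~ /master/) role = "master"
            else if ($3 ~ /slave|replica/) role = "slave"
            else role = $3
            if ($3 ~ /myself/) role = role "(myself)"
            detail = ""
            # For a slave, $4 is the ID of the master it replicates.
            if (role ~ /^slave/) detail = " replicates " substr($4, 1, 8)
            # Fields 9 onwards are slot ranges (masters only).
            for (i = 9; i <= NF; i++) detail = detail " " $i
            printf "%s %s %s epoch=%s%s\n", substr($1, 1, 8), $2, role, $7, detail
        }'
}
```

For example, `summarizeNodesConf < /var/lib/redis/nodes.conf`, where the path is only an example and should be the "dir" found above.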