Friday, 8 September 2017

How does HyperLogLog work in redis

               
Redis provides various functionality on keys, implemented using the data structures it supports (like string, hash, set, sorted set).

One of the most amazing features which redis provides is estimating unique counts.

Imagine you were asked to keep track of unique visitors on a particular page.
Ideally, to keep a count of unique visitors on the page, whenever a user lands on the site with an id, we need to check whether that id exists in the set of users seen so far. Doing so requires space to keep all the individual user ids.
Almost magically, redis does it within a fixed space of 12 KB, with an average accuracy of over 99%. It does it with the commands PFADD and PFCOUNT.

It is not something unique to redis; the concept of HyperLogLog(HLL) originated from this paper by Flajolet et al.

In the sections below, we will learn the basic concept of HyperLogLog(HLL) and arrive at the number 12 KB, which is the maximum amount of memory redis will use for a particular key to count the unique values in it, regardless of the number of values added.

Let's say you have some free time on your hands, and you flip a coin N1 times and keep track of the heads obtained. A flip can result in heads or tails with equal probability.
Then you mention to me that you got 3 consecutive heads.

Then, you start flipping the coin again, ignoring the earlier results. This time you mention to me that you flipped the coin N2 times and got 5 consecutive heads.

If I had to make a half-decent guess about the relationship between N1 and N2, I would guess that N1 < N2.

Now, coming to probability, the probability of getting 3 consecutive heads is (1/2)^3 = 1/8, and of getting 5 consecutive heads is (1/2)^5 = 1/32.
It more or less means that you could expect to get 3 consecutive heads if you flip the coin somewhere around 8-16 times (the exact expected number of flips works out to 14, a little more than 8, because a run can keep getting broken just before the 3rd head).

But you may get the 3 heads in just 3 flips, and it is also possible, if you are quite unlucky, to spend 50-100 flips to get to that elusive 3rd consecutive head.
So, maybe we should take an average, or mean, of the number of flips over multiple iterations of the experiment (getting 3 consecutive heads and counting the number of flips).

Viewing it in another way, if you tell me the number of consecutive heads you got, I can roughly guess the number of times you had to flip the coin, especially if I ask you to do the iteration multiple times.

The above is the core concept of the HyperLogLog.
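This intuition can be checked with a quick coin-flip simulation (a toy sketch in plain Java; the class and method names are mine, and the numbers in the comments are averages, not exact outcomes):

```java
import java.util.Random;

// Simulate flipping a fair coin until a run of consecutive heads appears,
// to show that longer runs tend to require more flips.
public class CoinRuns {
    // flip a fair coin until we see 'targetRun' consecutive heads,
    // and return the number of flips it took
    static int flipsUntilRun(int targetRun, Random rnd) {
        int flips = 0, run = 0;
        while (run < targetRun) {
            flips++;
            if (rnd.nextBoolean()) run++; else run = 0;  // heads extends the run
        }
        return flips;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int trials = 10000;
        long sum3 = 0, sum5 = 0;
        for (int i = 0; i < trials; i++) {
            sum3 += flipsUntilRun(3, rnd);
            sum5 += flipsUntilRun(5, rnd);
        }
        // mathematically, 3 consecutive heads needs 14 flips on average,
        // while 5 consecutive heads needs 62, so N1 < N2
        System.out.println("avg flips for 3 heads: " + (sum3 / (double) trials));
        System.out.println("avg flips for 5 heads: " + (sum5 / (double) trials));
    }
}
```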

What redis does.

When we do "PFADD KEY VAL" in redis, redis will find out the hashcode of the value being added. The hashcode will be a 64 bit integer.

Now, know that redis uses 16384 slots or buckets for each HyperLogLog. Somehow redis needs to map each added value to a bucket.

So what it will do is take the first 14 bits of the hashcode and use them to select a bucket.
Note that 16384 = 2^14, and each bit can have 2 values, ie 0 and 1, so the first 14 bits of the hashcode can take 2^14 = 16384 different values, one for each of the 16384 buckets.

Now, remember that the hashcode has 64 bits, out of which 14 bits were used for selecting the bucket.
In the remaining 50 bits, redis will find the number of leading consecutive 0s. It then tries to store this count in the bucket that the hashcode maps to: if the count is greater than the number already stored in that bucket, it replaces the existing number, so the bucket always holds the maximum run of consecutive 0s it has seen.

At the end, each of the 16384 buckets keeps a number denoting the longest run of consecutive 0s it has seen.
Note that we count consecutive 0s within 50 bits, and the maximum number of consecutive 0s in 50 bits is 50 (if all bits are 0).
So, each bucket has to keep a number between 0 and 50.
How many bits does each bucket need? To store a number between 0 and 50, you need 6 bits(because 2^6 = 64 > 50  > 32 = 2^5).
Now, there are 16384 buckets, each needs 6 bits.
So that 16384 * 6 bits.
Recall that 8 bits = 1 byte.
So 16384 buckets will need 16384 * 6 bits = 16384 * 6/8 bytes = 2048 * 6 bytes = 12288 bytes = 12 KB.
So, each value will need a max of 12 KB of memory. :)
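Putting the steps above together, the PFADD update can be sketched roughly as below. This is only an illustrative toy, not redis's actual implementation: the hash function here is a simple placeholder (redis uses a MurmurHash variant), and redis packs the registers into 6 bits each instead of whole bytes.

```java
// Simplified sketch of the bucket update done on PFADD (illustrative only;
// real redis uses a different 64-bit hash, counts bits slightly differently,
// and packs the 16384 registers into 6 bits each: 16384 * 6 / 8 = 12288
// bytes = 12 KB).
public class HllSketch {
    static final int BUCKETS = 16384;           // 2^14 buckets
    final byte[] registers = new byte[BUCKETS]; // each conceptually needs only 6 bits

    // first 14 bits of the hash select the bucket (0..16383)
    static int bucketOf(long hash) {
        return (int) (hash >>> 50);
    }

    // in the remaining 50 bits, count the leading consecutive 0s (max 50)
    static int zerosOf(long hash) {
        return Math.min(50, Long.numberOfLeadingZeros(hash << 14));
    }

    void add(String value) {
        long hash = hash64(value);
        int bucket = bucketOf(hash);
        int zeros = zerosOf(hash);
        // each bucket keeps the maximum run of consecutive 0s it has seen
        if (zeros > registers[bucket]) {
            registers[bucket] = (byte) zeros;
        }
    }

    // placeholder hash for this sketch; redis actually uses a MurmurHash variant
    static long hash64(String s) {
        long h = 1125899906842597L;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }
}
```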

Also, the standard error of the estimate has been shown to be 1.04/sqrt(m), where m is the number of buckets. Considering redis uses 16384 buckets, the error is ~0.81%, or the accuracy is > 99%.
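Checking that arithmetic directly (a trivial helper; the class and method names are mine):

```java
// standard error of the HLL estimate: 1.04 / sqrt(m), m = number of buckets
public class HllError {
    static double stdError(int buckets) {
        return 1.04 / Math.sqrt(buckets);
    }

    public static void main(String[] args) {
        // 1.04 / sqrt(16384) = 1.04 / 128 = 0.008125, i.e. ~0.81% error
        System.out.println(stdError(16384));
    }
}
```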

Because the harmonic mean of the values in the buckets is taken, and not the simple mean, the algorithm is called HyperLogLog.

The harmonic mean is taken because it minimises the impact of outliers among the buckets (if one bucket has seen 10 consecutive 0s while all the others have seen 3-4, 10 is an outlier because it is very far from the typical value).
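A small illustration of this dampening effect (plain Java, nothing redis-specific; the bucket values are made up):

```java
// Compare the arithmetic and harmonic means of bucket values with one outlier.
// (Assumes all values are positive; a zero would break the harmonic mean.)
public class MeanDemo {
    static double arithmeticMean(int[] xs) {
        double sum = 0;
        for (int x : xs) sum += x;
        return sum / xs.length;
    }

    // harmonic mean: n divided by the sum of reciprocals
    static double harmonicMean(int[] xs) {
        double sum = 0;
        for (int x : xs) sum += 1.0 / x;
        return xs.length / sum;
    }

    public static void main(String[] args) {
        int[] buckets = {3, 4, 3, 4, 10};  // 10 is the outlier
        System.out.println(arithmeticMean(buckets)); // 4.8, pulled up by the outlier
        System.out.println(harmonicMean(buckets));   // ~3.95, closer to the typical 3-4
    }
}
```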


Below are some very useful links from which I could grasp this concept.

http://antirez.com/news/75
http://blog.kiip.me/engineering/sketching-scaling-everyday-hyperloglog/
http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/40671.pdf

:)

Sunday, 7 May 2017

Securing your redis instance.

Redis is a very fast caching engine which has support for a number of data structures like list, sets, sorted sets etc.

Many times, we are faced with the problem of how to secure our redis instance. The problem comes up because anyone who can connect to redis can manipulate the data.
Redis is designed for very fast performance and does not have very strong security features built in by default.

There is one very important rule while using redis. Ignoring this can lead to catastrophe in your redis world, if not anywhere else.

Never expose a redis instance directly to the internet

It is almost impossible to think of a valid use case where a redis instance would be directly open to the internet; instead, only trusted applications/ip addresses should interact with it. Otherwise, anybody on the internet would be able to connect to it and manipulate the data. So, it is essential that the machines running redis instances have private ip addresses and are strictly behind a firewall.


Next, we will discuss the security features that can be setup.

Setting an authentication password

         This is the easiest way to secure your redis instance. It involves setting the password using the "requirepass" parameter in the config file. The same password can be used by the clients to authenticate themselves first before making calls to redis.

The following is set in the redis.conf file

#requirepass foobared
requirepass c0nf!d3nt!@|

So, every client has to connect to it by giving the password while creating the connection

redis-3.0.6 $ redis-cli
127.0.0.1:6379> set a b
(error) NOAUTH Authentication required.
127.0.0.1:6379> auth c0nf!d3nt!@|
OK
127.0.0.1:6379> set a b
OK
127.0.0.1:6379> set b c
OK

However, considering the password is in plain text in the config file, it is not a foolproof solution: every application keeps a copy of the password with it, and any of these copies can be compromised.

Use IP tables

         This involves restricting connections to a particular machine and port only from predefined systems. IP tables come pre-installed on almost all linux distributions. In my opinion, this is one of the best ways to secure your redis instance.

iptables -F
iptables -A INPUT -p tcp -s 192.168.10.40,192.168.10.41 --dport 6379 -j ACCEPT
iptables -A INPUT -p tcp --dport 6379 -j DROP
iptables-save

The above needs to be run on the redis instance. With this the machine will only accept connections from 192.168.10.40 and 192.168.10.41 on its 6379 port(on which redis is running).

Rename dangerous commands

 Also it is a good idea to rename the dangerous commands. It is done using the "rename-command" parameter. Some of the commands which should be renamed are "FLUSHALL", "FLUSHDB", "CONFIG", "DEBUG", "RENAME", "BGSAVE", "SAVE", "MONITOR", "KEYS". 
The commands can be renamed by giving an alternate name or disabled by specifying "" as the name. 

rename-command FLUSHALL ""
rename-command FLUSHDB ""
rename-command SAVE ""
rename-command RENAME ""
rename-command SHUTDOWN ""
rename-command MONITOR 3@v3sdr0p
rename-command CONFIG custom-config
rename-command KEYS get-all-keys
rename-command DEBUG k!||-@||-bugs
rename-command BGSAVE backgroundsave

The above will disable the commands "FLUSHALL", "FLUSHDB", "SAVE", "RENAME", "SHUTDOWN" and rename the "MONITOR", "CONFIG", "KEYS", "DEBUG" and "BGSAVE" command.

Note that once you rename the MONITOR command, executing the renamed command becomes slightly trickier, because the behaviour of the monitor command depends on both the server and the client (the server sends all commands to the particular client running monitor, and the client outputs them). More details here.

:)

Thursday, 27 April 2017

redis info command - key metrics to monitor

While running redis, it is very essential to continuously monitor the health of the redis instance. This helps in identifying any problems that may come up in the coming days, so that appropriate action can be taken to avoid them.

Redis INFO command is one of the most important commands to monitor the health of redis.

The INFO command also takes an optional "section" parameter so that it will display redis stats about that section only.
Sections can be one of the following.

  • server
  • clients
  • memory
  • persistence 
  • stats
  • replication
  • cpu
  • commandstats
  • cluster
  • keyspace
For example, redis-cli -p 6379 info memory will give the information about the memory consumed by redis, while redis-cli -p 6379 info keyspace will give the information about the keys in redis, ie their number, expiring keys, average TTL etc.

Note that there are two other sections, default and all.
redis-cli -p 6379 info will give the default fields.
redis-cli -p 6379 info all will give all the fields of all the sections.

Below is the output of info all command on one of the servers.

redis-cli info all
# Server
redis_version:3.2.8
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:f45ba3990f1644ad
redis_mode:standalone
os:Linux 3.10.0-229.el7.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.8.3
process_id:3379
run_id:70c7bdd361b8a65c68612092dc559cf5412e76dd
tcp_port:6379
uptime_in_seconds:3561216
uptime_in_days:41
hz:10
lru_clock:141552
executable:/opt/redis-3.2.8/src/redis-server
config_file:/opt/redis-3.2.8/redis.conf

# Clients
connected_clients:78
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:51602584992
used_memory_human:48.06G
used_memory_rss:70614675456
used_memory_rss_human:65.77G
used_memory_peak:68906025968
used_memory_peak_human:64.17G
total_system_memory:135059263488
total_system_memory_human:125.78G
used_memory_lua:99328
used_memory_lua_human:97.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:1.37
mem_allocator:jemalloc-4.0.3

# Persistence
loading:0
rdb_changes_since_last_save:53293
rdb_bgsave_in_progress:0
rdb_last_save_time:1493312911
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:200
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok

# Stats
total_connections_received:216566
total_commands_processed:6332830614
instantaneous_ops_per_sec:419
total_net_input_bytes:968574281425
total_net_output_bytes:12348467048288
instantaneous_input_kbps:26.09
instantaneous_output_kbps:97.82
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:122577760
evicted_keys:0
keyspace_hits:870705988
keyspace_misses:2491310815
pubsub_channels:25
pubsub_patterns:2
latest_fork_usec:1093910
migrate_cached_sockets:0

# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:60615.77
used_cpu_user:69274.88
used_cpu_sys_children:42067.91
used_cpu_user_children:642783.81

# Commandstats
cmdstat_get:calls=5225350,usec=30469704,usec_per_call=5.83
cmdstat_set:calls=20476309,usec=178583650,usec_per_call=8.72
cmdstat_setex:calls=61479312,usec=434391737,usec_per_call=7.07
cmdstat_del:calls=185544,usec=52523131,usec_per_call=283.08
cmdstat_exists:calls=7918,usec=50075,usec_per_call=6.32
cmdstat_incr:calls=61425211,usec=174208343,usec_per_call=2.84
cmdstat_mget:calls=749917,usec=154888129,usec_per_call=206.54
cmdstat_rpush:calls=5,usec=127,usec_per_call=25.40
cmdstat_lpush:calls=6,usec=159,usec_per_call=26.50
cmdstat_lrange:calls=12,usec=180,usec_per_call=15.00
cmdstat_ltrim:calls=2,usec=45,usec_per_call=22.50
cmdstat_sadd:calls=35711154,usec=482334843,usec_per_call=13.51
cmdstat_srem:calls=321972,usec=4654957,usec_per_call=14.46
cmdstat_sismember:calls=640,usec=2558,usec_per_call=4.00
cmdstat_scard:calls=29948693,usec=46578706,usec_per_call=1.56
cmdstat_spop:calls=29947974,usec=418436228,usec_per_call=13.97
cmdstat_sinter:calls=1,usec=67,usec_per_call=67.00
cmdstat_smembers:calls=197297,usec=1346705,usec_per_call=6.83
cmdstat_sscan:calls=3,usec=9,usec_per_call=3.00
cmdstat_zadd:calls=816683,usec=2615273,usec_per_call=3.20
cmdstat_hset:calls=9007518,usec=118520557,usec_per_call=13.16
cmdstat_hsetnx:calls=3,usec=53,usec_per_call=17.67
cmdstat_hget:calls=416016545,usec=681666715,usec_per_call=1.64
cmdstat_hmset:calls=111728508,usec=633862773,usec_per_call=5.67
cmdstat_hmget:calls=114510880,usec=1689683861,usec_per_call=14.76
cmdstat_hincrby:calls=1,usec=18,usec_per_call=18.00
cmdstat_hdel:calls=9219162,usec=40814849,usec_per_call=4.43
cmdstat_hlen:calls=3489,usec=22029,usec_per_call=6.31
cmdstat_hstrlen:calls=3,usec=40,usec_per_call=13.33
cmdstat_hkeys:calls=1987,usec=27508499,usec_per_call=13844.24
cmdstat_hvals:calls=2507470589,usec=3927244340,usec_per_call=1.57
cmdstat_hgetall:calls=165563433,usec=954226498,usec_per_call=5.76
cmdstat_hexists:calls=474570,usec=727412,usec_per_call=1.53
cmdstat_hscan:calls=13439743,usec=17244866871,usec_per_call=1283.12
cmdstat_incrby:calls=1042,usec=28983,usec_per_call=27.81
cmdstat_randomkey:calls=2,usec=9,usec_per_call=4.50
cmdstat_expire:calls=61430368,usec=217149820,usec_per_call=3.53
cmdstat_pexpire:calls=7906572,usec=77505182,usec_per_call=9.80
cmdstat_keys:calls=1008,usec=312829814,usec_per_call=310347.06
cmdstat_scan:calls=27145,usec=116950316,usec_per_call=4308.36
cmdstat_dbsize:calls=18,usec=21,usec_per_call=1.17
cmdstat_auth:calls=4,usec=10,usec_per_call=2.50
cmdstat_ping:calls=2520082001,usec=891648507,usec_per_call=0.35
cmdstat_type:calls=11,usec=43,usec_per_call=3.91
cmdstat_info:calls=1084582,usec=108458365,usec_per_call=100.00
cmdstat_monitor:calls=41,usec=43,usec_per_call=1.05
cmdstat_role:calls=1,usec=17,usec_per_call=17.00
cmdstat_debug:calls=1,usec=254,usec_per_call=254.00
cmdstat_config:calls=1,usec=20,usec_per_call=20.00
cmdstat_subscribe:calls=491335,usec=1366115,usec_per_call=2.78
cmdstat_unsubscribe:calls=71,usec=4243,usec_per_call=59.76
cmdstat_psubscribe:calls=896,usec=5074,usec_per_call=5.66
cmdstat_punsubscribe:calls=60,usec=1181,usec_per_call=19.68
cmdstat_publish:calls=14850264,usec=194492588,usec_per_call=13.10
cmdstat_client:calls=348,usec=114949,usec_per_call=330.31
cmdstat_eval:calls=15,usec=2353,usec_per_call=156.87
cmdstat_evalsha:calls=131774371,usec=9267334162,usec_per_call=70.33
cmdstat_script:calls=108337,usec=277045,usec_per_call=2.56
cmdstat_command:calls=379,usec=376033,usec_per_call=992.17
cmdstat_pfadd:calls=1141298,usec=6958117,usec_per_call=6.10
cmdstat_pfcount:calls=6,usec=186,usec_per_call=31.00
cmdstat_host::calls=3,usec=294,usec_per_call=98.00

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=757200,expires=24896,avg_ttl=3036101



It is important to understand each and every one of the metrics.
Important metrics are


  • config_file  => location of config file, important when you forget its location. :)
  • connected_clients => no of connected clients.
  • used_memory_human => used memory
  • role   => master/slave
  • used_cpu_user   => time spent by CPU in seconds in user space, like processing and returning back the data.
  • used_cpu_sys    => time spent by CPU in seconds in system space because of user calls, like while allocating memory.
  • cmdstat_<command>_calls => calls made for command
  • cmdstat_<command>_usec  => time spent in microsecs for those calls.
  • cmdstat_<command>_usec_per_call  => time spent per call in microsecs for those calls.
  • cluster_enabled => whether the redis is running as part of a cluster.
  • db0:keys   => no of keys 
  • db0:expires  => no of keys on which expiry is set.
In case you wish to create graphs showing the variation of these params, it is advisable to query the redis instances at a predefined interval (like every 5 mins) and push the data to a central location from which graphs can be created. We query all our redis instances every 5 mins, push the data to graphite and create graphs using grafana. All sorts of graphs help us feel like we are in control(are we?, aren't the machines going to take over..)
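As a starting point for such a pipeline, the INFO output (lines of "field:value" pairs under "# Section" headers) has to be parsed first. A minimal sketch, with the actual push to graphite omitted (the class and method names are mine):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Parse redis INFO output ("field:value" lines, "# Section" headers, blank
// lines) into a flat field -> value map, ready to be pushed to a metrics store.
public class InfoParser {
    static Map<String, String> parse(String info) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : info.split("\r?\n")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip section headers
            int colon = line.indexOf(':');
            if (colon > 0) {
                fields.put(line.substring(0, colon), line.substring(colon + 1));
            }
        }
        return fields;
    }
}
```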
:)




Thursday, 23 February 2017

some interesting redis-cli commands and usages

redis-cli is the command line interface to redis, the standard client shipped with it. It is normally used to connect to redis and query it as below.

redis-cli -h 127.0.0.1 -p 6379
127.0.0.1:6379> randomkey
"myhllkey"
127.0.0.1:6379> type myhllkey
string

However there are other ways in which redis-cli can be useful in exploring redis

1) finding out the general redis stats.

redis-3.2.4 $ src/redis-cli --stat -i 2
------- data ------ --------------------- load -------------------- - child -
keys       mem      clients blocked requests            connections          
6          985.97K  1       0       21 (+0)             5           
6          985.97K  1       0       22 (+1)             5           
6          985.97K  1       0       23 (+1)             5           
6          985.97K  1       0       24 (+1)             5           
6          985.97K  1       0       25 (+1)             5  

The above command "redis-cli --stat -i 2" will keep on returning the general info on the redis server every 2 seconds.
This info includes number of keys, memory, clients, blocked clients, no of requests and connections.
The parameter 'i' is the time interval in seconds after which redis-cli will query redis again. The default value of i is 1, so that "src/redis-cli --stat" will give the data every second.


2) finding out the big keys in redis.

   "redis-cli --bigkeys" will scan the redis db and give information about the biggest key of each type that it encounters during the scan.
Finally it will give a summary of the number of keys of each type found, with the average number of members and average size per key. This is a very useful command: running it periodically and analysing its output can give a clue if your redis data size is always increasing, and can point to potentially big keys.
The output for me was as below.

# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Biggest string found so far 'mystringkey' with 13 bytes
[00.00%] Biggest string found so far 'myhllkey' with 90 bytes
[00.00%] Biggest zset   found so far 'mysortedsetkey' with 6 members
[00.00%] Biggest list   found so far 'mylistkey' with 7 items
[00.00%] Biggest set    found so far 'mysetkey' with 4 members
[00.00%] Biggest hash   found so far 'myhashkey' with 2 fields

-------- summary -------

Sampled 6 keys in the keyspace!
Total key length in bytes is 59 (avg len 9.83)

Biggest string found 'myhllkey' has 90 bytes
Biggest   list found 'mylistkey' has 7 items
Biggest    set found 'mysetkey' has 4 members
Biggest   hash found 'myhashkey' has 2 fields
Biggest   zset found 'mysortedsetkey' has 6 members

2 strings with 103 bytes (33.33% of keys, avg size 51.50)
1 lists with 7 items (16.67% of keys, avg size 7.00)
1 sets with 4 members (16.67% of keys, avg size 4.00)
1 hashs with 2 fields (16.67% of keys, avg size 2.00)
1 zsets with 6 members (16.67% of keys, avg size 6.00)

3) scanning keys with pattern

redis-cli --scan --pattern "*list*" will find all the keys which have the word 'list' in them. Internally it uses 'scan' and not 'keys', so it will not block other commands for a long time even if the database is very large.

4) repeatedly querying redis after every interval for a certain number of times.
"redis-cli -r 5 -i 1 set a b" will set the value of key 'a' to 'b' 5 times, with an interval of 1 sec between the commands.


redis-cli --help can be used to find out all the options in redis cli.
It has various options like outputting the data in csv format, for finding out the redis latency, system latency, reading arguments from standard input etc.

:)




Tuesday, 21 February 2017

finding out the memory occupied by a single key in redis

Many times while using redis, we need to find out how much memory is occupied by a particular key.
This is very useful because redis keeps everything in memory and we need to make the best use of whatever memory we have to reduce the infrastructure cost.

Redis provides a command, 'DEBUG OBJECT', which can be used for this purpose.

This command gives us information about a particular key, including its length. However this length is the serialized length (i.e. the number of bytes used to store the key in the rdb backup file), reported in the serializedlength field of its output.

For example, running "DEBUG OBJECT mystringkey" or "DEBUG OBJECT myhashkey" will print, among other fields, the serializedlength for that key, which is the number of bytes used by redis to store the value in the backup file.

More often than not, we are not interested in the serialized length but the memory used by that key.

Unfortunately redis does not provide any such information on its own (as far as I know), but thanks to the open source community, an excellent utility, 'redis-rdb-tools', has been written which gives us a very good estimate of the memory used by redis for a particular key.

The link to the utility is here.

Redis stores different keys in different formats, and the utility reverse engineers a key, with its type and value, to get the approximate amount of memory used by it. The utility underestimates the memory occupied by a key, and the real memory used can be higher by up to 20%, but even then it is very useful because it can find the large keys, and it can generate a CSV with information about all the keys. The generated CSV file can be further analyzed for more insights into the data.

Usage:

After installing the redis-rdb-tools as described here, we can use it to find out the memory used by a key.

finding out memory for a key from running redis.

redis-3.2.4 $ redis-memory-for-key -s localhost -p 6379 mystringkey
Key "mystringkey"
Bytes 88
Type string
redis-3.2.4 $ redis-memory-for-key -s localhost -p 6379 myhashkey
Key "myhashkey"
Bytes 115
Type hash
Encoding ziplist
Number of Elements 2
Length of Largest Element 6

redis-3.2.4 $ 

finding out memory for a key from a rdb file.

redis-3.2.4 $ rdb -c memory dump.rdb -k mystringkey
database,type,key,size_in_bytes,encoding,num_elements,len_largest_element

0,string,"mystringkey",88,string,13,13

finding out memory for all keys for a pattern.

redis-3.2.4 $ rdb -c memory dump.rdb -k my.*
database,type,key,size_in_bytes,encoding,num_elements,len_largest_element
0,list,"mylistkey",219,quicklist,7,6
0,sortedset,"mysortedsetkey",143,ziplist,6,5
0,hash,"myhashkey",115,ziplist,2,6
0,string,"mystringkey",88,string,13,13
0,string,"myhllkey",168,string,90,90

0,set,"mysetkey",452,hashtable,4,6


finding out memory for all keys for a pattern and exporting to a csv file.

redis-3.2.4 $ rdb -c memory dump.rdb -k my.* -f memory.csv
redis-3.2.4 $ head memory.csv 
database,type,key,size_in_bytes,encoding,num_elements,len_largest_element
0,list,"mylistkey",219,quicklist,7,6
0,sortedset,"mysortedsetkey",143,ziplist,6,5
0,hash,"myhashkey",115,ziplist,2,6
0,string,"mystringkey",88,string,13,13
0,string,"myhllkey",168,string,90,90
0,set,"mysetkey",452,hashtable,4,6

Once we have the csv file from the above method, we can do all sorts of analysis with the usual excel tricks, like finding the largest N keys of a particular type, the keys occupying the highest amount of memory etc.

:)

Friday, 17 February 2017

how to make redis more proactive in expiring keys.

Redis expires keys in two ways: an active way and a passive way.

In the 'passive' way, when a user tries to get a key, redis will check whether the key should have expired, and if so, redis will expire the key and return to the user that the key does not exist. This is called the 'passive' way because redis did not 'actively' expire the key by itself, but only 'passively' or 'lazily' expired it when it came across it on the user's request.

In the 'active' way, redis runs a background job which gets a random sample of keys with an expiry set, and expires whichever of them need to be expired. Further, if it sees that more than 25% of those sampled keys had to be expired, it gets another sample. Let's call this process the 'keys expiring process'. The process is described here.

It means that at any point, on average, at most 25% of the keys that should have expired are still in memory. Needless to say, each of these keys occupies memory.
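The 'keys expiring process' can be sketched as a rough simulation (illustrative only, not redis's actual code; the sample size of 20 matches redis's documented default, and the loop repeats while more than 25% of the sample turned out to be expired):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Rough simulation of the active 'keys expiring process': sample up to 20
// keys with an expiry set, drop the expired ones, and repeat while more than
// 25% of the sample turned out to be expired.
public class ActiveExpiry {
    // expiryTimes maps key -> absolute expiry time; returns how many keys expired
    static int expireCycle(Map<String, Long> expiryTimes, long now, Random rnd) {
        int totalExpired = 0;
        while (!expiryTimes.isEmpty()) {
            List<String> keys = new ArrayList<>(expiryTimes.keySet());
            int sampleSize = Math.min(20, keys.size());
            int expired = 0;
            for (int i = 0; i < sampleSize; i++) {
                String key = keys.get(rnd.nextInt(keys.size()));
                Long expiresAt = expiryTimes.get(key);
                if (expiresAt != null && expiresAt <= now) {
                    expiryTimes.remove(key);
                    expired++;
                    totalExpired++;
                }
            }
            // 25% or fewer of the sample was expired: stop this cycle
            if (expired * 4 <= sampleSize) break;
        }
        return totalExpired;
    }
}
```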

However when we restart redis, it loads all the keys, and will expire all the keys which should expire.

However, there is a way for us to direct redis to be more 'proactive' in expiring the keys.
In the redis configuration, there is a parameter named 'hz', which is the number of times per second redis runs its background tasks, including the 'keys expiring process' described above.

The default value is 10, so that redis runs the process 10 times per second.
As per redis.conf file, although the value of 'hz' should be less than 500, it is recommended to keep the value less than 100.

We can increase this value to some value, like 50.

redis-3.0.6 $redis-cli config get hz
1) "hz"
2) "10"
redis-3.0.6 $ cat redis.conf | grep 'hz '
hz 10
redis-3.0.6 $ redis-cli config set hz 50
OK

redis-3.0.6 $ 

It is important to note that increasing the 'hz' value can lead to a slight increase in the CPU usage, so it is recommended to check the CPU usage after increasing it.

When we did this on our production stack, we saw that the number of keys reduced to a point and became stable, and the same was the case with memory. CPU usage did not show any spike, so we were quite pleased to recover some more memory.

:)





Wednesday, 1 February 2017

Redis - How to take individual nodes in a redis cluster down for maintenance.

Redis is a very fast caching server; it keeps all the data in RAM and is able to serve requests with sub-millisecond latency.
Since it keeps all the data in memory, if we need to keep more data than what fits in a single machine's memory, we create a redis cluster.
In a redis cluster, the data is distributed among its N masters, where N >= 3. It is also recommended to replicate the data, so that each master node has a backup node in case it goes down. The same backup node can be promoted to master when we have to bring down the master node for maintenance.

The main command used for this is the CLUSTER FAILOVER command.
This command should always be executed on a slave node; it promotes that slave in the cluster to a master (and the old master becomes a slave).

Below, we will show how to bring the master nodes of redis down one by one for maintenance (or for other activities).
Let's say you have the cluster running with 3 masters and 3 slaves.

for port in {7000..7005}; do ../src/redis-server $port/redis.conf ; done ; 

I have modified the conf files so that the above will start the 6 instances of redis in cluster mode, daemonized, and with appropriate logging.
After that, the cluster will be created using

../src/redis-trib.rb create --replicas 1 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005

The cluster can be verified using cluster nodes or cluster info command.

../src/redis-cli -p 7001 cluster nodes

cluster-test $../src/redis-cli -p 7001 cluster nodes
daabc7bdc914f73682e07ed40769766ecee06c8f 127.0.0.1:7000 master - 0 1485859005956 1 connected 0-5460
276ad9dbd675f7e3183ab808549f1436d91e56b3 127.0.0.1:7001 myself,master - 0 0 2 connected 5461-10922
2a2e22e2e03bbc934d972a247b4a9c5c0b341747 127.0.0.1:7005 slave 5c0585a57362901380c392b4f3041777e169adec 0 1485859005653 6 connected
5c0585a57362901380c392b4f3041777e169adec 127.0.0.1:7002 master - 0 1485859006463 3 connected 10923-16383
ad7b5e40d14c4f63b2e1b7a901a3c258920d8898 127.0.0.1:7003 slave daabc7bdc914f73682e07ed40769766ecee06c8f 0 1485859005451 4 connected
ffc3155ca303da41825d42db3d109b67525e1f5a 127.0.0.1:7004 slave 276ad9dbd675f7e3183ab808549f1436d91e56b3 0 1485859005653 5 connected

../src/redis-cli -p 7001 cluster info

cluster-test $ ../src/redis-cli -p 7001 cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:2
cluster_stats_messages_sent:9298
cluster_stats_messages_received:9081

The general rules are these.

  1. A slave can be brought down without a problem.
  2. A master should be first converted to a slave before bringing it down.
  3. After converting a master to a slave, always verify it.


For bringing down a slave, like the instance running on 7005, we can simply stop redis using shutdown or by killing the process (ideally, shutdown should be a renamed command).
../src/redis-cli -p 7005 shutdown

After doing the maintenance activity, we can start redis again using
../src/redis-server 7005/redis.conf

For bringing down a master(like instance on 7002), we need to convert it into a slave and then stop the redis server.
This is done by running the cluster failover command on the slave of the master.
To find out which instance is the slave of the 7002 instance, we can execute the info replication command on 7002 instance

cluster-test $ redis-cli -p 7002 info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=7005,state=online,offset=5729,lag=0
master_repl_offset:5729
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:5716
repl_backlog_histlen:14
cluster-test $ 

From the info replication command above, we can see that the slave of instance on port 7002 is instance on 7005.

So, we run the cluster failover command on the slave(7005), so that it becomes the master, and 7002 becomes the slave.

cluster-test $ ../src/redis-cli -p 7005 cluster failover
OK
cluster-test $ ../src/redis-cli -p 7005 cluster failover
(error) ERR You should send CLUSTER FAILOVER to a slave

Note that many times the cluster failover command returns "OK" but the slave is not actually promoted to master, because the slave may not be in sync with the master. So, it is strongly recommended to run the command again: if it now returns an error saying the command should be sent to a slave, we can be sure that 7005 has been promoted to master.

We can also check it by executing the info replication command.

cluster-test $ redis-cli -p 7005 info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=7002,state=online,offset=16762,lag=1
master_repl_offset:16762
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:16693
repl_backlog_histlen:70

After that, since the 7002 instance is a slave, it can be brought down by the usual shutdown command(or by killing the redis process or by stopping the service).

../src/redis-cli -p 7002 shutdown

:)

Normally, shutdown is one of the dangerous commands which should be disabled, or at least renamed, on redis servers to avoid misuse.



Monday, 30 January 2017

Redis - setting redis query timeouts using Lettuce in java

While writing applications, sometimes we need to get data within a particular time, and if redis or the network takes longer than that, the application may not want to wait for the data and may take some other route.

This is particularly important in scenarios like getting user profiles in adservers, where the overall response time is expected to be less than 100 ms; if any component takes more than a few milliseconds, it should be discarded and the flow should continue without it. In such scenarios, to make sure that requests are served as fast as possible while having a backup plan (how to get the required data in case a redis timeout happens), it is a good idea to set a timeout.

Normally redis is very fast and can serve data in a few hundred microseconds (1 microsecond is a thousandth of a millisecond), but it is possible that the network between redis and the application is slow, leading to slow queries.

The Redis client library in Java, Lettuce, provides a way for the application to set a timeout on a query. This way, the application will only wait that long for the response to come back; otherwise it will proceed by throwing a timeout exception.

You will need the following entry in pom.xml file.

<dependency>
    <groupId>biz.paluch.redis</groupId>
    <artifactId>lettuce</artifactId>
    <version>4.3.0.Final</version>
</dependency>


The below code demonstrates the same.

import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import com.lambdaworks.redis.RedisClient;
import com.lambdaworks.redis.RedisURI;
import com.lambdaworks.redis.api.rx.RedisReactiveCommands;
import com.lambdaworks.redis.api.sync.RedisCommands;

public static void main(String[] args) {
    String key = "testKey";
    String value = null;

    RedisURI uri = RedisURI.create("localhost", 6379);
    RedisClient client = RedisClient.create(uri);

    /*
     * Normal way
     */
    RedisCommands<String, String> commands = client.connect().sync();
    System.out.println(commands.get(key));

    /*
     * Reactive way
     */
    RedisReactiveCommands<String, String> reactiveCommand = client.connect().reactive();
    rx.Observable<String> observable = reactiveCommand.get(key);

    try {
        value = observable.timeout(100, TimeUnit.MILLISECONDS).toBlocking().singleOrDefault(null);
    } catch (RuntimeException e) {
        if (e.getCause() instanceof TimeoutException) {
            System.out.println("timeout exception thrown");
        }
    }

    System.out.println(value);
    reactiveCommand.close();
    client.shutdown();
}


The above code shows both ways to get the data using the Lettuce client, ie the normal way and the reactive way.
In the normal way, we get the data from redis regardless of how long redis (and the network) take to return it.
However, in the reactive way, if the value of "testKey" is not returned within 100 ms, a TimeoutException wrapped inside a RuntimeException is thrown.

The application may handle the exception and do whatever it needs (like getting the user's profile from a different source, or ignoring the user's profile altogether and treating it as a new user, depending on the business use case).

Note that it is a good idea to log such timeouts and analyse the reasons if the timeouts occur very frequently.

:)