Poison

Redis 中 Bitmap 的存储实现

在 Redis 官网的数据类型介绍中对 Bitmap 部分有如下描述:

Bitmaps are not an actual data type, but a set of bit-oriented operations defined on the String type. Since strings are binary safe blobs and their maximum length is 512 MB, they are suitable to set up to 232 different bits.

可知,Redis 中的 Bitmap 由 String 实现,且因为 Redis 中的 String 是二进制安全的且最大长度为 512MB,所以支持存储 512 × 1024 × 1024 × 8 = 4294967296 = 232 个比特位。其中还提到仅需 512MB 即可存储 40 亿用户的比特位信息。

以上是 Redis 中对 Bitmap 的介绍,但是有一个明显的缺点即为 Redis 中的 Bitmap 实现是未支持任何压缩特性的,即我们将值为 232 - 1 的仅一个整数存储至 Redis,也需要申请 512MB 的内存。我曾处理过一次涉及 Redis Bitmap 使用的线上事故,原因即为开发对业务需要使用的 Bitmap 个数及内存占用没有做好容量预估,导致上线后 Redis 内存被打满,引起了后续一系列问题。

当时场景大概是这样的,为每个商品维护了一个 Bitmap 用户存储访问过该商品的用户 id 集合,而 Bitmap 的内存占用取决于该 Bitmap 的最大偏移量的 Bit 位,我们使用一个 Redis 实例来演示,首先使用 INFO memory 查看当前该实例的内存占用,关于内存占用部分的输出如下:

1
2
3
# Memory
used_memory:853232
used_memory_human:833.23K

此时我们执行 Redis 命令 SETBIT mykey 100000000 1 将用户 id 为 100000000 的 Bit 位设置为 1,并使用 BITCOUNT mykey 验证 mykey 对应的 Bitmap 中只有一位比特位为 1。再次调用 INFO memory 命令输出如下:

1
2
3
# Memory
used_memory:13436192
used_memory_human:12.81M

可以看出,即使我们只存储了一个用户 id,但是 Redis 内存却增加了 12MB 左右,即约等于 100000000 / 8 / 1024 /1024 (还有对象头等内存开销),假如我们现在有 50000 个商品,需要使用 50000 个 Bitmap 来存储各个商品的用户访问记录,那么极端情况下就需要 50000 × 10 = 500000 MB 约等于 500GB 的内存占用,可知,对内存的占用还是很高的,这就是之前线上事故的触发原因。

RoaringBitmap 的文档中提到了什么时候需要使用压缩的 Bitmap,其中描述如下:

An uncompressed BitSet can use a lot of memory. For example, if you take a BitSet and set the bit at position 1,000,000 to true and you have just over 100kB. That’s over 100kB to store the position of one bit. This is wasteful even if you do not care about memory: suppose that you need to compute the intersection between this BitSet and another one that has a bit at position 1,000,001 to true, then you need to go through all these zeroes, whether you like it or not. That can become very wasteful.

This being said, there are definitively cases where attempting to use compressed bitmaps is wasteful. For example, if you have a small universe size. E.g., your bitmaps represent sets of integers from [0,n) where n is small (e.g., n=64 or n=128). If you are able to uncompressed BitSet and it does not blow up your memory usage, then compressed bitmaps are probably not useful to you. In fact, if you do not need compression, then a BitSet offers remarkable speed.

The sparse scenario is another use case where compressed bitmaps should not be used. Keep in mind that random-looking data is usually not compressible. E.g., if you have a small set of 32-bit random integers, it is not mathematically possible to use far less than 32 bits per integer, and attempts at compression can be counterproductive.

References

An introduction to Redis data types and abstractions – Redis
SETBIT – Redis
Binary-Safe String