Impact of encoding on range read performance

(mehmet onsiper) #1

Hello,

I have been running some tests to get an idea about the impact of different encodings on range read performance. To be more specific, I had 16 digit integer timestamps as keys and tested first by encoding them with utf-8, base85 and fdb’s tuple layer and then performing as many random 1000-key range reads as possible in a minute via 20 threads. I ran these tests on a 5 node ssd cluster with double redundancy and 20 million loaded rows. I found out that fdb.tuple.pack results in smallest key size (8bytes) vs base85’s 10 bytes and utf-8’s 16bytes, however, in terms of range read performance (num of keys/sec), base85 seems to be performing 3 times better than utf-8 and fdb’s tuple layer. More surprisingly, fdb’s tuple layer performs slightly worse than utf-8 at times. I’m just wondering if this is expected/reasonable behavior (especially the fact that base85 performs 3 times better than fdb’s tuple layer despite having longer key size).

Thanks,

(Ryan Worl) #2

The key value store doesn’t care what the keys are encoded with. It only cares about the size, and smaller keys means more keys fit in memory and less disk IO.

Your language bindings, on the other hand, may choose to automatically decode keys written with the tuple layer when you read them back out.

The tuple layer will allocate at least one of whatever the array representation is in your language. Parsing a UTF-8 string into a number is very well optimized in many languages and you could even do it without allocating any additional memory by reusing a buffer across calls (see Go’s strconv package). The same can be said of the base-85 encoding.

If you need the absolute highest performance you could make a key format yourself and write the bytes of the integer into the key directly (big-endian if you need sorting), which wouldn’t require any (significant) decoding.

1 Like
(Christophe Chevalier) #3

Given the fact that you are seeing 3x performance difference simply by changing from 10-bytes (base85) to 16-bytes (“utf-8”) or 8-bytes (tuple), I would think that the main difference is in the parsing of keys by the binding itself, less than the querying of the keys at the cluster level.

I’ve done a lot of profiling of the tuple encoding implementation of one of the binding, and it can become a cpu bottleneck especially in synthetic benchmark where your thread spend most of its time encoding/decoding millions of keys (the rest is IO waiting for the data to come back). In garbage collected language, the most significant waste is the allocation/garbage collection of the temporary buffers, and also a lot of memory copying.

If in real life, you will spend a lot of time reading and parsing millions of records at a time, then you should definitely try to optimize the handling of the keys. If on the other hand you will read/write a few keys at a time per request, the overhead of the rest of the code (http request overhead, acl checks, business logic, etc…) will dwarf the cpu required to encode/decode keys.

If you are only storing a single type of keys which is a 16-byte integer, then you should probably try to roll your own encoding: It can be as simple as storing your keys as a 16-byte big-endian byte blob (to preserve order). This will take 16 bytes per key but will be dirt cheap to encode/decode.

If your 16-byte identifier is a GUID then there’s probably no better encoding than that.

If you 16-byte identifier is a sequential number starting a 1, then you may look into some sort of varint encoding that preserve lexicographical order. The encoding used by the tuple layer can be a good candidate, and you could probably may extract just the code that deal with integers, and discard all the overhead in the tuple layer that has to deal with variable size tuples and dynamic typing, to get the best performance as possible.

If you need to store other data alongside, or do some sort of multi-tenant storage, then you can either simply use the tuple layer and hope that the performance will be sufficient, or just add a custom prefix to the 16-byte keys?

(Seph Gentle) #4

Out of curiosity, which bindings are you using? (C/python/java/go/node/rust/etc)?

(Steve Atherton) #5

A decimal integer with 16 digits is in the 7-byte binary range. The Tuple encoded form would have 1 header bytes and then the 7 value bytes, high bits first, but getting it into and out of that form is not particularly efficient, as you’ve noted.

If your language represents the timestamps natively as 64 bit integers then your best performing option might be to convert the integer to BigEndian (usually a very fast operation) and use the raw 8 bytes of the integer as your key (or key suffix).

(mehmet onsiper) #6

I was using Python bindings.