Sorting keys chronologically

Hi,

I am defining a key selector range and iterating over all the key-value pairs using a stream. Each key-value pair has a timestamp associated with it. Is there any way to sort all the keys chronologically and then iterate over them? There are approximately 170 million keys.

FDB always returns keys to the client in unsigned lexicographic byte order. (You do have the option to scan in reverse, i.e., in descending order instead of ascending order.)
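Python's built-in comparison of `bytes` objects uses the same unsigned lexicographic ordering. That means a plain `sorted()` over some sample keys shows the order a range read would return them in. This is only a local illustration, not an FDB call. It also shows why timestamps stored as decimal text do not come back in chronological order:

```python
# Python compares bytes element by element as unsigned integers (0-255), and a key
# sorts before any longer key it is a prefix of. This is the same order in which
# FDB returns keys from a range read.
keys = [b"\xff", b"b", b"a\x00", b"a", b"\x01", b"B"]

ascending = sorted(keys)                 # forward scan order
descending = sorted(keys, reverse=True)  # reverse scan order
print(ascending)   # [b'\x01', b'B', b'a', b'a\x00', b'b', b'\xff']

# Timestamps written as decimal text sort byte-wise, not numerically:
text_ts = [str(t).encode() for t in (900, 1000, 95)]
print(sorted(text_ts))  # [b'1000', b'900', b'95'] -- not chronological
```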

Once the data are written, there's not much you can do to speed up sorting. You could try to parallelize the reads, using the shard boundaries (exposed via the locality API: LocalityUtil (FoundationDB Java Client API) or its equivalent in your binding of choice) to get a rough list of reasonable places to split the data up. Even so, you'd still need to read all of the data somewhere to combine and sort it.
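Here is a minimal sketch of that "read in parallel, then sort on the client" approach. A few pieces are stand-ins rather than FDB calls. `DATA` and `read_range` are hypothetical substitutes for the database and a real range read. The `boundaries` list stands in for what the locality API would give you. Each chunk is sorted by timestamp, and the sorted chunks are then k-way merged with `heapq.merge`:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the database: (key, timestamp) pairs kept in key order.
DATA = sorted([(b"k%03d" % i, (i * 37) % 100) for i in range(20)])

def read_range(begin, end):
    """Hypothetical reader: return all (key, timestamp) pairs with begin <= key < end."""
    return [(k, ts) for k, ts in DATA if begin <= k < end]

def by_time(kv):
    # Sort by timestamp, breaking ties by key so the result is deterministic.
    return (kv[1], kv[0])

def read_sorted_by_timestamp(boundaries):
    """Read each [boundaries[i], boundaries[i+1]) chunk in parallel, sort each chunk
    by timestamp, then merge the sorted chunks into one chronological stream."""
    ranges = list(zip(boundaries, boundaries[1:]))
    with ThreadPoolExecutor() as pool:
        chunks = list(pool.map(lambda r: read_range(*r), ranges))
    sorted_chunks = [sorted(chunk, key=by_time) for chunk in chunks]
    return heapq.merge(*sorted_chunks, key=by_time)

result = list(read_sorted_by_timestamp([b"", b"k005", b"k010", b"k015", b"\xff"]))
```

This still reads every key before the first result can be emitted. At around 170 million keys, the chunks probably won't fit in memory. In that case the sorted chunks would have to be spilled to disk and merged from there (an external merge sort), but the structure stays the same.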

The suggested alternative is to change the way the data are written so that the order you want to read them in lines up with the order of the keys. The FDB tuple layer, for example, provides an order-preserving data serialization scheme that can be used for this purpose: Data Modeling — FoundationDB 7.2. Another common technique is to index the timestamp of these keys. This means maintaining a second subspace whose keys are ordered by timestamp and point back to the original keys (for example, by including the original key after the timestamp). There's more on how you might maintain such an index later in the data modeling guide: Data Modeling — FoundationDB 7.2
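Here is a rough sketch of the index approach, under a few assumptions. A plain `dict` stands in for the database. `DATA_PREFIX` and `INDEX_PREFIX` are hypothetical subspace prefixes. Timestamps are assumed to be non-negative integers. Encoding the timestamp as fixed-width big-endian bytes makes byte order match numeric order. In the real Python binding, `fdb.tuple.pack` gives you an order-preserving encoding of this kind, so you wouldn't hand-roll it.

```python
import struct

# Hypothetical prefixes for the data subspace and the timestamp-index subspace.
DATA_PREFIX = b"\x01"
INDEX_PREFIX = b"\x02"

def encode_ts(ts):
    """Encode a non-negative integer timestamp as 8 big-endian bytes, so that
    unsigned byte order equals numeric (chronological) order."""
    return struct.pack(">Q", ts)

def write(store, key, ts, value):
    """Write the record and its index entry. In FDB, both sets would go in the
    same transaction so the index never disagrees with the data."""
    store[DATA_PREFIX + key] = value
    store[INDEX_PREFIX + encode_ts(ts) + key] = b""  # the key itself carries everything

def keys_by_time(store):
    """Scan the index subspace in key order (which is chronological) and recover
    the original keys: everything after the prefix and the 8-byte timestamp."""
    index_keys = sorted(k for k in store if k.startswith(INDEX_PREFIX))
    return [k[len(INDEX_PREFIX) + 8:] for k in index_keys]

store = {}
write(store, b"apple", 1000, b"a")
write(store, b"banana", 95, b"b")
write(store, b"cherry", 900, b"c")
print(keys_by_time(store))  # [b'banana', b'cherry', b'apple']
```

If a record's timestamp can change, the old index entry has to be cleared in the same transaction that writes the new one. Otherwise the index would list the key twice.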

The Record Layer has mechanisms for defining and maintaining indexes on data, so that’s something you could explore if you need a more sophisticated API than raw key-value.
