Versionstamp ranged deletes

newhook · January 25, 2024, 6:18pm

I have a use case where I’m using versionstamps to insert a rolling set of data that I want to delete over time, for example keeping only the last 30 days of data.

My idea was to delete the old data with a ranged delete at regular intervals. However, as far as I can see, that requires keeping some sort of mapping between time and versionstamps so I can look up which versionstamp most closely corresponds to 30 days ago and delete up to that key.
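One detail that simplifies the ranged delete: a versionstamp is 10 bytes, an 8-byte big-endian commit version followed by a 2-byte batch order, so the end key of the clear only needs a commit version, not an exact versionstamp. The sketch below assumes a hypothetical layout where each row key is a fixed prefix plus the raw 10-byte versionstamp (as written by a `SetVersionstampedKey` mutation); `cutoff_key` and `versionstamped_key` are illustrative helpers, not part of any FDB binding.

```python
import struct


def versionstamped_key(prefix: bytes, commit_version: int, batch_order: int) -> bytes:
    # Hypothetical layout: prefix + 8-byte commit version + 2-byte batch order,
    # i.e. what a SetVersionstampedKey mutation would leave in the key.
    return prefix + struct.pack(">QH", commit_version, batch_order)


def cutoff_key(prefix: bytes, commit_version: int) -> bytes:
    """Smallest key under `prefix` whose commit version is >= commit_version.

    Big-endian packing makes byte order match numeric order, and batch order 0
    is the smallest possible suffix, so every key committed at an earlier
    version sorts strictly before the returned key.
    """
    return prefix + struct.pack(">QH", commit_version, 0)
```

With the Python bindings, the periodic cleanup would then be roughly `tr.clear_range(prefix, cutoff_key(prefix, v))`, where `v` is the version that the time-to-version mapping associates with 30 days ago.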

Reading “Versionstamp as absolute time”, I can see there is a built-in way to do that, except it’s not exposed via an official API.

Is there some other mechanism I’m not thinking of to accomplish my goal (besides keeping the mapping myself in userspace in the database)?

Imperatorx · January 26, 2024, 5:19pm

Since the version can jump by hundreds of millions on each recovery (making it unreliable for calculating exact time), I would just set up a scheduled job in userspace that records a current-time-to-versionstamp mapping every minute or so. Alternatively, I would generate the record IDs on the client side using a TSID that encodes the current time.
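The TSID alternative removes the need for any mapping, because the cutoff key can be computed directly from the clock. A minimal sketch, assuming a simplified, hypothetical ID layout of 42 bits of Unix milliseconds followed by 22 random bits (real TSID libraries typically use a custom epoch and different bit splits); `make_tsid`, `tsid_key`, and `retention_cutoff_key` are illustrative names only.

```python
import os
import time
from typing import Optional

RANDOM_BITS = 22  # hypothetical split: high bits = milliseconds, low 22 bits random


def make_tsid(now_ms: Optional[int] = None) -> int:
    # Time-sortable ID: milliseconds in the high bits, randomness in the low bits.
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(3), "big") & ((1 << RANDOM_BITS) - 1)
    return (now_ms << RANDOM_BITS) | rand


def tsid_key(prefix: bytes, tsid: int) -> bytes:
    # 8-byte big-endian encoding so key order matches numeric (time) order.
    return prefix + tsid.to_bytes(8, "big")


def retention_cutoff_key(prefix: bytes, now_ms: int, retention_ms: int) -> bytes:
    # Smallest possible ID for the cutoff millisecond (all random bits zero);
    # clearing [prefix, cutoff) drops everything created before it.
    return tsid_key(prefix, (now_ms - retention_ms) << RANDOM_BITS)
```

The trade-off is that ordering now follows client clocks rather than commit order, so IDs from clients with skewed clocks can interleave slightly.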

jzhou · January 27, 2024, 6:05am

Timekeeper, as mentioned in the other thread, records the mapping every 10s, and there is already code to convert a Unix timestamp to an FDB version. Note that the error is about 10s. This is probably the easiest way.
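The conversion itself amounts to a "latest sample at or before this time" lookup over periodically recorded (time, version) pairs. The sketch below is an in-memory stand-in for that idea; the class `TimeVersionMap` and its methods are hypothetical, and it does not read the real Timekeeper data, which lives in the system keyspace.

```python
import bisect
from typing import List, Optional


class TimeVersionMap:
    """Hypothetical in-memory map of periodic (unix_time -> version) samples."""

    def __init__(self) -> None:
        self._times: List[int] = []
        self._versions: List[int] = []

    def record(self, unix_time: int, version: int) -> None:
        # Samples arrive in time order, e.g. from a job running every ~10 s.
        if self._times and unix_time <= self._times[-1]:
            raise ValueError("samples must be recorded in increasing time order")
        self._times.append(unix_time)
        self._versions.append(version)

    def version_at_or_before(self, unix_time: int) -> Optional[int]:
        # Version from the latest sample taken at or before unix_time, or None
        # if there is no such sample. The answer can lag the true version at
        # unix_time by up to one sampling interval.
        i = bisect.bisect_right(self._times, unix_time)
        return self._versions[i - 1] if i else None
```

Picking the sample at or before the target time is the conservative choice for deletes: used as a cutoff, it may leave up to one interval of older data in place but should not remove data written after the target time.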

SteavedHams · January 29, 2024, 8:45pm

Having Timekeeper lookup functions in the FDB API would be a nice feature.

MMcM · January 29, 2024, 9:04pm

You did not mention a language, but there is also a Java version of this in the Record Layer.

Imperatorx · January 30, 2024, 4:52pm

Does this mapping data survive a backup → restore-to-an-empty-cluster cycle? (Is the 0xFF keyspace backed up and restored?)

SteavedHams · January 30, 2024, 6:57pm

The mapping is not included in a backup or restored by restore. Both backup and restore accept a list of target ranges, and while I don’t think anything would block inclusion of the Timekeeper range in both operations, I’m not sure exactly what the result would be. The destination cluster’s version timeline is independent of the source cluster’s, and it has its own Timekeeper data, which it would also keep generating during the restore. Depending on how the cluster versions in the backup data and the destination cluster align, weird things could happen.
