Efficient way to scan entire database

We have a database with 10 TB+ of key-value data, and we use the Record Layer on top of it.
What is the most efficient way to scan the entire database?

I think we probably need more information about what you are trying to do to give a useful answer.

Is this database one single record layer or many? When you say "the entire database", do you mean all of the records in the database, or do you want to traverse everything stored? What does this scanning need to do?

I suspect that part of the answer is to use multiple reader processes, perhaps with each assigned a portion of the record store, divided up according to the locality API. And if the processing you need to do is variable, perhaps divide the work into many more pieces and put them into a pending queue so that the load smooths out. But even that might not be right if the database is under active load and by "efficient" you don't mean devoting as much of the cluster as possible to getting the job done.
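To make that concrete, here is a minimal, self-contained sketch of the pending-queue pattern in plain Java. Nothing here is Record Layer API: `processRange` is a placeholder for "open a transaction and scan one key range", and the string ranges stand in for real key boundaries.

```java
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelScan {
    // Placeholder for "scan one key range and return how many records it
    // processed"; a real reader would scan between begin and end here.
    static long processRange(String begin, String end) {
        return 1;
    }

    // Drain a pending queue of ranges with a fixed pool of readers.
    // Having many more ranges than readers smooths out uneven work.
    static long scanAll(List<String[]> ranges, int readers) throws Exception {
        BlockingQueue<String[]> pending = new LinkedBlockingQueue<>(ranges);
        ExecutorService pool = Executors.newFixedThreadPool(readers);
        AtomicLong total = new AtomicLong();
        for (int i = 0; i < readers; i++) {
            pool.submit(() -> {
                String[] r;
                while ((r = pending.poll()) != null) {
                    total.addAndGet(processRange(r[0], r[1]));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return total.get();
    }

    public static void main(String[] args) throws Exception {
        List<String[]> ranges = List.of(
                new String[]{"a", "f"}, new String[]{"f", "m"},
                new String[]{"m", "s"}, new String[]{"s", "z"});
        System.out.println(scanAll(ranges, 2)); // 4 ranges processed
    }
}
```

The number of readers is what you'd throttle to keep the scan from competing with the main workload.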

Just one record layer.

We might need to traverse all records. The primary key is a combination of two fields (f1, f2), and one use case is to delete records whose f2 is present in a given list of keys. Yes, I was looking into the multiple-reader approach. This job will probably run in the background without disturbing the main workload on the cluster.

Also, what is the best way to count/get the total number of records?

When breaking things up, you may be interested in the getPrimaryKeyBoundaries API, which, given a range, will tell you the primary keys closest to where FDB is sharding your data. For a 10 TB database, you could run that on one machine to generate split points, then farm the scan out to multiple readers so that each reader reads a different set of shards. This lets you spread the work out more evenly than if you picked split points yourself (unless you have domain knowledge of your primary key distribution).
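Turning the boundary keys into half-open scan ranges for the readers is just a few lines. This sketch uses plain strings in place of the Tuples a boundary cursor would actually give you:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPoints {
    // Turn a sorted list of boundary keys (as produced by something like
    // getPrimaryKeyBoundaries) into half-open [begin, end) ranges that
    // together cover [rangeBegin, rangeEnd).
    static List<String[]> toRanges(String rangeBegin, String rangeEnd,
                                   List<String> boundaries) {
        List<String[]> ranges = new ArrayList<>();
        String begin = rangeBegin;
        for (String boundary : boundaries) {
            ranges.add(new String[]{begin, boundary});
            begin = boundary;
        }
        ranges.add(new String[]{begin, rangeEnd});
        return ranges;
    }

    public static void main(String[] args) {
        // Two boundaries split the full range into three shard-aligned pieces.
        for (String[] r : toRanges("", "zzz", List.of("g", "p"))) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

Each resulting range then becomes one unit of work for the reader pool.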

You can define a COUNT index that will tell you the total number of records and/or the total number of records of a given type. See this example here: fdb-record-layer/Main.java at d17664ff507cfef07af6f9bdce0b6e9648ce81f7 · FoundationDB/fdb-record-layer · GitHub

For an ungrouped COUNT index, reading the record count is O(1). You can also add grouping keys, in which case the index needs to aggregate one key per grouping-key value if you scan over multiple groups.
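The bookkeeping behind such an index can be pictured with a plain in-memory analogue (the HashMap here stands in for the index's counter keys, nothing more):

```java
import java.util.HashMap;
import java.util.Map;

public class CountIndexSketch {
    // A COUNT index is conceptually a set of counter keys that are
    // maintained on every mutation. All inserts and deletes funnel into
    // the same few counter keys, which is also why it can create a hot shard.
    private final Map<String, Long> counters = new HashMap<>();

    void onInsert(String group) { counters.merge(group, 1L, Long::sum); }
    void onDelete(String group) { counters.merge(group, -1L, Long::sum); }

    // Reading one group's count is a single key read: O(1).
    long count(String group) { return counters.getOrDefault(group, 0L); }

    // Counting across groups aggregates one key per grouping value.
    long totalAcrossGroups() {
        return counters.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

With no grouping key there is a single counter, so the total is one read; with a grouping key you pay one read per group you span.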

Not sure about your mutation pattern, but note that adding this index can cause hot-sharding problems, as every record insert and delete will now also update the COUNT index key(s). This may or may not be a problem for you depending on your mutation rate (and your hardware), and it might be something you'll need to experiment with. If you can't use the index due to hot-shard concerns, I think the only alternative for getting the count is to scan over the full store (possibly breaking it up using the strategy outlined above) and collect the count.
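That fallback composes with the range-splitting above: each reader counts its own shard-aligned ranges and you sum the partial counts at the end. A self-contained sketch, where `countRange` is a stand-in for an actual record scan (the fake count is just the width of the range):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class CountFallback {
    // Stand-in for "scan one shard-aligned range and count its records".
    static long countRange(String begin, String end) {
        return end.charAt(0) - begin.charAt(0);
    }

    // Count every range in parallel and sum the partial counts.
    static long countAll(List<String[]> ranges, int readers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(readers);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (String[] r : ranges) {
                tasks.add(() -> countRange(r[0], r[1]));
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String[]> ranges = List.of(
                new String[]{"a", "f"}, new String[]{"f", "k"});
        System.out.println(countAll(ranges, 2)); // 10
    }
}
```

Unlike the index, this gives you a point-in-time count for the price of a full scan, with no write-path overhead.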

Also, if you want the size (in bytes) rather than the count: in Record Layer 3.0 we added some methods that use FDB-maintained sampling data to produce an approximation, though that's not yet published to Maven, as recent build changes broke our Maven publishing pipeline. For the size-estimate changes, see: Expose range size estimate on Record Stores · Issue #1229 · FoundationDB/fdb-record-layer · GitHub