Online Indexing on large databases

Background: I have a cluster running in triple redundancy mode on SSDs, with 16 TB of data (sum of key-value sizes) and 65 TB of disk space. It holds upward of 1.4B records.
Now that the records and their corresponding indexes have been migrated, I want to build another index on this record store. What would be the most performant way to build an index on a store of this size?

Example for reference of the metadata:

    message foo {
        optional int32 foo1 = 1;   // Primary key
        optional int32 foo2 = 2;   // Already has an index
        optional int32 foo3 = 3;   // Index to be built
    }

Good question. This is somewhat of a hole in our documentation, so I’ve submitted an issue about making the docs in this area better:

There are a few things you can do to build this index faster.

  1. The OnlineIndexer is deliberately rate limited to avoid overwhelming “foreground” traffic. You can make a single indexer faster by adjusting some of its parameters, such as the “limit”, which controls the number of records it will index in a single transaction, and the “records per second”, which controls its target rate of indexing. (Newer versions of the Record Layer also have a control over how many bytes to allow in a single transaction. It is set very high by default and is mainly there as a protection against large transactions, which degrade FDB performance, so it probably shouldn’t be adjusted, but having it in place does make it safer to set a higher limit.) If you are too aggressive with these values, you can hurt the cluster’s performance, so expect to do some tuning. If you want to start an indexer, see if it helps, and then increase the limit, it is perfectly safe to terminate an indexer and restart it; you won’t lose any progress if you do so.
  2. You can try building different parts of the index in parallel. In particular, the OnlineIndexer has a method called splitIndexBuildRange, which looks at the distribution of keys in FDB to produce (roughly) equal ranges of records to index. The steps here are a bit manual, as the feature is still experimental, but if you create an index builder and call splitIndexBuildRange and then create additional indexers for each range, you can run those index builds in parallel.
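
To make the two ideas above concrete, here is a rough, standard-library-only model of a build that is batched, throttled, resumable, and split across ranges. It is only a sketch and does not call the Record Layer: `IndexBuildSketch`, `splitRange`, `buildRange`, `buildInParallel`, and the `checkpoint` counter are all hypothetical names, integer keys stand in for primary keys, and the even split stands in for whatever splitIndexBuildRange derives from the real key distribution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Toy model only: none of these names are Record Layer APIs.
class IndexBuildSketch {

    // Split keys [start, end) into up to `parts` contiguous, roughly equal ranges.
    static List<long[]> splitRange(long start, long end, int parts) {
        List<long[]> ranges = new ArrayList<>();
        long total = end - start;
        for (int i = 0; i < parts; i++) {
            long lo = start + total * i / parts;
            long hi = start + total * (i + 1) / parts;
            if (lo < hi) {
                ranges.add(new long[] {lo, hi});
            }
        }
        return ranges;
    }

    // Index [range[0], range[1]) in batches of at most `limit` records (one batch
    // stands in for one transaction), pausing after each batch so the average rate
    // stays near `recordsPerSecond`. `checkpoint` holds the next unindexed key, so a
    // restarted build resumes there. Returns the records indexed by this call.
    static long buildRange(long[] range, int limit, int recordsPerSecond, AtomicLong checkpoint) {
        long next = Math.max(range[0], checkpoint.get());
        long done = 0;
        while (next < range[1]) {
            long batchEnd = Math.min(next + limit, range[1]);
            long n = batchEnd - next;
            // A real build would write index entries for keys [next, batchEnd) here.
            checkpoint.set(batchEnd); // progress is saved together with the batch
            done += n;
            next = batchEnd;
            try {
                Thread.sleep(1000L * n / recordsPerSecond);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // stopping is safe: the checkpoint covers every finished batch
            }
        }
        return done;
    }

    // Run one throttled builder per range and return the total records indexed.
    static long buildInParallel(long start, long end, int parts, int limit, int recordsPerSecond) {
        List<long[]> ranges = splitRange(start, end, parts);
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<CompletableFuture<Long>> futures = new ArrayList<>();
            for (long[] range : ranges) {
                AtomicLong checkpoint = new AtomicLong(range[0]);
                futures.add(CompletableFuture.supplyAsync(
                        () -> buildRange(range, limit, recordsPerSecond, checkpoint), pool));
            }
            long total = 0;
            for (CompletableFuture<Long> f : futures) {
                total += f.join();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 10,000 keys, 4 ranges, 500 records per batch, 100,000 records/s per range.
        System.out.println(buildInParallel(0, 10_000, 4, 500, 100_000));
    }
}
```

In this model the rate limit applies per range, so the combined load grows with the number of parallel builders. That is the same reason to raise the limit and the degree of parallelism gradually on a live cluster.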

Another tip: you probably want to call setProgressLogIntervalMillis on the indexer when you create it, which will cause it to write a log message with its progress at the specified interval. That way, you can monitor how the index build is going. If you also call setTrackProgress, the number of records indexed gets written to the database as well, so you can see how many records have been indexed in total by looking at the index state.
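
As a rough illustration of what those two options give you, here is a small standard-library-only model: a running total is updated on every batch, and a log line is emitted at most once per interval. Everything in it is hypothetical (`ProgressLogSketch`, `recordBatch`, the in-memory `totalIndexed` field standing in for the count stored in the database); it is not how the Record Layer implements either setting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Toy model only: none of these names are Record Layer APIs.
class ProgressLogSketch {
    private final long intervalMillis;
    private final LongSupplier clock;   // injected so the timing can be controlled
    private long lastLogMillis;
    private long totalIndexed;          // stands in for the persisted record count
    final List<String> logLines = new ArrayList<>();

    ProgressLogSketch(long intervalMillis, LongSupplier clock) {
        this.intervalMillis = intervalMillis;
        this.clock = clock;
        this.lastLogMillis = clock.getAsLong();
    }

    // Called after each committed batch of `n` records.
    void recordBatch(long n) {
        totalIndexed += n;
        long now = clock.getAsLong();
        if (now - lastLogMillis >= intervalMillis) {
            logLines.add("indexed " + totalIndexed + " records so far");
            lastLogMillis = now;
        }
    }

    long totalIndexed() {
        return totalIndexed;
    }

    public static void main(String[] args) throws InterruptedException {
        ProgressLogSketch progress = new ProgressLogSketch(50, System::currentTimeMillis);
        for (int i = 0; i < 10; i++) {
            progress.recordBatch(500);
            Thread.sleep(20);
        }
        progress.logLines.forEach(System.out::println);
        System.out.println("total: " + progress.totalIndexed());
    }
}
```

The point of keeping the total separate from the log lines is that the log only helps while the builder is running, whereas a stored count can still be read after the builder stops or is restarted.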

See the docs on OnlineIndexer.Builder for the complete list of options:
And on splitIndexBuildRange: