I was inspired by CockroachDB’s article about their vector search index, and recreated a FoundationDB-optimized version of its core ideas. It uses the algorithms from the same papers with some additional ideas, and the maintenance method was changed to fit FoundationDB’s limitations. Vector insertion and index maintenance are strictly serializable, while search is eventually consistent (or not consistent if cached).
Index structure:
It is a real-time index with no initial clustering or static build step: an index starts with 0 vectors and grows online as you insert vectors through the OLTP process, either one-by-one or in batches. A vector can be inserted into one or more vector indexes. The index is a hierarchical, multi-level IVF tree whose nodes are either internal partitions or leaf partitions, each identified by its centroid, which is the average coordinate of its children. An insert traverses the index from top to bottom, picking the closest partition at each level, until it reaches a leaf where the vector is stored.
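The top-down traversal can be sketched as a simple descent that picks the child partition with the nearest centroid at each level. This is a minimal illustration, not QFoundation's actual API; the `Partition` class and `insert` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of top-down IVF-tree insertion: at each level, descend into
// the child partition whose centroid is closest to the new vector, until a
// leaf is reached. Names are illustrative, not QFoundation's API.
public class IvfTreeSketch {
    static class Partition {
        float[] centroid;
        List<Partition> children = new ArrayList<>(); // empty => leaf
        List<float[]> vectors = new ArrayList<>();    // populated only on leaves
        Partition(float[] centroid) { this.centroid = centroid; }
        boolean isLeaf() { return children.isEmpty(); }
    }

    static double distSq(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    // Descend from the root, always into the closest child, then store on the leaf.
    static Partition insert(Partition root, float[] v) {
        Partition node = root;
        while (!node.isLeaf()) {
            Partition best = node.children.get(0);
            for (Partition c : node.children)
                if (distSq(c.centroid, v) < distSq(best.centroid, v)) best = c;
            node = best;
        }
        node.vectors.add(v);
        return node; // the leaf that received the vector
    }
}
```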
Quantization:
By default, we insert the vector into the index using 1-bit quantization with the RaBitQ algorithm. This means that a 4096-dimensional vector takes 512 bytes instead of 16 kilobytes. If you store your raw vector plus entries in 2 different indexes, the total space needed is 17 KB (16 + 0.5 + 0.5) instead of 48 KB (3 × 16). The RaBitQ algorithm provides a rigorous error bound, so we use reranking in the search phase to get high-quality results from the candidate set. Quantization calculations use the Java Vector API for native SIMD instructions. You can use 1-9 bit quantization with or without reranking, although 1 bit with reranking is recommended.
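To make the space arithmetic concrete, here is a toy 1-bit quantizer that keeps only the sign of each dimension. This is a simplification, not the full RaBitQ transform (which also applies a random rotation and stores per-vector correction factors for its error bound), but it shows why 4096 float dimensions (16 KB) collapse to 4096 bits = 512 bytes:

```java
// Toy 1-bit quantization by sign (NOT the full RaBitQ algorithm): each
// dimension becomes one bit, so a 4096-dim float vector compresses from
// 4096 * 4 bytes = 16 KB down to 4096 / 8 = 512 bytes.
public class OneBitSketch {
    static byte[] quantize(float[] v) {
        byte[] bits = new byte[(v.length + 7) / 8];
        for (int i = 0; i < v.length; i++)
            if (v[i] > 0) bits[i / 8] |= (byte) (1 << (i % 8));
        return bits;
    }

    // Hamming distance between two codes: a cheap proxy for angular distance,
    // used to prune candidates before reranking on the raw vectors.
    static int hamming(byte[] a, byte[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) d += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        return d;
    }
}
```

The real implementation would run these loops through the Java Vector API for SIMD speedups; the scalar version above is only meant to show the data layout.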
Index maintenance and growth:
When a leaf has too many vectors, a message is sent to QFoundation’s already existing task-queue layer, where a task consumer executes bite-sized maintenance operations within 5 seconds / 10 MB (FoundationDB’s transaction limits). It runs I/O queries in parallel using structured concurrency and virtual threads. Maintenance is a distributed operation (it is partitioned, and the more processes you spawn, the more parallelism you get automatically) with fair processing, so frequently updated indexes do not affect the processing of rarely updated ones: it queues “pointers to sub-queues of tasks” instead of “tasks”, somewhat similar to Apple’s QuiCK algorithm. Leaf-partition splits use the LIRE algorithm from the SPFresh paper to reshuffle local data just like C-SPANN does, but internal-partition splits use additional rounds of k-means clustering from the Quake and DeDrift papers for better empirical results.
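The hand-off between the insert path and the maintenance path can be sketched as a threshold check that only enqueues a pointer to the oversized leaf; the actual split runs later in a separate consumer. The threshold value, names, and in-memory queue below are all illustrative stand-ins for QFoundation's task-queue layer:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of size-triggered maintenance hand-off: the insert transaction stays
// cheap by only enqueuing a pointer to the oversized leaf; a task consumer
// later performs the split. Threshold and names are assumptions.
public class MaintenanceSketch {
    static final int MAX_LEAF_SIZE = 128;                // assumed split threshold
    final Queue<String> splitQueue = new ArrayDeque<>(); // stand-in for the task-queue layer

    // Called after each insert; returns true if a split task was enqueued.
    boolean afterInsert(String leafId, int leafSize) {
        if (leafSize > MAX_LEAF_SIZE) {
            splitQueue.add(leafId); // enqueue a pointer, not the work itself
            return true;
        }
        return false;
    }
}
```

Keeping the queued item to a pointer is what makes the fair-processing scheme work: the consumer can round-robin over per-index sub-queues instead of draining one busy index's backlog first.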
Search:
Beam search locates the best leaf candidates, then we collect all the candidates that might be in the top-k nearest neighbours with ~99% probability. We then load the raw candidate vectors from the DB and select the true k nearest neighbours. The recall rate can be increased with more beams (to mask drift in the index, the true source of inaccuracies).
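The pipeline above can be sketched in three steps: pick the closest leaves by centroid distance (the beam), pool their members as candidates, then rerank on exact distances to produce the final top-k. A flat single level is shown for brevity, where the real index descends a multi-level tree; all names here are illustrative:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the search pipeline: beam over leaf centroids -> candidate pool
// -> exact-distance rerank -> top-k. Single flat level for brevity.
public class BeamSearchSketch {
    record Leaf(float[] centroid, List<float[]> vectors) {}

    static double distSq(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    static List<float[]> search(List<Leaf> leaves, float[] q, int beams, int k) {
        // 1) beam: keep the `beams` leaves whose centroids are closest to the query
        List<Leaf> best = leaves.stream()
                .sorted(Comparator.comparingDouble(l -> distSq(l.centroid(), q)))
                .limit(beams)
                .toList();
        // 2) candidates: every vector stored in those leaves
        List<float[]> candidates = new ArrayList<>();
        for (Leaf l : best) candidates.addAll(l.vectors());
        // 3) rerank by exact distance on the raw vectors and keep the top k
        candidates.sort(Comparator.comparingDouble(v -> distSq(v, q)));
        return candidates.subList(0, Math.min(k, candidates.size()));
    }
}
```

More beams widen step 1, which is why they mask index drift: a vector sitting in a slightly "wrong" leaf is still swept into the candidate pool.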
Observability:
There is an optional vector-index dashboard on the admin UI, but metrics also flow to your own Micrometer collectors, such as Prometheus.
Recall results:
Recall@10 was measured using high-quality LLM embedding vectors (Wikipedia article embeddings). Vectors were inserted in batches of 32 per transaction, with insertion throttled while maintenance tasks were outstanding, to simulate natural incremental index growth.
| Model | Dimensions | Vector Count | Quantization | Reranking | Beams | Recall |
|---|---|---|---|---|---|---|
| Qwen 3 | 4096 | 100,000 | 1-bit | Yes | 64 | 100.0% |
| Qwen 3 | 4096 | 100,000 | 1-bit | Yes | 32 | 99.1% |
| Qwen 3 | 4096 | 100,000 | 1-bit | Yes | 16 | 97.7% |
| Qwen 3 | 4096 | 100,000 | 1-bit | Yes | 8 | 94.3% |
| Qwen 3 | 4096 | 500,000 | 1-bit | Yes | 64 | 99.0% |
| Qwen 3 | 4096 | 500,000 | 1-bit | Yes | 32 | 97.5% |
| Qwen 3 | 4096 | 500,000 | 1-bit | Yes | 16 | 95.6% |
| Qwen 3 | 4096 | 500,000 | 1-bit | Yes | 8 | 92.1% |
| OpenAI | 1536 | 100,000 | 1-bit | Yes | 64 | 99.2% |
| OpenAI | 1536 | 100,000 | 1-bit | Yes | 32 | 97.7% |
| OpenAI | 1536 | 100,000 | 1-bit | Yes | 16 | 95.7% |
| OpenAI | 1536 | 100,000 | 1-bit | Yes | 8 | 92.3% |
| OpenAI | 1536 | 1,000,000 | 1-bit | Yes | 64 | 95.6% |
| OpenAI | 1536 | 1,000,000 | 1-bit | Yes | 32 | 92.3% |
| OpenAI | 1536 | 1,000,000 | 1-bit | Yes | 16 | 87.2% |
| OpenAI | 1536 | 1,000,000 | 1-bit | Yes | 8 | 80.1% |
How to try:
- Create a Java application using the Quarkus framework
- Include the QFoundation extension
- Turn on the Vector layer and the Messaging layers of QFoundation:
```properties
qfoundation.layers.vectors=true
qfoundation.layers.messaging=true
```
- Create an index and insert some vectors:
```java
@Inject
Persistence persistence;

@Inject
VectorIndexer vectorIndexer;

@Transactional
void createIndexAndInsertVectors() {
    var index = vectorIndexer.createIndex(4096); // Uses default parameters

    // Create 3 raw vector entities and insert them into FDB
    var v1 = persistence.persist(new Vector(id1, floatArray1));
    var v2 = persistence.persist(new Vector(id2, floatArray2));
    var vN = persistence.persist(new Vector(idN, floatArrayN));

    // Insert all 3 vectors into this one index
    vectorIndexer.insert(index, List.of(v1, v2, vN));
}
```
- Launch ANN searches:
```java
// Returns the top-k approximate nearest neighbours to the query vector from a given index
List<SearchResult> results = vectorSearcher.nearestNeighbors(
        index,
        queryVector,
        new SearchSettings(
                10,   // Top-k results to retrieve
                32,   // Beam count for search
                10.0, // Reranking cap: fetch at most k × 10.0 raw vectors
                true, // Enable cache
                true  // Enable reranking
        )
);
```
- It works as-is as a JVM application, or built as a native GraalVM executable
- More details in the QFoundation docs.