I was inspired by CockroachDB’s article about their vector search index, and recreated a FoundationDB-optimized version of its core ideas. It uses the algorithms from the same papers with some additional ideas, and the maintenance method was changed to fit FoundationDB’s limitations. Vector insertion and index maintenance are strictly serializable, while search is eventually consistent (or not consistent if cached).
Index structure:
It is a real-time index with no initial clustering or static build step: an index starts with 0 vectors and grows online as you insert vectors through the OLTP process, either one-by-one or in batches. A vector can be inserted into one or more vector indexes. The index is a hierarchical, multi-level IVF tree whose nodes are either internal partitions or leaf partitions, each identified by its centroid, which is the average coordinate of its children. An insert traverses the index from top to bottom, picking the closest partition at each level, until it reaches a leaf where the vector is stored.
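The top-down traversal can be sketched as a simple descent that picks the child partition with the nearest centroid at each level. This is a minimal illustration, not QFoundation's actual API; the `Partition` class and `insert` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of top-down IVF-tree insertion: at each level, descend into
// the child partition whose centroid is closest to the new vector, until a
// leaf is reached. Names are illustrative, not QFoundation's API.
public class IvfTreeSketch {
    static class Partition {
        float[] centroid;
        List<Partition> children = new ArrayList<>(); // empty => leaf
        List<float[]> vectors = new ArrayList<>();    // populated only on leaves
        Partition(float[] centroid) { this.centroid = centroid; }
        boolean isLeaf() { return children.isEmpty(); }
    }

    static double distSq(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    // Descend from the root, always into the closest child, then store on the leaf.
    static Partition insert(Partition root, float[] v) {
        Partition node = root;
        while (!node.isLeaf()) {
            Partition best = node.children.get(0);
            for (Partition c : node.children)
                if (distSq(c.centroid, v) < distSq(best.centroid, v)) best = c;
            node = best;
        }
        node.vectors.add(v);
        return node; // the leaf that received the vector
    }
}
```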
Quantization:
By default, we insert the vector into the index using 1-bit quantization with the RaBitQ algorithm. This means that a 4096-dimensional vector takes 512 bytes instead of 16 kilobytes. If you store your raw vector plus entries in 2 different indexes, the total space needed is 17 KB (16 + 0.5 + 0.5) instead of 48 KB (3 × 16). The RaBitQ algorithm provides a rigorous error bound, so we use reranking in the search phase to get high-quality results from the candidate set. Quantization calculations use the Java Vector API for native SIMD instructions. You can use 1-9 bit quantization with or without reranking, although 1 bit with reranking is recommended.
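To make the space arithmetic concrete, here is a toy 1-bit quantizer that keeps only the sign of each dimension. This is a simplification, not the full RaBitQ transform (which also applies a random rotation and stores per-vector correction factors for its error bound), but it shows why 4096 float dimensions (16 KB) collapse to 4096 bits = 512 bytes:

```java
// Toy 1-bit quantization by sign (NOT the full RaBitQ algorithm): each
// dimension becomes one bit, so a 4096-dim float vector compresses from
// 4096 * 4 bytes = 16 KB down to 4096 / 8 = 512 bytes.
public class OneBitSketch {
    static byte[] quantize(float[] v) {
        byte[] bits = new byte[(v.length + 7) / 8];
        for (int i = 0; i < v.length; i++)
            if (v[i] > 0) bits[i / 8] |= (byte) (1 << (i % 8));
        return bits;
    }

    // Hamming distance between two codes: a cheap proxy for angular distance,
    // used to prune candidates before reranking on the raw vectors.
    static int hamming(byte[] a, byte[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) d += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        return d;
    }
}
```

The real implementation would run these loops through the Java Vector API for SIMD speedups; the scalar version above is only meant to show the data layout.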
Index maintenance and growth:
When a leaf has too many vectors, a message is sent to QFoundation’s already existing task-queue layer, where a task consumer executes bite-sized maintenance operations within 5 seconds / 10 MB (FoundationDB’s transaction limits). It runs I/O queries in parallel using structured concurrency and virtual threads. Maintenance is a distributed operation (it is partitioned, and the more processes you spawn, the more parallelism you get automatically) with fair processing, so frequently updated indexes do not affect the processing of rarely updated ones: it queues “pointers to sub-queues of tasks” instead of “tasks”, somewhat similar to Apple’s QuiCK algorithm. Leaf-partition splits use the LIRE algorithm from the SPFresh paper to reshuffle local data just like C-SPANN does, but internal-partition splits use additional rounds of k-means clustering from the Quake and DeDrift papers for better empirical results.
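The hand-off between the insert path and the maintenance path can be sketched as a threshold check that only enqueues a pointer to the oversized leaf; the actual split runs later in a separate consumer. The threshold value, names, and in-memory queue below are all illustrative stand-ins for QFoundation's task-queue layer:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of size-triggered maintenance hand-off: the insert transaction stays
// cheap by only enqueuing a pointer to the oversized leaf; a task consumer
// later performs the split. Threshold and names are assumptions.
public class MaintenanceSketch {
    static final int MAX_LEAF_SIZE = 128;                // assumed split threshold
    final Queue<String> splitQueue = new ArrayDeque<>(); // stand-in for the task-queue layer

    // Called after each insert; returns true if a split task was enqueued.
    boolean afterInsert(String leafId, int leafSize) {
        if (leafSize > MAX_LEAF_SIZE) {
            splitQueue.add(leafId); // enqueue a pointer, not the work itself
            return true;
        }
        return false;
    }
}
```

Keeping the queued item to a pointer is what makes the fair-processing scheme work: the consumer can round-robin over per-index sub-queues instead of draining one busy index's backlog first.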
Search:
Beam search locates the best leaf candidates, then we collect all the candidates that might be in the top-k nearest neighbours with ~99% probability. We then load the raw candidate vectors from the DB and select the true k nearest neighbours. The recall rate can be increased with more beams (to mask drift in the index, the true source of inaccuracies).
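The pipeline above can be sketched in three steps: pick the closest leaves by centroid distance (the beam), pool their members as candidates, then rerank on exact distances to produce the final top-k. A flat single level is shown for brevity, where the real index descends a multi-level tree; all names here are illustrative:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the search pipeline: beam over leaf centroids -> candidate pool
// -> exact-distance rerank -> top-k. Single flat level for brevity.
public class BeamSearchSketch {
    record Leaf(float[] centroid, List<float[]> vectors) {}

    static double distSq(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    static List<float[]> search(List<Leaf> leaves, float[] q, int beams, int k) {
        // 1) beam: keep the `beams` leaves whose centroids are closest to the query
        List<Leaf> best = leaves.stream()
                .sorted(Comparator.comparingDouble(l -> distSq(l.centroid(), q)))
                .limit(beams)
                .toList();
        // 2) candidates: every vector stored in those leaves
        List<float[]> candidates = new ArrayList<>();
        for (Leaf l : best) candidates.addAll(l.vectors());
        // 3) rerank by exact distance on the raw vectors and keep the top k
        candidates.sort(Comparator.comparingDouble(v -> distSq(v, q)));
        return candidates.subList(0, Math.min(k, candidates.size()));
    }
}
```

More beams widen step 1, which is why they mask index drift: a vector sitting in a slightly "wrong" leaf is still swept into the candidate pool.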
Observability:
There is an optional vector-index dashboard on the admin UI, but metrics also flow to your own Micrometer collectors, such as Prometheus.
Recall results:
Recall@10 was measured using high-quality LLM embedding vectors (Wikipedia article embeddings). Vectors were inserted in batches of 32 per transaction, with insertion throttled while maintenance tasks were outstanding, to simulate natural incremental index growth.
| Model | Dimensions | Vector Count | Quantization | Reranking | Beams | Recall |
|---|---|---|---|---|---|---|
| Qwen 3 | 4096 | 100,000 | 1-bit | Yes | 64 | 100.0% |
| Qwen 3 | 4096 | 100,000 | 1-bit | Yes | 32 | 99.1% |
| Qwen 3 | 4096 | 100,000 | 1-bit | Yes | 16 | 97.7% |
| Qwen 3 | 4096 | 100,000 | 1-bit | Yes | 8 | 94.3% |
| Qwen 3 | 4096 | 500,000 | 1-bit | Yes | 64 | 99.0% |
| Qwen 3 | 4096 | 500,000 | 1-bit | Yes | 32 | 97.5% |
| Qwen 3 | 4096 | 500,000 | 1-bit | Yes | 16 | 95.6% |
| Qwen 3 | 4096 | 500,000 | 1-bit | Yes | 8 | 92.1% |
| OpenAI | 1536 | 100,000 | 1-bit | Yes | 64 | 99.2% |
| OpenAI | 1536 | 100,000 | 1-bit | Yes | 32 | 97.7% |
| OpenAI | 1536 | 100,000 | 1-bit | Yes | 16 | 95.7% |
| OpenAI | 1536 | 100,000 | 1-bit | Yes | 8 | 92.3% |
| OpenAI | 1536 | 1,000,000 | 1-bit | Yes | 64 | 95.6% |
| OpenAI | 1536 | 1,000,000 | 1-bit | Yes | 32 | 92.3% |
| OpenAI | 1536 | 1,000,000 | 1-bit | Yes | 16 | 87.2% |
| OpenAI | 1536 | 1,000,000 | 1-bit | Yes | 8 | 80.1% |
How to try:
- Create a Java application using the Quarkus framework
- Include the QFoundation extension
- Turn on the Vector layer and the Messaging layers of QFoundation:
```properties
qfoundation.layers.vectors=true
qfoundation.layers.messaging=true
```
- Create an index and insert some vectors:
```java
@Inject
Persistence persistence;

@Inject
VectorIndexer vectorIndexer;

@Transactional
void createIndexAndInsertVectors() {
    var index = vectorIndexer.createIndex(4096); // Uses default parameters

    // Create 3 raw vector entities and insert them into FDB
    var v1 = persistence.persist(new Vector(id1, floatArray1));
    var v2 = persistence.persist(new Vector(id2, floatArray2));
    var vN = persistence.persist(new Vector(idN, floatArrayN));

    // Insert all 3 vectors into this one index
    vectorIndexer.insert(index, List.of(v1, v2, vN));
}
```
- Launch ANN searches:
```java
// Returns the top-k approximate nearest neighbours to the query vector from a given index
List<SearchResult> results = vectorSearcher.nearestNeighbors(
        index,
        queryVector,
        new SearchSettings(
                10,   // Top-k results to retrieve
                32,   // Beam count for search
                10.0, // Reranking cap: fetch at most k × 10.0 raw vectors
                true, // Enable cache
                true  // Enable reranking
        )
);
```
- It works as-is as a JVM application, or built as a native GraalVM executable
- More details in the QFoundation docs.