OLAP / sequential scan performance

Bourse_Willis · November 11, 2023, 4:45pm

Hi,

I’m trying to evaluate whether FoundationDB is a good fit for analytical workloads: typically large aggregations/joins that need scans over large key ranges, plus heavy bulk inserts. Think map-reduce/Presto workflows.

Looking at the docs (quotes below), I understand that analytical workflows are not built-in, but could be implemented using sequential scans.

I looked at a number of layers / projects based on FDB, and I couldn’t find good examples of analytical usage. Did I miss something, or should I conclude that it’s just not a good fit for FDB?
Assuming we work around the transaction duration & size limits (i.e. break the scan down into several transactions), what maximum throughput could be achieved per client on sequential scans?
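
To make "break down into several transactions" concrete, here is a rough sketch of the pattern I have in mind, not a tested FDB client. `scan_in_chunks` and `make_in_memory_reader` are hypothetical names of my own, and the in-memory reader is only a stand-in so the loop can run without a cluster. With the real Python binding I'd assume `read_range` would wrap one short transaction per call around something like `tr.get_range(begin, end, limit=limit)`.

```python
from bisect import bisect_left


def scan_in_chunks(read_range, begin, end, chunk_limit=1000):
    """Yield every (key, value) in [begin, end) using bounded reads.

    read_range(begin, end, limit) is a hypothetical callable that returns
    at most `limit` pairs, in key order, with begin <= key < end.
    Each call is meant to map to one short transaction.
    """
    cursor = begin
    while True:
        rows = read_range(cursor, end, chunk_limit)
        yield from rows
        if len(rows) < chunk_limit:
            return  # a short chunk means the range is exhausted
        # Resume just past the last key: key + b"\x00" is the smallest
        # byte string that sorts strictly after key.
        cursor = rows[-1][0] + b"\x00"


def make_in_memory_reader(sorted_rows):
    """Stand-in for a database: serves range reads from a sorted list."""
    keys = [k for k, _ in sorted_rows]

    def read_range(begin, end, limit):
        lo = bisect_left(keys, begin)
        hi = bisect_left(keys, end)
        return sorted_rows[lo:min(hi, lo + limit)]

    return read_range
```

Note that the chunks are read in separate transactions, so the scan as a whole would not see a single consistent snapshot.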

My guess is that scanning single-value types from FDB (i.e. one integer per key) would be slower than other storage solutions built for high-scale range scans, so analytics on FDB would probably require storing vectors rather than single values.
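
By "storing vectors" I mean something like the sketch below: pack a block of integers into one value, so a scan pays the per-key cost once per block instead of once per integer. The function names and the block size are assumptions of mine for illustration. A block of 1,000 signed 64-bit integers is 8,000 bytes, which, as far as I understand, is well under FDB's documented 100,000-byte value limit.

```python
import struct


def pack_block(values):
    """Pack signed 64-bit integers into one little-endian byte string."""
    return struct.pack("<%dq" % len(values), *values)


def unpack_block(blob):
    """Inverse of pack_block: decode a byte string back into a list of ints."""
    count = len(blob) // 8  # each integer takes 8 bytes
    return list(struct.unpack("<%dq" % count, blob))


def to_blocks(values, block_size):
    """Split values into (block_index, packed_bytes) pairs.

    The block index would become part of the key; the packed bytes
    would be the value stored under that key.
    """
    return [
        (start // block_size, pack_block(values[start:start + block_size]))
        for start in range(0, len(values), block_size)
    ]
```

The trade-off is that updating a single integer then means rewriting its whole block.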

Or maybe it’s an anti-pattern to use FDB as primary storage for such use cases, and FDB is a better fit as a transactional metadata layer used in combination with external data storage (such as blob storage, as in Datadog’s Husky)?

Links to the docs:

https://apple.github.io/foundationdb/anti-features.html

Analytic frameworks are outside the scope of the FoundationDB core. However, because the core can use range reads to efficiently scan large swaths of data, analytics can be implemented within a layer, possibly as part of a query language.

https://apple.github.io/foundationdb/features.html

However, because it is an ordered key-value store, FoundationDB can use range reads to efficiently scan large swaths of data. Thus, FoundationDB can be effectively used for online analytical processing (OLAP) workloads as well.
