Hi,
I’m trying to evaluate the relevance of FoundationDB for analytical workloads: large aggregations/joins that require sequential reads over wide key ranges, plus heavy bulk inserts. Think MapReduce/Presto-style workflows.
Looking at the docs (quotes below), I understand that analytics are not built into the core, but could be implemented in a layer using sequential range scans.
I looked at a number of layers/projects built on FDB, and I couldn’t find good examples of analytical usage. Did I miss something, or should I conclude that this is just not a good fit for FDB?
Assuming we work around the transaction duration and size limits (i.e. break the scan down into several transactions to stay under the 5-second / 10 MB defaults), what maximum throughput could be achieved per client on sequential scans?
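To make the workaround concrete, here is a minimal sketch of the chunked scan I have in mind, using the Python bindings (the chunk size, API version, and the absence of retry handling are all simplifications on my part):

```python
import fdb

fdb.api_version(710)  # assumes a 7.1 client; adjust to your installed version
db = fdb.open()

def scan_range(db, begin, end, chunk=10_000):
    """Scan [begin, end) sequentially, starting a fresh transaction for
    each chunk to stay under the 5 s / 10 MB transaction limits.
    Note: the scan as a whole is no longer a single consistent snapshot."""
    cursor = begin
    while True:
        tr = db.create_transaction()  # read-only, so no commit needed
        kvs = list(tr.get_range(cursor, end, limit=chunk))
        if not kvs:
            return
        for kv in kvs:
            yield kv.key, kv.value
        # Resume just past the last key seen in this chunk.
        cursor = fdb.KeySelector.first_greater_than(kvs[-1].key)
```

Even with that pattern, my question stands about the per-client throughput ceiling.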
My guess is that scanning scalar values from FDB (i.e. one integer per key) would be slower than storage engines purpose-built for large range scans, so analytics on FDB would probably require packing vectors into each value rather than storing single values.
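To illustrate what I mean by vectors, a rough sketch (the `metrics` subspace, block layout, and little-endian int64 encoding are just assumptions for the example):

```python
import struct
import fdb

fdb.api_version(710)
db = fdb.open()

metrics = fdb.Subspace(('metrics',))  # hypothetical subspace for the example

@fdb.transactional
def write_block(tr, series_id, block_no, values):
    # Pack many int64s into one value (FDB values can be up to 100 kB),
    # amortizing per-key overhead on both insert and scan.
    tr[metrics.pack((series_id, block_no))] = struct.pack('<%dq' % len(values), *values)

@fdb.transactional
def read_series(tr, series_id):
    # One range read returns whole blocks; a long series would still
    # need the chunked-scan pattern sketched above.
    out = []
    for _, v in tr[metrics[series_id].range()]:
        out.extend(struct.unpack('<%dq' % (len(v) // 8), v))
    return out
```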
Or maybe it’s an anti-pattern to use FDB as the primary storage for such use cases, and FDB is a better fit as a transactional metadata layer combined with external data storage (such as blob storage, as in Datadog’s Husky)?
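The pattern I imagine for that last option, sketched with hypothetical names (the `segments` subspace and value layout are mine, not taken from Husky):

```python
import fdb

fdb.api_version(710)
db = fdb.open()

segments = fdb.Subspace(('segments',))  # hypothetical subspace

@fdb.transactional
def publish_segment(tr, dataset, segment_id, blob_url, row_count):
    # The bulk data lives in object storage; FDB only records,
    # atomically, that the segment exists and where to find it.
    tr[segments.pack((dataset, segment_id))] = fdb.tuple.pack((blob_url, row_count))

@fdb.transactional
def list_segments(tr, dataset):
    # Query-planner side: one cheap range read lists the blobs to fetch.
    return [fdb.tuple.unpack(v) for _, v in tr[segments[dataset].range()]]
```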
Links to the docs:
https://apple.github.io/foundationdb/anti-features.html
Analytic frameworks are outside the scope of the FoundationDB core. However, because the core can use range reads to efficiently scan large swaths of data, analytics can be implemented within a layer, possibly as part of a query language.
https://apple.github.io/foundationdb/features.html
However, because it is an ordered key-value store, FoundationDB can use range reads to efficiently scan large swaths of data. Thus, FoundationDB can be effectively used for online analytical processing (OLAP) workloads as well.