I’m trying to evaluate the relevance of FoundationDB for analytical workloads: typically large aggregations/joins that require scans/reads over large ranges of data, plus heavy bulk inserts. Think map-reduce/Presto-style workflows.
Looking at the docs (quotes below), I understand that analytical workflows are not built in, but could be implemented using sequential scans.
I tried looking at a number of layers / projects built on FDB, and I couldn’t find good examples of analytical usage. Did I miss something, or should I conclude that this kind of workload is just not a good fit for FDB?
Assuming we work around the transaction duration & size limits (i.e. break the scan down into several transactions), what maximum throughput could be achieved per client on sequential scans?
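To make the question concrete, here is roughly the kind of chunked scan I have in mind, using the Python binding (just a sketch; the API version, chunk size, and key range are placeholders, and I’ve left out retry handling):

```python
import fdb

fdb.api_version(710)  # placeholder; use the API version matching your cluster
db = fdb.open()


def scan_range(db, begin, end, chunk=10_000):
    """Scan [begin, end) in fixed-size chunks, one transaction per chunk,
    so that no single transaction runs into the 5 s duration limit."""
    cursor = begin
    while True:
        tr = db.create_transaction()
        kvs = list(tr.get_range(cursor, end, limit=chunk,
                                streaming_mode=fdb.StreamingMode.want_all))
        if not kvs:
            return
        for kv in kvs:
            yield kv.key, kv.value
        # resume just after the last key returned by the previous chunk
        cursor = fdb.KeySelector.first_greater_than(kvs[-1].key)
```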
My guess is that scanning single-value keys from FDB (i.e. one integer per key) would be slower than storage solutions built for high-scale range scans, so analytics on FDB would probably require storing vectors of values per key rather than single values.
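Concretely, instead of one key per sample, I’d imagine packing blocks of samples into a single value, something like this (again just a sketch; the subspace layout and the float64 encoding are made up for illustration):

```python
import struct

import fdb

fdb.api_version(710)  # placeholder
db = fdb.open()

metrics = fdb.Subspace(('metrics',))  # hypothetical keyspace layout


@fdb.transactional
def write_block(tr, series_id, block_index, values):
    # One key per block of float64 samples rather than one key per sample,
    # to amortise per-key overhead on writes and range scans.
    tr[metrics.pack((series_id, block_index))] = struct.pack(f'<{len(values)}d', *values)


@fdb.transactional
def read_series(tr, series_id):
    out = []
    for _, v in tr[metrics.range((series_id,))]:
        out.extend(struct.unpack(f'<{len(v) // 8}d', v))
    return out
```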
Or maybe it’s an anti-pattern to use FDB as the primary storage for such use cases, and FDB is a better fit as a transactional metadata layer combined with an external data store (such as blob storage, as in Datadog’s Husky)?
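In that model FDB would only hold the transactional catalog, along these lines (hypothetical layout; the actual columnar/bulk files would live in object storage):

```python
import fdb

fdb.api_version(710)  # placeholder
db = fdb.open()

catalog = fdb.Subspace(('catalog',))  # hypothetical layout


@fdb.transactional
def commit_partition(tr, table, partition, object_uri):
    # FDB only records, transactionally, which immutable object file
    # makes up each partition; the bulk data itself lives in blob storage.
    tr[catalog.pack((table, partition))] = object_uri.encode()


@fdb.transactional
def list_partitions(tr, table):
    return [(catalog.unpack(k)[1], v.decode())
            for k, v in tr[catalog.range((table,))]]
```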
From the docs:
Analytic frameworks are outside the scope of the FoundationDB core. However, because the core can use range reads to efficiently scan large swaths of data, analytics can be implemented within a layer, possibly as part of a query language.
However, because it is an ordered key-value store, FoundationDB can use range reads to efficiently scan large swaths of data. Thus, FoundationDB can be effectively used for online analytical processing (OLAP) workloads as well.