Apache Arrow and FoundationDB

omidb · January 21, 2021, 8:05pm

Hi,

The Apache Arrow data format is being used more and more every day. It’s very useful for data analytics because many libraries support it for zero-copy computation (no deserialization cost). Arrow also defines an IPC format for data transfer.

Has anyone used Arrow with FoundationDB? Has anyone implemented Arrow IPC with the current clients? What are the challenges?

Thanks

markus.pilman · January 21, 2021, 8:22pm

We use FlatBuffers for IPC, which can also be read without deserialization. Is there any particular feature you would hope to get from Arrow?

omidb · January 21, 2021, 9:05pm

I’m trying to imagine a scenario where I can load data from FoundationDB directly into Arrow, which would let us go directly to a data frame in many computing libraries (e.g. RAPIDS, pandas, …).
I feel it will involve implementing a lot of low-level C++ code on both sides. I’m wondering if anyone has already done it.

markus.pilman · January 21, 2021, 10:26pm

I am not sure I understand this use case correctly. Do you want to read some data from FDB and then send it directly to RAPIDS? Or do you want to ingest FDB data into another system (for example, to keep an up-to-date copy of your data in a data warehouse)?

omidb · March 13, 2021, 9:39pm

@markus.pilman I want to load data from FDB and use it zero-copy in RAPIDS via the Arrow format.
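
To make the gap concrete, here is a rough sketch of the step that seems unavoidable. An FDB range read hands back row-oriented (key, value) byte pairs, while an Arrow variable-size binary column is a contiguous data buffer plus an int32 offsets buffer with n + 1 entries. The sketch uses only the Python standard library and ignores the validity bitmap; the function names and the sample pairs are hypothetical, and no FDB or Arrow client API is called.

```python
import struct
from typing import Iterable, Tuple

Column = Tuple[bytes, bytes]  # (offsets buffer, data buffer)


def pack_binary_column(values: Iterable[bytes]) -> Column:
    """Pack byte strings into an Arrow-style variable-size binary layout.

    offsets: n + 1 little-endian int32 values; value i is data[off[i]:off[i+1]].
    data:    all values concatenated back to back.
    """
    offsets = [0]
    chunks = []
    for v in values:
        chunks.append(v)
        offsets.append(offsets[-1] + len(v))
    return struct.pack(f"<{len(offsets)}i", *offsets), b"".join(chunks)


def kv_to_columns(pairs: Iterable[Tuple[bytes, bytes]]) -> Tuple[Column, Column]:
    """Split row-oriented (key, value) pairs into a key column and a value column.

    This is where the copy happens: each value is moved from its own
    buffer into one shared, contiguous data buffer.
    """
    keys, vals = [], []
    for k, v in pairs:
        keys.append(bytes(k))
        vals.append(bytes(v))
    return pack_binary_column(keys), pack_binary_column(vals)


def read_value(offsets: bytes, data: bytes, i: int) -> memoryview:
    """Return a view of element i without copying the data buffer."""
    start, end = struct.unpack_from("<2i", offsets, 4 * i)
    return memoryview(data)[start:end]


# Hypothetical stand-in for the result of a range read.
pairs = [(b"user/1", b"alice"), (b"user/2", b"bob")]
(key_offsets, key_data), (val_offsets, val_data) = kv_to_columns(pairs)
second_value = read_value(val_offsets, val_data, 1).tobytes()
```

In practice a library such as pyarrow would build these buffers rather than hand-rolled code. The point is only that the row-to-column step looks like a copy, so zero-copy would start after the data is already in Arrow form.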
