Is it possible to release FoundationDB's SQL layer?
FoundationDB's Table Groups aren't something you see often in RDBMSs.
The SQL Layer is not being actively developed.
Although there seem to be older versions of the SQL layer on the internet, they aren't actively maintained. If Apple provides the last known working version, the community would be interested in maintaining it.
It's a very different dialect from ANSI SQL. You might as well develop against the API rather than rewrite the SQL your app emits to match what the SQL layer supports.
What do you mean? The documented syntax doesn't seem that different. The most glaring limitations are the 5-second transaction duration limit and the total transaction size limit. They also have a syntax for creating and modifying Table Groups.
Also, a strange SQL syntax didn't stop the adoption of Presto or Spark SQL. People found uses for those systems.
The ability to be more abstract than key-values is important. I can imagine someone completing work on a document store layer in a reasonable amount of time, but SQL is a large amount of work.
The original FDB site described PostgreSQL-based drivers. I know that client access is a huge amount of work.
I think the SQL layer had the potential to be a great product when it had a great team behind it. But that team has moved on. I would hesitate to recommend trying to resurrect it without a team that is up to building and maintaining a full-fledged relational database, which is a big job. I would definitely not recommend digging it up and running it in production without first trying to organize such a team.
Could the use of https://calcite.apache.org speed up the SQL layer development?
I don’t know a lot about Calcite. It provides a SQL parser, which is a mostly trivial part of the job. It provides a planner (framework), which is a big part of the job, but it’s pretty hard to say how good the framework is or how much mileage you will get out of what’s already there. It claims not to provide an execution engine, but it pretty clearly does (arguably a couple of them, corresponding to different “calling conventions”). Unfortunately none of these calling conventions are asynchronous, and I think that’s going to be required to get great performance. So a project to build a serious performant SQL database on Calcite and FoundationDB would almost certainly have to create a new fast asynchronous calling convention and a new set of operators for it. Then of course at least one physical data representation (to your taste) and logical-to-physical mapping layer has to be built and optimized (including various index types, metadata, background index construction, etc). And a serious test framework that will keep you honest.
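To make the "asynchronous calling convention" point concrete, here is a minimal sketch of what operator-to-operator calls might look like if they were asynchronous: each operator is an async generator, so a slow storage read never blocks the thread. This is purely illustrative, using Python's asyncio and an in-memory list standing in for asynchronous range reads against FoundationDB; none of the names here come from Calcite or the SQL layer.

```python
import asyncio

# Hypothetical asynchronous calling convention: each operator is an
# async generator that yields rows downstream.

async def scan(rows):
    for row in rows:
        await asyncio.sleep(0)      # stand-in for an async KV read
        yield row

async def filter_op(pred, child):
    async for row in child:
        if pred(row):
            yield row

async def project(cols, child):
    async for row in child:
        yield tuple(row[c] for c in cols)

async def run_plan(plan):
    return [row async for row in plan]

# Plan for: SELECT name FROM employees WHERE salary > 100
employees = [
    {"name": "ann", "salary": 120},
    {"name": "bob", "salary": 80},
]
plan = project(["name"], filter_op(lambda r: r["salary"] > 100, scan(employees)))
result = asyncio.run(run_plan(plan))
# result == [("ann",)]
```

With synchronous operators, every storage wait stalls the whole pipeline; with this shape, the scheduler can overlap reads with downstream work, which is why a new set of operators would be needed rather than reusing Calcite's existing calling conventions.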
Then all that’s left is the labor of Sisyphus: trying to get the planner to produce efficient plans for an endless list of queries, without breaking the correctness or performance of any others.
Hi, I’m the original author of Apache Calcite. You’re pretty much on the money. It is a lot of work (you should ask the Apache Phoenix folks, who built a SQL layer on Apache HBase).
What is even more work than building a SQL layer using Calcite is to build one from scratch (again, ask the Phoenix folks).
I think the way to start is to build implementations of the core relational operators: scan, project, filter, sort, union-all. You can skip join for now. Support a small set of operators (arithmetic operators and a few basic string operations), and create a JSON/REST API to execute queries. Then you will be able to add Calcite as a layer above that.
I wouldn’t bother creating any new data structures (such as indexes) or algorithms (such as join). Just expose what FoundationDB does already.
If you’re looking for a data format I recommend Apache Arrow.
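The starter set of operators suggested above (scan, project, filter, sort, union-all) can be sketched in a few lines. This is a toy illustration over a plain in-memory dict standing in for FoundationDB's ordered key space; all names are illustrative, not part of any real FDB layer.

```python
# Core relational operators over a sorted in-memory dict that stands in
# for FoundationDB's ordered key-value store.

def scan(kv, prefix):
    """Yield rows whose key starts with prefix, in key order."""
    for k in sorted(kv):
        if k.startswith(prefix):
            yield kv[k]

def project(cols, child):
    return ({c: row[c] for c in cols} for row in child)

def filter_op(pred, child):
    return (row for row in child if pred(row))

def sort(key, child):
    return iter(sorted(child, key=key))

def union_all(*children):
    for child in children:
        yield from child

kv = {
    "emp/1": {"id": 1, "name": "bob", "dept": "eng"},
    "emp/2": {"id": 2, "name": "ann", "dept": "ops"},
}
rows = list(sort(lambda r: r["name"],
                 project(["name"],
                         filter_op(lambda r: r["dept"] in ("eng", "ops"),
                                   scan(kv, "emp/")))))
# rows == [{"name": "ann"}, {"name": "bob"}]
```

Because FDB range reads already return keys in order, `scan` over a key prefix maps naturally onto a single range read, which is part of why this subset is a sensible starting point before tackling joins or new index structures.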
How could I use FoundationDB to replace MySQL/PostgreSQL in existing massive projects? The lack of SQL capability makes such a migration inconvenient. Alternatively, is it possible to use TiDB as a SQL layer for FoundationDB, since TiDB also has a separate design for the SQL layer and the storage engine?
The lack of SQL is indeed inconvenient for such a migration. It'd require rewriting your SQL queries into code by hand, and translating the SQL data model over to key-value pairs. Some form of layer that is relational, or relational-like, would indeed greatly simplify this transition.
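For a sense of what "translating the SQL data model over to key-value pairs" involves, here is one common mapping, loosely modeled on FoundationDB's tuple-layer conventions: one KV pair per column, plus an entry per secondary index. The key shapes and the `row_to_kv` helper are illustrative assumptions, not the SQL layer's actual on-disk format.

```python
# Illustrative row-to-KV mapping. Keys are tuples here; a real layer
# would serialize them with FDB's tuple encoding to preserve ordering.

def row_to_kv(table, pk, row, indexes=()):
    """Return the KV pairs for one row plus its secondary index entries."""
    pairs = {}
    for col, val in row.items():
        pairs[(table, "data", pk, col)] = val             # one pair per column
    for col in indexes:
        pairs[(table, "index", col, row[col], pk)] = b""  # index entry, empty value
    return pairs

pairs = row_to_kv("users", 42, {"name": "ann", "email": "a@x.com"},
                  indexes=("email",))
# A primary-key lookup becomes a range read on ("users", "data", 42, ...);
# a lookup by email becomes a range read on
# ("users", "index", "email", "a@x.com", ...).
```

The migration pain the post describes is exactly this: every query in the application has to be rewritten by hand into the corresponding range reads and index maintenance that a SQL layer would otherwise generate for you.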
Technically speaking, yes. However, the performance isn’t going to be great.
TiDB does transactions by writing keys to represent locks, and then running a locking-based commit protocol. (You're welcome to read through the Percolator paper for full details.) Doing this on top of FoundationDB is wasteful, because FoundationDB already supports transactions, and slow, because committing a single key is slower on FDB than on TiKV. However, one can't outright remove Percolator from TiDB, because FDB doesn't support transactions that last longer than 5 seconds, whereas TiDB does. So a hybrid approach would be needed: doing Percolator pre-writes in batches if a transaction lasts longer than a few seconds, and figuring out how to reconcile the timestamp oracle with the FDB master role, which performs the same job.
It could still work, but I don’t think FDB would just be an easy drop-in replacement for TiKV in this usage. But, you know, @siddontang would probably know better than I.
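The batching idea mentioned above, splitting a logical write too big for one 5-second FDB transaction into chunks, each committed in its own short transaction, looks roughly like this. It is a hypothetical sketch: `commit` stands in for a real FDB transaction, and atomicity across chunks (what Percolator-style locks would provide) is deliberately omitted.

```python
# Sketch of batched pre-writes: a large logical write is committed in
# several small transactions, each safely under FDB's 5-second limit.
# Cross-chunk atomicity would still require Percolator-style locking,
# which this sketch does not implement.

def commit(store, batch):
    store.update(batch)             # stand-in for one short FDB transaction

def batched_write(store, writes, batch_size=100):
    items = list(writes.items())
    for i in range(0, len(items), batch_size):
        commit(store, dict(items[i:i + batch_size]))

store = {}
batched_write(store, {f"k{i}": i for i in range(5)}, batch_size=2)
# store now holds all five keys, written across three commits
```

This is why the hybrid approach is awkward: once writes span multiple FDB transactions, you lose FDB's native atomicity and are back to maintaining lock records, which is the overhead you hoped to remove by replacing TiKV.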
TiDB can work without Percolator transactions. All you need to do is implement your own storage from here. So I think it is doable to run TiDB on FDB directly, but I haven't tried it yet.
Why was development on the SQL layer stopped? I heard there are some inherent performance issues when you try to build a SQL layer on top of a KV store, as opposed to building SQL "natively". Can someone comment, please?
Building a SQL layer on top of a KV store has been attempted before. For instance, Google's Spanner is built on top of SSTables, which are essentially KV files.
My guess is that the SQL layer was stopped for lack of a dedicated team behind it, and maybe not so much demand?
IMHO, there's no inherent limitation to implementing a SQL layer on top of a KV store. Many databases do this. But the problem is that a SQL layer is not a simple module, as @dave mentioned above. It requires many components: physical storage object definition, logical schema management, datatype processing, expression evaluation, query optimization, query compilation, a query execution engine, and statistics gathering. If one considers distributed/parallel query evaluation, the problem gets bigger. It definitely needs a dedicated team, including testing and QA people. From my experience, it takes more than 3 years to develop a query engine that can be used in production. But it is possible if a company decides to invest that much manpower and time in the project, with a lead team having seasoned experience and mature design skills.
Check out this simplified SQL layer built on FoundationDB for tick-series data: https://github.com/opentradesolutions/opentick