I’ll start-off with a disclaimer that I know it is not an apples to apples comparison between FDB(with Record Layer) and a RDBMS like DB2 for eg, i just want to understand what kind of performance i should expect.
Background
I’m working on moving one of our systems off of DB2 into a record layer fronted FDB cluster.
We have been asked to do a performance evaluation in terms of read and write times that an equivalent query in FDB will take in comparison to a predefined ( somewhat suboptimal ) DB2 query will take.
Schema
Let me start off by defining the table and corresponding proto definition -
Table - Person
ID(int32) | NAME (varchar) | AGE(int32) | GENDER(char)
Primary Key = ID
proto -
message Person {
int32 id = 1[(com.apple.foundationdb.record.field).primary_key = true];
string name = 2; //consider this to be a large string field ~30-40 bytes
int32 age = 3;
uint32 gender = 4; //sigh, no single byte datatypes in proto
}
Indexes
RDBMS -
Index1(id, Name)
Index2(id, Gender)
Record-Layer -
builder.addIndex("Person", new Index("pk_name", field("name"));
builder.addIndex("Person", new Index("pk_name", field("gender"));
Query
Now for the fun part, the actual querying -
Let’s suppose there are ~50 million records in both.
If i were to for eg: fetch 10 million records , where each record size from the proto is ~50 bytes, in a 10 MB transaction i can then fetch at max - 10 * 2^10 * 2^10 / ~50 = ~200,000 records
which means for 10e7 records i’d need 50 transactions to be run against the database, whereas DB2 imposes no such limits.
Opening a transaction, has some cost associated with getting record store , doing a metadata read adn then the actual query( or ideally a scanIndex
) in this case, which is evident in the performance drops we see while querying.
The above only either needs a scanIndex or a primaryKey lookup at most.
The situation gets a lot trickier when comparing JOINS
. Well-defined( i should say Well-written ) joins are blazingly fast in RDBMS, whereas i have seen some not so efficient query-planning in the record layer space .
Would you say that we should be prepared to expect a performance hit when moving from a relational database for a model like the one which i described above ?
Also, would the performance degradation be more evident in case of JOINS ?
Lastly, do we expect to see any improvements in the query planner anytime soon?
Sorry for the long post!