Best practices for bulk load

Hello,

I want to test my working data set under FoundationDB and I’m stuck with very slow initial insert speed. I can’t figure out how to make importing faster or what the best practices are here.

I use several concurrent goroutines for inserting, with 5 MB batch sizes. But with a 3-node cluster and a double ssd configuration it is really slow, and from time to time I lose my cluster according to the fdbcli status command:

fdb> status details

Using cluster file `/etc/foundationdb/fdb.cluster'.

Could not communicate with a quorum of coordination servers:
  192.168.1.1:4500:tls  (unreachable)
  192.168.1.2:4500:tls  (unreachable)
  192.168.1.3:4500:tls  (unreachable)

Also I can get:

Data:
  Replication health     - (Re)initializing automatic data distribution
  Moving data            - unknown (initializing)
  Sum of key-value sizes - unknown
  Disk space used        - 265.770 GB

Or:

Data:
  Replication health     - Healthy (Rebalancing)

So my questions are:

  • If I import into a one-node cluster and then add the 2 other nodes, will it be faster? Any thoughts or best practices?
  • Maybe there is a way to make data durability less strict? Disable fsync() for the import stage?

Are you inserting keys in a sequential order?

Try randomizing the order to spread the load around the cluster rather than it all hitting one machine at the beginning or end of the key space you’re writing to.

I don’t think 5 MB is optimal; it’s probably too high. I have never written that much in a transaction during my testing over the last few weeks.

The client library is single-threaded regardless of how many goroutines you start (AFAIK), so splitting the load across multiple processes should also increase parallelism, which is needed to fully use the cluster. This will only work if you’re splitting the keyspace you’re writing to as well.
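
Not the poster’s code, just a rough sketch of what that combination might look like inside each loader process with the Go bindings (github.com/apple/foundationdb/bindings/go); the chunking, shuffling, batch sizes and worker count are my own illustrative choices:

```go
// Rough sketch: shuffle pre-built chunks of key/value pairs so writes land
// all over the keyspace, and have a few workers commit them in modest
// transactions. Chunk size and worker count are illustrative only.
package bulkload

import (
	"log"
	"math/rand"
	"sync"

	"github.com/apple/foundationdb/bindings/go/src/fdb"
)

type kv struct {
	Key   fdb.Key
	Value []byte
}

// LoadChunks inserts every chunk in random order using `workers` goroutines,
// one transaction per chunk.
func LoadChunks(db fdb.Database, chunks [][]kv, workers int) {
	// Randomize the chunk order instead of inserting sequentially.
	rand.Shuffle(len(chunks), func(i, j int) { chunks[i], chunks[j] = chunks[j], chunks[i] })

	work := make(chan []kv)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for chunk := range work {
				// Keep each chunk small (e.g. a few hundred KB) and well
				// under the 5-second transaction limit.
				_, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
					for _, p := range chunk {
						tr.Set(p.Key, p.Value)
					}
					return nil, nil
				})
				if err != nil {
					// Real code would retry or collect errors.
					log.Println("chunk failed:", err)
				}
			}
		}()
	}
	for _, c := range chunks {
		work <- c
	}
	close(work)
	wg.Wait()
}
```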

Are you inserting keys in a sequential order?

Yes, I’m doing a full scan from another B-tree-like DB and inserting into FoundationDB.

I don’t think 5 MB is optimal; it’s probably too high. I have never written that much in a transaction during my testing over the last few weeks.

The client library is single-threaded regardless of how many goroutines you start (AFAIK), so splitting the load across multiple processes should also increase parallelism, which is needed to fully use the cluster. This will only work if you’re splitting the keyspace you’re writing to as well.

Thank you for sharing your experience and thoughts. Will try to use several processes and sort out how to randomize keys.

I got the best results in Go by starting a goroutine for each transaction, having each one do its encoding/packing and then perform the actual transaction, because they do some work up front and then block for a while.

With that, I can insert a rough average of 60k key-values per second, or around 10k transactions, on a single-node cluster.

Normally in Go, this practice will overload the scheduler and cause a deadlock, so it does seem counterintuitive, but in this case it is how I got the best performance.

I don’t need to set an upper bound on the number of goroutines, though you might want to just in case.
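
Purely as an illustration of that pattern (not the poster’s actual code), a goroutine-per-transaction loader might look roughly like this; the tuple keys, values and batch counts are placeholders:

```go
// Sketch of the goroutine-per-transaction pattern: each goroutine first does
// its CPU work (packing keys), then spends most of its life blocked on the
// commit. Keys, values and batch counts are made up.
package main

import (
	"log"
	"sync"

	"github.com/apple/foundationdb/bindings/go/src/fdb"
	"github.com/apple/foundationdb/bindings/go/src/fdb/tuple"
)

func main() {
	fdb.MustAPIVersion(620) // use the API version matching your installed client
	db := fdb.MustOpenDefault()

	var wg sync.WaitGroup
	for batch := 0; batch < 1000; batch++ {
		wg.Add(1)
		go func(batch int) {
			defer wg.Done()

			// Encoding/packing happens up front, before the transaction.
			keys := make([]fdb.Key, 0, 100)
			for i := 0; i < 100; i++ {
				keys = append(keys, fdb.Key(tuple.Tuple{"bulk", batch, i}.Pack()))
			}

			// Then the goroutine mostly blocks waiting for the commit.
			if _, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
				for _, k := range keys {
					tr.Set(k, []byte("value"))
				}
				return nil, nil
			}); err != nil {
				log.Printf("batch %d failed: %v", batch, err)
			}
		}(batch)
	}
	wg.Wait()
}
```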

If you are bulk loading data that cannot possibly conflict with any other transactions, you may want to look at setting the SetReadYourWritesDisable option, and maybe also play with SetNextWriteNoWriteConflictRange.

WARNING: this may break some ACID guarantees, because these writes will not be seen by concurrent transactions. You may not care if you are restoring/importing data while nothing else is running, or if you have another flag somewhere to “publish” the data for readers.
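
For reference, here is a minimal sketch of how those two options can be set from the Go bindings on a bulk-load transaction, assuming nothing else touches these keys and nothing reads them back inside the same transaction; the keys and values are placeholders:

```go
// Sketch only: setting the two options mentioned above on a bulk-load
// transaction via the Go bindings. Assumes the load cannot conflict with
// anything else and never reads back what it writes.
package bulkload

import "github.com/apple/foundationdb/bindings/go/src/fdb"

func loadChunk(db fdb.Database, pairs map[string][]byte) error {
	_, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
		// Don't track mutations client-side; we never read them back here.
		if err := tr.Options().SetReadYourWritesDisable(); err != nil {
			return nil, err
		}
		for k, v := range pairs {
			// Applies only to the next write, so it is re-set before every
			// Set() (see the discussion further down).
			if err := tr.Options().SetNextWriteNoWriteConflictRange(); err != nil {
				return nil, err
			}
			tr.Set(fdb.Key(k), v)
		}
		return nil, nil
	})
	return err
}
```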

Technical details

Internally, the fdb client creates an implicit write conflict range for every key that is inserted or mutated. The list of all write conflict ranges is sent to the cluster alongside the mutations when the transaction commits, and the conflict ranges are checked by the cluster (write conflict ranges are also stored so that they can make other transactions conflict!).

So if you do a set('Hello', 'World'), the client will send the byte literals 'Hello', 'World', 'Hello', 'Hello\0' over the network. The last two “Hello”s correspond to the write conflict range (for a single key, the range begins at the key and ends just before the key padded with an extra NUL byte).

This means that - by default - you can have a ~3x amplification of the size of your keys (in memory and on the network; I’m not sure if this is also stored in the logs?).

In practice, I did see a small gain of 20-25% when testing with localhost. I don’t have any data on the impact with remote processes.

By setting NextWriteNoWriteConflictRange before each call to set(..), the client will not create a write conflict range and will only send 'Hello' and 'World' to the proxy, saving some bandwidth.

If you enable ReadYourWritesDisable, then the client will not have to track your mutations internally, saving some CPU cycles. This only works if you don’t read back the mutated keys from the same transaction! Be careful with this; it can break things!

I think that you could also play with manually adding a giant write conflict range that spans the whole range of keys inserted: the client should merge all the single ranges into one larger range and only send that to the cluster. But you will still pay the cost of managing and merging this list.
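
If I understand that idea correctly, a sketch of it in the Go bindings could look like the following; the chunk boundaries are hypothetical, and whether the client really merges the per-key ranges into the big one is exactly the untested part:

```go
// Sketch of the "one giant write conflict range" idea: declare a single
// range covering every key this transaction will write, then write as usual;
// the implicit per-key ranges should then be merged into it on the client
// before commit. Begin/end are placeholders for the real span of the chunk.
package bulkload

import "github.com/apple/foundationdb/bindings/go/src/fdb"

func loadChunkWithOneRange(db fdb.Database, pairs map[string][]byte, begin, end fdb.Key) error {
	_, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
		// One explicit conflict range spanning the whole chunk.
		if err := tr.AddWriteConflictRange(fdb.KeyRange{Begin: begin, End: end}); err != nil {
			return nil, err
		}
		for k, v := range pairs {
			tr.Set(fdb.Key(k), v)
		}
		return nil, nil
	})
	return err
}
```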

Implementation

You can see how the client formats the transaction commit request in Transaction::commitMutations():

You can see how set(...) handles the mutations in Transaction::set() (the addConflictRange argument is controlled by setting SetNextWriteNoWriteConflictRange before every write):

Thank you for the answer!

Yes, that’s my case. Going to try these options.

Now it’s clearer to me, thanks!

Before each tx.Set() call, or can I set it once for the whole transaction at the beginning?

Got it.

This optimisation is for saving network, memory and maybe logs, right?

Unfortunately, it must be before each call, because set(…) resets the flag every time. This means that you cannot call set(…) on the same transaction object concurrently from multiple threads, or they would stomp on each other.
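
A tiny wrapper (sketch only, names are my own) makes the “set the option, then write” pairing harder to forget; like the transaction itself, it must not be called concurrently from multiple goroutines:

```go
// Sketch: wrap the "re-arm the flag, then write" pair so it cannot be
// forgotten. Not safe for concurrent use on the same transaction, for the
// reason explained above.
package bulkload

import "github.com/apple/foundationdb/bindings/go/src/fdb"

func setNoConflict(tr fdb.Transaction, key fdb.Key, value []byte) error {
	// The option only applies to the very next write; Set() consumes it.
	if err := tr.Options().SetNextWriteNoWriteConflictRange(); err != nil {
		return err
	}
	tr.Set(key, value)
	return nil
}
```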

All of the above, but don’t expect a huge difference. I would only recommend this for bulk imports, not for regular transactions, or only for very specific optimizations after having studied all the implications.

Also, I think that a benchmark using this optimization would be somewhat cheating, because it does not test the database under normal conditions.

Interesting. Thank you.

Yeah, I understand this. But I need a fast import tool for future performance tests.

Thank you once again for helping.