FDB C API and range retrieval has duplicates?

Hi,

I’m evaluating using FDB in one of my projects and during learning of the API
a couple of observations came up:

  1. When using FDBStreamingMode::FDB_STREAMING_MODE_WANT_ALL the first read returns 20 records and any subsequent read returns only 1 record at a time. Why?
  2. When using FDBStreamingMode::FDB_STREAMING_MODE_ITERATOR I always get 1 record at a time. Why?
  3. When retrieving key value pairs for my range (which has about 16,000 entries) I receive duplicates more than 50% of the time resulting in the abysmal performance. What am I doing wrong? My code and setup are explained below.
  4. The results don’t arrive in sequential order. Is there a reason for this kind of behavior? Is this how FDB is supposed to work?

My setup is a FDB running on macOS on an SSD with 2 processes running locally for evaluation purposes.

My code is below.

Please advise.

std::optional<std::vector<char>>
storage::getValue(FDBDatabase *db, const std::string& key0)
{
    fdb_bool_t more = true;
    vector<char> bytes;
    auto key = std::format("{}:{:0>8x}", key0, 0);
    auto endkey = std::format("{}:{:0>8x}", key0, std::numeric_limits<uint32_t>::max());
    int chunk = 0;
    unordered_map<string, vector<char>> chunks;

    while (more) {
        FDBTransaction *transaction = nullptr;
        if (auto err = fdb_database_create_transaction(db, &transaction)) {
            println("error: {}", fdb_get_error(err));
            throw runtime_error("fdb error");
        }

        auto future = fdb_transaction_get_range(transaction,
          FDB_KEYSEL_FIRST_GREATER_OR_EQUAL(reinterpret_cast<const uint8_t *>(key.data()), key.size()),
          FDB_KEYSEL_FIRST_GREATER_OR_EQUAL(reinterpret_cast<const uint8_t *>(endkey.data()), key.size()),
          0 /* no limit */,
          0 /* no byte limit */,
          FDBStreamingMode::FDB_STREAMING_MODE_ITERATOR,
          chunk+1, 0, 0);

        if (auto err = fdb_future_block_until_ready(future)) {
            println("error: {}", fdb_get_error(err));
            fdb_future_destroy(future);
            fdb_transaction_destroy(transaction);
            throw runtime_error("fdb error");
        }

        if (auto err = fdb_future_get_error(future)) {
            println("error: {}", fdb_get_error(err));
            fdb_future_destroy(future);
            fdb_transaction_destroy(transaction);
            throw runtime_error("fdb error");
        }

        const FDBKeyValue *out = nullptr;
        int count = 0;
        if (auto err = fdb_future_get_keyvalue_array(future, &out, &count, &more)) {
            println("error: {}", fdb_get_error(err));
            fdb_future_destroy(future);
            fdb_transaction_destroy(transaction);
            throw runtime_error("fdb error");
        }

        // stitch the chunks together
        for (auto i : views::iota(0, count)) {
            auto k = std::string{reinterpret_cast<const char *>(out[i].key),
              static_cast<size_t>(out[i].key_length)};
            println("received key: {}", k);

            vector<char> bvec;
            bvec.insert(bvec.end(), out[i].value, out[i].value + out[i].value_length);
            chunks.emplace(k, std::move(bvec));
        }

        // use the last key as the new first key
        if (count > 0) {
            key.resize(out[count-1].key_length);
            auto beg = &out[count-1].key[0];
            auto end = &out[count-1].key[key.size()];
            vector<uint8_t> intermediate(key.size());
            std::copy(beg, end, intermediate.begin());
            ++chunk;
            key = std::format("{}:{:0>8x}", key0, chunk);
        }

        fdb_future_destroy(future);
        fdb_transaction_destroy(transaction);
    }

    if (chunks.empty()) {
        return std::nullopt;
    }

    vector<string> sortedKeys;
    for (auto& el : chunks | std::views::keys) {
        sortedKeys.emplace_back(el);
    }
    std::ranges::sort(sortedKeys);

    for (const auto& k : sortedKeys) {
        bytes.insert(bytes.end(), chunks[k].begin(), chunks[k].end());
    }
    return std::move(bytes);
}

chunk is being used both as the iteration counter to fdb_transaction_get_range and as the starting key for range scans after the first. I would expect the latter to be derived from intermediate, along the lines of the comment.

In addition, once the difference between using chunk vs the last key as the next iterations start key, there may be a problem with the first arg to fdb_transaction_get_range always using FDB_KEYSEL_FIRST_GREATER_OR_EQUAL. I believe it should be FDB_KEYSEL_FIRST_GREATER for all entries except for the first one. Otherwise, you’ll end up receiving the last key in each chunk multiple times.

You can see an example of our Python iterator for how the API expects to be called: foundationdb/bindings/python/fdb/impl.py at 4fd61dd9f661eb7c9446a5b70f0c7a13c718badf · apple/foundationdb · GitHub I wouldn’t expect duplicates or out of order results. I’d also add that FDB’s byte ordering is lexicographic. It looks like your {:0>8x} formatting suffix should keep things in order, assuming the chunk value never exceeds 65k.

I’m not sure that explains why each request would return results one record at a time, though. The ITERATOR streaming mode attempts to minimize time to first byte, and so it should return relatively few records in the first few chunks, but that should increase with each response. The WANT_ALL iterator mode should attempt to return 80 kB chunks with each response, so it shouldn’t make a difference what the iteration value is.