Workloads and what they do

To get this file off of my local computer…

We had a meeting where @Evan walked through the existing workloads and gave a brief description of each, with optional commentary. Below are my notes from it. Some entries are marked ??? where no one actually knew or remembered what the workload did or why it exists.

If any enterprising person wishes to dig through some of the ??? workloads and provide the explanation/commentary instead, your help would be greatly appreciated. Otherwise, I will probably get to it eventually, and then add it as comments to the files.


ApiCorrectness:
Does a set of operations on the database and on an in-memory copy, and then compares the two
Multiple clients can run, but each has their own prefix
Uses more transaction API than ReadsDuringWrites
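
As a rough illustration of the compare-against-memory idea (not the actual workload code), here is a minimal Python sketch. `FakeStore` and `run_client` are hypothetical names, and the store is a plain in-process stand-in rather than a real database connection; the only things taken from the notes are the per-client prefix and the final comparison.

```python
import bisect
import random


class FakeStore:
    """Hypothetical stand-in for the database: an ordered key-value store."""

    def __init__(self):
        self._keys = []  # sorted list of keys
        self._vals = {}  # key -> value

    def set(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def clear_range(self, begin, end):
        # Remove every key in the half-open range [begin, end).
        lo = bisect.bisect_left(self._keys, begin)
        hi = bisect.bisect_left(self._keys, end)
        for k in self._keys[lo:hi]:
            del self._vals[k]
        del self._keys[lo:hi]

    def get_range(self, begin, end):
        lo = bisect.bisect_left(self._keys, begin)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._vals[k]) for k in self._keys[lo:hi]]


def run_client(store, prefix, seed, steps=200):
    """Apply random operations under `prefix` to the store and to a dict model."""
    rng = random.Random(seed)
    model = {}
    for _ in range(steps):
        a = prefix + b"%03d" % rng.randrange(50)
        b = prefix + b"%03d" % rng.randrange(50)
        if rng.random() < 0.7:
            value = b"%d" % rng.randrange(1000)
            store.set(a, value)
            model[a] = value
        else:
            begin, end = min(a, b), max(a, b)
            store.clear_range(begin, end)
            for k in [k for k in model if begin <= k < end]:
                del model[k]
    # Read back everything under this client's prefix and return both views.
    actual = store.get_range(prefix, prefix + b"\xff")
    return actual, sorted(model.items())
```

Each client only touches keys under its own prefix, so several clients can share one store and still compare their slice against their own model.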

AsyncFile:
???
Probably related to tests/AsyncFile
But that’s also a broken test

AtomicOps:
Issues atomic operations into one keyspace, and a log of the operations into another
Verifies that atomic operation results match the log of committed operations
Only does one atomic op type per run
doesn’t check durability
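
A minimal sketch of the results-versus-log check, using plain dicts in place of the two keyspaces. The function names, the key layout, and the choice of ADD as the single op type are all assumptions for illustration; the real workload talks to a database.

```python
import random


def run_atomic_adds(db, log, client_id, seed, ops=100, keys=5):
    """Apply random atomic adds to `db` and record each one in `log`."""
    rng = random.Random(seed)
    for i in range(ops):
        key = "ops/%d" % rng.randrange(keys)
        delta = rng.randrange(1, 10)
        # Assumption: the op and its log entry would be written together,
        # so that the log only ever describes committed operations.
        db[key] = db.get(key, 0) + delta           # stands in for an atomic ADD
        log[("log", client_id, i)] = (key, delta)  # matching log entry


def verify(db, log):
    """Replay the log and check that it reproduces the stored results."""
    expected = {}
    for key, delta in log.values():
        expected[key] = expected.get(key, 0) + delta
    return expected == db
```

Usage: run a few clients against the same `db` and `log`, then call `verify(db, log)`; any operation applied without a log entry (or the reverse) makes it return `False`.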

AtomicOpsApiCorrectness:
Tests results of the atomic ops

AtomicRestore:
starts database, locks database, restores
because of the lock, no data should be lost, so can be paired with tests that don’t expect data loss

AtomicSwitchover:
Similar to above, but switches between two databases instead of doing a restore

BackgroundSelectors:
pretty useless
issues key selectors in the background during other workloads
None of the other tests have coverage for this, so it was added, but it’s never found any issues.
doesn’t verify result, only that reads happen

BackupCorrectness:
Primary tests for backups.
Heavily exercises API.

BackupToDBAbort:
Test that aborts an in-progress DR
Not really sure why here, but it is

BackupToDBCorrectness:
Same as BackupCorrectness, but with DR

BackupToDBUpgrade:
DR test done in combination with an upgrade/restarting test
start backup with older version
finish backup with newer version
verifies that backup is backwards compatible

BulkLoad:
Loads data into the database

BulkSetup:
Performance tests use this to load data
Checks if data is already present, and then doesn’t load
Not used in simulation

ChangeConfig:
Used at the start of every simulation to change configuration into desired one
for multi-region, waits for data to be fully replicated before continuing

ClientTransactionProfileCorrectness:
Client transaction profiling test

CommitBugCheck:
“Regression test for 2 commit related bugs”
no memory of what they were

ConfigureDatabase:
Changes database configuration many times during a test
Not mixable with attrition
Also changes region configurations
Anything that can be toggled via configure should be added to this test

ConflictRange:
designed to ensure that conflict ranges added to transaction are as minimal as possible
checks if two transactions should conflict, runs them, sees if they do
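
The "should they conflict" prediction can be sketched as a range-overlap check. This assumes the usual optimistic model, where the later transaction conflicts if any range it read overlaps a range written by a transaction that committed after its read version. The function names are made up for illustration.

```python
def ranges_overlap(a, b):
    """True if half-open key ranges a = [a0, a1) and b = [b0, b1) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def should_conflict(read_ranges, write_ranges):
    """Predict a conflict between one transaction's reads and another's writes.

    read_ranges:  ranges read by the transaction that commits second
    write_ranges: ranges written by the transaction that committed first
    """
    return any(ranges_overlap(r, w) for r in read_ranges for w in write_ranges)
```

For example, a read of `[b"a", b"c")` overlaps a write to the single key `b"b"` (the range `[b"b", b"b\x00")`), while a read of `[b"a", b"b")` does not, because the end of a range is exclusive.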

ConsistencyCheck:
Run at the end of every simulation test
checks a large number of properties
Makes sure data in replicas matches
also run in real life

CpuProfiler:
Does flow profiling for a duration

Cycle:
builds a cycle of keys, and then repeatedly swaps nodes within it
good at detecting ACI violations
doesn’t check durability
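
To show why this catches ACI violations, here is a small in-memory sketch. A dict stands in for the database, the function names are hypothetical, and the exact rewiring is my assumption of how the swap works. The invariant is that the keys always form exactly one ring; a lost or half-applied swap breaks it.

```python
import random


def build_cycle(n):
    """Key i points at key (i + 1) % n, forming one ring of length n."""
    return {i: (i + 1) % n for i in range(n)}


def swap_step(ring, rng):
    """Rewire a -> b -> c -> d into a -> c -> b -> d.

    Assumption: in the real workload these reads and writes form a single
    transaction, so the ring is never observed half-updated.
    """
    assert len(ring) >= 3, "the swap needs at least three nodes"
    a = rng.randrange(len(ring))
    b = ring[a]
    c = ring[b]
    d = ring[c]
    ring[a] = c
    ring[c] = b
    ring[b] = d


def is_single_cycle(ring):
    """Walk from node 0; a valid ring returns to 0 after exactly len(ring) hops."""
    node, hops = 0, 0
    while True:
        node = ring[node]
        hops += 1
        if node == 0 or hops > len(ring):
            break
    return node == 0 and hops == len(ring)
```

Nothing here remembers what was written earlier, which fits the note above that durability is not checked: the check only looks at whether the current state is a valid ring.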

DDBalance:
aggressively tries to cause data movement
moves large amounts of data between different key space prefixes

DDMetrics:
???
Maybe used as part of a circus test?

DiskDurability{,Test}:
Another async file test
may or may not be broken

DummyWorkload:
Example workload/test of workload framework

FastTriggeredWatches:
Ensures a time bound between a key being set and its watch firing
Because simulation has lots of errors, doesn’t enforce a tight bound (~12s)

FileSystem:
???
A workload that pretends to be a filesystem and do “operations” on the filesystem

Fuzz:
Made when actor compiler was written to test that actor compiler works
Python program makes actors that should have a known output
This workload runs that, and verifies the correctness

FuzzApiCorrectness:
Randomly uses API with an expected list of exceptions
Verifies that random values don’t produce crashes or unexpected errors

Increment:
Increments keys and then checks the total sums are equal.

IndexScan:
???

Inventory:
Attempting to datamodel a warehouse (pseudo TPC-C)
Doesn’t keep memory copy, so no durability verification

KillRegion:
Sets up a multi-region cluster
Kills a region including satellites
Runs a forced ACI-only recovery
Has data loss, but should be consistent data loss

KVStoreTest:
Directly opens storage engine and does reads/writes

LockDatabase:
Just locks and unlocks a database
Mostly a test of database locking

LogMetrics:
???

LowLatency:
Verifies that recoveries happen in a “timely” manner
Has a tight bound when run with buggification disabled
Has a long bound when run with buggification

MachineAttrition:
Kills or reboots processes, machines, datacenters, etc.
Super important workload

MemoryKeyValueStore:
???

MemoryLifetime:
Checks some properties of RYW
???

MetricLogging:
tests TDMetric things

Performance:
???

Ping:
Sending interfaces to workloads and sending messages between them
???
Maybe a network performance test
replaced by fdbserver -r networktestsender

PubSubMultiples:
???

QueuePush:
Sequential insert load for performance test

RandomClogging:
Nemesis behavior test
blocks network communication between two processes for some time

RandomMoveKeys:
turns off data distribution, and does its own
moves shards to random places constantly

RandomSelector:
tests correctness of keyselectors
ensures they resolve to the right thing
Builds two different selectors that should be equal, and verifies that they resolve to the same key
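
As an illustration of "two selectors that should be equal", here is a sketch that resolves key selectors against a sorted in-memory key list. The `(key, or_equal, offset)` form and the clamping to the empty key and `\xff` follow my understanding of FDB's key selector semantics; the `resolve` helper itself is hypothetical.

```python
import bisect


def resolve(keys, key, or_equal, offset):
    """Resolve a key selector against a sorted list of keys."""
    if or_equal:
        base = bisect.bisect_right(keys, key) - 1  # index of last key <= key
    else:
        base = bisect.bisect_left(keys, key) - 1   # index of last key < key
    index = base + offset
    if index < 0:
        return b""      # ran off the beginning of the keyspace
    if index >= len(keys):
        return b"\xff"  # ran off the end of the normal keyspace
    return keys[index]


def last_less_or_equal(key):
    return (key, True, 0)


def first_greater_or_equal(key):
    return (key, False, 1)
```

When `key` is present in the database, `last_less_or_equal(key)` and `first_greater_or_equal(key)` are different selectors that should both resolve to `key` itself, which is the kind of equality the workload can check.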

ReadWrite:
Lots of parameters
Almost all performance tests are this workload

RemoveServersSafely:
Tests excludes
Because of this, can kill many more machines than attrition

Rollback:
Tries to trigger storage server rollbacks

RyowCorrectness:
???

RYWDisable:
???

RYWPerformance:
???

SaveAndKill:
For restarting tests
saves the state of simulation so that it can be restarted
produces a restartInfo file

SelectorCorrectness:
Uses getrange to verify that a keyselector is correct

Serializability:
???

Sideband:
Sends messages between processes while doing commits to find causal errors

SlowTaskWorkload:
???

StatusWorkload:
Runs status along with a workload
All status tests are nondeterministic

Storefront:
Datamodeling a store

StreamingRead:
Performance test workload

TargetedKill:
Alternative to attrition to kill something of a specific role
Used for recovery performance tests

TaskBucketCorrectness:
Tests taskbucket, which is used by backup and DR
Not comprehensive for the API

ThreadSafety:
???

Throttling:
???

Throughput:
Performance test workload
Tries to maintain a certain latency while doing as many writes as possible

TimekeeperCorrectness:
Tests timekeeper

TriggerRecovery:
???

UnitPerf:
???

UnitTests:
Runs UNIT_TESTs that don’t start with an exclamation point

Unreadable:
When doing a versionstamp operation, you can’t read that key back with RYW
This test makes sure that you can’t

VersionStamp:
Verifies everything about versionstamps
Including DR, non durability verification

WatchAndWait:
???

Watches:
???

WorkerErrors:
???

WriteBandwidth:
Performance test to do as many writes as possible

WriteDuringRead:
“most random test”
Keeps an in-memory copy of the DB while doing things to the DB
verifies that FDB matches the in-memory state at the end
Sometimes very memory intensive and OOMs


DDMetrics is a very old test that is broken in its current state (it doesn’t seem to be used anywhere, as far as I can tell). I am in the middle of refactoring it as part of a project to introduce priority levels to range requests in fetchKeys. The new workload will populate a database with a large amount of data, then exclude a server in the cluster. The workload will record various metrics such as peakMovingData and the amount of time it takes for the data to finish redistributing.


I think DDMetrics is actually in use, but just not by anything in the foundationdb repository. Like a few other workloads here, its only usages are part of some tests maintained externally. I believe its purpose is to measure the time it takes for certain types of data movement to complete (and in particular, it’s being used to time how long it takes for the key-space to collapse after clearing the entire database).

Whether or not it’s broken in achieving that goal I couldn’t say for sure. However, I think it probably needs to retain the behavior described above for the purposes of these other tests. If your plans for it would change the behavior of it, it may be better to do so in a new workload. Given the potential confusion surrounding the name (it sounds like your intent is closer to something tracking “DDMetrics”), it may be worth rejiggering the names a bit.

The Throttling workload runs a simple random read-write workload and throttles itself with a token bucket algorithm, using the TPS limit obtained from the health metrics API. It periodically reads health metrics from that API and tests whether the received values are reasonable.
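
For anyone unfamiliar with token buckets, here is a minimal sketch of the idea. The class and method names are hypothetical, and `set_limit` stands in for wherever the workload would apply a fresh TPS limit from health metrics. Time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Allow at most `tps_limit` operations per second, with bursts up to `capacity`."""

    def __init__(self, tps_limit, capacity, now):
        self.rate = tps_limit    # tokens added per second
        self.capacity = capacity
        self.tokens = capacity   # start with a full bucket
        self.last = now

    def set_limit(self, tps_limit):
        # Hypothetical hook: call this whenever a new TPS limit is obtained.
        self.rate = tps_limit

    def try_acquire(self, now):
        """Refill for the elapsed time, then take one token if available."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A client would call `try_acquire` before starting each transaction and back off when it returns `False`, so the issued transaction rate tracks the reported limit.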