Hi all,
I need some help tweaking my multi-DC cluster to achieve better performance while using the FDB Operator on OpenShift running in Azure. I’m currently running the YCSB performance tool to measure read/write I/O performance and comparing against tests that were run in a single-cluster setup. I’m seeing around 50% of the performance of the single cluster, even while doubling the number of machines in use. I imagine there is something about FDB I don’t understand, or at least a way to gather more information than what I have. Any assistance would help me greatly!
Multi-cluster setup
Machines: 50 total; the masters are Standard_D8s_v3 and the FDB worker nodes are Standard_L8s_v3:
NAME PHASE TYPE REGION ZONE AGE
fdb04-dev-va7-zrbmx-master-0 Running Standard_D8s_v3 eastus2 1 12d
fdb04-dev-va7-zrbmx-master-1 Running Standard_D8s_v3 eastus2 2 12d
fdb04-dev-va7-zrbmx-master-2 Running Standard_D8s_v3 eastus2 3 12d
fdb04-dev-va7-zrbmx-worker-eastus21-2sxzp Running Standard_L8s_v3 eastus2 1 22m
fdb04-dev-va7-zrbmx-worker-eastus21-667hw Running Standard_L8s_v3 eastus2 1 22m
fdb04-dev-va7-zrbmx-worker-eastus21-68cc6 Running Standard_L8s_v3 eastus2 1 22m
fdb04-dev-va7-zrbmx-worker-eastus21-6jjm7 Running Standard_L8s_v3 eastus2 1 22m
...
The relevant MachineSet YAML (I have removed anything obviously irrelevant):
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
annotations:
machine.openshift.io/GPU: "0"
machine.openshift.io/memoryMb: "65536"
machine.openshift.io/vCPU: "8"
replicas: 19
template:
spec:
providerSpec:
value:
acceleratedNetworking: true
apiVersion: machine.openshift.io/v1beta1
image:
offer: aro4
publisher: azureopenshift
kind: AzureMachineProviderSpec
location: eastus2
osDisk:
diskSettings: {}
diskSizeGB: 128
managedDisk:
storageAccountType: Premium_LRS
osType: Linux
We have this exact setup replicated twice, because we were also doing DR testing and needed separate MachineSets to bring up and down independently. The total machine count on this cluster is 50: 3 masters and 47 workers.
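For the DR testing, bringing one of those MachineSets up or down is just a scale operation; a minimal sketch, with a placeholder MachineSet name:

# Scale one worker MachineSet to zero for DR testing, then restore it afterwards.
# The MachineSet name below is a placeholder; substitute the real one.
oc -n openshift-machine-api scale machineset fdb04-dev-va7-zrbmx-worker-eastus22 --replicas=0
oc -n openshift-machine-api scale machineset fdb04-dev-va7-zrbmx-worker-eastus22 --replicas=19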
Relevant FDB setup: we have 3 DCs in the FDB configuration; DC1 and DC3 are the primaries, whereas DC2 is only there for replication. This is the FDB config for DC1 and DC3 (with namespaces changed as appropriate):
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
name: fdb-cluster-1
namespace: dc1
spec:
# coordinatorSelection:
# - priority: 0
# processClass: coordinator
# - priority: 5
# processClass: storage
# - priority: 10
# processClass: log
storageServersPerPod: 2
logServersPerPod: 2
automationOptions:
replacements:
enabled: true
processGroupIDPrefix: "dc1"
faultDomain:
key: kubernetes.io/hostname
valueFrom: spec.nodeName
dataCenter: "dc1"
seedConnectionString: $connectionString
databaseConfiguration:
redundancy_mode: "triple"
usable_regions: 2
regions:
- datacenters:
- id: dc1
priority: 1
- id: dc2
satellite: 1
priority: 2
- id: dc3
satellite: 1
priority: 1
satellite_logs: 3
- datacenters:
- id: dc3
priority: 0
- id: dc2
satellite: 1
priority: 2
- id: dc1
satellite: 1
priority: 1
satellite_logs: 3
labels:
filterOnOwnerReference: false
matchLabels:
foundationdb.org/fdb-cluster-name: fdb-cluster-1
processClassLabels:
- foundationdb.org/fdb-process-class
processGroupIDLabels:
- foundationdb.org/fdb-process-group-id
minimumUptimeSecondsForBounce: 60
# The storage process counts here are directly comparable to our single setup, which is comparable to a test setup we had years ago. I'd be willing to change log/stateless/coordinator, but so long as there isn't anything wrong with 17 storage servers, we'd like to keep this the same.
processCounts:
storage: 17
log: 2
stateless: 3
# coordinator: 3
processes:
log:
volumeClaimTemplate:
spec:
storageClassName: fdb-nvme
resources:
requests:
storage: 64G
customParameters:
- knob_disable_posix_kernel_aio=1
podTemplate:
spec:
nodeSelector:
topology.kubernetes.io/zone: eastus2-1
beta.kubernetes.io/instance-type: Standard_L8s_v3
serviceAccountName: fdb-dc1
securityContext:
allowPrivilegeEscalation: false
hostNetwork: true
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
foundationdb.org/fdb-cluster-name: fdb-cluster-1
containers:
- name: foundationdb
resources:
requests:
cpu: 6500m # I found over 6500/8000 and we had trouble scheduling
memory: 31Gi
securityContext:
runAsUser: 0
- name: foundationdb-kubernetes-sidecar
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 100m
memory: 128Mi
securityContext:
runAsUser: 0
initContainers:
- name: foundationdb-kubernetes-init
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 100m
memory: 128Mi
securityContext:
runAsUser: 0
storage:
volumeClaimTemplate:
spec:
storageClassName: fdb-nvme
resources:
requests:
storage: 64G
customParameters:
- knob_disable_posix_kernel_aio=1
podTemplate:
spec:
nodeSelector:
beta.kubernetes.io/instance-type: Standard_L8s_v3
serviceAccountName: fdb-dc1
securityContext:
allowPrivilegeEscalation: false
hostNetwork: true
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
foundationdb.org/fdb-cluster-name: fdb-cluster-1
containers:
- name: foundationdb
resources:
requests:
cpu: 6500m
memory: 31Gi
securityContext:
runAsUser: 0
- name: foundationdb-kubernetes-sidecar
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 100m
memory: 128Mi
securityContext:
runAsUser: 0
initContainers:
- name: foundationdb-kubernetes-init
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 100m
memory: 128Mi
securityContext:
runAsUser: 0
general:
volumeClaimTemplate:
spec:
resources:
requests:
storage: 16G
customParameters:
- knob_disable_posix_kernel_aio=1
podTemplate:
spec:
serviceAccountName: fdb-dc1
securityContext:
allowPrivilegeEscalation: false
nodeSelector:
topology.kubernetes.io/zone: eastus2-1
beta.kubernetes.io/instance-type: Standard_D4s_v3
hostNetwork: true
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
foundationdb.org/fdb-cluster-name: fdb-cluster-1
containers:
- name: foundationdb
resources:
requests:
cpu: 3000m
memory: 10Gi
securityContext:
runAsUser: 0
- name: foundationdb-kubernetes-sidecar
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 100m
memory: 128Mi
securityContext:
runAsUser: 0
initContainers:
- name: foundationdb-kubernetes-init
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 100m
memory: 128Mi
securityContext:
runAsUser: 0
routing:
headlessService: true
publicIPSource: pod
sidecarContainer:
enableLivenessProbe: true
enableReadinessProbe: true
useExplicitListenAddress: true
version: 7.1.26
And DC2:
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
name: fdb-cluster-2
namespace: dc2
spec:
# coordinatorSelection:
# - priority: 0
# processClass: coordinator
# - priority: 5
# processClass: storage
# - priority: 10
# processClass: log
storageServersPerPod: 2
logServersPerPod: 2
automationOptions:
replacements:
enabled: true
faultDomain:
key: kubernetes.io/hostname
valueFrom: spec.nodeName
processGroupIDPrefix: "dc2"
dataCenter: "dc2"
seedConnectionString: $connectionString
databaseConfiguration:
redundancy_mode: "double"
usable_regions: 2
regions:
- datacenters:
- id: dc1
priority: 1
- id: dc2
satellite: 1
priority: 2
- id: dc3
satellite: 1
priority: 1
satellite_logs: 3
- datacenters:
- id: dc3
priority: 0
- id: dc2
satellite: 1
priority: 2
- id: dc1
satellite: 1
priority: 1
satellite_logs: 3
labels:
filterOnOwnerReference: false
matchLabels:
foundationdb.org/fdb-cluster-name: fdb-cluster-2
processClassLabels:
- foundationdb.org/fdb-process-class
processGroupIDLabels:
- foundationdb.org/fdb-process-group-id
minimumUptimeSecondsForBounce: 60
processCounts:
storage: -1
log: 3
stateless: -1
# coordinator: 3
processes:
log:
volumeClaimTemplate:
spec:
storageClassName: fdb-nvme
resources:
requests:
storage: 64G
customParameters:
- knob_disable_posix_kernel_aio=1
podTemplate:
spec:
nodeSelector:
topology.kubernetes.io/zone: eastus2-2
beta.kubernetes.io/instance-type: Standard_L8s_v3
serviceAccountName: fdb-dc2
securityContext:
allowPrivilegeEscalation: false
hostNetwork: true
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
foundationdb.org/fdb-cluster-name: fdb-cluster-2
containers:
- name: foundationdb
resources:
requests:
cpu: 6500m
memory: 31Gi
securityContext:
runAsUser: 0
- name: foundationdb-kubernetes-sidecar
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 100m
memory: 128Mi
securityContext:
runAsUser: 0
initContainers:
- name: foundationdb-kubernetes-init
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 100m
memory: 128Mi
securityContext:
runAsUser: 0
# storage:
# volumeClaimTemplate:
# spec:
# storageClassName: fdb-nvme
# resources:
# requests:
# storage: 16G
# customParameters:
# - knob_disable_posix_kernel_aio=1
# podTemplate:
# spec:
# nodeSelector:
# topology.kubernetes.io/zone: eastus2-2
# beta.kubernetes.io/instance-type: Standard_L8s_v3
# serviceAccountName: fdb-dc2
# securityContext:
# allowPrivilegeEscalation: false
# hostNetwork: true
# topologySpreadConstraints:
# - maxSkew: 1
# topologyKey: kubernetes.io/hostname
# whenUnsatisfiable: DoNotSchedule
# labelSelector:
# matchLabels:
# foundationdb.org/fdb-cluster-name: fdb-cluster-2
# containers:
# - name: foundationdb
# resources:
# requests:
# cpu: 100m
# memory: 128Mi
# securityContext:
# runAsUser: 0
# - name: foundationdb-kubernetes-sidecar
# resources:
# limits:
# cpu: 100m
# memory: 128Mi
# requests:
# cpu: 100m
# memory: 128Mi
# securityContext:
# runAsUser: 0
# initContainers:
# - name: foundationdb-kubernetes-init
# resources:
# limits:
# cpu: 100m
# memory: 128Mi
# requests:
# cpu: 100m
# memory: 128Mi
# securityContext:
# runAsUser: 0
# general:
# volumeClaimTemplate:
# spec:
# resources:
# requests:
# storage: 16G
# customParameters:
# - knob_disable_posix_kernel_aio=1
# podTemplate:
# spec:
# serviceAccountName: fdb-dc2
# securityContext:
# allowPrivilegeEscalation: false
# nodeSelector:
# topology.kubernetes.io/zone: eastus2-2
# beta.kubernetes.io/instance-type: Standard_D4s_v3
# hostNetwork: true
# topologySpreadConstraints:
# - maxSkew: 1
# topologyKey: kubernetes.io/hostname
# whenUnsatisfiable: DoNotSchedule
# labelSelector:
# matchLabels:
# foundationdb.org/fdb-cluster-name: fdb-cluster-2
# containers:
# - name: foundationdb
# resources:
# requests:
# cpu: 100m
# memory: 128Mi
# securityContext:
# runAsUser: 0
# - name: foundationdb-kubernetes-sidecar
# resources:
# limits:
# cpu: 100m
# memory: 128Mi
# requests:
# cpu: 100m
# memory: 128Mi
# securityContext:
# runAsUser: 0
# initContainers:
# - name: foundationdb-kubernetes-init
# resources:
# limits:
# cpu: 100m
# memory: 128Mi
# requests:
# cpu: 100m
# memory: 128Mi
# securityContext:
# runAsUser: 0
routing:
headlessService: true
publicIPSource: pod
sidecarContainer:
enableLivenessProbe: true
enableReadinessProbe: true
useExplicitListenAddress: true
version: 7.1.26
As you can see, DC2 is just logs, and that is where we run YCSB.
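To double-check that placement actually matches the node selectors, standard kubectl against the operator labels (the ones defined in the configs above) shows where the storage and log pods land relative to the YCSB pods; the YCSB label below is a placeholder:

# DC1 storage pods and the nodes they landed on.
kubectl -n dc1 get pods -l foundationdb.org/fdb-process-class=storage -o wide
# DC2 log pods and the YCSB client pods.
kubectl -n dc2 get pods -l foundationdb.org/fdb-process-class=log -o wide
kubectl -n dc2 get pods -l app=ycsb -o wide   # "app=ycsb" is a placeholder label for the YCSB StatefulSet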
The single-cluster FDB setup we’re comparing against is as follows:
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
name: fdb-cluster
namespace: default
spec:
storageServersPerPod: 8
automationOptions:
replacements:
enabled: true
faultDomain:
key: foundationdb.org/none
labels:
podlabel: fdblabel
filterOnOwnerReference: false
matchLabels:
foundationdb.org/fdb-cluster-name: fdb-cluster
processClassLabels:
- foundationdb.org/fdb-process-class
processGroupIDLabels:
- foundationdb.org/fdb-process-group-id
minimumUptimeSecondsForBounce: 60
processCounts:
cluster_controller: 1
storage: 17
log: 2
stateless: 4
processes:
general:
podTemplate:
spec:
hostNetwork: true
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
foundationdb.org/fdb-cluster-name: fdb-cluster
containers:
- name: foundationdb
resources:
requests:
cpu: 7000m
memory: 31Gi
securityContext:
runAsUser: 0
- name: foundationdb-kubernetes-sidecar
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 100m
memory: 128Mi
securityContext:
runAsUser: 0
initContainers:
- name: foundationdb-kubernetes-init
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 100m
memory: 128Mi
securityContext:
runAsUser: 0
customParameters:
- knob_disable_posix_kernel_aio=1
volumeClaimTemplate:
spec:
storageClassName: fdb-nvme
resources:
requests:
storage: 64G
routing:
headlessService: true
publicIPSource: pod
sidecarContainer:
enableLivenessProbe: true
enableReadinessProbe: true
databaseConfiguration:
redundancy_mode: triple
useExplicitListenAddress: true
version: 7.1.26
Most of my fdbcli output isn’t useful right now, since we still have some failures left over from the DR testing, but the relevant sections are as follows:
Configuration:
Redundancy mode - triple
Storage engine - ssd-2
Coordinators - 9
Exclusions - 4 (type `exclude' for details)
Desired Commit Proxies - 2
Desired GRV Proxies - 1
Desired Resolvers - 1
Desired Logs - 3
Desired Remote Logs - 3
Desired Log Routers - 3
Usable Regions - 2
Regions:
Remote -
Datacenter - dc1
Satellite datacenters - dc2, dc3
Satellite Logs - 3
Primary -
Datacenter - dc3
Satellite datacenters - dc2, dc1
Satellite Logs - 3
...
Disk space used - 4.288 TB
...
<no logs are currently relevant>
<Coordination servers are all reachable>
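If more detail from status would help, I can pull specific fields out of status json; something along these lines is what I’d run (the jq paths are written from memory, so treat them as approximate):

# Dump status json once and inspect it with jq.
fdbcli --exec 'status json' > status.json
# Current read/write throughput and what ratekeeper thinks is limiting performance.
jq '.cluster.workload.operations' status.json
jq '.cluster.qos.performance_limited_by' status.json
# Per-process class, roles, and datacenter, to see where the storage servers actually live.
jq '.cluster.processes[] | {address, class: .class_type, roles: [.roles[].role], dc: .locality.dcid}' status.json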
We’re running YCSB as a container on the same cluster, in DC2. I have a quick templating program I wrote to insert these values into the yaml for fast reconfiguration and automated testing.
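The templating itself is nothing FDB-specific; conceptually it’s just variable substitution into the StatefulSet manifest, roughly like this sketch (the file name and variable names here are made up, not the real tool):

# Rough equivalent of the templating step (placeholder file and variable names).
export REPLICAS=20 PROCESS_PER_HOST=8 THREADS_PER_PROCESS=32 BATCH_SIZE=100
envsubst < ycsb-statefulset.template.yaml | kubectl -n dc2 apply -f -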
YCSB values:
"ycsb-statefulset": {
"replicas": 20,
"num_keys": 100000000,
"process_per_host": 8,
"threads_per_process": 32,
"batch_size": 100,
"update_proportion": 0.0,
"read_proportion": 1.0,
"max_execution_time_seconds": 200000
},
We mostly tweak replicas, process_per_host, and threads_per_process; we keep batch_size the same. We’re currently doing read-only tests, but nothing changes significantly at a 90/10 read/write ratio. A num_keys of 100 million was enough that we didn’t see super-spiky loads (i.e. results much faster than we expect), while still not taking too long to test (about 30 minutes).
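For context, here is roughly how those knobs multiply out into total client load (back-of-the-envelope arithmetic from the values above, nothing measured):

# 20 replicas x 8 processes per host x 32 threads per process = 5120 concurrent client threads.
echo $((20 * 8 * 32))
# operation_count in the script below is num_keys / batch_size = 100000000 / 100 = 1000000 ops per process.
echo $((100000000 / 100))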
YCSB script that uses those values:
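# Derive this pod's index from the digits in its StatefulSet hostname.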
host_index=$(echo $HOSTNAME | grep -o '[0-9]\+')
cd ycsb-foundationdb-binding-0.18.0-SNAPSHOT
now=$(date +"%Y_%m_%d_%H_%M_%S")
log_file="ycsb_host_${host_index}__$now.log"
workload='workloads/workloada'
update_proportion=0.1
read_proportion=0.9
num_keys=100000000
value_size_bytes=2000
batch_size=100
operation_count=$((num_keys / batch_size))
num_clients=30
field_count=10
field_length=100
threads_per_process=32
max_execution_time_seconds=200000
keys_per_host=$((num_keys / num_clients))
process_per_host=16
keys_per_process=$((keys_per_host / process_per_host))
start_key_offset=0
start_key_per_host=$((host_index*keys_per_host+start_key_offset))
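# Wait for the FDB cluster file to exist before starting YCSB.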
while true; do
echo "Waiting for cluster file to be created"
if [ -f $FDB_CLUSTER_FILE ]; then
break
fi
sleep 5
done
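# Load phase: each host inserts its own contiguous slice of the keyspace, starting at start_key_per_host.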
ycsb_load_cmd="./bin/ycsb load foundationdb -s \
-P $workload \
-p foundationdb.apiversion=620 \
-p foundationdb.clusterfile=$FDB_CLUSTER_FILE \
-p recordcount=$num_keys \
-p insertstart=$start_key_per_host \
-p insertcount=$keys_per_host \
-p operationcount=$operation_count \
-p maxexecutiontime=$max_execution_time_seconds \
-p fieldcount=$field_count \
-p fieldlength=$field_length \
-p readproportion=$read_proportion \
-p updateproportion=$update_proportion \
-p requestdistribution=uniform \
-p threadcount=$threads_per_process"
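# Run phase: same connection settings; the read/update mix is controlled by read_proportion and update_proportion above.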
ycsb_run_cmd="./bin/ycsb run foundationdb -s \
-P $workload \
-p foundationdb.apiversion=620 \
-p foundationdb.clusterfile=$FDB_CLUSTER_FILE \
-p recordcount=$num_keys \
-p insertstart=$start_key_per_host \
-p insertcount=$keys_per_host \
-p operationcount=$operation_count \
-p maxexecutiontime=$max_execution_time_seconds \
-p fieldcount=$field_count \
-p fieldlength=$field_length \
-p readproportion=$read_proportion \
-p updateproportion=$update_proportion \
-p requestdistribution=uniform \
-p threadcount=$threads_per_process"
sh -c "$ycsb_load_cmd" 2>&1 | tee -a "$log_file"
sh -c "$ycsb_run_cmd" 2>&1 | tee -a "$log_file"
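# Keep the pod running after the benchmark so the logs can be collected.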
while true; do
echo "finished... host $HOSTNAME"
sleep 20
done
I know the YCSB FoundationDB binding hasn’t been updated in a while; it’s using an old API version (620) while the cluster is up to date (7.1.26), but it’s the best we can do to get results that are closest to the tests we ran years ago.
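If we ever do bump it, I believe it would just be the binding property below, assuming the fdb-java client bundled with the YCSB binding also supports the 7.1 API (I haven’t verified that):

# Hypothetical: point the YCSB FoundationDB binding at the 7.1 API instead of 6.2.
./bin/ycsb run foundationdb -s -P workloads/workloada \
  -p foundationdb.apiversion=710 \
  -p foundationdb.clusterfile=$FDB_CLUSTER_FILE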
I can provide any more information if needed. Any help/ideas are much appreciated.
Thanks in advance