WARNING: A single process is both a transaction log and a storage server

ThomasJ · August 11, 2019, 5:20pm

I know cluster setups have been covered a lot, but I’m running into an issue that seems illogical (to me), and I was hoping to get some advice.

I drained our old cluster and replaced it with a larger one. Everything is working as it should, but I’m getting this warning:

WARNING: A single process is both a transaction log and a storage server.
  For best performance use dedicated disks for the transaction logs by setting process classes.

It’s confusing, because the old cluster had similar settings and was fine. Is there a way to find out which process it is? I tried poking around in the json file, but couldn’t find it.

Here is the setup

24x 4cpu, 16GB RAM, 1.2TB space

  Redundancy mode        - triple
  Storage engine         - ssd-2
  Coordinators           - 3
  Exclusions             - 26 (type `exclude' for details)
  Desired Proxies        - 7
  Desired Resolvers      - 1
  Desired Logs           - 4

10.128.0.104:4500,stateless
10.128.0.104:4501,storage
10.128.0.104:4502,storage
10.128.0.104:4503,stateless
10.128.0.122:4500,stateless
10.128.0.122:4501,storage
10.128.0.122:4502,storage
10.128.0.122:4503,stateless
10.128.0.215:4500,stateless
10.128.0.215:4501,storage
10.128.0.215:4502,storage
10.128.0.215:4503,stateless
10.128.1.166:4500,stateless
10.128.1.166:4501,storage
10.128.1.166:4502,storage
10.128.1.166:4503,stateless
10.128.1.222:4500,stateless
10.128.1.222:4501,storage
10.128.1.222:4502,storage
10.128.1.222:4503,stateless
10.128.1.250:4500,stateless
10.128.1.250:4501,storage
10.128.1.250:4502,storage
10.128.1.250:4503,stateless
10.128.2.102:4500,stateless
10.128.2.102:4501,storage
10.128.2.102:4502,storage
10.128.2.102:4503,stateless
10.128.2.245:4500,stateless
10.128.2.245:4501,log
10.128.2.245:4502,log
10.128.2.245:4503,stateless
10.128.2.44:4500,stateless
10.128.2.44:4501,storage
10.128.2.44:4502,storage
10.128.2.44:4503,stateless
10.128.2.50:4500,stateless
10.128.2.50:4501,log
10.128.2.50:4502,log
10.128.2.50:4503,stateless
10.128.2.84:4500,stateless
10.128.2.84:4501,storage
10.128.2.84:4502,storage
10.128.2.84:4503,stateless
10.128.3.142:4500,stateless
10.128.3.142:4501,storage
10.128.3.142:4502,storage
10.128.3.142:4503,stateless
10.128.3.231:4500,stateless
10.128.3.231:4501,storage
10.128.3.231:4502,storage
10.128.3.231:4503,stateless
10.128.3.24:4500,stateless
10.128.3.24:4501,storage
10.128.3.24:4502,storage
10.128.3.24:4503,stateless
10.128.3.32:4500,stateless
10.128.3.32:4501,storage
10.128.3.32:4502,storage
10.128.3.32:4503,stateless
10.128.4.149:4500,stateless
10.128.4.149:4501,storage
10.128.4.149:4502,storage
10.128.4.149:4503,stateless
10.128.4.229:4500,stateless
10.128.4.229:4501,storage
10.128.4.229:4502,storage
10.128.4.229:4503,stateless
10.128.4.50:4500,stateless
10.128.4.50:4501,storage
10.128.4.50:4502,storage
10.128.4.50:4503,stateless
10.128.5.105:4500,stateless
10.128.5.105:4501,storage
10.128.5.105:4502,storage
10.128.5.105:4503,stateless
10.128.6.122:4500,stateless
10.128.6.122:4501,storage
10.128.6.122:4502,storage
10.128.6.122:4503,stateless
10.128.6.253:4500,stateless
10.128.6.253:4501,storage
10.128.6.253:4502,storage
10.128.6.253:4503,stateless
10.128.6.6:4500,stateless
10.128.6.6:4501,storage
10.128.6.6:4502,storage
10.128.6.6:4503,stateless
10.128.7.2:4500,stateless
10.128.7.2:4501,storage
10.128.7.2:4502,storage
10.128.7.2:4503,stateless
10.128.7.96:4500,stateless
10.128.7.96:4501,storage
10.128.7.96:4502,storage
10.128.7.96:4503,stateless

If I’m not completely blind, we are assigning the log class to 4 processes on 2 separate servers, and the other 2 processes on those servers are kept stateless. Then we are requesting 4 log processes in the cluster configuration.

Any pointers are greatly appreciated.

markus.pilman · August 11, 2019, 11:35pm

It seems your problem is that you only run tlogs on two machines:

10.128.2.245:4501,log
10.128.2.245:4502,log

and

10.128.2.50:4501,log
10.128.2.50:4502,log

However, you run with triple replication. Therefore fdb has to recruit tlogs on at least three machines - so it needs to recruit one on a storage server (which is not optimal but the only way to not violate the contract that the data is replicated across three machines).
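To answer the "which process is it?" question, one option is to scan the `status json` output for a process that reports both roles. The following is only a minimal sketch: it assumes the JSON contains a `cluster.processes` map whose entries have an `address` and a `roles` list with a `role` field (check this against your FDB version), and the helper name and sample data are made up for illustration.

```python
import json

def find_shared_processes(status):
    """Return addresses of processes that hold both a log and a storage role."""
    shared = []
    # Assumed layout: status["cluster"]["processes"][<id>]["roles"][i]["role"]
    processes = status.get("cluster", {}).get("processes", {})
    for info in processes.values():
        roles = {r.get("role") for r in info.get("roles", [])}
        if {"log", "storage"} <= roles:
            shared.append(info.get("address"))
    return sorted(shared)

# Hypothetical excerpt of a status document, not taken from the real cluster.
SAMPLE = json.loads("""
{
  "cluster": {
    "processes": {
      "aaa": {"address": "10.0.0.1:4501",
              "roles": [{"role": "storage"}, {"role": "log"}]},
      "bbb": {"address": "10.0.0.2:4501",
              "roles": [{"role": "log"}]},
      "ccc": {"address": "10.0.0.3:4500",
              "roles": [{"role": "proxy"}]}
    }
  }
}
""")

print(find_shared_processes(SAMPLE))
```

The input could come from something like `fdbcli --exec 'status json'`, saved to a file and loaded with `json.load`.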

With triple replication you probably want to run log processes on at least 3, better 5 machines. If you lose a machine and only run logs on three, you will find a tlog on a storage again. With triple replication you can survive two machine failures and therefore 5 is probably a good number.
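The machine-count argument can be checked directly against a class listing like the one posted above. This is a rough sketch that assumes the `address,class` line format from the post; the function name is invented, and the threshold of 3 simply reflects triple redundancy as described here.

```python
def log_machines(lines):
    """Return the set of machine IPs that run at least one 'log' class process."""
    machines = set()
    for line in lines:
        address, _, process_class = line.strip().partition(",")
        if process_class == "log":
            # Drop the port so two processes on one host count as one machine.
            machines.add(address.rsplit(":", 1)[0])
    return machines

# Shortened copy of the listing from the post (only a few lines shown).
LISTING = """\
10.128.2.245:4501,log
10.128.2.245:4502,log
10.128.2.50:4501,log
10.128.2.50:4502,log
10.128.2.44:4501,storage
""".splitlines()

REPLICAS = 3  # triple redundancy, as stated in the post
machines = log_machines(LISTING)
if len(machines) < REPLICAS:
    print(f"only {len(machines)} log machines, need at least {REPLICAS}")
else:
    print(f"{len(machines)} log machines, enough for {REPLICAS} replicas")
```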

If you’re starved for resources, I would do the following:

Set desired logs to 5
Run a log process on 5 machines
If a machine fails, you will run with 4 logs, if 2 fail you will run with 3 (which is fine and you can bring back the machines later and fdb will redistribute the log load).

If resources are not an issue for you, I would do the following:

Run 1 log on each machine
Set desired logs to something reasonable (for this cluster size probably 12? not sure…).
Keep the same configuration on each machine: for example 1 log, 1 stateless, 2 storage

I don’t know how your disk layout looks though - if not all machines are the same, you might want to use another topology.

ThomasJ · August 12, 2019, 3:57am

Ah, that’s the missing piece.

It’s 3 NVME disks merged into raid 0.

Wait, I thought logs and storage are not supposed to be mixed together.

Somehow my understanding was that it’s better to have log processes separated and leave some space for breathing. So 2 log, 2 stateless. But if 1 log, 1 storage and 2 stateless works better, I can certainly do that.

markus.pilman · August 12, 2019, 3:22pm

Having logs and storages share a process is a bad idea. However, having log and storage processes share a machine is completely fine (we have done that for years now on most of our clusters; on the ones where we didn’t, the motivation had more to do with using different VM types on AWS, which gives us more flexibility).

The main reason for not having them share a process is CPU: each fdbserver process is single threaded. Log and storage can both consume quite a lot of CPU, and if a log process is starved for CPU (for example because you run some range queries), your commit latency will go up, which can have all kinds of undesired effects.

ajbeamon · August 12, 2019, 3:28pm

We also recommend not sharing disks between the logs and storage processes, as that can be a performance cost as well.

ThomasJ · August 12, 2019, 5:22pm

This is news to me. But I will adjust the cluster and see.

Hm, that will make things quite complicated. But will adjust to it.

ThomasJ · August 12, 2019, 9:05pm

While we are on this, what’s the right size for a log disk and what kind of settings should we use? We generally use NVMe disks that are 375GB.

This is what we have in settings:

command = /usr/sbin/fdbserver
public_address = auto:$ID
listen_address = public
datadir = /var/lib/foundationdb/data/$ID
logdir = /var/log/foundationdb
logsize = 50MiB
maxlogssize = 10GiB
machine_id = {{machine_id}}
# datacenter_id =
# class =
memory = 13GiB
storage_memory = 1GiB
# metrics_cluster =
# metrics_prefix =

markus.pilman · August 12, 2019, 11:20pm

To determine the size of the disk, I would recommend taking the expected write load in bytes per second and multiplying it by a time window that makes you comfortable (I would say at least 6 hours).

Basically, if you lose a machine, the log needs to be able to keep all writes to this lost machine on disk. You should also alert if this disk starts to fill up.
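As a worked example of that sizing rule, here is a small sketch. The 10 MiB/s write rate is a made-up figure, not a measurement from this cluster, and the function name is invented; only the "write rate times at least 6 hours" rule comes from the post.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

def log_disk_bytes(write_bytes_per_sec, hours=6):
    """Bytes needed to buffer `hours` of writes at the given sustained rate."""
    return int(write_bytes_per_sec * hours * 3600)

# Hypothetical sustained write load of 10 MiB/s, buffered for 6 hours.
required = log_disk_bytes(10 * MIB, hours=6)
print(f"{required / GIB:.1f} GiB")
```

Plugging in your own measured write rate and comparing the result with the disk size gives a quick sanity check before picking hardware.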

But your config looks wrong to me (or it looks like it doesn’t do what you want it to do):

memory = 13GiB

This means that each process can use up to 13GiB of memory, but your machine only has 16GiB. I think what you want is for all processes together to use no more than 13GiB of memory? If that is the case, you need to divide this number by the number of processes.
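The division is trivial, but spelling it out avoids confusion about which limit is per process. A minimal sketch, assuming the 4 processes per machine shown in the listing and a 13GiB budget for the whole machine; the helper name is made up.

```python
def memory_per_process_gib(budget_gib, process_count):
    """Split a machine-wide memory budget evenly across fdbserver processes."""
    if process_count <= 0:
        raise ValueError("process_count must be positive")
    return budget_gib / process_count

# 13 GiB budget shared by the 4 processes on each machine.
per_process = memory_per_process_gib(13, 4)
print(f"memory = {per_process}GiB per process")
```

The resulting value would be what goes into the per-process `memory` setting, rounded down to whatever you are comfortable with.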

ThomasJ · August 12, 2019, 11:44pm

Ah! It’s quite confusing as some settings are global, some are “local” per process.

Ok, I will keep it on 375GB NVME. That should be fine.

ThomasJ · August 13, 2019, 1:52am

BTW here is our config. If you spot something we should be doing differently, please let me know. It’s quite standard.

## foundationdb.conf
##
## Configuration file for FoundationDB server processes
## Full documentation is available at
## https://apple.github.io/foundationdb/configuration.html#the-configuration-file

[fdbmonitor]
user = foundationdb
group = foundationdb

[general]
restart_delay = 60
## by default, restart_backoff = restart_delay_reset_interval = restart_delay
# initial_restart_delay = 0
# restart_backoff = 60
# restart_delay_reset_interval = 60
cluster_file = /etc/foundationdb/fdb.cluster
# delete_envvars =
# kill_on_configuration_change = true

## Default parameters for individual fdbserver processes
[fdbserver]
command = /usr/sbin/fdbserver
public_address = auto:$ID
listen_address = public
datadir = /var/lib/foundationdb/data/$ID
logdir = /var/log/foundationdb
logsize = 50MiB
maxlogssize = 10GiB
machine_id = {{ instance }}
# datacenter_id =
# class =
memory = 4GiB
storage_memory = 1GiB
# metrics_cluster =
# metrics_prefix =

## An individual fdbserver process with id 4500
## Parameters set here override defaults from the [fdbserver] section
[fdbserver.4500]
class=log
[fdbserver.4501]
class=storage
[fdbserver.4502]
class=storage
[fdbserver.4503]
class=stateless

[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent
logdir = /var/log/foundationdb

[backup_agent.1]

markus.pilman · August 13, 2019, 6:36pm

This looks good to me. The only thing I would recommend:

restart_delay = 60

I would set this lower; I think we have it at 10. A 1 minute restart delay is quite long and will put some unnecessary load onto the cluster whenever a single process dies (and with only 4GiB of memory per process, one process will probably OOM on a semi-regular basis).

ThomasJ · August 13, 2019, 6:39pm

What’s your setting per process?

BTW I can’t recall the last time we ran into OOM for any of the processes. And we have quite heavy load on the cluster (500k transactions/sec).

markus.pilman · August 13, 2019, 6:45pm

It depends on the cluster, but we generally run with a lot of memory (up to 100GiB per process). We used to run with 8GiB per process and saw quite a lot of OOMs (although that was with FDB 3).

I think FoundationDB still has a weird bug somewhere (at least in 6.1 - don’t know whether this got accidentally fixed in 6.2) where a process will allocate memory like crazy (usually around 3GiB within a few seconds). It should be rare, but if you run into this issue, your process will OOM.

TBF, you ran with a 13GiB limit. It is very well possible that one process used more than 4GiB at one point but the others didn’t use up their memory…

It is possible that you won’t see OOMs, but it is better to be prepared to see a few, which is why I suggested setting the restart delay to a lower value.

ThomasJ · August 13, 2019, 6:55pm

I think this is the correct observation I unintentionally dismissed. Will monitor it closely and see.

Just wow!

ajbeamon · August 13, 2019, 8:10pm

The 60 seconds will only kick in when you have two consecutive failures of a process (within the restart_delay_reset_interval, default 60s). With the parameters as they were set, a single process death will result in an immediate restart. I typically use the parameters as they were originally written (0s first restart, 60s for the second and subsequent ones, reset to 0 after 60s). There’s also an option to back off more slowly from the initial delay to the max delay, if you want to be more responsive during some sort of transient issue but still avoid frequent repeated restarts.
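The restart behaviour described above can be modelled roughly as follows. This is a simplified sketch of the description in this thread, not fdbmonitor’s actual code: the function and argument names are invented, jitter is ignored, and treating a previous delay of 0 as 1 before multiplying by the backoff factor is an assumption about how the backoff is applied.

```python
def restart_delays(uptimes, initial_delay=0.0, max_delay=60.0,
                   backoff=60.0, reset_interval=60.0):
    """Return the delay applied before each restart.

    `uptimes` lists how long (in seconds) the process ran before each failure.
    """
    delays = []
    current = initial_delay
    for uptime in uptimes:
        # A long enough healthy run resets the delay to its initial value.
        if uptime >= reset_interval:
            current = initial_delay
        delays.append(current)
        # Grow the delay for the next failure, capped at max_delay.
        base = current if current > 0 else 1.0
        current = min(max_delay, base * backoff)
    return delays

# One failure after a long run, two quick repeats, then a failure after 2 minutes.
print(restart_delays([3600, 5, 5, 120]))
# Same model with a gentler, hypothetical backoff factor of 2.
print(restart_delays([100, 1, 1, 1, 1], backoff=2.0))
```

With the defaults from the posted config, only the quick repeat failures see the 60 second delay; an isolated failure restarts immediately.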

markus.pilman · August 13, 2019, 8:22pm

oh - Today I learned something new…

ajbeamon · August 13, 2019, 8:43pm

It seems that these restart parameters aren’t even mentioned in our documentation with the exception of restart_delay, which has a slightly misleading description. That’s probably something we should fix.
