Hi, I’m on FDB 6.2.25. I have a cluster where all the fdbservers were previously set with setclass in fdbcli. I’m trying to change these all back to use the command-line-specified class with setclass <ADDR> default. This worked for all fdbservers on 2 of the 3 hosts. On the third, however, it has no effect, and no failure is reported. I’ve tried setclass <IP>:<PORT> default and just setclass <IP> default, and neither has worked. Any advice for how to debug this?
(<IP2> is the problematic one. setclass default worked fine with <IP1> and <IP3>.)
fdb> setclass
There are currently 12 processes in the database:
<IP1>:4500:tls: stateless (command_line)
<IP1>:4501:tls: log (command_line)
<IP1>:4502:tls: storage (command_line)
<IP1>:4503:tls: storage (command_line)
<IP2>:4500:tls: stateless (set_class)
<IP2>:4501:tls: log (set_class)
<IP2>:4502:tls: storage (set_class)
<IP2>:4503:tls: storage (set_class)
<IP3>:4500:tls: stateless (command_line)
<IP3>:4501:tls: storage (command_line)
<IP3>:4502:tls: storage (command_line)
<IP3>:4503:tls: stateless (command_line)
fdb> setclass <IP2> default
fdb> setclass
There are currently 12 processes in the database:
<IP1>:4500:tls: stateless (command_line)
<IP1>:4501:tls: log (command_line)
<IP1>:4502:tls: storage (command_line)
<IP1>:4503:tls: storage (command_line)
<IP2>:4500:tls: stateless (set_class) // nothing changed?
<IP2>:4501:tls: log (set_class)
<IP2>:4502:tls: storage (set_class)
<IP2>:4503:tls: storage (set_class)
<IP3>:4500:tls: stateless (command_line)
<IP3>:4501:tls: storage (command_line)
<IP3>:4502:tls: storage (command_line)
<IP3>:4503:tls: stateless (command_line)
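Comparing listings like the two above by eye is easy to get wrong with many processes. Below is a minimal sketch, assuming the line shape shown in this output (address, class, then the source in parentheses), that parses a setclass listing and reports which processes on a given host are still shown as set_class. The function names are hypothetical helpers, not part of fdbcli or the FDB bindings. It only reflects what the listing says, so it should be cross-checked against status json if the listing itself is suspect.

```python
import re
from typing import Dict, List, Tuple

# Assumed line shape, taken from the fdbcli output in this thread:
#   "<IP2>:4500:tls: stateless (set_class)"
LINE_RE = re.compile(r"^(?P<addr>\S+): (?P<cls>\S+) \((?P<source>\w+)\)")


def parse_setclass_listing(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each process address to its (class, source) pair."""
    procs: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # header lines and "fdb>" prompts do not match
            procs[m.group("addr")] = (m.group("cls"), m.group("source"))
    return procs


def still_set_class(listing: str, host: str) -> List[str]:
    """Addresses on `host` that the listing still reports as set_class."""
    procs = parse_setclass_listing(listing)
    # Match on "host:" so that 10.0.0.1 does not also match 10.0.0.10.
    return sorted(addr for addr, (_, source) in procs.items()
                  if addr.startswith(host + ":") and source == "set_class")


if __name__ == "__main__":
    sample = "\n".join([
        "<IP2>:4500:tls: stateless (set_class) // nothing changed?",
        "<IP3>:4500:tls: stateless (command_line)",
    ])
    print(still_set_class(sample, "<IP2>"))  # ['<IP2>:4500:tls']
```

Run against the second listing above, it would flag all four <IP2> processes.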
One deficiency here is that the setclass command does not report anything when no processes match the address you give it. I’ve filed an issue to get that fixed:
As for why it’s not working, I’m not entirely sure. The data for these classes is stored under the prefix \xff/processClass/. If you do a range read of that space, does it return anything?
fdb> option on READ_SYSTEM_KEYS
fdb> getrange \xff/processClass/
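For reference, the same range read can be done from the Python bindings. This is a sketch, assuming the FoundationDB Python bindings are installed, API version 620 (matching 6.2), and the default cluster file; `read_process_class_keys` and `prefix_end` are hypothetical helper names. It enables the READ_SYSTEM_KEYS transaction option (the same option used in fdbcli above) and reads with an explicit end key, computed as the smallest key greater than every key starting with the prefix.

```python
from typing import List, Tuple

PROCESS_CLASS_PREFIX = b"\xff/processClass/"


def prefix_end(prefix: bytes) -> bytes:
    """Smallest key greater than every key that starts with `prefix`."""
    stripped = prefix.rstrip(b"\xff")  # trailing 0xff bytes cannot be incremented
    if not stripped:
        raise ValueError("prefix consists only of 0xff bytes")
    return stripped[:-1] + bytes([stripped[-1] + 1])


def read_process_class_keys(cluster_file=None) -> List[Tuple[bytes, bytes]]:
    """Read the raw \\xff/processClass/ range (requires a live cluster)."""
    import fdb  # imported lazily so the helper above works without the bindings

    fdb.api_version(620)
    db = fdb.open(cluster_file)

    @fdb.transactional
    def _read(tr):
        tr.options.set_read_system_keys()  # system keys are not readable otherwise
        return [(kv.key, kv.value)
                for kv in tr.get_range(PROCESS_CLASS_PREFIX,
                                       prefix_end(PROCESS_CLASS_PREFIX))]

    return _read(db)
```

For this prefix the end key works out to \xff/processClass0, since "0" is the byte after "/".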
Based on your output, I would expect to see 4 keys, one for each of the 4 processes whose class was set. Looking at the code, this could fail in a few ways: none of the worker processes matched the address you gave, the command modified the wrong key, or it wrote the wrong class.
You could test for the first and second cases by setting the class to something else and checking whether it changes, either according to fdbcli setclass or by reading the key range again. We could also try to decode the keys stored in the key space I described above, though you’ll probably need some help, as these aren’t meant to be human readable.
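To make the before/after check and the key decoding a little easier, here is a small sketch that renders raw keys with non-printable bytes escaped as \xNN (roughly the style fdbcli uses when printing keys) and diffs two snapshots of (key, value) pairs taken before and after a setclass change. The helper names are hypothetical, and the sketch assumes nothing about how the bytes after \xff/processClass/ are encoded; it simply shows them as hex.

```python
from typing import Iterable, List, Tuple

KV = Tuple[bytes, bytes]


def printable(data: bytes) -> str:
    """Render bytes as text, escaping non-printable bytes as \\xNN."""
    out = []
    for b in data:
        if b == 0x5C:  # double the backslash so escapes stay unambiguous
            out.append("\\\\")
        elif 32 <= b < 127:
            out.append(chr(b))
        else:
            out.append("\\x%02x" % b)
    return "".join(out)


def describe_key(key: bytes, prefix: bytes = b"\xff/processClass/") -> str:
    """Show the readable prefix, then the opaque remainder as hex."""
    if not key.startswith(prefix):
        return printable(key)
    suffix = key[len(prefix):]
    return f"{printable(prefix)} + {suffix.hex()} ({len(suffix)} bytes)"


def diff_ranges(before: Iterable[KV], after: Iterable[KV]):
    """Return (added, removed, changed) keys between two range snapshots."""
    b, a = dict(before), dict(after)
    added: List[bytes] = sorted(k for k in a if k not in b)
    removed: List[bytes] = sorted(k for k in b if k not in a)
    changed: List[bytes] = sorted(k for k in a if k in b and a[k] != b[k])
    return added, removed, changed
```

If setclass <IP2> default took effect, the keys for those processes should show up as removed (or changed) between the two snapshots; if nothing differs, the command probably did not match any process.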
Thanks for filing the issue! I think setclass with no arguments is reading the wrong key somewhere. I just realized that my <IP2> processes say command_line, not set_class, in status json, so it looks like the change worked and is just being listed incorrectly.
getrange \xff/processClass/ returns nothing. I do see a \xff/processClassChanges key (its value is a random-looking string) and a \xff/processClassChangesVersion key (its value is 1).