I ran make_public.py and it modified my fdb.cluster to use the public IP. However, fdbcli then failed on the machine because the fdbserver processes were still listening on 127.0.0.1 rather than the public interface.
To fix this, I had to manually change “auto” to the server’s public IP in foundationdb.conf. This made fdbcli work, though the status was initially unhappy (I got “Replication health - (Re)initializing automatic data distribution” for a little while).
I didn’t see it mentioned anywhere that I’d need to fix “auto” in the configuration file, and it’s slightly unfortunate that all my servers will need different configs. This makes me think that I might be holding it wrong. Am I missing something?
Using auto tells the server to open a connection to the first coordinator listed in the cluster file and then use the address of the local interface that connection went out on as its public address.
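If it helps to see which interface auto would probably land on for a given host, the idea can be approximated in a few lines of Python. This is only a rough sketch of the behavior described above, not what fdbserver does internally. The helper names first_coordinator and local_address_toward are made up for illustration, and the parsing assumes a single-line cluster file of the form description:ID@host:port,host:port with plain IPv4 addresses.

```python
import socket


def first_coordinator(cluster_file_text):
    """Return (host, port) of the first coordinator in an fdb.cluster string.

    Assumption: a single line of the form description:ID@host:port,host:port,...
    with plain IPv4 hosts (no IPv6 brackets, no extra suffixes).
    """
    line = cluster_file_text.strip()
    coordinators = line.split("@", 1)[1]   # everything after description:ID@
    first = coordinators.split(",")[0]     # first entry in the coordinator list
    host, port = first.rsplit(":", 1)
    return host, int(port)


def local_address_toward(host, port, timeout=2.0):
    """Connect to host:port and report which local IP the OS chose for it."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # getsockname() returns the local (ip, port) end of the connection.
        return conn.getsockname()[0]
```

Reading your real fdb.cluster, passing its contents to first_coordinator, and then calling local_address_toward on the result should print the address the routing table picks for reaching that coordinator. If that is not the interface you expected, setting the address by hand is probably the safer option.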
Normally I haven’t had any trouble using auto for public addresses, though if there’s any doubt about which interface might get chosen on your host, then specifying it manually would be a safer bet.
One thing I just discovered while testing this myself is that make_public.py did not successfully restart my server processes, so when the conversion finished they were still using the 127.0.0.1 interface. Restarting the server processes was enough to clear this up.
Is there any chance that the same thing happened to you? If you change the values in foundationdb.conf back to auto, does it work (note this should result in the processes being restarted automatically)? I think there might be some quirkiness with starting/stopping/restarting the foundationdb service right now, which could be why make_public.py didn’t succeed in the restart for me.
Relatedly, is there any information about opting out of the upstart scripts and whatnot and using another process manager for fdbmonitor? This wasn’t the only time I ran into trouble with the scripts that came with the distribution.
I don’t know of anybody who’s done that and documented anything about it. I’m also not personally well-enough versed in that area to offer any guidance myself.
I did just raise a GitHub issue for the problem on our end, though: