Unable to change coordinators

killertypo · January 15, 2019, 10:56pm

We are slowly migrating our fleet from 3.x to 5.x, and I have run into an issue when trying to assign new coordinators. The error message is:

ERROR: One of the specified coordinators is unreachable (which isn’t incredibly helpful as it does not specify which one it is)

However, the coordinators I am assigning are totally reachable; in fact, the cluster recognizes them as healthy members of the current cluster.

Has anyone run into this before, or know which trace files I should begin looking at to work out why the coordination state is not changing?

I have checked the master, cluster_controller, and the coordinators during the requested change, and nothing stands out that hints at why the coordinators will not update.
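For anyone else hunting through trace files for something like this, a rough way to narrow the search is to list only the higher-severity events. The sketch below is an illustration, not an official tool: it assumes the XML trace format (one `<Event ... />` element per line with a `Severity` attribute, where 30 and 40 are the warning and error levels) and files named `trace.*.xml`. The function name `high_severity_events` and the threshold default are made up for this example.

```python
import re
from pathlib import Path

# Matches one attribute such as Severity="40" inside an <Event .../> line.
ATTR = re.compile(r'(\w+)="([^"]*)"')


def high_severity_events(trace_dir, min_severity=30):
    """Yield (file name, attribute dict) for each event at or above min_severity.

    Assumption: each event sits on its own line. A regex is used instead of
    an XML parser because a trace file that is still being written may not
    be well-formed XML yet.
    """
    for path in sorted(Path(trace_dir).glob("trace.*.xml")):
        with open(path, errors="replace") as fh:
            for line in fh:
                if "<Event " not in line:
                    continue
                attrs = dict(ATTR.findall(line))
                try:
                    severity = int(attrs.get("Severity", "0"))
                except ValueError:
                    continue  # skip lines with a malformed Severity value
                if severity >= min_severity:
                    yield path.name, attrs


# Example use (the directory is an assumption; point it at your own logdir):
# for name, ev in high_severity_events("/var/log/foundationdb"):
#     print(name, ev.get("Time"), ev.get("Type"), ev.get("Machine"))
```

Running it over the trace directory of each process involved in the change (cluster controller, master, old and new coordinators) gives a short list to read through rather than the full logs.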

killertypo · January 15, 2019, 10:57pm

Here is some output from fdbcli:

Configuration:
  Redundancy mode        - double
  Storage engine         - ssd-1
  Coordinators           - 3
  Exclusions             - 7 (type `exclude' for details)

Cluster:
  FoundationDB processes - 13 (less 7 excluded; 0 with errors)
  Machines               - 10 (less 7 excluded)
  Memory availability    - 16.4 GB per process on machine with least available
  Retransmissions rate   - 1 Hz
  Fault Tolerance        - 1 machine
  Server time            - 01/15/19 22:56:56

Data:
  Replication health     - Healthy (Removing storage server)
  Moving data            - 201.377 GB
  Sum of key-value sizes - 245.354 GB
  Disk space used        - 852.228 GB

Operating space:
  Storage server         - 382.4 GB free on most full server
  Log server             - 933.0 GB free on most full server

Workload:
  Read rate              - 781 Hz
  Write rate             - 312 Hz
  Transactions started   - 97 Hz
  Transactions committed - 18 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Process performance details:
  10.6.0.6:4500          ( 13% cpu; 11% machine; 0.035 Gbps; 20% disk IO; 2.9 GB / 17.1 GB RAM  )
  10.6.0.7:4500          ( 39% cpu; 15% machine; 0.124 Gbps; 21% disk IO; 2.8 GB / 21.4 GB RAM  )
  10.6.0.7:4501          ( 39% cpu; 15% machine; 0.124 Gbps; 21% disk IO; 2.8 GB / 21.4 GB RAM  )
  10.6.0.22:4500         ( 15% cpu;  8% machine; 0.034 Gbps; 29% disk IO; 3.0 GB / 16.4 GB RAM  )
  10.6.0.26:4500         ( 39% cpu; 18% machine; 0.152 Gbps; 25% disk IO; 2.8 GB / 21.5 GB RAM  )
  10.6.0.26:4501         ( 53% cpu; 18% machine; 0.152 Gbps; 25% disk IO; 2.8 GB / 21.5 GB RAM  )
  10.6.0.30:4500         ( 13% cpu; 10% machine; 0.044 Gbps; 16% disk IO; 3.0 GB / 16.8 GB RAM  )
  10.6.0.37:4500         ( 17% cpu; 11% machine; 0.042 Gbps; 27% disk IO; 3.0 GB / 17.0 GB RAM  )
  10.6.0.40:4500         ( 15% cpu; 10% machine; 0.029 Gbps; 41% disk IO; 3.0 GB / 16.6 GB RAM  )
  10.6.0.41:4500         ( 44% cpu; 19% machine; 0.152 Gbps; 26% disk IO; 2.8 GB / 21.2 GB RAM  )
  10.6.0.41:4501         ( 55% cpu; 19% machine; 0.152 Gbps; 26% disk IO; 2.8 GB / 21.2 GB RAM  )
  10.6.0.46:4500         ( 40% cpu; 17% machine; 0.142 Gbps; 64% disk IO; 2.9 GB / 16.7 GB RAM  )
  10.6.0.58:4500         ( 16% cpu;  9% machine; 0.038 Gbps; 45% disk IO; 2.8 GB / 16.6 GB RAM  )

Coordination servers:
  10.6.0.22:4500  (reachable)
  10.6.0.37:4500  (reachable)
  10.6.0.40:4500  (reachable)

Client time: 01/15/19 22:56:56

fdb> coordinators 10.6.0.7:4500 10.6.0.26:4500 10.6.0.41:4500

WARNING: Long delay (Ctrl-C to interrupt)
ERROR: One of the specified coordinators is unreachable
fdb>
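Since the error does not say which coordinator it considers unreachable, one quick sanity check is to try a plain TCP connection to each candidate address from the machine running fdbcli. This is only a sketch: the helper name `check_tcp` is made up for the example, and a successful TCP connect does not prove that FoundationDB itself will treat the process as reachable (it only rules out basic network or firewall problems).

```python
import socket


def check_tcp(addresses, timeout=2.0):
    """Return {address: bool} saying whether a TCP connection could be opened.

    Each address is an "ip:port" string, the same form passed to the
    fdbcli `coordinators` command.
    """
    results = {}
    for addr in addresses:
        host, _, port = addr.rpartition(":")
        try:
            # create_connection raises OSError on refusal, timeout, or
            # unroutable host; int() raises ValueError on a bad port.
            with socket.create_connection((host, int(port)), timeout=timeout):
                results[addr] = True
        except (OSError, ValueError):
            results[addr] = False
    return results


# Example with the addresses from the command above:
# print(check_tcp(["10.6.0.7:4500", "10.6.0.26:4500", "10.6.0.41:4500"]))
```

If every address comes back True, as would be expected here given that status lists all three processes, the problem is more likely in the coordinator change logic than in the network.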

panghy · January 16, 2019, 1:05am

Internal issue to us (meaning my code =p). Nothing to see here.

alexmiller · January 16, 2019, 1:13am

You’re welcome to file an issue for this if you’d like to see it change.

killertypo · January 16, 2019, 9:42pm

@alexmiller

I did some digging into our custom 5.x branch and the most current 5.2 release on GitHub, and found that with a fresh checkout of release-5.2 I am unable to change my coordinators.

I tracked it down to this commit: https://github.com/apple/foundationdb/commit/b8486d4a2ae1fbffac58f11dbaec272a5d24d92f

Once I remove the changes to the coordinator election process, I am again able to change my coordinators.

Another note: the release-5.2 branch is running an unreleased 5.2.8-PRERELEASE version, whereas the releases page lists 5.2.6.

killertypo · January 16, 2019, 10:11pm

Ah, found that the issue was fixed in the 6.x branch (but never ported to the 5.x branch). We should be able to proceed now.
