When calling MustOpenDefault, with a valid default, the function does not return if the database is an indeterminate state. For example, the database is supposedly accessible, but it is repartitioning and the client never returns from call to MustOpenDefault.
Repartitioning is not an operation that blocks the client. Repartitioning is a background operation that, on busy clusters, is happening all the time. For you to even know that repartitioning was happening, the database had to have been available enough for you to read the status key. I’ll assume you determined this with fdbcli
.
MustOpenDefault
does block indefinitely as far as I know. You’re free to wrap this yourself in another goroutine and enforce some timeout if you want. For applications which are entirely dependent on FDB, it may not make sense for them to do anything until FDB is available. Another angle is you might prefer your application to crash after a certain time if FDB isn’t available (hopefully to be restarted by systemd
or similar), as that signals to operators something is wrong. This is a policy decision for you to make for yourself.
Could you describe the environment you’re running both your program and FDB in? Is either using Docker?
I don’t think MustOpenDefault is supposed to block. The implementation changed a bit in 6.1, what client version are you using?
Would you mind sharing some sample code that blocks? What does status say in fdbcli for this cluster when the Go program is blocked?
I’m working on getting confirmation about where the blocking happened in the Go app I was thinking of when I made my comment. Someone was trying to use docker-compose and connect to FDB using a DNS name. The latter doesn’t work at all. I could be wrong on both counts and it doesn’t block at all until you try to start a transaction with no timeout set.
configuration is all Ubuntu on ubuntu 16. The server version is 6.1.8 as is client. I had this occur before when I was using an older version of the client, so I made sure the client version is also 6.1.8. This is status from fdbcli.
Using cluster file `/etc/foundationdb/fdb.cluster’.
Configuration:
Redundancy mode - double
Storage engine - memory-2
Coordinators - 3
Cluster:
FoundationDB processes - 51
Machines - 6
Memory availability - 48.2 GB per process on machine with least available
Retransmissions rate - 3 Hz
Fault Tolerance - 1 machine
Server time - 10/12/19 16:30:48
Data:
Replication health - Healthy (Repartitioning.)
Moving data - 0.001 GB
Sum of key-value sizes - 1 MB
Disk space used - 12.580 GB
Operating space:
Storage server - 49.0 GB free on most full server
Log server - 5620.7 GB free on most full server
Workload:
Read rate - 9 Hz
Write rate - 0 Hz
Transactions started - 5 Hz
Transactions committed - 0 Hz
Conflict rate - 0 Hz
Backup and DR:
Running backups - 0
Running DRs - 0
Here’s code as I’ve currently got it for connecting:
fdb.MustAPIVersion(600)
// Open the default database from the system cluster
gdb := fdb.MustOpenDefault()
fmt.Println(“database opened…”)
The Println message is never printed. This code has connected before. We’re testing a large bare metal configuration (6 servers, 256GB RAM each). What else can I else can I tell you? about config?
- fdbcli clearly connects
- multiple golang clients don’t
- I’m trying a python client connect now.
python client connects, and fdbcli connects, so it’s something in the go library. working on building a VERY simple go test now.
Minimal go client connects, so something else is causing trouble. I’ll post solution when I find it. Thanks.
Turns out it’s hanging on directory.CreateOrOpen
Here’s code:
// Different API versions may expose different runtime behaviors.
fdb.MustAPIVersion(520)
// Open the default database from the system cluster
db := fdb.MustOpenDefault()
// shit test
dbName := "you're kidding, right"
color.Red("directory.CreateOrOpen:%s", dbName)
vaultDir, err := directory.CreateOrOpen(db, []string{dbName}, nil)
color.Red("directory.CreateOrOpen error:%s", err)
It never returns from call to directory.CreateOrOpen.
this behavior manifests regardless of the value of dbName, so it doesn’t return under any condition. the program runs on my development machine (mac OSX, single server). So it’s something about the production system.
Hanging on a call to create a directory likely implies that you can’t connect to the database or that the database isn’t available. Some possible reasons that this may happen are:
- Your fdb_c client library version is different than the cluster version
- The cluster file you are using doesn’t match the one used on the server processes
- You are using TLS but have a problem with the configuration of it
- Some or all of the processes in the cluster are not reachable from where you are running your client
- The cluster is down
Usually in this situation my first step would be to try to connect fdbcli to the cluster from the same host and with the same cluster file as your client process. Then you can check status and see if you have any problem connecting or if the cluster reports itself having a problem.
If that works fine, there are a couple other things you can do. One is to verify that you are using the correct version of libfdb_c. One way to do this is by trying to use the Python bindings to read a key with the same client library. You could also just grab a client lib with a known correct version and try it. I have a Python script that we use internally to determine and print the version of a given client library, we should probably try to publish it for cases like this.
Another thing to do is to turn on trace logging and inspect the logs for any indication of what is going wrong. On the client, you would use this function to do so. The server logs may also be interesting in some cases (like TLS configuration problems), so you could check those too. They are enabled by default.
Usually in this situation my first step would be to try to connect fdbcli to the cluster from the same host and with the same cluster file as your client process. Then you can check status and see if you have any problem connecting or if the cluster reports itself having a problem.
fdbcli status is as above, no problem. cluster file is the same for all programs (fdbcli, go clients, python)
the go program as shown above runs fine without the call to directory.CreateOrOpen.
Question: does fdbcli use the same dynamically linked libfdb_c file? or is it statically linked?
I’ll try the createDirectory equivalant in Python, and also reading keys written by go client, and read by python client.
The script you mention would be of great help.
Thanks.
It’s statically linked, so you would need to use a fdbcli of the same major.minor version as the servers.
I’ve confirmed that all versions of libfdb_c.so are the correct version for the server (6.1.8)
I modified sample code to the following. The transaction completes, but it hangs on the call
to directory.CreateOrOpen
func main() {
// Different API versions may expose different runtime behaviors.
fdb.MustAPIVersion(520)
o := fdb.Options()
o.SetTraceEnable("")
// Open the default database from the system cluster
db := fdb.MustOpenDefault()
// Database reads and writes happen inside transactions
ret, e := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
tr.Set(fdb.Key("hello"), []byte("world"))
tr.Set(fdb.Key("foo"), []byte("bleek"))
return tr.Get(fdb.Key("foo")).MustGet(), nil
// db.Transact automatically commits (and if necessary,
// retries) the transaction
})
if e != nil {
log.Fatalf("Unable to perform FDB transaction (%v)", e)
}
fmt.Printf("hello is now world, foo was: %s\n", string(ret.([]byte)))
// test CreateOrOpen
dbName := "you're kidding, right"
color.Red("directory.CreateOrOpen:%s", dbName)
vaultDir, err := directory.CreateOrOpen(db, []string{dbName}, nil)
color.Red("directory.CreateOrOpen error:%s", err)
if err != nil {
color.Red("unable to CreateOrOpen vaultDir:%v error:%v", dbName, err)
panic(0)
}
color.Green("createOrOpen: database name:%v vaultDir:%+v", dbName, vaultDir)
}
The output of the trace file is attached below. There is a socket connect problem when the client is trying to connect to a server that is not a coordination server (10.0.3.55:4505)
<?xml version="1.0"?>
We are restarting/reinstalling the cluster from scratch. I’ll keep you posted on the results.
Trace didn’t show up. Here it is
<?xml version="1.0"?>
<Trace>
<Event Severity="10" Time="1571067227.179720" Type="SetNetworkOption" ID="0000000000000000" Option="TRACE_ENABLE" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.179720" Type="Net2Starting" ID="0000000000000000" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.180139" Type="TLSConnectionLoadingPlugin" ID="0000000000000000" Plugin="fdb-libressl-plugin" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.180139" Type="SetNetworkOption" ID="0000000000000000" Option="SUPPORTED_CLIENT_VERSIONS" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.180139" Type="Net2Running" ID="0000000000000000" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.180139" Type="ProcessTimeOffset" ID="0000000000000000" ProcessTime="1571067227.184000" SystemTime="1571067227.184001" OffsetFromSystemTime="-0.000001" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.184059" Type="ClientStart" ID="0000000000000000" SourceVersion="bd6b10cbcee08910667194e6388733acd3b80549" Version="6.1.8" PackageName="6.1" ClusterFile="/etc/foundationdb/fdb.cluster" ConnectionString="KE0R6iP1:6n4coUEJ9j8Sog2LM9RVeEMxXhyaJDrt@10.0.3.51:4500,10.0.3.52:4500,10.0.3.53:4500" ActualTime="1571067227" ApiVersion="520" ImageOffset="0x7ff87d4e6000" Machine="10.0.3.50:32969" LogGroup="default" TrackLatestType="Original" />
<Event Severity="10" Time="1571067227.184059" Type="MachineLoadDetail" ID="0000000000000000" User="43595078" Nice="19859" System="43142945" Idle="9234093134" IOWait="121941" IRQ="0" SoftIRQ="1258909" Steal="0" Guest="0" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.184059" Type="MonitorLeaderChange" ID="0000000000000000" NewLeader="0000000000000001" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.188555" Type="CodeCoverage" ID="0000000000000000" File="fdbclient/ReadYourWrites.actor.cpp" Line="1188" Condition="true" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.188555" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.0.3.51:4500" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.188555" Type="ConnectionExchangingConnectPacket" ID="ecae3aa60e63f812" SuppressedEventCount="0" PeerAddr="10.0.3.51:4500" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.190007" Type="ConnectionEstablished" ID="b781b1efe4cc3e01" SuppressedEventCount="0" Peer="10.0.3.52:4500" ConnectionId="1" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.190007" Type="ConnectedOutgoing" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.0.3.52:4500" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.190007" Type="GetLeaderReply" ID="0000000000000000" SuppressedEventCount="0" Coordinator="10.0.3.52:4500" Nominee="6451b4a628b6a9a7" Generation="0" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.190007" Type="MonitorLeaderChange" ID="0000000000000000" NewLeader="0051b4a628b6a9a7" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.190007" Type="ClientInfo_CCInterfaceChange" ID="0000000000000000" CCID="ded6f28796b9269f" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.190563" Type="MonitorLeaderChange" ID="0000000000000000" NewLeader="0051b4a628b6a9a7" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.190753" Type="MonitorLeaderChange" ID="0000000000000000" NewLeader="0051b4a628b6a9a7" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067227.191125" Type="ClientInfoChange" ID="0000000000000000" ChangeID="9f769dbd8329c35b" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067229.198204" Type="ConnectionTimedOut" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.0.3.54:4501" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067229.198204" Type="ConnectionClosed" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="10.0.3.54:4501" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067229.198204" Type="PeerDestroy" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="10.0.3.54:4501" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="20" Time="1571067229.198204" Type="N2_ConnectError" ID="f0603cbf7b85266b" SuppressedEventCount="0" Message="125" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067229.198890" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="4" PeerAddr="10.0.3.55:4505" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067231.198916" Type="ConnectionTimedOut" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.0.3.55:4505" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067231.198916" Type="ConnectionClosed" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="10.0.3.55:4505" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067231.198916" Type="PeerDestroy" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="10.0.3.55:4505" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="20" Time="1571067231.198916" Type="N2_ConnectError" ID="21485d34b1e34702" SuppressedEventCount="0" Message="125" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067231.208964" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.0.3.54:4501" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067232.184137" Type="MachineLoadDetail" ID="0000000000000000" User="43595164" Nice="19859" System="43143020" Idle="9234108921" IOWait="121941" IRQ="0" SoftIRQ="1258912" Steal="0" Guest="0" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067232.184137" Type="MemoryMetrics" ID="0000000000000000" TotalMemory16="262144" ApproximateUnusedMemory16="0" ActiveThreads16="2" TotalMemory32="262144" ApproximateUnusedMemory32="0" ActiveThreads32="2" TotalMemory64="393216" ApproximateUnusedMemory64="0" ActiveThreads64="3" TotalMemory128="262144" ApproximateUnusedMemory128="0" ActiveThreads128="2" TotalMemory256="262144" ApproximateUnusedMemory256="0" ActiveThreads256="2" TotalMemory512="131072" ApproximateUnusedMemory512="0" ActiveThreads512="1" TotalMemory1024="262144" ApproximateUnusedMemory1024="0" ActiveThreads1024="2" TotalMemory2048="131072" ApproximateUnusedMemory2048="0" ActiveThreads2048="1" TotalMemory4096="131072" ApproximateUnusedMemory4096="0" ActiveThreads4096="1" TotalMemory8192="0" ApproximateUnusedMemory8192="0" ActiveThreads8192="0" HugeArenaMemory="262224" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067232.184137" Type="NetworkMetrics" ID="0000000000000000" CantSleep="0" WontSleep="5" Yields="6" YieldCalls="0" YieldCallsTrue="0" SlowTaskSignals="0" YieldBigStack="0" RunLoopIterations="379" TimersExecuted="216" TasksExecuted="407" ASIOEventsProcessed="484" ReadCalls="99" WriteCalls="98" ReadProbes="99" WriteProbes="0" PacketsRead="94" PacketsGenerated="102" WouldBlock="4" SlowTask8M="1" S2Pri1="1.08249e-06" S2Pri2050="1.08249e-06" S2Pri3050="1.08249e-06" S2Pri4050="1.08249e-06" S2Pri4950="1.08249e-06" S2Pri5050="1.08249e-06" S2Pri7050="3.69275e-08" S2Pri8050="3.69275e-08" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067232.184137" Type="ProcessMetrics" ID="0000000000000000" Elapsed="4.99944" CPUSeconds="0.036" MainThreadCPUSeconds="0.032" UptimeSeconds="5.00008" Memory="384327680" ResidentMemory="12668928" UnusedAllocatedMemory="0" MbpsSent="0.00922983" MbpsReceived="0.0122734" DiskTotalBytes="0" DiskFreeBytes="0" DiskQueueDepth="0" DiskIdleSeconds="0" DiskReads="0" DiskWrites="0" DiskReadsCount="0" DiskWritesCount="0" DiskWriteSectors="0" DiskReadSectors="0" FileWrites="0" FileReads="0" CacheReadBytes="0" CacheFinds="0" CacheWritesBlocked="0" CacheReadsBlocked="0" CachePageReadsMerged="0" CacheWrites="0" CacheReads="0" ZoneID="[not set]" MachineID="[not set]" AIOSubmitCount="0" AIOCollectCount="0" AIOSubmitLag="0" AIODiskStall="0" CurrentConnections="4" ConnectionsEstablished="0.80009" ConnectionsClosed="0" ConnectionErrors="0" Machine="10.0.3.50:32969" LogGroup="default" TrackLatestType="Original" />
<Event Severity="10" Time="1571067232.184137" Type="MachineMetrics" ID="0000000000000000" Elapsed="4.99944" MbpsSent="0.0656537" MbpsReceived="0.106815" OutSegs="362" RetransSegs="2" CPUSeconds="0.0514017" TotalMemory="541026975744" CommittedMemory="7319822336" AvailableMemory="533707153408" ZoneID="[not set]" MachineID="[not set]" Machine="10.0.3.50:32969" LogGroup="default" TrackLatestType="Original" />
<Event Severity="10" Time="1571067232.187712" Type="TransactionMetrics" ID="0000000000000000" Cluster="KE0R6iP1" ReadVersions="2" LogicalUncachedReads="1" PhysicalReadRequests="1" CommittedMutations="2" CommittedMutationBytes="74" CommitStarted="1" CommitCompleted="1" TooOld="0" FutureVersions="0" NotCommitted="0" MaybeCommitted="0" ResourceConstrained="0" ProcessBehind="0" MeanLatency="0.00783944" MedianLatency="0.00783944" Latency90="0.00783944" Latency98="0.00783944" MaxLatency="0.00783944" MeanRowReadLatency="0" MedianRowReadLatency="0" MaxRowReadLatency="0" MeanGRVLatency="0.00276577" MedianGRVLatency="0.000789881" MaxGRVLatency="0.00474167" MeanCommitLatency="0.00309777" MedianCommitLatency="0.00309777" MaxCommitLatency="0.00309777" MeanMutationsPerCommit="2" MedianMutationsPerCommit="2" MaxMutationsPerCommit="2" MeanBytesPerCommit="74" MedianBytesPerCommit="74" MaxBytesPerCommit="74" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067233.209009" Type="ConnectionTimedOut" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.0.3.54:4501" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067233.209009" Type="ConnectionClosed" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="10.0.3.54:4501" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067233.209009" Type="PeerDestroy" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="10.0.3.54:4501" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="20" Time="1571067233.209009" Type="N2_ConnectError" ID="f0326f80fb6add1c" SuppressedEventCount="0" Message="125" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067233.229081" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.0.3.55:4505" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067235.229131" Type="ConnectionTimedOut" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.0.3.55:4505" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067235.229131" Type="ConnectionClosed" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="10.0.3.55:4505" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067235.229131" Type="PeerDestroy" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="10.0.3.55:4505" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="20" Time="1571067235.229131" Type="N2_ConnectError" ID="ed858142cef5dd15" SuppressedEventCount="0" Message="125" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067235.269188" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.0.3.54:4501" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067237.184169" Type="MachineLoadDetail" ID="0000000000000000" User="43595222" Nice="19860" System="43143095" Idle="9234124712" IOWait="121941" IRQ="0" SoftIRQ="1258912" Steal="0" Guest="0" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067237.184169" Type="MemoryMetrics" ID="0000000000000000" TotalMemory16="262144" ApproximateUnusedMemory16="0" ActiveThreads16="2" TotalMemory32="262144" ApproximateUnusedMemory32="0" ActiveThreads32="2" TotalMemory64="393216" ApproximateUnusedMemory64="0" ActiveThreads64="3" TotalMemory128="262144" ApproximateUnusedMemory128="0" ActiveThreads128="2" TotalMemory256="262144" ApproximateUnusedMemory256="0" ActiveThreads256="2" TotalMemory512="131072" ApproximateUnusedMemory512="0" ActiveThreads512="1" TotalMemory1024="262144" ApproximateUnusedMemory1024="0" ActiveThreads1024="2" TotalMemory2048="131072" ApproximateUnusedMemory2048="0" ActiveThreads2048="1" TotalMemory4096="131072" ApproximateUnusedMemory4096="0" ActiveThreads4096="1" TotalMemory8192="0" ApproximateUnusedMemory8192="0" ActiveThreads8192="0" HugeArenaMemory="262224" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067237.184169" Type="NetworkMetrics" ID="0000000000000000" CantSleep="0" WontSleep="1" Yields="1" YieldCalls="0" YieldCallsTrue="0" SlowTaskSignals="0" YieldBigStack="0" RunLoopIterations="417" TimersExecuted="256" TasksExecuted="419" ASIOEventsProcessed="506" ReadCalls="90" WriteCalls="90" ReadProbes="90" WriteProbes="0" PacketsRead="90" PacketsGenerated="92" WouldBlock="0" SlowTask4M="1" S2Pri1="9.34235e-08" S2Pri2050="9.34235e-08" S2Pri3050="9.34235e-08" S2Pri4050="9.34235e-08" S2Pri4950="9.34235e-08" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067237.184169" Type="ProcessMetrics" ID="0000000000000000" Elapsed="5.00002" CPUSeconds="0.028" MainThreadCPUSeconds="0.028" UptimeSeconds="10.0001" Memory="384327680" ResidentMemory="14254080" UnusedAllocatedMemory="0" MbpsSent="0.00679998" MbpsReceived="0.00575998" DiskTotalBytes="0" DiskFreeBytes="0" DiskQueueDepth="0" DiskIdleSeconds="0" DiskReads="0" DiskWrites="0" DiskReadsCount="0" DiskWritesCount="0" DiskWriteSectors="0" DiskReadSectors="0" FileWrites="0" FileReads="0" CacheReadBytes="0" CacheFinds="0" CacheWritesBlocked="0" CacheReadsBlocked="0" CachePageReadsMerged="0" CacheWrites="0" CacheReads="0" ZoneID="[not set]" MachineID="[not set]" AIOSubmitCount="0" AIOCollectCount="0" AIOSubmitLag="0" AIODiskStall="0" CurrentConnections="4" ConnectionsEstablished="0" ConnectionsClosed="0" ConnectionErrors="0" Machine="10.0.3.50:32969" LogGroup="default" TrackLatestType="Original" />
<Event Severity="10" Time="1571067237.184169" Type="MachineMetrics" ID="0000000000000000" Elapsed="5.00002" MbpsSent="0.0542158" MbpsReceived="0.0872717" OutSegs="270" RetransSegs="3" CPUSeconds="0.0420724" TotalMemory="541026975744" CommittedMemory="7318573056" AvailableMemory="533708402688" ZoneID="[not set]" MachineID="[not set]" Machine="10.0.3.50:32969" LogGroup="default" TrackLatestType="Original" />
<Event Severity="10" Time="1571067237.187919" Type="TransactionMetrics" ID="0000000000000000" Cluster="KE0R6iP1" ReadVersions="2" LogicalUncachedReads="1" PhysicalReadRequests="1" CommittedMutations="2" CommittedMutationBytes="74" CommitStarted="1" CommitCompleted="1" TooOld="0" FutureVersions="0" NotCommitted="0" MaybeCommitted="0" ResourceConstrained="0" ProcessBehind="0" MeanLatency="0" MedianLatency="0" Latency90="0" Latency98="0" MaxLatency="0" MeanRowReadLatency="0" MedianRowReadLatency="0" MaxRowReadLatency="0" MeanGRVLatency="0" MedianGRVLatency="0" MaxGRVLatency="0" MeanCommitLatency="0" MedianCommitLatency="0" MaxCommitLatency="0" MeanMutationsPerCommit="0" MedianMutationsPerCommit="0" MaxMutationsPerCommit="0" MeanBytesPerCommit="0" MedianBytesPerCommit="0" MaxBytesPerCommit="0" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067237.269246" Type="ConnectionTimedOut" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.0.3.54:4501" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067237.269246" Type="ConnectionClosed" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="10.0.3.54:4501" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067237.269246" Type="PeerDestroy" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="10.0.3.54:4501" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="20" Time="1571067237.269246" Type="N2_ConnectError" ID="adb88189bdf95ad2" SuppressedEventCount="0" Message="125" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067237.349306" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.0.3.55:4505" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067239.349355" Type="ConnectionTimedOut" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.0.3.55:4505" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067239.349355" Type="ConnectionClosed" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="10.0.3.55:4505" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067239.349355" Type="PeerDestroy" ID="0000000000000000" Error="connection_failed" ErrorDescription="Network connection failed" ErrorCode="1026" SuppressedEventCount="0" PeerAddr="10.0.3.55:4505" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="20" Time="1571067239.349355" Type="N2_ConnectError" ID="17546bb4ade49e90" SuppressedEventCount="0" Message="125" Machine="10.0.3.50:32969" LogGroup="default" />
<Event Severity="10" Time="1571067239.509372" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.0.3.54:4501" Machine="10.0.3.50:32969" LogGroup="default" />
I was able to test your code (against a 6.2 cluster, to be fair), and it worked ok. It’s interesting that you are able to commit a transaction, as that implies that your database is available. It is the case, though, that the transaction you run prior to creating a directory doesn’t actually do any reads against the database, since the only key you read is also one that you wrote. Maybe there’s some issue with accessing some of the storage servers?
Given that the database is small, you could test that it is readable from your client host by doing a range read over the entire key space. You could use fdbcli for this by running something like the following with a sufficiently large limit.
> getrange "" \xff <limit>
The fact that you are having connections time out is interesting and probably the thread that I would follow. Do you have any logs on the server side that might provide clues on the other side of the connection?
FWIW, my issue with directory.CreateOrOpen() hanging was this one.
Specifically, I was not using the multiversion client.
Environment: macos 5.2.8 go client trying to talk to a 6.2.19 linux FDB server running in a debian docker container
Fix I tested:
- download latest 6.2 .pkg from https://www.foundationdb.org/download/ to ~/Desktop
- cd ~/Desktop
- /bin/rm -rf ./FoundationExpand
- pkgutil --expand-full FoundationDB-6.2.*.pkg ~/Desktop/FoundationExpand
- mkdir -p /tmp/externnalclient
- /bin/cp ~/Desktop/FoundationExpand/FoundationDB-clients.pkg/Payload/usr/local/lib/libfdb_c.dylib /tmp/externalclient/libfdb_c.dylib /tmp/externalclient
- export EXTERNAL_CLIENT_DIRECTORY=/tmp/externalclient # or use fdb.Options().SetExternalClientDirectory(“/tmp/externalclient”)
After this, the go client was able to create/open the directory, set, get in that 6.2.19 cluster.