We use FoundationDB 6.0.15 as the storage server for our Go program.
It’s a very basic configuration:
- two machines in this cluster (192.168.111.55, 192.168.111.40)
- one fdb process per machine, with no role configured for it
- two coordinators; the storage configuration is: double ssd
- data is located on a 40GB disk (37GB was in use when the problem happened)
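For reference, the disk figures above work out as follows. This is only a small arithmetic sketch; the helper name `usedFraction` is made up for illustration, and how much free space fdb actually needs is not something I have established.

```go
package main

import "fmt"

// usedFraction returns the fraction of the disk that is in use.
// (Hypothetical helper, only for this illustration.)
func usedFraction(usedGB, totalGB float64) float64 {
	return usedGB / totalGB
}

func main() {
	// Figures taken from the setup described above.
	const totalGB, usedGB = 40.0, 37.0

	frac := usedFraction(usedGB, totalGB)
	fmt.Printf("used: %.1f%%, free: %.1f GB (%.1f%%)\n",
		frac*100, totalGB-usedGB, (1-frac)*100)
}
```

So the data disk was roughly 92.5% full, with about 3GB left, when the hang occurred.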
Today our Go program hung with the following stacks:
(gdb) #0 runtime.cgocall (arg=0xc00005a770, fn=0xabb7e0 <_cgo_ce7ede6d4d1e_Cfunc_fdb_run_network>, ~r2=<optimized out>) at /usr/lib/golang/src/runtime/cgocall.go:130
#1 0x0000000000a992c9 in github.com/apple/foundationdb/bindings/go/src/fdb._Cfunc_fdb_run_network (r1=0) at _cgo_gotypes.go:415
#2 0x0000000000aa01e6 in github.com/apple/foundationdb/bindings/go/src/fdb.startNetwork.func1 () at /code/server/src/github.com/apple/foundationdb/bindings/go/src/fdb/fdb.go:214
#3 0x00000000004601a1 in runtime.goexit () at /usr/lib/golang/src/runtime/asm_amd64.s:1333
#4 0x0000000000000000 in ?? ()
(gdb) #0 runtime.gopark (lock=0x14f7660 <runtime.semtable+7552>, reason=17 '\021', traceEv=25 '\031', traceskip=4, unlockf=<optimized out>) at /usr/lib/golang/src/runtime/proc.go:303
#1 0x0000000000433bd3 in runtime.goparkunlock (lock=<optimized out>, reason=<optimized out>, traceEv=<optimized out>, traceskip=<optimized out>) at /usr/lib/golang/src/runtime/proc.go:308
#2 0x00000000004449fc in runtime.semacquire1 (addr=0xc0007415b4, lifo=false, profile=3) at /usr/lib/golang/src/runtime/sema.go:144
#3 0x000000000044472d in sync.runtime_SemacquireMutex (addr=<optimized out>, lifo=<optimized out>) at /usr/lib/golang/src/runtime/sema.go:71
#4 0x000000000047559f in sync.(*Mutex).Lock (m=0xc0007415b0) at /usr/lib/golang/src/sync/mutex.go:134
#5 0x0000000000a9c859 in github.com/apple/foundationdb/bindings/go/src/fdb.fdb_future_block_until_ready (f=0x7fddc401e8e0) at /code/server/src/github.com/apple/foundationdb/bindings/go/src/fdb/futures.go:100
#6 0x0000000000a9c89b in github.com/apple/foundationdb/bindings/go/src/fdb.future.BlockUntilReady (f=...) at /code/server/src/github.com/apple/foundationdb/bindings/go/src/fdb/futures.go:104
#7 0x0000000000a9cbb2 in github.com/apple/foundationdb/bindings/go/src/fdb.futureNil.Get (f=..., ~r0=...) at /code/server/src/github.com/apple/foundationdb/bindings/go/src/fdb/futures.go:244
#8 0x0000000000aa1efc in github.com/apple/foundationdb/bindings/go/src/fdb.(*futureNil).Get (~r0=...) at <autogenerated>:1
#9 0x0000000000aa006c in github.com/apple/foundationdb/bindings/go/src/fdb.Database.Transact.func1 (e=..., ret=...) at /code/server/src/github.com/apple/foundationdb/bindings/go/src/fdb/database.go:139
#10 0x0000000000a9adeb in github.com/apple/foundationdb/bindings/go/src/fdb.retryable (e=..., onError=
{void (github.com/apple/foundationdb/bindings/go/src/fdb.Error, github.com/apple/foundationdb/bindings/go/src/fdb.FutureNil *)} 0xc01f8b36e0, ret=..., wrapped={void (interface {} *, error *)} 0xc01f8b36d8)
at /code/server/src/github.com/apple/foundationdb/bindings/go/src/fdb/database.go:87
#11 0x0000000000a9af72 in github.com/apple/foundationdb/bindings/go/src/fdb.Database.Transact (d=..., f={void (github.com/apple/foundationdb/bindings/go/src/fdb.Transaction, interface {} *, error *)} 0xc01f8b3738, ~r1=..., ~r2=...)
at /code/server/src/github.com/apple/foundationdb/bindings/go/src/fdb/database.go:145
...
(gdb) #0 runtime.gopark (lock=0x14f7ea0 <runtime.semtable+9664>, reason=17 '\021', traceEv=25 '\031', traceskip=4, unlockf=<optimized out>) at /usr/lib/golang/src/runtime/proc.go:303
#1 0x0000000000433bd3 in runtime.goparkunlock (lock=<optimized out>, reason=<optimized out>, traceEv=<optimized out>, traceskip=<optimized out>) at /usr/lib/golang/src/runtime/proc.go:308
#2 0x00000000004449fc in runtime.semacquire1 (addr=0xc04a854834, lifo=false, profile=3) at /usr/lib/golang/src/runtime/sema.go:144
#3 0x000000000044472d in sync.runtime_SemacquireMutex (addr=<optimized out>, lifo=<optimized out>) at /usr/lib/golang/src/runtime/sema.go:71
#4 0x000000000047559f in sync.(*Mutex).Lock (m=0xc04a854830) at /usr/lib/golang/src/sync/mutex.go:134
#5 0x0000000000a9c859 in github.com/apple/foundationdb/bindings/go/src/fdb.fdb_future_block_until_ready (f=0x7fdde80f17a0) at /code/server/src/github.com/apple/foundationdb/bindings/go/src/fdb/futures.go:100
#6 0x0000000000a9c89b in github.com/apple/foundationdb/bindings/go/src/fdb.future.BlockUntilReady (f=...) at /code/server/src/github.com/apple/foundationdb/bindings/go/src/fdb/futures.go:104
#7 0x0000000000aa09fe in github.com/apple/foundationdb/bindings/go/src/fdb.(*futureByteSlice).Get.func1 () at /code/server/src/github.com/apple/foundationdb/bindings/go/src/fdb/futures.go:147
#8 0x0000000000475903 in sync.(*Once).Do (f={void ()} 0xc07ad7b1d0, o=0xc01d481f70) at /usr/lib/golang/src/sync/once.go:44
#9 0x0000000000a9c998 in github.com/apple/foundationdb/bindings/go/src/fdb.(*futureByteSlice).Get (f=<optimized out>, ~r0=..., ~r1=...) at /code/server/src/github.com/apple/foundationdb/bindings/go/src/fdb/futures.go:142
#10 0x0000000000a9ca0b in github.com/apple/foundationdb/bindings/go/src/fdb.(*futureByteSlice).MustGet (f=<optimized out>, ~r0=...) at /code/server/src/github.com/apple/foundationdb/bindings/go/src/fdb/futures.go:164
#11 0x0000000000aad9d5 in server/storage/fdb.(*FDB).Get.func1 (err=..., ret=..., tr=...) at /code/server/src/server/storage/fdb/fdb.go:67
#12 0x0000000000aa0129 in github.com/apple/foundationdb/bindings/go/src/fdb.Database.ReadTransact.func1 (e=..., ret=...) at /code/server/src/github.com/apple/foundationdb/bindings/go/src/fdb/database.go:176
#13 0x0000000000a9adeb in github.com/apple/foundationdb/bindings/go/src/fdb.retryable (e=..., onError=
{void (github.com/apple/foundationdb/bindings/go/src/fdb.Error, github.com/apple/foundationdb/bindings/go/src/fdb.FutureNil *)} 0xc07ad7b328, ret=..., wrapped={void (interface {} *, error *)} 0xc07ad7b320)
at /code/server/src/github.com/apple/foundationdb/bindings/go/src/fdb/database.go:87
#14 0x0000000000a9b0b2 in github.com/apple/foundationdb/bindings/go/src/fdb.Database.ReadTransact (d=..., f={void (github.com/apple/foundationdb/bindings/go/src/fdb.ReadTransaction, interface {} *, error *)} 0xc07ad7b380, ~r1=...,
~r2=...) at /code/server/src/github.com/apple/foundationdb/bindings/go/src/fdb/database.go:185
...
After that, FoundationDB seems to keep reinitialising itself.
- fdbcli cannot show status; it says:
  Initializing new transaction servers and recovering transaction logs.
- There are several trace logs, and they all begin with role recruitment; fdb keeps creating new trace logs.
This is one of the trace log files: 55a.tar.gz
I have some questions about this:
- Why does fdb try to reinitialise?
- Why is the fdb process still the same one?
- There are connection failed, SlowTask, and SlowSSLoopx100 events in the log. Is this OK?
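To put numbers on the events mentioned in the last question, something like the sketch below could count event types in a trace file. It assumes the trace log is in the XML format, with one `<Event ... Type="..." .../>` element per line. `countEventTypes` is a hypothetical helper, and the sample lines in `main` are stripped-down placeholders, not lines from the real log.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// typeAttr matches the Type attribute of an event element.
// Assumption: each event sits on one line and carries Type="...".
var typeAttr = regexp.MustCompile(`\bType="([^"]*)"`)

// countEventTypes returns how often each event Type appears in the log text.
// Lines without a Type attribute are ignored.
func countEventTypes(log string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if m := typeAttr.FindStringSubmatch(sc.Text()); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	// Placeholder input for illustration only; real events have more attributes.
	sample := `<Event Type="SlowTask"/>
<Event Type="SlowSSLoopx100"/>
<Event Type="SlowTask"/>`

	counts := countEventTypes(sample)
	fmt.Println("SlowTask:", counts["SlowTask"])
	fmt.Println("SlowSSLoopx100:", counts["SlowSSLoopx100"])
}
```

Running this over each trace file would show whether these events are occasional or constant, which may help whoever looks at the attached log.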