There is already an existing port, but it uses the older Makefile build and therefore doesn’t build the C or other bindings; the newer CMake structure provides them. So I have taken Jesse’s amazing work and prepped it into a FreeBSD port, so that installation can simply be pkg install -r FreeBSD databases/foundationdb-devel.
hopefully rebase successfully & pass all the tests
fails
The reported failures follow. I’ve attached the traces to the gist as tar.xz, along with the ninja check build report, in case these are useful. I don’t intend to address any of these failures, as it makes more sense to focus on hitting 7.0.0.
97% tests passed, 3 tests failed out of 103
Label Time Summary:
fast = 921.60 sec*proc (38 tests)
from_6.2.0 = 7365.74 sec*proc (4 tests)
rare = 347.28 sec*proc (14 tests)
restarting = 82.59 sec*proc (3 tests)
slow = 1089.42 sec*proc (31 tests)
status = 13.68 sec*proc (12 tests)
Total Test time (real) = 10218.50 sec
The following tests FAILED:
1 - SimpleExternalTest (Failed)
58 - restarting/from_6.2.0/SnapTestSimpleRestart-1 (Timeout)
60 - restarting/from_6.2.0/SnapCycleRestart-1 (Timeout)
questions
the trace logs are a bit intimidating; are there any tools or scripts that help diagnose these?
what are the requirements for upstreaming Jesse’s work into the main repo, test failures notwithstanding?
These are… a bit strange, as tests go. SimpleExternalTest loads a .so file that has a workload in it. I think the snapshotting tests were added after Jesse did most of his work and introduced additional platform-specific code, so I’m guessing a FreeBSD version of that is missing.
I’d be fine with merging a PR where these tests still fail. They’re unlikely to impact anyone’s current use of FDB on FreeBSD anyway, and fixing them up later should be fine. Grepping for Severity="40" should tell you what in a test actually failed, and anything beyond that, I’ll go debug.
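For example, something along these lines should surface the failing events from a run’s trace files (the path is just illustrative; point it at whichever trace.*.xml files the failed run left behind):

grep 'Severity="40"' /tmp/wolfspider/build/test_runs/*/test_SimpleExternalTest/trace.*.xml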
Can you go re-run ctest with --output-on-failure set? The trace logs for the snapshotting tests don’t actually seem to show a failure, and I’m hoping something was printed to stdout/stderr instead?
Could you symbolicate that? I’d assume it should point towards the failure.
But either way, if we reach a point where master builds, or your port has landed so that there’s a simple one-step build command, then it’s much easier for me to just do the running and debugging on FreeBSD myself.
I have no idea what symbolicate means, but here’s the first test. This is being run on a ramdisk on a rather well-specced machine, if that should make a difference.
I trimmed the time in the test, as I don’t want to wait an actual hour for each failure; is this correct? It’s not clear to me whether timeout & testDuration are simulated or actual clock time.
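For context: the .txt test specs are plain key=value files, so trimming the time just means lowering those two values. A hypothetical trimmed snippet (values illustrative, not the actual file) would look something like:

timeout=360
testDuration=30.0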
ctest --output-on-failure -v
Test project /tmp/wolfspider/build
Start 1: SimpleExternalTest
1/103 Test #1: SimpleExternalTest ..............................***Failed 416.48 sec
2020-01-23 10:45:23,261 - 33485 - INFO - COMMAND: /tmp/wolfspider/build/bin/fdbserver -r simulation -s 1272856103 -b on --trace_format xml -s 1272856103 -f /tmp/wolfspider/tests/SimpleExternalTest.txt
Random seed is 1272856103...
Datacenter 0: 3/12 machines, 1/1 coordinators
Datacenter 1: 3/12 machines, 0/1 coordinators
Datacenter 2: 3/12 machines, 0/1 coordinators
Datacenter 3: 3/12 machines, 0/1 coordinators
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
startingConfiguration:new log_spill:=2 triple regions=[{"datacenters":[{"id":"0","priority":2},{"id":"2","priority":1,"satellite":1}]},{"datacenters":[{"id":"1","priority":1},{"id":"3","priority":1,"satellite":1}]}] remote_double resolvers:=5 ssd-2 usable_regions:=1 start
setting up test (ChangeConfig)...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
running test (ChangeConfig)...
ChangeConfig complete
ChangeConfig complete
ChangeConfig complete
ChangeConfig complete
ChangeConfig complete
ChangeConfig complete
checking test (ChangeConfig)...
fetching metrics (ChangeConfig)...
SimulatedFDBDTerminated: Platform error
SimulatedFDBDTerminated: Platform error
SimulatedFDBDTerminated: Platform error
SimulatedFDBDTerminated: Platform error
SimulatedFDBDTerminated: Platform error
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
ERROR: Test timed out after 30000 seconds.
0 test clients passed; 5 test clients failed
Unseed: 62347
Elapsed: 36001.000000 simsec, 333.358084 real seconds
TestSystem 1.1.1.1:1: 0.000 Mclocks
Server 2.1.1.0:1: 0.000 Mclocks
Server 3.4.3.2:1: 0.000 Mclocks
Server 3.4.3.4:1: 0.000 Mclocks
Server 3.4.3.5:1: 0.000 Mclocks
Server 2.0.1.1:1: 0.000 Mclocks
Server 2.3.1.1:1: 0.000 Mclocks
Server 2.1.1.1:1: 0.000 Mclocks
Server 2.3.1.2:1: 0.000 Mclocks
Server 3.4.3.1:1: 0.000 Mclocks
Server 2.2.1.2:1: 0.000 Mclocks
Server 3.4.3.3:1: 0.000 Mclocks
Server 2.1.1.2:1: 0.000 Mclocks
Server 2.2.1.1:1: 0.000 Mclocks
Server 2.0.1.2:1: 0.000 Mclocks
Server 2.2.1.0:1: 0.000 Mclocks
Server 2.0.1.0:1: 0.000 Mclocks
Server 2.3.1.0:1: 0.000 Mclocks
22 SevError events logged
Delete /tmp/wolfspider/build/test_runs/2020_01_23__10_45_23/test_SimpleExternalTest/simfdb
I will submit the port today; it should be available 2-3 days later, on the next package build run.
And the last two failures, where I didn’t (yet) alter the timeouts:
58/103 Test #58: restarting/from_6.2.0/SnapTestSimpleRestart-1 ...***Timeout 3600.21 sec
2020-01-23 11:14:20,532 - 88279 - INFO - Run old binary at /usr/local/bin/fdbserver
2020-01-23 11:14:20,532 - 88279 - INFO - COMMAND: /usr/local/bin/fdbserver -r simulation -s 1272856164 -b on --trace_format xml -s 1272856164 -f /tmp/wolfspider/tests/restarting/from_6.2.0/SnapTestSimpleRestart-1.txt
Random seed is 1272856164...
Datacenter 0: 8/8 machines, 1/1 coordinators
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
startingConfiguration:new log_spill:=2 log_version:=4 triple resolvers:=5 memory-2 usable_regions:=1 start
FDBD joined cluster.
setting up test (ChangeConfig)...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
running test (ChangeConfig)...
ChangeConfig complete
ChangeConfig complete
ChangeConfig complete
ChangeConfig complete
ChangeConfig complete
checking test (ChangeConfig)...
fetching metrics (ChangeConfig)...
setting up test (SnapSimplePre)...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
running test (SnapSimplePre)...
SnapTest complete
SnapTest complete
SnapTest complete
SnapTest complete
SnapSimplePre complete
checking test (SnapSimplePre)...
fetching metrics (SnapSimplePre)...
setting up test (ConsistencyCheck)...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
running test (ConsistencyCheck)...
ConsistencyCheck complete
ConsistencyCheck complete
ConsistencyCheck complete
ConsistencyCheck complete
ConsistencyCheck complete
checking test (ConsistencyCheck)...
fetching metrics (ConsistencyCheck)...
4 test clients passed; 0 test clients failed
setting up test (SnapSimpleTakeSnap)...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
running test (SnapSimpleTakeSnap)...
SnapTest complete
SnapTest complete
SnapTest complete
Start 59: restarting/from_6.2.0/SnapTestRestart-1
59/103 Test #59: restarting/from_6.2.0/SnapTestRestart-1 ......... Passed 62.13 sec
Start 60: restarting/from_6.2.0/SnapCycleRestart-1
60/103 Test #60: restarting/from_6.2.0/SnapCycleRestart-1 ........***Timeout 3600.06 sec
2020-01-23 12:15:22,875 - 73795 - INFO - Run old binary at /usr/local/bin/fdbserver
2020-01-23 12:15:22,875 - 73795 - INFO - COMMAND: /usr/local/bin/fdbserver -r simulation -s 1272856168 -b on --trace_format xml -s 1272856168 -f /tmp/wolfspider/tests/restarting/from_6.2.0/SnapCycleRestart-1.txt
Random seed is 1272856168...
Datacenter 0: 8/8 machines, 1/1 coordinators
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
FDBD joined cluster.
startingConfiguration:new log_spill:=2 logs:=2 proxies:=5 single ssd-2 usable_regions:=1 start
setting up test (ChangeConfig)...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
running test (ChangeConfig)...
ChangeConfig complete
ChangeConfig complete
ChangeConfig complete
ChangeConfig complete
ChangeConfig complete
ChangeConfig complete
checking test (ChangeConfig)...
fetching metrics (ChangeConfig)...
setting up test (SnapCyclePre)...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
Test received trigger for setup...
running test (SnapCyclePre)...
CycleWorkload;SnapTest complete
CycleWorkload;SnapTest complete
CycleWorkload;SnapTest complete
CycleWorkload;SnapTest complete
I’ll upload the detailed logs, and hopefully you can explain what “symbolicate” means.
That means to convert the backtrace addresses that were logged in the trace event Alex posted into symbols, usually by running the command provided. That way we can see where in the code things are going wrong. In your case, you could get it by running:
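(The exact command from the trace event isn’t quoted here; as an illustration, the Backtrace field of a Severity 40 event is typically an addr2line invocation along these lines, with the real addresses from the event, and your build’s binary path, substituted for the placeholders:)

addr2line -e bin/fdbserver -p -C -f -i 0x<addr1> 0x<addr2> ...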
OK, good progress today on the squash & rebase. The git part has been done, but I don’t expect this to build cleanly on any platform yet; there are still a few changes I need to look at more closely.
That said, it’s close enough for somebody with more expertise to look at the commits and request changes.
a few changes that require more review and further clean-up
I’ll keep working on this locally to get it to build; just let me know if this is ready for an actual PR, even if it doesn’t quite build yet. If you can check it out on Linux, or whatever else is used, let me know what breaks during the build. Thanks!
I had been waiting for the couple of issues I flagged during review to be resolved, but given the maintenance overhead of keeping the PR outstanding, I’ve merged it now, and we can clean up the remaining unresolved issues post-merge.
Thanks for your review time; it was really appreciated. I’d like to add a FreeBSD CI image to the build process, whether it runs locally on one of my systems or is something set up at Apple. Is that possible? I’m happy to do the legwork, obviously.
All of the existing CI checks are driven off of an Apple-managed Jenkins instance. If we were to make FreeBSD a required check, I think we’d need a centrally managed FreeBSD image/build environment and a story for how we’d update it with new versions/dependencies/repos over time.
I think we’d be able to set up a webhook to notify a Jenkins (or similar) setup of yours about new PRs, and to give a GitHub account permission to report PR status checks. As we wouldn’t be able to fix anything that goes wrong on the infra side, I think we’d need to leave a check done this way as non-required.
@alvin is the owner of the build infrastructure though, and would have the final say either way.