Client side buggify - internal error?

Hi @markus.pilman @ajbeamon, I recently upgraded to 6.2.10 and have started using the setClientBuggifyEnable() option for testing.

In addition to the retryable errors thrown by the fdb client, an internal_error (4100) is sometimes thrown; is this expected? I believe layer code cannot handle this class of error, other than by failing the transaction.

Was this error intentionally thrown from buggify() code? If so, is there any guidance on how one should deal with this error in layer code (other than bailing out of the transaction)?

I see the following error lines when this error is thrown:

Internal Error @ ./flow/flow.h 456:
  atos -o libfdb_c.dylib-debug -arch x86_64 -l 0x123eac000 0x1242a4759 0x12413e1f5 0x12410aa5b 0x12410ab37 0x123fc6c19 0x1240f7e86 0x1240f9298 0x1240f8a35 0x12410a8a8 0x1240fccf8 0x1240f7c53 0x124109638 0x124104052 0x124108658 0x124108246 0x124106e48 0x124106666 0x124102dc8 0x1241022df 0x1240ee978 0x1240ee7b6 0x12422eb99 0x12422e5a5 0x1242f73a8 0x1243382f6 0x124333b0d 0x123ffb487 0x1241c933c 0x123fc23e4 0x123ead958 0x1233ca6bf 0x109bc5c84 0x109bae33d
	Internal Error @ ./flow/flow.h 456:
atos -o libfdb_c.dylib-debug -arch x86_64 -l 0x129e29000 0x12a221759 0x12a0bb1f5 0x12a087a5b 0x12a087b37 0x129f43c19 0x12a074e86 0x12a076298 0x12a075a35 0x12a0878a8 0x12a079cf8 0x12a074c53 0x12a086638 0x12a081052 0x12a085658 0x12a085246 0x12a083e48 0x12a083666 0x12a07fdc8 0x12a07f2df 0x12a06b978 0x12a06b7b6 0x12a1abb99 0x12a1ab5a5 0x12a2743a8 0x12a2b52f6 0x12a2b0b0d 0x129f78487 0x12a14633c 0x129f3f3e4 0x129e2a958 0x1289226bf 0x1139b59f4 0x1139a62bd

Sample settings that generate this error:

final com.apple.foundationdb.FDB fdb = com.apple.foundationdb.FDB.selectAPIVersion(620);
fdb.options().setClientBuggifyEnable();
fdb.options().setClientBuggifySectionActivatedProbability(10);
fdb.options().setClientBuggifySectionFiredProbability(10);
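
To make the error-handling question concrete, here is a minimal sketch of how layer code might separate retryable errors from internal_error (4100). It is not the client's own logic: ClientError is a hypothetical stand-in for the client's exception type so the sketch compiles without the FoundationDB library, runWithRetries and isRetryable are illustrative names, and the two retryable codes (1007 transaction_too_old, 1020 not_committed) are only an illustrative subset.

```java
import java.util.function.Supplier;

class BuggifyRetrySketch {
    // Hypothetical stand-in for the client's exception type (assumption, not the real class).
    public static class ClientError extends RuntimeException {
        public final int code;

        public ClientError(int code, String message) {
            super(message);
            this.code = code;
        }
    }

    // Error code reported in the post for internal_error.
    static final int INTERNAL_ERROR = 4100;

    // Illustrative subset of retryable codes (assumption):
    // 1007 transaction_too_old, 1020 not_committed.
    static boolean isRetryable(int code) {
        return code == 1007 || code == 1020;
    }

    // Runs body up to maxAttempts times. Retryable errors trigger another
    // attempt; anything else, including 4100, is rethrown immediately.
    static <T> T runWithRetries(Supplier<T> body, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ClientError last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return body.get();
            } catch (ClientError e) {
                if (!isRetryable(e.code)) {
                    throw e; // not retryable: fail the transaction
                }
                last = e; // retryable: remember it and try again
            }
        }
        throw last; // retries exhausted
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String value = runWithRetries(() -> {
            // Fail twice with a retryable code, then succeed.
            if (calls[0]++ < 2) {
                throw new ClientError(1020, "not_committed");
            }
            return "committed";
        }, 5);
        System.out.println(value + " after " + calls[0] + " attempts");

        try {
            runWithRetries(() -> {
                throw new ClientError(INTERNAL_ERROR, "internal_error");
            }, 5);
        } catch (ClientError e) {
            System.out.println("gave up on code " + e.code);
        }
    }
}
```

With the real Java bindings, Database.run already performs the retry loop and FDBException.getCode() exposes the error code, so in practice only the "fail fast on 4100" branch would need to live in layer or test code.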

Throwing an internal error is not expected. I’ll create a new issue on GitHub and reference this post.

Edit: https://github.com/apple/foundationdb/issues/2424


Thanks Andrew! Will the fix be ported to 6.2? As it stands, this is causing crashes in tests that rely on buggify.

Btw, I am running OSX 10.14.6. I hope that is not causing this issue.
Reason for suspecting this: trying to build the foundationdb 6.2 branch locally results in a build error:

foundationdb/flow/Deque.h:170:19: error: 'aligned_alloc' is only available on macOS 10.15 or newer [-Werror,-Wunguarded-availability-new]
                T* newArr = (T*)aligned_alloc(std::max(__alignof(T), sizeof(void*)),

This looks like a pretty straightforward problem, and we can release the fix on 6.2.

You can resolve the aligned_alloc issue if you define HAS_ALIGNED_ALLOC when building. This is a configuration parameter you can set when doing the cmake build.


If you build with cmake, there’s no need to set this manually. CMake will automatically set this variable to the right value (it checks at configure time whether this symbol exists). Therefore, cmake builds should just work.


It is not very important to me at the moment, so please do not spend much time on it - I tried the default cmake-based build process and verified that it reports -- Has aligned_alloc: false, but I still see the build error that I posted earlier.

Build commands and output: http://snippi.com/s/gjedcm6

Something seems to be wrong here. From your cmake output:

Has aligned_alloc: false

But on Catalina this should be true. I am not quite sure what to do here… I build on Catalina quite often and it works without any issues. Can you please make sure that your Xcode is up to date, and if it still doesn’t work, create a GitHub issue? I don’t think the forum is the best place to track this.

By the way, here is a PR for the fix: https://github.com/apple/foundationdb/pull/2427.
