Fdb-joshua: Unable to get any tests running

Hi,

I am trying to set up fdb-joshua on a single-node machine as a working prototype. I have the docker image created and running. I use the following to run the image in detached mode.

docker run -d --env FDB_CLUSTER_FILE=/opt/joshua/fdb.cluster --rm -v /etc/foundationdb/:/opt/joshua -it foundationdb/joshua-agent:latest

Here, /etc/foundationdb/fdb.cluster is the cluster file of my coordinating FDB.

Further, I have my custom build in place, and the tarball created at ./packages/correctness-6.3.21.tar.gz

I then use the command below to submit the tests to the coordinating FDB:
python3 -m joshua.joshua --cluster-file /etc/foundationdb/fdb.cluster start --tarball /home/ubuntu/fdbtesting/fdbbuild/packages/correctness-6.3.21.tar.gz

However, when I try to list the tests, I keep seeing the output below, with no progress:

Currently active ensembles:
20220209-224155-ubuntu-e0003470186a0fb1 compressed=True data_size=22166943 fail_fast=10 max_runs=100000 priority=100 remaining=not_started runtime=0:08:09 sanity=False submitted=20220209-224155 timeout=5400 username=ubuntu

Can someone help me understand what the issue might be?
@jzhou I apologize for tagging you here, but I see you have responded to questions on fdb-joshua. Just looking to get some help. I am new to both fdb and fdb-joshua.

It looks like you have successfully submitted the correctness package to the FDB cluster. The problem seems to be that the Joshua agent is not taking the jobs. What I’d suggest is to go inside your joshua-agent docker container and manually verify that you can connect to the FDB cluster, e.g., by running python3 -m joshua.joshua --cluster-file file_path list.

If you can connect to the FDB cluster, the next thing to check is the agent logs, which are probably in /tmp.

Thank you for the reply, @jzhou. I keep getting the error below inside the docker container:

sh-4.2$ python3 -m joshua.joshua
/usr/bin/python3: Error while finding module specification for 'joshua.joshua' (ModuleNotFoundError: No module named 'joshua')

Although I see that the Dockerfile has instructions to install the module.

Inside docker, run

source /opt/rh/devtoolset-8/enable && source /opt/rh/rh-python38/enable && source /opt/rh/rh-ruby27/enable && python3 -m joshua.joshua

I see, thank you @jzhou. I ran this inside the docker container, but I don’t see any output. It just sits on the command.

But I do see the list when I run it from where I intend to run the client, which is the same machine on which I have the docker container running for now.

Yeah, that’s what I suspected: inside docker, you can’t connect to the FDB cluster. You need to solve that so that the Joshua agent can work.
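
One way to narrow this down from inside the container is to check whether the coordinator addresses in the mounted cluster file are reachable at all. The sketch below is only an illustration: the helper names (parse_coordinators, is_loopback, can_reach, check_cluster_file) are made up for this example and are not part of joshua or the fdb bindings. It assumes the usual connection-string shape description:id@host:port,host:port. One common cause, which this thread does not confirm, is a cluster file that lists 127.0.0.1: inside a container on the default bridge network, that address refers to the container itself, not to the host.

```python
import socket
from typing import List, Tuple


def parse_coordinators(cluster_line: str) -> List[Tuple[str, int]]:
    """Extract (host, port) pairs from one fdb.cluster connection string.

    Assumed shape: description:id@host:port[,host:port...]
    An optional ':tls' suffix on an address is dropped.
    """
    _, _, addresses = cluster_line.strip().partition("@")
    coordinators = []
    for address in addresses.split(","):
        address = address.strip()
        if not address:
            continue
        if address.endswith(":tls"):
            address = address[: -len(":tls")]
        host, _, port = address.rpartition(":")
        coordinators.append((host.strip("[]"), int(port)))
    return coordinators


def is_loopback(host: str) -> bool:
    """True if the address would point back at the container itself."""
    return host == "localhost" or host == "::1" or host.startswith("127.")


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Try a plain TCP connection; this only proves the port is open."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_cluster_file(path: str) -> None:
    """Print a reachability report for every coordinator in the file."""
    with open(path) as handle:
        lines = [ln for ln in handle if ln.strip() and not ln.lstrip().startswith("#")]
    for host, port in parse_coordinators(lines[0]):
        status = "reachable" if can_reach(host, port) else "NOT reachable"
        note = " (loopback: refers to this container)" if is_loopback(host) else ""
        print(f"{host}:{port} {status}{note}")
```

Inside the container from the original post, this would be called as check_cluster_file("/opt/joshua/fdb.cluster"), since that is where the docker run command mounts the file. A successful TCP connection does not prove the FDB client will work, but a failed one explains why the list command hangs.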

Thank you so much for helping me with this, @jzhou.

Hi @jzhou, thank you for all your help before. I was able to run the tests, but they keep failing with the error
Output="/var/joshua/ensembles/20220217-190339-ubuntu-33a80effc351c351/bin/fdbserver: /lib64/libc.so.6: version `GLIBC_2.25' not found

The reason is that the FoundationDB binaries require glibc 2.25 or newer, but the OS on the Joshua machine (or container) has an older version.

The best option is to use CentOS 8 instead of CentOS 7.
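
To check this before running an ensemble, a small script can compare the glibc version demanded in the error message with the one available on the agent. This is a minimal sketch using only the Python standard library; the function names are hypothetical. For reference, CentOS 7 ships glibc 2.17 and CentOS 8 ships glibc 2.28.

```python
import platform
import re
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.25' into (2, 25) so that 2.9 sorts before 2.25."""
    return tuple(int(part) for part in text.split("."))


def required_glibc(error_text: str) -> Optional[Tuple[int, ...]]:
    """Pull the version out of a 'GLIBC_x.y not found' loader error."""
    match = re.search(r"GLIBC_(\d+(?:\.\d+)+)", error_text)
    return parse_version(match.group(1)) if match else None


def glibc_is_new_enough(available: str, required: str) -> bool:
    """True if the available glibc satisfies the required version."""
    return parse_version(available) >= parse_version(required)


def local_glibc() -> str:
    """Version of the glibc this interpreter runs on, or '' if unknown."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" else ""


if __name__ == "__main__":
    error = "/lib64/libc.so.6: version `GLIBC_2.25' not found"
    print("required:", required_glibc(error))
    print("local glibc:", local_glibc() or "not detected")
    print("CentOS 7 (2.17) ok:", glibc_is_new_enough("2.17", "2.25"))
    print("CentOS 8 (2.28) ok:", glibc_is_new_enough("2.28", "2.25"))
```

Run inside the agent container, local_glibc() reports what fdbserver will actually find there, which is the number that matters.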

Thank you @osamarin. I tried with CentOS 8, and it worked fine. The only issue is that the correctness tests are taking a long time; they have been running for a few hours now.

@saurabh.chatterjee

The duration of the tests depends on the --max-runs parameter value of the joshua start command.

A smaller value gives a quicker run, but with smaller test coverage.
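
As a rough back-of-the-envelope illustration, the wall-clock time scales with max_runs divided by the number of agents working in parallel. The sketch below is only an estimate: the average time per run and the agent count are hypothetical inputs, and it ignores early termination (for example the fail_fast setting shown in the list output earlier in the thread).

```python
import math


def estimated_wall_clock_hours(max_runs: int,
                               avg_seconds_per_run: float,
                               parallel_agents: int) -> float:
    """Rough estimate: runs are split evenly across agents, no early stop."""
    if parallel_agents < 1:
        raise ValueError("need at least one agent")
    runs_per_agent = math.ceil(max_runs / parallel_agents)
    return runs_per_agent * avg_seconds_per_run / 3600


if __name__ == "__main__":
    # max_runs=100000 is the value from the list output in this thread;
    # 60 seconds per run and a single agent are assumptions for illustration.
    for runs in (100000, 1000):
        hours = estimated_wall_clock_hours(runs, 60, 1)
        print(f"max_runs={runs}: about {hours:.1f} hours")
```

Under those assumptions, the max_runs=100000 shown in the list output would keep a single agent busy for weeks, while passing something like --max-runs 1000 to the start command brings that down to well under a day.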
