I am trying to set up fdb-joshua on a single-node machine as a working prototype. I have the Docker image created and running. I use the following to run the image in detached mode:
docker run -d --env FDB_CLUSTER_FILE=/opt/joshua/fdb.cluster --rm -v /etc/foundationdb/:/opt/joshua -it foundationdb/joshua-agent:latest
Here, /etc/foundationdb/fdb.cluster is the cluster file of my coordinating FDB cluster.
Further, I have my custom build in place, and the tarball is created at ./packages/correctness-6.3.21.tar.gz
I then use the command below to submit the tests to the coordinating FDB cluster:
python3 -m joshua.joshua --cluster-file /etc/foundationdb/fdb.cluster start --tarball /home/ubuntu/fdbtesting/fdbbuild/packages/correctness-6.3.21.tar.gz
However, when I try to list the tests, I keep seeing the output below, with no progress:
Currently active ensembles:
20220209-224155-ubuntu-e0003470186a0fb1 compressed=True data_size=22166943 fail_fast=10 max_runs=100000 priority=100 remaining=not_started runtime=0:08:09 sanity=False submitted=20220209-224155 timeout=5400 username=ubuntu
Can someone help me understand what the issue might be? @jzhou I apologize for tagging you here, but I see you have responded to questions on fdb-joshua. I am just looking for some help, as I am new to both fdb and fdb-joshua.
It looks like you have successfully submitted the correctness package to the FDB cluster. The problem seems to be that the Joshua agent is not picking up the jobs. What I'd suggest is to go inside your joshua-agent container and manually verify that you can connect to the FDB cluster, e.g., by running python3 -m joshua.joshua --cluster-file file_path list.
If you can connect to the FDB cluster, the next thing to check is the agent logs, which are probably in /tmp.
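A rough way to run the connectivity check from inside the container without the joshua module is sketched below in Python, using only the standard library. It assumes the cluster file has the usual description:id@host:port,host:port layout with IPv4 or hostname coordinators, and it only tests whether each coordinator's TCP port is reachable; it does not prove that an FDB client can actually talk to the database. The function names (parse_cluster_file, coordinator_reachable, report) are made up for illustration.

```python
import socket


def parse_cluster_file(text):
    """Return (description, cluster_id, [(host, port), ...]) from cluster file text."""
    # Skip blank lines and '#' comments; the connection string is the first real line.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    if not lines:
        raise ValueError("no connection string found")
    prefix, _, addresses = lines[0].partition("@")
    description, _, cluster_id = prefix.partition(":")
    if not (description and cluster_id and addresses):
        raise ValueError("expected description:id@host:port[,host:port...]")
    coordinators = []
    for addr in addresses.split(","):
        addr = addr.strip()
        # Drop an optional ':tls' suffix before splitting host and port.
        if addr.endswith(":tls"):
            addr = addr[: -len(":tls")]
        host, _, port = addr.rpartition(":")
        coordinators.append((host, int(port)))
    return description, cluster_id, coordinators


def coordinator_reachable(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(path):
    """Print one line per coordinator listed in the cluster file at `path`."""
    with open(path) as f:
        _, _, coordinators = parse_cluster_file(f.read())
    for host, port in coordinators:
        status = "reachable" if coordinator_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port} {status}")
```

Calling report("/opt/joshua/fdb.cluster") inside the container (the path set via FDB_CLUSTER_FILE in the docker run command) prints the status of each coordinator. One thing worth checking: if the cluster file lists 127.0.0.1, that address refers to the container itself rather than the host, unless the container is started with host networking.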
Thank you for the reply, @jzhou. I keep getting the error below inside the container:
sh-4.2$ python3 -m joshua.joshua
/usr/bin/python3: Error while finding module specification for 'joshua.joshua' (ModuleNotFoundError: No module named 'joshua')
This happens even though the Dockerfile has instructions to install the module.
Hi @jzhou, thank you for all your help before. I was able to run the tests, but they keep failing with the error Output="/var/joshua/ensembles/20220217-190339-ubuntu-33a80effc351c351/bin/fdbserver: /lib64/libc.so.6: version `GLIBC_2.25' not found
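An error of this form means the fdbserver binary in the tarball was linked against a newer glibc than the one available in the agent image. A minimal sketch for confirming that from inside the container is below, in Python with the standard library only. The helper names (parse_version, glibc_at_least, report) are hypothetical, and platform.libc_ver() is a best-effort check that may return empty strings on some systems.

```python
import platform


def parse_version(text):
    """Turn '2.25' into (2, 25) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def glibc_at_least(found, required):
    """True if the found glibc version satisfies the required one."""
    return parse_version(found) >= parse_version(required)


def report(required="2.25"):
    """Print whether the local glibc is new enough for the required version."""
    # platform.libc_ver() inspects the running Python executable; best-effort only.
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        print("could not determine glibc version")
        return
    verdict = "ok" if glibc_at_least(version, required) else "too old"
    print(f"glibc {version} found, {required} required: {verdict}")
```

Running report() inside the agent container shows whether its glibc is older than the 2.25 named in the error. For reference, CentOS 7 ships glibc 2.17 and CentOS 8 ships 2.28, so a binary built on a newer distribution will not start on a CentOS 7 based image.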
Thank you, @osamarin. I tried with centos8, and it worked fine. The only issue is that the correctness tests are taking a long time; they have been running for a few hours now.