How to improve FoundationDB read throughput?

Why doesn't adding machines improve performance linearly?
machine info:

| item | spec |
|---|---|
| CPU | Intel(R) Xeon(R) CPU E5-2690 v2 @ 2.60GHz, 40 cores |
| memory | 256 GB DDR3 |
| disk | 3 TB Shannon card |
| os | CentOS 6.10 |
| engine | Memory |

https://apple.github.io/foundationdb/benchmarking.html
| workload | official | my test (workload 1B) |
|---|---|---|
| one-process read (10 keys per transaction) | 300K reads/second | 180K reads/second |
| 2-machine read (10 keys per transaction, single) | ---- | 220K reads/second |
| 2-machine read (10 keys per transaction, double) | ---- | 270K reads/second |
| 12-machine read (10 keys per transaction) | 5,540,000 reads/second | ---- |

According to the official article, adding machines should improve read performance linearly, but I only saw about a 25% improvement (storage CPU usage is at 99%).

Configuration:
Redundancy mode - double
Storage engine - memory-2
Coordinators - 1
Usable Regions - 1

Workload:
Read rate - 282414 Hz
Write rate - 0 Hz
Transactions started - 27749 Hz
Transactions committed - 0 Hz
Conflict rate - 0 Hz

Process performance details:
10.205.31.31:4011 ( 99% cpu; 4% machine; 0.161 Gbps; 0% disk IO; 1.3 GB / 8.0 GB RAM )
10.205.31.31:4023 ( 16% cpu; 4% machine; 0.161 Gbps; 0% disk IO; 0.2 GB / 8.0 GB RAM )
10.205.31.31:4024 ( 0% cpu; 4% machine; 0.161 Gbps; 0% disk IO; 0.6 GB / 8.0 GB RAM )
10.205.31.32:4011 ( 99% cpu; 3% machine; 0.199 Gbps; 0% disk IO; 1.3 GB / 8.0 GB RAM )
10.205.31.32:4023 ( 0% cpu; 3% machine; 0.199 Gbps; 0% disk IO; 0.2 GB / 8.0 GB RAM )
10.205.31.32:4024 ( 1% cpu; 3% machine; 0.199 Gbps; 0% disk IO; 0.4 GB / 8.0 GB RAM )

Coordination servers:
10.205.31.32:4011 (reachable)

Here is my test method (Go + Python):

Every Go process can only create one network thread, so I use a Python script to launch multiple Go processes in order to get multiple network threads. To fully utilize each network thread, I run 20 goroutines per Go process.

For the DB connection, I reduced the number of client round trips to the proxy and log processes, but read performance still does not improve linearly:

```go
// Connect, and enable causal-read-risky to reduce read-version work on the proxies.
db := getFdbConnection(fdbClusterFile)
_ = db.Options().SetTransactionCausalReadRisky()
```

Main method: just a cyclic read loop.

```go
_, err := db.Transact(func(tr fdb.Transaction) (ret interface{}, e error) {
	// Read-only workload: disable read-your-writes snapshot tracking.
	_ = tr.Options().SetSnapshotRywDisable()
	for i := 0; i < transactionSize; i++ {
		keyName := fmt.Sprintf("%d", rand.Intn(keyspacelen))
		t6 := time.Now().UnixNano()
		// MustGet blocks, so the 10 reads in a transaction run serially.
		_ = tr.Get(fdb.Key(keyName)).MustGet()
		t7 := time.Now().UnixNano()
		usedTime = append(usedTime, t7-t6)
		atomic.AddInt64(&finishRequest, 1)
		atomic.AddInt64(&totalUsedTime, t7-t6)
	}
	return
})
```

Can adding machines improve read performance linearly?

I also tried adding storage processes, which does improve performance:

- 20 Go clients, each with 20 goroutines
- 10 operations per transaction
- 1 stateless, 1 transaction

| storage processes | performance (1B) |
|---|---|
| one | 220K qps |
| two | 300K qps |
| three | 320K qps |
| eight | 400K qps |

I don't know why. Is there any way to improve the read performance?
In all of my benchmarks, the CPU usage of the storage processes reaches 99%, while the proxy and log processes sit at only 1–2% (because I call SetTransactionCausalReadRisky()).

Is my benchmarking method the problem?

I also tried ycsb, foundationdb flowbench, and the official test process:

ycsb test: why is the test data loaded record by record?
foundationdb flowbench: it compiled successfully, but running flowbench crashes with a segmentation fault.

Official test process (Client Testing — FoundationDB 6.3): I can't run it.
`/home/apps/lxl/fdb_binaries/fdbserver -r multitest -f /home/apps/lxl/foundationdb-master/tests/RandomRead.txt` (with 3 test processes open)… it just hangs there, and running `status details` in fdbcli shows nothing happening.