Scalable time series database / Improving write throughput

It's clear that ping is not the problem:

PING 10.112.15.97 (10.112.15.97) 56(84) bytes of data.
64 bytes from 10.112.15.97: icmp_seq=1 ttl=62 time=2.20 ms
64 bytes from 10.112.15.97: icmp_seq=2 ttl=62 time=0.148 ms
64 bytes from 10.112.15.97: icmp_seq=3 ttl=62 time=0.184 ms
64 bytes from 10.112.15.97: icmp_seq=4 ttl=62 time=0.181 ms
64 bytes from 10.112.15.97: icmp_seq=5 ttl=62 time=0.186 ms

If I understand correctly, one main issue with my implementation may be that I write the metrics for each machine serially, in a single transaction (and the same applies to the reads I need for the aggregations). Is that correct? Since the implementation is written in Python, which framework or library would you suggest for parallelizing this process?
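
For illustration, here is a minimal sketch of what parallelizing the per-machine writes could look like using the standard-library concurrent.futures module. The psycopg2 driver, the DSN, and the metrics table name and columns are assumptions made up for the example, not details from my actual setup; any DB-API driver with executemany would work the same way.

# Sketch: fan per-machine metric writes out over a thread pool.
# Assumes a PostgreSQL backend via psycopg2 and a hypothetical
# "metrics" table (machine_id, ts, name, value); adapt to your schema.
from concurrent.futures import ThreadPoolExecutor

import psycopg2

DSN = "dbname=tsdb user=writer"  # placeholder connection string


def write_machine_metrics(machine_id, rows):
    """Write all metrics for one machine in a single batched transaction."""
    # One connection per task keeps transactions independent across threads.
    conn = psycopg2.connect(DSN)
    try:
        with conn, conn.cursor() as cur:  # "with conn" commits or rolls back
            cur.executemany(
                "INSERT INTO metrics (machine_id, ts, name, value) "
                "VALUES (%s, %s, %s, %s)",
                [(machine_id, ts, name, value) for ts, name, value in rows],
            )
    finally:
        conn.close()


def write_all(metrics_by_machine, max_workers=8):
    """Run the per-machine writes concurrently on a small thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(write_machine_metrics, mid, rows)
            for mid, rows in metrics_by_machine.items()
        ]
        for f in futures:
            f.result()  # re-raise any insert errors

Threads should be enough here because the time goes to waiting on the database rather than executing Python, so the GIL is not the bottleneck; for CPU-bound aggregation work, multiprocessing or an async driver such as asyncpg would be the alternatives to consider.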