Storage servers pull data aggressively from log servers for performance, including data that has not yet been successfully committed, or even persisted, on the log servers.
So my question is: is all data persisted to SQLite on a storage server guaranteed to be committed? If so, how is that achieved? Is it possible that data reaches the 5s retention limit, is compacted and persisted from the MVCC window to SQLite, but the transaction is ultimately aborted?
The storage server won’t persist data that will be rolled back later. This is because storage servers keep a 5s MVCC window of data in memory and only persist data after it is committed. Specifically, when peeking data from transaction logs, the TLogPeekReply message contains a minKnownCommittedVersion field, which essentially means that any version less than or equal to this number has been persisted on 3 logs and thus has no chance of being rolled back during recovery. The storage server persists only up to 5s before minKnownCommittedVersion, so everything it writes to disk must already be committed.
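To make the rule concrete, here is a minimal Python sketch of it. This is not FoundationDB's actual code: the class names, the dict-based storage, and the rate of 1,000,000 versions per second are assumptions made for illustration. Only the field name minKnownCommittedVersion and the 5s window come from the explanation above.

```python
from dataclasses import dataclass, field

# Assumption for illustration: versions advance at about 1,000,000 per second,
# so a 5s MVCC window corresponds to 5,000,000 versions.
VERSIONS_PER_SECOND = 1_000_000
MVCC_WINDOW_VERSIONS = 5 * VERSIONS_PER_SECOND


@dataclass
class PeekReply:
    """Hypothetical stand-in for TLogPeekReply."""
    mutations: dict  # version -> list of (key, value) pairs
    min_known_committed_version: int


@dataclass
class StorageServerSketch:
    """Hypothetical model of a storage server's in-memory window and disk."""
    in_memory: dict = field(default_factory=dict)  # version -> mutations
    on_disk: dict = field(default_factory=dict)    # key -> value
    durable_version: int = 0
    min_known_committed_version: int = 0

    def apply_peek(self, reply: PeekReply) -> None:
        # Peeked data may not be committed yet, so it stays in memory only.
        self.in_memory.update(reply.mutations)
        self.min_known_committed_version = max(
            self.min_known_committed_version,
            reply.min_known_committed_version,
        )

    def persist(self) -> int:
        # Only versions at least one MVCC window older than the newest
        # known-committed version are written to disk.
        limit = self.min_known_committed_version - MVCC_WINDOW_VERSIONS
        for version in sorted(v for v in self.in_memory if v <= limit):
            for key, value in self.in_memory.pop(version):
                self.on_disk[key] = value
            self.durable_version = version
        return self.durable_version


ss = StorageServerSketch()
ss.apply_peek(PeekReply(
    mutations={1_000_000: [("a", "1")], 7_000_000: [("b", "2")]},
    min_known_committed_version=6_500_000,
))
ss.persist()
```

In this sketch, version 7,000,000 is newer than the known-committed version and stays in memory, so a recovery that rolls it back never has to undo anything on disk.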
Thanks a lot. Got it! I haven’t found this insight in either your paper or the documentation. So a storage server may still hold data older than 5s in its MVCC window because that data hasn’t been committed yet (perhaps because the log servers are too slow), right?
Yes, a storage server may have data that is not yet persisted on the log servers. This is by design, to reduce read latency: when a client reads from storage servers, the data should ideally already be there. On the write path, data flows from the proxy to the log servers and on to the storage servers without touching disk. However, a transaction is only “committed” once all 3 tlogs have persisted the data. After that, the proxy replies to the client with the commit status, and the next transaction can obtain the commit version as its read version.
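The gap between "received" and "committed" can be sketched in a few lines of Python. Again, this is only an illustration under assumed names (TLogSketch, push, fsync, and known_committed_version are all hypothetical), not how the tlogs are actually implemented.

```python
class TLogSketch:
    """Hypothetical model of a log server with in-memory and durable state."""

    def __init__(self) -> None:
        self.received_version = 0  # in memory, already visible to peeks
        self.durable_version = 0   # persisted to disk

    def push(self, version: int) -> None:
        # The proxy hands over the data; it is peekable before it is durable.
        self.received_version = max(self.received_version, version)

    def fsync(self) -> None:
        # Everything received so far becomes durable.
        self.durable_version = self.received_version


def known_committed_version(tlogs) -> int:
    """A version counts as committed only once every tlog has persisted it."""
    return min(t.durable_version for t in tlogs)


tlogs = [TLogSketch() for _ in range(3)]
for t in tlogs:
    t.push(100)

# Storage servers could already peek version 100 here, yet nothing is committed.
before = known_committed_version(tlogs)

tlogs[0].fsync()
tlogs[1].fsync()
partial = known_committed_version(tlogs)  # one tlog is still behind

tlogs[2].fsync()
after = known_committed_version(tlogs)    # now the proxy may reply to the client
```

A slow third log server holds the committed version back for everyone, which is exactly the case where a storage server ends up holding uncommitted data in memory for longer.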