@Naman - with your current implementation, assuming a dataset that takes say 30 mins to transfer to the new node, and frequent updates during the tablet transfer process, do you have an estimate of how long the predicate will be unavailable for writing during phase 2? Are we talking seconds or potentially several minutes?
Instead of this method, have you thought about writing the same data to more than one tablet at once while writes are coming in? What I mean is something like this:
1. Before the tablet move, writes go to the original tablet t1 (and its replica set) only
2. When the tablet move is triggered, copying of edges from tablet t1 to tablet t2 begins
3. While edges are being copied from t1 to t2, any new writes on t1 are also applied to t2 simultaneously
4. While the copy is in progress, reads are still served only from t1 (as now)
5. Once all edges have been copied from t1 to t2, reads and writes are switched over to t2 and t1 can be dropped
Assuming that writes to t2 can be processed both from the copying process and from the new writes simultaneously, that should guarantee that all new writes during the copying process show up on the new tablet, shouldn’t it?
On the off chance that writing simultaneously to t1 and t2 causes race conditions in the stored data, you could instead write to a third, temporary tablet t3 during step 3. After copying all of t1 to t2, you would then copy all of t3 into t2 as well, overwriting the copied values with the newer writes. When the switch is made over to using t2 instead of t1, t3 is dropped as well.
I appreciate that this adds a little more complexity, but write downtime caused by tablet moves is a big deal for people with high-availability use cases.