Raft group cannot pass leader election

OK, well, after 40h of dgraph outage, my team and I were able to apply enough patches to dgraph to get it to come up and export.

If ever anyone from dgraph looks at this:

  • we wrapped the raftwal storage interface with one that does not return raft peers that the membership state marks as removed. This allowed raft elections to succeed.
    • the real issue is that the custom raftwal implementation (or something) was not removing peers that had been removed using the removeNode endpoint, even though the dgraph side of things (not the etcd/raft side) knew these peers had been removed.
    • fundamentally, the change we applied may be an OK safeguard if it is impractical to figure out why the peers were not being removed from storage in the first place.
  • after this, we were stalled on that group, which was tens of thousands of transactions (at least from its point of view) away from a usable readTS. This was very confusing, and it meant that only best-effort queries were succeeding, and only if you hit a member of that group directly. For some reason, the group did not appear to be making any progress on advancing that timestamp.
  • we then applied another patch that allowed an export to be taken without waiting for readTS to be the latest according to the zeros. This allowed a full cluster export to succeed, where before it would wait indefinitely on reaching a current readTS.
    • we probably lost some changes in the WAL on that group, but after a couple of days of partial downtime, we had to go for a slightly destructive solution over none at all.
  • after all of the above, I was able to use the export to rebuild the 12-node cluster.
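
For anyone who wants a picture of the first patch, here is a rough sketch of the idea in Go. It is not the actual patch and it does not use dgraph's or etcd/raft's real types: `ConfState`, `Storage`, `filteredStorage`, `staleStorage` and the `removed` callback are all hypothetical, simplified stand-ins, and filtering in `InitialState` is an assumption about where such a wrapper would hook in. It only shows the shape of the workaround: wrap the storage, and drop any peer ID that the membership state says was removed before handing the peer list to raft.

```go
package main

import "fmt"

// ConfState is a simplified, hypothetical stand-in for the peer list that
// a raft storage layer reports. It is not the real etcd/raft type.
type ConfState struct {
	Nodes []uint64
}

// Storage is a reduced, hypothetical storage interface with only the one
// method this sketch needs.
type Storage interface {
	InitialState() (ConfState, error)
}

// filteredStorage wraps another Storage and hides peers that the
// membership state considers removed.
type filteredStorage struct {
	inner   Storage
	removed func(id uint64) bool // assumed lookup into membership state
}

// InitialState returns the inner peer list minus any removed peers.
func (f *filteredStorage) InitialState() (ConfState, error) {
	cs, err := f.inner.InitialState()
	if err != nil {
		return cs, err
	}
	kept := make([]uint64, 0, len(cs.Nodes))
	for _, id := range cs.Nodes {
		if !f.removed(id) {
			kept = append(kept, id)
		}
	}
	cs.Nodes = kept
	return cs, nil
}

// staleStorage simulates a WAL that still lists peers that were removed.
type staleStorage struct {
	nodes []uint64
}

func (s staleStorage) InitialState() (ConfState, error) {
	return ConfState{Nodes: s.nodes}, nil
}

func main() {
	// Peers 4 and 5 were removed according to membership, but the
	// simulated WAL still reports them.
	removed := map[uint64]bool{4: true, 5: true}
	st := &filteredStorage{
		inner:   staleStorage{nodes: []uint64{1, 2, 3, 4, 5}},
		removed: func(id uint64) bool { return removed[id] },
	}
	cs, err := st.InitialState()
	if err != nil {
		panic(err)
	}
	fmt.Println(cs.Nodes) // prints [1 2 3]
}
```

The point of doing this as a wrapper is that the underlying WAL is left untouched, so it works as a safeguard even while the root cause (why the removed peers stay in storage) is still unknown.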

All in all, this was a massive pain, and it is quite unfortunate that we had to read dgraph code for 2 days to figure it out ourselves.