As @aman-bansal said, /removeNode is meant for replacing unhealthy nodes, not for removing the current leader of a group. If the leader is suddenly removed and the remaining members no longer form a majority, they can get stuck trying to connect.
The intended process is to call /removeNode only to remove followers (leaders are presumably active and healthy), for both Dgraph Zero and Dgraph Alpha groups.
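A minimal Python sketch of the "followers only" rule. It assumes the /state response lists Zero members under zeros and Alpha members under groups[<gid>].members, each keyed by node ID with a leader flag, and that Zero's HTTP endpoint is at localhost:6080. The helper name removal_url and the sample state are hypothetical, and addressing the Zero group as group 0 is also an assumption here. The helper only builds the /removeNode URL and refuses to target a leader; it does not send the request.

```python
from urllib.parse import urlencode


def removal_url(state: dict, group: int, node_id: int,
                zero_http: str = "localhost:6080") -> str:
    """Build a /removeNode URL for node_id in group, refusing to target a leader.

    Hypothetical helper. Assumes group 0 is the Zero group and Alpha groups
    are 1, 2, ...; member maps are keyed by node ID as a string.
    """
    if group == 0:
        members = state.get("zeros", {})
    else:
        members = state.get("groups", {}).get(str(group), {}).get("members", {})

    member = members.get(str(node_id))
    if member is None:
        raise KeyError(f"node {node_id} not found in group {group}")
    if member.get("leader", False):
        # Removing a leader can leave the remaining members stuck without a majority.
        raise ValueError(f"node {node_id} leads group {group}; remove followers only")

    return f"http://{zero_http}/removeNode?" + urlencode({"id": node_id, "group": group})


# Hypothetical /state excerpt: node 1 leads Alpha group 1, node 3 is a follower.
example_state = {
    "zeros": {"1": {"leader": True}, "2": {"leader": False}},
    "groups": {"1": {"members": {"1": {"leader": True}, "3": {"leader": False}}}},
}
print(removal_url(example_state, group=1, node_id=3))
```

Raising on a leader, instead of silently skipping it, keeps the operator in the loop: the safe move there is to let leadership settle on a healthy node first.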
Manual recovery
If your cluster is still stuck, you can wipe the volumes and restore. Alternatively, you can follow these manual recovery steps, which keep one of the Alpha p directories:
1. Check /state for the maxLeaseId and maxTxnTs information (see docs about /state).
2. Keep a p directory around and remove the other volumes.
3. Start the Zeros.
4. Call /assign?what=uids&num=N, where num is set to the value of maxLeaseId from step 1. This sets the UID lease for blank UID assignment.
5. Call /assign?what=timestamps&num=N, where num is set to the value of maxTxnTs from step 1. This sets the latest txn timestamp.
6. Copy the p directory to the respective Alpha volumes.
7. Start the Alphas.
This is similar to the bulk loading steps, where the bulk loader outputs a p directory that you can then copy to the Alpha instances (step 4).
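The Zero-side and file-copy parts of the recovery steps can be sketched in Python as below. This assumes /state returns maxLeaseId and maxTxnTs as top-level fields (possibly as strings, since they are 64-bit integers), that Zero's HTTP endpoint is at localhost:6080, and that each Alpha volume keeps its data in a p subdirectory. The names assign_urls and copy_p_dir and any paths are hypothetical. The sketch only builds the /assign URLs and copies files; sending the requests and starting Zeros and Alphas is left to you.

```python
import shutil
from pathlib import Path
from urllib.parse import urlencode


def assign_urls(state: dict, zero_http: str = "localhost:6080") -> list[str]:
    """Build the two /assign calls from a saved /state response (hypothetical helper)."""
    # 64-bit counters may be serialized as JSON strings, so normalize with int().
    max_lease = int(state["maxLeaseId"])
    max_ts = int(state["maxTxnTs"])
    base = f"http://{zero_http}/assign?"
    return [
        base + urlencode({"what": "uids", "num": max_lease}),        # UID lease
        base + urlencode({"what": "timestamps", "num": max_ts}),     # latest txn timestamp
    ]


def copy_p_dir(src_p: Path, alpha_volumes: list[Path]) -> None:
    """Copy the kept p directory into each Alpha volume (hypothetical layout: <volume>/p)."""
    for volume in alpha_volumes:
        dest = volume / "p"
        if dest.exists():
            raise FileExistsError(f"{dest} already exists; clear the old volume first")
        shutil.copytree(src_p, dest)
```

Order matters here: the /assign calls go to the freshly started Zeros before any Alpha starts, so blank-node UIDs and transaction timestamps handed out afterwards do not collide with data already in the p directory.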