What I want to do
Fix a group that has a corrupt peer.
What I did
One of my peers hit the missing-file Badger corruption I have brought up here before, so I had to call /removeNode on one peer of a group. This group has now had a couple of peers removed and is down without a leader, yet it is still sending pre-vote requests to the removed peers:
I0705 16:33:45.813998      21 log.go:34] 1 is starting a new election at term 15
I0705 16:33:45.814029      21 log.go:34] 1 became pre-candidate at term 15
I0705 16:33:45.814033      21 log.go:34] 1 received MsgPreVoteResp from 1 at term 15
I0705 16:33:45.814046      21 log.go:34] 1 [logterm: 14, index: 11230462] sent MsgPreVote request to 2 at term 15
I0705 16:33:45.814055      21 log.go:34] 1 [logterm: 14, index: 11230462] sent MsgPreVote request to d at term 15
I0705 16:33:45.814060      21 log.go:34] 1 [logterm: 14, index: 11230462] sent MsgPreVote request to e at term 15
I0705 16:33:45.814693      21 log.go:34] 1 received MsgPreVoteResp from d at term 15
I0705 16:33:45.814715      21 log.go:34] 1 [quorum:3] has received 2 MsgPreVoteResp votes and 0 vote rejections
I0705 16:33:46.186119      21 log.go:34] 1 [logterm: 14, index: 11230462, vote: d] cast MsgPreVote for d [logterm: 14, index: 11230462] at term 15
W0705 16:33:46.815264      21 node.go:420] Unable to send message to peer: 0xe. Error: Do not have address of peer 0xe
W0705 16:33:46.815292      21 node.go:420] Unable to send message to peer: 0x2. Error: Do not have address of peer 0x2
Peer "e" was just removed, and peer "2" was removed weeks ago. A new peer was added, but it cannot find a leader, so it is sitting there doing nothing:
Error while calling hasPeer: Unable to reach leader in group 1. Retrying...
The /state output shows that the new alpha (15) has been added to group 1:
{
  "1": {
    "id": "1",
    "groupId": 1,
    "addr": "graphdb-b-dgraph-alpha-2.graphdb-b-dgraph-alpha-headless.data-engine.svc.cluster.local:7080",
    "leader": false,
    "amDead": false,
    "lastUpdate": "1625492088",
    "learner": false,
    "clusterInfoOnly": false,
    "forceGroupId": false
  },
  "13": {
    "id": "13",
    "groupId": 1,
    "addr": "graphdb-b-dgraph-alpha-0.graphdb-b-dgraph-alpha-headless.data-engine.svc.cluster.local:7080",
    "leader": false,
    "amDead": false,
    "lastUpdate": "1624292805",
    "learner": false,
    "clusterInfoOnly": false,
    "forceGroupId": false
  },
  "15": {
    "id": "15",
    "groupId": 1,
    "addr": "graphdb-b-dgraph-alpha-1.graphdb-b-dgraph-alpha-headless.data-engine.svc.cluster.local:7080",
    "leader": false,
    "amDead": false,
    "lastUpdate": "0",
    "learner": false,
    "clusterInfoOnly": false,
    "forceGroupId": false
  }
}
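To make the mismatch easier to see, here is a rough sketch that compares the peer IDs in the Raft log with the member IDs in /state. It assumes the Raft log prints IDs in hex (the 0xe and 0x2 in the warnings suggest this, which would make "d" 13 and "e" 14). The names LOG, STATE_IDS and raft_voters are mine, for illustration only, and the log lines are the ones pasted above with the timestamps trimmed.

```python
import re

# Pre-vote lines copied from the log above (timestamp prefix trimmed).
LOG = """\
1 [logterm: 14, index: 11230462] sent MsgPreVote request to 2 at term 15
1 [logterm: 14, index: 11230462] sent MsgPreVote request to d at term 15
1 [logterm: 14, index: 11230462] sent MsgPreVote request to e at term 15
"""

# Member IDs listed in the /state output above (decimal).
STATE_IDS = {1, 13, 15}

# Assumption: IDs in the Raft log are hexadecimal.
PREVOTE = re.compile(
    r"([0-9a-f]+) \[logterm.*?\] sent MsgPreVote request to ([0-9a-f]+) at term"
)

def raft_voters(log_text):
    """Collect sender and receiver IDs of pre-vote requests as decimal ints."""
    voters = set()
    for line in log_text.splitlines():
        match = PREVOTE.search(line)
        if match:
            voters.add(int(match.group(1), 16))
            voters.add(int(match.group(2), 16))
    return voters

voters = raft_voters(LOG)
print("raft voters:", sorted(voters))
print("in raft log but not in /state:", sorted(voters - STATE_IDS))
print("in /state but not in raft log:", sorted(STATE_IDS - voters))
```

If the hex assumption holds, the removed peers 2 and 14 still appear as voters in the Raft log, while the new alpha 15 appears only in /state.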
I assume that leader election is failing because Raft still expects votes from 4 peers in total (1, 2, d and e in the log above, with quorum 3) and only 2 of them are alive. I had hoped that /removeNode would remove the nodes as members of this Raft group, but it has not. Is this a bug, or is it somehow expected?
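The arithmetic behind that assumption, as a minimal sketch. It uses the usual Raft majority rule (more than half of the voters), which matches the "[quorum:3]" line in the log for 4 voters. The function names are hypothetical and not part of Dgraph.

```python
def quorum(voters: int) -> int:
    """Smallest number of votes that is a strict majority of the voters."""
    return voters // 2 + 1

def can_elect(voters: int, reachable: int) -> bool:
    """True if the reachable peers alone can form a majority."""
    return reachable >= quorum(voters)

# Current situation: 4 voters in the Raft config, only 2 reachable.
print(quorum(4), can_elect(4, 2))

# If the two removed peers were really gone from the config,
# the 2 live peers would be enough on their own.
print(quorum(2), can_elect(2, 2))
```

With 4 voters the group needs 3 votes and only 2 peers can respond, so the pre-vote round never succeeds.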
Is there anything I can do to fix this? My cluster is effectively down until we can repair this group.
Dgraph metadata
V21.03.1
shardReplicas=3
groups=4