It looks like we were able to resolve the issue by doing the following:
- cordon the node which runs dgraph-alpha-1 and dgraph-zero-2: kubectl cordon $NODE_NAME
- delete the pod: kubectl delete pod dgraph-alpha-1 (this was the faulty, corrupted dgraph pod)
- delete the PVC of dgraph-alpha-1: kubectl delete pvc datadir-dgraph-alpha-1 (this was the corrupted PVC)
- uncordon the node from step 1
- This relaunched dgraph-alpha-1 (the missing pod) on the node, with a clean PVC created at launch.
- The join process then started, and the data began to rebuild from a snapshot from the other alpha nodes.
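
For anyone who wants to repeat this, the steps above can be collected into a small shell function. This is only a sketch: the pod and PVC names are the ones from our cluster, `recover_alpha` is a name made up for this post, `kubectl uncordon` is what I assume the uncordon step corresponds to, and the final `kubectl get pod` check is an addition that was not part of the original steps.

```shell
#!/usr/bin/env bash
# Sketch of the recovery sequence described above.
# recover_alpha is a hypothetical helper name; pod and PVC names are from this thread.

recover_alpha() {
  # Node that runs dgraph-alpha-1 and dgraph-zero-2 (passed in by the caller).
  local node="${1:?usage: recover_alpha NODE_NAME}"
  local pod="dgraph-alpha-1"
  local pvc="datadir-dgraph-alpha-1"

  # Mark the node unschedulable before touching the pod.
  kubectl cordon "$node" || return 1

  # Remove the corrupted pod, then its persistent volume claim.
  kubectl delete pod "$pod" || return 1
  kubectl delete pvc "$pvc" || return 1

  # Allow scheduling again so the pod can come back with a fresh PVC.
  kubectl uncordon "$node" || return 1

  # Added check (not in the original steps): show where the pod landed.
  kubectl get pod "$pod" -o wide
}
```

It would be called as `recover_alpha "$NODE_NAME"`. Each step returns early on failure, so the node is not uncordoned if one of the deletes fails.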