Dgraph fails to start on restarts with Kind (Kubernetes)

Looking at your situation (a Dgraph cluster running in KinD on a preemptible GCE instance), here are some thoughts:

  • Restarting Docker would seem to simulate a destructive event, the equivalent of restarting all the GKE VM instances (GCE) at once. I haven’t tested this scenario yet.
  • The big question is whether the data is saved to a volume so that it persists across destructive events.
    • In the preemptible scenario, where Google reclaims the GCE resource, the data would be lost unless all of the following hold: a Docker volume is allocated and resides on an external disk that GCE mounts at startup, Docker uses that same volume when starting KinD, and the Dgraph pods re-attach to the volume (3 layers: PVC/PV to volume → Docker volume → GCE external volume).
    • Why is a persistent volume so important? Both Dgraph Zero and Alpha nodes keep the cluster state and membership in data saved on disk, so if that data is lost, the state cannot be discovered and the cluster is in a funk. And of course, the graph data itself is stored on disk by the Alphas.
  • Recently, in my testing, I came across an issue with Dgraph v20.07.1, the version used by the Helm chart, where a Dgraph Alpha pod may not terminate and gets stuck. When this happened, I had to use kubectl delete --force to forcibly remove the stuck pods. I don’t think you are running into this, as the restart seemed to work fine.
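
To make the middle layer of that chain (Docker volume → KinD node) a bit more concrete, here is a rough sketch that generates a KinD cluster config with an extraMounts entry. I haven’t run this against your setup. The host path /mnt/disks/dgraph-data is a made-up mount point for the GCE external disk, and /var/local-path-provisioner is my assumption about where KinD’s bundled local-path-provisioner keeps PV data inside the node container, so check both before relying on it.

```python
from textwrap import dedent

# Hypothetical: where the GCE external disk is mounted on the VM.
HOST_DISK_PATH = "/mnt/disks/dgraph-data"
# Assumption: directory used by the local-path-provisioner inside the KinD node.
NODE_STORAGE_PATH = "/var/local-path-provisioner"


def kind_config(host_path: str, node_path: str) -> str:
    """Return a KinD cluster config that bind-mounts host_path into the node."""
    return dedent(f"""\
        kind: Cluster
        apiVersion: kind.x-k8s.io/v1alpha4
        nodes:
        - role: control-plane
          extraMounts:
          - hostPath: {host_path}
            containerPath: {node_path}
        """)


if __name__ == "__main__":
    # Print the config so it can be redirected into a file.
    print(kind_config(HOST_DISK_PATH, NODE_STORAGE_PATH))
```

The output can be saved to a file and passed to kind create cluster --config <file>. If the assumptions hold, anything the provisioner writes for the Dgraph PVCs lands on the external disk and should still be there after the VM is preempted and the KinD cluster is recreated.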

I haven’t tested KinD or the Rancher local-path-provisioner yet, so this is something I can try out to see what I can discover.