Dgraph crash loop on AWS

Hello.

Assuming you installed this using:

HELM_RELEASE_NAME=my-dgraph
helm install $HELM_RELEASE_NAME -f deployment_dgraph_values.yaml dgraph/dgraph

It looks like you are using the default values. I noticed that deployment_dgraph_values.yaml doesn't actually set any values, because everything is nested under a dgraph key, so the defaults will not be overridden in this case. That may not be what you intended.
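As a rough sketch of what I mean (the alpha.resources and zero.resources keys and the memory figures here are only examples; check the chart's default values.yaml for the exact keys your chart version supports), overrides need to sit at the top level of the values file, not nested under a dgraph key:

# illustrative structure only
alpha:
  resources:
    requests:
      memory: 8Gi
zero:
  resources:
    requests:
      memory: 1Gi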

For the EKS cluster, with eksctl, what is the command line or cluster.yaml that you are using? For reference, I have spun up clusters before with a command along these lines.
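The cluster name, region, and node count below are placeholders rather than your settings, but they show the shape of an eksctl invocation for a small three-node test cluster:

# illustrative only; adjust the name, region, and sizing to your environment
eksctl create cluster \
  --name dgraph-test \
  --region us-east-1 \
  --node-type i3.large \
  --nodes 3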

For size, we typically use i3.large (source: https://dgraph.io/docs/deploy/#using-kubernetes), so that could be a good instance type to start with. Typically, 16 GB of memory should be sufficient, at least until your needs grow.

From a Kubernetes perspective, I am also curious whether resources are the issue, so it would help to get describes, especially of any failing pods, to see if they surface any further info, e.g.

kubectl describe sts/$HELM_RELEASE_NAME-alpha
kubectl describe sts/$HELM_RELEASE_NAME-zero
kubectl describe pod/$HELM_RELEASE_NAME-alpha-{0..2}
kubectl describe pod/$HELM_RELEASE_NAME-zero-{0..2}

This would let us see any events, such as a lack of resources, that might have caused the pods to fail.
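In the same vein, if the pods are actually restarting rather than stuck pending, the logs from the previous container instance often show the underlying error, e.g.

kubectl logs pod/$HELM_RELEASE_NAME-alpha-0 --previous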
