Upgrade from 1.0.4 to 1.0.6 crash and data loss

However, we ran into instability with v1.0.6 again on our staging server.

The database just stopped being responsive, with clients getting “context exceeded” messages.

When investigating further, we found nothing obviously wrong in the logs of either zero or server. When we issued a restart on server, it would come up seemingly OK, but clients still got “deadline exceeded” messages.

After a restart, server started spitting out these messages in a loop:

Jul 12 13:00:17 dev-gametv-db dgraphserver[29455]: 2018/07/12 13:00:17 groups.go:721: WARNING: We don’t have address of any dgraphzero leader.

Jul 12 13:00:18 dev-gametv-db dgraphserver[29455]: 2018/07/12 13:00:18 groups.go:510: Unable to sync memberships. Error: rpc error: code = Unknown desc = context deadline exceeded

Jul 12 13:00:18 dev-gametv-db dgraphserver[29455]: 2018/07/12 13:00:18 groups.go:494: Got address of a Zero server: localhost:5080

After restarting zero, server kept printing the same messages as above, while zero started spitting out this in a loop:

Jul 12 13:01:19 dev-gametv-db dgraphzero[29384]: 2018/07/12 13:01:19 raft.go:1070: INFO: 1 no leader at term 4; dropping index reading msg
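
In case it helps anyone triaging the same symptoms, here is a minimal sketch for tallying these looping messages and checking whether zero stays stuck on one Raft term. The patterns are copied from the log lines quoted above. The labels and the `tally` function are hypothetical names of my own, not anything provided by Dgraph.

```python
import re
from collections import Counter

# Patterns copied from the log lines quoted in this post.
# The labels on the left are hypothetical names chosen for this sketch.
PATTERNS = {
    # "don.t" matches both the straight and the curly apostrophe.
    "no_zero_leader_address": re.compile(r"We don.t have address of any dgraphzero leader"),
    "membership_sync_failed": re.compile(r"Unable to sync memberships"),
    "raft_no_leader": re.compile(r"no leader at term (\d+)"),
}


def tally(lines):
    """Count each known message and collect the Raft terms seen without a leader."""
    counts = Counter()
    terms = set()
    for line in lines:
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[label] += 1
                if label == "raft_no_leader":
                    terms.add(int(match.group(1)))
    return counts, terms


sample = [
    "Jul 12 13:00:17 dev-gametv-db dgraphserver[29455]: 2018/07/12 13:00:17 groups.go:721: "
    "WARNING: We don’t have address of any dgraphzero leader.",
    "Jul 12 13:00:18 dev-gametv-db dgraphserver[29455]: 2018/07/12 13:00:18 groups.go:510: "
    "Unable to sync memberships. Error: rpc error: code = Unknown desc = context deadline exceeded",
    "Jul 12 13:01:19 dev-gametv-db dgraphzero[29384]: 2018/07/12 13:01:19 raft.go:1070: "
    "INFO: 1 no leader at term 4; dropping index reading msg",
]

counts, terms = tally(sample)
print(dict(counts), sorted(terms))
```

Fed the full journal output of both processes, a single value in `terms` would suggest zero is sitting on the same term without electing a leader, while a growing set would suggest repeated failed elections.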