Jepsen tests fail due to "address already in use" error

Moved from GitHub dgraph/5122

Posted by martinmr:

What version of Dgraph are you using?

master

Have you tried reproducing the issue with the latest release?

yes

Steps to reproduce the issue (command/config used to run Dgraph).

Run the full Jepsen test suite. Some tests end up incomplete. The logs contain entries like:

2020-04-02 19:31:58 Jepsen starting dgraph zero --idx 1 --port_offset 0 --expose_trace --v 2 --replicas 3 --rebalance_interval 10h --jaeger.collector http://jaeger:14268 --my n1:5080
[Decoder]: Using assembly version of decoder
[Sentry] 2020/04/02 19:31:58 Integration installed: ContextifyFrames
[Sentry] 2020/04/02 19:31:58 Integration installed: Environment
[Sentry] 2020/04/02 19:31:58 Integration installed: Modules
[Sentry] 2020/04/02 19:31:58 Integration installed: IgnoreErrors
[Decoder]: Using assembly version of decoder
[Sentry] 2020/04/02 19:31:58 Integration installed: ContextifyFrames
[Sentry] 2020/04/02 19:31:58 Integration installed: Environment
[Sentry] 2020/04/02 19:31:58 Integration installed: Modules
[Sentry] 2020/04/02 19:31:58 Integration installed: IgnoreErrors
I0402 19:31:58.904756   23651 init.go:99]

Dgraph version   : v2.0.0-rc1-148-g5b1241b32
Dgraph SHA-256   : d4863fcd1e819941a3c923faee694669510d6dccbbb072f77ea2533e0b86fdcb
Commit SHA-1     : 5b1241b32
Commit timestamp : 2020-04-01 20:25:21 -0700
Branch           : master
Go version       : go1.13.4

For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph     , visit http://discuss.hypermode.com.
To say hi to the community       , visit https://dgraph.slack.com.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.


I0402 19:31:58.904980   23651 run.go:106] Setting up grpc listener at: 0.0.0.0:5080
[Sentry] 2020/04/02 19:31:58 ModuleIntegration wasn't able to extract modules: module integration failed
[Sentry] 2020/04/02 19:31:58 Sending fatal event [c900b8157957464fb81f7360300d60e2] to sentry.io project: 1805390
2020/04/02 19:31:58 listen tcp 0.0.0.0:5080: bind: address already in use

github.com/dgraph-io/dgraph/x.Check
        /home/ashish/projects/src/github.com/dgraph-io/dgraph/x/error.go:42
github.com/dgraph-io/dgraph/dgraph/cmd/zero.run
        /home/ashish/projects/src/github.com/dgraph-io/dgraph/dgraph/cmd/zero/run.go:215
github.com/dgraph-io/dgraph/dgraph/cmd/zero.init.0.func1
        /home/ashish/projects/src/github.com/dgraph-io/dgraph/dgraph/cmd/zero/run.go:76
github.com/spf13/cobra.(*Command).execute
        /home/ashish/projects/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830
github.com/spf13/cobra.(*Command).ExecuteC
        /home/ashish/projects/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
github.com/spf13/cobra.(*Command).Execute
        /home/ashish/projects/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/dgraph-io/dgraph/dgraph/cmd.Execute
        /home/ashish/projects/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:70
main.main
        /home/ashish/projects/src/github.com/dgraph-io/dgraph/dgraph/main.go:78
runtime.main
        /home/ashish/go/src/runtime/proc.go:203
runtime.goexit
        /home/ashish/go/src/runtime/asm_amd64.s:1357

There is probably not enough time between tearing down one cluster and starting the next. The old zero process may still be holding port 5080 when the new one tries to bind, which causes some tests to fail.

Expected behaviour and actual result.

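Expected: the Jepsen tests run to completion, with clusters cleanly torn down and brought back up between tests. Actual: some tests are incomplete because zero fails to start with "bind: address already in use".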