One Node in a Dgraph Cluster Showing Unusual Resource Usage

We run the two environments in a primary-backup setup, so after a switchover they receive identical writes. We have been monitoring the Alpha query metrics, and both the request load and the write load are balanced.

At one point, we suspected that the leader node was consuming more resources because of its additional snapshot duties. However, even after this machine was unexpectedly restarted and became a follower, it continued to use significantly more resources: it allocated more memory and ran garbage collection more frequently.

We've monitored several metrics, and on this machine the following were much higher than on the others:

  • go_memstats_alloc_bytes_total_total
  • go_gc_cycles_total_gc_cycles_total_total
  • badger_read_bytes_lsm