One Node in Dgraph Cluster Showing Unusual Resource Usage

We have two Dgraph environments with almost identical data distribution. In one of them, there's a machine with abnormally high CPU usage. We've observed that its GC frequency, the amount of data read from the LSM tree, and its memory allocation rate are all higher than those of the other members in the same group. Our queries are fully load-balanced.
Does anyone know what might be causing this?

The first thing I would look at is the nature of the queries and mutations going into the two clusters. Is it possible that the resource-hungry cluster is receiving more (or broader) queries, or a higher write rate?

We use the two environments in a primary-backup setup, so after switching, they receive identical writes. We have been monitoring the Alpha query metrics, and both request and write loads are balanced.

At one point, we suspected that the leader node was consuming more resources due to the additional snapshot responsibilities. However, even after this machine was unexpectedly restarted and became a follower, it continued to use significantly more resources. It allocated more memory and performed more frequent garbage collection.

We’ve monitored several metrics, and this machine showed much higher values compared to the others, including:

  • go_memstats_alloc_bytes_total
  • go_gc_cycles_total_gc_cycles_total
  • badger_read_bytes_lsm
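To compare these counters side by side, one option is to scrape each Alpha's Prometheus endpoint and extract just the metrics of interest. Below is a minimal sketch; the `/debug/prometheus_metrics` path is Dgraph's metrics endpoint, but the host list and the exact metric names as exposed by your build are assumptions you should verify against your own `/debug/prometheus_metrics` output.

```python
# Sketch: pull the counters listed above from each alpha's Prometheus
# text endpoint and compare them. Hosts below are placeholders.
import urllib.request

COUNTERS = (
    "go_memstats_alloc_bytes_total",
    "go_gc_cycles_total_gc_cycles_total",
    "badger_read_bytes_lsm",
)

def parse_counters(metrics_text, names=COUNTERS):
    """Extract the value of each named counter from Prometheus text format."""
    values = {}
    for line in metrics_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        parts = line.split()
        if len(parts) < 2:
            continue
        metric = parts[0].split("{")[0]  # drop any label set
        if metric in names:
            values[metric] = float(parts[-1])
    return values

def scrape(host):
    """Fetch and parse one node's metrics (assumes default HTTP port)."""
    with urllib.request.urlopen(f"http://{host}/debug/prometheus_metrics") as r:
        return parse_counters(r.read().decode())

# Example: print the counters for every alpha to eyeball the outlier.
# for host in ("alpha-1:8080", "alpha-2:8080", "alpha-3:8080"):
#     print(host, scrape(host))
```

Since these are cumulative counters, comparing their rate of change (e.g. two scrapes a minute apart) is more meaningful than comparing absolute values on nodes with different uptimes.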

We've identified the issue: a schema inconsistency. Alpha-3 believed that certain predicates had indexes, when in fact they did not.
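An inconsistency like this can be surfaced by querying each node's schema and diffing the per-predicate index definitions. Here is a rough sketch using the DQL `schema {}` query against the `/query` endpoint; the hosts are placeholders, and the response shape (a `tokenizer` list per indexed predicate) should be checked against your Dgraph version.

```python
# Sketch: fetch the schema from two alphas and report predicates whose
# index (tokenizer) definitions disagree. Hosts are placeholders.
import json
import urllib.request

def fetch_schema(host):
    """Run a DQL `schema {}` query and return the list of predicate entries."""
    req = urllib.request.Request(
        f"http://{host}/query",
        data=b"schema {}",
        headers={"Content-Type": "application/dql"},
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)["data"]["schema"]

def index_map(schema):
    """Map predicate name -> sorted tokenizer list (empty if unindexed)."""
    return {p["predicate"]: sorted(p.get("tokenizer", [])) for p in schema}

def diff_indexes(schema_a, schema_b):
    """Return {predicate: (indexes_on_a, indexes_on_b)} where they differ."""
    ma, mb = index_map(schema_a), index_map(schema_b)
    return {
        pred: (ma.get(pred), mb.get(pred))
        for pred in set(ma) | set(mb)
        if ma.get(pred) != mb.get(pred)
    }

# Example usage against a live cluster:
# print(diff_indexes(fetch_schema("alpha-1:8080"), fetch_schema("alpha-3:8080")))
```

An empty diff means the nodes agree; any entry pinpoints a predicate whose index state has drifted, which is the kind of mismatch that would make one node do far more LSM reads and allocation than its peers.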
