Dgraph Alpha Eating Up All RAM

@lminaudier The heap profile you shared shows 2452.59 MB of memory in use. That’s well below the 64 GB available on the i3.2xlarge instances you’re running Dgraph on, so the profile doesn’t help explain what’s taking up memory. Usage would need to be close to ~64 GB to indicate an OOM scenario.

The dashboard metrics you shared show that you have spikes of 7k - 9k pending queries or 1k pending mutations at once. That many pending requests could account for increased memory usage.

It also looks like you have a lot of transaction aborts based on the dashboard. Pending transactions require memory and transaction aborts are ultimately processing work that went to waste. It’d help if you can minimize the number of aborts by either discarding txns when you don’t need them or by reducing the number of conflicts you have in updates.

If the Dgraph in-use memory metric you’re charting is from dgraph_memory_inuse_bytes (Go heap in-use), then the gap is probably idle memory that the Go runtime keeps around instead of releasing it back to the OS right away. That idle memory is released back to the OS when the OS needs it.