Hi,
Today, I realized that my development environment was close to running out of disk space.
It’s a test system with around 10 users, so the data volume should be relatively small. However, the disk usage had reached 14.5 GB, while the configured volume size was 15 GB.
Backup & Storage Update:
- I created a backup, which came to only 40 MB zipped (400 MB unzipped).
- I moved the storage from the local machine disk to a slower mounted volume, increasing the available space to 300 GB.
Live Import Performance:
I then restored the data via live import; the loader reported:
- Number of transactions: 6,355
- N-Quads processed: 6,354,243
- Total time: 2m 31.8s
- Processing speed: 42,081 N-Quads/sec
This seems quite slow given the relatively small dataset (a quick sanity check of the arithmetic is sketched below).
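Here is a small Python sketch of that sanity check. It only reuses the figures already quoted above; the small gap to the reported 42,081 is presumably because the loader computes its rate over the loading window rather than the total elapsed time.

```python
# Quick sanity check of the live-loader figures quoted above.
# Only the numbers already given in this post are used here.

n_quads = 6_354_243            # N-Quads processed
txns = 6_355                   # transactions
total_sec = 2 * 60 + 31.8      # total time: 2m 31.8s

rate = n_quads / total_sec
print(f"overall rate: {rate:,.0f} N-Quads/sec")   # ~41,860, close to the reported 42,081
print(f"batch size:   {n_quads / txns:,.0f} N-Quads per transaction")  # ~1,000
```

So the loader was pushing roughly 1,000 N-Quads per transaction, which I believe is the live loader’s default batch size.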
Disk Space Usage After Import:
After the import, I checked the disk usage again, and it was only 400 MB, which matches the unzipped backup size.
Key Question:
Why did the previous setup use 14.5 GB for the same data, while the new setup uses only 400 MB?
Production Concerns:
I’m trying to understand how this will behave in production. The recommended disk size per Alpha is 750 GB, but:
- My provider doesn’t offer 750 GB SSDs.
- I’m unsure how many instances and shards would be needed for production, where we expect 100,000 users instead of 10 (a naive back-of-the-envelope projection is sketched after this list).
- For comparison, at my previous company we used MySQL with over 100 million users; while the dataset was significantly larger, the total disk space used was only ~1 TB. The main scaling challenge was CPU and RAM, not disk usage.
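To make the sizing question concrete, here is a rough Python sketch of what a naive linear extrapolation would predict. The assumptions are exactly the things I am unsure about: that on-disk size grows roughly linearly with user count, and that the 400 MB figure after a fresh import is representative for 10 users.

```python
# Naive linear extrapolation from the test system to production.
# Assumptions (unverified, and exactly what I'm asking about):
#   - on-disk size grows roughly linearly with the number of users
#   - the 400 MB after a fresh import is representative for 10 users
#   - 750 GB is the per-Alpha disk to plan around (ignoring replication)

test_users, test_disk_gb = 10, 0.4      # 400 MB after fresh import
prod_users = 100_000
per_alpha_gb = 750                      # recommended disk per Alpha

prod_disk_gb = test_disk_gb * prod_users / test_users
print(f"projected data size: {prod_disk_gb:,.0f} GB (~{prod_disk_gb / 1024:.1f} TB)")
print(f"Alphas at {per_alpha_gb} GB each: {prod_disk_gb / per_alpha_gb:.1f}")
```

Under the same naive assumption, plugging in the 14.5 GB the old setup actually consumed (instead of the 400 MB after a fresh import) multiplies that projection by roughly 36x, which is exactly why I want to understand where the extra space went.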
Concern About Dgraph’s Disk Usage:
Based on these observations, Dgraph appears to consume far more disk space than the underlying data would suggest, which could significantly increase infrastructure costs.
Would appreciate any insights into:
- Why the old setup used 14.5 GB for the same data.
- Whether Dgraph’s disk usage scales linearly with data size.
- Best practices for optimizing disk usage in production.
Thanks!