Recommended RAM to bulk load 200M RDF entries to avoid OOM

Edit:
Apologies, after checking the successful load I realized that I had actually loaded 1B N-Quads and 1.6B edges.
I thought it was 200M+ RDF entries, but it was really the equivalent of 200M+ RDBMS rows translated into RDF, which came to roughly 1B RDF entries. Perhaps that is why so much RAM was necessary. Everything loaded in 48 minutes, without ludicrous mode, on a 16 vCPU / 128GB RAM Intel Cascade Lake VM on Huawei Cloud.

[Screenshot 2020-05-08 at 11.36.44 AM]


Original:
I’m doing a benchmark for management, and they’re requesting a simple load test on a 2 vCPU / 4GB RAM server. I load 200 .rdf files (each with 1M records), containing a total of 200M+ entries, via the bulk loader (--ludicrous_mode seems to make only a small difference here), and Dgraph consistently runs out of memory at the 14th .rdf file. It takes about 40-50s per file.
Alpha’s lru_mb is set to 2048 to be safe.
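Roughly the invocation in question, for reference (a sketch only: the paths are placeholders and the flags are as I recall them from the v20.03-era docs, so double-check with `dgraph bulk --help` and `dgraph alpha --help` on your version):

```bash
# Sketch: bulk-load a directory of .rdf/.rdf.gz files plus a schema file.
# A Zero must already be running to hand out UIDs and timestamps.
dgraph bulk -f /data/rdf -s /data/schema.txt --zero localhost:5080

# Afterwards, copy the generated out/0/p directory into Alpha's p directory
# and start Alpha with the lru_mb cap mentioned above.
dgraph alpha --lru_mb 2048 --zero localhost:5080
```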

Postgres and MSSQL managed to load the 200M+ records with the same CPU and RAM constraints in about 2-3 hours.

I’ve provisioned a server with 4 vCPUs and 16GB RAM. Memory usage keeps climbing with no end in sight, already at 9GB of 16GB used by only the 33rd .rdf file. It takes about 30-40s per file. Not using ludicrous_mode. If it OOMs again, perhaps I’ll use the Live loader instead. :frowning:

Hi @geoyws, please try the bulk loader without ludicrous mode as well.

@ashishgoswami Yup, I’m currently running it without ludicrous mode.

The 4 vCPU / 16GB RAM VM went OOM after 110 .rdf files (1M RDF entries each; edit: actually ~5M RDF entries each).



So I’ve gone with an 8 vCPU / 32GB RAM VM. If that OOMs, I’ll go with the 64GB RAM VM next.

Update: OOM again.



Going with the 64GB RAM now.

Hi @geoyws Would you like to have a quick call with one of our engineers to try and figure out this issue? Please let me know. We’d be happy to help.

@dereksfoster99 Sure, when would be a good time?

In short, I repeated the same process (from a VM image) on a 64GB RAM VM and it still went OOM.

The 128GB RAM VM however managed to do it.

1B N-Quads and 1.6B edges in 48 minutes without ludicrous mode, on a 16 vCPU / 128GB RAM Intel Cascade Lake VM on Huawei Cloud.

[Screenshot 2020-05-08 at 11.36.44 AM]


@geoyws How about Monday afternoon? We’re on Pacific time.

Appreciate the help. We’re in Kuala Lumpur at UTC+8, so Monday 12pm Pacific Time would be 3am on Tuesday for us in KL.
Could we do it at either 9am PT or perhaps 3pm PT instead?

This commit should fix the issue: perf: Various optimizations to the bulk loader (#6412), dgraph-io/dgraph@9109186 on GitHub.

If you compile from master, do pass the build tag “jemalloc”; this requires jemalloc to have been installed with the je_ prefix. We’ll be updating our Makefile to do this automatically.
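Something along these lines should work (a sketch only; treat the repository paths and the exact build command as assumptions and adapt them to your setup):

```bash
# Sketch: install jemalloc with the je_ symbol prefix expected by the build tag.
git clone https://github.com/jemalloc/jemalloc.git && cd jemalloc
./autogen.sh
./configure --with-jemalloc-prefix='je_'
make && sudo make install
cd ..

# Build the dgraph binary from master with the jemalloc build tag (needs CGO and a C toolchain).
git clone https://github.com/dgraph-io/dgraph.git && cd dgraph
CGO_ENABLED=1 go build -tags=jemalloc -o dgraph-jemalloc ./dgraph
```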


Will any of these improvements also help reduce OOM in live loader? Or reduce memory consumption of a running machine?

Not yet. But, I’ve just asked the team to replicate those live loader OOM issues, so we can fix those. If you have a way to replicate them, do let us know.

Just throw ~6 million quads at a 4GB RAM Dgraph instance. That should make it go OOM. I haven’t tried it lately, though.
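Roughly what I mean, in case it helps with reproducing it (a sketch with synthetic data, a made-up `<name>` predicate, and an example version tag; the live loader flags may differ across versions, so check `dgraph live --help`):

```bash
# Sketch: generate ~6M synthetic N-Quads.
seq 1 6000000 | awk '{printf "_:n%d <name> \"node %d\" .\n", $1, $1}' | gzip > quads.rdf.gz

# Run Zero and a memory-capped (~4GB) Alpha, here via a Docker memory limit.
docker run -d --name dgraph --memory=4g -p 5080:5080 -p 9080:9080 \
  dgraph/dgraph:v20.03.3 bash -c "dgraph zero & dgraph alpha --lru_mb 2048 --zero localhost:5080"

# Live-load the quads and watch the Alpha's memory usage.
dgraph live -f quads.rdf.gz --alpha localhost:9080 --zero localhost:5080
```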