Bulk Loader REDUCE problem - it's very slow

Hi @apete, here is the binary. This binary is from commit ebed67859944725dab53919bee2b61afcef30b86, built over the release/v20.11 branch. Here is the PR for your reference: chore(perf): Improve perf of bulk loader with Reuse allocator and assinging tags to allocator (#7360) by aman-bansal · Pull Request #7547 · dgraph-io/dgraph · GitHub.
Do let us know if you face any issues.
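
For anyone following along, the "Reuse allocator with tags" change is conceptually a free-list allocator: buffers are recycled instead of re-allocated, and each allocator instance carries a label so its allocations can be attributed to a phase such as map or reduce. Below is a minimal, hypothetical Go sketch of that idea; all names are made up for illustration, and this is not Dgraph's actual allocator (the real one lives in the ristretto z package and differs in detail).

```go
package main

import (
	"fmt"
	"sync"
)

// Allocator recycles fixed-size buffers through a free list instead of
// re-allocating them, and carries a tag so allocations can be
// attributed to a phase such as "map" or "reduce".
// (Hypothetical sketch, not Dgraph's real allocator.)
type Allocator struct {
	mu   sync.Mutex
	tag  string
	size int
	free [][]byte
}

func NewAllocator(size int, tag string) *Allocator {
	return &Allocator{tag: tag, size: size}
}

// Allocate returns a buffer, reusing a released one when available.
func (a *Allocator) Allocate() []byte {
	a.mu.Lock()
	defer a.mu.Unlock()
	if n := len(a.free); n > 0 {
		buf := a.free[n-1]
		a.free = a.free[:n-1]
		return buf
	}
	return make([]byte, a.size)
}

// Release puts a buffer back on the free list for later reuse.
func (a *Allocator) Release(buf []byte) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.free = append(a.free, buf)
}

func main() {
	alloc := NewAllocator(4096, "reduce")
	buf := alloc.Allocate()
	alloc.Release(buf)
	fmt.Printf("%q allocator now holds %d reusable buffer(s)\n",
		alloc.tag, len(alloc.free))
}
```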

Looking better and better!

Unfortunately, I accidentally deleted the test data I’d been using before, so I can’t give accurate number/time comparisons between this and earlier builds.

The data set I’m using now is larger than before. (It’s the same number of files, but in terms of the RDF triples they contain, definitely more.) Still, this build finished faster (5h).

I somehow got the impression that thread utilisation during MAP was slightly worse than before, but it certainly looked better for REDUCE. There is still room for improvement, though: utilisation remains far from 100%.
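
(By thread utilisation I mean how busy the worker goroutines keep the logical cores. Both phases presumably fan work out to a pool of workers, and whenever those workers block on disk I/O or contend on locks, observed utilisation falls below 100%. A toy Go sketch of that pattern, with placeholder work standing in for the real map/reduce chunks:)

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	jobs := make(chan int)
	var wg sync.WaitGroup

	// One worker per logical core; if workers block on I/O or
	// contend on a lock, cores sit idle and utilisation drops.
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				_ = j * j // placeholder for one map/reduce chunk
			}
		}()
	}

	for j := 0; j < 1_000_000; j++ {
		jobs <- j
	}
	close(jobs)
	wg.Wait()
	fmt.Println("processed on", runtime.NumCPU(), "logical cores")
}
```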

At this point it’s getting close to what could work for us, provided that it scales well when moving to a bigger machine.

(I’ll mail you the cpu.pprof file for this execution.)
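
(For anyone else who wants to collect a comparable profile: a cpu.pprof file can be captured with Go's standard runtime/pprof package, roughly as in the sketch below, and then inspected with `go tool pprof cpu.pprof`. The doWork function here is just a placeholder for the real workload.)

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
)

// doWork is a placeholder for the workload being profiled.
func doWork() {
	sum := 0
	for i := 0; i < 100_000_000; i++ {
		sum += i
	}
	_ = sum
}

func main() {
	f, err := os.Create("cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Sample the CPU while doWork runs; the samples are written to
	// cpu.pprof when StopCPUProfile is called.
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	doWork()
}
```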

If I remember the previous dataset’s average file size correctly, this is now loading 30-40% more data!

For the bulk loader, the more cores and the faster they are, the better (assuming a fast SSD/NVMe drive). I ran the bulk loader over the LDBC dataset: it was taking 25 minutes on my Threadripper 2950X and took 10 minutes on a Ryzen 5950X.