I continue to have problems with this. When I run the bulk loader on a very small subset of our data with the default settings, it works. The only problem is that extrapolating that runtime to the full data set gives something like 8 days. That’s way too long for us! (And that assumes we can keep the same processing rate on a much larger data set.)
When I increase the data size and/or start experimenting with the various performance tuning “flags”, I either see no difference or the process crashes. Before this particular crash I could see that system memory (394G) was entirely consumed by the bulk loader.
I need to stop these crashes AND get some sort of performance increase, and would very much appreciate some guidance.
These are the flags used for that last run:
--ignore_errors --num_go_routines=24 --map_shards=4 --reduce_shards=2 --reducers=2
The machine I’m currently testing on has 48 “processors” (more precisely: 1 CPU with 24 cores and 48 threads) and 394G RAM.
badger 2021/01/12 23:42:22 INFO: Compaction backed off 23000 times
badger 2021/01/12 23:42:23 INFO: Compaction backed off 23000 times
badger 2021/01/12 23:42:23 INFO: Compaction backed off 23000 times
[23:42:23-0800] MAP 01h05m14s nquad_count:3.180G err_count:0.000 nquad_speed:812.6k/sec edge_count:3.913G edge_speed:999.8k/sec jemalloc: 52 GiB
[23:42:24-0800] MAP 01h05m15s nquad_count:3.181G err_count:0.000 nquad_speed:812.4k/sec edge_count:3.913G edge_speed:999.5k/sec jemalloc: 52 GiB
badger 2021/01/12 23:42:25 INFO: Compaction backed off 22000 times
[23:42:25-0800] MAP 01h05m16s nquad_count:3.181G err_count:0.000 nquad_speed:812.2k/sec edge_count:3.913G edge_speed:999.3k/sec jemalloc: 52 GiB
[23:42:26-0800] MAP 01h05m17s nquad_count:3.181G err_count:0.000 nquad_speed:812.0k/sec edge_count:3.913G edge_speed:999.0k/sec jemalloc: 52 GiB
[23:42:27-0800] MAP 01h05m18s nquad_count:3.181G err_count:0.000 nquad_speed:811.8k/sec edge_count:3.913G edge_speed:998.8k/sec jemalloc: 52 GiB
[23:42:28-0800] MAP 01h05m19s nquad_count:3.181G err_count:0.000 nquad_speed:811.6k/sec edge_count:3.913G edge_speed:998.6k/sec jemalloc: 52 GiB
[23:42:29-0800] MAP 01h05m20s nquad_count:3.181G err_count:0.000 nquad_speed:811.3k/sec edge_count:3.913G edge_speed:998.2k/sec jemalloc: 52 GiB
[23:42:30-0800] MAP 01h05m21s nquad_count:3.181G err_count:0.000 nquad_speed:811.2k/sec edge_count:3.913G edge_speed:998.1k/sec jemalloc: 52 GiB
[23:42:31-0800] MAP 01h05m22s nquad_count:3.181G err_count:0.000 nquad_speed:811.0k/sec edge_count:3.913G edge_speed:997.8k/sec jemalloc: 52 GiB
[23:42:32-0800] MAP 01h05m23s nquad_count:3.181G err_count:0.000 nquad_speed:810.8k/sec edge_count:3.914G edge_speed:997.6k/sec jemalloc: 52 GiB
[23:42:33-0800] MAP 01h05m24s nquad_count:3.181G err_count:0.000 nquad_speed:810.6k/sec edge_count:3.914G edge_speed:997.3k/sec jemalloc: 52 GiB
[23:42:34-0800] MAP 01h05m25s nquad_count:3.181G err_count:0.000 nquad_speed:810.4k/sec edge_count:3.914G edge_speed:997.1k/sec jemalloc: 52 GiB
[1] 1208162 killed dgraph bulk -f -s --format=rdf --xidmap xidmap --http localhost:8000
dgraph bulk -f -s --format=rdf --xidmap xidmap --http localhost:8000 95693.23s user 12025.46s system 2722% cpu 1:05:57.24 total
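As a sanity check on the numbers above, here is a minimal Python sketch that recomputes the average N-Quad rate from the last MAP line and the average number of busy cores from the timing summary. The helper names (`average_nquad_rate`, `busy_cores`) and the regular expression are my own and only assume the log layout shown above; they are not part of dgraph.

```python
import re

# Matches the elapsed time and cumulative nquad_count in a MAP progress line.
# The k/M/G suffix handling is an assumption based on the lines shown above.
MAP_LINE = re.compile(
    r"MAP (?P<h>\d+)h(?P<m>\d+)m(?P<s>\d+)s "
    r"nquad_count:(?P<count>[\d.]+)(?P<unit>[kMG]?)"
)
UNITS = {"": 1.0, "k": 1e3, "M": 1e6, "G": 1e9}


def average_nquad_rate(line: str) -> float:
    """Return cumulative N-Quads divided by elapsed seconds for one MAP line."""
    match = MAP_LINE.search(line)
    if match is None:
        raise ValueError("not a MAP progress line")
    elapsed = int(match["h"]) * 3600 + int(match["m"]) * 60 + int(match["s"])
    nquads = float(match["count"]) * UNITS[match["unit"]]
    return nquads / elapsed


def busy_cores(user_s: float, system_s: float, wall_s: float) -> float:
    """Average number of cores kept busy: CPU seconds per wall-clock second."""
    return (user_s + system_s) / wall_s


# Values copied from the last MAP line and the timing summary above.
last_map = "[23:42:34-0800] MAP 01h05m25s nquad_count:3.181G err_count:0.000"
rate = average_nquad_rate(last_map)
wall = 1 * 3600 + 5 * 60 + 57.24  # 1:05:57.24 total
cores = busy_cores(95693.23, 12025.46, wall)

print(f"{rate / 1e3:.1f}k nquads/sec, {cores:.1f} cores busy")
```

The computed rate lines up with the logged nquad_speed of 810.4k/sec, so that figure appears to be an average since the start of the run rather than an instantaneous rate. The busy-core figure is consistent with the reported 2722% cpu, which is roughly 27 of the 48 hardware threads.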