We are able to reproduce above issue after rerun. I have shared the logs with @ibrahim.
Since above rate is slow to reach 1 TB, I am trying bulk load with stackoverflow data right now. I am trying to load following file from stackoverflow, It has around ~1.7B RDFs. Currently no schema has been used.
-rwxr-xr-x 1 ashish ashish 57G Jul 15 17:00 posts.rdf.gz
Observation:
Bulk loader went OOM in reduce phase. Map to Reduce phase transition logs are as follows:
[16:25:40Z] MAP 02h33m08s nquad_count:1.727G err_count:0.000 nquad_speed:188.0k/sec edge_count:1.727G edge_speed:188.0k/sec
[16:25:41Z] MAP 02h33m09s nquad_count:1.727G err_count:0.000 nquad_speed:188.0k/sec edge_count:1.727G edge_speed:188.0k/sec
[16:25:42Z] MAP 02h33m10s nquad_count:1.727G err_count:0.000 nquad_speed:188.0k/sec edge_count:1.727G edge_speed:188.0k/sec
[16:25:43Z] MAP 02h33m11s nquad_count:1.727G err_count:0.000 nquad_speed:187.9k/sec edge_count:1.727G edge_speed:187.9k/sec
[16:25:44Z] MAP 02h33m12s nquad_count:1.727G err_count:0.000 nquad_speed:187.9k/sec edge_count:1.727G edge_speed:187.9k/sec
Shard tmp/map_output/000 -> Reduce tmp/shards/shard_0/000
[16:25:45Z] REDUCE 02h33m13s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[16:25:46Z] REDUCE 02h33m14s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[16:25:47Z] REDUCE 02h33m15s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[16:25:48Z] REDUCE 02h33m16s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[16:25:49Z] REDUCE 02h33m17s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
Most of map phase heap is taken by xidmap. Heap profile(in use) top is as follows:
ashish@ashish-test1:~/pprof$ go tool pprof pprof.dgraph.alloc_objects.alloc_space.inuse_objects.inuse_space.008.pb.gz
File: dgraph
Build ID: 371190db1475fb14a7dfd3c44905e712b120ca09
Type: inuse_space
Time: Jul 16, 2020 at 3:45pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 26623.49MB, 98.43% of 27049.15MB total
Dropped 72 nodes (cum <= 135.25MB)
Showing top 10 nodes out of 24
flat flat% sum% cum cum%
21492.55MB 79.46% 79.46% 21510.06MB 79.52% github.com/dgraph-io/dgraph/xidmap.(*XidMap).AssignUid
3455.05MB 12.77% 92.23% 3455.05MB 12.77% strings.(*Builder).WriteString
489.11MB 1.81% 94.04% 489.11MB 1.81% github.com/dgraph-io/dgraph/types.Marshal
483.27MB 1.79% 95.83% 483.27MB 1.79% bytes.makeSlice
264.54MB 0.98% 96.80% 264.54MB 0.98% github.com/dgraph-io/dgraph/posting.NewPosting
186.01MB 0.69% 97.49% 186.01MB 0.69% github.com/dgraph-io/dgraph/dgraph/cmd/bulk.newMapper.func1
145.31MB 0.54% 98.03% 145.31MB 0.54% strconv.Unquote
60.01MB 0.22% 98.25% 313.55MB 1.16% github.com/dgraph-io/dgraph/chunker.(*rdfChunker).Parse
37.63MB 0.14% 98.39% 223.64MB 0.83% github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addMapEntry
10MB 0.037% 98.43% 179.81MB 0.66% github.com/dgraph-io/dgraph/chunker.ParseRDF
Next I will retry reduce phase.