[WIP] Performance bottleneck on 1 TB data

We are able to reproduce above issue after rerun. I have shared the logs with @ibrahim.

Since above rate is slow to reach 1 TB, I am trying bulk load with stackoverflow data right now. I am trying to load following file from stackoverflow, It has around ~1.7B RDFs. Currently no schema has been used.

-rwxr-xr-x 1 ashish ashish 57G Jul 15 17:00 posts.rdf.gz

Observation:

Bulk loader went OOM in reduce phase. Map to Reduce phase transition logs are as follows:

[16:25:40Z] MAP 02h33m08s nquad_count:1.727G err_count:0.000 nquad_speed:188.0k/sec edge_count:1.727G edge_speed:188.0k/sec
[16:25:41Z] MAP 02h33m09s nquad_count:1.727G err_count:0.000 nquad_speed:188.0k/sec edge_count:1.727G edge_speed:188.0k/sec
[16:25:42Z] MAP 02h33m10s nquad_count:1.727G err_count:0.000 nquad_speed:188.0k/sec edge_count:1.727G edge_speed:188.0k/sec
[16:25:43Z] MAP 02h33m11s nquad_count:1.727G err_count:0.000 nquad_speed:187.9k/sec edge_count:1.727G edge_speed:187.9k/sec
[16:25:44Z] MAP 02h33m12s nquad_count:1.727G err_count:0.000 nquad_speed:187.9k/sec edge_count:1.727G edge_speed:187.9k/sec
Shard tmp/map_output/000 -> Reduce tmp/shards/shard_0/000
[16:25:45Z] REDUCE 02h33m13s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[16:25:46Z] REDUCE 02h33m14s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[16:25:47Z] REDUCE 02h33m15s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[16:25:48Z] REDUCE 02h33m16s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[16:25:49Z] REDUCE 02h33m17s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0

Most of map phase heap is taken by xidmap. Heap profile(in use) top is as follows:

ashish@ashish-test1:~/pprof$ go tool pprof pprof.dgraph.alloc_objects.alloc_space.inuse_objects.inuse_space.008.pb.gz 
File: dgraph
Build ID: 371190db1475fb14a7dfd3c44905e712b120ca09
Type: inuse_space
Time: Jul 16, 2020 at 3:45pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 26623.49MB, 98.43% of 27049.15MB total
Dropped 72 nodes (cum <= 135.25MB)
Showing top 10 nodes out of 24
      flat  flat%   sum%        cum   cum%
21492.55MB 79.46% 79.46% 21510.06MB 79.52%  github.com/dgraph-io/dgraph/xidmap.(*XidMap).AssignUid
 3455.05MB 12.77% 92.23%  3455.05MB 12.77%  strings.(*Builder).WriteString
  489.11MB  1.81% 94.04%   489.11MB  1.81%  github.com/dgraph-io/dgraph/types.Marshal
  483.27MB  1.79% 95.83%   483.27MB  1.79%  bytes.makeSlice
  264.54MB  0.98% 96.80%   264.54MB  0.98%  github.com/dgraph-io/dgraph/posting.NewPosting
  186.01MB  0.69% 97.49%   186.01MB  0.69%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.newMapper.func1
  145.31MB  0.54% 98.03%   145.31MB  0.54%  strconv.Unquote
   60.01MB  0.22% 98.25%   313.55MB  1.16%  github.com/dgraph-io/dgraph/chunker.(*rdfChunker).Parse
   37.63MB  0.14% 98.39%   223.64MB  0.83%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addMapEntry
      10MB 0.037% 98.43%   179.81MB  0.66%  github.com/dgraph-io/dgraph/chunker.ParseRDF

Next I will retry reduce phase.