Yes… I mean 30 GB.
For background, I am exporting a dataset from mongo, and for simplicity, rather than chopping up all sub-documents into their own node/dgraph type, we have decided to perform our initial import and analysis by simply flattening the documents into nodes.
i.e. a source document like
```json
{
  "this": {"prop": 1, "attribute": 2},
  "is": {"prop": 1, "attribute": 2},
  "a": {"prop": 1, "attribute": 2},
  "doc": {"prop": 1, "attribute": 2}
}
```
Would become a node with the predicates:
- this.prop
- this.attribute
- is.prop
- is.attribute
- a.prop
- a.attribute
- doc.prop
- doc.attribute
If we combine this with the fact that arrays in documents are also flattened into the node, the number of predicates really explodes (each type has its own set of predicates, with only a handful shared across types).
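Roughly, the flattening looks like the sketch below. This is illustrative only, not our actual export code; in particular, the index-based naming for array elements is an assumption of the sketch rather than the exact scheme we used.

```python
# Minimal sketch of flattening a mongo document into dotted predicate names.
# Illustrative only; the array naming (one predicate per index) is an
# assumption, not necessarily the scheme used in the real export logic.

def flatten(doc, prefix=""):
    """Flatten a nested document into a dict of dotted predicate names."""
    predicates = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Sub-documents become prefixed predicates, e.g. "this.prop".
            predicates.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # Arrays are also flattened into the node, which is where the
            # predicate count really explodes.
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    predicates.update(flatten(item, prefix=f"{name}.{i}."))
                else:
                    predicates[f"{name}.{i}"] = item
        else:
            predicates[name] = value
    return predicates


print(flatten({
    "this": {"prop": 1, "attribute": 2},
    "is": {"prop": 1, "attribute": 2},
}))
# -> {'this.prop': 1, 'this.attribute': 2, 'is.prop': 1, 'is.attribute': 2}
```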
The reason my schema file doesn’t include all of these is that the data structures tracking them in my mongo export logic caused my workstation to swap heavily and eventually run out of virtual memory at around 80 GiB, so I decided to punt and add predicates to the schema manually as our evaluation dictates.
Anyway, I now realize that this approach was excessively naive and will definitely need to be reconsidered, but I was hoping the dataset would still be useful for analyzing the performance/functionality of graph lookups where they do happen (we do have foreign-key-style references in the mongo docs, and those are what will convert to node references in dgraph).
If the feedback is that dgraph will struggle with a disproportionately large number of predicates… especially in the bulk-load situation, I can work on restructuring the input data to be more usable.