I can't really tell you how to design your schema, but let me give you some background on how dgraph internally manages storage, which may help you draw conclusions about ingestion and performance:
- Dgraph stores data by tablet, which is synonymous with a predicate (an ‘attribute’ key, as you have written above). Sometimes getting the jargon to match up is half the battle.
- Therefore, dgraph does not store anything per ‘node’ or ‘edge’. A node is just a unique ID some triples share as a subject.
- So, if you have 1M predicates ‘on a node’ vs. 3 predicates ‘on a node’, dgraph does not care, and will be equally performant at query time (specifically on querying X things ‘on a node’ in either pattern)
- Conversely, if you have a huge graph with billions of values and only 5 different predicates (attribute keys, if you will) in total, you will get terrible performance: since storage is by predicate, each of those few tablets ends up enormous.
- As an extension of the above, indices will also be huge, since an index grows with the predicate being indexed.
- A huge database with billions of (key, value) pairs should be spread evenly across a good number of predicates. What is a good number? That may take some experimentation to find out.
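
To make the points above concrete, here is a toy sketch in Python. It is not how dgraph is implemented, just a simplified model that assumes a 'tablet' is nothing more than the group of triples sharing a predicate. The function name `tablet_sizes` and the `attrN` predicate names are made up for illustration.

```python
from collections import defaultdict


def tablet_sizes(triples):
    """Group (subject, predicate, object) triples by predicate and count each group.

    Toy model only: each predicate's group stands in for one tablet.
    """
    tablets = defaultdict(list)
    for subject, predicate, obj in triples:
        tablets[predicate].append((subject, obj))
    return {predicate: len(rows) for predicate, rows in tablets.items()}


# Hypothetical data: the same 12 triples, spread over 2 predicates vs. 6 predicates.
few_predicates = [(uid, f"attr{uid % 2}", "v") for uid in range(12)]
many_predicates = [(uid, f"attr{uid % 6}", "v") for uid in range(12)]

print(tablet_sizes(few_predicates))   # 2 groups of 6 triples each
print(tablet_sizes(many_predicates))  # 6 groups of 2 triples each
```

Note that the subject never affects the grouping, which is why the number of predicates 'on a node' does not matter in this model, while the number of distinct predicates directly determines how large each group gets.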
I highly suggest you read the whitepaper before designing a database of this magnitude. It is certainly possible to do (I have ~4Bn triples in my current production dgraph), but you should not go in without understanding exactly how dgraph performs operations, so that you can design your database as well as possible.
Good luck!