If Google used Dgraph, should it use one single Dgraph DB for everything, or a separate Dgraph DB for each service (Maps, YouTube, Gmail...)?

Data takes the shape of S → P → O.

Subject
Predicate
Object

So, looking at user data, we can write it in RDF format like this:

<0x1> <User.name> "foo" .
<0x2> <User.name> "bar" .
<0x3> <User.name> "baz" .

Everything follows this SPO model, so a friend relationship has the same shape:

<0x1> <User.friend> <0x2> .
<0x2> <User.friend> <0x1> .

The terms predicate and edge are used interchangeably a lot in Dgraph. Context is the key to understanding the difference. The context here is that predicates are what gets sharded. So if we had this data on a multi-node cluster, it could be sharded with all “User.name” triples on one Alpha and all “User.friend” triples on another.
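
To make the sharding idea concrete, here is a toy Python sketch. It is not Dgraph code: the `shard_by_predicate` helper, the round-robin placement, and the names "alpha-1" and "alpha-2" are all assumptions made up for illustration. It only shows that triples are grouped by predicate, not by subject.

```python
from collections import defaultdict

# The triples from the examples above, as (subject, predicate, object).
TRIPLES = [
    ("0x1", "User.name", "foo"),
    ("0x2", "User.name", "bar"),
    ("0x3", "User.name", "baz"),
    ("0x1", "User.friend", "0x2"),
    ("0x2", "User.friend", "0x1"),
]


def shard_by_predicate(triples, groups):
    """Toy placement: each predicate goes to one group, round-robin
    in order of first appearance. Real placement is decided by Dgraph."""
    assignment = {}
    shards = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate not in assignment:
            assignment[predicate] = groups[len(assignment) % len(groups)]
        shards[assignment[predicate]].append((subject, predicate, obj))
    return assignment, dict(shards)


assignment, shards = shard_by_predicate(TRIPLES, ["alpha-1", "alpha-2"])
for group, rows in shards.items():
    print(group, sorted({predicate for _, predicate, _ in rows}))
```

Note that the subject 0x1 ends up with data on both groups: its name lives with the other names, and its friend edge lives with the other friend edges.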

Data is queried by predicates. The fastest query is to ask for a node by UID (the Subject), but usually you want something more than the UID, so you request a field.

{ n(func: uid(0x1)) { User.name } }

This would return “foo” structured in JSON for “n” and the field requested.

This is quick because the Zero node knows which Alpha holds the predicate User.name, so the query goes there and asks for the specific triple.
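
The lookup described above can be sketched in a few lines of Python. This is a toy model only: `PREDICATE_OWNER`, `ALPHAS`, `lookup`, and `query_by_uid` are hypothetical names, and the dictionaries simply stand in for the predicate-to-Alpha mapping and the per-Alpha storage.

```python
import json

# Hypothetical predicate -> Alpha map, standing in for what Zero tracks.
PREDICATE_OWNER = {"User.name": "alpha-1", "User.friend": "alpha-2"}

# Hypothetical per-Alpha storage: predicate -> {uid: value}.
ALPHAS = {
    "alpha-1": {"User.name": {"0x1": "foo", "0x2": "bar", "0x3": "baz"}},
    "alpha-2": {"User.friend": {"0x1": ["0x2"], "0x2": ["0x1"]}},
}


def lookup(uid, predicate):
    """Find the Alpha that owns the predicate, then fetch one triple's value."""
    owner = PREDICATE_OWNER[predicate]
    return ALPHAS[owner][predicate].get(uid)


def query_by_uid(block, uid, predicate):
    """Return a JSON-like result keyed by the query block name."""
    value = lookup(uid, predicate)
    return {block: [{predicate: value}] if value is not None else []}


result = query_by_uid("n", "0x1", "User.name")
print(json.dumps(result))  # {"n": [{"User.name": "foo"}]}
```

Only one shard is touched here, which is why a single-predicate lookup by UID is about as cheap as a query gets.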

Now, if you were to ask for the friends and their names, the query could span multiple Alphas to gather the data that is needed. Data is stored in posting lists, so the query first fetches only the keys, then filters those keys, and then fetches the requested fields, with Zero managing this process. Data is read only once, even if it is repeated many times deep in the query. One Alpha is responsible for responding to the request, after Zero has coordinated all of the Alphas to work together to retrieve the data needed.
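
As a rough sketch of the "read only once" point, the Python below walks friends-of-friends over two in-memory stores and logs every read. It is an illustration under stated assumptions, not how Dgraph is implemented: `NAMES`, `FRIENDS`, and `friends_with_names` are made-up names, and the simple recursive walk stands in for Dgraph's real query execution.

```python
# Hypothetical per-predicate stores, one per Alpha.
NAMES = {"0x1": "foo", "0x2": "bar", "0x3": "baz"}   # User.name shard
FRIENDS = {"0x1": ["0x2"], "0x2": ["0x1"]}           # User.friend shard


def friends_with_names(root_uid, depth):
    """Expand friends to the given depth, reading each (predicate, uid) once."""
    reads = []   # log of reads that actually hit a store
    cache = {}   # (predicate, uid) -> value already fetched

    def read(predicate, store, uid):
        key = (predicate, uid)
        if key not in cache:
            reads.append(key)
            cache[key] = store.get(uid)
        return cache[key]

    def expand(uid, level):
        node = {"User.name": read("User.name", NAMES, uid)}
        if level > 0:
            friend_uids = read("User.friend", FRIENDS, uid) or []
            node["User.friend"] = [expand(f, level - 1) for f in friend_uids]
        return node

    return expand(root_uid, depth), reads


result, reads = friends_with_names("0x1", depth=3)
print(len(reads))  # 4
```

Even though 0x1 and 0x2 each appear twice in the nested result, each name and each friend list is fetched from its store only once.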
