@minions: Here are some thoughts about the SubGraph
structure:
There are several graph traversal languages: GraphQL, Gremlin, Cypher, SPARQL. Our backend should be flexible enough to accommodate all these languages eventually.
As with all languages, it is generally true that “ease of use” comes at the expense of “being restrictive”. GraphQL is easy to understand and use, but it feels limited. It is great for a “query structure” that keeps fanning out, but for anything else, it is not so great.
The above is a graph DB schema taken from here.
Fix a (person, company). Say we want to see if that person applies to that company. Say we cannot add reverse edges. How would you do it in GraphQL? Say the jobs are all entities, not values. Essentially we want to intersect person.fanout("completes").fanout("appliesTo")
with company.fanout("creates")
. It is not clear at all how to do that in GraphQL.
Our backend should have a structure which you can intersect easily. This structure (which we shall call QueryBlock
) should be as simple and as small as possible, so that any operations, say an intersection or even a groupBy have fewer fields to track and maintain.
Let’s take a look at the current SubGraph
structure.
type SubGraph struct {
Attr string
Count int
Offset int
AfterUid uint64
GetCount uint16
Children []*SubGraph
IsRoot bool
GetUid bool
isDebug bool
Query []byte // Contains list of source UIDs.
Result []byte // Contains UID matrix or list of values for child attributes.
}
There are very few fields that QueryBlock
needs. In particular, we only need: Attr, Query, Result
. There is no need to keep Children
.
Imagine we have this new structure
type QueryBlock struct {
Attr string
Query []byte
Result []byte
}
Actually, since in both postTraverse
(for JSON) and preTraverse
(for proto), we need to parse and create task.Query
and task.Result
, and parsing is very fast for flatbuffers, why not just cache these very handy structures?
type QueryBlock struct {
Attr string
Query *task.Query
Result *task.Result
QueryB []byte
ResultB []byte
}
There should be another structure which is like a list of sorted entities. Actually, that is just task.Query
. The intersect function should return a task.Query
or something similar. Now, to perform the intersection, we need two sorted list of unique UIDs from each QueryBlock
. There are two choices here. Either we store the sorted list of unique destination UIDs in QueryBlock
, or we don’t. I am not sure about this. Say we do store the dest UIDs in QueryBlock and get:
type QueryBlock struct {
Attr string
Src *task.Query
Link *task.Result
SrcB []byte
LinkB []byte
Dest *task.Query
}
Dest
can be lazily initialized. I have taken the liberty of renaming Query
as Src
because what we care about in QueryBlock
is the sorted list of unique source UIDs. And task.Result
is a link between Src
and Dest
UIDs. Finally, this is how intersect can look like:
func Intersect(q1, q2 *QueryBlock) *task.Query
You might wonder: should we return []byte
instead? Since flatbuffers parse really fast, and we often need its parsed structure, what if we just wrap it and make it smarter and stopping asking questions like this? Developer productivity is also important. The structure will look like the following.
type wrappedTaskQuery struct {
b []byte // b for bytes.
p *task.Query // p for "parsed"
}
p
can be lazily initialized, so that we parse it at most once. Now, QueryBlock
can be just:
type QueryBlock struct {
Attr string
Src *wrappedTaskQuery
Link *wrappedTaskResult
Dest *wrappedTaskQuery
}
What about the rest of SubGraph
? It seems to me that SubGraph
is all about fanning out. It can be a companion to QueryBlock
. Ideally it should look like this:
type SubGraph struct {
Attr string
Children []*SubGraph
Qb *QueryBlock
}
To summarize, this is starting to look like a simple refactoring of task.Query, task.Result
out of SubGraph
so that we can build more logic around QueryBlock
and not have to worry about the actual graph structure or query structure. I have swept a lot under the rug, e.g., Count, Offset
, etc, but let’s defer that.