We have two methods which handle the conversion of the response from Dgraph into the format that the client expects:
- ToJSON (HTTP client)
- ToProtocolBuffer (Go and other clients)
This means that whenever a new feature (schema validation, types, result count, debug root, etc.) is implemented, both functions have to be modified. What usually happens is that ToJSON gets modified because it’s more commonly used, while ToProtocolBuffer lags behind. This leads to inconsistencies. The idea is to write a wrapper over ToProtocolBuffer which can convert the ProtocolBuffer response to JSON.
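A rough sketch of what that wrapper could look like, assuming the proto response is a tree of nodes carrying Attribute, Properties, and Children (these names are made up for illustration, not the actual message definitions):

```go
import "encoding/json"

// Node and Property approximate the assumed proto response shape.
type Node struct {
	Attribute  string
	Properties []*Property
	Children   []*Node
}

type Property struct {
	Prop  string
	Value string
}

// ProtoToJSON is the proposed wrapper: walk the proto tree once,
// rebuild the map[string]interface{} that ToJSON produces today,
// and hand it to the standard JSON encoder.
func ProtoToJSON(n *Node) ([]byte, error) {
	return json.Marshal(nodeToMap(n))
}

func nodeToMap(n *Node) map[string]interface{} {
	m := make(map[string]interface{}, len(n.Properties)+len(n.Children))
	for _, p := range n.Properties {
		m[p.Prop] = p.Value
	}
	for _, c := range n.Children {
		// Children sharing an attribute collapse into a JSON array.
		list, _ := m[c.Attribute].([]interface{})
		m[c.Attribute] = append(list, nodeToMap(c))
	}
	return m
}
```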
It’d definitely be nice if we had to maintain just one of these and the other automatically picked up the changes.
But what would be the effect on performance? Some benchmark data on ToJson before and after the change, with different numbers of result entities, would give us a good idea.
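Something like the standard testing harness should be enough to measure this; buildTestSubGraph and the ToJSON signature below are assumptions for the sketch, not existing code:

```go
import "testing"

// benchmarkToJSON times the JSON conversion for a result set of n entities.
func benchmarkToJSON(n int, b *testing.B) {
	sg := buildTestSubGraph(n) // hypothetical fixture returning a subgraph with n entities
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := sg.ToJSON(); err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkToJSON_10(b *testing.B)   { benchmarkToJSON(10, b) }
func BenchmarkToJSON_100(b *testing.B)  { benchmarkToJSON(100, b) }
func BenchmarkToJSON_1000(b *testing.B) { benchmarkToJSON(1000, b) }
```

Running these with go test -bench=ToJSON -benchmem would report both ns/op and B/op for each size.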
Right now our ToProtocolBuffer method is about 2x faster than ToJSON. Let me try to complete this and run the benchmarks on some large queries (which return 1000 entities).
Does the speedup include the marshalling step, or just going through the subgraph and creating the map and graph objects (i.e. the post- and pre-traverse method times)?
For queries with a larger number of entities (100, 1000), we can see that JSON2 (the new method) takes more ns/op and allocates more B/op. So maybe it’s not such a good idea to go with it? It would still be nice to have a second pair of eyes on the code before we discard it, in case I missed some obvious optimizations.
Good job verifying this with benchmarks. I’ll take a look at the code, maybe over the weekend, to see if we could optimize it. I had thought about this exact problem before as well, but kept ToJson because it’s going to be used more often than ToProto in startups and smaller installations, and this way we keep it as fast as we possibly can.
I doubt we’d be able to squeeze out the same performance with the two-step method of FB -> ToProto -> ToJSON.
It might be quite a bit faster if we generated the JSON ourselves instead of building the intermediate map of interface{} structures and handing it to the standard JSON library. The proto would, in some sense, take the place of this intermediate map[string]interface{}.
That could be tricky to do, though maybe possible if one generates strings all the way and stitches them together somehow. Of course, that would only complicate the code even further.
I thought it was just a matter of maintaining a single bytes.Buffer and converting it to a string at the end? A simple tree traversal might suffice. (I am happy to give it a try if you all like.)
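Roughly what I have in mind, reusing the assumed Node/Property shape from the earlier sketch; for brevity it skips string escaping and doesn’t group repeated attributes into arrays, which is exactly the tree-shaped complication raised below:

```go
import (
	"bytes"
	"fmt"
)

// writeJSON walks the tree once, appending directly to a single
// bytes.Buffer instead of building a map[string]interface{} first.
func writeJSON(n *Node, buf *bytes.Buffer) {
	buf.WriteByte('{')
	first := true
	for _, p := range n.Properties {
		if !first {
			buf.WriteByte(',')
		}
		first = false
		fmt.Fprintf(buf, "%q:%q", p.Prop, p.Value)
	}
	for _, c := range n.Children {
		if !first {
			buf.WriteByte(',')
		}
		first = false
		fmt.Fprintf(buf, "%q:", c.Attribute)
		writeJSON(c, buf)
	}
	buf.WriteByte('}')
}
```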
It might not be a single bytes.Buffer, because you’re essentially creating a tree, going from predicate-based nodes to uid-based nodes. I’d say this has a low yield for us right now, so do the lexer channel thing instead. We can get to this later.
I was wrong to jump to saying “it’s just a single bytes.Buffer”.
But let me take a step back and ask: is there any fundamental difference between the JSON and proto output such that we have to generate them differently? postTraverse looks like a postorder traversal; preTraverse looks like a preorder traversal. Keeping the logic consistent between the two is painful as it gets more complicated. Imagine all the different combinations of options: there’s offset, count, filters, afterUID, _uid_, and more, and they can interact.
If there is no fundamental difference between the JSON and proto output, a single piece of code could do all the logic, talking to some interface that can produce JSON, proto, or any other output. To be more precise, for JSON output this interface would produce the map[string]interface{}, not the final JSON string. Since there is no extra proto-to-map[string]interface{} conversion, I would expect this to run no slower than before (assuming pre-order works just as fast as post-order). It would lead to code simplification, improved productivity, consistency between the JSON and proto outputs, etc.
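A minimal sketch of what that interface could look like; the names are illustrative, not existing Dgraph API:

```go
// outputNode abstracts the encoder so a single traversal can drive
// both outputs.
type outputNode interface {
	New() outputNode
	AddValue(attr, val string)
	AddChild(attr string, child outputNode)
}

// jsonNode builds the map[string]interface{} fed to encoding/json;
// a protoNode sibling would append to Properties and Children instead.
type jsonNode struct {
	m map[string]interface{}
}

func (j *jsonNode) New() outputNode { return &jsonNode{m: make(map[string]interface{})} }

func (j *jsonNode) AddValue(attr, val string) { j.m[attr] = val }

func (j *jsonNode) AddChild(attr string, child outputNode) {
	// Repeated attributes accumulate into a JSON array.
	list, _ := j.m[attr].([]interface{})
	j.m[attr] = append(list, child.(*jsonNode).m)
}
```

The traversal would then be written once against outputNode: AddValue for each scalar property, and New, recurse, AddChild for each child subgraph.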
All of those are handled before the ToJson or ToProto functions are called; that part is handled via process.
That depends a lot on the implementation. It could be executed badly, leaving us with more complex and fragile code because one function is trying to do the job of two, or it could fit just right. It’s not clear which one it would be.
You are right @jchiu that they are just traversals. The difference is in how we build the outputs: the proto output has a schema (there are nodes which have properties), whereas JSON doesn’t have any inherent schema. The output we want for JSON has attributes (e.g. friends, name, etc.) as keys.
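Concretely, with the assumed Node shape from earlier and made-up data, the same result looks like this in each output:

```go
// JSON output: attribute names are the map keys, no schema.
//   {"name": "Alice", "friend": [{"name": "Bob"}]}
//
// Proto output: attribute names live on typed nodes.
var example = &Node{
	Properties: []*Property{{Prop: "name", Value: "Alice"}},
	Children: []*Node{{
		Attribute:  "friend",
		Properties: []*Property{{Prop: "name", Value: "Bob"}},
	}},
}
```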
I definitely agree that we should have just one function and generate an intermediate structure. I couldn’t find a better way to do it apart from what I proposed here.