Hey @MichelDiz,
I'm starting to learn the vocabulary a bit better now (query root, etc.), thanks. I created the example queries above to closely mimic the queries we're running on a much larger Dgraph, so that I could illustrate the challenges in an easily repeatable way. I have a couple of questions/comments related to scale below, and a few GitHub issues I'd like to propose at the bottom.
Are there performance considerations when querying all nodes of a particular type and then applying the filter? That sounds essentially like a table scan. We have 110 GB worth of nodes (tens of millions) that we'd be issuing that style of query against. Our reasoning was that querying the node of interest directly by an indexed predicate would be significantly faster: the result contains only the recursed nodes/edges of that node of interest, and the query never has to traverse the irrelevant nodes that a SELECT ALL style query would touch.
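To make the intuition concrete, here is a toy cost model in plain Python that counts how many nodes each approach touches. It is only a sketch of our reasoning, not a description of how Dgraph actually plans or executes queries; `build_graph`, `build_name_index`, `scan_then_traverse` and `index_then_traverse` are hypothetical names, and the dict index merely stands in for an index on the name predicate.

```python
# Toy cost model (hypothetical, not Dgraph internals): compare nodes touched by
# "examine every User, keep the match, then traverse" versus
# "look up the root via a name index, then traverse".


def build_graph(n_users):
    """Hypothetical dataset: n_users User nodes (n_users >= 4); only u0..u3 are connected."""
    graph = {f"u{i}": {"name": f"User{i}", "edges": []} for i in range(n_users)}
    graph["u0"]["name"] = "Michael"
    graph["u0"]["edges"] = ["u1", "u2"]
    graph["u1"]["edges"] = ["u3"]
    return graph


def build_name_index(graph):
    """Map name -> node ids; a stand-in for an index on the name predicate."""
    index = {}
    for node_id, attrs in graph.items():
        index.setdefault(attrs["name"], []).append(node_id)
    return index


def traverse(graph, roots):
    """Depth-first walk from roots; returns (reachable ids, nodes touched)."""
    seen = set(roots)
    stack = list(roots)
    touched = 0
    while stack:
        node = stack.pop()
        touched += 1
        for child in graph[node]["edges"]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen, touched


def scan_then_traverse(graph, name):
    """Examine every node to find the roots, then traverse from them."""
    roots = [node_id for node_id, attrs in graph.items() if attrs["name"] == name]
    reachable, touched = traverse(graph, roots)
    return reachable, len(graph) + touched


def index_then_traverse(graph, index, name):
    """Find the roots with a single index lookup, then traverse from them."""
    roots = index.get(name, [])
    reachable, touched = traverse(graph, roots)
    return reachable, len(roots) + touched


graph = build_graph(1_000)
index = build_name_index(graph)
print(scan_then_traverse(graph, "Michael")[1])          # 1004
print(index_then_traverse(graph, index, "Michael")[1])  # 5
```

Both approaches return the same four nodes, but in this model the scan cost grows with the total number of User nodes while the indexed lookup stays proportional to the size of the answer. Whether Dgraph really evaluates a type-plus-filter root the naive way is exactly what we're asking.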
I'm not looking for all users, just the users who have some connection to Michael, excluding anyone named Sarah. The Dgraph Tour query finds everyone (including pets) by running an @recurse query that follows the relationships. What doesn't work in this query is the @filter. I see that it does work in your query, which queries all nodes rather than a specific node. This is odd behavior.
I listed this as proposed GitHub ticket #4 below, with repeatable examples from the Dgraph Tour above.
My workaround is to put the @filter in the query body on each of the nodes I want it applied to.
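The behavior we're after (the exclusion filter applied at every level of the recursion, which the workaround achieves by repeating the @filter in the body) can be sketched in plain Python. This is a toy model only: `GRAPH`, `recurse_with_filter` and `not_sarah` are hypothetical names, the graph is made up rather than the Dgraph Tour dataset, and it does not claim to reproduce Dgraph's @recurse semantics (for example, how it treats loops or depth).

```python
from collections import deque

# Hypothetical toy graph (not the Dgraph Tour data): node id -> name and outgoing edges.
GRAPH = {
    "michael": {"name": "Michael", "edges": ["sarah", "alice", "rex"]},
    "sarah": {"name": "Sarah", "edges": ["bob"]},
    "alice": {"name": "Alice", "edges": ["carol"]},
    "rex": {"name": "Rex", "edges": []},
    "bob": {"name": "Bob", "edges": []},
    "carol": {"name": "Carol", "edges": ["michael"]},
}


def recurse_with_filter(graph, root, keep, max_depth=10):
    """Breadth-first walk from root that applies `keep` at every level.

    Nodes failing `keep` are dropped and never expanded, so anything reachable
    only through them is pruned as well. Already-visited nodes are skipped.
    """
    if not keep(graph[root]):
        return []
    visited = {root}
    order = [root]
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand past the depth limit
        for child in graph[node]["edges"]:
            if child in visited or not keep(graph[child]):
                continue
            visited.add(child)
            order.append(child)
            queue.append((child, depth + 1))
    return order


def not_sarah(attrs):
    """Hypothetical exclusion filter: keep everyone not named Sarah."""
    return attrs["name"] != "Sarah"


print(recurse_with_filter(GRAPH, "michael", not_sarah))
# ['michael', 'alice', 'rex', 'carol']
```

One design choice worth calling out: in this sketch a filtered-out node also hides everything reachable only through it (Bob disappears along with Sarah). If the goal were to hide Sarah but still return her connections, the walk would need to skip emitting her while still expanding her edges.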
To expand a little on why we don't think it's advisable to switch to a SELECT ALL NODES style query: the query that searches on a specific node's value returns a tiny subset of the data in our production Dgraph. There are many millions of other nodes it never has to touch. If we restructured the query to select all nodes first, wouldn't it have to iterate through every node before it could complete? Maybe our experience with Mongo, Elastic, SQL, etc. is leading us astray here.
This tidbit is super helpful. It got me thinking about the lifecycle of a query inside Dgraph. Is that lifecycle documented anywhere, or are there general notes/discussions on it?
Proposed GitHub Tickets
- Docs on using @count with @recurse. The @count documentation page should describe how its behavior changes when used together with @recurse. Something like: when using @recurse, if PredicateA is returned in a query AND PredicateA is also counted, the count must be ordered in the query AFTER the initial predicate.
- Docs on using @count with an @filter. The @count documentation page should say that it can be combined with an @filter, and mention the issues with @recurse.
- Feature: Request that the root-level @filter be applied recursively to nodes/predicates when @recurse is used (example above).
- Bug: @filter with @recurse only works when performing a full SELECT ALL / table scan of Dgraph, not when navigating directly to a node (see the examples above from the Dgraph tutorial where it doesn't work, and @MichelDiz's example where it does work with a different query style). It's odd that it works in one case and not the other.
Thanks again for all the help and information,
Best,
Ryan