Problem about nodes filter according to edge

lych4o · December 12, 2019, 9:56am

Consider a graph with persons, posts, and forums. A person can join several forums at different times and create posts in different forums. How can I filter for posts whose creator has been a member of the forum that contains the post for more than 4 years?

{
  set {
    _:post0 <dgraph.type> "post" .
    _:post0 <hasCreator> _:bob .
    _:post0 <content> "123" .

    _:bob <dgraph.type> "person" .
    _:bob <name> "Bob" .

    _:forum1 <dgraph.type> "forum" .
    _:forum1 <hasMember> _:bob (year=5) .
    _:forum1 <title> "forum1" .
    _:forum1 <containerOf> _:post0 .

    _:post3 <dgraph.type> "post" .
    _:post3 <hasCreator> _:bob .
    _:post3 <content> "456" .

    _:forum2 <dgraph.type> "forum" .
    _:forum2 <title> "forum2" .
    _:forum2 <hasMember> _:bob (year=3) .
    _:forum2 <containerOf> _:post3 .
  }
}

type person {
    name: string
}
type post {
    content: string
    hasCreator: uid
}
type forum {
    containerOf: [uid]
    hasMember: [uid]
    title: string
}
title: string .
name: string .
content: string .
hasCreator: uid @reverse .
hasMember: [uid] @reverse .
containerOf: [uid] @reverse .

MichelDiz · December 12, 2019, 4:18pm

Not sure if I get it, but here we go.

{
  var(func: type(person)) @cascade {
    name
    FO as ~hasMember @facets(gt(year, 4)){
      title
    }
  }
    
  q(func: uid(FO)) {
    title
    containerOf {
      content
      hasCreator {
        name
      }
    }
  }
}

{
  "data": {
    "q": [
      {
        "title": "forum1",
        "containerOf": [
          {
            "content": "123",
            "hasCreator": [
              {
                "name": "Bob"
              }
            ]
          }
        ]
      }
    ]
  }
}

imkleats · December 12, 2019, 7:21pm

Sometimes, when you come across a query that seems difficult to solve, it can actually indicate a weakness in the design of your data model. In this case, consider your type system:

On its face, it creates a fully connected three-node graph (which seems like a good thing).
However, one of the edges is “fat” because it includes a facet.
This proves problematic when you want to use that facet to filter the node that is opposite the edge on which the facet is stored (draw out a diagram of Person, Post, Forum to see what I mean).
What your data model obscures by using the facet is that there is actually a missing type, maybe called ForumMember, with edges to Person (hasMembership), Forum (~hasMember), and Post (contributed or ~contributedBy), plus one or more attributes (e.g. “year” or “memberSince”).
With the addition of this fourth type, the query becomes trivial:

{
   var(func: type(ForumMember)) @filter( gt(year, 4) ) {
      veteranPosts as contributed
   }

   q(func: uid(veteranPosts)) {
      content
      author {   # Assumes a Person-Post edge is kept
         name    # and renamed to "author"
      }
      postForum: ~containerOf {
         title
      }
   }
}
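To see why the intermediate node makes the query trivial, here is a minimal plain-Python sketch of the proposed model (an in-memory illustration, not Dgraph). The ForumMember fields mirror the edges described above; the uids "bob", "post123", and "post456" are hypothetical stand-ins for the OP's data. Once the year lives on a node instead of a facet, selecting veteran posts is one attribute filter plus one hop, with no cycle to close.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ForumMember:
    member: str  # Person uid (hypothetical)
    forum: str   # Forum uid (hypothetical)
    year: int    # the attribute that used to be a facet on hasMember
    contributed: List[str] = field(default_factory=list)  # Post uids


memberships = [
    ForumMember(member="bob", forum="forum1", year=5, contributed=["post123"]),
    ForumMember(member="bob", forum="forum2", year=3, contributed=["post456"]),
]


def veteran_posts(min_years: int = 4) -> Set[str]:
    # Filter membership nodes on their own attribute, then follow one edge.
    return {p for m in memberships if m.year > min_years for p in m.contributed}


print(sorted(veteran_posts()))  # ['post123']
```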

TLDR: Facets can be very helpful, but they can also lead to “fat” edges that obfuscate the existence of hidden nodes/relationships. When thinking about adding a facet, it’s good to evaluate whether the facet is really an attribute of the edge (like a heuristic weight) or whether the facet represents a feature of some intermediate entity.

For the sake of completeness, I’ll try to explain why it’s much more complicated with the current data model:

What you’re looking for boils down to the intersection of two sets:
- Set 1: all Person uids that ~hasMember a given Forum with @facets(gt(year, 4))
- Set 2: all Person uids that ~hasCreator any Post in the same Forum.
Both of those sets are easy to obtain for one individual forum, but I’m unaware of how this would extend to all forums.

{
   var(func: uid(<forumId>)) {
      veteranMembers as hasMember @facets(gt(year, 4)) { name }
   }
   q(func: type(post)) @cascade {
      content
      ~containerOf @filter(uid(<forumId>)) { # or a uid_in filter at the root
         title
      }
      hasCreator @filter(uid(veteranMembers)) {
         name
      }
   }
}
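Outside a single query, the per-forum approach generalizes by repeating it for every forum. A minimal plain-Python sketch over an in-memory copy of the OP's graph (not a Dgraph query): the year facet sits on hasMember, and, as an assumption, forum "4" contains post "3", as the second forum does in the later dataset in this thread.

```python
from typing import Dict, List, Set, Tuple

has_creator: Dict[str, str] = {"0": "1", "3": "1"}        # post -> person
contains: Dict[str, Set[str]] = {"2": {"0"}, "4": {"3"}}  # forum -> posts
has_member: Dict[str, List[Tuple[str, int]]] = {          # forum -> (person, year facet)
    "2": [("1", 5)],
    "4": [("1", 3)],
}


def veteran_posts(min_years: int = 4) -> Set[str]:
    """Posts whose creator has been in the post's own forum for more than min_years."""
    result: Set[str] = set()
    for forum, posts in contains.items():
        # Set 1: members of this forum whose year facet exceeds the threshold.
        veterans = {p for p, year in has_member.get(forum, []) if year > min_years}
        # Intersect with Set 2 for this forum: posts here whose creator is a veteran.
        result |= {post for post in posts if has_creator.get(post) in veterans}
    return result


print(sorted(veteran_posts()))  # ['0']
```

Each loop iteration is what the per-forum query computes for one <forumId>; the open question in the thread is how to express the loop itself in a single GraphQL+- request.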

smtbx · December 13, 2019, 11:18am

@imkleats Hi.

You mention “fat” edges. Where can I find more info regarding this topic?

Thanks.

imkleats · December 13, 2019, 12:27pm

@smtbx, I first heard the term in a YouTube video from a company doing a retrospective on their experiences using Neo4j in production. It had some interesting lessons learned, like how they dealt with supernodes and iteratively improved their data model. The speaker used the term “fat” to describe relationships that have a lot of attributes (Neo4j is a bit different in how easily it allows adding attributes to edges, but the analog in Dgraph would be facets).

One of his recommendations was to keep the relationships as thin as possible (ideally no attributes) and, if necessary, turn that “relationship” into an intermediate connected node itself to hold those attributes instead. In this case and using ASCII art, (person)<-[hasMember facets{year}]-(forum) could become something more like: (person)-[has]-(forumMembership preds{year})-[in]-(forum)
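The refactoring described above can be sketched as a data migration: each facet-carrying hasMember edge becomes its own membership node. This plain-Python sketch is not a Dgraph API; it borrows the predicate names Member, Forum, year, and contributed from MichelDiz's later dataset, and the uids are hypothetical.

```python
from typing import Dict, List, Tuple

# forum -[hasMember {year}]-> person, keyed by (forum, person); value is the year facet
facet_edges: Dict[Tuple[str, str], int] = {("forum1", "bob"): 5, ("forum2", "bob"): 3}

# post -> (creator, containing forum)
posts: Dict[str, Tuple[str, str]] = {
    "post123": ("bob", "forum1"),
    "post456": ("bob", "forum2"),
}


def to_membership_nodes() -> List[dict]:
    """Turn every facet-carrying hasMember edge into an explicit ForumMember node."""
    nodes = []
    for (forum, person), year in sorted(facet_edges.items()):
        nodes.append({
            "dgraph.type": "ForumMember",
            "Member": person,  # (person)-[has]-(forumMembership)
            "Forum": forum,    # (forumMembership)-[in]-(forum)
            "year": year,      # the former facet is now an ordinary node attribute
            # posts this person created inside this particular forum
            "contributed": sorted(
                p for p, (creator, where) in posts.items()
                if creator == person and where == forum
            ),
        })
    return nodes


for node in to_membership_nodes():
    print(node)
```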

Another key idea that I only just remembered is that fat edges can also indicate that you’re attempting to describe multiple different relationships in a single edge. His advice was to not be afraid to use multiple edges to connect the same two nodes. I think this is exemplified by the Dgraph documentation’s examples for filtering on facets. In that example, they’ve included a facet for “relative” on the Friend edge between two Person nodes. This would be an instance where you might consider having both a Friend and a Relative edge instead of a “relative=true” facet on the Friend edge.

Sorry that I couldn’t quickly find the link, but I’ll keep looking.

MichelDiz · December 13, 2019, 4:52pm

I don’t think this idea of "fat" edges applies to Dgraph. It is true that you are adding more information to those edges. However, facets are not first-class citizens. This means, in part, that they do not disturb any other aspect of the DB. Sometimes using facets turns out to be advantageous in some non-indexing cases. I don’t know of bad cases of using facets, only poor planning of their usage.

imkleats · December 13, 2019, 5:00pm

Agree to disagree, then. It could be argued that, because 1) relationships are stored as predicates rather than as their own objects (by which I mean that an instance of a relationship does not have a uid that makes it uniquely identifiable apart from its relation to one or more uids), and consequently 2) facets are not first-class citizens, understanding the data-modeling implications of fat edges is even more important in Dgraph. I won’t ever dispute the value facets can have, but I also think we’ve just highlighted a couple of use cases where it could be bad to use them (“bad” in the sense that there are better solutions).

MichelDiz · December 13, 2019, 5:38pm

Relationships are not stored in predicates.

Look, Neo4J is very different from Dgraph, atomically speaking. You can’t apply Neo4J concepts directly to Dgraph, other than the basic graph concepts. So this idea of "fat edges" has a very good chance (for me it is absolutely certain) of having nothing to do with the use of facets in Dgraph. I need to check this idea in their documentation and understand the context.

Internally, Dgraph stores all data as key-value pairs using BadgerDB. An edge uses something like “slots” (let’s call them that) for each part of its abstract structure: “Entity, Attribute, Other Entity / Value, Label : (Facet)”.

Nodes, edges, predicates, facets, and so on are just “abstract ideas” in Dgraph. They do not exist as such; they are divided into KV pieces and “assembled algorithmically”.

Abstractly you might think that facets “weigh” on edges. In practice they do not even “exist” because they are “non-first-class-citizens”. You can put as much data into facets as you like. This data will go to a KV slot and stay there for when you request it.

In practice, a facet only “weighs” anything when you request it. This “fat edge” issue must happen in Neo4J because it treats its facet-like properties as first-class citizens, or something similar, since you can query nodes via that information. In Dgraph you can’t: you only query facets while traversing an edge in a query.

Cheers.

imkleats · December 13, 2019, 6:04pm

I think we’re talking at cross purposes, because I’m not trying to analogize Neo4j concepts to Dgraph concepts. I’m talking fundamentally about graph problems. With the OP specifically, when information is stored on an edge, the problem is Complex. (I would be very curious to see how you or someone on your team could solve it with a general solution for all forums using the current schema. I think you’ll find your earlier solution does not work if you add some additional cases to the provided data set, which is the only reason I added my thoughts in the first place.)
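The claim can be checked without a database. This plain-Python sketch models the forum-level logic of the earlier query in this thread (keep forums that have any member with year > 4, then return all of their posts) next to the closed-pattern check. The extra member "alice" and her post "post789" are a hypothetical additional case, not part of the OP's data.

```python
from typing import Dict, List, Set, Tuple

has_creator: Dict[str, str] = {"post123": "bob", "post456": "bob", "post789": "alice"}
contains: Dict[str, Set[str]] = {"forum1": {"post123", "post789"}, "forum2": {"post456"}}
has_member: Dict[str, List[Tuple[str, int]]] = {
    "forum1": [("bob", 5), ("alice", 1)],  # alice is a hypothetical newcomer
    "forum2": [("bob", 3)],
}


def forum_level(min_years: int = 4) -> Set[str]:
    # Keep forums with ANY veteran member, then return every post in them.
    forums = {f for f, ms in has_member.items() if any(y > min_years for _, y in ms)}
    return {p for f in forums for p in contains[f]}


def closed_pattern(min_years: int = 4) -> Set[str]:
    # Require the post's own creator to be the veteran member of the post's forum.
    return {
        p
        for f, posts in contains.items()
        for p in posts
        if any(person == has_creator[p] and y > min_years for person, y in has_member[f])
    }


print(sorted(forum_level()))     # ['post123', 'post789']  (alice's post slips in)
print(sorted(closed_pattern()))  # ['post123']
```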

But we don’t have to live with Complex graph problems. Leveraging our NoSQL schema flexibility, we have an opportunity to make Complex problems Trivial. When facets can trivialize a problem, we should use them, but when they don’t (like in the OP), we shouldn’t feel obligated to use them.

Edit: I’m going to restate the OP’s problem as a pattern-matching problem to get those juices flowing:

MATCH (f:Forum)-[m:hasMember]->(:Person)<-[:hasCreator]-(p:Post)<-[:containerOf]-(f)
WHERE m.year > 4
RETURN p

I’m honestly perplexed how a GraphQL+/- query could be structured to match a closed pattern like this (i.e. a pattern starting and ending with the same uid). I was toying with a creative use of @groupby, but I don’t think that would work. Anyway, that’s why I ask if you all have a general solution to share since you have much more familiarity than myself.

And take it for what it’s worth (maybe nothing if I’m completely oblivious to how it can be accomplished simply without it), maybe you’d like to consider supporting an @reflex directive with an optional predicate argument that can be called at the bottom level of a query to indicate that all resulting uids must be connected back to the root uid (through any predicate with no args, or through a specific set of predicates with args). Closed patterns are a pretty important use case for graph data.

MichelDiz · December 13, 2019, 7:55pm

I had not read what you wrote in your reply to lych4o. I assumed you were adding other ways to deal with his question; I only got the idea of "fat edges" after it was mentioned by Smair Mishra. But now I have read it all.

See, it’s no problem for users to create their schemas and structures using facets, as long as they document them. They can even reject their usage altogether.

The only thing I disagree with is the idea of "Fat Edges". I have not talked about approaches, I am not against recommending other approaches and arguing which one is best. The more people exploring different approaches and sharing the better. My point is just the statement about “Fat Edges”. And in my answer above I exemplify well why I think this idea is invalid.

You can perfectly well be against facets for not being easy to deal with, or for any other reason. But “fat edges” is not one of them, as far as I can tell.

In your query you use @filter(gt(year, 4)); to do this, you will need to add an index on year.

See, is “ForumMember” an extra type on the Person node or a separate intermediate node? I believe it is intermediate because of the gt function and the index on year (values like 4, 3, and so on). Those values would not be unique if they lived on the Person node, unless you made an edge for each forum on the site.

Technically, this approach adds an intermediate node and extra indexing (the index on year). It is a cost-benefit trade-off that is up to the user to decide: use facets and document their usage, or add intermediate nodes.

{
   "uid":"_:Bob",
   "name":"Bob Shelton",
   "contributed": { "uid": "_:SomePostInforum1" },
   "dgraph.type": [
      "Person",
      "ForumMember"
   ]
}

OR ?

[
   {
      "uid":"_:Bob",
      "name":"Bob Shelton",
      "dgraph.type":"Person"
   },
   {
      "Member": { "uid":"_:Bob" },
      "Forum": { "uid":"_:forum1" },
      "year": "5",
      "contributed": { "uid":"_:SomePostInforum1" },
      "dgraph.type":"ForumMember"
   },
   {
      "content":"Cras placerat nisl orci...",
      "dgraph.type":"Post",
      "author": { "uid":"_:Bob" }
   }
]

Based on this, I put together some samples.

{
   var(func: type(ForumMember)) @filter( gt(year, 4) ) {
      veteranPosts as contributed
   }

   q(func: uid(veteranPosts)) {
      content
      author {  
         name
      }
      postForum: ~containerOf {
         title
      }
   }
}

Result

{
  "data": {
    "q": [
      {
        "content": "Cras placerat nisl orci, ut iaculis turpis vulputate ut. Nullam vestibulum mollis quam, ac scelerisque turpis condimentum et. Sed hendrerit porta nunc sit amet posuere.",
        "author": [
          {
            "name": "Bob Shelton"
          }
        ],
        "postForum": [
          {
            "title": "forum1"
          }
        ]
      }
    ]
  }
}

Continuing with the same idea of intermediate nodes, I remade your second query.

{
   var(func: type(ForumMember)) @filter(gt(year, 4)) @cascade {
      forumId as Forum @filter(uid(0x4e28)) # 0x4e28 is the Forum UID
      veteranMembers as Member
   }
   q(func: type(Post)) @cascade {
      content
      containerOf: ~containerOf @filter(uid(forumId)) {
         title
      }
      author @filter(uid(veteranMembers)) {
         name
      }
   }
}

Result

Basically the same answer, but with a somewhat more complex query.

{
  "data": {
    "q": [
      {
        "content": "Cras placerat nisl orci, ut iaculis turpis vulputate ut. Nullam vestibulum mollis quam, ac scelerisque turpis condimentum et. Sed hendrerit porta nunc sit amet posuere.",
        "containerOf": [
          {
            "title": "forum1"
          }
        ],
        "author": [
          {
            "name": "Bob Shelton"
          }
        ]
      }
    ]
  }
}

Dataset

I added some dummy Persons, for no reason.

{
   "set":[
      {
         "uid":"_:Jago",
         "name":"Jago Churchill",
         "dgraph.type":"Person"
      },
      {
         "uid":"_:Megan",
         "name":"Megan North",
         "dgraph.type":"Person"
      },
      {
         "uid":"_:Mariella",
         "name":"Mariella Atherton",
         "dgraph.type":"Person"
      },
      {
         "uid":"_:Bob",
         "name":"Bob Shelton",
         "dgraph.type":"Person"
      },
      {
         "Member":{"uid":"_:Bob"},
         "Forum":{"uid":"_:forum1"},
         "year":"5",
         "contributed":{"uid":"_:SomePostInforum1"},
         "dgraph.type":"ForumMember"
      },
      {
         "Member":{"uid":"_:Bob"},
         "Forum":{"uid":"_:forum2"},
         "year":"3",
         "contributed":{"uid":"_:SomePostInforum2"},
         "dgraph.type":"ForumMember"
      },

      {
         "uid":"_:forum1",
         "title":"forum1",
         "containerOf":[
            {
               "uid":"_:SomePostInforum1",
               "content":"Cras placerat nisl orci, ut iaculis turpis vulputate ut. Nullam vestibulum mollis quam, ac scelerisque turpis condimentum et. Sed hendrerit porta nunc sit amet posuere.",
               "dgraph.type":"Post",
               "author":{
                  "uid":"_:Bob"
               }
            }
         ],
         "dgraph.type":"Forum",
         "hasMember":[
            {
               "uid":"_:Bob"
            },
            {
               "uid":"_:Jago"
            },
            {
               "uid":"_:Mariella"
            }
         ]
      },
      {
         "uid":"_:forum2",
         "title":"forum2",
         "containerOf":[
            {
               "uid":"_:SomePostInforum2",
               "content":"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean nisl odio, pharetra molestie varius vestibulum, gravida at libero. Phasellus diam tortor, pulvinar at mi in, gravida aliquam ligula.",
               "dgraph.type":"Post",
               "author":{
                  "uid":"_:Bob"
               }
            }
         ],
         "dgraph.type":"Forum",
         "hasMember":[
            {
               "uid":"_:Bob"
            },
            {
               "uid":"_:Megan"
            },
            {
               "uid":"_:Mariella"
            },
            {
               "uid":"_:Jago"
            }
         ]
      }
   ]
}

type Person {
    name: string
}
type Post {
    content: string
    hasCreator: uid
    author: uid
}
type Forum {
    containerOf: [uid]
    hasMember: [uid]
    title: string
}
type ForumMember {
    Member: uid
    Forum: uid
    year: int
    contributed: [uid]
}

<Member>: uid .
<Forum>: uid .
<year>: int @index(int) .
<contributed>: [uid] .

<title>: string .
<name>: string .
<content>: string .
<hasCreator>: uid @reverse .
<hasMember>: [uid] @reverse .
<containerOf>: [uid] @reverse .
<author>: uid .

Using facets, you can simplify the queries and the dataset, but you need to document it. Again, there is no such thing as “fat edges” in Dgraph. I do not know this “fatty” concept, but judging by the name and by my experience, facets are harmless. They can be complex if you don’t document them, but otherwise they are trivial.

BTW

What does OP mean? Is there a paper about this OP graph problem? Is it like the postman problem?

Cheers.

imkleats · December 13, 2019, 8:03pm

Sorry, OP as in the original post. I notice you’ve pulled in a forum uid above. It would still be helpful to see the generalized solution for closed pattern matching if possible. The point is that, by including a facet, the graph structure then requires matching that closed pattern. When it’s expressed as a node rather than a facet, the pattern being matched is no longer closed.

Edit (sorry for all the edits): I thought it might be helpful to describe this in terms that are more conventional in graph theory. The pattern described by the original post is a closed path that starts and ends on the same node, also referred to as a cycle (see my ASCII-art pattern above for a visual). GraphQL+/- provides a very intuitive way to interface with a connected, acyclic graph, also referred to as a tree. As such, I think it isn’t apparent how to deal with cyclic patterns, and it’s unclear whether the way UIDs are collected in var blocks even supports modeling cyclic data conveniently (at least, I’ve run into errors in my limited attempts to use a variable from within the block that defines it).

system · January 12, 2020, 8:03pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
