Why does Dgraph's shortest path return more hops than Neo4j for the same data, and how can I get Neo4j-like paths in Dgraph?

I’m working with both Dgraph and Neo4j to find the shortest path between two nodes in a graph. However, I’m seeing different results in terms of path length (number of hops) for what should be the same data and relationships.

Dgraph Query

{ 
  var(func: eq(nt, "People")) @filter(eq(title, "person1")) { 
    start as uid 
  } 

  var(func: eq(nt, "Initiative")) @filter(eq(title, "test")) { 
    end as uid 
  } 

  path as shortest(from: uid(start), to: uid(end)) { 
    CONNECTED 
    OWNER 
    ~CONNECTED 
    ~OWNER 
  } 

  path(func: uid(path)) { 
    uid 
    nt 
    title 
  } 
}

Dgraph Result (5 hops):

"path": [
  { "uid": "0x972107", "nt": "People", "title": "person1" },
  { "uid": "0x9d3b7f", "nt": "Entity", "title": "abc llc" },
  { "uid": "0xbecd37", "nt": "People", "title": "person23" },
  { "uid": "0xa97091", "nt": "Function", "title": "admistrative assistant" },
  { "uid": "0xb8b27b", "nt": "Entity", "title": "test entity" },
  { "uid": "0xc053ee", "nt": "Initiative", "title": "test" }
]

So, Dgraph returns a path with 5 hops.


Neo4j Query

MATCH p=shortestPath((x:People {title: "person1"})-[*1..4]-(y:Initiative {title: "test"})) 
WHERE x <> y 
RETURN 
  extract(x IN nodes(p) | x.title) as titles, 
  extract(i IN relationships(p)| type(i)) as edge_titles, 
  extract(j IN nodes(p) | j.nt) as node_types

Neo4j Result (3 hops):

titles = [ "person1", "rfp", "test entity", "test" ]
edge_titles = [ "CONNECTED", "CONNECTED", "CONNECTED" ]
node_types = [ "People", "Function", "Entity", "Initiative" ]

Neo4j returns a path with 3 hops.


My Questions

  • Why is Dgraph returning a longer path (more hops) than Neo4j for what should be the same data and relationships?
  • How can I get Dgraph to return the same (shortest) path as Neo4j?
  • Is there something about how Dgraph’s shortest function works, or about the way I’m specifying the edges, that causes this difference?
  • Are there any best practices for modeling or querying in Dgraph to ensure shortest path queries behave like Neo4j’s?

Additional Info

  • Both databases have the same data and relationships.
  • In Dgraph, I’m using both forward and reverse edges in the shortest block.
  • In Neo4j, I’m using an undirected variable-length path.

Any insights or suggestions would be appreciated!

Hey @janakmistry, I'm guessing it has something to do with the differences in edge types. Any chance I could get access to the data? Can you export it?

Here is the data I can share:

{
    set{
        _:node_5340 <dgraph.type> "People" .
        _:node_5340 <nt> "People" .
        _:node_5340 <title> "person1" .

        _:node_5120 <dgraph.type> "Entity" .
        _:node_5120 <nt> "Entity" .
        _:node_5120 <title> "abc llc" .

        _:node_5163 <dgraph.type> "Function" .
        _:node_5163 <nt> "Function" .
        _:node_5163 <title> "rfp" .

        _:node_5260 <dgraph.type> "Functional_Role" .
        _:node_5260 <nt> "Functional_Role" .
        _:node_5260 <title> "senior business development & client services operations associate" .

        _:node_5125 <dgraph.type> "Office_Location" .
        _:node_5125 <nt> "Office_Location" .
        _:node_5125 <title> "new york city" .

        _:node_5292 <dgraph.type> "People" .
        _:node_5292 <nt> "People" .
        _:node_5292 <title> "cai, caroline" .

        _:node_5152 <dgraph.type> "Function" .
        _:node_5152 <nt> "Function" .
        _:node_5152 <title> "admistrative assistant" .

        _:node_1013 <dgraph.type> "Entity" .
        _:node_1013 <nt> "Entity" .
        _:node_1013 <title> "test entity" .

        _:node_8192 <dgraph.type> "Initiative" .
        _:node_8192 <nt> "Initiative" .
        _:node_8192 <title> "test" .

        _:node_5340 <CONNECTED> _:node_5120 .
        _:node_5340 <CONNECTED> _:node_5163 .
        _:node_5340 <CONNECTED> _:node_5260 .
        _:node_5340 <CONNECTED> _:node_5125 .

        _:node_1013 <CONNECTED> _:node_5163 .

        _:node_5292 <CONNECTED> _:node_5120 .

        _:node_5152 <OWNER> _:node_5292 .

        _:node_1013 <CONNECTED> _:node_5152 .

        _:node_8192 <CONNECTED> _:node_1013 .

    }
}

That dataset is missing person1. Also, I'm assuming this is your schema:

<CONNECTED>: [uid] @reverse .
<OWNER>: uid @reverse .
<nt>: string @index(exact) .
<title>: string @index(exact) .
type <Entity> {
	title
	nt
	CONNECTED
	OWNER
}
type <Function> {
	title
	nt
	CONNECTED
	OWNER
}
type <Functional_Role> {
	title
	nt
	CONNECTED
	OWNER
}
type <Office_Location> {
	title
	nt
	CONNECTED
	OWNER
}
type <People> {
	title
	nt
	CONNECTED
	OWNER
}

Update on Schema and Query Execution

I have updated the person1 data as mentioned in the previous reply.

Regarding the schema, I have reviewed the configuration and can confirm that everything is set as you had assumed, except for the following changes which I have now applied:

<nt>: string @index(exact) .
<title>: string @index(exact) .

Previously, these fields were indexed using @index(term). I have now updated them to use @index(exact) as suggested.

After making the changes, I re-executed the query, but the results remain the same.

Given that data, I get the expected results…
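
As an independent cross-check of what the "expected" path is for the shared sample, here is a minimal Python sketch (not Dgraph or Neo4j code) that runs a plain breadth-first search over the edges from the mutation above. It assumes every edge can be walked in both directions, which is what listing CONNECTED, OWNER, ~CONNECTED and ~OWNER in the shortest block, or an undirected pattern in Cypher, is meant to allow. The names TITLES, EDGES, build_undirected and shortest_path are made up for this illustration.

```python
from collections import deque

# Node titles copied from the mutation shared in the thread.
TITLES = {
    "node_5340": "person1",
    "node_5120": "abc llc",
    "node_5163": "rfp",
    "node_5260": "senior business development & client services operations associate",
    "node_5125": "new york city",
    "node_5292": "cai, caroline",
    "node_5152": "admistrative assistant",
    "node_1013": "test entity",
    "node_8192": "test",
}

# (subject, predicate, object) triples copied from the same mutation.
EDGES = [
    ("node_5340", "CONNECTED", "node_5120"),
    ("node_5340", "CONNECTED", "node_5163"),
    ("node_5340", "CONNECTED", "node_5260"),
    ("node_5340", "CONNECTED", "node_5125"),
    ("node_1013", "CONNECTED", "node_5163"),
    ("node_5292", "CONNECTED", "node_5120"),
    ("node_5152", "OWNER", "node_5292"),
    ("node_1013", "CONNECTED", "node_5152"),
    ("node_8192", "CONNECTED", "node_1013"),
]


def build_undirected(edges):
    """Assumption: every edge may be traversed forward and in reverse."""
    adj = {}
    for subj, _pred, obj in edges:
        adj.setdefault(subj, []).append(obj)
        adj.setdefault(obj, []).append(subj)
    return adj


def shortest_path(adj, start, goal):
    """Unweighted BFS; returns the node list of one fewest-hop path, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


path = shortest_path(build_undirected(EDGES), "node_5340", "node_8192")
print([TITLES[n] for n in path])  # ['person1', 'rfp', 'test entity', 'test']
print(len(path) - 1, "hops")      # 3 hops
```

On this sample the fewest-hop undirected path is the 3-hop one Neo4j reported (person1, rfp, test entity, test). The 5-hop route through abc llc also exists in the data, so a result of that shape suggests the shorter route was not reachable for that particular query rather than that it is absent from the graph.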