How to efficiently find shortest and longest paths between node types in Dgraph (similar to APOC in Neo4j)?

I’m trying to find the shortest and longest path between two node types across the entire graph in Dgraph, similar to how it’s done using APOC in Neo4j.

In Neo4j, I can use a single query like this:

MATCH (start: `ntype1`)
CALL apoc.path.expandConfig(start, {
    labelFilter: '>ntype2',
    minLevel: 1,
    uniqueness: 'NODE_GLOBAL'
})
YIELD path
RETURN min(length(path)) as min, max(length(path)) as max

This efficiently returns the shortest and longest path lengths between any nodes of type ntype1 to ntype2 in one go.


What I’m doing in Dgraph:

Since Dgraph doesn’t support such a direct query, I’m first fetching all UIDs of source and target node types, and then looping over combinations to run multiple shortest queries:

combined_query = f"""
{{
  sources(func: type({ntype1})) {{ uid  RELATED_TO{{uid}} ~RELATED_TO{{uid}} }}
  targets(func: type({ntype2})) {{ uid  RELATED_TO{{uid}} ~RELATED_TO{{uid}} }}
}}
"""
result = dgraphManager.query(combined_query)

source_uids = [x['uid'] for x in result['sources'] if 'RELATED_TO' in x or '~RELATED_TO' in x]
target_uids = [x['uid'] for x in result['targets'] if 'RELATED_TO' in x or '~RELATED_TO' in x]

uid_list = [{"from": s, "to": t} for s in source_uids for t in target_uids]

query_parts = []
result_parts = []
for i, item in enumerate(uid_list, 1):
    query_parts.append(f"""
        var(func: uid({item['from']})) {{
           start{i} as uid
        }}
        var(func: uid({item['to']})) {{
           end{i} as uid
        }}
        path{i} as shortest(from: uid(start{i}), to: uid(end{i})) {{ RELATED_TO ~RELATED_TO }}
    """)
    result_parts.append(f"result{i}(func: uid(path{i})) {{ uid }}")

final_query = "{\n" + "\n".join(query_parts) + "\n" + "\n".join(result_parts) + "\n}"
data = dgraphManager.query(final_query)

path_lengths = [len(val) for key, val in data.items() if val]

Problem:

This approach works, but it’s very inefficient when handling numerous nodes. For example, if ntype1 has 100 nodes and ntype2 has 50 nodes, it results in executing 5,000 queries — just to determine path lengths between two types. Sometimes, it causes errors related to query timeout.


Question:

Is there a more efficient way in Dgraph to compute:

  • The shortest path between any node of type A to any node of type B
  • The longest such path

Ideally similar to how APOC works in Neo4j, with just one query.

Any insights or optimizations would be highly appreciated!

As you’re acutely aware, the shortest function only operates on uids…

Since you’re using Python, you may want to check out the extract_dict function in pydgraph. It was built specifically for converting DQL query results to a dict representing nodes and a list representing edges. Function: link. Test: link

Once you have those two artifacts, you can use them in networkx or any other graph algo package you like to accomplish your objectives.

Here’s a notebook that uses a locally defined extract_dict (it was later moved to pydgraph) and moves the results into networkx and then performs a number of graph analyses on the data.

Thank you for your response. I appreciate the clarity you’ve provided. I’m interested in understanding how the extract_dict function, which recursively extracts nodes and edges from a dictionary generated by a Dgraph query, can be utilized to determine both the shortest and longest paths between two nodes. Your insights on this would be very valuable.

Sorry for the delay. I extended a notebook in our dgraph-experimental repo that explores this in detail. Here’s the link to the notebook. If you have Docker and Jupyter Notebooks installed, you can run this locally. If not, I’ve included the output of the notebook in the commit, so you can see the output just by loading the file in your browser.

As I stated earlier, the sole native path function in Dgraph (shortest) relies on knowing the UIDs from two separate nodes. So natively running queries across paths in the entire graph is not possible. However, using pydgraph and networkx you can perform all sorts of graph analysis, not just path calculations. But since the topic is about shortest and longest shortest, here’s a brief overview. I definitely would recommend looking at the notebook linked above tho, as this is just a quick summary… Note that this notebook uses our stock Friend of a Friend graph, in which there are Person nodes, connected to other Person nodes via a Person.friends predicate.

  1. Load nodes and edges from Dgraph, convert and import to networkx
import json, pydgraph, networkx as nx

# Query Dgraph for Person nodes and friendships
client = pydgraph.DgraphClient(pydgraph.DgraphClientStub('localhost:9080'))
query = """
{
  person(func: type(Person)) @recurse {
    id: uid
    name: Person.name
    friends: Person.friends
  }
}
"""
txn = client.txn(read_only=True)
res = txn.query(query=query)

nodes = {}
edges = []
# Here we use the pydgraph `convert function to extract nodes and edges from the DQL result
convert.extract_dict(nodes=nodes, edges=edges, data=json.loads(res.json))

nodes_df = pd.DataFrame.from_dict(nodes, orient = 'index')
edges_df = pd.DataFrame(edges)

G = nx.from_pandas_edgelist(
    edges_df,
    source="src",
    target="dst",
    create_using=nx.MultiDiGraph()
)
  1. Find longest shortest path

Since friendship graphs are cyclic by nature, we can’t use nx.dag_longest_path(), which is only for acyclic graphs. Instead, we use the graph diameter - the longest shortest path between any two nodes, representing maximum degrees of separation:

# Find longest shortest path (diameter cmp approach)
longest_path = []
max_length = 0

for source, targets in nx.shortest_path(G).items():
    for target, path in targets.items():
        if len(path) > max_length:
            max_length = len(path)
            longest_path = path

# Display with readable names
path_names = [node_id_to_name.get(node_id, node_id) for node_id in longest_path]
print(f"Diameter: {max_length-1} edges")
print(f"Path: {' -> '.join(path_names)}")
1 Like