Recent Performance Improvements

rahst12 · June 12, 2025, 4:16pm

I’m tracking some promising-looking performance commits on Dgraph’s main branch. Could the team give an update on them and on where they’d be impactful for users?
Thanks!

harshil_goel · June 13, 2025, 8:50pm

Hey, thanks a lot for the enthusiasm! We’ve made three key changes, all of which are within the query engine. Here’s a summary:

1. Parallel Merge-Sorted Algorithm

At several stages during query execution, we need to merge multiple sorted UID lists (sorted by value). Previously, the merging logic wasn’t parallelized because we assumed that multiple queries would be running concurrently, which would naturally keep the available cores busy.

However, we discovered that our existing heap-based merging algorithm struggled when dealing with many small lists, especially when most of them contained just a single UID. By parallelizing the merge process, we saw a significant performance improvement, especially in high-fanout queries.

2. Caching UID Arrays in Ristretto

Our posting list cache used to store just enough information to reconstruct the “view” of the data whenever needed. This view (the actual list of UIDs visible to the query) had to be recomputed every time it was accessed — even if the underlying posting list hadn’t changed.

We improved this by caching the computed UID view itself in Ristretto. Now, once the UID list is computed, it’s stored and reused directly. This makes each cache hit far more valuable and reduces CPU usage on repeated access.

3. Sharded Map for Post-Query Speedup

DQL supports variable propagation, where different parts of a query can share data (e.g., values associated with UIDs).

Previously, the propagation data was stored in a single map per variable — mapping UID to value — and all computations (merges, aggregations) on this map were done sequentially.

We introduced a sharded map, allowing these operations to be parallelized across shards. This improves performance when working with large variable maps and speeds up complex DQL queries significantly.

We also have more work in the pipeline around ordering and limits. Currently, both are applied after the main query has finished. We plan to implement a join algorithm and integrate it into the query engine so that multiple roots, filters, and order/limit can be handled at the same time.

rahst12 · June 14, 2025, 4:27am

This all sounds great! What’s the plan/schedule for a formal release/preview tag?

harshil_goel · June 15, 2025, 12:18pm

@rahst12 We will soon release another v25 preview build, and hopefully the full v25 version too.
