Date Comparisons - Speed differences with data at scale

Hello,
We have a use-case where comparing dates require precision down to the second. We created a a predicate date as a type datetime with an index. As we build-up the query we need, it gets progressively slower. We’re pretty certain it has to do with the DateTime Hourly index we have on the field.

Schema:

<date>: datetime @index(hour) @upsert .

Query 1: Time: 80476 ns (0.08ms)

node_0 (func:uid("0x27231820"))

Query 2: Time: 184123 ns (0.18ms)

node_0 as var (func:uid("0x27231820"))
nodeRelDate_0 (func: uid(node_0)) @filter(has(date)) {uid}

Query 3: Time: 4697328642 ns (4697.32ms // 4.6 seconds)

node_0 as var (func:uid("0x27231820"))
nodeRelDate_0 as var (func: uid(node_0)) @filter(has(date))
nodeRelDateFiltered_0  (func: uid(nodeRelDate_0)) @filter(le(date, "2025-30-13T20:46:03Z")) {uid
date}

Query 4: Time: 108032601ns (108.03ms // 0.1 seconds)

node_0 as var (func:uid("0x27231820"))
nodeRelDate_0 as var (func: uid(node_0)) @filter(has(date))
nodeRelDateFiltered_0  (func: uid(nodeRelDate_0)) @filter(not gt(date, "2025-30-13T20:46:03Z")) {uid
date}

Questions:

  1. Generically, any ideas why we’re seeing the variation in query speed between query 2, 3, and 4, especially when searching using le vs not gt in the index?

We thought perhaps:
If there is an index on a datetime field, perhaps that index sorted ascending/descending which would cause filtering on dates at one side of the index faster than the other side? For example, if I have data from 2000 - 2025, filtering on 2025-03-01 would be much faster as a not gt because it starts at the top of the data vs a le may start at the bottom of the data (2000) and have to traverse up the index.

  1. Does @filter(not gt(date, "2025-30-13T20:46:03Z")) even hit the index? We thought it would be looking at the direct value of date (being that it’s a direct uid lookup) and the string literal passed in. Then marshalling the string literal value into a ‘go’ date, comparing the two, and returning back. Are we on the right track thinking this is instead hitting an index slowing it down?

  2. Does using an operator like this @filter(not gt(date, "2025-30-13T20:46:03Z")) require an index to be set on the date field? In our trials it does not. Just looking for confirmation here.

  3. Can we safely drop an index off a datetime field?

  4. Would it be faster to be comparing dates as a int type with an epoch?

For research/reference, I’ve read some of the background of dates here: Datetime Indexes in Dgraph - Dgraph Blog

Thanks,
Ryan

We just swapped from Query 3 (using le) to Query 4 (using not gt) and saw an immediate 10x (11.6x to be specific) improvement. Waiting on longer metrics to confirm it remains a 10x improvement.

I’m still curious if dropping the index entirely would speed this up.

Also, I’m curious if precision date equality/math is required, if the better/best approach approach is using an epoch / int field if speed is the optimizing factor. And if so, how much faster is it.

IMO, this is the way. I don’t have specifics from past implementations, but I do know this approach was a no brainer for a large graph from which queries were heavily “time-based”.

Yeah so the datetime search functionality has some issues. If you create an hour index, we will create one key for each hour. Now for each query, we would first query what indexes are there. So when you do le() with a relatively new time, it would end up fetching a lot more indexes than ge(). Which is why you see the improvement.
Your point regarding that your query shouldn’t hit the index is valid, and we technically don’t. But the way code is written, we first calculate how many indexes are there to compare against how many uids are there. I fixed this earlier for eq filters, where I came up with storing the number in memory. I am yet to fix it for le, gt.
You can safely drop the index (if anything goes wrong it can be easily rebuilt), but I am not sure if your filter query would work. It should though.
Would it be possible for you to try out this PR: perf(core): improve performance for ineq filters when uidlist is small by harshil-goel · Pull Request #9383 · hypermodeinc/dgraph · GitHub, and see if that helps?

We tested this, it’s true that le, not gt, eq all work without the index.

The easiest way for us to test a PR is to have an alpha/beta/rc docker image published, like 24.1.3-alpha1. Is that possible?

While we await am answer wrt a RC Docker image for testing, you could build an image locally too (and rather easily):

  1. git clone the Dgraph repo locally.
  2. cd dgraph
  3. Create a new branch from the target tag: git checkout -b <branch-name> tags/<your-tag-of-choice>
  4. Run make image to build a Docker image. The new image would be tagged as dgraph-local.
  5. Run docker image to list the new image and start a cluster using the same.

HTH!

I saw PR #9383 has been merged into main with updates to improve performance for le and gt. Unfortunately our IT organizational policies do not make it easy for us to build and deploy our own docker images, but if a beta/preview docker image is built and published we are able to request that artifact be pulled into our environment for testing.

We’re definitely looking forward to this one based on the performance improvement we already saw just flipping from le to not gt.

@rahst12 We should have a preview build soon. I will let you know once it’s here.

1 Like

@rahst12 We have enabled nightly builds, and you can try one using this:
docker pull Package dgraph-standalone · GitHub.
Let me know if this works for you.