Maybe even add a buffer to fetch some extra results internally whenever cascade appears anywhere in the query, so it does not fetch everything, but a little more than `first`, just in case.
The main problem, though, lies in the offset. The only real way to fix that is to use `after` instead of `offset` when using cascade, but `after` is not available in GraphQL.
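(For reference, this is what cursor-style pagination looks like in DQL, where `after` takes the uid of the last node of the previous page. A minimal sketch against the example data below:)

```dql
{
  # resume pagination after uid 0x2, i.e. start the next page at 0x3
  queryUser(func: type(User), first: 1, after: 0x2) {
    User.username
  }
}
```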
To illustrate why it is problematic to paginate with cascade without `after`…
Let’s assume a query fetches users that have a certain payment plan, and we need the query to start at the User root instead of refactoring for a better query, so we use cascade.
Schema
```graphql
type User {
  username: String! @id
  plan: Plan! @hasInverse(field: "usedBy")
}

type Plan {
  id: ID
  usedBy: [User]
}
```
Example Data
```rdf
<0x1> <dgraph.type> "User" .
<0x1> <User.username> "user1" .
<0x1> <User.plan> <0x5> .
<0x2> <dgraph.type> "User" .
<0x2> <User.username> "user2" .
<0x2> <User.plan> <0x6> .
<0x3> <dgraph.type> "User" .
<0x3> <User.username> "user3" .
<0x3> <User.plan> <0x5> .
<0x4> <dgraph.type> "User" .
<0x4> <User.username> "user4" .
<0x4> <User.plan> <0x6> .
<0x5> <dgraph.type> "Plan" .
<0x5> <Plan.usedBy> <0x1> .
<0x5> <Plan.usedBy> <0x3> .
<0x6> <dgraph.type> "Plan" .
<0x6> <Plan.usedBy> <0x2> .
<0x6> <Plan.usedBy> <0x4> .
```
Example Query with cascade
```graphql
{
  queryUser @cascade {
    username
    plan(filter: { id: ["0x6"] }) {
      id
    }
  }
}
```
Because cascade is a post-query, pre-pagination, pre-return process, this currently fetches all 4 users and then weeds out the two we don’t want.
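With the example data above, the response would contain only the two users on plan `0x6`, something like:

```json
{
  "data": {
    "queryUser": [
      { "username": "user2", "plan": { "id": "0x6" } },
      { "username": "user4", "plan": { "id": "0x6" } }
    ]
  }
}
```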
Now let’s apply pagination, getting the first 1.
Example Query with first:1
```graphql
{
  queryUser(first: 1) {
    username
    plan(filter: { id: ["0x6"] }) {
      id
    }
  }
}
```
This still fetches ALL 4 users, then cascades, and then returns the first 1.
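Assuming default uid ordering, the response would be just:

```json
{
  "data": {
    "queryUser": [
      { "username": "user2", "plan": { "id": "0x6" } }
    ]
  }
}
```

even though all 4 users were read to produce it.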
The logic works the same for `first: 1, offset: 1`: fetch ALL users, cascade down to the 2 matches, remove the offset slice, and limit to the first 1.
The performance problem stated above: what if there were millions or billions of users, and only a very small set survived the cascade post-processing?
Every query would then need to touch ALL users (even if there are billions of them), then apply the cascade, and then the pagination.
It was suggested to use a `has` filter to help reduce the enormous set at the root, but if every user has a plan, then a `has` filter has no effect in this scenario.
So a user requests the first user of the 10 that match the cascade, out of billions of users. On the Dgraph side, performance is the same whether pagination is applied or not.
This change was made in v21.03 to fix the problem where cascade was applied after pagination and incomplete results were being returned.
The suggestion above was to paginate recursively, fetching more until the requested page was filled. But one question must be answered: where do you logically start paginating from if you can only provide an offset?
If we paginate at all before the cascade, we may get incomplete results and need to fetch more before responding.
If we apply any offset in our first query, we risk not skipping enough, and not knowing how many of the nodes we skipped actually matched the cascade.
So let’s look at how this suggested process might work:
1. Query, paginating to get the first 1
2. Apply cascade
3. Do we have `limit` results? And are we at the end? If both no, go back to step 1 and fetch the next chunk
4. Return results to the client
But when we include an offset, it goes something like this (a code sketch of the loop follows the list):
1. Query, paginating (without `offset`) to get the first 1
2. Apply cascade
3. Do we have (`offset` + `limit`) results? And are we at the end? If both no, go back to step 1
4. Return results to the client
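To make the loop concrete, here is a minimal sketch of it in Python. `fetch_chunk` and `matches_cascade` are hypothetical stand-ins for "run the root query for the next chunk" and "does this node survive @cascade"; they are not Dgraph APIs.

```python
# Sketch of the suggested recursive-pagination loop, modeled client-side.
# fetch_chunk(offset, count) and matches_cascade(node) are hypothetical.

def paginate_with_cascade(fetch_chunk, matches_cascade,
                          first, offset=0, chunk_size=100):
    matched = []   # nodes that have survived the cascade so far
    scanned = 0    # how many root nodes we have pulled in total
    # keep fetching until we can satisfy offset + first, or run out of data
    while len(matched) < offset + first:
        chunk = fetch_chunk(scanned, chunk_size)
        if not chunk:              # we are at the end of the data
            break
        scanned += len(chunk)
        matched.extend(n for n in chunk if matches_cascade(n))
    # drop the offset slice, then limit to the requested page size
    return matched[offset:offset + first]


# Example: a million "users" where only every 100,000th one matches.
# With first=1, offset=5, the loop cannot stop until it has scanned
# past the sixth match, one small chunk at a time.
users = range(1_000_000)
page = paginate_with_cascade(
    fetch_chunk=lambda off, n: list(users[off:off + n]),
    matches_cascade=lambda u: u % 100_000 == 0,
    first=1, offset=5,
)
print(page)  # [500000], after thousands of chunked queries
```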
So the first method, without an offset, works well. But think again: if we have a billion nodes and we offset by 5, consider the case where the first 5 matches fall within the first few hundred nodes but the 6th match is the very last node.
- Limiting to 6 would still need to read all billion nodes, but now, instead of one scan over a billion, we are caught in a very long loop of small queries.
- Offsetting does nothing to shorten this loop; it actually makes it worse, since now you have to loop at least twice to get a complete result.
- If we wanted `first: 1, offset: 5`, knowing that the 6th matching node is the billionth node, we would be caught running up to one billion queries unless we buffered the pagination to fetch more than one node per loop.
- A higher offset acts as a multiplying factor on the number of loops needed to get a complete result set.
Solution: DO NOT USE CASCADE AND PAGINATION TOGETHER if you are experiencing performance issues. Instead, refactor the query for better performance. If you are cascading over multiple fields, it would be better to run the query in DQL (with var blocks) instead of GraphQL, but then you will also need to apply any auth rules yourself in those var blocks.
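For the example above, a DQL refactor might look like the following sketch (predicate names follow the example data; `0x6` is the plan we care about). It starts at the single Plan node and walks to its users, so pagination only ever touches users on that plan instead of every User in the graph:

```dql
{
  # start at the one Plan we care about instead of at every User
  var(func: uid(0x6)) @filter(type(Plan)) {
    users as Plan.usedBy
  }

  # paginate over only the users on that plan
  queryUser(func: uid(users), first: 1) {
    username : User.username
  }
}
```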