V21.03: After pagination+cascade change, queries are too slow to finish

@anand - Our queries are built from user input. These are just example queries in which pagination keeps the query bearable. I am converting a user-supplied Cypher MATCH pattern into a Dgraph query, so I can't optimize this one case just because I happen to know the size of each level here. I need to handle whatever the customer wants to query. This one represents (a:Device)-[:has_object]->(b:Object)-[:has_indicator]->(c:Indicator), but it could be any path the user wants to access.
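To illustrate the kind of translation involved, here is a minimal Python sketch that renders a linear Cypher path as one nested DQL block with `@cascade` and an optional root-level `first:`. It is hypothetical, not our actual converter: `Hop` and `path_to_dql` are illustrative names, it assumes node labels are stored in the `qa.type` predicate (which would need an index for `eq`), and it skips any intermediate nodes behind virtual edges.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hop:
    """One -[:edge]->(:Label) step of a linear Cypher MATCH path (hypothetical)."""
    edge: str   # DQL predicate for the relationship, e.g. "qa.has_object"
    label: str  # node label, matched against an assumed qa.type value


def path_to_dql(root_label: str, hops: list[Hop], first: int | None = None) -> str:
    """Render a linear Cypher path as a single nested DQL block with @cascade."""
    root_args = f'func: eq(qa.type, "{root_label}")'
    if first is not None:
        root_args += f", first: {first}"  # pagination is applied at the root only
    out = ["{", f"  q({root_args}) @cascade {{", "    uid"]
    depth = 2  # indentation level (in 2-space units) of the current block body
    for hop in hops:
        pad = "  " * depth
        out.append(f'{pad}{hop.edge} @filter(eq(qa.type, "{hop.label}")) {{')
        out.append(f"{pad}  uid")
        depth += 1
    # Close every hop block, then the root block, then the outer query braces.
    for level in range(depth - 1, -1, -1):
        out.append("  " * level + "}")
    return "\n".join(out)


print(path_to_dql(
    "Device",
    [Hop("qa.has_object", "Object"), Hop("qa.has_indicator", "Indicator")],
    first=10,
))
```

For the Device/Object/Indicator path above this produces a three-level block whose root is paginated with `first: 10` and pruned by `@cascade`; a longer user pattern just adds more `Hop` entries, which is why the query shape cannot be tuned for one known case.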

Here are the debug metrics for a query that uses variables at every nested level versus one that uses only a root variable. (Note: you will also see some predicates ending in .next that I cut out of previous examples; they represent our virtual edges, which go through an intermediate node.)

With nested variables (1.38 s, 6499 uids):

  "extensions": {
    "server_latency": {
      "parsing_ns": 380350,
      "processing_ns": 1380744741,
      "encoding_ns": 239606,
      "total_ns": 1381527330
    },
    "txn": {
      "start_ts": 2920081
    },
    "metrics": {
      "num_uids": {
        "": 6183,
        "_total": 6499,
        "qa.has_indicator": 8,
        "qa.has_indicator.next": 16,
        "qa.has_object": 4,
        "qa.has_object.next": 8,
        "qa.has_timerange": 26,
        "qa.name": 14,
        "qa.timerange_end": 101,
        "qa.timerange_start": 101,
        "qa.type": 12,
        "uid": 26
      }
    }
  }

With one root variable (29.25 s, 2896418 uids):

  "extensions": {
    "server_latency": {
      "parsing_ns": 399144,
      "processing_ns": 29251738486,
      "encoding_ns": 502454,
      "total_ns": 29252845323
    },
    "txn": {
      "start_ts": 2967173
    },
    "metrics": {
      "num_uids": {
        "": 365468,
        "_total": 2896418,
        "qa.has_indicator": 6000,
        "qa.has_indicator.next": 156000,
        "qa.has_object": 2,
        "qa.has_object.next": 6000,
        "qa.has_timerange": 324002,
        "qa.name": 162002,
        "qa.timerange_end": 695471,
        "qa.timerange_start": 695471,
        "qa.type": 162000,
        "uid": 324002
      }
    }
  }
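To make the gap between the two runs concrete, here is a small Python sketch that divides each counter in the root-variable run by the same counter in the nested-variable run. The numbers are copied verbatim from the two extensions blocks above (total_ns is the server latency, every other key is a num_uids counter); the helper name `blowup` is only for illustration.

```python
from __future__ import annotations

# Figures copied from the two "extensions" blocks above.
NESTED_VARS = {
    "total_ns": 1_381_527_330, "": 6_183, "_total": 6_499,
    "qa.has_indicator": 8, "qa.has_indicator.next": 16,
    "qa.has_object": 4, "qa.has_object.next": 8,
    "qa.has_timerange": 26, "qa.name": 14,
    "qa.timerange_end": 101, "qa.timerange_start": 101,
    "qa.type": 12, "uid": 26,
}
ROOT_VAR_ONLY = {
    "total_ns": 29_252_845_323, "": 365_468, "_total": 2_896_418,
    "qa.has_indicator": 6_000, "qa.has_indicator.next": 156_000,
    "qa.has_object": 2, "qa.has_object.next": 6_000,
    "qa.has_timerange": 324_002, "qa.name": 162_002,
    "qa.timerange_end": 695_471, "qa.timerange_start": 695_471,
    "qa.type": 162_000, "uid": 324_002,
}


def blowup(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Return after/before for every counter present in both runs."""
    return {k: after[k] / before[k] for k in before if k in after}


ratios = blowup(NESTED_VARS, ROOT_VAR_ONLY)
# Print the counters that grew the most first.
for key, factor in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    label = key if key else '""'  # the unnamed root counter has an empty key
    print(f"{label:>22}: {factor:>10,.1f}x")
```

By these figures the latency grows about 21x and the total UID count about 446x, while several per-predicate counters (qa.type, qa.has_timerange, qa.name, uid) grow by more than 10,000x.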

I would have run one with our actual current query, which takes ~1 s in v20.11, but in v21.03 it runs for over 5 minutes and then gets OOM-killed on our Alpha with 25 GB of RAM.
