I have a query requirement. I wrote a query, and the result on a small dataset is correct.
However, when I run it on a large dataset, it takes more than 20s.
Why does this query run for so long? Is it because my query is not well written?
I need to know:
Given a Message, retrieve the (1-hop) Comments that reply to it.
In addition, return a boolean flag knows indicating whether the author of the reply knows the author of the original message. If the author of the reply is the same as the original author, return false for the knows flag.
The relationships look like this:
My code:
{
  # Find which Tags appear in the given month
  var(func: has(creationDate)) @filter((type(Comment) or type(Post)) and le(creationDate,"2011-09-30") and ge(creationDate, "2011-09-01")) {
    hasTag {
      tag_uid as uid
    }
  }
  # First, the count for the first month
  var(func: uid(tag_uid)) @filter(type(Tag)) {
    count_thismonth as count(~hasTag) @filter((type(Comment) or type(Post)) and le(creationDate,"2011-09-30") and ge(creationDate, "2011-09-01"))
  }
  # Next, the count for the second month
  var(func: uid(tag_uid)) @filter(type(Tag)) {
    count_nextmonth as count(~hasTag) @filter((type(Comment) or type(Post)) and le(creationDate,"2011-10-31") and ge(creationDate, "2011-10-01"))
  }
  # Compute the diff (absolute difference between the two counts)
  var(func: uid(count_thismonth)) {
    diff as math(max(count_thismonth-count_nextmonth,0)+max(count_nextmonth-count_thismonth,0))
  }
  # Return the combined result
  q(func: uid(diff), orderdesc: val(diff), orderasc: name) {
    name
    count_thismonth1: val(count_thismonth)
    count_nextmonth1: val(count_nextmonth)
    diff1: val(diff)
  }
}
How can I improve the efficiency of this query?
Any advice would be appreciated.
Avoid using the has() function at the root, especially when you have tons of data. You can use it in filters, though.
The best way to gain performance here is indexing, of any kind.
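As a rough sketch of what that could look like for this query, the schema line below adds an index on creationDate. It assumes creationDate is stored as a datetime; the day tokenizer is just one choice (year, month, day, and hour are the available datetime tokenizers), and which one works best depends on your data.

```
# Assumption: creationDate is a datetime predicate.
# The "day" tokenizer is an illustrative choice, not a recommendation from this thread.
creationDate: datetime @index(day) .
```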
Also, I personally would recommend that you segment your predicates by type, following a pattern like “namespacing”.
Take the predicate “name”, for example. You can share this very same predicate across several entities, but that isn’t great. So I recommend that you do something like:
user.name: string .
product.name: string .
animal.name: string .
object.name: string .
...
So on and so forth.
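To illustrate, namespaced predicates could be paired with type definitions and per-predicate indexes along these lines. The type names User and Product and the chosen index tokenizers are hypothetical; adapt them to your own schema.

```
# Hypothetical schema: each entity gets its own predicate and its own index.
user.name: string @index(term) .
product.name: string @index(exact) .

type User {
  user.name
}

type Product {
  product.name
}
```

With this layout, an index on user.name only covers users, so a lookup by user name does not have to touch product or animal nodes.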
You should write this block like this:
A0 as var(func: type(Post)) @filter(le(creationDate,"2011-09-30") AND ge(creationDate, "2011-09-01"))
A1 as var(func: type(Comment)) @filter(le(creationDate,"2011-09-30") AND ge(creationDate, "2011-09-01"))
var(func: uid(A0,A1)) {
  hasTag {
    tag_uid as uid
  }
}
That way you can have a better performance.
@MichelDiz
thanks!
In fact, I used “has” because I didn’t know I could use “type” directly, now I removed “has”.
But it still took a long time. Then it occurred to me that it might be an index problem, so I added an index to “creationDate”, and the result came back in 6 seconds (this is still too long, but it will do for now).
BTW, I did a small “upgrade” on the query. You don’t need has() at all. Since you are already using the same predicate in the filters, there’s no need to check that it exists.
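Putting the suggestions from this thread together, the full query might look roughly like the sketch below. It keeps the original counting and diff logic unchanged and only swaps the has() root for type() roots. It assumes creationDate is indexed and hasTag has a reverse edge (as the original query already requires), and it has not been run against the original dataset.

```
{
  # Messages created in September 2011, rooted at type() instead of has()
  A0 as var(func: type(Post)) @filter(ge(creationDate, "2011-09-01") AND le(creationDate, "2011-09-30"))
  A1 as var(func: type(Comment)) @filter(ge(creationDate, "2011-09-01") AND le(creationDate, "2011-09-30"))

  # Tags attached to those messages
  var(func: uid(A0, A1)) {
    hasTag {
      tag_uid as uid
    }
  }

  # Per-tag message count for September
  var(func: uid(tag_uid)) @filter(type(Tag)) {
    count_thismonth as count(~hasTag) @filter((type(Comment) OR type(Post)) AND ge(creationDate, "2011-09-01") AND le(creationDate, "2011-09-30"))
  }

  # Per-tag message count for October
  var(func: uid(tag_uid)) @filter(type(Tag)) {
    count_nextmonth as count(~hasTag) @filter((type(Comment) OR type(Post)) AND ge(creationDate, "2011-10-01") AND le(creationDate, "2011-10-31"))
  }

  # Absolute difference between the two counts
  var(func: uid(count_thismonth)) {
    diff as math(max(count_thismonth-count_nextmonth,0)+max(count_nextmonth-count_thismonth,0))
  }

  q(func: uid(diff), orderdesc: val(diff), orderasc: name) {
    name
    count_thismonth1: val(count_thismonth)
    count_nextmonth1: val(count_nextmonth)
    diff1: val(diff)
  }
}
```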