Conceptually I have two node types: programs (movies, shows, episodes) and timeslots (the time and channel on which a program airs).
I need to return a list of programs, each including the timeslot of its first airing, and also allow the entire result set to be sorted using data in the returned timeslot.
This correctly gives me a list of programs, and only the earliest airing of the program. Now what I need to do is determine a way to sort the result set by the gmt_date_time.
Am I going about it the right way? I’m not sure if there is an implicit association between the vars pids and gdt. For example, when using the var function, multiple (pid, gmt) tuples are returned. Does the following query recognize which pid is matched on and implicitly use the correct gdt?
I also thought maybe there was a way to expose the data as a variable inside the query and sort on it.
The above query seems to function as expected, but performance is poor: 70 seconds with ~1 million timeslots and 50k programs. How can I optimize it to perform better? Both Elasticsearch and Neo4j can handle this query in a few seconds.
Maybe two things: don’t use sort in the var block, and add an indexed predicate like “has_program_id: true”. You’ll get better perf because it’s indexed, so it touches a smaller set than the has function. A bool index is faster.
I just ran it with twice the data in the hope that performance wouldn’t jump too much, but it basically doubled to 14 seconds. I appreciate the help, but I don’t think Dgraph is the ideal system for this type of query. I’ll be investigating other technologies since it doesn’t seem to be great for my use case.
I’m looking for a replacement for Elasticsearch, due to its inability to handle many-to-many joins. This particular query is modeled as a parent-child relationship in ES, and many users expect similar functionality in any new system. It runs against the entire dataset in about 5 seconds. The Neo4j PoC I created ran the much smaller dataset I’m using for Dgraph in 5 seconds. I’ve been running everything in the Ratel UI. I can certainly work around the limitations by querying more sorted timeslots in the outer branch and allowing duplicate programs in the result set, then doing additional data manipulation on the client side, but I can also do that with other DB systems. To answer your question, I do need the availability of millions of nodes, more like a billion in total, and the ability to join on them. If I don’t have the right query, I would appreciate more input.
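For reference, the client-side workaround I mean would look roughly like the sketch below. It assumes the timeslots come back already sorted by gmt_date_time and may repeat programs; the field names program_id and channel and the sample rows are made up for illustration, not my real schema.

```python
from typing import Dict, Iterable, Iterator


def first_airings(sorted_timeslots: Iterable[Dict], limit: int) -> Iterator[Dict]:
    """Yield the first timeslot seen for each program, up to `limit` programs.

    Assumes the input is already sorted by gmt_date_time, so the first
    occurrence of a program is also its earliest airing.
    """
    if limit <= 0:
        return
    seen = set()
    for slot in sorted_timeslots:
        pid = slot["program_id"]  # hypothetical field name
        if pid in seen:
            continue  # a later airing of a program we already emitted
        seen.add(pid)
        yield slot
        if len(seen) >= limit:
            return


# Made-up sample rows, sorted by gmt_date_time, with duplicate programs.
rows = [
    {"program_id": "p2", "channel": "A", "gmt_date_time": "2019-01-01T18:00:00Z"},
    {"program_id": "p1", "channel": "B", "gmt_date_time": "2019-01-01T19:00:00Z"},
    {"program_id": "p2", "channel": "C", "gmt_date_time": "2019-01-01T20:00:00Z"},
    {"program_id": "p3", "channel": "A", "gmt_date_time": "2019-01-02T09:00:00Z"},
]

page = list(first_airings(rows, limit=2))
```

The catch is that I can’t know in advance how many timeslots to over-fetch to fill a page of distinct programs, which is why I’d rather have the database do it.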
A few weeks ago I heard some users compare Dgraph against Elasticsearch, and the comparisons were favorable. As I don’t use Elasticsearch, I can’t comment on it myself. I’ve also heard that Vespa (from the Yahoo guys) is better than Elasticsearch.
As Elasticsearch and Dgraph have different goals, I do not know if it is fair to compare them directly. But the users seemed to me very satisfied with Dgraph.
I did not understand this sentence. Did you push Dgraph harder than Neo4j and expect Dgraph to do better in that scenario? Or is the sentence truncated?
Ratel UI isn’t meant to handle that amount of data. It’s just for exploring, testing, and planning. You should use the clients for this, though.
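Something like the following is what I mean by using a client. It is only a sketch with the Python client (pydgraph): the address localhost:9080 is the usual default gRPC port and may differ in your setup, the helper names are mine, and the query string is whatever you are currently running in Ratel.

```python
import json
import time


def make_client(addr="localhost:9080"):
    """Build a pydgraph client; the address is an assumption about your setup."""
    import pydgraph  # imported lazily so the timing helper works without it

    stub = pydgraph.DgraphClientStub(addr)
    return pydgraph.DgraphClient(stub)


def timed_query(client, query):
    """Run a read-only query and return (parsed JSON result, seconds elapsed)."""
    txn = client.txn(read_only=True)
    try:
        start = time.perf_counter()
        res = txn.query(query)
        elapsed = time.perf_counter() - start
    finally:
        txn.discard()  # always release the transaction
    return json.loads(res.json), elapsed
```

Usage would be `data, seconds = timed_query(make_client(), your_query)`. That way the timing covers the query round trip only, not rendering a huge result in the browser.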
A short time ago a user started doing Dgraph benchmarks. See what you think.