Hi,
I’m trying to build a multilingual dictionary. For now, it is made up of Dictionaries, Words and Meanings.
An entry in an English-German dictionary for the word “apple” would look like this:
{
  "content": "apple",
  "meaning": [
    {
      "translation": [
        {
          "content": "Apfel"
        }
      ]
    }
  ]
}
The entry for “apple” in the English-Hungarian dictionary would look like this:
{
  "content": "apple",
  "meaning": [
    {
      "translation": [
        {
          "content": "alma"
        }
      ]
    }
  ]
}
Now, my idea is that every Word and Meaning may belong to multiple dictionaries. Here, “apple” belongs to both my eng-ger and eng-hun dictionaries, “Apfel” belongs only to eng-ger, “alma” belongs only to eng-hun, and the Meaning node belongs to both dictionaries.
This is my schema:
<content>: string @index(fulltext, term, trigram) .
<dgraph.graphql.schema>: string .
<dgraph.graphql.xid>: string @index(exact) @upsert .
<dict>: [uid] @reverse .
<indexWord>: [uid] .
<language>: string .
<meaning>: [uid] @reverse .
<name>: string .
<pos>: string .
<shortName>: string @index(exact) .
<translation>: [uid] @reverse .
This is my dataset:
{
  "set": [
    {
      "dgraph.type": "Dictionary",
      "uid": "_:eng-hun-dict",
      "name": "English-Hungarian dictionary",
      "shortName": "eng-hun-dict",
      "indexWord": [
        {
          "dgraph.type": "Word",
          "uid": "_:apple-word",
          "dict": [
            {
              "uid": "_:eng-hun-dict"
            }
          ],
          "language": "eng",
          "pos": "noun",
          "content": "apple",
          "meaning": [
            {
              "dgraph.type": "Meaning",
              "uid": "_:apple-meaning",
              "dict": [
                {
                  "uid": "_:eng-hun-dict"
                }
              ],
              "translation": [
                {
                  "dgraph.type": "Word",
                  "uid": "_:alma-word",
                  "dict": [
                    {
                      "uid": "_:eng-hun-dict"
                    }
                  ],
                  "language": "hun",
                  "pos": "noun",
                  "content": "alma"
                }
              ]
            }
          ]
        }
      ]
    },
    {
      "dgraph.type": "Dictionary",
      "uid": "_:eng-ger-dict",
      "name": "English-German dictionary",
      "shortName": "eng-ger-dict",
      "indexWord": [
        {
          "dgraph.type": "Word",
          "uid": "_:apple-word",
          "dict": [
            {
              "uid": "_:eng-ger-dict"
            }
          ],
          "language": "eng",
          "pos": "noun",
          "content": "apple",
          "meaning": [
            {
              "dgraph.type": "Meaning",
              "uid": "_:apple-meaning",
              "dict": [
                {
                  "uid": "_:eng-ger-dict"
                }
              ],
              "translation": [
                {
                  "dgraph.type": "Word",
                  "uid": "_:apfel-word",
                  "dict": [
                    {
                      "uid": "_:eng-ger-dict"
                    }
                  ],
                  "language": "ger",
                  "pos": "noun",
                  "content": "Apfel"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
Now, when I search for a word and its translation in a specific dictionary, I want to generate these dictionary entries by traversing the graph, so that the result includes only those nodes that belong to the currently searched dictionary.
I use the following query. The parameter $dict can be either “eng-ger-dict” or “eng-hun-dict”. Based on this it generates either the dictionary entry with the translation “Apfel” or “alma”:
query dentry($dict: string) {
  var(func: eq(shortName, $dict)) {
    ~dict @filter(eq(dgraph.type, "Meaning")) {
      MEANING_UID as uid
    }
  }
  var(func: eq(shortName, $dict)) {
    ~dict @filter(eq(dgraph.type, "Word")) {
      WORD_UID as uid
    }
  }
  dentry(func: allofterms(content, "apple")) @filter(uid(WORD_UID)) {
    content
    meaning @filter(uid(MEANING_UID)) {
      translation @filter(uid(WORD_UID)) {
        content
      }
    }
  }
}
And now my actual question(s): how scalable is this solution? Once my dictionary is fully created, it will have millions of nodes belonging to the same dictionary. If I understand correctly how vars work, MEANING_UID and WORD_UID will be calculated before my “dentry” block runs, so each of them could contain millions of uids, and those uid sets will then be evaluated in the @filter expressions of “dentry()”.
How will this affect query performance? Can Dgraph handle it? How much memory will this use?
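To make my mental model concrete, here is a small Python sketch of how I assume the query is evaluated. It is only a toy model, not Dgraph's actual implementation: the numeric uids, the in-memory maps and the function name `dentry` are all made up for illustration, and real Dgraph may evaluate var blocks and filters quite differently.

```python
# Toy model of the query above (hypothetical data, not Dgraph internals).
# Each dictionary maps to the full set of Word / Meaning uids reachable via ~dict.
dict_members = {
    "eng-hun-dict": {"Word": {1, 3}, "Meaning": {2}},
    "eng-ger-dict": {"Word": {1, 4}, "Meaning": {2}},
}
content = {1: "apple", 3: "alma", 4: "Apfel"}   # uid -> content
meaning = {1: [2]}                              # Word uid -> Meaning uids
translation = {2: [3, 4]}                       # Meaning uid -> Word uids


def dentry(dict_name, terms):
    # "var" blocks: the complete uid sets are materialised first.
    # In a real dictionary these sets could hold millions of uids.
    word_uid = dict_members[dict_name]["Word"]
    meaning_uid = dict_members[dict_name]["Meaning"]

    wanted = terms.lower().split()
    results = []
    for uid, text in content.items():
        # Stand-in for allofterms(content, ...): every term must be present.
        if not all(t in text.lower().split() for t in wanted):
            continue
        # @filter(uid(WORD_UID)) on the root block
        if uid not in word_uid:
            continue
        entry = {"content": text, "meaning": []}
        for m in meaning.get(uid, []):
            # @filter(uid(MEANING_UID))
            if m not in meaning_uid:
                continue
            # @filter(uid(WORD_UID)) on the translation edge
            translations = [
                {"content": content[t]}
                for t in translation.get(m, [])
                if t in word_uid
            ]
            entry["meaning"].append({"translation": translations})
        results.append(entry)
    return results


print(dentry("eng-ger-dict", "apple"))
print(dentry("eng-hun-dict", "apple"))
```

In this model the membership checks themselves are cheap, but both uid sets have to be built in full before the main block runs, even though the search for “apple” only touches a handful of nodes. That up-front cost is the part I am worried about.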
(Note: I know that for this toy database I could use uid_in(dict, 0xsome_dict_uid) like this, and it would be more efficient:
query dentry($dict: int) {
  dentry(func: allofterms(content, "apple")) {
    content
    meaning @filter(uid_in(dict, $dict)) {
      translation @filter(uid_in(dict, $dict)) {
        content
      }
    }
  }
}
But for now I am trying to understand how vars work and how much I could/should use them in more advanced queries.)
Thanks!