Get positions of matching terms in query

I am using dgraph to search documents and would like to highlight the matching terms in the results.

Is it possible to get the positions of the words that satisfied the all of terms / any of terms / any of text queries?

No. I recommend using multiple blocks, one for each case.

Hi Michael, I have done that.

But I would like to highlight the search term in each block. The trouble is that full-text search does stemming, so "running" would match "run", and I can't just search the result for the literal query word.

Could this be added as a feature request?

Thanks

Dean

Highlight how?

We use Bleve. If Bleve can do it, we could potentially do it, whatever it is.

Hi Michael, yep, it does: see "Highlight Matches in Results" in the Bleve docs. I was looking at using that with docs storing the UIDs of nodes, but if it's built in and uses the same library, that would be much better.

Could be as simple as providing the offsets of the matching words rather than actually providing formatted output.

Interesting, thanks for checking. So it would generate an HTML response? With CSS? I’m not following how we would do this. In Ratel? If implemented, could it work anywhere? If it outputs HTML, nice. Otherwise it might be too complex in our case.

Anyway. You can open a ticket for a feature request. But don’t expect speed in the process. We have few engineers.

In the results from Bleve (even if you don’t get the marked-up result) there is an offset and length for each thing that matched the query.

Maybe there could be a new property you add to the query to get an array of offsets and lengths returned from Bleve?
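To illustrate why raw offsets and lengths would be enough, here is a minimal Go sketch of the client side. It assumes a hypothetical `Match` shape (byte offset plus byte length); Dgraph does not return anything like this today, and the names `Match` and `Highlight` are made up for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Match is a HYPOTHETICAL shape for one hit: a byte offset and a byte
// length inside the stored string. It is not part of any Dgraph response.
type Match struct {
	Offset int
	Length int
}

// Highlight wraps every matched span of text in openTag/closeTag.
// Matches that are empty, out of range, or overlap an earlier match are skipped.
func Highlight(text string, matches []Match, openTag, closeTag string) string {
	// Work on a sorted copy so the caller's slice is left untouched.
	sorted := append([]Match(nil), matches...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Offset < sorted[j].Offset })

	var b strings.Builder
	pos := 0 // end of the last span already written
	for _, m := range sorted {
		end := m.Offset + m.Length
		if m.Length <= 0 || m.Offset < pos || end > len(text) {
			continue
		}
		b.WriteString(text[pos:m.Offset])
		b.WriteString(openTag)
		b.WriteString(text[m.Offset:end])
		b.WriteString(closeTag)
		pos = end
	}
	b.WriteString(text[pos:])
	return b.String()
}

func main() {
	text := "She was running while he runs"
	matches := []Match{{Offset: 8, Length: 7}, {Offset: 25, Length: 4}}
	fmt.Println(Highlight(text, matches, "<em>", "</em>"))
}
```

With offsets like these, the caller decides on the markup (HTML, terminal colours, anything else), so the server would not need to produce formatted output.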

Do you accept public pull requests? I am not sure I can do it, but I do a bit of Go development, so I could have a stab at it.

That would be nice!

Hi Michael,

I can't find where Bleve is used to do the search. I can find where it is used to stem the words for the search, but I can't find anything that uses a Bleve query for searching. Does Dgraph actually use Bleve for the search function?

Well, as I am not a core developer, I don’t have the full context of the code. What I can offer you is a theoretical perspective, quite surface-level. What I do know is that soon after the query is parsed, it proceeds to “ProcessTask” and “ProcessTaskOverNetwork”.

From that point forward, my understanding is mostly conjecture. I know we do some kind of proprietary indexing. I’m not sure how Bleve is involved, but we do use the Bleve package, as shown in the list below.

dgraph/tok/bleve.go
dgraph/tok/stemmers.go
dgraph/tok/stopwords.go

All indexing involves the creation of a token in the Tablet. ProcessTask triggers this matrix with indexing data and UIDs. I suspect that if we use Bleve at this level, its application is not trivial. Dgraph’s code is a bit complex and takes time to understand. The processes are somewhat atomized, and several tasks run concurrently. It needs more digging to see what it does in terms of query matching.

If you are using VS Code, you can use dgraph/launch.json (on the main branch of dgraph-io/dgraph on GitHub) and put some breakpoints in dgraph/worker/task.go to investigate it.

PS: A possible explanation could be as follows: We utilize Bleve to create the indexing table, a process we refer to as “Tokenization”. So the model is based on Bleve. However, during the query process, we do not employ Bleve. This is merely a theory.

Thanks for taking the time to dig into it, that's pretty much what I worked out from the code.

I think it's the Bleve index that actually holds the position of the words in the document, as well as the document they are in. So I am not sure it's going to be possible to achieve what I want. I could use the same tokenizer as you, though, to re-create the terms you would have indexed and match the words in the text against the stemmed search query in my app.
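A rough Go sketch of that client-side workaround, under stated assumptions: `toyStem` is a deliberately naive suffix stripper standing in for whatever stemmer the index actually uses (it is NOT Dgraph's or Bleve's stemmer), and `Span` and `stemMatches` are hypothetical names. The idea is only that a word in the text matches when its stem equals the stem of a query word.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Span is a hypothetical result type: byte offset and byte length of a hit.
type Span struct {
	Offset int
	Length int
}

// toyStem is a PLACEHOLDER stemmer for illustration only. A real client
// would have to apply the same stemmer that was used at indexing time.
func toyStem(word string) string {
	w := strings.ToLower(word)
	for _, suffix := range []string{"ning", "ing", "s"} {
		if strings.HasSuffix(w, suffix) && len(w) > len(suffix)+2 {
			return strings.TrimSuffix(w, suffix)
		}
	}
	return w
}

func isWordRune(r rune) bool { return unicode.IsLetter(r) || unicode.IsDigit(r) }

// stemMatches returns the spans of words in text whose stem equals the
// stem of any word in query.
func stemMatches(text, query string) []Span {
	wanted := map[string]bool{}
	for _, q := range strings.FieldsFunc(query, func(r rune) bool { return !isWordRune(r) }) {
		wanted[toyStem(q)] = true
	}

	var spans []Span
	start := -1 // byte index where the current word began, or -1 between words
	for i, r := range text {
		if isWordRune(r) {
			if start < 0 {
				start = i
			}
			continue
		}
		if start >= 0 {
			if wanted[toyStem(text[start:i])] {
				spans = append(spans, Span{Offset: start, Length: i - start})
			}
			start = -1
		}
	}
	// Handle a word that runs to the end of the text.
	if start >= 0 && wanted[toyStem(text[start:])] {
		spans = append(spans, Span{Offset: start, Length: len(text) - start})
	}
	return spans
}

func main() {
	fmt.Println(stemMatches("She was running while he runs", "run"))
}
```

The results are only as good as the agreement between this stemmer and the one used for indexing, which is why reusing the same tokenizer matters.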

I will add it to the wish list though as I am sure other people would benefit from the feature.
