Get positions of matching terms in query

deanroker123 · June 9, 2023, 12:09am

I am using dgraph to search documents and would like to highlight the matching terms in the results.

Is it possible to get the positions of the words that satisfied an allofterms / anyofterms / anyoftext query?

MichelDiz · June 9, 2023, 12:24am

No. I recommend using a separate query block for each case.

deanroker123 · June 9, 2023, 7:39am

Hi Michel, I have done that.

But I would like to highlight the search term in each block. The trouble is that full-text search does stemming, so "running" would match "run", which means I can’t just search the text for the literal query word.

Could this be added as a feature request?

Thanks

Dean

MichelDiz · June 9, 2023, 7:49am

Highlight how?

We use Bleve. If Bleve can do it, we could potentially do it, whatever it is.

deanroker123 · June 9, 2023, 10:21am

Hi Michel, yep, it does: see "Highlight Matches in Results" in the Bleve docs. I was looking to try and use that with docs storing the UIDs of nodes. But if it’s built in and uses the same library, that would be much better.

Could be as simple as providing the offsets of the matching words rather than actually providing formatted output.

MichelDiz · June 9, 2023, 5:13pm

Interesting, thanks for checking. So it would generate an HTML response? With CSS? I’m not following how we would do this. In Ratel? If implemented, could it work anywhere? If it returns HTML, nice. Otherwise it might be too complex in our case.

Anyway. You can open a ticket for a feature request. But don’t expect speed in the process. We have few engineers.

deanroker123 · June 9, 2023, 5:41pm

In the results from Bleve (even if you don’t get the marked-up result) there is an offset and length for each thing that matched the query.

Maybe there could be a new property you add to the query to get an array of offsets and lengths returned from Bleve?
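To show why offsets and lengths alone would be enough, here is a minimal Go sketch of what a client could do with such an array. The `Match` type and the `Highlight` function are hypothetical names made up for this illustration; nothing like this is returned by Dgraph today, and the byte-offset convention is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Match is a hypothetical shape for one hit: a byte offset into the
// text plus the byte length of the matched word.
type Match struct {
	Offset int
	Length int
}

// Highlight wraps each matched span of text in openTag/closeTag.
// Spans that are empty, out of range, or overlap an earlier span are skipped.
func Highlight(text string, matches []Match, openTag, closeTag string) string {
	// Work on a sorted copy so the caller's slice is left untouched.
	sorted := append([]Match(nil), matches...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Offset < sorted[j].Offset })

	var b strings.Builder
	pos := 0 // first byte of text not yet written
	for _, m := range sorted {
		end := m.Offset + m.Length
		if m.Length <= 0 || m.Offset < pos || end > len(text) {
			continue
		}
		b.WriteString(text[pos:m.Offset])
		b.WriteString(openTag)
		b.WriteString(text[m.Offset:end])
		b.WriteString(closeTag)
		pos = end
	}
	b.WriteString(text[pos:])
	return b.String()
}

func main() {
	text := "She was running while he runs"
	matches := []Match{{Offset: 8, Length: 7}, {Offset: 25, Length: 4}}
	fmt.Println(Highlight(text, matches, "<mark>", "</mark>"))
	// prints: She was <mark>running</mark> while he <mark>runs</mark>
}
```

Returning raw offsets rather than formatted output would leave the choice of markup (HTML, terminal colours, and so on) to the client.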

Do you accept public pull requests? I am not sure I can do it, but I do a bit of Go development, so I could have a stab at it.

MichelDiz · June 9, 2023, 5:54pm

That would be nice!

deanroker123 · June 9, 2023, 9:08pm

Hi Michel,

I can’t find where Bleve is used to do the search. I can find where it is used to stem the words for the search, but I can’t find anywhere that uses the query for searching. Does it actually use Bleve for the search function?

MichelDiz · June 10, 2023, 3:26am

Well, as I am not a core developer, I don’t have the full context of the code. What I can offer you is a theoretical perspective, quite surface-level. What I do know is that soon after the query is parsed, it proceeds to “ProcessTask” and “ProcessTaskOverNetwork”.

From that point forward, my understanding is mostly conjecture. I know we do some kind of proprietary indexing. I’m not sure how Bleve is involved, but we do use the Bleve package, as shown in the list below.

dgraph/tok/bleve.go
dgraph/tok/stemmers.go
dgraph/tok/stopwords.go

All indexing involves the creation of a token in the Tablet. ProcessTask triggers this matrix with indexing data and UIDs. I suspect that if we use Bleve at this level, its application is not trivial. It’s not a simple task. Dgraph’s code is a bit complex and takes time to understand. All the processes are somewhat atomized, and several concurrent tasks happen. It needs more digging to understand what it does in terms of query matching.

If you are using VS Code, you can use this dgraph/launch.json at main · dgraph-io/dgraph · GitHub and put some breakpoints in dgraph/worker/task.go so you can investigate it.

PS: A possible explanation could be as follows: We utilize Bleve to create the indexing table, a process we refer to as “Tokenization”. So the model is based on Bleve. However, during the query process, we do not employ Bleve. This is merely a theory.

deanroker123 · June 10, 2023, 9:31am

Thanks for taking the time to dig into it; that’s pretty much what I worked out from the code.

I think it’s the Bleve index that actually holds the positions of the words in the document, as well as the document they are in. So I am not sure it’s going to be possible to achieve what I want. I could use the same tokenizer as you, though, so that I can re-create the terms you would have indexed and match each word against the stemmed search query in my app.

I will add it to the wish list though as I am sure other people would benefit from the feature.
