Equality checking with the trigram index

According to the String Indices section of the Dgraph documentation, the trigram index supports:

Regular expression matching. Can also be used for equality checking.

At the moment, I’m interested in that last part about equality checking. Because with a very simple test schema…

type User {
  name
}

name: string @index(trigram) .

…and a very simple query using the eq() function…

{
  user(func: eq(name, Alice)) {
    name
    uid
  }
}

…I get an error:

Attribute name does not have a valid tokenizer.

So I have two questions:

  1. How exactly can I do equality checking with a trigram index?
  2. Does trigram equality checking support strings of fewer than 3 characters?

According to the docs, only the hash, exact, term, or fulltext indexes can be used with the eq function.

As far as I know, the regexp function that goes with the trigram index only supports patterns of at least 3 characters, while the eq function has no limit on the number of characters.

Using multiple indexes at the same time like the following should meet your needs.

name: string @index(trigram, term) .

Thanks. I came to the same conclusion, and ended up adding the hash index:

name: string @index(hash, trigram) .

Hopefully someone from Dgraph will chime in and explain what this means with regard to the trigram index:

Can also be used for equality checking.

Just my take on this: with regexp you can create an expression that is logically the same as an equality check by using the start-of-string marker, the string to match, and the end-of-string marker.
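To make that concrete, here is a minimal sketch in plain Python rather than DQL. The helper name regexp_equals is made up for illustration, and this only models the logic, not how Dgraph evaluates anything. In a Dgraph query the equivalent would presumably be something like regexp(name, /^Alice$/), which I have not tested here.

```python
import re


def regexp_equals(value: str, target: str) -> bool:
    """Equality check expressed as a fully anchored regular expression.

    Hypothetical helper for illustration only.
    """
    # re.escape keeps characters such as "." or "+" in the target literal.
    # re.fullmatch anchors the pattern at both ends of the string, which is
    # the role the start and end markers (^ and $) play in the pattern.
    return re.fullmatch(re.escape(target), value) is not None


print(regexp_equals("Alice", "Alice"))        # True
print(regexp_equals("Alice Smith", "Alice"))  # False
```

Without the anchoring, "Alice Smith" would also match, which is why both markers are needed to turn a regexp into an equality check.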

Just because it can be done does not mean it should be done.

A regular expression check is like switching to another language midstream, and that requires extra processing. An equality match using a hash/exact index is fast, but an equality match with a regexp is much slower, because it has to process the regexp one character at a time and consult the trigram (three-character) index to quickly rule out longer non-matches. Regular expressions can also drastically hurt performance in some cases; in exchange, they make searches far more flexible.
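A toy model of that two-step process, in Python. This is an assumption about the general technique (narrow candidates with a trigram index, then verify with the regexp), not Dgraph's actual implementation, and every name in it is hypothetical.

```python
import re
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """All three-character substrings of text (empty if len(text) < 3)."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(values: list[str]) -> dict[str, set[str]]:
    """Map each trigram to the set of stored values containing it."""
    index = defaultdict(set)
    for value in values:
        for gram in trigrams(value):
            index[gram].add(value)
    return index


def regexp_equal_lookup(index: dict[str, set[str]], target: str) -> list[str]:
    """Equality lookup done the regexp way: filter by trigrams, then verify."""
    grams = trigrams(target)
    if not grams:
        raise ValueError("need at least three characters to use the trigram index")
    # Step 1: keep only values that contain every trigram of the target.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Step 2: run the anchored regexp on each surviving candidate.
    pattern = re.compile(re.escape(target))
    return sorted(v for v in candidates if pattern.fullmatch(v))


names = ["Alice", "Alice Smith", "Malice", "Bob"]
index = build_index(names)
print(regexp_equal_lookup(index, "Alice"))  # ['Alice']
```

Here "Alice Smith" survives the trigram step and is only thrown out by the regexp itself, whereas a hash lookup would go straight to the one matching value.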

I will have to try to find the best video I ever watched on regexp. It really helped me realize why using regexp as the primary tool of choice is usually a bad idea, versus handling all the cases that don’t depend on regexp first and only using regexp as a last resort. The bottom line, though, was that regexp is its own programming language and not just a somewhat universal matching pattern.

In this case it is slower because you’re essentially comparing arrays instead of strings:

something === something

versus

som
ome
met
eth
thi
hin
ing

Each trigram has to exist, in that order, to match the word…

Definitely not faster lol
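The split above can be reproduced with a few lines of Python. The function name trigram_tokens is made up, and this is just a sketch of sliding a three-character window over the string, not Dgraph's tokenizer. It also shows why strings of fewer than 3 characters are a problem: they produce no trigrams at all.

```python
def trigram_tokens(text: str) -> list[str]:
    """Slide a three-character window over text, keeping the order."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


print(trigram_tokens("something"))
# ['som', 'ome', 'met', 'eth', 'thi', 'hin', 'ing']
print(trigram_tokens("Al"))
# []
```

So one string comparison turns into seven token comparisons for "something", and a two-character value has nothing to look up in the index.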
