How can I set text with JSON-encoded emojis using DQL?

I am trying to set an emoji like this as the value of a node predicate:
:grinning:
In JSON, this character is encoded as the string “\uD83D\uDE00”.

{
  set {
    <0x1f> <post-text> "\uD83D\uDE00" .
  }
}

After executing the above mutation, the emoji is not displayed correctly.

I also checked the returned text in the browser, and Dgraph clearly corrupts the encoded string value.

"post-text": "\uFFFD\uFFFD",

This is the value I get, which is why the smiley is shown as question marks in the browser.

Is this a bug or am I doing something wrong?


Try \\uD83D\\uDE00

That is not a valid Unicode encoding. \u is a special indicator that the following characters are the hex representation of the underlying Unicode character.
You can check this in the browser console.
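For anyone without a browser console at hand, here is a minimal Go sketch of the same check. It only uses the standard encoding/json package; the helper name decode is made up for illustration. It shows that a single-backslash escape decodes to the emoji, while the double-backslash version decodes to plain text that merely looks like an escape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decode parses a JSON string literal (quotes included) into a Go string.
// Hypothetical helper, used only for this illustration.
func decode(literal string) string {
	var s string
	if err := json.Unmarshal([]byte(literal), &s); err != nil {
		return "error: " + err.Error()
	}
	return s
}

func main() {
	single := decode(`"\uD83D\uDE00"`)   // surrogate-pair escape
	double := decode(`"\\uD83D\\uDE00"`) // escaped backslashes, no Unicode escape

	// The first value is one code point; the second is twelve ordinary characters.
	fmt.Printf("%q has %d rune(s)\n", single, len([]rune(single)))
	fmt.Printf("%q has %d rune(s)\n", double, len([]rune(double)))
}
```

So doubling the backslashes would store the escape text itself, and the reader would have to decode it again on the way out.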

Apparently, the Dgraph serializer does not respect Unicode encoding, or there is a bug.

Dgraph doesn’t support special characters. You have to follow the JSON escaping rules in order to store those values: escape them on your end before sending them to Dgraph, and unescape them after reading them back.

@docs maybe we should document this.

The JSON specification says: “To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".”
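The arithmetic behind that twelve-character sequence can be sketched in Go with the standard unicode/utf16 package. The helper name jsonEscapeAstral is made up for this example, and it assumes its input is a code point above U+FFFF.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// jsonEscapeAstral returns the JSON \uXXXX\uXXXX escape for a code point
// outside the Basic Multilingual Plane. Hypothetical helper for illustration;
// it assumes r is above U+FFFF.
func jsonEscapeAstral(r rune) string {
	// EncodeRune splits the code point into a high and a low surrogate.
	hi, lo := utf16.EncodeRune(r)
	return fmt.Sprintf("\\u%04X\\u%04X", hi, lo)
}

func main() {
	fmt.Println(jsonEscapeAstral(0x1D11E)) // G clef from the specification text
	fmt.Println(jsonEscapeAstral(0x1F600)) // grinning face from the question
}
```

For U+1F600 this gives the same \uD83D\uDE00 pair that the .NET serializer produced, so the input in the question is a regular JSON escape.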

This is not a special character. This is how JSON serialization and deserialization are defined.
I send values to Dgraph using the .NET JsonSerializer, and it escapes all emojis this way.
If DQL is designed to work with JSON data, then it would be wise to support this.

Related discussion: Consider adding a JavaScriptEncoder implementation that doesn't encode the block list or surrogate pairs. · Issue #42847 · dotnet/runtime · GitHub

Dgraph’s strings don’t support JSON, but they follow the JSON rules for escaping special characters. A backslash is a special character.

We could file a bug report or feature request so that Dgraph does not convert the given input
“\uD83D\uDE00”
into this:
“\uFFFD\uFFFD”

If this syntax is not supported, Dgraph should throw an exception instead of converting the bytes into a different value.

If we throw an exception every time we see a special character, users will be really mad because it will happen all the time. We could have an escaping directive, but for now it is perfectly fine to handle it on your end.

Yes, I handled the situation on my end. I am insisting on this to improve Dgraph.
These kinds of issues can be sent to the backlog to be implemented in the future.

If I am not wrong, the DQL parser unquotes a string using the basic strconv.Unquote function.

I am a newbie in Go; however, I found the following:

This might end up being a one-line code change, something like the one below.

json.Unmarshal([]byte(str), &str)
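To make the suggestion concrete, here is a small standalone sketch that compares the two standard-library calls on the input from the question. It is not Dgraph's parser code; whether the DQL parser really goes through strconv.Unquote is the assumption stated above, and the helper name unquoteJSON is made up for this example. The strconv.Unquote result is printed rather than asserted because, as far as I know, it depends on the Go release: it either produces replacement characters for the two surrogate halves or returns a syntax error.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// unquoteJSON decodes a double-quoted literal using JSON string rules,
// which combine a UTF-16 surrogate pair into a single code point.
// Hypothetical helper, not part of Dgraph.
func unquoteJSON(quoted string) (string, error) {
	var out string
	err := json.Unmarshal([]byte(quoted), &out)
	return out, err
}

func main() {
	quoted := `"\uD83D\uDE00"`

	// Go string-literal rules: each \u escape is handled on its own.
	s, err := strconv.Unquote(quoted)
	fmt.Printf("strconv.Unquote: %+q, err=%v\n", s, err)

	// JSON rules: the two escapes are read together as one surrogate pair.
	j, err := unquoteJSON(quoted)
	fmt.Printf("json.Unmarshal:  %+q, err=%v\n", j, err)
}
```

One thing to keep in mind with such a swap: Go and JSON escape rules are not identical (Go accepts \x41 and \a, JSON accepts \/), so replacing one decoder with the other could change how existing queries are parsed.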

I’ll look into it. On a related note, I have a PR for GraphQL (not DQL) that addresses a similar issue (it’s yet to be merged):
