https://en.wikipedia.org/wiki/UTF-8#WTF-8
WTF-8 is a superset of UTF-8, that includes UTF-16 surrogates
in UTF-8 bytes (forbidden in well-formed UTF-8).
We may get UTF-8 with these from bad producers or converters.
We can get such chars in the text we get from Wikipedia API once
their (fully valid) JSON has been decoded by our lpeg-based JSON
decoder (which is a defect, hard to fix). (Our other pure-Lua json
decoder has no problem and do that correctly).
We might also find these WTF-8 in some dictionaries, so let's
support them.