Recently I was working on a UI implementation that has to handle emoji input in a textarea element. I realized that the maxlength attribute on text inputs doesn't handle emojis the way I expected; more precisely, the length of the character 👨 was counted as 2, not 1. I was aware that JavaScript strings are UTF-16, and that the length of a string is measured in the 16-bit code units of the UTF-16 data, not in code points. Here is a much more thorough explanation of this behavior: https://blog.jxck.io/entries/2017-03-02/unicode-in-javascript.html
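To sketch what this looks like in practice, 👨 is U+1F468, which lies outside the Basic Multilingual Plane, so UTF-16 encodes it as a surrogate pair of two code units, and those units are what length counts:
'👨'.charCodeAt(0).toString(16)
// => 'd83d' (high surrogate)
'👨'.charCodeAt(1).toString(16)
// => 'dc68' (low surrogate)
'👨'.codePointAt(0).toString(16)
// => '1f468' (the single code point U+1F468)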
This is still ok when you have control over how you count the string, using APIs that handle the string by code points. In the example below, applying the spread operator to a string splits it into its code points. This works because the String iterator walks the string by code points, so any function or statement that consumes the iterable interface lets you count code points rather than code units, which for a single-code-point emoji like 👨 gives the expected count of 1.
'👨'.length
// => 2
Array.of(...'👨').length
// => 1
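Any other construct that consumes the iterable interface behaves the same way; as a quick sketch, Array.from, spread syntax in an array literal, and a for...of loop all yield one element per code point:
Array.from('👨').length
// => 1
[...'👨'].length
// => 1
let count = 0;
for (const ch of '👨') count++;
count
// => 1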
However, when it comes to DOM attributes like maxlength or minlength, the problem is that you have no control over how the length of the value is counted, which makes it frustrating when you need to validate the length of the input value. This is especially troublesome with emojis, given how widely they are used across major devices and browsers.
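One possible workaround, sketched below with a hypothetical #message selector and an arbitrary limit of 10, is to drop maxlength entirely and enforce a code-point-based limit yourself in an input listener:
// A minimal sketch: enforce a limit of 10 code points instead of 10 code units.
// The '#message' selector and the limit are placeholder values for illustration.
const textarea = document.querySelector('#message');
const MAX_CODE_POINTS = 10;

textarea.addEventListener('input', () => {
  const codePoints = [...textarea.value]; // split by code point
  if (codePoints.length > MAX_CODE_POINTS) {
    // Truncate to the allowed number of code points, never in the
    // middle of a surrogate pair.
    textarea.value = codePoints.slice(0, MAX_CODE_POINTS).join('');
  }
});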
I was not 100% sure whether my point was valid, as the WebIDL specification states:
The DOMString type corresponds to the set of all possible sequences of code units. Such sequences are commonly interpreted as UTF-16 encoded strings [RFC2781] although this is not required
however, looking at the Chromium test cases that assert the behavior of maxlength on textarea, it seemed that how emojis are counted was not tested and should be discussed further. I created an issue, and I would like to write another post when things move forward from here.