Saturday, February 2, 2013: Node.js: When to use a StringDecoder?

In Node.js, a Buffer has a toString() method that converts the buffer into a string using a specified encoding, and StringDecoder seems to do the same thing. So when should you use a StringDecoder?

The docs say that StringDecoder is better at handling UTF-8. Let’s see what that means in practice. Here I have a few buffers:

// three consecutive chunks of a single UTF-8 byte stream
// (the Thai word การทดสอบ, “a test”)
var b1 = new Buffer([0xe0,0xb8,0x81,0xe0,0xb8,0xb2,0xe0,0xb8])
  , b2 = new Buffer([0xa3,0xe0,0xb8,0x97,0xe0,0xb8,0x94,0xe0])
  , b3 = new Buffer([0xb8,0xaa,0xe0,0xb8,0xad,0xe0,0xb8,0x9a])

Let’s say we receive these buffers one at a time, and as each one arrives we want to pass it on to the client as a string immediately. So for each received buffer, we decode it and send it right away:

console.log(b1.toString('utf-8'))
console.log(b2.toString('utf-8'))
console.log(b3.toString('utf-8'))

Now, what did the client get? Some gibberish along with the text…

กา��
�ทด�
��อบ
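
The problem is that toString() decodes each buffer in isolation, and these buffers end in the middle of characters. For example, the Thai character ร is encoded as the three bytes e0 b8 a3, and that sequence straddles the boundary between b1 and b2, which a quick check with Buffer#slice confirms:

console.log(b1.slice(6))    // <Buffer e0 b8>: an incomplete 3-byte sequence
console.log(b2.slice(0, 1)) // <Buffer a3>: a stray continuation byte

Neither fragment is valid UTF-8 on its own, so each side gets decoded into replacement characters.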

How about a StringDecoder?

var StringDecoder = require('string_decoder').StringDecoder
  , decoder = new StringDecoder('utf-8')
console.log(decoder.write(b1))
console.log(decoder.write(b2))
console.log(decoder.write(b3))

Here’s the output:

กา
รทด
สอบ

So instead of turning an incomplete UTF-8 character sequence into gibberish, a StringDecoder buffers the incomplete multibyte sequence and waits until the rest of the character arrives before emitting it.
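
That behaviour is exactly what you want when consuming a stream whose chunks can end mid-character. Here is a minimal sketch of the pattern; stream and send are hypothetical placeholders for your readable stream and for whatever delivers text to the client, while decoder.end() is the string_decoder call that flushes any bytes still held back:

var StringDecoder = require('string_decoder').StringDecoder
  , decoder = new StringDecoder('utf-8')

// `stream` stands in for any readable stream whose 'data' chunks are
// Buffers that may split a character; `send` stands in for delivering
// text to the client
stream.on('data', function (chunk) {
  // write() returns only completed characters; trailing bytes of a
  // split character are held back until the next chunk completes them
  send(decoder.write(chunk))
})

stream.on('end', function () {
  // end() flushes anything still buffered; an incomplete trailer comes
  // out as replacement characters instead of being dropped silently
  send(decoder.end())
})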