Saturday, February 2, 2013: Node.js: When to use a StringDecoder?
In Node.js, a Buffer has a toString() method that converts the buffer into a string using a specified encoding, and StringDecoder does the same thing. So when should you use a StringDecoder?
The docs say that StringDecoder handles UTF-8 better. Let’s look at a practical example. Here I have a few buffers:
var b1 = new Buffer([0xe0,0xb8,0x81,0xe0,0xb8,0xb2,0xe0,0xb8])
, b2 = new Buffer([0xa3,0xe0,0xb8,0x97,0xe0,0xb8,0x94,0xe0])
, b3 = new Buffer([0xb8,0xaa,0xe0,0xb8,0xad,0xe0,0xb8,0x9a])
Let’s say we receive these buffers one at a time, and as each one arrives we want to pass it to the client as a string immediately. So for each received buffer, we decode it and send it right away:
console.log(b1.toString('utf-8'))
console.log(b2.toString('utf-8'))
console.log(b3.toString('utf-8'))
Now, what did the client get? Some gibberish along with the text…
กา��
�ทด�
��อบ
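The bytes themselves are fine; each buffer simply ends (or begins) in the middle of a UTF-8 multibyte sequence, and toString() turns those stray bytes into replacement characters. As a sanity check, if we wait until all the data has arrived and concatenate the buffers first, toString() decodes the text correctly:

```javascript
var b1 = new Buffer([0xe0,0xb8,0x81,0xe0,0xb8,0xb2,0xe0,0xb8])
  , b2 = new Buffer([0xa3,0xe0,0xb8,0x97,0xe0,0xb8,0x94,0xe0])
  , b3 = new Buffer([0xb8,0xaa,0xe0,0xb8,0xad,0xe0,0xb8,0x9a])

// Joining the chunks restores the complete byte stream, so every
// 3-byte Thai character can be decoded in one pass.
var full = Buffer.concat([b1, b2, b3]).toString('utf-8')
console.log(full) // การทดสอบ
```

Of course, this only works when we can afford to wait for the whole payload; the point of a StringDecoder is that we don’t have to.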
How about a StringDecoder?
var decoder = new (require('string_decoder').StringDecoder)('utf-8')
console.log(decoder.write(b1))
console.log(decoder.write(b2))
console.log(decoder.write(b3))
Here’s the output:
กา
รทด
สอบ
So, from what we see, instead of converting an incomplete UTF-8 character sequence into gibberish, a StringDecoder buffers the incomplete multibyte sequence and waits until the sequence is complete before emitting the character.