Problem parsing UTF-8 characters in a request body?

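When implementing an HTTP service in node.js, there is a lot of sample code like the following for reading the whole request entity (data uploaded by the client, for example a POST with JSON data):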

    var http = require('http');

    var server = http.createServer(function(req, res) {
        var data = '';
        req.setEncoding('utf8');

        req.on('data', function(chunk) {
            data += chunk;
        });

        req.on('end', function() {
            // parse data
        });
    });

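Using req.setEncoding('utf8') automatically decodes the input bytes into a string, assuming the input is UTF-8 encoded. But I get the feeling that it can break. What if we receive a chunk of data that ends in the middle of a multi-byte UTF-8 character? We can simulate this: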

    > new Buffer("café")
    <Buffer 63 61 66 c3 a9>
    > new Buffer("café").slice(0,4)
    <Buffer 63 61 66 c3>
    > new Buffer("café").slice(0,4).toString('utf8')
    'caf?'

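So we get a wrong character instead of waiting for the next byte to decode the last character properly.

Therefore, unless the request object takes care of this and makes sure that only completely decoded characters are pushed into chunks, this ubiquitous code sample is broken.

The alternative would be to use buffers and deal with the problem of buffer size limits: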

    var http = require('http');
    var MAX_REQUEST_BODY_SIZE = 16 * 1024 * 1024;

    var server = http.createServer(function(req, res) {
        // A better way to do this could be to start with a small buffer
        // and grow it geometrically until the limit is reached.
        var requestBody = new Buffer(MAX_REQUEST_BODY_SIZE);
        var requestBodyLength = 0;

        req.on('data', function(chunk) {
            if(requestBodyLength + chunk.length >= MAX_REQUEST_BODY_SIZE) {
                res.statusCode = 413; // Request Entity Too Large
                return;
            }
            chunk.copy(requestBody, requestBodyLength, 0, chunk.length);
            requestBodyLength += chunk.length;
        });

        req.on('end', function() {
            if(res.statusCode == 413) {
                // handle 413 error
                return;
            }
            requestBody = requestBody.toString('utf8', 0, requestBodyLength);
            // process requestBody as string
        });
    });
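For comparison, a common variant of the buffer approach avoids preallocating the full limit: collect the raw chunks in an array and join them once with Buffer.concat before decoding. The following is only a sketch under assumptions. The helper name readBody and its callback shape are hypothetical, and the 413 handling is simplified.

```javascript
const http = require('http');

const MAX_REQUEST_BODY_SIZE = 16 * 1024 * 1024; // same limit as in the code above

// Hypothetical helper: gathers the body as raw bytes and decodes it once at the end.
function readBody(req, limit, callback) {
  const chunks = [];
  let length = 0;
  let done = false;

  req.on('data', function (chunk) {
    if (done) return; // ignore data after an error was reported
    length += chunk.length;
    if (length > limit) {
      done = true;
      const err = new Error('Request Entity Too Large');
      err.statusCode = 413;
      callback(err);
      return;
    }
    chunks.push(chunk);
  });

  req.on('end', function () {
    if (done) return;
    done = true;
    // All bytes are joined before decoding, so no character is split across chunks.
    callback(null, Buffer.concat(chunks, length).toString('utf8'));
  });
}

// Usage sketch (the server is created but not started here).
const server = http.createServer(function (req, res) {
  readBody(req, MAX_REQUEST_BODY_SIZE, function (err, body) {
    if (err) {
      res.statusCode = err.statusCode;
      res.end();
      return;
    }
    // process body as string
    res.end();
  });
});
```

Memory use then grows with the actual body size rather than with the configured maximum.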

Am I right, or is this already taken care of by the http request class?

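This is taken care of automatically. Node has a string_decoder module that is loaded when you call setEncoding. The decoder checks the last few bytes received and, if they do not form a complete character, stores them between emits of 'data', so data always receives a correct string. If you do not call setEncoding and do not use string_decoder yourself, then the emitted buffers can have the problem you mention.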
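To see that behaviour in isolation, the decoder can be driven by hand with the same "café" bytes as in the question. This is a minimal sketch assuming a recent Node.js version (Buffer.from and subarray are used instead of new Buffer and slice). The variable names are illustrative only.

```javascript
const { StringDecoder } = require('string_decoder');

// "café" encodes to 63 61 66 c3 a9; split it in the middle of "é" (c3 | a9).
const bytes = Buffer.from('café', 'utf8');
const firstChunk = bytes.subarray(0, 4); // 63 61 66 c3
const secondChunk = bytes.subarray(4);   // a9

const decoder = new StringDecoder('utf8');
const part1 = decoder.write(firstChunk);  // the dangling c3 byte is held back
const part2 = decoder.write(secondChunk); // completed once a9 arrives
const decoded = part1 + part2 + decoder.end();

console.log(JSON.stringify(part1)); // "caf"
console.log(JSON.stringify(part2)); // "é"
console.log(decoded);               // café
```

Calling toString('utf8') on firstChunk directly would instead produce a replacement character, which is the failure shown in the question.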

The docs are not much help, but you can see the module here: https://github.com/joyent/node/blob/master/lib/string_decoder.js

The implementation of 'setEncoding' and the emit logic also make it clearer.

setEncoding: https://github.com/joyent/node/blob/master/lib/http.js#L270
_emitData: https://github.com/joyent/node/blob/master/lib/http.js#L306

Just add response.setEncoding('utf8'); to the request.on('response') callback function. In my case, that was sufficient.

    // Post : 'tèéïst3 ùél'
    // Node return : 't%C3%A8%C3%A9%C3%AFst3+%C3%B9%C3%A9l'
    decodeURI('t%C3%A8%C3%A9%C3%AFst3+%C3%B9%C3%A9l');
    // Return 'tèéïst3+ùél'
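Note that the decoded result above still contains a "+" where the posted text had a space. Assuming the body is application/x-www-form-urlencoded (the snippet does not say so explicitly), "+" stands for a space and has to be converted separately, because decodeURI leaves it untouched. A small sketch, with decodeFormValue as a hypothetical helper name:

```javascript
// Hypothetical helper for a single application/x-www-form-urlencoded value.
function decodeFormValue(value) {
  // "+" means a space in form encoding; percent escapes carry the UTF-8 bytes.
  return decodeURIComponent(value.replace(/\+/g, ' '));
}

const raw = 't%C3%A8%C3%A9%C3%AFst3+%C3%B9%C3%A9l';

console.log(decodeURI(raw));       // 'tèéïst3+ùél' (the "+" is kept)
console.log(decodeFormValue(raw)); // 'tèéïst3 ùél'
```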
