Node.js或expression自己阻止事件循环?

我有非常简单的Node.js 8.5.0 Express 4.15.5服务器与集群模块,服务静态文件。 问题在于事件循环似乎在很长一段时间内被阻塞。 我正在使用阻塞的模块,并跟踪自上次检查以来处理了多less个请求,使用超时间隔70毫秒。 很多时候,计数器只是零:有时没有请求,事件循环有时被阻塞。

日志:

Execution blocked for 1056 ms [2017-09-27 16:18:06.322], 1 requests, total requestcount 115, pid 31071 Execution blocked for 358 ms [2017-09-27 16:18:12.570], 0 requests, total requestcount 123, pid 31071 Execution blocked for 1578 ms [2017-09-27 16:18:15.551], 10 requests, total requestcount 147, pid 31071 Execution blocked for 872 ms [2017-09-27 16:18:35.926], 0 requests, total requestcount 557, pid 31077 Execution blocked for 117 ms [2017-09-27 16:20:11.668], 0 requests, total requestcount 761, pid 31077 Execution blocked for 381 ms [2017-09-27 16:23:00.268], 0 requests, total requestcount 2231, pid 31077 Execution blocked for 1206 ms [2017-09-27 16:23:06.096], 2 requests, total requestcount 3147, pid 31070 Execution blocked for 505 ms [2017-09-27 16:23:10.319], 2 requests, total requestcount 2256, pid 31077 Execution blocked for 475 ms [2017-09-27 16:23:10.335], 1 requests, total requestcount 840, pid 31071 Execution blocked for 2113 ms [2017-09-27 16:23:16.918], 1 requests, total requestcount 2283, pid 31077 Execution blocked for 303 ms [2017-09-27 16:23:20.071], 0 requests, total requestcount 3261, pid 31070 Execution blocked for 423 ms [2017-09-27 16:23:23.417], 1 requests, total requestcount 3267, pid 31070 Execution blocked for 6395 ms [2017-09-27 16:23:31.633], 7 requests, total requestcount 3285, pid 31070 Execution blocked for 210 ms [2017-09-27 16:32:04.764], 10 requests, total requestcount 3306, pid 31071 Execution blocked for 690 ms [2017-09-27 16:32:05.945], 1 requests, total requestcount 3313, pid 31071 Execution blocked for 704 ms [2017-09-27 16:32:05.948], 5 requests, total requestcount 5214, pid 31077 Execution blocked for 857 ms [2017-09-27 16:32:07.082], 0 requests, total requestcount 3315, pid 31071 Execution blocked for 1475 ms [2017-09-27 16:32:12.691], 0 requests, total requestcount 3333, pid 31071 Execution blocked for 1487 ms [2017-09-27 16:32:12.692], 1 requests, total requestcount 5247, pid 31077 Execution blocked for 125 ms [2017-09-27 16:32:16.306], 0 requests, total requestcount 7921, pid 31070 Execution blocked for 189 ms [2017-09-27 16:33:16.369], 0 requests, total requestcount 8087, pid 31070 Execution blocked for 182 ms [2017-09-27 16:33:16.621], 0 requests, total requestcount 8087, pid 31070 

例如:

 epoll_wait(6, [], 1024, 70) = 0 epoll_wait(6, [], 1024, 70) = 0 epoll_wait(6, [], 1024, 70) = 0 write(2, "Execution blocked for 724 ms [20"..., 103) = 103 epoll_wait(6, [{EPOLLIN, {u32=24, u64=24}}], 1024, 70) = 1 read(24, "", 1024) = 0 epoll_ctl(6, EPOLL_CTL_DEL, 24, 0x7fff8ef58de0) = 0 close(24) = 0 epoll_wait(6, [], 1024, 0) = 0 epoll_wait(6, [], 1024, 69) = 0 epoll_wait(6, [], 1024, 70) = 0 

还有大量的内存和CPU可用(3核心):

 top - 16:36:50 up 6 days, 5:51, 4 users, load average: 0.17, 0.37, 0.45 Tasks: 137 total, 1 running, 136 sleeping, 0 stopped, 0 zombie %Cpu(s): 8.3 us, 0.6 sy, 0.0 ni, 91.0 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem : 4562340 total, 170144 free, 2234000 used, 2158196 buff/cache KiB Swap: 1048572 total, 993992 free, 54580 used. 2075596 avail Mem 

我也build立了气相色谱监测,但是很less达到100ms:

 const obs = new PerformanceObserver((list) => { let gc = list.getEntries()[0]; if (gc.duration > 100) { console.warn('gc', gc); } performance.clearGC(); }); obs.observe({ entryTypes: ['gc'] }); 

它expression或它使用的一些模块导致阻塞甚至似乎什么都没有发生? 如何debugging? 如果没有,是Node.js本身? 如果不是,什么? 由于不阻塞事件循环是Node.js的基础,我假设有工具来debugging,但找不到任何。

编辑:testing与spdy和本地https模块,没有区别。

编辑:源代码:

 "use strict"; const bodyParser = require('body-parser'), cluster = require('cluster'), cors = require('cors'), compress = require('compression'), cookieParser = require('cookie-parser'), express = require('express'), favicon = require('serve-favicon'), fs = require('fs'), http = require('http'), // https = require('spdy'), https = require('https'), path = require('path'), strftime = require('strftime'); const { performance, PerformanceObserver } = require('perf_hooks'); global.V = {}; const workers = process.argv[3] || 3; function blocked(interval, cb, cb_ok) { var start = process.hrtime(); setInterval(function(){ let delta = process.hrtime(start); let nanosec = delta[0] * 1e9 + delta[1]; let ms = nanosec / 1e6; let n = ms - interval; if (n > interval) { cb(Math.round(n)); } else if (cb_ok) { cb_ok(Math.round(n)); } start = process.hrtime(); V.httpRequests2 = 0; }, interval).unref(); } if (cluster.isMaster) { console.log(`Master ${process.pid} is running`); // Fork workers. for (let i = 0; i < workers; i++) { cluster.fork(); } cluster.on('exit', (worker, code, signal) => { console.log(`worker ${worker.process.pid} died`); setTimeout(function() { console.log('Fork one replacement worker...'); cluster.fork(); }, 120000); }); } else { V.expressOptions = { key: fs.readFileSync('./ssl/server.key'), cert: fs.readFileSync('./ssl/ssl-blunde.crt'), requestCert: false, rejectUnauthorized: false }; V.expressApp = express(); V.server_ssl = https.createServer(V.expressOptions, V.expressApp); V.server_ssl.listen(8080); V.expressApp.use(cors({origin: 'https://example.com'})); V.expressApp.disable('x-powered-by'); V.expressApp.use(compress()); V.expressApp.use(cookieParser()); V.expressApp.use(favicon(__dirname + '/static/html/favicon.ico')); V.expressApp.use(bodyParser.json()); V.expressApp.use(bodyParser.urlencoded({ extended: true })); V.httpRequests = 0; V.httpRequests2 = 0; V.expressApp.use('*', function(req, res, next) { V.httpRequests2++; V.httpRequests++; next(); }); V.expressApp.use('/', express.static(path.join(__dirname, 'static/html'))); V.expressApp.use(express.static(path.join(__dirname, 'static'), { maxAge: 1000 * 60 * 60 })); V.expressApp.use(function (err, req, res, next) { console.error(err.stack); next(err); }); V.expressApp.use(function (err, req, res, next) { if (req.xhr) { console.log('Express error', err); res.status(500).send({ error: 'Something blew up!' }); } else { next(err); } }); V.expressApp.use(function (err, req, res, next) { console.log('Express error 500', err); res.status(500); }); console.log(`Worker ${process.pid} started`); blocked(70, function(ms) { if (ms > 2500) { console.error('Execution blocked for ' + ms + ' ms [' + strftime('%F %T.%L') + '], %s requests, total requestcount %s, pid %s', V.httpRequests2, V.httpRequests, process.pid); } else if (ms > 500) { console.warn('Execution blocked for ' + ms + ' ms [' + strftime('%F %T.%L') + '], %s requests, total requestcount %s, pid %s', V.httpRequests2, V.httpRequests, process.pid); if (V.httpRequests > 200000) { console.log('Enough requests, exit, requestcount %s, pid %s', V.httpRequests, process.pid); process.exit(); } } else { console.log('Execution blocked for ' + ms + ' ms [' + strftime('%F %T.%L') + '], %s requests, total requestcount %s, pid %s', V.httpRequests2, V.httpRequests, process.pid); } }); const obs = new PerformanceObserver((list) => { let gc = list.getEntries()[0]; if (gc.duration > 500) { console.warn('GC', gc); } else if (gc.duration > 100) { console.log('GC', gc.duration); } performance.clearGC(); }); obs.observe({ entryTypes: ['gc'] }); } 

编辑:它似乎是相关的如何Node.js与它的线程进行通信:以下futex-EAGAIN行为发生每当事件循环被阻止。 很显然,Node.js正在等待事实上阻塞事件循环的事情 。 问题不是任何I / O,因为在任何线程都没有阻塞。

 782050 16:14:56.945451111 5 node (17387) < futex res=0 782051 16:14:56.945493832 3 node (17385) > futex addr=7F8F03C8FB20 op=128(FUTEX_PRIVATE_FLAG) val=2 782052 16:14:56.945494164 5 node (17387) > futex addr=7F8F03C8FB20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 782053 16:14:56.945494233 3 node (17385) < futex res=-11(EAGAIN) 782054 16:14:56.945494712 3 node (17385) > futex addr=7F8F03C8FB20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 782055 16:14:56.945494814 5 node (17387) < futex res=0 782056 16:14:56.945494872 3 node (17385) < futex res=0 782057 16:14:56.945495204 3 node (17385) > futex addr=7F8F03C8FB20 op=128(FUTEX_PRIVATE_FLAG) val=2 782058 16:14:56.945495491 5 node (17387) > futex addr=7F8F03C8FB20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 782059 16:14:56.945495541 3 node (17385) < futex res=-11(EAGAIN) 782060 16:14:56.945495941 5 node (17387) < futex res=0 782061 16:14:56.945495992 3 node (17385) > futex addr=7F8F03C8FB20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 782062 16:14:56.945496239 3 node (17385) < futex res=0 782063 16:14:56.945496460 3 node (17385) > futex addr=7F8F03C8FB20 op=128(FUTEX_PRIVATE_FLAG) val=2 782064 16:14:56.945496661 5 node (17387) > futex addr=7F8F03C8FB20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 782065 16:14:56.945496780 3 node (17385) < futex res=-11(EAGAIN) 782066 16:14:56.945497107 5 node (17387) < futex res=0 782067 16:14:56.945497232 3 node (17385) > futex addr=7F8F03C8FB20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 782068 16:14:56.945497381 3 node (17385) < futex res=0 782069 16:14:56.945497596 3 node (17385) > futex addr=7F8F03C8FB20 op=128(FUTEX_PRIVATE_FLAG) val=2 782070 16:14:56.945497764 5 node (17387) > futex addr=7F8F03C8FB20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 782071 16:14:56.945497913 3 node (17385) < futex res=-11(EAGAIN) 782072 16:14:56.945498204 5 node (17387) < futex res=0 

当不需要做任何事情时,node.js或者express不应该阻塞事件循环。 有时可能会有一些时间用于垃圾收集,但我不会指望只要你观察到6395ms

提供静态文件的内置方法只使用asynchronousI / O,因此不应阻塞事件循环。

如果您需要进一步帮助诊断您的应用程序中可能会导致此问题的其他内容,则可能需要向我们显示您的代码。

回答你的直接问题:

Node.js或expression自己阻止事件循环?

不,除了非常短的时间垃圾收集。 如果你的服务器正在使用大量的Javascript对象,并且非常非常繁忙,那么有时候垃圾收集有时候可能会落后,需要一些时间来追赶,但是这只会在一个非常非常繁忙的服务器上代码是使用大量的对象(从而创造了大量的GC工作)。

仅供参考,如果您所做的只是在高负载环境下提供静态文件,那么使用静态文件比使用Express更有效。 一种常用的方法是将Nginx放在快速服务器前,并使用Nginx直接从文件系统提供静态文件。 还有更大规模情况下的CDN。

如需进一步的帮助,请向我们显示您的实际Express代码,以便我们可以看到您的服务器在做什么。

回答我自己的问题:可能会出现所有工作线程繁忙的情况 。 但是,这不是我的情况。

与以前的版本相比,Node.js 8和工作线程有什么不同。 降级到节点7.10.1完全解决了这个问题。 由于问题也存在于简单的Express服务器中,所以我认为这是Node 8中的一个bug。