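Node.js memory leak (async.queue and request)

I have a very simple crawler that goes through 250 pages, allocates about 400 MB of memory, and never releases it. I don't know how to fix it. If anyone notices something, please let me know.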

    // Assumption: `req` is the `request` module named in the title.
    var async = require('async')
    var req = require('request')

    function scrape(shop, o, cb, step) {
        var itemz = []

        // Worker queue: at most o.threads requests are in flight at once
        var q = async.queue(function (o, cb) {
            req({ url: o.url }, function (e, r) {
                if (e) throw (e)
                cb()
                o.cb(r.body)
            })
        }, o.threads)

        var get = function (url, cb) {
            q.push({ url: url, cb: cb })
        }

        var url = 'https://www.host.com'
        var total, done = 0, itemsPerPage = 24

        get(url, function (r) {
            var pages = (r.match(/data-page="(\d+)"/g));
            pages = pages[pages.length - 2].split("data-page=\"")[1].split('"')[0] || 1;
            pages = Math.min(pages, 10) // limit to 10 pages max (240 items)
            for (var i = 1; i <= pages; i++) {
                get(url + '&page=' + i, scrapeList)
            }
            total = pages + pages * itemsPerPage
        })

        // - extract the transaction links from the pages
        //   and add them to the queue
        function scrapeList(r) {
            var itemsFound = 0
            r.replace(/href="(https:\/\/www.host.com\/listing\/(\d+).*)"/g, function (s, itemUrl, dateSold) {
                itemsFound++
                get(itemUrl, function (r) {
                    scrapeItem(r, itemUrl, dateSold)
                    step(++done, total)
                    if (done == total) onend()
                })
            })
            // decrease expected items, if fewer items per page were found than initially expected
            total -= itemsPerPage - itemsFound
            step(++done, total)
        }

        // - from the item page extract the details, and add them to the items array
        function scrapeItem(r, itemUrl, dateSold) {
            var d = {}
            d.url = itemUrl;
            d.date = new Date(Date.now())
            d.quantity = 1;
            itemz.push(d)
        }

        // - when there are no more requests in the queue (on drain), return the items
        function onend() {
            cb(null, itemz);
        }
    }

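I had a similar problem. I was scraping a host and parsing the HTML with cheerio, but cheerio uses lodash internally, and in my case that leaked memory that was never released. The workaround I found was to trigger the GC (garbage collector) periodically: call global.gc() at a fixed interval and run the script with the --expose-gc flag.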

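For example: node --expose-gc <script>.js (the flag must come before the script name, otherwise it is passed to the script as an argument instead of to Node).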

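This is not an ideal solution, but it works as a quick fix for a standalone script like yours. Don't make the interval too short, though: I found that a forced garbage collection is CPU-intensive and also delays the event loop, so running it every 5 to 10 seconds should do the trick.

I also found an interesting read on V8 garbage collection.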
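As a rough sketch of the workaround described above, the following shows one way to wire up a periodic global.gc() call. The helper name startPeriodicGc and the 5000 ms interval are assumptions chosen for illustration, not part of the original script. global.gc is only defined when Node is started with --expose-gc, so the sketch checks for it first.

```javascript
// Hypothetical helper (name and default interval are illustrative only).
// Forces a garbage collection every `intervalMs` milliseconds, but only
// when Node was started with --expose-gc, which is what defines global.gc.
function startPeriodicGc(intervalMs) {
  if (typeof global.gc !== 'function') {
    // Started without --expose-gc: nothing to call, so do nothing.
    return null;
  }
  var timer = setInterval(function () {
    global.gc(); // a forced full collection; it blocks the event loop while it runs
  }, intervalMs || 5000);
  // Do not let this timer alone keep the process alive once scraping is done.
  timer.unref();
  return timer;
}

var gcTimer = startPeriodicGc(5000);
console.log(gcTimer
  ? 'periodic GC enabled'
  : 'global.gc unavailable; start node with --expose-gc');
```

Calling clearInterval(gcTimer) in the final callback stops the forced collections once the scrape has finished.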
