Asynchronous parallel HTTP requests

I have a control-flow problem with an application that loads a large number of URLs. I'm using Caolan Async and the NPM request module.

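My problem is that the HTTP requests start as soon as a function is added to the queue. Ideally, I want to build the queue first and only start making HTTP requests when the queue is started. Otherwise the callbacks start firing before the queue has started, which causes the queue to finish prematurely.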

    var request = require('request'), // https://www.npmjs.com/package/request
        async = require('async');     // https://www.npmjs.com/package/async

    var myLoaderQueue = []; // passed to async.parallel
    var myUrls = ['http://...', 'http://...', 'http://...']; // 1000+ urls here

    for (var i = 0; i < myUrls.length; i++) {
      myLoaderQueue.push(function (callback) {
        // Async http request
        request(myUrls[i], function (error, response, html) {
          // Some processing is happening here before the callback is invoked
          callback(error, html);
        });
      });
    }

    // The loader queue has been made, now start to process the queue
    async.parallel(myLoaderQueue, function (err, results) {
      // Done
    });

Is there a better way to approach this?

Combining a for loop with asynchronous calls is problematic in ES5 and can produce unexpected results (in your case, the wrong URLs are retrieved).
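A minimal, network-free sketch of the pitfall: the queued functions only record which URL they would request, and the URLs are placeholders. Pushing the functions does not call them; the problem is that every function reads the same `i` when it eventually runs.

```javascript
// Placeholder URLs; nothing is actually fetched.
var myUrls = ['http://a.example', 'http://b.example', 'http://c.example'];
var myLoaderQueue = [];
var calls = 0;

for (var i = 0; i < myUrls.length; i++) {
  myLoaderQueue.push(function () {
    calls += 1;
    // `i` is one shared variable; it is read when this function runs,
    // not when it is pushed.
    return myUrls[i];
  });
}

var callsAfterPush = calls;
console.log(callsAfterPush); // 0: pushing a function does not invoke it

// Run the queue after the loop, as async.parallel would. By now
// i === myUrls.length, so every lookup is past the end of the array.
var requested = myLoaderQueue.map(function (task) { return task(); });
console.log(i);         // 3
console.log(requested); // [ undefined, undefined, undefined ]
```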

Instead, consider using async.map(), which passes each URL to your function as an argument:

    async.map(myUrls, function (url, callback) {
      request(url, function (error, response, html) {
        // Some processing is happening here before the callback is invoked
        callback(error, html);
      });
    }, function (err, results) {
      ...
    });
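To illustrate why a map-style helper avoids the problem, here is a simplified, hypothetical callback-style map (`mapEach`). It is not async's actual implementation, and `fakeRequest` with its example URLs is a stand-in for the request module. Each URL arrives as a function parameter, so there is no shared loop variable, and results are stored by index so they come back in input order even though later URLs finish first.

```javascript
// Hypothetical stand-in for the `request` module. Delays are chosen so the
// URLs finish in reverse order.
var delays = { 'http://a.example': 30, 'http://b.example': 20, 'http://c.example': 10 };

function fakeRequest(url, callback) {
  setTimeout(function () {
    callback(null, { statusCode: 200 }, '<html>' + url + '</html>');
  }, delays[url]);
}

// Simplified, hypothetical callback-style map (not async's real code).
// Starts every task at once and calls `done(err, results)` exactly once.
function mapEach(items, iteratee, done) {
  var results = new Array(items.length);
  var pending = items.length;
  var finished = false;
  if (pending === 0) return done(null, results);
  items.forEach(function (item, index) {
    // `item` and `index` are parameters, so every callback gets its own copy.
    iteratee(item, function (err, value) {
      if (finished) return;
      if (err) {
        finished = true;
        return done(err);
      }
      results[index] = value; // store by position, not completion order
      pending -= 1;
      if (pending === 0) {
        finished = true;
        done(null, results);
      }
    });
  });
}

var output;
mapEach(Object.keys(delays), function (url, callback) {
  fakeRequest(url, function (error, response, html) {
    callback(error, html);
  });
}, function (err, results) {
  output = results;
  console.log(err, results);
});
```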

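Given that you have 1000+ URLs to retrieve, async.mapLimit(), which caps how many requests are in flight at once, may also be worth considering.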
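The idea behind a concurrency limit can be sketched with plain promises. This is a minimal sketch, not async's implementation: `mapLimit` here is a hypothetical helper and `fakeFetch` stands in for a real HTTP call. It starts `limit` "lanes" that each claim the next unprocessed index, so at most `limit` requests are pending at any moment and results keep the input order.

```javascript
// Hypothetical promise-based stand-in for a network call; it tracks how
// many calls are pending at once.
let inFlight = 0;
let maxInFlight = 0;
function fakeFetch(url) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  return new Promise((resolve) => {
    setTimeout(() => {
      inFlight--;
      resolve('<html>' + url + '</html>');
    }, 5);
  });
}

// Run `worker` over `items` with at most `limit` calls pending at once.
// Results keep the order of `items`. The returned promise rejects on the
// first error (other lanes may keep running until they finish).
function mapLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so no two lanes share one
      results[index] = await worker(items[index], index);
    }
  }
  const lanes = [];
  for (let k = 0; k < Math.min(limit, items.length); k++) lanes.push(lane());
  return Promise.all(lanes).then(() => results);
}

// Placeholder URLs.
const urls = Array.from({ length: 20 }, (_, n) => 'http://example.com/page/' + n);
const done = mapLimit(urls, 4, (url) => fakeFetch(url)).then((bodies) => {
  console.log(bodies.length, 'bodies; max concurrent requests:', maxInFlight);
  return bodies;
});
```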

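If you're willing to start using Bluebird and Babel to take advantage of promises and async/await (then an ES7 proposal, standardized in ES2017), you can do the following: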

    let Promise = require('bluebird');
    let request = Promise.promisify(require('request'));

    let myUrls = ['http://...', 'http://...', 'http://...']; // 1000+ urls here

    async function load() {
      try {
        // map myUrls array into array of request promises,
        // then wait until all request promises in the array resolve.
        // The arrow function stops map's extra (index, array)
        // arguments from being passed on to request.
        let results = await Promise.all(myUrls.map(url => request(url)));

        // `await*` was an early proposal that never became standard;
        // use Promise.all as above.

        // print array of results or use forEach
        // to process / collect them in any other way.
        // With Bluebird 3's default promisify, each result is the
        // response object; the body is in response.body.
        console.log(results);
      } catch (e) {
        console.log(e);
      }
    }

    load();
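On Node.js 8 or later, Bluebird and Babel are no longer needed for this pattern: async/await is built in, and `util.promisify` turns a node-style callback function into one that returns a native promise. The sketch below swaps in a hypothetical `fakeRequest` for the request module so it runs offline. `util.promisify` resolves with only the first success value (the response), so the body is read from `response.body`, and `Promise.all` rejects as soon as any request fails.

```javascript
const util = require('util');

// Hypothetical stand-in for the `request` module: node-style callback with
// (error, response, body), where response.body also holds the body.
function fakeRequest(url, callback) {
  setTimeout(() => {
    const body = '<html>' + url + '</html>';
    callback(null, { statusCode: 200, body: body }, body);
  }, 5);
}

// util.promisify resolves with the first success value only: the response.
const requestAsync = util.promisify(fakeRequest);

async function load(urls) {
  // The arrow function keeps map's (value, index, array) arguments
  // from being forwarded to the request function.
  const responses = await Promise.all(urls.map((url) => requestAsync(url)));
  return responses.map((response) => response.body);
}

// Placeholder URLs.
const myUrls = ['http://a.example', 'http://b.example'];
const loaded = load(myUrls).then((bodies) => {
  console.log(bodies);
  return bodies;
});
```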

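I'm fairly sure you are seeing the effect of a different bug. The functions you push are not called until async.parallel runs them, but by then the loop has finished and i equals myUrls.length, so every queued function requests myUrls[myUrls.length], which is undefined. Try wrapping each queued function in a closure that captures the current index: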

    var request = require('request'), // https://www.npmjs.com/package/request
        async = require('async');     // https://www.npmjs.com/package/async

    var myLoaderQueue = []; // passed to async.parallel
    var myUrls = ['http://...', 'http://...', 'http://...']; // 1000+ urls here

    for (var i = 0; i < myUrls.length; i++) {
      (function (URLIndex) {
        myLoaderQueue.push(function (callback) {
          // Async http request
          request(myUrls[URLIndex], function (error, response, html) {
            // Some processing is happening here before the callback is invoked
            callback(error, html);
          });
        });
      })(i);
    }

    // The loader queue has been made, now start to process the queue
    async.parallel(myLoaderQueue, function (err, results) {
      // Done
    });
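In ES2015 and later, declaring the loop counter with `let` gives each iteration its own binding of `i`, which does the same job as the `(function (URLIndex) { ... })(i)` wrapper. A minimal sketch, with the HTTP request replaced by a callback that just reports the URL (the URLs are placeholders):

```javascript
const myUrls = ['http://a.example', 'http://b.example', 'http://c.example'];
const myLoaderQueue = [];

for (let i = 0; i < myUrls.length; i++) {
  // With `let`, each iteration has its own `i`, so every queued
  // function sees the index it was created with.
  myLoaderQueue.push(function (callback) {
    callback(null, myUrls[i]); // stand-in for request(myUrls[i], ...)
  });
}

// Run the queue after the loop, as async.parallel would.
const requested = [];
myLoaderQueue.forEach(function (task) {
  task(function (err, url) {
    requested.push(url);
  });
});
console.log(requested); // logs the three URLs in order
```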