Tag: scraping phantomjs

How to set a time interval between PhantomJS page scrapes: I have written a PhantomJS script that goes through multiple pages. The script works, but I don't know how to set a time interval between scrapes. I tried using setInterval and passing in one item from arrayList roughly every 5 seconds, but it doesn't seem to work; my script keeps breaking. Here is my sample PhantomJS script code, without setInterval: var arrayList = ['string1', 'string2', 'string3'….] arrayList.forEach(function(eachItem) { var webAddress = "http://www.example.com/eachItem" phantom.create(function(ph) { return ph.createPage(function(page) { return page.open(yelpAddress, function(status) { console.log("opened site? ", status); page.injectJs('http://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js', function() { setTimeout(function() { return page.evaluate(function() { //code here for gathering data }, function(result) { return result ph.exit(); }); }, 5000); }); }); […]
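
One way to space out the scrapes is to process the list strictly one item at a time and pause between items, rather than starting every page at once with forEach. The following is a minimal sketch, not the asker's code: sleep, scrapeSequentially, and fakeScrape are hypothetical names, and fakeScrape is only a stand-in for the real PhantomJS work.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds.
function sleep(ms) {
  return new Promise(function (resolve) {
    setTimeout(resolve, ms);
  });
}

// Hypothetical helper: runs `worker` on each item in order and waits
// `delayMs` between consecutive items (no wait after the last one).
async function scrapeSequentially(items, delayMs, worker) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    // The next item is not started until this one has finished.
    results.push(await worker(items[i]));
    if (i < items.length - 1) {
      await sleep(delayMs);
    }
  }
  return results;
}

// Stand-in worker (assumption): a real one would open the page and
// gather data; this one only builds the address for the item.
async function fakeScrape(item) {
  return 'http://www.example.com/' + item;
}
```

Calling scrapeSequentially(arrayList, 5000, fakeScrape) would visit one item roughly every 5 seconds. Unlike setInterval, the delay starts only after the previous item has completed, so slow pages cannot overlap.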

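PhantomJS error: UnhandledPromiseRejectionWarning: My goal is to scrape some data from a website using Node.js. I have already managed to scrape data using only the request package, but the site I want to scrape has dynamic content, and request alone cannot fetch that dynamic data. So I did some research and found that, to achieve this, based on this SO question, I need to install some packages via npm (I don't know whether all three are needed): request, cheerio, phantom. Based on that question, I am using the same code just to understand how it works: myFile.js var phantom = require('phantom'); phantom.create(function (ph) { ph.createPage(function (page) { var url = "http://www.bdtong.co.kr/index.php?c_category=C02"; page.open(url, function() { page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function() { page.evaluate(function() { $('.listMain > li').each(function () { console.log($(this).find('a').attr('href')); }); }, function(){ ph.exit() }); }); }); }); }); But when I try to run it in the terminal with $ node myFile.js, it does not work and keeps giving me this error: (node:6576) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: Unexpected type of parameters. Expecting args to be array. (node:6576) DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code. […]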
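
One likely cause, assuming a newer promise-based release of the phantom npm package is installed, is that phantom.create no longer takes a callback: the function passed in is treated as the args parameter, which is expected to be an array. The sketch below shows the same scrape written against the promise-style API. collectLinks is a hypothetical name, and the phantom module is passed in as a parameter (an assumption made here so the function stands alone); the function returns the links instead of logging them, because console.log inside page.evaluate runs in the page, not in Node.

```javascript
// Sketch only. `phantom` is assumed to be the module returned by
// require('phantom') in a promise-based release of that package.
async function collectLinks(phantom, url, selector) {
  const instance = await phantom.create();
  try {
    const page = await instance.createPage();
    const status = await page.open(url);
    if (status !== 'success') {
      throw new Error('Failed to open ' + url);
    }
    // This function runs inside the page, so it must return its data;
    // plain DOM calls are used here instead of injecting jQuery.
    return await page.evaluate(function (sel) {
      return Array.prototype.map.call(
        document.querySelectorAll(sel),
        function (a) { return a.getAttribute('href'); }
      );
    }, selector);
  } finally {
    // Always shut the PhantomJS process down, even after an error.
    await instance.exit();
  }
}
```

It could be called as collectLinks(require('phantom'), url, '.listMain > li a'), followed by .then(console.log) and a .catch handler so that rejections are no longer left unhandled.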

How to follow document.location.reload in PhantomJS?: I have loaded a page in PhantomJS (used from NodeJS), and the page has a JS function doRedirect() that contains … document.cookie = "key=" + assignedKey document.location.reload(true) I run doRedirect() from PhantomJS like this: page.evaluate(function() { return doRedirect() }).then(function(result) { // result is null here }) I want PhantomJS to follow document.location.reload(true) and return the content of the new page. How can this be done?
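
A possible approach, sketched under the assumption that the page object comes from the promise-based phantom npm package and supports page.on('onLoadFinished', ...) and page.property('content'): register a load listener before triggering the reload, wait for it to fire, then read the page content. runAndWaitForReload and its timeoutMs parameter are hypothetical names introduced for illustration.

```javascript
// Sketch only. `page` is assumed to be a page object from the
// promise-based phantom package; `fnInPage` runs inside the page.
async function runAndWaitForReload(page, fnInPage, timeoutMs) {
  let onLoaded;
  const loaded = new Promise(function (resolve) {
    onLoaded = resolve;
  });

  // Register the listener first so the reload cannot be missed.
  await page.on('onLoadFinished', function (status) {
    onLoaded(status);
  });

  // Trigger the reload; the evaluate result itself is not needed.
  await page.evaluate(fnInPage);

  // Give up if the reload never completes.
  let timer;
  const timeout = new Promise(function (resolve, reject) {
    timer = setTimeout(function () {
      reject(new Error('Timed out waiting for reload'));
    }, timeoutMs);
  });

  let status;
  try {
    status = await Promise.race([loaded, timeout]);
  } finally {
    clearTimeout(timer);
  }
  if (status !== 'success') {
    throw new Error('Reload finished with status: ' + status);
  }
  // Content of the page as it is after the reload.
  return page.property('content');
}
```

With the question's page, this might be called as runAndWaitForReload(page, function () { return doRedirect(); }, 10000).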