Download a web page's source to a file with JavaScript and Grunt

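I have a problem with a task and a question about it. I am writing an application with GruntJS, and I need to download a web page's source code through GruntJS.

For example, I have a page: example.com/index.html .

I want to give the URL in a Grunt task, for example: src: "example.com/index.html" .

Then I need to end up with that source in a file, e.g. source.txt .

How can I do this?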

There are a couple of ways to do this.

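The first is the raw http.get from the Node.js API, as mentioned in the comments. It gets you the raw source that the server delivers on the initial page load. The problem arises when the site relies heavily on JavaScript to build more HTML after Ajax requests, because that HTML is not part of the initial response.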

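The second way is to load the site in a real browser engine, which executes any JavaScript and runs whatever further HTML construction happens after the page loads. The most common engine for this is PhantomJS, which is wrapped in a Grunt library called grunt-lib-phantomjs.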
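As a rough illustration of the second approach, here is a standalone PhantomJS script, assuming PhantomJS's own webpage, fs and system modules (these are PhantomJS modules, not Node's). It is not how grunt-lib-phantomjs or grunt-html-snapshot work internally, and the file name snapshot.js, the snapshot helper and the fixed 1000 ms wait are all hypothetical choices. The wait is only a guess at how long the page's scripts need to finish; a page with slow Ajax calls may need more.

```javascript
// snapshot.js: minimal PhantomJS sketch.
// Usage: phantomjs snapshot.js <url> <dest>
// Loads the page in a real browser engine, waits so that scripts and Ajax
// calls can run, then writes the resulting DOM as HTML to a file.

// Hypothetical helper, written against PhantomJS's `page` and `fs` objects
// (passed in as parameters so it can also be exercised with stand-ins).
function snapshot(page, fs, url, dest, waitMs, done) {
  page.open(url, function (status) {
    if (status !== 'success') {
      done(new Error('Could not load ' + url));
      return;
    }
    // Give client-side scripts time to build the rest of the HTML.
    setTimeout(function () {
      // page.content is the current DOM serialized as HTML.
      fs.write(dest, page.content, 'w');
      done(null);
    }, waitMs);
  });
}

// Entry point: only runs under PhantomJS, where `phantom` is a global.
if (typeof phantom !== 'undefined') {
  var args = require('system').args; // args[0] is the script name
  if (args.length < 3) {
    console.log('Usage: phantomjs snapshot.js <url> <dest>');
    phantom.exit(1);
  } else {
    snapshot(require('webpage').create(), require('fs'), args[1], args[2], 1000, function (err) {
      if (err) {
        console.log(err.message);
      }
      phantom.exit(err ? 1 : 0);
    });
  }
}
```

It would be run as phantomjs snapshot.js http://example.com/index.html source.txt. Unlike the http.get approach, the saved file reflects the DOM after the page's own JavaScript has had time to run.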

Fortunately, someone has already built another layer on top of this that does almost exactly what you are asking for: https://github.com/cburgdorf/grunt-html-snapshot

Example configuration from the link above:

    module.exports = function (grunt) {
      grunt.initConfig({
        htmlSnapshot: {
          all: {
            options: {
              // that's the path where the snapshots should be placed;
              // it's empty by default, which means they will go into the directory
              // where your Gruntfile.js is placed
              snapshotPath: 'snapshots/',
              // This should be either the base path to your index.html file
              // or your base URL. Currently the task does not use its own
              // webserver. So if your site needs a webserver to be fully
              // functional, configure it here.
              sitePath: 'http://localhost:8888/my-website/',
              // you can choose a prefix for your snapshots;
              // by default it's 'snapshot_'
              fileNamePrefix: 'sp_',
              // by default the task waits 500ms before fetching the html.
              // this is to give the page enough time to assemble itself.
              // if your page needs more time, tweak here.
              msWaitForPages: 1000,
              // if you would rather not keep the script tags in the html snapshots,
              // set `removeScripts` to true. It's false by default
              removeScripts: true,
              // here goes the list of all urls that should be fetched
              urls: [
                '',
                '#!/en-gb/showcase'
              ]
            }
          }
        }
      });

      // load the plugin that provides the htmlSnapshot task
      grunt.loadNpmTasks('grunt-html-snapshot');
    };