Parsing a remote DOM in JS

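I want to fetch the DOM of a remote website and parse it. Ideally I would turn the parsed result into DOM nodes, pick out the elements I need efficiently, and then process them. In other words, I want to cut certain elements out of the retrieved DOM and store them in an array for further manipulation. Is this actually achievable? So far I have this: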

    import request from 'request';

    // Fetch `url` and hand the raw response body to `callback`.
    export default function getBody(url, callback) {
      request(url, (err, res, body) => {
        callback(body);
      });
    }

And in the routes folder:

    import express from 'express';
    import getBody from '../server';

    const router = express.Router();
    const url = 'http://www.google.com';
    let result = {};

    // Fetched once, when this module is loaded.
    getBody(url, response => {
      result = response;
    });

    router.get('/', (req, res, next) => {
      res.render('index', { title: 'Express', data: result });
    });

    export default router;

This code puts the remote page's DOM into my view, but the result comes back as one huge string, which would be a nightmare to work with. I also tried handling it from the front end with the browser-request library, but I could not get the headers to work. It always returns this error:

    No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://localhost:3000' is therefore not allowed access.

What is the best course of action to fetch a remote DOM and parse it in the way described above?

If you are familiar with jQuery, you can use cheerio to traverse the DOM.

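    import request from 'request';
    import cheerio from 'cheerio';

    export default function getBody(url, callback) {
      request(url, (err, res, body) => {
        // Declare `$` explicitly; assigning to an undeclared name throws in module (strict) code.
        const $ = cheerio.load(body);
        // Finds all of the `h2` tags within the `body` markup.
        const headings = $('h2');
        callback(headings);
      });
    }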
