Trying to get a song list from a website does not work

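I tried using phantomjs and cheerio in Node, as well as the WebBrowser control, to get my song list. I can fetch the HTML successfully, but the song list is not in it, and I don't understand why.

The only way I have managed so far is to copy the HTML from the developer tools and parse it with jQuery.

Here is my code in WinForms: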

private void Form1_Load(object sender, EventArgs e)
{
    webBrowser1.Navigate("http://grooveshark.com/#!/shinningstar1001/collection");
    webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
}

void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    File.WriteAllText("D://test.txt", webBrowser1.DocumentText);
}

With cheerio:

var cheerio = require('cheerio');
var request = require('request');

var url = 'http://grooveshark.com/#!/shinningstar1001/collection';

request({
    url: url,
    headers: {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
}, function (err, resp, body) {
    $ = cheerio.load(body);
    console.log(body);
})

I guess this is because I cannot get the complete document after the AJAX content has loaded?
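One thing that supports the AJAX guess, at least for the plain HTTP request: everything after `#` in a URL is a fragment, and HTTP clients do not send it to the server. The sketch below uses only Node's built-in `url` module to show which part of the address the server would actually see. The variable names are illustrative, and it assumes (without proof) that the page's own scripts read the fragment and load the songs afterwards.

```javascript
// Sketch: split the page address into the part that is requested over HTTP
// and the fragment, which stays on the client.
const { URL } = require('url');

const pageUrl = new URL('http://grooveshark.com/#!/shinningstar1001/collection');

// An HTTP client requests origin + path + query; the fragment is never sent.
const requested = pageUrl.origin + pageUrl.pathname + pageUrl.search;

console.log(requested);    // http://grooveshark.com/
console.log(pageUrl.hash); // #!/shinningstar1001/collection
```

If that assumption holds, the body returned to `request` is only the site's shell page, so there is no song list in it for cheerio to find.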

But why doesn't the WebBrowser control work either? I can see the full content loaded in the control. Any suggestions would be really appreciated.

I tried @Murray Foxcroft's solution but still cannot get the exact HTML I want.

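Additional question

With @Murray Foxcroft's solution I can get 8% of the list content, but why can't I get the page's complete song list? For example, I can get the song "Set Me Free", which is at about position 40 in the list, but I cannot get the song "This Love", which is at about position 70. (Both songs really are on the site.)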

if (webBrowser1.ReadyState != WebBrowserReadyState.Complete) return;
if (richTextBox1.Text.Length > 0) return;

var songList = webBrowser1.Document.GetElementById("profile-grid");

// Trying to get "This Love": execution never steps into this block.
if (songList != null && songList.InnerHtml.Contains("This Love")) {...}

// "Set Me Free" is OK:
if (songList != null && songList.InnerHtml.Contains("Set Me Free"))
{
    richTextBox1.Text = songList.OuterHtml;
}

For the WebBrowser example, is the event actually firing?

Try attaching the event handler before navigating, i.e. swap the two lines so they read:

webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
webBrowser1.Navigate("http://grooveshark.com/#!/shinningstar1001/collection");

Also, DocumentCompleted may fire for each sub-document (such as a frame or iframe), so make sure you are handling the event for the URL you are after.

void BrowserDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (e.Url.AbsolutePath != (sender as WebBrowser).Url.AbsolutePath)
        return;

    // The page is finished loading
}

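More details here: Detect WebBrowser complete page loading

Final solution: the content is piped into the main page from another source, so looking for the target div is about the best approach: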

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // If the ReadyState is Complete then the page or an iframe within it has completed downloading.
    if (webBrowser1.ReadyState != WebBrowserReadyState.Complete) return;

    // Ensures only the first match of page-content is returned to the RichTextBox.
    // If this does not contain what you are looking for then you may need to find an
    // additional way to refine for the content you are after.
    if (richTextBox1.Text.Length > 0) return;

    // Check to see if we have got the page-content div in our result source
    // and set the RichTextBox if we have it.
    var songList = webBrowser1.Document.GetElementById("page-content");
    if (songList != null)
    {
        richTextBox1.Text = songList.OuterHtml;
    }
}