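How can I scrape the text under the "Additional Information" box in the articles listed below?
Neither the class nor the id is fixed,
and writing an XPath expression for it seems tricky.
Could someone take a look and show me how? Thanks.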
Example URL:
https://www.nature.com/articles/ncomms14317

Test URLs:
https://www.nature.com/articles/ncomms12634
https://www.nature.com/articles/ncomms12600
https://www.nature.com/articles/ncomms12591
https://www.nature.com/articles/ncomms12548
https://www.nature.com/articles/ncomms12532
https://www.nature.com/articles/ncomms12671
https://www.nature.com/articles/ncomms12693
https://www.nature.com/articles/ncomms12545
https://www.nature.com/articles/ncomms12678
https://www.nature.com/articles/ncomms12447
https://www.nature.com/articles/ncomms12477
https://www.nature.com/articles/ncomms12520
https://www.nature.com/articles/ncomms12382
https://www.nature.com/articles/ncomms12599
https://www.nature.com/articles/ncomms12606
https://www.nature.com/articles/ncomms12585
https://www.nature.com/articles/ncomms12587
https://www.nature.com/articles/ncomms12566
https://www.nature.com/articles/ncomms12592
https://www.nature.com/articles/ncomms12615
https://www.nature.com/articles/ncomms12584
https://www.nature.com/articles/ncomms12653
https://www.nature.com/articles/ncomms12469
https://www.nature.com/articles/ncomms12316
https://www.nature.com/articles/ncomms12367
https://www.nature.com/articles/ncomms12539
https://www.nature.com/articles/ncomms12625
https://www.nature.com/articles/ncomms12553
https://www.nature.com/articles/ncomms12552
https://www.nature.com/articles/ncomms12492
https://www.nature.com/articles/ncomms12533
https://www.nature.com/articles/ncomms12588
https://www.nature.com/articles/ncomms12490
https://www.nature.com/articles/ncomms12398
https://www.nature.com/articles/ncomms12371
https://www.nature.com/articles/ncomms12385
https://www.nature.com/articles/ncomms12470
https://www.nature.com/articles/ncomms12512
https://www.nature.com/articles/ncomms12582
https://www.nature.com/articles/ncomms12446
https://www.nature.com/articles/ncomms12518
https://www.nature.com/articles/ncomms12478
https://www.nature.com/articles/ncomms12519
https://www.nature.com/articles/ncomms12312
https://www.nature.com/articles/ncomms12386
https://www.nature.com/articles/ncomms12499
https://www.nature.com/articles/ncomms12449
https://www.nature.com/articles/ncomms12442
https://www.nature.com/articles/ncomms12483


1 reply to this post, last reply on 2021-8-8 10:52

Reply #1
gz51837844 (administrator), posted on 2021-8-8 10:52:39
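In this case you need to write the XPath yourself. An XPath predicate can combine a test on @class or @id with a test on the text the element contains: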
//*[@class='c-article-section' and contains(./h2,'Additional information')]
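A minimal sketch of applying that XPath with Python and lxml follows. The HTML below is a hypothetical, simplified stand-in shaped to match the expression, not a copy of a real nature.com page, and the function name `extract_additional_information` is illustrative only. Note that `contains()` is case-sensitive, so the string must match the heading exactly ("Additional information", lowercase "i").

```python
from lxml import html

# Hypothetical, simplified markup shaped to match the XPath in the reply;
# the real nature.com page structure may differ.
SAMPLE_HTML = """
<html><body>
  <div class="c-article-section">
    <h2>Methods</h2>
    <p>Not wanted.</p>
  </div>
  <div class="c-article-section">
    <h2>Additional information</h2>
    <p>Competing interests: none.</p>
    <p>Publisher's note: sample text.</p>
  </div>
</body></html>
"""

# The XPath from the reply: match on the class AND on the <h2> heading text.
XPATH = "//*[@class='c-article-section' and contains(./h2,'Additional information')]"


def extract_additional_information(page_source: str) -> list[str]:
    """Return the text of every <p> inside the matching section(s)."""
    tree = html.fromstring(page_source)
    paragraphs = []
    for section in tree.xpath(XPATH):
        # Only <p> descendants are collected, so the <h2> heading is skipped.
        for p in section.xpath(".//p"):
            text = p.text_content().strip()
            if text:
                paragraphs.append(text)
    return paragraphs


if __name__ == "__main__":
    for line in extract_additional_information(SAMPLE_HTML):
        print(line)
```

For a live page, the downloaded HTML (for example `requests.get(url).text`) would be passed in instead of `SAMPLE_HTML`. If the section's class attribute carries more than one class name, the exact `@class='c-article-section'` comparison will not match; `contains(@class, 'c-article-section')` is a looser alternative.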

