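How can I scrape the text inside the "Additional Information" box of the articles below?
Neither the class nor the id is fixed.
The XPath doesn't seem easy to write either.
Could someone take a look and show me how? Thanks.
Example URL: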
https://www.nature.com/articles/ncomms14317

Test URLs:
https://www.nature.com/articles/ncomms12634
https://www.nature.com/articles/ncomms12600
https://www.nature.com/articles/ncomms12591
https://www.nature.com/articles/ncomms12548
https://www.nature.com/articles/ncomms12532
https://www.nature.com/articles/ncomms12671
https://www.nature.com/articles/ncomms12693
https://www.nature.com/articles/ncomms12545
https://www.nature.com/articles/ncomms12678
https://www.nature.com/articles/ncomms12447
https://www.nature.com/articles/ncomms12477
https://www.nature.com/articles/ncomms12520
https://www.nature.com/articles/ncomms12382
https://www.nature.com/articles/ncomms12599
https://www.nature.com/articles/ncomms12606
https://www.nature.com/articles/ncomms12585
https://www.nature.com/articles/ncomms12587
https://www.nature.com/articles/ncomms12566
https://www.nature.com/articles/ncomms12592
https://www.nature.com/articles/ncomms12615
https://www.nature.com/articles/ncomms12584
https://www.nature.com/articles/ncomms12653
https://www.nature.com/articles/ncomms12469
https://www.nature.com/articles/ncomms12316
https://www.nature.com/articles/ncomms12367
https://www.nature.com/articles/ncomms12539
https://www.nature.com/articles/ncomms12625
https://www.nature.com/articles/ncomms12553
https://www.nature.com/articles/ncomms12552
https://www.nature.com/articles/ncomms12492
https://www.nature.com/articles/ncomms12533
https://www.nature.com/articles/ncomms12588
https://www.nature.com/articles/ncomms12490
https://www.nature.com/articles/ncomms12398
https://www.nature.com/articles/ncomms12371
https://www.nature.com/articles/ncomms12385
https://www.nature.com/articles/ncomms12470
https://www.nature.com/articles/ncomms12512
https://www.nature.com/articles/ncomms12582
https://www.nature.com/articles/ncomms12446
https://www.nature.com/articles/ncomms12518
https://www.nature.com/articles/ncomms12478
https://www.nature.com/articles/ncomms12519
https://www.nature.com/articles/ncomms12312
https://www.nature.com/articles/ncomms12386
https://www.nature.com/articles/ncomms12499
https://www.nature.com/articles/ncomms12449
https://www.nature.com/articles/ncomms12442
https://www.nature.com/articles/ncomms12483


1 reply to this post. Last reply on 2021-8-8 10:52

gz51837844 (administrator), posted on 2021-8-8 10:52:39
In this case you need to write the XPath yourself. An XPath expression can combine @class or @id with the text an element contains:
//*[@class='c-article-section' and contains(./h2,'Additional information')]
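
To try the expression above outside the browser, here is a minimal Python sketch. It assumes the third-party lxml package is installed. The function name `extract_additional_info` and the `SAMPLE_HTML` markup are made up for illustration; the sample only imitates the structure the XPath expects and is not copied from a real article page. Downloading the pages is left out.

```python
from lxml import html

# XPath from the reply above: match any element whose class attribute is
# exactly 'c-article-section' and whose child <h2> contains the heading text.
ADDITIONAL_INFO_XPATH = (
    "//*[@class='c-article-section' "
    "and contains(./h2,'Additional information')]"
)


def extract_additional_info(page_source: str) -> str:
    """Return the text of the matching section, or '' if none matches."""
    tree = html.fromstring(page_source)
    sections = tree.xpath(ADDITIONAL_INFO_XPATH)
    if not sections:
        return ""
    # text_content() joins all descendant text; collapse whitespace runs.
    return " ".join(sections[0].text_content().split())


# Hypothetical, simplified markup standing in for a real article page.
SAMPLE_HTML = """
<html><body>
  <div class="c-article-section" id="Sec1-section">
    <h2 id="Sec1">Introduction</h2><p>Intro text.</p>
  </div>
  <div class="c-article-section" id="additional-information-section">
    <h2 id="additional-information">Additional information</h2>
    <p>Example note about the article.</p>
  </div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_additional_info(SAMPLE_HTML))
```

Two things to keep in mind: `contains()` is case-sensitive, so the heading text must match the page exactly ("Additional information", not "Additional Information"), and `@class='c-article-section'` only matches when that is the element's entire class attribute. If a page carries extra classes on the section, `contains(@class,'c-article-section')` is a looser alternative.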

