GooSeeker (集搜客) Web Crawler

Title: GooSeeker captures incomplete data: the page has 50 entries, but some runs capture only 3 and others 20-odd

Author: qianlixue123    Time: 2021-2-25 17:09
Title: GooSeeker captures incomplete data: the page has 50 entries, but some runs capture only 3 and others 20-odd
I am collecting data from the People's Daily website: http://data.people.com.cn/rmrb/s?qs=%7B%22cds%22%3A%5B%7B%22cdr%22%3A%22AND%22%2C%22cds%22%3A%5B%7B%22fld%22%3A%22title%22%2C%22cdr%22%3A%22OR%22%2C%22hlt%22%3A%22true%22%2C%22vlr%22%3A%22OR%22%2C%22val%22%3A%22%E8%82%BA%E7%82%8E%2B%E7%96%AB%E6%83%85%22%7D%2C%7B%22fld%22%3A%22subTitle%22%2C%22cdr%22%3A%22OR%22%2C%22hlt%22%3A%22true%22%2C%22vlr%22%3A%22OR%22%2C%22val%22%3A%22%E8%82%BA%E7%82%8E%2B%E7%96%AB%E6%83%85%22%7D%2C%7B%22fld%22%3A%22introTitle%22%2C%22cdr%22%3A%22OR%22%2C%22hlt%22%3A%22true%22%2C%22vlr%22%3A%22OR%22%2C%22val%22%3A%22%E8%82%BA%E7%82%8E%2B%E7%96%AB%E6%83%85%22%7D%2C%7B%22fld%22%3A%22contentText%22%2C%22cdr%22%3A%22OR%22%2C%22hlt%22%3A%22true%22%2C%22vlr%22%3A%22OR%22%2C%22val%22%3A%22%E8%82%BA%E7%82%8E%2B%E7%96%AB%E6%83%85%22%7D%5D%7D%5D%2C%22obs%22%3A%5B%7B%22fld%22%3A%22dataTime%22%2C%22drt%22%3A%22DESC%22%7D%5D%7D&tr=A&ss=1&pageNo=1&pageSize=50
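
For readers who want to see what the URL above actually asks for, here is a minimal Python sketch that decodes its `qs` parameter (percent-encoded JSON) together with `pageNo` and `pageSize`. The function name `decode_search_url` and the shortened `sample_url` are illustrative assumptions, not part of GooSeeker or the People's Daily site; the sample keeps only the `title` condition rather than all four fields.

```python
import json
from urllib.parse import urlsplit, parse_qs, urlencode


def decode_search_url(url):
    """Split a search URL into its JSON query and paging parameters."""
    params = parse_qs(urlsplit(url).query)
    return {
        "query": json.loads(params["qs"][0]),
        "page_no": int(params["pageNo"][0]),
        "page_size": int(params["pageSize"][0]),
    }


# Shortened stand-in for the URL in the post (assumption: one condition only).
sample_query = {
    "cds": [{
        "cdr": "AND",
        "cds": [{"fld": "title", "cdr": "OR", "hlt": "true",
                 "vlr": "OR", "val": "肺炎+疫情"}],
    }],
    "obs": [{"fld": "dataTime", "drt": "DESC"}],
}
sample_url = "http://data.people.com.cn/rmrb/s?" + urlencode({
    "qs": json.dumps(sample_query, ensure_ascii=False, separators=(",", ":")),
    "tr": "A", "ss": 1, "pageNo": 1, "pageSize": 50,
})

decoded = decode_search_url(sample_url)
print(decoded["page_size"])                        # requested entries per page
print(decoded["query"]["cds"][0]["cds"][0]["val"])  # search keywords
```

Passing the full URL from the post to `decode_search_url` should show the same structure with four conditions (`title`, `subTitle`, `introTitle`, `contentText`), sorted by `dataTime` in descending order.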

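First I built a sorting box (整理箱) to capture the title, time, summary, and keywords; the test result was OK.
Then I mapped the title as the first item of the sample copy (样例复制), took the title of the second news item, and mapped it as the second item of the sample copy. The test captured only 20-odd entries, not all 50. Switching the positioning to absolute positioning (绝对定位) gave the same result across repeated attempts. Collecting with 20 or 10 entries per page also failed to capture all the data on the page.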
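
To reproduce the 50, 20, and 10 entries-per-page comparison described above, the paging parameters can be rewritten directly in the URL. This is only a sketch: `with_paging` is a hypothetical helper, the `qs=%7B%7D` value in `base` is a placeholder for the real encoded query, and it assumes the site honors `pageNo` and `pageSize` as they appear in the URL of this post.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_paging(url, page_no, page_size):
    """Return a copy of url with pageNo and pageSize replaced."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params["pageNo"] = str(page_no)
    params["pageSize"] = str(page_size)
    return urlunsplit(parts._replace(query=urlencode(params)))


# Placeholder query ("{}" percent-encoded); substitute the real qs value.
base = "http://data.people.com.cn/rmrb/s?qs=%7B%7D&tr=A&ss=1&pageNo=1&pageSize=50"

# One URL per page size tried in the post.
urls = [with_paging(base, 1, size) for size in (50, 20, 10)]
for u in urls:
    print(u)
```

Comparing the number of entries captured against each `pageSize` makes it easier to tell whether the shortfall depends on the page length or on the mapping rule itself.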

Author: maomao    Time: 2021-2-25 17:22
What is the name of the task?
Author: qianlixue123    Time: 2021-2-25 17:26
maomao posted on 2021-2-25 17:22
What is the name of the task?

The rule name is: 新冠肺炎疫情0225

Author: maomao    Time: 2021-2-25 17:41
Last edited by maomao on 2021-2-25 17:43

[attach]13805[/attach]

See the tutorial on collecting list data with positioning and mapping (定位映射采集列表数据).

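Author: qianlixue123    Time: 2021-2-25 17:48
maomao posted on 2021-2-25 17:41
See the tutorial on collecting list data with positioning and mapping (定位映射采集列表数据).

OK, thank you very much! I'll give it a try.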




