import scrapy


class ManagerspiderSpider(scrapy.Spider):
    name = 'managerspider'
    # allowed_domains = ['www.xxx.com']
    start_urls = ['https://www.howbuy.com/fund/manager/']

    def parse(self, response):
        # Absolute XPath to the fund-manager table; this currently returns an empty list
        tr_list = response.xpath('/html/body/div[2]/div[3]/div[3]/table//tr[3]')
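The tr_list obtained here is empty. I searched online, and the explanation I found was that the browser automatically inserts <tbody>. But I printed the HTML that Scrapy actually downloaded: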
<table width="100%" border="0" cellspacing="0" cellpadding="0" class="chart-table ftArial ">
  <thead>
    <tr>
      <th width="8%" rowspan="2">名次</th>
      <th width="10%" rowspan="2">基金经理</th>
      <th width="9%" rowspan="2">姓名</th>
      <th width="8%" rowspan="2">
        <a href="javascript:void(0)" class="orderField" target="_self" data-value="rqzs">人气指数<span class="ico_default"></span></a>
      </th>
      <th width="9%" rowspan="2">
        <a href="javascript:void(0)" class="orderField" target="_self" data-value="cyrq">从业时间<span class="ico_default"></span></a>
      </th>
      <th width="10%" rowspan="2">当前所在公司</th>
      <th width="10%" rowspan="2">
        <a href="javascript:void(0)" class="orderField" target="_self" data-value="jdjf">综合评分<span class="ico_default"></span></a>
      </th>
      <th width="12%" rowspan="2">最擅长的基金类型</th>
      <th colspan="2">代表基金</th>
    </tr>
    <tr>
      <th width="14%">基金简称</th>
      <th width="10%">
        <a target="_self" class="orderField" href="javascript:void(0)" data-value="syl">近3月收益<span class="ico_default"></span></a>
      </th>
    </tr>
  </thead>
  <tbody></tbody>
</table>
The <tbody> in that output really is empty, yet the table on the live web page clearly contains data.
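One way to confirm this outside the browser is to count the rows in each table section of the HTML that Scrapy received. The sketch below uses only the standard-library html.parser; the RowCounter class and count_rows function are hypothetical helpers written for illustration, not part of Scrapy. If the tbody count comes back as 0 while the browser shows rows, the rows are not in the server's response at all, so no XPath (with or without tbody) can match them. A common cause is that the page fills the table in afterwards with JavaScript, which Scrapy does not execute; that is an assumption worth checking in the browser's network tab.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Hypothetical helper: count <tr> elements inside <thead> and <tbody>."""

    def __init__(self):
        super().__init__()
        self.section = None  # currently open section: 'thead', 'tbody', or None
        self.counts = {"thead": 0, "tbody": 0}

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase
        if tag in ("thead", "tbody"):
            self.section = tag
        elif tag == "tr" and self.section is not None:
            self.counts[self.section] += 1

    def handle_endtag(self, tag):
        # Leave the section when its closing tag is seen
        if tag == self.section:
            self.section = None


def count_rows(html: str) -> dict:
    """Return the number of <tr> rows found in <thead> and in <tbody>."""
    parser = RowCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```

Inside parse, calling count_rows(response.text) on this page would be expected to report 2 header rows and 0 body rows, matching the printed output above.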
Could someone more experienced explain what is going on?
5 replies to this post; last reply on 2021-11-5 15:38