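I have the following questions:
1. Following the operation example guide on the website, I set up crawling by relative clue, but only the first page is crawled. How should I configure this?
2. After creating a new schema, for example 北京rain3, no corresponding directory or files appear in the local DataScraperWorks directory. How should this be set up?
Thanks! aflooding@gmail.com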
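The theme name of an in-line (线内) pagination clue must not be changed.
If the theme name on the pagination clue is inconsistent, the crawler cannot move on to the next page.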
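I have made the changes as instructed and added FREEFORMAT for the mapping. The schema name is 北京rain_4. Data extraction still fails, with the delay set to 60000. The error messages are as follows: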
2012-08-08 20:48:01 ConfManager parseStopMark INFO: No stopMark
2012-08-08 20:48:05 ConfManager parseStopMark INFO: No stopMark
2012-08-08 20:48:12 ConfManager parseStopMark INFO: No stopMark
2012-08-08 20:48:22 WalkContext:SetCurrentStep 北京rain_4 DEBUG: Null input
2012-08-08 20:48:22 DataScraperEngine CrawlForTheme WARN: Transfer state from 18 to STATE_CRAWL_COUNTED.
2012-08-08 20:48:22 DataScraperEngine handleFetchedTheme DEBUG: SessionTheme has already been loaded
2012-08-08 20:48:22 DataScraperEngine handleFetchedWorkflow DEBUG: workflow has already been loaded
2012-08-08 20:48:22 DataScraperEngine handleFetchedDsdList DEBUG: instruction has already been loaded
2012-08-08 20:48:22 InstructionBag ~InstructionBag INFO: InstructionBag has been released for 北京rain_4/北京rain_4.default.dsd.xml
2012-08-08 20:48:23 FetchSpiderClue:fetchClueCallback 北京rain_4 INFO: Communication cost: 156 microsecond
2012-08-08 20:48:24 DataScraperEngine handleLoadEvent DEBUG: load has been caught
2012-08-08 20:49:24 北京rain_4 ValidateDelayedPage ERROR: Timeout to load the page
2012-08-08 20:49:24 DataScraperEngine:ScheduleProcessor 北京rain_4 DEBUG: suppress all
2012-08-08 20:49:24 DataScraperEngine handleLoadEvent DEBUG: load has been caught
Any advice would be much appreciated!
Thanks!
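A side note on reading the log above: the ERROR entry appears one minute after the last "load has been caught" entry, which would match the configured delay of 60000 if that value is in milliseconds (an assumption, since the post does not state the unit). The short Python sketch below checks that gap from the two relevant log lines. The helper names `parse` and `seconds_before_first_error` are made up for this illustration and are not part of DataScraper.

```python
import re
from datetime import datetime

# Two lines copied from the log in the post above.
LOG = """\
2012-08-08 20:48:24 DataScraperEngine handleLoadEvent DEBUG: load has been caught
2012-08-08 20:49:24 北京rain_4 ValidateDelayedPage ERROR: Timeout to load the page
"""

# timestamp, source (component and function), level, message
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*?) (DEBUG|INFO|WARN|ERROR): (.*)$"
)

def parse(log_text):
    """Return a list of (timestamp, source, level, message) tuples."""
    entries = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append((ts, m.group(2), m.group(3), m.group(4)))
    return entries

def seconds_before_first_error(entries):
    """Seconds between the latest page-load entry and the first ERROR after it."""
    last_load = None
    for ts, _source, level, message in entries:
        if message == "load has been caught":
            last_load = ts
        elif level == "ERROR" and last_load is not None:
            return (ts - last_load).total_seconds()
    return None

entries = parse(LOG)
gap = seconds_before_first_error(entries)
print(gap)  # 60.0
```

If the gap equals the configured delay, the timeout is probably the delay expiring before the page validates, not a separate network failure; this is only a reading of the timestamps, not a confirmed diagnosis.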