DataScraper provides a set of facilities for monitoring its working status.
The Output region contains a list of messages and warnings, organized in five columns:
If multiple data schemas have been defined for a theme, DataScraper must select the one that matches the structure of the current page. Several log messages record this selection process. They may appear in the following format:
where:
When a theme has multiple data schemas, log messages like the above do not indicate a fault. Only if every data schema fails to match is the following log message emitted:
where CCC is the number of the page against which the data schemas were tried.
This message means that none of the data schemas matched the structure of the current page, so no data was extracted from it. The status of the SpiderClue record with this id is set to unknownschema. The operator can load the page manually into MetaStudio to analyze its data structure; it may be necessary to define an additional data schema for this theme, using this page as a sample.
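The selection behaviour described above (try each schema in turn, treat a single mismatch as normal, and flag the record only when every schema fails) can be sketched in Python as follows. This is a minimal illustration, not DataScraper's implementation: the classes `DataSchema` and `SpiderClue`, the function `select_schema`, the initial status `"pending"`, and the log message wording are all hypothetical; only the final status value `unknownschema` comes from this manual.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DataSchema:
    # Hypothetical stand-in for a data schema defined in MetaStudio.
    name: str
    matches: Callable[[str], bool]  # predicate: does the page fit this schema?


@dataclass
class SpiderClue:
    # Hypothetical stand-in for a SpiderClue record; "pending" is an assumed initial status.
    id: int
    status: str = "pending"


def select_schema(clue: SpiderClue, page: str,
                  schemas: List[DataSchema], log: List[str]) -> Optional[DataSchema]:
    """Return the first schema matching the page, or None if all fail."""
    for schema in schemas:
        if schema.matches(page):
            return schema
        # One mismatch is expected when a theme has several schemas (not a fault).
        log.append(f"schema {schema.name} does not match page {clue.id}")
    # Every schema failed: extraction is skipped and the record is flagged.
    log.append(f"no data schema matches page {clue.id}")
    clue.status = "unknownschema"
    return None


# Usage: two illustrative schemas, neither of which fits the page below.
schemas = [
    DataSchema("list-page", lambda page: "<table" in page),
    DataSchema("detail-page", lambda page: "<article" in page),
]
log: List[str] = []
clue = SpiderClue(id=42)
chosen = select_schema(clue, "<div>no match</div>", schemas, log)
print(chosen, clue.status)  # None unknownschema
```

Returning `None` rather than raising keeps the "no match" outcome an ordinary result, mirroring the manual's point that the page is simply skipped and the record flagged for the operator.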
Note: In this release, no GUI is provided for querying SpiderClue records whose status is unknownschema. The operator must query the record by its id directly in the MySQL database.
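A query of this kind can be issued from Python through any DB-API 2.0 connection. The sketch below is hedged heavily: the table name `spiderclue` and the columns `id` and `status` are hypothetical and must be checked against the actual DataScraper database, and the helper functions are illustrative only. MySQL drivers such as PyMySQL use `%s` placeholders (the default here); the demo uses an in-memory `sqlite3` database, which uses `?`, purely so the sketch runs without a MySQL server.

```python
import sqlite3
from typing import Any, List, Optional, Sequence

# Hypothetical table/column names; verify against the real DataScraper schema.
TABLE = "spiderclue"


def find_unknown_schema_clues(conn, placeholder: str = "%s") -> List[Sequence[Any]]:
    """List (id, status) rows whose status is 'unknownschema'."""
    sql = f"SELECT id, status FROM {TABLE} WHERE status = {placeholder} ORDER BY id"
    cur = conn.cursor()
    try:
        cur.execute(sql, ("unknownschema",))  # parameterized, never string-formatted values
        return cur.fetchall()
    finally:
        cur.close()


def find_clue_by_id(conn, clue_id: int, placeholder: str = "%s") -> Optional[Sequence[Any]]:
    """Fetch a single (id, status) row by record id, or None if absent."""
    sql = f"SELECT id, status FROM {TABLE} WHERE id = {placeholder}"
    cur = conn.cursor()
    try:
        cur.execute(sql, (clue_id,))
        return cur.fetchone()
    finally:
        cur.close()


# Demo against an in-memory SQLite stand-in (placeholder "?").
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {TABLE} (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(f"INSERT INTO {TABLE} VALUES (?, ?)",
                 [(1, "done"), (2, "unknownschema")])
print(find_unknown_schema_clues(conn, placeholder="?"))  # [(2, 'unknownschema')]
```

Against MySQL, the same functions would be called with a driver connection and the default `%s` placeholder.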
The progress of the data extraction task is shown on the Status panel of DataScraper, which displays the following: