DataScraper's position in data extraction tasks

DataScraper works together with other tools from MetaSeeker Toolkits to extract data from the Web. The general steps are as follows:

  1. Define data schemas and the data and clue extraction rules for the target sites with MetaStudio. The rules are stored in data and clue extraction instruction files and uploaded to the DataStore server. At the same time, a data and clue extraction workflow file is generated and uploaded to the DataStore server as well. Please refer to the MetaStudio User's Guide for detailed information.
  2. Run DataScraper to list or query themes. Right-click in the theme list region and choose List from the pop-up menu to list all themes. Themes can also be queried by entering a query condition in the edit box at the bottom of the theme list and pressing RETURN. Either a specific theme name or a string containing the wildcard character "*" is permitted, e.g. "Com", "Com*", "*".
  3. Extract Web data with DataScraper. Right-click in the theme list region, choose Crawl from the pop-up menu, enter the number of clues to be extracted, and submit. DataScraper then runs continuously, crawling the Web and extracting the required data.
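
The wildcard queries in step 2 can be illustrated with a small sketch. This is not DataScraper's own code. It assumes that "*" stands for any run of characters (including none), that every other character must match literally, and that matching is case-sensitive. The function names and the sample theme names are hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Turn a query such as "Com*" into a compiled regular expression.

    Assumption: "*" is the only wildcard; all other characters are literal.
    """
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts), re.DOTALL)

def query_themes(themes, pattern):
    """Return the theme names that match the whole query pattern."""
    regex = wildcard_to_regex(pattern)
    return [name for name in themes if regex.fullmatch(name)]

# Hypothetical theme names, for illustration only.
themes = ["Com", "Computers", "Books"]

for query in ("Com", "Com*", "*"):
    print(query, query_themes(themes, query))
```

Under these assumptions, "Com" selects only a theme with exactly that name, "Com*" selects every theme whose name starts with "Com", and "*" selects all themes, which has the same effect as the List menu item.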