DataScraper User's Guide v3.x

DataScraper is a Web data extractor. It consumes the Data and Clue Extraction Instruction Files generated by MetaStudio to extract data from a group of Web pages that belong to the same theme. DataScraper stores the extraction results as XML files in a local folder named DataScraperWorks, which is located in the user's home directory.

DataScraper is one of the tools in the MetaSeeker toolkit.

This version has been simplified compared with the previous version, V2.x. V2.x users do not need to read the whole guide again; they need only pay attention to the following changes.

  • The extraction results are stored locally in the folder ${HOME}/DataScraperWorks. For example, for the MS Windows user xpuser, the results are stored in the folder C:\Users\xpuser\DataScraperWorks\. Inside this folder there is one sub-folder per theme, named after the theme.
  • The facility provided by DataScraper to manage the extraction results stored on the DataStore server has been removed.
  • The facility provided by DataScraper to control Lucene indexing has also been removed.
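
Because the results are now plain XML files in a local folder, they can be inspected with any scripting language. The following is a minimal Python sketch, not part of DataScraper itself, that lists the XML files found under each theme sub-folder. The function name list_results is hypothetical, and the sketch assumes that result files carry an .xml extension and may sit at any depth inside a theme sub-folder; adjust it if your layout differs.

```python
from pathlib import Path
from typing import Dict, List, Optional


def list_results(root: Optional[Path] = None) -> Dict[str, List[Path]]:
    """Map each theme sub-folder name to the XML files found beneath it."""
    # Default location described in this guide: <home>/DataScraperWorks
    if root is None:
        root = Path.home() / "DataScraperWorks"

    results: Dict[str, List[Path]] = {}
    if not root.is_dir():
        # Nothing has been extracted yet, or the folder lives elsewhere.
        return results

    for theme_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Assumption: results use the .xml extension; search recursively
        # because the exact layout inside a theme folder is not specified here.
        results[theme_dir.name] = sorted(theme_dir.rglob("*.xml"))
    return results


if __name__ == "__main__":
    for theme, files in list_results().items():
        print(f"{theme}: {len(files)} XML file(s)")
```

Run without arguments, the script prints one line per theme with the number of XML files it found. Pass a different Path to list_results to inspect a copy of the folder stored elsewhere.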