DataScraper User's Guide v2.0

DataScraper, a Web data extraction tool, uses the data and clue extraction instruction files generated by MetaStudio to extract data from the Web. Before extracting from a page, DataScraper checks the page's structure against the data schema defined in MetaStudio. Data are extracted only if the page is recognized as matching that schema, which keeps the semantics of the extracted data exact. The extracted data are written to Data Extraction Result Files, which are XML documents stored on the DataStore server. DataScraper also provides a GUI-based result manager and an indexing manager for the Lucene v2.3.2 indexing engine.
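
The check-then-extract behaviour described above can be illustrated with a minimal Python sketch. It is not DataScraper's implementation and does not use the MetaStudio file formats. The schema representation (a mapping from field name to an element path), the function name extract_if_recognized, the result element name "extraction", and the sample pages are all assumptions made for illustration. The page is assumed to be well-formed XML so that the standard library parser can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: field name -> ElementTree path that must exist in the page.
# A real MetaStudio data schema is richer than this; this only shows the idea.
SCHEMA = {
    "title": ".//h1",
    "price": ".//span[@class='price']",
}


def extract_if_recognized(page_source, schema):
    """Return an XML result element, or None if the page does not match the schema."""
    page = ET.fromstring(page_source)
    # Structure check: every field in the schema must be locatable in the page.
    nodes = {field: page.find(path) for field, path in schema.items()}
    if any(node is None for node in nodes.values()):
        return None  # page not recognized, so nothing is extracted
    # Extraction: copy the text of each matched node into an XML result document.
    result = ET.Element("extraction")
    for field, node in nodes.items():
        ET.SubElement(result, field).text = (node.text or "").strip()
    return result


# Illustrative pages (assumed well-formed XML).
GOOD_PAGE = "<html><body><h1>Widget</h1><span class='price'>9.99</span></body></html>"
BAD_PAGE = "<html><body><h1>Widget</h1></body></html>"

if __name__ == "__main__":
    good = extract_if_recognized(GOOD_PAGE, SCHEMA)
    print(ET.tostring(good, encoding="unicode"))
    print(extract_if_recognized(BAD_PAGE, SCHEMA))  # None: the price field is missing
```

The point of the design is that a page failing the structure check yields no result at all, rather than a partial record whose fields might be mislabelled.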

DataScraper's core is a proprietary workflow engine driven by the Data and Clue Extraction Workflow Files (DCEWF) generated by MetaStudio.

DataScraper is one of the tools in the MetaSeeker toolkit.