inside MetaSeeker

Data Extraction Workflow File

The Data Extraction Workflow File drives DataScraper's workflow engine. It records, in sequence, the workflow processors required to extract data from the Web pages belonging to a specific theme. It is an XML file whose name ends with .profile.xml. The following is an example:

Data Schema Recognition Rule File

The Data Schema Recognition Rule File, also called a DSD file, verifies whether the structure of a target page matches any data schema belonging to the current theme. Files of this type are stored in the DataStore server's folder $CATALINE/work/DataStore/context/extraction/config/<theme_name>/. Their names end with .dsd.xml.

Clue Extraction Instruction File

The Clue Extraction Instruction File, also called an SCE file, is used by DataScraper to extract clues from target Web pages. Files of this type are stored in the DataStore server's folder $CATALINE/work/DataStore/context/extraction/config/<theme_name>/. Their names end with .sce.xml. The structure of these files is shown as follows:

Data Structure Specification File

The Data Structure Specification File, also called a GEM file, describes the data structure of a data extraction result file. Files of this type are stored in the DataStore server's folder $CATALINE/work/DataStore/context/extraction/config/<theme_name>/. Their names end with .gem.xml. The structure of these files is shown as follows:
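The naming conventions described so far can be summarized in a short sketch. It is only an illustration, not part of MetaSeeker: the function names classify and theme_config_dir are hypothetical, the kind labels are informal, and the base argument simply stands in for the server installation directory that appears at the start of the folder paths above.

```python
from pathlib import PurePosixPath

# Suffixes are taken from the descriptions above; the labels are informal.
SUFFIX_TO_KIND = {
    ".profile.xml": "Data Extraction Workflow File",
    ".dsd.xml": "Data Schema Recognition Rule File (DSD)",
    ".sce.xml": "Clue Extraction Instruction File (SCE)",
    ".gem.xml": "Data Structure Specification File (GEM)",
}


def classify(filename: str) -> str:
    """Return the file kind implied by the name's suffix, or 'unknown'."""
    for suffix, kind in SUFFIX_TO_KIND.items():
        if filename.endswith(suffix):
            return kind
    return "unknown"


def theme_config_dir(base: str, theme_name: str) -> PurePosixPath:
    """Build <base>/work/DataStore/context/extraction/config/<theme_name>.

    `base` is a hypothetical stand-in for the installation directory
    variable used in the folder paths quoted in the text.
    """
    return PurePosixPath(
        base, "work", "DataStore", "context", "extraction", "config", theme_name
    )
```

For instance, classify("news.dsd.xml") reports a DSD file, and theme_config_dir("/opt/tomcat", "news") yields the folder in which the DSD, SCE and GEM files for a theme named "news" would be expected. Both the base path and the theme name here are made up for the illustration.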

Data Extraction Instruction File

The Data Extraction Instruction File, also called a MAP file, is a standard XSLT file. It transforms target Web pages, extracts data from them, and serializes the results into XML files, i.e. Data Extraction Result Files.
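The transform, extract and serialize idea can be illustrated with a small sketch. Note that this is not XSLT and not a real MAP file: it is plain Python using the standard xml.etree.ElementTree module, and the sample page, the class names "item", "title" and "price", and the result element names are all assumptions made up for the illustration.

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed stand-in for a target page. Real Web pages are HTML
# and would normally need an HTML parser before any such processing.
PAGE = """<html><body>
  <div class="item"><span class="title">Widget</span><span class="price">9.50</span></div>
  <div class="item"><span class="title">Gadget</span><span class="price">12.00</span></div>
</body></html>"""


def extract(page_xml: str) -> str:
    """Pull the fields of each item out of the page and serialize them as XML."""
    page = ET.fromstring(page_xml)
    result = ET.Element("result")  # hypothetical root element name
    for div in page.iter("div"):
        if div.get("class") != "item":
            continue  # skip page structure that carries no data
        record = ET.SubElement(result, "item")
        for span in div.findall("span"):
            # Use the span's class as the field name in the result file.
            field = ET.SubElement(record, span.get("class"))
            field.text = span.text
    return ET.tostring(result, encoding="unicode")
```

Calling extract(PAGE) returns an XML string with one item element per matching div, which plays the role of a Data Extraction Result File in this sketch. An XSLT file expresses the same kind of mapping declaratively, with templates in place of the loops.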

Overview of Data and Clue Extraction Instruction Files

Data and Clue Extraction Instruction Files are sets of instruction files. MetaStudio produces them when data schemas are defined, and the data schemas correspond one to one with the instruction files. DataScraper consumes each set to extract data and clues from the Web. In effect, each set of files can be viewed as an HTML wrapper.

Data Schema Specification File

Data Schema Specification Files are produced by MetaStudio when data schemas are defined for Web pages. The files are stored on the MetaCamp server in the folder $CATALINA/work/MetaCamp/context/extraction/meta/<theme_name>.
