Data and Clue Extraction Instruction Files are a set of instruction files that MetaStudio produces when data schemas are defined; the data schemas correspond one to one to the sets of instruction files. DataScraper consumes each set of files to extract data and clues from the Web. In effect, each set of files can be viewed as an HTML wrapper. The following sections describe the structure of each of the files:
- Data Extraction Instruction File
- Data Structure Specification File
- Clue Extraction Instruction File
- Data Schema Recognition Rule File
Note: All of the above files are stored in the DataStore server's folder $CATALINE/work/DataStore/context/extraction/config/<theme_name>/. Tomcat's work folder may not be the best place to store them, because the folder may be cleaned when Tomcat is updated. For example, if a MetaCamp server is installed on Fedora Linux and Tomcat is updated via the Yum service, the work folder may be cleaned. To prevent work from being lost, back up the files in advance.
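The backup recommended in the note could be scripted, for example as in the following minimal sketch. It is not part of MetaStudio or DataScraper; the function name `backup_instruction_files` and the backup location are assumptions made for illustration, and the source folder must be the actual instruction-file folder of a theme on your server.

```python
import shutil
import time
from pathlib import Path


def backup_instruction_files(config_dir, backup_root):
    """Copy one theme's instruction-file folder into a timestamped backup folder.

    config_dir  -- the theme folder holding the instruction files (supplied by the caller)
    backup_root -- hypothetical folder outside Tomcat's work folder that receives the copy
    """
    src = Path(config_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"instruction folder not found: {src}")

    # A timestamp in the folder name keeps successive backups apart.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"

    # copytree creates missing parent folders and raises an error if dest
    # already exists, so an earlier backup is never overwritten.
    shutil.copytree(src, dest)
    return dest
```

Calling `backup_instruction_files(theme_folder, "/var/backups/datastore")` before a Tomcat update would leave a dated copy of the theme's files outside the work folder; the `/var/backups/datastore` path is only a placeholder.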