Data on a Web page are presented in a particular format, but the semantics, or meaning, must be deduced by the reader. A human being can infer the underlying meaning from the format; a computer cannot. Computers therefore cannot yet manipulate data on the Web the way they manipulate data in a database. The major difference is that the Web has no data schemas.
MetaStudio is the tool for defining data schemas. It provides convenient ways for operators to analyze the format and structure of a target HTML document (for example, by viewing its DOM tree), to define a data schema and specify its attributes, to upload the data schema to the MetaCamp server, and to collaboratively edit an existing data schema. Once MetaStudio has generated data extraction rules from the defined data schemas, DataScraper, another tool in the MetaSeeker toolkit, extracts data from the Web according to those rules.
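The idea of a data schema driving extraction can be illustrated with a minimal sketch. It is not MetaSeeker's actual schema or rule format, which is not described here; the `Field` class, `PRODUCT_SCHEMA`, the `extract` function, and the sample page are all hypothetical names invented for illustration. The sketch assumes a well-formed page fragment so that Python's standard-library parser can read it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical page fragment, kept well-formed so the stdlib XML parser accepts it.
PAGE = """
<html><body>
  <div class="product">
    <h2>Blue Widget</h2>
    <span class="price">19.99</span>
  </div>
  <div class="product">
    <h2>Red Widget</h2>
    <span class="price">24.50</span>
  </div>
</body></html>
"""

@dataclass
class Field:
    """One attribute of the illustrative data schema."""
    name: str              # attribute name in the schema
    path: str              # extraction rule: path relative to the record node
    convert: type = str    # type the extracted text is converted to

# Illustrative schema (an assumption, not MetaStudio's format):
# where each record lives in the DOM tree, and where each attribute lives in a record.
PRODUCT_SCHEMA = {
    "record_path": ".//div[@class='product']",
    "fields": [
        Field("title", "h2"),
        Field("price", "span[@class='price']", float),
    ],
}

def extract(page: str, schema: dict) -> list:
    """Apply the schema's rules to the page and return one dict per record."""
    root = ET.fromstring(page.strip())
    records = []
    for node in root.iterfind(schema["record_path"]):
        record = {}
        for field in schema["fields"]:
            target = node.find(field.path)
            if target is not None and target.text:
                record[field.name] = field.convert(target.text.strip())
            else:
                record[field.name] = None  # attribute missing on this page
        records.append(record)
    return records

rows = extract(PAGE, PRODUCT_SCHEMA)
for row in rows:
    print(row)
```

In this sketch the schema is plain data, so changing which attributes are extracted means editing the schema rather than the extraction code, which is the general property the paragraph above attributes to schema-driven extraction.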
In summary, defining a data schema is the starting point for extracting data from the Web, and this distinguishes MetaSeeker from other data extraction tools. This approach is what allows MetaSeeker to generate all kinds of HTML wrappers without coding, and it makes MetaSeeker well suited to collecting information for competitive intelligence systems.