This appendix describes an additional phase that follows phase 1. It is unnecessary in most cases when extracting data from eCommerce sites. The reason for adding this phase is explained in phase 1#What next. In this phase, the clues needed to visit sub-category pages are extracted.
Tools used: MetaStudio, a data schema definition tool.
On MetaStudio's Theme List work board, the theme ComYellowPage_mic_en_l2 is shown in status torecognize (figure 1). Right-click the theme list and choose the pop-up menu item recognize to load a sample page, which MetaStudio selects automatically.
Note: Because a data schema was already defined in phase 1, MetaStudio asks the operator whether the current work board should be cleaned. This must be confirmed before a new data schema can be defined.
Tools used: MetaStudio, a data schema definition tool.
This step can be skipped because the default information is sufficient.
Tools used: MetaStudio, a data schema definition tool.
On the Clue Editor work board, take the following steps to create a clue of type Pattern:
Tools used: MetaStudio, a data schema definition tool.
Operators must tell MetaStudio at which position, or within which scope, one or more clues are to be extracted. This is done by mapping a DOM node that represents the position or scope to the newly created clue. The steps for this task are as follows:
Tools used: MetaStudio, a data schema definition tool.
Take the following steps to map pattern values and to name target themes:
The following figure shows the clue and the pattern after mapping:
Tools used: MetaStudio, a data schema definition tool.
Click the Schema button on the right side of the toolbar to upload the work files.
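The uploaded schema tells the extraction tool where on a page the clues are located. To illustrate what mapping a DOM node as the scope of a clue means, the Python sketch below collects links only from inside one chosen element. It is not MetaStudio's or DataScraper's implementation. The sample page, the element id `subcategories`, and the names `ScopedLinkCollector` and `extract_clues` are all hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they are ignored for depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ScopedLinkCollector(HTMLParser):
    """Collects href values of <a> tags nested inside one scope element."""

    def __init__(self, scope_id):
        super().__init__()
        self.scope_id = scope_id  # hypothetical id of the DOM node used as scope
        self.depth = 0            # 0 means the parser is outside the scope
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            # Enter the scope when the element with the chosen id opens.
            if attrs.get("id") == self.scope_id and tag not in VOID_TAGS:
                self.depth = 1
            return
        if tag not in VOID_TAGS:
            self.depth += 1
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        # Leaving the scope element brings the depth back to 0.
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


# Made-up sample page: only the links inside "subcategories" are wanted.
SAMPLE_PAGE = """
<div id="header"><a href="/login">Login</a></div>
<ul id="subcategories">
  <li><a href="/category/led-lights">LED Lights</a></li>
  <li><a href="/category/solar-panels">Solar Panels</a><br></li>
</ul>
<div id="footer"><a href="/about">About</a></div>
"""


def extract_clues(html, scope_id):
    """Return the link targets found inside the element with the given id."""
    parser = ScopedLinkCollector(scope_id)
    parser.feed(html)
    return parser.links


clues = extract_clues(SAMPLE_PAGE, "subcategories")
print(clues)
```

Restricting the search to one scope node keeps navigation links in the header and footer out of the result, which is the practical reason for mapping a DOM node to the clue rather than collecting every link on the page.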
Tools used: DataScraper, a Web data and clue extraction tool.
The following steps are taken to extract clues with DataScraper:
The phase described in this appendix is an optional phase between phase 1 and phase 2. Continue with phase 2 to extract the commodity information.