Map clues

After selecting a DOM node in the DOM Tree Viewer, choose Map Clue from the right-click pop-up menu to map the node to a clue or to one of the clue's attributes.

A DOM node can be selected in the DOM Tree Viewer in either of two ways:

  • Ordinary selection: Expand the DOM tree level by level until the target node is found.
  • Reverse selection: In MetaStudio's embedded browser, click the target data snippet; the DOM tree is expanded and the corresponding DOM node is selected. This function is disabled by default. To enable it, tick the Reverse Selection checkbox on the toolbar. The browser's mouse-click handler is then overridden by a custom handler that locates the clicked HTML node in the tree.
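Conceptually, reverse selection has to turn a clicked node into the list of tree levels that must be expanded before the node can be selected. The following Python sketch illustrates only that idea; the `Node` class and the `path_to_expand` function are hypothetical and do not describe MetaStudio's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a node in the DOM Tree Viewer."""
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it, so trees can be built inline."""
        child.parent = self
        self.children.append(child)
        return child


def path_to_expand(clicked: Node) -> List[Node]:
    """Return the ancestors of the clicked node, root first.

    Expanding these tree levels in order reveals the clicked node,
    which the viewer can then select.
    """
    chain = []
    node = clicked.parent
    while node is not None:
        chain.append(node)
        node = node.parent
    chain.reverse()
    return chain


# The operator clicks a link in the embedded browser.
html = Node("HTML")
body = html.add(Node("BODY"))
div = body.add(Node("DIV"))
link = div.add(Node("A"))
print([n.name for n in path_to_expand(link)])  # ['HTML', 'BODY', 'DIV']
```

Walking up the parent links and reversing the result is enough here because every node knows its parent; a real DOM exposes the same information through its parent pointers.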


Map clues

Except for clues of type Info Clue, every clue must be mapped to a DOM node before any of its attributes (e.g. its mark) can be mapped. What the mapping means depends on the clue type.
For a clue of type Single Clue, mapping a DOM node to the clue means that a URL is extracted from that fixed position.
For clues of other types, mapping a DOM node to the clue means that clues are extracted, according to type-specific rules, from the scope delimited by that DOM node. The mapping of Marker Clues is described in the following sections; the other types are described in the chapter MetaStudio Senior User's Handbook#Clue Types.
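The difference between the two kinds of mapping can be sketched in Python as follows. The `Node` model and both functions are hypothetical, and the rule "take every link in the scope" is only a placeholder: the actual extraction rules depend on the clue type.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Node:
    """Hypothetical DOM node; text nodes use the tag "#text"."""
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def iter_subtree(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from iter_subtree(child)


def extract_single_clue(node: Node) -> str:
    """Single Clue: the mapped node is itself the <A>, so read its URL."""
    return node.attrs["href"]


def extract_scoped_clues(scope: Node) -> List[str]:
    """Other clue types: look only inside the subtree the mapped node delimits.

    "Take every link" is a placeholder rule for illustration.
    """
    return [n.attrs["href"] for n in iter_subtree(scope)
            if n.tag == "A" and "href" in n.attrs]


nav = Node("DIV", children=[
    Node("A", {"href": "/page/1"}, children=[Node("#text", text="1")]),
    Node("A", {"href": "/page/2"}, children=[Node("#text", text="next >>")]),
])
print(extract_single_clue(nav.children[1]))  # /page/2
print(extract_scoped_clues(nav))             # ['/page/1', '/page/2']
```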

After selecting a DOM node in the DOM Tree Viewer, choose Map Clue >> Clue Mapping from the right-click pop-up menu to open a third-level submenu. This submenu is generated automatically and lists the clues by number, e.g. Clue 1. Click one of them to map the current DOM node to that clue. The mapping status, shown on the second line of the Clue Operations area on the work board, changes from Node: unmapped to Node:xxx, where xxx is the serial number of the DOM node.
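How MetaStudio assigns serial numbers is not described in this section. The Python sketch below assumes one plausible scheme, numbering elements and non-blank text nodes in document order with the standard `html.parser` module, and shows how a serial number would appear in the mapping status line. `SerialNumberer` and `mapping_status` are hypothetical names.

```python
from html.parser import HTMLParser


class SerialNumberer(HTMLParser):
    """Number elements and non-blank text nodes in document order.

    Hypothetical scheme; MetaStudio's actual numbering may differ.
    """

    def __init__(self):
        super().__init__()
        self.nodes = []  # (serial number, description) pairs

    def handle_starttag(self, tag, attrs):
        self.nodes.append((len(self.nodes) + 1, tag.upper()))

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text between tags
            self.nodes.append((len(self.nodes) + 1, "#text " + repr(data)))


def mapping_status(serial=None):
    """Text of the mapping-status line in the Clue Operations area."""
    return "Node: unmapped" if serial is None else f"Node:{serial}"


p = SerialNumberer()
p.feed('<div><a href="/p/2"> next &gt;&gt; </a></div>')
p.close()
for serial, desc in p.nodes:
    print(serial, desc)
print(mapping_status())   # Node: unmapped
print(mapping_status(3))  # Node:3
```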

Not every DOM node can be mapped to every type of clue: a clue of a given type accepts only DOM nodes of certain types. For example, only an HTML <A> element can be mapped to a clue of type Single Clue, whereas any HTML element can be mapped to a clue of any other type. If the operator attempts an invalid mapping, MetaStudio reports it in a pop-up alert window.
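The rule above can be written as a small check. In this Python sketch, `can_map`, `map_clue`, and `MappingError` are hypothetical; the exception merely stands in for MetaStudio's alert window.

```python
class MappingError(Exception):
    """Stands in for the alert window MetaStudio pops up."""


def can_map(tag: str, clue_type: str) -> bool:
    """Apply the mapping rule stated in this section."""
    if clue_type == "Single Clue":
        # The URL is read from the node itself, so it must be an <A>.
        return tag.upper() == "A"
    # Any HTML element may delimit the scope of a clue of another type.
    return True


def map_clue(tag: str, clue_type: str) -> str:
    """Map an element to a clue, or raise MappingError for an invalid pair."""
    if not can_map(tag, clue_type):
        raise MappingError(f"<{tag}> cannot be mapped to a {clue_type}")
    return f"<{tag}> mapped to {clue_type}"


print(map_clue("a", "Single Clue"))
print(map_clue("div", "Marker Clue"))
try:
    map_clue("div", "Single Clue")
except MappingError as err:
    print("alert:", err)
```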



Map marks

For a clue of type Marker Clue, the mark value cannot be entered manually. Instead, use the right-click pop-up menu Map Clue >> Marker Mapping: after selecting a text node in the DOM Tree Viewer, click this menu item to fill the text box with the text node's value. The mark can then be edited as described in the previous section, and the mark matching rule, denoted by an icon, can be changed as needed. At this point the Marker Clue is fully mapped.

Only a text node enclosed in an HTML <A> tag can be mapped to a mark.
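Marker mapping can be pictured as copying two values from the selected text node into the form, after checking that the text sits inside an <A> tag. In this Python sketch, `TextNode`, `MarkerFields`, and `map_marker` are hypothetical; the field names mirror the Marker Row No and Marker Value fields used in the exercise below. Whether only the direct parent or any enclosing element must be an <A> is not stated, so this sketch accepts any enclosing <A>.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TextNode:
    """Hypothetical text node selected in the DOM Tree Viewer."""
    serial: int                  # serial number shown in the tree
    value: str                   # the text, exactly as it appears in the page
    ancestors: Tuple[str, ...]   # enclosing element tags, outermost first


@dataclass
class MarkerFields:
    """The two fields filled in by Map Clue >> Marker Mapping."""
    marker_row_no: int
    marker_value: str


def map_marker(node: TextNode) -> MarkerFields:
    """Fill the marker fields from a text node enclosed in an <A> tag."""
    if "A" not in (tag.upper() for tag in node.ancestors):
        raise ValueError("text node is not enclosed in an <A> tag")
    # The value is copied verbatim, surrounding spaces included;
    # the operator can edit the text box afterwards.
    return MarkerFields(node.serial, node.value)


fields = map_marker(TextNode(1529, " next >> ", ("HTML", "BODY", "DIV", "A")))
print(fields)  # MarkerFields(marker_row_no=1529, marker_value=' next >> ')
```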



Exercises

Following the procedures described in the previous sections, map a clue and its attributes as follows:

  1. Select DOM node No. 1513, which delimits the part of the HTML page that contains the hyperlink for turning to the next page.
  2. Map this node to Clue 1 by clicking the menu item Map Clue->Clue Mapping->s_clue_1.
  3. Select DOM node No. 1529, a text node enclosed in an HTML <A> tag, whose value is next >>.
  4. Map this node to the mark by clicking the menu item Map Clue->Marker Mapping. 1529 is filled into the Marker Row No field and "next >>" into the Marker Value text box.
  5. Delete all leading and trailing spaces around next >> in the text box.
  6. Uncheck the checkbox .
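This section does not explain why step 5 trims the spaces. One plausible reason is sketched below in Python, under an assumed matching rule: the link text on each page is compared with its surrounding whitespace removed, while the mark is used exactly as entered. Under that assumption, an untrimmed mark never matches. The function `find_marked_link` and the example URLs are hypothetical.

```python
from typing import List, Optional, Tuple


def find_marked_link(links: List[Tuple[str, str]], mark: str) -> Optional[str]:
    """Return the href of the first link whose text equals the mark.

    Hypothetical rule: the page text is compared with surrounding
    whitespace removed, while the mark is used exactly as entered.
    """
    for text, href in links:
        if text.strip() == mark:
            return href
    return None


# (text, href) pairs found inside the scope delimited by the mapped node.
links = [("<< prev", "/list?page=1"), ("\n  next >>\n", "/list?page=3")]
print(find_marked_link(links, "next >>"))    # /list?page=3
print(find_marked_link(links, " next >> "))  # None
```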

Note: The DOM node serial numbers you see when doing this exercise may differ from those shown here, because the structure of the target HTML page may have changed.

Note: Changes in the serial numbers are usually not a concern. The serial numbers are not recorded in the data schema or in the data and clue extraction rules, so such changes generally do not affect the validity of those rules. In most cases, if you upload a data schema to the MetaCamp server and download it to edit in MetaStudio again long afterwards, you will see different serial numbers, and MetaStudio handles the changes normally. This is not always the case, however: if the page structure has changed greatly, MetaStudio reports that it cannot locate some properties, and you must re-map them.
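The following Python sketch illustrates why serial numbers are fragile. It assumes nodes are numbered in document order; a single element inserted near the top of a page then shifts the number of every later node, while a lookup by content still finds the same node. The tree model and the helpers `number_nodes` and `serial_of` are hypothetical, and what MetaStudio records in place of serial numbers is not described in this section.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical model: an element is (tag, [children]); a text node is a str.
Tree = Union[str, Tuple[str, list]]


def number_nodes(tree: Tree) -> List[Tuple[int, str]]:
    """Serial-number every node in document (pre-order) order."""
    out = []

    def walk(node):
        if isinstance(node, str):
            out.append((len(out) + 1, node))
        else:
            tag, children = node
            out.append((len(out) + 1, tag))
            for child in children:
                walk(child)

    walk(tree)
    return out


def serial_of(tree: Tree, label: str) -> Optional[int]:
    """Find a node by its label and report its current serial number."""
    for serial, name in number_nodes(tree):
        if name == label:
            return serial
    return None


before = ("BODY", [("DIV", [("A", ["next >>"])])])
after = ("BODY", [("P", ["New banner"]), ("DIV", [("A", ["next >>"])])])
print(serial_of(before, "next >>"))  # 4
print(serial_of(after, "next >>"))   # 6
```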