After the bucket has been built, MetaStudio must be told which DOM node each property's data will be extracted from. We call this process mapping, or property mapping, to distinguish it from clue mapping, which is described in the next chapter.
To make the data extraction rules more robust against changes in Web page structure, MetaSeeker does not locate a DOM node absolutely. Instead, it uses the following three approaches:
All of the above operations are started by right-clicking in the DOM tree viewer and choosing an item from the pop-up menu. The following paragraphs describe them in detail.
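To show why absolute locating is fragile, here is a small illustration. It uses Python's standard `xml.etree.ElementTree` and two invented versions of a page; the second version adds a banner block at the top. The anchor `id="product"` and the function names are hypothetical and illustrative only. This is not how MetaSeeker stores its extraction rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, version 1.
PAGE_V1 = """<html><body>
<div><span>Header</span></div>
<div id="product"><span>Price:</span><span>42.00</span></div>
</body></html>"""

# Hypothetical page, version 2: a banner <div> has been inserted at the top.
PAGE_V2 = """<html><body>
<div><span>New banner</span></div>
<div><span>Header</span></div>
<div id="product"><span>Price:</span><span>42.00</span></div>
</body></html>"""


def locate_absolute(root):
    """Absolute locating: the 2nd <span> inside the 2nd <div> under <body>."""
    node = root.find("./body/div[2]/span[2]")
    return node.text if node is not None else None


def locate_relative(root):
    """Relative locating: start from an anchor element (id="product")."""
    node = root.find(".//div[@id='product']/span[2]")
    return node.text if node is not None else None


for page in (PAGE_V1, PAGE_V2):
    root = ET.fromstring(page)
    print(locate_absolute(root), locate_relative(root))
```

On version 1 both functions return `42.00`. On version 2 the absolute path lands on the header block and finds nothing. The relative path still reaches the price because it does not depend on how many blocks precede the anchor.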
The Map Data item in the right-click pop-up menu of the DOM tree viewer is built dynamically: each of its submenu items corresponds to a leaf node in the bucket. After selecting a DOM node in the DOM tree viewer, click one of these submenu items to map the node to the corresponding property. The node column in the property edit area then shows the serial number of the DOM node.
A DOM node can be selected in the DOM tree viewer in the following ways:
Text nodes are mapped to properties in most cases, but other nodes, e.g. elements and attributes, can also be mapped. The attributes of the target property determine which node types can be mapped; see the MetaStudio Senior User's Handbook for details. If a mapping is invalid, an alert window pops up showing the reason. If you still cannot find the root cause, please ask the community for help on the MetaSeeker Toolkit forum or contact us directly.
Note: Nested HTML elements may reduce the precision of reverse selection. In some cases the node found is an ancestor of the target node, so the user must verify that the selected node is the intended one. MetaStudio makes this easy: the border of the selected node flashes red three times, so the user only has to watch which element flashes.
If the DOM node from which the data will be extracted has a FreeFormat mark, FreeFormat mapping can be performed from this node to the property. If the node has no FreeFormat mark but one of its ancestors does, the mapping can be performed from that ancestor instead. FreeFormat mapping works whether or not the property is a container node.
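The rule of falling back to the nearest marked ancestor can be sketched as an upward walk through the tree. The sketch makes one loud assumption for illustration only: it treats an `id` or `class` attribute as a FreeFormat mark. This assumption is consistent with the FreeFormat value and Type fields described later, but it is not confirmed here. The function name `nearest_marked` is hypothetical. ElementTree has no parent pointers, so the code builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION (illustrative only): these attributes count as FreeFormat marks.
MARK_TYPES = ("id", "class")

PAGE = """<html><body>
<div id="product"><p><span>42.00</span></p></div>
</body></html>"""


def nearest_marked(root, node):
    """Return (element, mark_type, value) for node or its nearest marked ancestor.

    Returns None if neither the node nor any ancestor carries a mark.
    """
    # ElementTree elements do not know their parents, so build a lookup table.
    parents = {child: parent for parent in root.iter() for child in parent}
    current = node
    while current is not None:
        for mark_type in MARK_TYPES:
            value = current.get(mark_type)
            if value:
                return current, mark_type, value
        current = parents.get(current)  # None once we pass the root
    return None


root = ET.fromstring(PAGE)
span = root.find(".//span")
element, mark_type, value = nearest_marked(root, span)
print(element.tag, mark_type, value)
```

Here the `<span>` holding the data has no mark, so the walk climbs past the unmarked `<p>` and stops at `<div id="product">`.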
Note: The search for FreeFormat marks is limited to a specific scope on the Web page. Every container in the bucket represents a block on the page delimited by its outermost element, and this block is the scope that is searched. If the user maps from a DOM node outside this scope, an alert window pops up when MetaStudio calculates the data extraction rules.
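The scope rule amounts to a subtree-membership test: a mapped node is valid only if it is the container's outermost element or one of its descendants. Below is a minimal sketch with Python's `xml.etree.ElementTree`. The page, the class names and `in_scope` are hypothetical, and the sketch says nothing about how MetaStudio actually performs the check.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the container block is <div class="item">.
PAGE = """<html><body>
<div class="item"><span>Price:</span><b>42.00</b></div>
<div class="footer"><span>Contact</span></div>
</body></html>"""


def in_scope(block, node):
    """True if node is the block element itself or one of its descendants."""
    # Element.iter() yields the element first, then all descendants.
    return any(el is node for el in block.iter())


root = ET.fromstring(PAGE)
block = root.find(".//div[@class='item']")      # outermost element of the container
price = block.find("b")                         # inside the block
contact = root.find(".//div[@class='footer']/span")  # outside the block

print(in_scope(block, price), in_scope(block, contact))
```

Mapping from `contact` would correspond to the out-of-scope case that triggers the alert.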
Select a DOM node using either of the two methods described in the previous section. Right-click in the DOM tree viewer and choose Map FreeFormat from the pop-up menu to open a submenu listing the property names. Click one of the submenu items to map the FreeFormat mark on the selected DOM node to that property. In the bucket structure tree, the FreeFormat and Type fields are then filled with the value and the type of the FreeFormat mark, respectively.
After FreeFormat mapping, data mapping can still be performed on the property. As stated before, FreeFormat marks act as references that help extract data precisely. If data mapping has not been performed, the FreeFormat mapping operation automatically sets the block attribute for the property with the filter Text, which means that all textual content enclosed by this element will be extracted. If this would extract too much irrelevant content, cancel the block attribute and perform exact data mapping instead.
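The difference between the block attribute with the Text filter and exact data mapping can be illustrated as follows. The example uses Python's `xml.etree.ElementTree` on an invented element. `text_filter` is a hypothetical stand-in for the behaviour described above, which is to collect all text enclosed by the element. It is not MetaStudio's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical element marked with a FreeFormat reference.
ELEMENT = ET.fromstring(
    "<div><span>Price:</span><b>42.00</b><i>Free shipping</i></div>"
)


def text_filter(element):
    """Block extraction: all text enclosed by the element, in document order."""
    # itertext() yields the element's text and its descendants' text/tails.
    return " ".join(s.strip() for s in element.itertext() if s.strip())


block_value = text_filter(ELEMENT)     # everything inside the <div>
exact_value = ELEMENT.find("b").text   # exact mapping to a single text node

print(block_value)
print(exact_value)
```

The block value also picks up the label and the shipping note. Exact mapping returns only the price, which is why cancelling the block attribute can be worthwhile.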
As noted before (see the italicized text in the section Steps), the approach to mapping replicas has changed greatly. Currently, only container nodes can be mapped to extract multiple instances. Where FreeFormat marks can be mapped to the container nodes, they are preferred because they make the data extraction rules more robust.
To use replicas to extract multiple instances, first enable them for the specific property by taking the following steps: