If a property with the clue attribute set has been created on the Bucket Editor work board, an Info clue appears on the Clue Editor work board when the GUI focus is moved there. The user must enter a proper target theme name for this clue.
On the Clue Editor work board, pushing the newClue button creates a new clue, which is appended to the end of the pull-down list and is of type Single by default. Pushing the delClue button deletes the current clue from the list.
The following common attributes can be assigned to the clue in the Clue Operation region:
- key: means the clue is a key for validating a data schema. Unlike a key in a relational database, a key clue must exist on the target page; otherwise the page is considered not recognizable against the data schema.
- inthread: means the clue is an in-thread clue.
- clue types: the type is set by pushing the radio button with the corresponding name. After the button has been pushed, the tab window for that type in the Clue Definition region gets focus. The types are:
- Info Clue
- Single Clue
- Marker Clue
- Relative Clue
The clue types are defined in Terms and Abbreviations#Clue. Further information can be found in MetaStudio Senior User's Handbook#Clue Types. In this chapter, only the Info Clue and the Marker Clue are described and used in the example.
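The effect of the key attribute can be illustrated with a minimal sketch. It is not MetaStudio's implementation: the `Clue` class, the `page_is_recognized` function, and the clue names are hypothetical and serve only to show the rule that a page is recognizable against a data schema only if every key clue is found on it.

```python
from dataclasses import dataclass


@dataclass
class Clue:
    """Hypothetical stand-in for a clue; not MetaStudio's actual data model."""
    name: str
    key: bool = False  # corresponds to the "key" attribute


def page_is_recognized(clues, found_names):
    """Return True only if every key clue was found on the target page."""
    return all(clue.name in found_names for clue in clues if clue.key)


# Illustrative clue names (assumptions, not taken from the handbook).
clues = [Clue("company_info", key=True), Clue("next_page")]

print(page_is_recognized(clues, {"company_info"}))  # key clue present
print(page_is_recognized(clues, {"next_page"}))     # key clue missing
```

In this sketch a clue that is not a key may be absent without affecting recognition; only the missing key clue causes the page to be rejected.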
Info Clue has the following specific attributes:
- Property Name: the property from which the clue is extracted. This attribute is not editable and has the form <bean name>.<property name>.
- Target Theme: the theme to which the clue belongs. After entering the name, push the Query button to check whether the theme exists. If it does not, DataScraper and MetaCamp work together to create a new theme with this name while extracting clues from target pages.
Marker Clue has the following specific attributes:
- Marker Row No: the serial number of the HTML DOM node from which the clue is to be extracted. This number is assigned automatically when the Marker clue is mapped.
- Marker Value: shows what the mark is. The mark, a character string, is filled in automatically when the Marker clue is mapped. It may be edited in some cases. For example, after the clue has been mapped, the string "Next page >>" is filled into this field. If the operator wants just the string "Next page", the two ">" signs can be deleted.
- Mark matching rule: On the right side of the first line there is a checkbox, which is checked by default. When it is checked, the mark must be matched in full. When it is unchecked, a clue is recognized if its text contains the mark. Normally, the checkbox should be unchecked if the operator has modified the mark, e.g. by deleting the two ">" signs.
- Target Theme: the theme to which the clue belongs. After entering the name, push the Query button to check whether the theme exists. If it does not, DataScraper and MetaCamp work together to create a new theme with this name while extracting clues from target pages.
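The two states of the mark matching checkbox can be sketched as follows. This is only an illustration under assumptions: the function name `mark_matches` and its parameters are hypothetical, and full matching is modelled as exact string equality, which may differ from how the tool compares text internally.

```python
def mark_matches(node_text, mark, full_match=True):
    """Illustrative model of the mark matching rule (not the tool's code).

    full_match=True  models the checked checkbox: the text must equal the mark.
    full_match=False models the unchecked checkbox: the text must contain it.
    """
    if full_match:
        return node_text == mark
    return mark in node_text


# The mark as filled in by mapping, matched in full.
print(mark_matches("Next page >>", "Next page >>"))
# The operator deleted the two ">" signs but left the checkbox checked.
print(mark_matches("Next page >>", "Next page"))
# Same edited mark with the checkbox unchecked.
print(mark_matches("Next page >>", "Next page", full_match=False))
```

The second call shows why the checkbox should normally be unchecked after the mark has been edited: the shortened mark no longer equals the text of the node.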
Info clues are created on the Bucket Editor work board; all other clues are created by pushing the newClue button on the Clue Editor work board.
Likewise, Info clues are deleted on the Bucket Editor work board; all other clues are deleted by pushing the delClue button on the Clue Editor work board.
Exercises
Following the steps in the previous chapter, move the GUI focus to the Clue Editor work board. An Info clue is shown on the board, numbered Clue 0. Give the clue a target theme name, namely ComPage_en_ali.
Because the company list is paginated, you should define an in-thread clue to page through it.
- Push the newClue button. The newly created clue is automatically numbered Clue 1.
- Tick the in-thread checkbox.
- Push the Marker radio button in the Clue Operation region to set the type of the new clue. As a result, the Marker tab window in the Clue Definition region gets focus. The Target Theme field is already filled with the current theme name because the clue is an in-thread clue.
- Fill in the other fields via the mapping operation, which is described in the next section.