With FreeFormat implemented, defining data extraction rules has become much simpler. If the target Web page natively has a formal semantic structure, e.g. a page with embedded microformats, only a few mouse clicks are needed to define the data extraction rules. At the same time, the performance and robustness of the MetaSeeker toolkit have improved sharply. As a result, the FreeFormat bucket has completely replaced ListBucket in the current release, V3.1.0.
A FreeFormat bucket has a tree-like structure. Each tree node represents a property and falls into one of the following two categories:
Replica has been available since MetaStudio V2.x for extracting multiple instances, e.g. multiple products on a product catalog page. In this example, two sibling products, usually the 1st and 2nd, are mapped to the same property. MetaStudio then calculates the duplication parameters, i.e. the start point and the period, which are used to generate data extraction rules. This approach is cumbersome because every property has to be mapped twice, and it is limited to extracting two-dimensional tables. From V3.x on, replica has been optimized in the FreeFormat bucket: when multiple instances are to be extracted with the replica approach, only the container nodes, not every property, need to be mapped twice. In addition, not only two-dimensional tables but also trees can be extracted with the help of the FreeFormat bucket, as described in the following chapters.
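The start point and period used by the V2.x replica approach can be illustrated with a small sketch. This is not MetaStudio's actual algorithm or API. The function names, the flattened node list, and the sample data are all hypothetical, and the sketch assumes the repeated instances sit at evenly spaced positions among sibling nodes.

```python
from typing import List, Tuple


def duplication_parameters(first: int, second: int) -> Tuple[int, int]:
    """Derive (start, period) from the positions of two mapped sibling instances."""
    if second <= first:
        raise ValueError("the second instance must come after the first")
    return first, second - first


def replica_positions(start: int, period: int, total: int) -> List[int]:
    """List every position that repeats from `start` with the given period."""
    return list(range(start, total, period))


# Hypothetical flattened child nodes of a product catalog container.
nodes = ["banner", "Lamp", "$20", "Desk", "$150", "Chair", "$45"]

# The "name" property is mapped on the 1st and 2nd products (positions 1 and 3).
name_start, name_period = duplication_parameters(1, 3)
names = [nodes[i] for i in replica_positions(name_start, name_period, len(nodes))]

# In the V2.x style, the "price" property must be mapped twice as well.
price_start, price_period = duplication_parameters(2, 4)
prices = [nodes[i] for i in replica_positions(price_start, price_period, len(nodes))]

print(list(zip(names, prices)))
```

Each property needs its own pair of mappings here, which reflects the overhead of the V2.x approach. Mapping only the container node twice, as in V3.x, would correspond to deriving a single start point and period for the whole product block.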