Replica

After defining Buckets and their Properties, MetaStudio operators map the data snippets on the sample page to those Properties. In other words, operators tell the system where each data snippet is found on the page and which Property should store it. MetaStudio then generates the data extraction rules automatically. With these rules, DataScraper can extract exactly one instance of the Bucket from the page.
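To make the mapping concrete, here is a minimal sketch, not MetaStudio's actual rule format. It assumes a Bucket can be modeled as a dictionary from Property names to the locations of data snippets on a well-formed sample page, written in the limited XPath syntax supported by Python's xml.etree.ElementTree. The names `SAMPLE_PAGE`, `BUCKET_PRODUCT` and `extract_one` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical, well-formed sample page; real HTML pages would first have to
# be parsed into a tree.
SAMPLE_PAGE = """
<html><body>
  <div class="product">
    <h2>Red Mug</h2>
    <span class="price">9.99</span>
  </div>
</body></html>
"""

# Hypothetical result of the operator's mapping: Property name -> location of
# the data snippet on the sample page (ElementTree path syntax).
BUCKET_PRODUCT = {
    "name": "./body/div/h2",
    "price": "./body/div/span",
}


def extract_one(page: str, bucket: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Apply one rule per Property to extract a single Bucket instance."""
    root = ET.fromstring(page.strip())
    instance: Dict[str, Optional[str]] = {}
    for prop, path in bucket.items():
        node = root.find(path)
        # Store the snippet's text, or None if the location matched nothing.
        instance[prop] = node.text.strip() if node is not None and node.text else None
    return instance


print(extract_one(SAMPLE_PAGE, BUCKET_PRODUCT))
```

Because every path points at one fixed location, this kind of rule can only ever yield one instance, which is exactly the limitation the next paragraph addresses.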

If there are multiple instances of the Bucket on the target page, for example a product list on an eCommerce site, DataScraper must be told how to extract all of them repeatedly. MetaStudio uses a concept called Replica to meet this requirement.
By default, the primary Replica is created automatically when a Bucket is created. Operators must create the secondary Replica of the Bucket manually and map another group of data snippets to it. The two groups of data snippets must be neighbors in the list; for simplicity, map the first and second rows of the list.
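The source does not describe how MetaStudio turns two neighboring Replicas into a rule for the whole list. The following is a minimal sketch of one way it could be done, assuming each snippet is located by an ElementTree-style path and that the primary and secondary paths of a Property differ only in the position of the row step (for example `tr[1]` versus `tr[2]`). The functions `generalize` and `extract_list` and the sample page are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# One path step such as "tr" or "tr[2]": a tag plus an optional 1-based position.
STEP = re.compile(r"^([\w.-]+)(?:\[(\d+)\])?$")

Rule = Tuple[str, str, int, int, str]  # container, row tag, first row, stride, suffix


def split_step(step: str) -> Tuple[str, int]:
    m = STEP.match(step)
    if m is None:
        raise ValueError(f"unsupported path step: {step!r}")
    return m.group(1), int(m.group(2) or 1)


def generalize(primary: str, secondary: str) -> Rule:
    """Derive a list rule from one Property's paths in two neighboring Replicas."""
    a, b = primary.split("/"), secondary.split("/")
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(a) != len(b) or len(diffs) != 1:
        raise ValueError("paths must differ in exactly one step")
    i = diffs[0]
    (tag_a, pos_a), (tag_b, pos_b) = split_step(a[i]), split_step(b[i])
    if tag_a != tag_b or pos_b <= pos_a:
        raise ValueError("the differing step must be the same tag at a later position")
    return "/".join(a[:i]), tag_a, pos_a, pos_b - pos_a, "/".join(a[i + 1:])


def extract_list(page: str, primary: Dict[str, str],
                 secondary: Dict[str, str]) -> List[Dict[str, Optional[str]]]:
    root = ET.fromstring(page.strip())
    rules = {prop: generalize(primary[prop], secondary[prop]) for prop in primary}
    # Assumption: all Properties of the Bucket share one list container and row tag.
    container_path, row_tag, first, stride, _ = next(iter(rules.values()))
    container = root.find(container_path)
    if container is None:
        return []
    # Walk every row from the primary one onward, stepping by the observed stride.
    rows = container.findall(row_tag)[first - 1::stride]
    results = []
    for row in rows:
        instance: Dict[str, Optional[str]] = {}
        for prop, (_, _, _, _, suffix) in rules.items():
            node = row.find(suffix) if suffix else row
            instance[prop] = node.text.strip() if node is not None and node.text else None
        results.append(instance)
    return results


PAGE = """
<html><body>
  <table>
    <tr><td>Red Mug</td><td>9.99</td></tr>
    <tr><td>Blue Bowl</td><td>12.50</td></tr>
    <tr><td>Green Plate</td><td>7.25</td></tr>
  </table>
</body></html>
"""
primary = {"name": "./body/table/tr[1]/td[1]", "price": "./body/table/tr[1]/td[2]"}
secondary = {"name": "./body/table/tr[2]/td[1]", "price": "./body/table/tr[2]/td[2]"}

for item in extract_list(PAGE, primary, secondary):
    print(item)
```

Mapping the first and second rows keeps the stride at 1, so the comparison of the two Replicas isolates the row step and nothing else; the third row is picked up even though the operator never mapped it.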

In summary, to extract a whole list, operators create the secondary Replica of the Bucket and map a neighboring group of data snippets to it. MetaStudio can then generate data extraction rules that extract the whole list.

Let's look at an example. A blog page contains a single block of the blog owner's personal information and a list of blog entries, so two Buckets are created. For the first Bucket, the operator maps the data snippets only once, to its primary Replica. For the second Bucket, the operator also creates a secondary Replica and maps twice: the first blog entry to the primary Replica and the second entry to the secondary Replica.
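The blog example can be summarized as a small data model. This is a hypothetical sketch, not MetaStudio's internal structure: a Replica is a dictionary from Property names to snippet locations, and a Bucket records a Replica each time the operator completes a mapping pass (a simplification, since MetaStudio creates the primary Replica automatically). The check that every Replica maps exactly the Bucket's Properties is an assumption, and the Property names and paths are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical model: a Replica maps each Property name to the location of
# its data snippet on the sample page (ElementTree-style paths here).
Replica = Dict[str, str]


@dataclass
class Bucket:
    name: str
    properties: List[str]
    replicas: List[Replica] = field(default_factory=list)

    def map_replica(self, mapping: Replica) -> None:
        """Record one mapping pass: the first fills the primary Replica,
        the second fills the secondary Replica."""
        # Assumption: every Replica maps exactly the Bucket's Properties.
        if set(mapping) != set(self.properties):
            raise ValueError(f"{self.name}: a Replica must map {sorted(self.properties)}")
        self.replicas.append(mapping)

    @property
    def extracts_list(self) -> bool:
        # Primary only -> one instance; primary plus secondary -> the whole list.
        return len(self.replicas) >= 2


# Bucket 1: the owner's information appears once, so one mapping pass.
owner = Bucket("owner", ["nickname", "location"])
owner.map_replica({"nickname": "./body/div/h1", "location": "./body/div/p"})

# Bucket 2: blog entries form a list, so the first and second entries are mapped.
entry = Bucket("entry", ["title", "date"])
entry.map_replica({"title": "./body/ul/li[1]/h3", "date": "./body/ul/li[1]/span"})
entry.map_replica({"title": "./body/ul/li[2]/h3", "date": "./body/ul/li[2]/span"})

for bucket in (owner, entry):
    print(bucket.name, len(bucket.replicas), bucket.extracts_list)
# owner 1 False
# entry 2 True
```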