Bucket

You may have a few cases of different sizes in which you store and sort your clothes and other daily necessities. MetaStudio provides similar cases, called Buckets, for classifying the data snippets on a Web page according to their semantics. A bucket contains many shelves, each reserved for data snippets with a specific meaning. MetaStudio's operator designs the structure of the shelves. Once the operator has finished designing all the buckets, they are transformed and stored as a Data Schema. Each bucket is cast into a Bean in the Data Schema, and each shelf of the bucket is mapped to a Property of that Bean, much like a field in a table. In other words, a Data Schema contains many Beans, each of which contains Properties that express the semantics of data snippets. Let's look at this in detail with an example.

Suppose an operator wants to define a Data Schema for a blog site in order to extract information about the blog owner and the blog entries. The operator selects a sample page, loads it into MetaStudio's browser, and creates two buckets. One stores the data snippets located in the owner information area, such as the owner's name, sex, and email address. The other stores the data snippets located in the blog entries area, such as title, publish date, abstract, body, and comment count. As a result, two Beans are cast: Owner and Blog Entry. The former has properties such as name, sex, and email address. The latter has properties such as title, publish date, abstract, body, and comment count.
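
The mapping from buckets and shelves to Beans and Properties in the blog example can be sketched in a few lines of Python. This is only an illustration of the idea, not MetaStudio's own format or API: the class names Shelf and Bucket, the function to_data_schema, and the use of a plain dictionary as the Data Schema are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Shelf:
    """Hypothetical model of one shelf: a slot for snippets with one meaning."""
    name: str


@dataclass
class Bucket:
    """Hypothetical model of a bucket: a named group of shelves."""
    name: str
    shelves: List[Shelf] = field(default_factory=list)


def to_data_schema(buckets: List[Bucket]) -> Dict[str, List[str]]:
    """Cast each bucket into a Bean and map each shelf to a Property.

    The schema is represented here (as an assumption) by a dict that maps
    a Bean name to the list of its Property names.
    """
    return {b.name: [s.name for s in b.shelves] for b in buckets}


# The two buckets the operator creates on the sample blog page.
owner = Bucket("Owner", [Shelf(n) for n in ("name", "sex", "email address")])
entry = Bucket(
    "Blog Entry",
    [Shelf(n) for n in ("title", "publish date", "abstract", "body", "comment count")],
)

schema = to_data_schema([owner, entry])

for bean, properties in schema.items():
    print(bean, "->", ", ".join(properties))
```

Each Bean in the resulting schema corresponds to one bucket, and the order of its Properties follows the order in which the shelves were defined.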