The Web is often said to be exploding, since a great amount of data is added to it every day. Handling that volume calls for automatic methods of manipulating the data, and the phrase Web Automation has emerged to cover these methods and techniques. The term has no strict scope; the following tasks are generally considered to fall within it (the list is not exhaustive):
- automated screen scraping and web scraping
- automated test procedures against web services
- automated HTML Form submission
- automated Web page retrieval by Web spiders, robots, worms, and crawlers
- automated Web page wrappers that convert page contents between different formats
- automated Web data extraction
- automated detection of all stale (broken) links on a target site
- automated information aggregation from multiple sources
- automated recognition of semantic schema or data structures of Web pages
Some implementations are listed in [1].
What is the MetaSeeker toolkit?
The MetaSeeker toolkit provides a series of tools that semantically describe the data schemas of target Web pages, construct Data Schema Specification Files and Data and Clue Extraction Instruction Files, continuously extract information from the Web in bulk, and produce and store Data Extraction Result Files annotated with semantic metadata. All of these activities are necessary for collecting content when building information services.