Appendix B: Define multiple data schemas

When DataScraper extracts data and clues, it may find no suitable data schema for some pages. In that case, define an additional data schema for the theme. This appendix supplements Phase 2 of Scenario 2. The following steps define multiple data schemas for the theme ComList_mic_en:

  1. Log in to the MySQL server via the command line or phpMyAdmin.
  2. Submit the command "select * from SPIDERCLUE where theme='ComList_mic_en' and status='unknownschema'" to list the clues whose status is unknownschema.
  3. Note down the URLs of these clues.
  4. Load each of these pages into MetaStudio and define an additional data schema for it.
  5. Log in to MySQL again and submit "update SPIDERCLUE set status='start' where ..." to set the status of those clues to start.
  6. Run DataScraper again to extract data and clues.
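The database parts of the steps above (steps 2 and 5) can also be scripted. What follows is a minimal Python sketch. It is not part of DataScraper or MetaStudio. It assumes a PEP 249 (DB-API 2.0) connection to the MySQL server, for example one opened with a driver such as PyMySQL, which accepts %s placeholders. The helper names find_unknownschema_clues and reset_clues_to_start are hypothetical. The original leaves the WHERE condition of the update elided, so the sketch takes that condition from the caller. The demo uses an in-memory SQLite table as a stand-in, and its `url` column is a hypothetical placeholder for whichever SPIDERCLUE column actually holds the clue URL.

```python
import sqlite3
from typing import Any, Sequence

# Hypothetical helpers; `conn` is assumed to be any PEP 249 (DB-API 2.0) connection.


def find_unknownschema_clues(conn, theme: str, placeholder: str = "%s") -> list[dict[str, Any]]:
    """Step 2: return the clues of `theme` whose status is 'unknownschema'."""
    sql = (
        "SELECT * FROM SPIDERCLUE "
        f"WHERE theme = {placeholder} AND status = {placeholder}"
    )
    cur = conn.cursor()
    try:
        cur.execute(sql, (theme, "unknownschema"))
        # Key each row by column name so no SPIDERCLUE column has to be guessed.
        names = [col[0] for col in cur.description]
        return [dict(zip(names, row)) for row in cur.fetchall()]
    finally:
        cur.close()


def reset_clues_to_start(conn, condition: str, params: Sequence[Any]) -> int:
    """Step 5: set status='start' on clues matching a caller-supplied WHERE condition.

    `condition` is trusted SQL text (the original elides it); values go in `params`.
    Returns the number of rows the driver reports as changed.
    """
    cur = conn.cursor()
    try:
        cur.execute(f"UPDATE SPIDERCLUE SET status = 'start' WHERE {condition}", tuple(params))
        changed = cur.rowcount
    finally:
        cur.close()
    conn.commit()
    return changed


if __name__ == "__main__":
    # Demo only: in-memory SQLite stands in for MySQL; the `url` column is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SPIDERCLUE (theme TEXT, status TEXT, url TEXT)")
    conn.executemany(
        "INSERT INTO SPIDERCLUE VALUES (?, ?, ?)",
        [
            ("ComList_mic_en", "unknownschema", "http://example.com/a"),
            ("ComList_mic_en", "start", "http://example.com/b"),
        ],
    )
    for clue in find_unknownschema_clues(conn, "ComList_mic_en", placeholder="?"):
        print(clue["url"])  # define a data schema for this page in MetaStudio
    print(reset_clues_to_start(conn, "url = ?", ["http://example.com/a"]), "clue(s) reset")
```

The sketch returns rows as dictionaries keyed by cursor.description, so it does not have to guess the SPIDERCLUE column names. Pass placeholder="?" for sqlite3, and keep the default "%s" for MySQL drivers that use that placeholder style.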