Tools used: DataScraper, a data and clue extraction tool.
DataScraper is integrated with a facility named Index Manager, which manages the Lucene indexing engine in the DataStore server. Through Index Manager you can specify the data schema and property-specific indexing parameters, such as the boost parameter, the key attribute, and the storing switch. Index Manager is sufficient for building most vertical search engines.
The following indexing parameters are set via Index Manager for the ebook search engine (how to operate Index Manager is described in DataScraper User's Guide#Index Manager):
Property | Store Param | Index Param | Boost Param |
content brief | YES | TOKENIZED | 1.0 |
content | YES | TOKENIZED | 1.0 |
title | YES | TOKENIZED | 1.2 |
book page | YES | UNTOKENIZED | 1.0 |
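To make the meaning of the three parameters concrete, the settings in the table can be modelled as plain data. The sketch below is only an illustration: `FieldConfig`, `EBOOK_FIELDS`, `index_terms`, `toy_score` and the sample record are hypothetical names and data, not part of DataScraper, Index Manager or Lucene, and the scoring is a deliberately simplified stand-in for Lucene's real ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldConfig:
    """Hypothetical model of one row of the indexing-parameter table."""
    store: bool      # Store Param: keep the original value so it can be shown in results
    tokenized: bool  # Index Param: TOKENIZED splits the value into terms,
                     # UNTOKENIZED indexes the whole value as a single term
    boost: float     # Boost Param: relative weight of a match in this property


# Values copied from the table; the dictionary itself is an assumption.
EBOOK_FIELDS = {
    "content brief": FieldConfig(store=True, tokenized=True, boost=1.0),
    "content": FieldConfig(store=True, tokenized=True, boost=1.0),
    "title": FieldConfig(store=True, tokenized=True, boost=1.2),
    "book page": FieldConfig(store=True, tokenized=False, boost=1.0),
}


def index_terms(value, config):
    """Return the terms a property value would contribute to the index."""
    if config.tokenized:
        # Naive whitespace tokenizer; a real analyzer does much more.
        return [term.lower() for term in value.split()]
    # Untokenized: the whole value is one term and matches only exactly.
    return [value]


def toy_score(record, keyword):
    """Add up the boost of every property whose terms contain the keyword."""
    score = 0.0
    for name, value in record.items():
        config = EBOOK_FIELDS[name]
        if keyword.lower() in index_terms(value, config):
            score += config.boost
    return score


# Made-up sample record for illustration only.
sample = {
    "title": "Beginning ASP Programming",
    "content brief": "An introduction to ASP",
    "content": "Chapter one covers server pages",
    "book page": "http://example.com/books/asp.htm",
}

if __name__ == "__main__":
    print(index_terms(sample["title"], EBOOK_FIELDS["title"]))
    print(index_terms(sample["book page"], EBOOK_FIELDS["book page"]))
    print(toy_score(sample, "asp"))
```

In this toy model the keyword asp matches the tokenized title and content brief, so the title's higher boost raises the record's score, while the untokenized book page contributes only when the query equals the full URL. That is the usual reason for leaving link-like properties untokenized.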
At this point the ebook search engine is complete. Load the page http://localhost:8080/datastore/searchharvest.htm and enter the keyword asp; many books about ASP should be listed.