Features

Index Manager provides a GUI for managing and monitoring the Lucene v2.3.2 indexing engine.

Monitor index base

Enter a theme name in the Theme Information region and push the Query button. Information about the index bases is then presented in a pop-up window.

The window is split into five regions:

  • Theme Information region: presents general information on the global index base and the theme specific index base.
  • Index Base region: tells DataStore into which index base, i.e. global or theme specific, the data extraction results should be indexed.
  • Common Fields region: shows the indexing parameters for the common fields in index bases, which are described in detail in the Show common fields section below.
  • Data Schema Properties region: shows the data schema specific properties whose indexing parameters are configurable.
  • Operation button region: provides a set of buttons, which are described in detail in the next chapter.



Show whether data and clue extraction instruction files are hosted on the current DataStore server

On the right side of the second row in the Theme Information region, there is an ownership icon showing whether the instruction files for the current data schema are hosted on the currently connected DataStore server. Operators can set indexing parameters only when the icon shows that they are; otherwise, the settings make no sense.

MetaSeeker Toolkits can be deployed in a distributed manner, so multiple DataStore servers may be installed at different locations. There may even be private servers dedicated to an enterprise.

On MetaStudio's Schema List work board, operators can check where the data and clue extraction instruction files are hosted. If an operator wants to copy a set of instruction files and their owner permits copying, the operator can copy them onto his own DataStore server, as described in detail in MetaStudio Senior User's Handbook#Copy instruction files.



Show whether index instruction files have expired

If the data and clue extraction instruction files have been changed since the index instruction files were created, the first icon in the third row of the Theme Information region changes to show that the index instruction files have expired, because the Properties in the two sets of files are inconsistent. The icon can also show that the expiration status is unknown, because the files are not hosted on this DataStore server.

If the index instruction files have expired, operators should load the indexing parameters into Index Manager and change the settings accordingly. After the Save button is pushed, new index instruction files are created, making the two sets of files consistent again.

Note: In this release, consistency is judged by comparing modification dates. If the data and clue extraction instruction files are re-created without changing the Beans or the Properties, the files are still judged as inconsistent. This can be resolved simply by pushing the Save button of Index Manager.
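The date-based check described in the note can be pictured with a minimal sketch. It is an illustration only, not the actual Index Manager code: the function name, its arguments and the three status strings are hypothetical, and it simply assumes that the files count as expired when the extraction instruction files are newer than the index instruction files.

```python
from datetime import datetime


def expiration_status(hosted_here: bool,
                      extraction_modified: datetime,
                      index_created: datetime) -> str:
    """Classify index instruction files by comparing dates only (hypothetical helper)."""
    if not hosted_here:
        # Files hosted on another DataStore server cannot be compared.
        return "unknown"
    if extraction_modified > index_created:
        # Any later modification counts, even if no Bean or Property changed.
        return "expired"
    return "consistent"
```

Because only dates are compared, re-creating the extraction instruction files with identical content still yields "expired" in this sketch, which matches the behavior the note describes.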

The following table shows what can cause index instruction files to expire and what should be done to resolve it:

Causes | Resolutions
One or more beans are deleted from the data schema. | Because the remaining beans and their properties have not been touched, just push the Save button to resolve the inconsistency.
One or more beans are added into the data schema. | Indexing parameters should be set for the new beans' properties, although the default parameters can be accepted by just pushing the Save button.
One or more properties are added into a bean. | Indexing parameters should be set for the new properties of the bean, although the default parameters can be accepted by just pushing the Save button.
One or more properties are deleted from a bean. | Because the remaining beans and properties have not been touched, just push the Save button to resolve the inconsistency.
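The cases in the table reduce to one question: which properties exist in the changed data schema but not in the old one? The following sketch shows one way to compute that. It assumes, purely for illustration, that a data schema can be represented as a dictionary mapping bean names to sets of property names; the function name is hypothetical.

```python
def new_properties(old_schema: dict, new_schema: dict) -> dict:
    """Return {bean: [properties]} found only in new_schema (hypothetical helper)."""
    added = {}
    for bean, props in new_schema.items():
        # A bean missing from old_schema contributes all of its properties.
        extra = set(props) - set(old_schema.get(bean, ()))
        if extra:
            added[bean] = sorted(extra)
    return added
```

An empty result corresponds to the deletion cases, where pushing Save is enough. A non-empty result lists the properties for which indexing parameters should be reviewed before saving.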

After setting any indexing parameters, push the Save button.

Note: There may be cases where the data schema is changed after some data extraction results have already been indexed. In these cases, indexing can be performed as usual and the index base can still be searched. However, the indexing parameters set for the properties might no longer make sense.



Select index base

There are two types of index bases:

  • Global index base: Each DataStore server has exactly one global index base, into which all extraction results can be indexed.
  • Theme specific index base: Each theme can have its own index base. These index bases are stored in folders named after the themes.

In the Index Base region, there is a group of radio buttons for selecting an index base. The selection takes effect only after the Save button is pushed. Once the selection has been saved, all Index Instruction Files belonging to this theme are changed.

Two icons, labeled global index and theme index, show which index bases the data extraction results are indexed into. Each icon shows whether the corresponding index base has already been created or has not been created yet. For a single theme, some results can be indexed into the global index base while others are indexed into the theme specific one. After one type of index base has been selected, pushing the Build button creates an index base of the corresponding type. If the global index base has already been created, pushing Build again does nothing.



Show common fields

There are a few common fields in every Lucene Document, for which the indexing parameters are not configurable. The value of Store is held in a Lucene Field.Store object and the value of Index is held in a Lucene Field.Index object.

One special common field, the default field, is made up of the text of all of a bean's properties. The field is indexed without storing its content. As a result, when keywords are searched across a whole Lucene Document rather than in a specific field, this field is searched.

Each bean in a data extraction result file is indexed into a Lucene Document.
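The relationship between a bean, its document and the default field can be sketched as follows. This is a plain-Python model, not Lucene code: the field name "default", the space-joined concatenation, and the YES/TOKENIZED settings used for ordinary properties are all assumptions made for illustration.

```python
def bean_to_document(bean: dict) -> dict:
    """Model one bean as one document with an extra default field (illustrative only)."""
    # Assumption: each property becomes a stored, tokenized field.
    doc = {
        name: {"value": str(value), "store": "YES", "index": "TOKENIZED"}
        for name, value in bean.items()
    }
    # The default field gathers the text of every property; it is indexed but not stored.
    doc["default"] = {
        "value": " ".join(str(value) for value in bean.values()),
        "store": "NO",
        "index": "TOKENIZED",
    }
    return doc
```

For example, a bean with the properties title and price produces a document with three fields: title, price, and a default field that can be searched but whose content cannot be displayed.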



Set indexing parameters for properties

The right column of Index Manager is the Data Schema Properties region, where operators can set indexing parameters for each property. When a bean is selected from the pull-down list, its properties are presented. After changing the parameters, push the Save button to re-create the Index Instruction Files. The following parameters can be set:

  • Unique: During indexing, uniqueness is checked against the properties for which this parameter is set. As a result, duplicate beans are discarded. Zero or more properties can be unique.
  • Store: determines whether the content of the field is stored in index bases. It takes one of the following values:
    • YES: The content of the field is stored in index bases.
    • COMPRESS: The content of the field is compressed and stored in index bases. This value is used for large or binary fields, e.g. a fragment of an HTML document.
    • NO: The content of the field is not stored in index bases. Although the field can be searched if it has been indexed, its content cannot be displayed. For example, the default field's Store parameter takes this value.
  • Index: determines how the field is indexed. It takes one of the following values:
    • TOKENIZED: The content of the field is segmented into words. As a result, users can retrieve the document by searching for a keyword appearing in the content.
    • UN_TOKENIZED: The content of the field is not segmented, that is, the content is indexed as a whole. A field acting as a key should take this value. One example is a product's ID on an eCommerce site. As a result, a user can retrieve the document only by searching for the whole content, not by searching for a single word appearing in it.
    • NO: The content of the field is not indexed into index bases. Users cannot retrieve a document by searching for a keyword that appears only in this field.
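  • NO_NORMS: corresponds to Lucene's Field.Index.NO_NORMS, which indexes the content of the field without segmenting it and does not store norms, so index-time boosting and field length normalization are disabled for the field.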
  • Boost: assigns a weight to the field during indexing. The default is 1.0. The larger the value, the higher the document is ranked in the search result list.
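The practical difference between the Index values listed above can be illustrated with a small sketch of the terms each mode would make searchable. It is not Lucene's implementation: the function name is hypothetical, and lowercasing plus whitespace splitting stands in for whatever analyzer is actually configured.

```python
def index_terms(value: str, index_mode: str) -> set:
    """Return the searchable terms a field value yields under each Index mode (sketch)."""
    if index_mode == "TOKENIZED":
        # Assumption: a trivial analyzer that lowercases and splits on whitespace.
        return set(value.lower().split())
    if index_mode == "UN_TOKENIZED":
        # The whole content is a single term, so only an exact match finds it.
        return {value}
    if index_mode == "NO":
        # Nothing is indexed, so the field cannot be searched at all.
        return set()
    raise ValueError("unsupported index mode: " + index_mode)
```

Under this model, a title indexed as TOKENIZED can be found by any single word in it, whereas a product ID indexed as UN_TOKENIZED can be found only by the complete ID.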

The following dependency must be observed:
A unique property should not take NO for the Store parameter and must take UN_TOKENIZED for the Index parameter.
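The rule above, together with the discarding of duplicate beans, can be sketched in a few lines. Both functions are hypothetical helpers, not part of Index Manager. The parameter layout (a dictionary per property with unique, store and index keys) is an assumption, and so is the choice to treat several unique properties as one combined key; the source does not say how multiple unique properties are compared.

```python
def invalid_unique_properties(params: dict) -> list:
    """Return names of unique properties that break the Store/Index rule (sketch)."""
    return sorted(
        name for name, p in params.items()
        if p.get("unique")
        and (p["store"] == "NO" or p["index"] != "UN_TOKENIZED")
    )


def discard_duplicates(beans: list, unique_props: list) -> list:
    """Keep the first bean for each combination of unique property values (sketch)."""
    if not unique_props:
        # No unique properties means no uniqueness check at all.
        return list(beans)
    seen = set()
    kept = []
    for bean in beans:
        # Assumption: all unique properties together form one combined key.
        key = tuple(bean.get(prop) for prop in unique_props)
        if key not in seen:
            seen.add(key)
            kept.append(bean)
    return kept
```

A check like invalid_unique_properties could run before saving, so that a unique property set to Store NO or Index TOKENIZED is reported instead of silently producing an index in which duplicates cannot be detected reliably.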