Saturday, 2 April 2011

Introduction:
Endeca is a powerful tool that lets developers and businesses create an index of data from a range of sources. The sources include XML, databases, flat delimited text files, etc.

Terminology:
  • Pipeline - a combination (flow) of data handling components provided by Endeca to create and deploy an index.
  • Dimensions - any taxonomy used for search is called a dimension in Endeca.
  • Properties
  • Dgraph or Agraph - the Endeca engine processes that serve queries against the index; an Agraph coordinates several Dgraphs. I will talk about the Dgraph.
  • Forge - a process (program) responsible for gathering data from the sources.
  • Dgidx or Agidx - a process (program) which creates the index from the input data it gets from the other components in the pipeline.
Building Blocks:
I will describe all the components used within Endeca when creating an index. The basic logical component of the whole process is a pipeline.

Components in the pipeline:
1. Adapters
2. Manipulators
3. Property Mapper
4. Record Assembler
5. Utility Components

Most pipelines use only the first four components. I will describe each component in the following sections.
1. Adapters

Adapters load data from a source and write it to a destination. There are four types of adapter.

1. Dimension Adapter - loads the taxonomy (dimension hierarchy) into the pipeline.

2. Record Adapter - loads the input records (the data to be indexed). It can load data in various formats, e.g. delimited, ODBC, JDBC etc.

3. Indexer Adapter - saves the data in a “ready to be indexed” form. It takes the records, the dimension hierarchy and the index configuration information from the pipeline and combines everything into a format that Dgidx or Agidx can use.

4. Update Adapter - handles partial updates, i.e. the information that has changed in the source data. It performs a live update on the index.
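
To make the idea of a record adapter more concrete, here is a minimal Python sketch of what "loading delimited input into records" amounts to. It is not Endeca code: the function name `load_delimited_records`, the `|` delimiter and the sample fields are all assumptions made for illustration. In Endeca itself the adapter is configured in the pipeline rather than written by hand.

```python
import csv
import io


def load_delimited_records(text, delimiter="|"):
    """Parse delimited text (first row = field names) into a list of dict records.

    Hypothetical helper: it only mimics the role of a record adapter.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# Made-up sample data standing in for a flat delimited source file.
sample = "id|name|price\n1|Red Wine|12.50\n2|White Wine|9.00\n"
records = load_delimited_records(sample)

for record in records:
    print(record)
```

Each record is simply a set of name/value pairs, which is the shape the later components in the pipeline work on.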

2. Manipulators

Manipulators change the data associated with Endeca Records (to be continued…)

3. Property Mapper

As the name suggests, the property mapper maps the properties of the records in the source data to Endeca properties or dimensions.
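
The mapping step can be pictured roughly as follows. This is only a sketch in Python, not the Endeca property mapper: the `MAPPINGS` table, the target names such as `P_Name` and the choice to drop unmapped fields are assumptions made for this example.

```python
# Hypothetical mapping table: source field -> (kind, target name).
MAPPINGS = {
    "name": ("property", "P_Name"),
    "price": ("property", "P_Price"),
    "category": ("dimension", "Category"),
}


def map_record(source_record, mappings=MAPPINGS):
    """Split a source record into mapped properties and dimensions."""
    mapped = {"properties": {}, "dimensions": {}}
    for field, value in source_record.items():
        if field not in mappings:
            continue  # unmapped source fields are dropped in this sketch
        kind, target = mappings[field]
        bucket = "properties" if kind == "property" else "dimensions"
        mapped[bucket][target] = value
    return mapped


source = {"name": "Red Wine", "price": "12.50", "category": "Wine", "internal_code": "x9"}
result = map_record(source)
print(result)
```

Here `name` and `price` end up as properties, `category` ends up as a dimension, and `internal_code` has no mapping and so does not appear in the result.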

4. Record Assembler

If we have more than one data source, this component merges the data from all the sources.
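
As a rough picture of what merging two sources means, the Python sketch below joins records from two sources on a shared key. It is an illustration only: the function `assemble`, the key name `id` and the choice of a left join (keep every record from the first source, add matching data from the second) are assumptions for this example, not a description of how the Record Assembler is implemented.

```python
def assemble(primary, secondary, key):
    """Left-join two lists of dict records on a shared key field."""
    lookup = {rec[key]: rec for rec in secondary}
    merged = []
    for rec in primary:
        combined = dict(rec)  # copy so the input records stay untouched
        combined.update(lookup.get(rec[key], {}))  # add fields from the match, if any
        merged.append(combined)
    return merged


# Made-up records from two hypothetical sources.
products = [{"id": "1", "name": "Red Wine"}, {"id": "2", "name": "White Wine"}]
prices = [{"id": "1", "price": "12.50"}]

assembled = assemble(products, prices, key="id")
print(assembled)
```

The first product picks up its price from the second source, while the second product has no match and passes through unchanged.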

5. Utility Components

They help to perform basic tasks, e.g. logging and caching. These include:

Dimension Servers - work with Dimension Adapters and serve as a central source of dimension information for pipelines

Record Caches - save a temporary copy of the data read by a Record Adapter

Spiders - crawl document hierarchies on a file system or over HTTP or HTTPS


Dimension Types
internal - created manually by the developer
propmapper - created by Developer Studio during property mapping
autogen - created automatically by Developer Studio

Note: Not every component is used in every pipeline.

Building a Pipeline:

I will run the steps after installation of Endeca. The.......[to be continued]