Saturday, 2 April 2011

Introduction:
Endeca is a powerful tool that lets developers and businesses create an index of data drawn from a range of sources. The sources include XML, databases, flat delimited text files etc.

Terminology:
  • Pipeline - a combination (flow) of data handling components provided by Endeca to create and deploy an index.
  • Dimensions - any taxonomy to be used for search is called a dimension in Endeca.
  • Properties
  • Dgraph or Agraph - names of the indexes which Endeca creates. I will talk about Dgraph.
  • Forge - a process (program) responsible for gathering data from sources.
  • Dgidx or Agidx - a process (program) which creates the index based on the input data it gets from the other components in the pipeline.

Building Blocks:
I will describe all the components used within Endeca when creating an index. The basic logical component of the whole process is a pipeline.

Components in the pipeline:
1. Adapters
2. Manipulators
3. Property Mapper
4. Record Assembler
5. Utility Components

Generally the components used in a pipeline are the first four. I will describe each of them in the following sections.

1. Adapters

Adapters load data from a source and deliver it to a destination. There are the following types of adapters:

1. Dimension Adapter - loads the taxonomy (dimension hierarchy) into the pipeline

2. Record Adapter – loads the input records (the data to be indexed). It can load data in various formats, e.g. delimited, ODBC, JDBC etc.

3. Indexer Adapter – saves the data and makes it “ready to be indexed”. It takes the records, the dimension hierarchy and the index configuration information from the pipeline and combines everything into a format that is ready to be used by Dgidx or Agidx

4. Update Adapter – updates partial information changed in the source data. It performs a live update on the index.
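
To make the Record Adapter idea more concrete, here is a minimal sketch in plain Python of what "loading delimited input records" amounts to. This is not Endeca code and does not use any Endeca API; the function name, the pipe delimiter and the sample fields are all assumptions made up for illustration.

```python
import csv
import io


def load_delimited_records(text, delimiter="|"):
    """Parse delimited text into a list of records (one dict per row).

    Hypothetical helper: it only mimics the role a Record Adapter plays,
    i.e. turning raw source rows into records for later pipeline steps.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# Made-up sample data; the first line holds the column names.
sample = "id|title|category\n1|Red Shirt|Apparel\n2|Blue Mug|Kitchen\n"
records = load_delimited_records(sample)

for record in records:
    print(record)
```

In a real pipeline the source format and delimiter are configured on the adapter itself rather than written in code.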

2. Manipulators

Manipulators change the data associated with Endeca Records (to be continued…)

3. Property Mapper

As is obvious from the name, it maps the fields of the records in the source data to Endeca properties or dimensions.
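
The mapping step can be pictured with a small sketch in plain Python. It is only an illustration of the idea, not how Endeca implements it: the function `map_record`, the two mapping tables and the names `P_Id`, `P_Title` and `Category` are hypothetical.

```python
def map_record(source_record, property_map, dimension_map):
    """Map source fields to target property or dimension names.

    Fields that appear in neither map are dropped in this sketch;
    that is a simplification chosen for the example.
    """
    mapped = {"properties": {}, "dimensions": {}}
    for field, value in source_record.items():
        if field in property_map:
            mapped["properties"][property_map[field]] = value
        elif field in dimension_map:
            mapped["dimensions"][dimension_map[field]] = value
    return mapped


# Made-up source record and mapping tables.
source = {"id": "1", "title": "Red Shirt", "category": "Apparel", "note": "x"}
property_map = {"id": "P_Id", "title": "P_Title"}
dimension_map = {"category": "Category"}

mapped = map_record(source, property_map, dimension_map)
print(mapped)
```

Here `id` and `title` end up as properties, `category` ends up as a dimension value, and the unmapped `note` field is ignored.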

4. Record Assembler

If we have more than one data source, this component merges the data from all the sources.
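
As a rough picture of what merging sources means, the sketch below joins two lists of records on a shared key in plain Python. It is an assumption-laden illustration, not Endeca's implementation: the function `assemble_records`, the `id` key and the sample data are invented, and the sketch keeps every record from the first source (a left-join style merge), which is only one possible way to combine sources.

```python
def assemble_records(primary, secondary, key):
    """Merge records from two sources that share a key field.

    Every primary record is kept; fields from a matching secondary
    record are added to it. Unmatched secondary records are ignored.
    """
    lookup = {rec[key]: rec for rec in secondary}
    merged = []
    for rec in primary:
        combined = dict(rec)
        combined.update(lookup.get(rec[key], {}))
        merged.append(combined)
    return merged


# Made-up data from two hypothetical sources.
products = [{"id": "1", "title": "Red Shirt"}, {"id": "2", "title": "Blue Mug"}]
prices = [{"id": "1", "price": "9.99"}]

merged = assemble_records(products, prices, "id")
print(merged)
```

The second product has no matching price record, so it passes through with only its original fields.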

5. Utility Components

They help to perform basic tasks, e.g. logging and caching. These include:

Dimension Servers – work with the Dimension Adapter and serve as a central source of dimension information for pipelines

Record Caches – save a temporary copy of the data read by a Record Adapter

Spiders – a spider provides the facility to crawl document hierarchies on a file system or over HTTP or HTTPS


Dimension Types:
internal - dimensions which the developer creates manually
propmapper - dimensions which Developer Studio creates during property mapping
autogen - dimensions which Developer Studio creates automatically

Note: Not all components are used in every pipeline.

Building a Pipeline:

I will run through the steps after installing Endeca. The.......[to be continued]




5 comments:

  1. this post is to be continued....

  2. Very good post, esp. for beginners like me. Did you happen to post your next version/posts? Could you please give me some references on Endeca and pipelines? I am new to Endeca and would sincerely appreciate all the help.

  3. Ajsbaby, thanks for your comment. Unfortunately there is not much help available on the internet about Endeca, except some blogs from a few developers who have experience building a pipeline. I will be completing the post soon; hopefully that will help. What I use for help is just the reference documentation which comes with the Endeca installation.
    Regards,

  4. Really, it's very useful... please continue.. waiting for the next post..
