Terminology:
- Pipeline - a combination (flow) of data-handling components provided by Endeca to create and deploy an index.
- Dimensions - any taxonomy to be used for search is called a dimension in Endeca (see the sketch after this list)
- Properties - name/value data attached to Endeca records
- Dgraph or Agraph - the names of the indexes that Endeca creates. I will talk about the Dgraph.
- Forge - a process (program) responsible for gathering data from sources.
- Dgidx or Agidx - a process (program) that creates the index based on the input data it gets from the other components in the pipeline.
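To make the difference between dimensions and properties concrete, here is a minimal Python sketch (purely illustrative, not the Endeca data model or API); the WINE_TAXONOMY taxonomy, the SourceRecord class, and the sample wine data are all assumptions made up for the example:

```python
# Purely illustrative sketch, not the Endeca data model or API.
from dataclasses import dataclass, field

# Hypothetical taxonomy: dimension name -> allowed dimension values.
WINE_TAXONOMY = {
    "Region": {"Bordeaux", "Napa", "Rioja"},
    "Type": {"Red", "White"},
}

@dataclass
class SourceRecord:
    properties: dict = field(default_factory=dict)  # free-form name/value data
    dimensions: dict = field(default_factory=dict)  # values drawn from the taxonomy

record = SourceRecord(
    properties={"Name": "Chateau Example 2015", "Price": "29.99"},
    dimensions={"Region": "Bordeaux", "Type": "Red"},
)

# A dimension value must exist in the taxonomy; a property can hold anything.
for dim, value in record.dimensions.items():
    assert value in WINE_TAXONOMY[dim], f"{value!r} is not a valid {dim}"
```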
1. Adapters
Adapters load data from a source and write it to a destination. There are the following types of adapters:
1. Dimension Adapter – loads the taxonomy (dimension hierarchy) into the pipeline
2. Record Adapter – loads the input records (the data to be indexed); it can read data in various formats, e.g. delimited, ODBC, JDBC, etc. (see the sketch after this list)
3. Indexer Adapter – saves the data and makes it “ready to be indexed”; it takes the records, the dimension hierarchy, and the index configuration information from the pipeline and combines everything into a format that is ready to be used by Dgidx or Agidx
4. Update Adapter – applies partial updates for information that has changed in the source data. It performs a live update on the index.
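The following is a minimal Python sketch of the record-adapter idea, assuming a pipe-delimited source file with a header row; the file name, delimiter, and function name are my own assumptions, not Endeca's API:

```python
# Illustrative sketch of a record adapter reading delimited data; not Endeca code.
import csv
from typing import Iterator

def read_delimited_records(path: str, delimiter: str = "|") -> Iterator[dict]:
    """Yield one record (property name -> value) per row of a delimited file."""
    with open(path, newline="", encoding="utf-8") as src:
        for row in csv.DictReader(src, delimiter=delimiter):
            yield dict(row)

# Hypothetical usage with a made-up input file:
# for record in read_delimited_records("wine_data.txt"):
#     print(record["Name"], record["Price"])
```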
2. Manipulators
Manipulators change the data associated with Endeca Records (to be continued…)
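As a rough illustration of what "changing the data associated with records" can mean, here is a hypothetical Python manipulator that normalises a price property; the property name and the transformation are assumptions for the example:

```python
# Hypothetical manipulator: changes the data attached to a record in the pipeline.
def normalize_price(record: dict) -> dict:
    """Strip the currency symbol so Price is stored as a plain decimal string."""
    if "Price" in record:
        record["Price"] = record["Price"].replace("$", "").strip()
    return record

records = [{"Name": "Chateau Example 2015", "Price": "$29.99"}]
records = [normalize_price(r) for r in records]  # Price is now "29.99"
```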
3. Property Mappers
As the name suggests, it maps the fields of the records in the source data to Endeca properties or dimensions.
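A minimal Python sketch of the mapping idea, assuming hypothetical source field names (prod_name, prod_region, ...) and mapping tables; in Endeca the real mapping is configured in the pipeline, this only illustrates the concept:

```python
# Illustrative sketch of property mapping; field names and tables are made up.
PROPERTY_MAP = {"prod_name": "Name", "prod_price": "Price"}      # -> properties
DIMENSION_MAP = {"prod_region": "Region", "prod_type": "Type"}   # -> dimensions

def map_record(source: dict) -> dict:
    """Split a raw source record into mapped properties and dimensions."""
    mapped = {"properties": {}, "dimensions": {}}
    for field_name, value in source.items():
        if field_name in PROPERTY_MAP:
            mapped["properties"][PROPERTY_MAP[field_name]] = value
        elif field_name in DIMENSION_MAP:
            mapped["dimensions"][DIMENSION_MAP[field_name]] = value
        # Source fields with no mapping are simply dropped in this sketch.
    return mapped

print(map_record({"prod_name": "Chateau Example 2015", "prod_region": "Bordeaux"}))
# {'properties': {'Name': 'Chateau Example 2015'}, 'dimensions': {'Region': 'Bordeaux'}}
```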
4. Record Assembler
If we have more than one data source, this component merges the data from all the sources.
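A minimal Python sketch of the merge, assuming the two sources share a common key field (here called sku, which is my assumption); the records and field names are made up for the example:

```python
# Illustrative sketch of merging records from two sources on a shared key.
def assemble(primary: list, secondary: list, key: str = "sku") -> list:
    """Merge each primary record with the matching secondary record, if any."""
    lookup = {rec[key]: rec for rec in secondary}
    merged = []
    for rec in primary:
        extra = lookup.get(rec[key], {})
        merged.append({**rec, **extra})  # fields from the second source are added
    return merged

catalog = [{"sku": "W-1", "Name": "Chateau Example 2015"}]
inventory = [{"sku": "W-1", "InStock": "42"}]
print(assemble(catalog, inventory))
# [{'sku': 'W-1', 'Name': 'Chateau Example 2015', 'InStock': '42'}]
```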
5. Utility Components
They help to perform basic tasks, e.g. logging and caching. These include:
Dimension Servers – work with the Dimension Adapter and serve as a central source of dimension information for pipelines
Record Caches – save a temporary copy of the data read by a Record Adapter (see the sketch after this list)
Spiders – a spider provides the facility to crawl document hierarchies on a file system or over HTTP or HTTPS
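To illustrate the record-cache idea, here is a small Python sketch that keeps a temporary copy of whatever a reader (such as the record-adapter sketch above) has produced, so the data can be re-read without going back to the source; the class and method names are assumptions, not Endeca's API:

```python
# Illustrative record cache: keeps a temporary copy of data read by an adapter.
class RecordCache:
    def __init__(self, reader):
        self._reader = reader    # any iterable of records, e.g. a record-adapter sketch
        self._records = None

    def records(self) -> list:
        if self._records is None:          # first call: read the source once
            self._records = list(self._reader)
        return self._records               # later calls: serve the cached copy

# Hypothetical usage:
# cache = RecordCache(read_delimited_records("wine_data.txt"))
# first = cache.records()   # reads the source file
# again = cache.records()   # served from the temporary copy
```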