org.apache.nutch.indexer
Interface IndexingFilter
- All Superinterfaces:
- org.apache.hadoop.conf.Configurable, Pluggable
- All Known Implementing Classes:
- BasicIndexingFilter, CCIndexingFilter, LanguageIndexingFilter, MoreIndexingFilter, RelTagIndexingFilter
public interface IndexingFilter
- extends Pluggable, org.apache.hadoop.conf.Configurable
Extension point for indexing. Lets plugins add metadata to the fields that
will be indexed. All plugins found that implement this extension point are run
sequentially on each parse.
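The sequential execution described above can be sketched as follows. This is a minimal, self-contained illustration, not Nutch's own plugin-loading code. `Doc`, `SimpleIndexingFilter` and `SimpleFilterChain` are hypothetical names, and the filter signature is reduced to a document and a URL. The sketch assumes that once one filter returns `null` (the discard signal documented for `filter`), the remaining filters are skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the document being indexed: a simple field map.
final class Doc {
    final Map<String, String> fields = new LinkedHashMap<>();
}

// Hypothetical, simplified filter contract: return the (possibly modified)
// document, or null to discard it. Not the real IndexingFilter signature.
interface SimpleIndexingFilter {
    Doc filter(Doc doc, String url);
}

// Hypothetical chain that runs every registered filter in order.
final class SimpleFilterChain {
    private final List<SimpleIndexingFilter> filters = new ArrayList<>();

    SimpleFilterChain add(SimpleIndexingFilter f) {
        filters.add(f);
        return this;
    }

    Doc run(Doc doc, String url) {
        for (SimpleIndexingFilter f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                return null; // a filter discarded the document; skip the rest
            }
        }
        return doc;
    }
}
```

Registration order matters in this sketch: a filter that drops a document prevents every later filter from seeing it.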
- Methods inherited from interface org.apache.hadoop.conf.Configurable:
- getConf, setConf
X_POINT_ID
static final String X_POINT_ID
- The name of the extension point.
filter
Document filter(Document doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
throws IndexingException
- Adds fields or otherwise modifies the document that will be indexed for a
parse. Unwanted documents can be removed from indexing by returning a null value.
- Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page url
datum - crawl datum for the page
inlinks - page inlinks
- Returns:
- modified (or a new) document instance, or null (meaning the document
should be discarded)
- Throws:
IndexingException
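The contract of `filter` (add fields to the document, or return `null` to keep the page out of the index) can be illustrated with a small, self-contained sketch. It does not use Nutch's real types: `IndexDoc`, `ParsedPage` and `TitleAnchorFilter` are hypothetical stand-ins, the URL is a plain `String`, inlinks are reduced to a list of anchor strings, and the `datum` parameter and `IndexingException` are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the indexed document: multi-valued fields.
final class IndexDoc {
    final Map<String, List<String>> fields = new HashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
}

// Hypothetical stand-in for the parse data of a page.
final class ParsedPage {
    final String title;
    final String text;

    ParsedPage(String title, String text) {
        this.title = title;
        this.text = text;
    }
}

// Hypothetical filter following the documented contract.
final class TitleAnchorFilter {
    IndexDoc filter(IndexDoc doc, ParsedPage parse, String url, List<String> inlinkAnchors) {
        if (parse.text == null || parse.text.isEmpty()) {
            return null; // nothing worth indexing: discard this document
        }
        doc.add("url", url);
        doc.add("title", parse.title);
        for (String anchor : inlinkAnchors) {
            doc.add("anchor", anchor); // anchor text from pages linking here
        }
        return doc;
    }
}
```

Returning the same `doc` instance after adding fields is one option; the documented contract also allows returning a new document instance.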
Copyright © 2006 The Apache Software Foundation