| Interface Summary | |
|---|---|
| FetchSchedule | This interface defines the contract for implementations that manipulate fetch times and re-fetch intervals. |
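The kind of contract FetchSchedule defines can be pictured with a toy interface. This is an illustrative sketch only — the names and signatures below are hypothetical, not the actual `org.apache.nutch.crawl.FetchSchedule` API:

```java
// Illustrative sketch of a fetch-schedule contract (names are hypothetical,
// not the real FetchSchedule method signatures).
interface RefetchPolicy {
    // Given the time of the last fetch and whether the page changed,
    // return the next fetch time in epoch milliseconds.
    long nextFetchTime(long lastFetchMillis, boolean pageChanged);
}

// A fixed-interval policy, analogous in spirit to DefaultFetchSchedule.
class FixedIntervalPolicy implements RefetchPolicy {
    private final long intervalMillis;

    FixedIntervalPolicy(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    @Override
    public long nextFetchTime(long lastFetchMillis, boolean pageChanged) {
        // A fixed schedule ignores the change signal entirely.
        return lastFetchMillis + intervalMillis;
    }
}
```

Concrete schedules (such as the adaptive one below) differ only in how they use the change signal to move the next fetch time.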

| Class Summary | |
|---|---|
| AbstractFetchSchedule | This class provides common methods for implementations of FetchSchedule. |
| AdaptiveFetchSchedule | This class implements an adaptive re-fetch algorithm. |
| Crawl | |
| CrawlDatum | |
| CrawlDatum.Comparator | A Comparator optimized for CrawlDatum. |
| CrawlDb | This class takes the output of the fetcher and updates the crawldb accordingly. |
| CrawlDbFilter | This class provides a way to separate the URL normalization and filtering steps from the rest of CrawlDb manipulation code. |
| CrawlDbMerger | This tool merges several CrawlDb-s into one, optionally filtering URLs through the current URLFilters, to skip prohibited pages. |
| CrawlDbMerger.Merger | |
| CrawlDbReader | Read utility for the CrawlDB. |
| CrawlDbReader.CrawlDatumCsvOutputFormat | |
| CrawlDbReader.CrawlDatumCsvOutputFormat.LineRecordWriter | |
| CrawlDbReader.CrawlDbStatCombiner | |
| CrawlDbReader.CrawlDbStatMapper | |
| CrawlDbReader.CrawlDbStatReducer | |
| CrawlDbReader.CrawlDbTopNMapper | |
| CrawlDbReader.CrawlDbTopNReducer | |
| CrawlDbReducer | Merge new page entries with existing entries. |
| DefaultFetchSchedule | This class implements the default re-fetch schedule. |
| FetchScheduleFactory | Creates and caches a FetchSchedule implementation. |
| Generator | Generates a subset of a crawl db to fetch. |
| Generator.CrawlDbUpdater | Update the CrawlDB so that the next generate won't include the same URLs. |
| Generator.DecreasingFloatComparator | |
| Generator.HashComparator | Sort fetch lists by hash of URL. |
| Generator.PartitionReducer | |
| Generator.Selector | Selects entries due for fetch. |
| Generator.SelectorEntry | |
| Generator.SelectorInverseMapper | |
| Injector | This class takes a flat file of URLs and adds them to the database of pages to be crawled. |
| Injector.InjectMapper | Normalize and filter injected URLs. |
| Injector.InjectReducer | Combine multiple new entries for a URL. |
| Inlink | |
| Inlinks | A list of Inlinks. |
| LinkDb | Maintains an inverted link map, listing incoming links for each URL. |
| LinkDbFilter | This class provides a way to separate the URL normalization and filtering steps from the rest of LinkDb manipulation code. |
| LinkDbMerger | This tool merges several LinkDb-s into one, optionally filtering URLs through the current URLFilters, to skip prohibited URLs and links. |
| LinkDbReader | Read utility for the LinkDb. |
| MapWritable | A writable map, with behavior similar to java.util.HashMap. |
| MD5Signature | Default implementation of a page signature. |
| NutchWritable | |
| PartitionUrlByHost | Partition URLs by hostname. |
| Signature | |
| SignatureComparator | |
| SignatureFactory | Factory class, which instantiates a Signature implementation according to the current Configuration. |
| TextProfileSignature | An implementation of a page signature. |
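The adaptive re-fetch idea behind AdaptiveFetchSchedule can be sketched roughly as follows: shrink the interval when a fetched page is observed to have changed, grow it when the page is unchanged, and clamp the result to configured bounds. The constants and method below are a hypothetical illustration, not Nutch's actual code (Nutch reads its rates and bounds from configuration):

```java
// Rough sketch of an adaptive re-fetch interval update. All constants
// here are hypothetical placeholders for configured values.
class AdaptiveInterval {
    static final double INC_RATE = 0.2;             // grow when unchanged
    static final double DEC_RATE = 0.2;             // shrink when changed
    static final long MIN_SECONDS = 60;             // lower clamp
    static final long MAX_SECONDS = 30L * 24 * 3600; // upper clamp

    // Return the next re-fetch interval, in seconds.
    static long next(long currentSeconds, boolean changed) {
        double updated = changed
                ? currentSeconds * (1.0 - DEC_RATE)
                : currentSeconds * (1.0 + INC_RATE);
        // Clamp to the configured [min, max] window.
        return Math.max(MIN_SECONDS, Math.min(MAX_SECONDS, Math.round(updated)));
    }
}
```

Over many fetch cycles the interval converges toward each page's observed change rate, which is the point of an adaptive schedule over a fixed one.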
Crawl control code.
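One recurring piece of that control flow is keeping all URLs from one host together when fetch lists are generated (see PartitionUrlByHost above), which comes down to hashing the hostname. A minimal sketch of the idea, not the actual Nutch implementation:

```java
import java.net.URI;

// Minimal sketch of partitioning URLs by host, in the spirit of
// PartitionUrlByHost: every URL on a given host maps to the same partition.
class HostPartitioner {
    static int partition(String url, int numPartitions) {
        String host = URI.create(url).getHost();
        if (host == null) host = "";  // fallback for URLs with no host part
        // Mask the sign bit so the modulus is always non-negative.
        return (host.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Grouping by host this way lets a fetcher apply per-host politeness limits within a single partition instead of coordinating across partitions.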