Sink

A sink for CrawlUri objects.

class spyder.core.sink.AbstractCrawlUriSink

Abstract sink. Subclasses only need to override the methods they are interested in.
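A minimal sketch of that pattern, assuming the base class hooks do nothing by default. To stay self-contained, it defines stand-ins for `CrawlUri` (with only a hypothetical `url` field) and for `AbstractCrawlUriSink`. These are not spyder's real classes. The subclass `NotFoundCollector` is hypothetical and overrides a single hook.

```python
from dataclasses import dataclass


@dataclass
class CrawlUri:
    # Hypothetical stand-in: only the field this sketch needs.
    url: str


class AbstractCrawlUriSink:
    # Stand-in mirroring the documented hooks; each is assumed to be a no-op.
    def process_successful_crawl(self, curi):
        pass

    def process_not_found(self, curi):
        pass

    def process_redirect(self, curi):
        pass

    def process_server_error(self, curi):
        pass


class NotFoundCollector(AbstractCrawlUriSink):
    """Override only the hook we care about; the others stay no-ops."""

    def __init__(self):
        self.missing = []

    def process_not_found(self, curi):
        # Remember every URL that came back as 404.
        self.missing.append(curi.url)


sink = NotFoundCollector()
sink.process_not_found(CrawlUri("http://example.com/gone"))
sink.process_server_error(CrawlUri("http://example.com/down"))  # inherited no-op
print(sink.missing)  # ['http://example.com/gone']
```

Because the remaining hooks are inherited unchanged, events the subclass does not handle are silently ignored.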

process_not_found(curi)

The URI we should have crawled was not found, i.e. the server answered with HTTP 404. Override this method to decide how such URIs are handled.

process_redirect(curi)

The URI was redirected too many times; with the default configuration, this means more than 3 redirects.

process_server_error(curi)

There has been a server error, e.g. an HTTP 5xx response. It may be worth trying to crawl this URI again a little later.
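One way to act on the "try again later" suggestion is sketched below. The class `RetryLaterSink` and its parameters `retry_delay`, `max_retries` and `clock` are hypothetical, not part of spyder. `CrawlUri` and `AbstractCrawlUriSink` are minimal stand-ins so the example runs on its own. The sink records a due time for each retry and gives up after a fixed number of server errors for the same URL.

```python
import time
from dataclasses import dataclass


@dataclass
class CrawlUri:
    # Hypothetical stand-in for spyder's CrawlUri.
    url: str


class AbstractCrawlUriSink:
    # Stand-in base class; only the hook used here is shown.
    def process_server_error(self, curi):
        pass


class RetryLaterSink(AbstractCrawlUriSink):
    """Re-schedule URIs that hit a 5xx error, up to max_retries times."""

    def __init__(self, retry_delay=300.0, max_retries=3, clock=time.time):
        self.retry_delay = retry_delay  # seconds to wait before retrying
        self.max_retries = max_retries
        self.clock = clock              # injectable so tests are deterministic
        self.attempts = {}              # url -> number of 5xx errors seen
        self.scheduled = []             # (due_timestamp, url) pairs
        self.given_up = []              # urls that exceeded max_retries

    def process_server_error(self, curi):
        count = self.attempts.get(curi.url, 0) + 1
        self.attempts[curi.url] = count
        if count > self.max_retries:
            self.given_up.append(curi.url)
        else:
            self.scheduled.append((self.clock() + self.retry_delay, curi.url))


def fixed_clock():
    # Fixed time so the schedule below is predictable.
    return 1000.0


sink = RetryLaterSink(retry_delay=60.0, max_retries=2, clock=fixed_clock)
for _ in range(3):
    sink.process_server_error(CrawlUri("http://example.com/flaky"))
print(sink.scheduled)  # [(1060.0, 'http://example.com/flaky'), (1060.0, 'http://example.com/flaky')]
print(sink.given_up)   # ['http://example.com/flaky']
```

Capping the number of retries keeps a permanently broken server from being requeued forever. A real crawler would also need something that picks up `scheduled` entries once they are due.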

process_successful_crawl(curi)

The URI has been crawled successfully. If new links were extracted, add them to the frontier alongside the original URI.
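The hand-off to the frontier might look like the sketch below. It assumes a hypothetical `extracted_links` field on the `CrawlUri` stand-in and a hypothetical `SimpleFrontier` with an `add` method that ignores duplicate URLs. Neither mirrors spyder's actual data layout or frontier API. `FrontierSink` passes the crawled URI to the frontier first, then each newly extracted link.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlUri:
    # Hypothetical stand-in; `extracted_links` is an assumed field name.
    url: str
    extracted_links: list = field(default_factory=list)


class SimpleFrontier:
    """Hypothetical in-memory frontier that ignores duplicate URLs."""

    def __init__(self):
        self.queue = []
        self._seen = set()

    def add(self, url):
        if url not in self._seen:
            self._seen.add(url)
            self.queue.append(url)


class AbstractCrawlUriSink:
    # Stand-in base class; only the hook used here is shown.
    def process_successful_crawl(self, curi):
        pass


class FrontierSink(AbstractCrawlUriSink):
    def __init__(self, frontier):
        self.frontier = frontier

    def process_successful_crawl(self, curi):
        # Hand the crawled URI back together with its newly found links.
        self.frontier.add(curi.url)
        for link in curi.extracted_links:
            self.frontier.add(link)


frontier = SimpleFrontier()
sink = FrontierSink(frontier)
sink.process_successful_crawl(CrawlUri(
    "http://example.com/",
    ["http://example.com/a", "http://example.com/b", "http://example.com/a"],
))
print(frontier.queue)
# ['http://example.com/', 'http://example.com/a', 'http://example.com/b']
```

Deduplication here lives in the frontier, so the sink can forward every extracted link without checking whether it was already queued.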
