Welcome to Spyder

Spyder is a scalable web spider written in Python. It uses the non-blocking Tornado library for I/O and ZeroMQ as the messaging layer. Messages are serialized using Thrift.

The architecture is simple: a Master process contains the crawl Frontier, which organises the URLs that need to be crawled; several Worker processes download the content and extract new URLs to be crawled in the future. To store the content, you may attach a Sink to the Master and be informed about the interesting events for a URL.
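
The division of labour described above can be illustrated with a minimal, single-process sketch. This is not Spyder's API: the names `Frontier`, `worker`, `master`, `LinkExtractor` and `FAKE_WEB` are hypothetical, the "download" is a dictionary lookup rather than a Tornado request, and the Master calls the Worker directly instead of exchanging Thrift messages over ZeroMQ. It only shows how a frontier, a worker and a sink might fit together.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (illustrative helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


class Frontier:
    """Hypothetical frontier: a FIFO queue that never hands out a URL twice."""

    def __init__(self, seeds):
        self._queue = deque()
        self._seen = set()
        for url in seeds:
            self.add(url)

    def add(self, url):
        if url not in self._seen:
            self._seen.add(url)
            self._queue.append(url)

    def next_url(self):
        return self._queue.popleft() if self._queue else None


def worker(url, fetch):
    """Worker role: download the content and extract new URLs."""
    body = fetch(url)
    parser = LinkExtractor(url)
    parser.feed(body)
    return body, parser.links


def master(seeds, fetch, sink):
    """Master role: owns the frontier and informs the sink about each URL."""
    frontier = Frontier(seeds)
    while (url := frontier.next_url()) is not None:
        body, links = worker(url, fetch)
        sink(url, body)  # the sink decides how to store the content
        for link in links:
            frontier.add(link)


# A made-up three-page site standing in for the network (assumption).
FAKE_WEB = {
    "http://example.test/": '<a href="/a">a</a> <a href="/b">b</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/b": "",
}

stored = {}
master(["http://example.test/"], lambda u: FAKE_WEB.get(u, ""), stored.__setitem__)
```

Here the sink is simply `stored.__setitem__`, so every crawled page ends up in a dictionary keyed by URL. In the real system the Workers are separate processes, which is what makes the design scalable; the sketch keeps everything in one process for readability.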
