This module contains the default architecture for worker processes. To start a new worker process, simply call this module's main method.
Communication from the master to a worker, and between the stages inside a worker, flows as follows:
Master -> PUSH -> Worker Fetcher
Worker Fetcher -> PUSH -> Worker Extractor
Worker Extractor -> PUB -> Master
Each Worker is a ZmqWorker (or ZmqAsyncWorker). The Master pushes new CrawlUris to the Worker Fetcher, which downloads the content from the web and PUSHes the resulting CrawlUri to the Worker Extractor. At this stage several modules for extracting new URLs are running. The Worker Scoper then decides whether the newly extracted URLs are within the scope of the crawl.
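The pipeline above can be sketched with plain pyzmq sockets. This is a minimal, hedged illustration of the PUSH/PULL/PUB topology only; the endpoint names and message contents are made up and do not reflect Spyder's actual configuration.

```python
import time
import zmq

ctx = zmq.Context.instance()

# Master pushes new CrawlUris to the fetcher stage.
master_push = ctx.socket(zmq.PUSH)
master_push.bind("inproc://master-fetcher")
fetcher_pull = ctx.socket(zmq.PULL)
fetcher_pull.connect("inproc://master-fetcher")

# Fetcher pushes downloaded content on to the extractor stage.
fetcher_push = ctx.socket(zmq.PUSH)
fetcher_push.bind("inproc://fetcher-extractor")
extractor_pull = ctx.socket(zmq.PULL)
extractor_pull.connect("inproc://fetcher-extractor")

# Extractor publishes finished CrawlUris back to the master.
extractor_pub = ctx.socket(zmq.PUB)
extractor_pub.bind("inproc://extractor-master")
master_sub = ctx.socket(zmq.SUB)
master_sub.connect("inproc://extractor-master")
master_sub.setsockopt(zmq.SUBSCRIBE, b"")

time.sleep(0.2)  # give the inproc subscription time to settle

master_push.send(b"http://example.com")       # master -> fetcher
uri = fetcher_pull.recv()
fetcher_push.send(uri + b" [fetched]")        # fetcher -> extractor
curi = extractor_pull.recv()
extractor_pub.send(curi + b" [extracted]")    # extractor -> master
result = master_sub.recv()
```

In the real system each stage runs in its own worker, not in one process; the in-process wiring here only demonstrates the socket pattern.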
Create a processing method that iterates all processors over the incoming message.
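A processing method that iterates all processors over a message can be sketched as a simple closure. The factory and processor names below are hypothetical, not Spyder's API.

```python
def create_processing_function(processors):
    """Return a function that applies each configured processor in turn,
    feeding the output of one processor into the next."""
    def processing(message):
        for processor in processors:
            message = processor(message)
        return message
    return processing

# Two illustrative (made-up) processors:
strip_fragment = lambda uri: uri.split("#")[0]   # drop '#fragment'
lowercase_uri = lambda uri: uri.lower()

process = create_processing_function([strip_fragment, lowercase_uri])
result = process("HTTP://Example.com/page#top")  # -> "http://example.com/page"
```

Because each processor takes and returns the same message type, new extraction modules can be appended without touching the loop.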
Create and return a new Worker Extractor that combines all configured extractors into a single ZmqWorker.
Create and return a new Worker Fetcher.
This module contains a ZeroMQ based Worker abstraction.
The ZmqWorker class expects one incoming and one outgoing zmq.socket as well as an instance of the spyder.core.mgmt.ZmqMgmt class.
Asynchronous version of the ZmqWorker.
This worker differs in that the self._processing method takes two arguments: the message and the socket to which the result should be sent.
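The difference between the two signatures can be sketched without real sockets. The names below are illustrative; a stand-in socket replaces the outgoing zmq socket so the asynchronous shape is visible.

```python
# Synchronous style: the worker sends the return value itself.
def sync_processing(message):
    return message.upper()

# A minimal stand-in for the outgoing zmq socket (hypothetical).
class FakeSocket:
    def __init__(self):
        self.sent = []
    def send(self, msg):
        self.sent.append(msg)

# Asynchronous style: the processor gets the out-socket and sends the
# result itself, e.g. from a download callback. Here it sends at once
# purely for illustration.
def async_processing(message, out_socket):
    out_socket.send(message.upper())

out = FakeSocket()
async_processing("crawl-uri", out)
```

Handing the socket to the processor lets long-running work (such as an HTTP fetch) complete later without blocking the worker's receive loop.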
This is the ZMQ worker implementation.
The worker will register a ZMQStream with the configured zmq.Socket and zmq.eventloop.ioloop.IOLoop instance.
Upon ZMQStream.on_recv, the configured processors are executed on the deserialized context, and the result is published through the configured zmq.socket.
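The receive path (deserialize, process, publish) can be sketched in pure Python. This is a hedged stand-in: the class name, JSON serialization, and the list used in place of the PUB socket are all assumptions for illustration; the real worker wires a ZMQStream to actual zmq sockets on an IOLoop.

```python
import json

class SketchWorker:
    """Illustrative receive path of a ZMQ worker (names are made up)."""

    def __init__(self, processing, pub_socket):
        self._processing = processing
        self._pub = pub_socket  # here: a plain list standing in for a PUB socket

    def _receive(self, raw_message):
        curi = json.loads(raw_message)        # deserialize the context
        curi = self._processing(curi)         # run the configured processors
        self._pub.append(json.dumps(curi))    # publish the result

published = []
worker = SketchWorker(lambda c: {**c, "status": "done"}, published)
worker._receive(json.dumps({"url": "http://example.com"}))
```

In the real implementation, `_receive` would be registered as the ZMQStream's on_recv callback so the IOLoop invokes it for every incoming message.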