Publish Scrapy stats to a statsd daemon to see your spider stats in real time.
Exporting Scrapy's metrics to statsd is not a perfect one-to-one mapping; the naming conventions and values required some adjustment, described below.
Stats are transformed from the forward-slash notation seen in Scrapy's stats dumps to the dotted notation conventional in statsd. Furthermore, if a spider is present, its name is prefixed to the stat name. For instance:
| Scrapy Notation | StatsD Notation |
| --- | --- |
| downloader/exception_count | downloader.exception_count |
| downloader/exception_type_count/twisted.internet.error.DNSLookupError | downloader.exception_type_count.twisted.internet.error.DNSLookupError |
| downloader/request_count | downloader.request_count |
| downloader/response_status_count/200 | downloader.response_status_count.200 |
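As a minimal sketch of the renaming rule above (`to_statsd_name` is a hypothetical helper for illustration, not part of the package):

```python
def to_statsd_name(key, spider_name=None):
    """Map a Scrapy stat key to a statsd metric name (illustrative only)."""
    # 'downloader/response_status_count/200' -> 'downloader.response_status_count.200'
    name = key.replace('/', '.')
    # When a spider is available, its name becomes the prefix,
    # e.g. 'quotes.downloader.request_count'.
    return f'{spider_name}.{name}' if spider_name else name
```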
Only numeric values are exported to statsd; no other types are exported. One might wonder why set_value is not translated into a set within statsd. There is a mismatch in purpose: a statsd set counts the number of unique items seen, which is fundamentally different from storing a value with Scrapy's default stats module.
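To make the mismatch concrete, here is a hedged illustration using the `statsd` Python client package (the metric name is made up for the example):

```python
from statsd import StatsClient

client = StatsClient('localhost', 8125)

# Scrapy's stats.set_value('last_status', 'cached') stores the literal value
# so it can be read back later from the in-memory stats dictionary.
# A statsd "set", by contrast, only counts distinct members per flush
# interval; the value itself is never stored and cannot be read back.
client.set('last_status', 'cached')  # 1 unique member
client.set('last_status', 'cached')  # still 1 unique member
client.set('last_status', 'fresh')   # now 2 unique members
```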
The increment and decrement operations do not use the start parameter. Scrapy assumes stats are collected in a single dictionary where you can check whether a value has already been set; statsd does not act like a key-value store and offers no way to check whether a metric already has a value.
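A rough sketch of what this implies (this is not the package's actual code, just an illustrative collector built on Scrapy's StatsCollector and the `statsd` client): the override can only forward a delta, so `start` is accepted for compatibility and ignored on the statsd side.

```python
from scrapy.statscollectors import StatsCollector
from statsd import StatsClient


class StatsDForwardingCollector(StatsCollector):
    """Illustrative sketch only, assuming the `statsd` client package."""

    def __init__(self, crawler):
        super().__init__(crawler)
        self._client = StatsClient(
            crawler.settings.get('STATSD_HOST', 'localhost'),
            crawler.settings.getint('STATSD_PORT', 8125),
        )

    def inc_value(self, key, count=1, start=0, spider=None):
        # Keep Scrapy's in-memory behaviour, where `start` is meaningful ...
        super().inc_value(key, count=count, start=start, spider=spider)
        # ... but statsd only accepts a delta; there is no way to ask the
        # daemon whether the metric was already set, so `start` is dropped.
        name = key.replace('/', '.')
        if spider is not None:
            name = f'{spider.name}.{name}'
        self._client.incr(name, count)
```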
- Pip install the package: `pip install scrapy-statsd`

  Note: the requirements currently state Scrapy 1.0.5 or higher, but that will be reduced once testing is done.
- Add the following lines to the `settings.py` of your Scrapy project:

      STATS_CLASS = 'scrapy_statsd.statscollectors.StatsDStatsCollector'
      STATSD_HOST = 'localhost'
      STATSD_PORT = 8125
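To sanity-check the setup before pointing it at a real statsd daemon, you can listen on the configured UDP port and watch the raw packets a crawl emits. This is a throwaway script assuming the default host and port shown above:

```python
import socket

# Minimal UDP listener for eyeballing statsd packets; not a statsd daemon.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('localhost', 8125))  # must match STATSD_HOST / STATSD_PORT
print('listening on udp://localhost:8125 ...')
while True:
    data, _addr = sock.recvfrom(4096)
    print(data.decode('utf-8', errors='replace'))
```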