Scrapy request + response + download time
UPD: I am not closing the question, because I think my approach is not as clear as it should be.

Is it possible to save the current request + response + download time in the item?

In "plain" Python I would do:

    start_time = time()
    urllib2.urlopen('http://example.com').read()
    time() - start_time

But how can I do this with Scrapy?
UPD:

The solution below is enough for me, though I'm not sure about the quality of its results: if there are many connection timeout errors, the measured download time may be wrong (as much as download_timeout * 3).
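A plausible explanation for that *3 factor, assuming default settings (my reading, not something stated above): Scrapy's RetryMiddleware re-issues a timed-out request up to RETRY_TIMES more times, and each attempt is bounded by DOWNLOAD_TIMEOUT, so a single logical request can span up to three timeout windows:

    # settings.py -- Scrapy defaults relevant to the caveat above
    RETRY_TIMES = 2         # 2 retries + the original attempt = 3 attempts
    DOWNLOAD_TIMEOUT = 180  # per-attempt timeout, in seconds
    # worst case per logical request: (RETRY_TIMES + 1) * DOWNLOAD_TIMEOUT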
In settings.py:
    DOWNLOADER_MIDDLEWARES = {
        'myscraper.middlewares.DownloadTimer': 0,
    }
In middlewares.py:
    from time import time

    from scrapy.http import Response


    class DownloadTimer(object):
        def process_request(self, request, spider):
            # record the start time; returning None does not block
            # middlewares with a greater order number
            request.meta['__start_time'] = time()
            return None

        def process_response(self, request, response, spider):
            request.meta['__end_time'] = time()
            return response  # must return the response so processing continues

        def process_exception(self, request, exception, spider):
            # record the end time even on failure and hand the spider
            # a synthetic response (status 110 ~ connection timed out)
            request.meta['__end_time'] = time()
            return Response(url=request.url, status=110, request=request)
Inside spider.py, in def parse(...):
    log.msg('Download time: %.2f - %.2f = %.2f' % (
        response.meta['__end_time'],
        response.meta['__start_time'],
        response.meta['__end_time'] - response.meta['__start_time']
    ), level=log.DEBUG)
You could write a Downloader Middleware to time each request: it adds the start time to the request before it is made, and the finish time when it has finished. Typically, arbitrary data like this is stored in the Request.meta attribute. The timing information can later be read by the spider and added to the item.
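A minimal sketch of that last step, reading the timestamps set by the DownloadTimer middleware above and storing them in an item; the MyItem class and its download_time field are hypothetical and would live in your project's items.py:

    import scrapy


    class MyItem(scrapy.Item):
        url = scrapy.Field()
        download_time = scrapy.Field()


    class MySpider(scrapy.Spider):
        name = 'myspider'
        start_urls = ['http://example.com']

        def parse(self, response):
            item = MyItem()
            item['url'] = response.url
            # timestamps written into request.meta by the middleware
            item['download_time'] = (response.meta['__end_time'] -
                                     response.meta['__start_time'])
            yield item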
This downloader middleware sounds like it would be useful on many projects.