Kindly help me with the following issue. I tried to crawl data using DeltaFetch, but I am facing the problem below.

My DB file is updated both times when I run the crawler with these commands:

```shell
$ scrapy crawl quotes -a deltafetch_reset=1
$ scrapy crawl quotes -a deltafetch_reset=0
```
My DB file is not updated when I run this command:

```shell
$ scrapy crawl quotes
```
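One possible explanation, offered as an assumption rather than a confirmed diagnosis: Scrapy passes every `-a name=value` spider argument through as a *string* attribute on the spider, so `deltafetch_reset=0` arrives as `"0"`, and any non-empty string is true in Python. If the middleware decides whether to reset by the truthiness of that attribute, both commands above would reset (rewrite) the DB. The sketch below models this; `FakeSpider`, `wants_reset` and `wants_reset_parsed` are hypothetical helpers, not part of Scrapy or scrapy-deltafetch.

```python
class FakeSpider:
    """Stand-in for a Scrapy spider: `-a key=value` arguments become string attributes."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def wants_reset(spider, default=False):
    # Hypothetical: a plain truthiness check on the spider attribute.
    return bool(getattr(spider, "deltafetch_reset", default))


def wants_reset_parsed(spider):
    # Hypothetical stricter check that interprets the string value explicitly.
    value = str(getattr(spider, "deltafetch_reset", "0")).strip().lower()
    return value in {"1", "true", "yes", "on"}


for args in ({"deltafetch_reset": "1"}, {"deltafetch_reset": "0"}, {}):
    spider = FakeSpider(**args)
    print(args, wants_reset(spider), wants_reset_parsed(spider))
```

Under the truthiness check, `"1"` and `"0"` both request a reset and only the run without `-a` does not, which would match the behaviour described above. Simply omitting the argument is the safe way to avoid a reset.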
Below are the changes I made in the settings.py file:
```python
SPIDER_MIDDLEWARES = {
    'scrapy.contrib.spidermiddleware.referer.RefererMiddleware': True,
    'scrapy_deltafetch.DeltaFetch': 100,
}
COOKIES_ENABLED = True
COOKIES_DEBUG = True
DELTAFETCH_ENABLED = True
DELTAFETCH_DIR = '/home/administrator/apps/scrapy-deltafetch/Crawling/Crawling/crawl_output'
DOTSCRAPY_ENABLED = True
```
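Values in `SPIDER_MIDDLEWARES` are integer orders (lower numbers sit closer to the engine, higher ones closer to the spider), and `None` disables a middleware; `True` only works because Python treats it as `1`. In recent Scrapy releases the old `scrapy.contrib` paths are gone, and `scrapy.spidermiddlewares.referer.RefererMiddleware` is already enabled by default, so a sketch of the same DeltaFetch configuration, assuming a recent Scrapy version, needs only the DeltaFetch entry:

```python
# settings.py (sketch, assuming a recent Scrapy release)
SPIDER_MIDDLEWARES = {
    # Integer order. RefererMiddleware is enabled by default through
    # SPIDER_MIDDLEWARES_BASE, so it does not need to be listed here.
    "scrapy_deltafetch.DeltaFetch": 100,
}

COOKIES_ENABLED = True
COOKIES_DEBUG = True
DELTAFETCH_ENABLED = True
DELTAFETCH_DIR = "/home/administrator/apps/scrapy-deltafetch/Crawling/Crawling/crawl_output"
DOTSCRAPY_ENABLED = True
```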
Please find my code below:
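```python
import scrapy
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from w3lib.url import url_query_parameter


class QuotesSpider(scrapy.Spider):
    name = "quotes_git"

    def start_requests(self):
        urls = [
            "https://www.wikipedia.org/",
        ]
        for url in urls:
            # url_query_parameter() returns None here: the URL has no "abc001" parameter.
            yield scrapy.Request(
                url=url,
                meta={"deltafetch_key": url_query_parameter(url, "abc001")},
                callback=self.parse,
            )

    def parse(self, response):
        print("testing")
        print(response.url)
        # Selenium 4+ expects the chromedriver path wrapped in a Service object.
        self.driver = webdriver.Chrome(
            service=Service("/home/administrator/Downloads/Gopal/Crawling/Crawling/spiders/chromedriver")
        )
        self.driver.get(response.url)
        print("check point1")
        title = self.driver.title
        print(title)
        self.driver.quit()  # close the browser once the title has been read
        filename = "sample_git.txt"
        # Text mode: in Python 3, writing a str to a file opened with "wb" raises TypeError.
        with open(filename, "w") as f:
            f.write(response.url + title)
        print("done")
```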
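As far as I understand it, DeltaFetch records a key only for requests whose callback returned items, and on later crawls skips requests whose key is already recorded. A callback like `parse` above, which only prints and writes a file, would therefore never add anything to the DB. The sketch below is a simplified in-memory model of that idea, not the library's code: `ToyDeltaFetch`, `Request` and the URL-as-fallback-key are hypothetical stand-ins (the real middleware falls back to a request fingerprint and persists keys in a database file).

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical stand-in for scrapy.Request."""
    url: str
    meta: dict = field(default_factory=dict)


class ToyDeltaFetch:
    """Simplified, in-memory model of DeltaFetch-style filtering (not the real middleware)."""

    def __init__(self):
        self.db = {}  # key -> marker; the real middleware persists keys on disk

    def key_for(self, request):
        # Use deltafetch_key when it is truthy; otherwise fall back to the URL
        # (a stand-in for Scrapy's request fingerprint).
        return request.meta.get("deltafetch_key") or request.url

    def process_output(self, response_request, results):
        """Filter a callback's output: drop already-seen requests, record items."""
        for r in results:
            if isinstance(r, Request):
                if self.key_for(r) in self.db:
                    continue  # seen in a previous crawl: skip it
            elif isinstance(r, dict):
                # An item was produced, so remember the request that yielded it.
                self.db[self.key_for(response_request)] = True
            yield r


mw = ToyDeltaFetch()
page = Request("https://www.wikipedia.org/", meta={"deltafetch_key": None})

# A callback that yields nothing (like parse() above) stores no key.
list(mw.process_output(page, []))
print(len(mw.db))  # 0

# A callback that yields an item stores the key of the request that produced it.
list(mw.process_output(page, [{"title": "Wikipedia"}]))
print(len(mw.db))  # 1

# Later, a request with the same key is filtered out.
print(list(mw.process_output(Request("https://example.org/"), [page])))  # []
```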