Facing issue regarding deltafetch #30

Open
gopal1414 opened this issue May 22, 2018 · 0 comments

@gopal1414

Kindly help me with the issue below:

I tried to crawl data using DeltaFetch, but I am facing the following issue:

My DB file is updated every time I run the crawler with either of these commands:
$ scrapy crawl quotes -a deltafetch_reset=1
$ scrapy crawl quotes -a deltafetch_reset=0
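Spider arguments passed with `-a` reach the spider as strings: Scrapy's `Spider.__init__` copies them onto the instance as attributes. Assuming the middleware decides whether to reset with a plain truthiness check on the `deltafetch_reset` attribute, the string `'0'` is just as truthy as `'1'`, which would make both commands behave the same way. `FakeSpider` and `reset_requested` below are hypothetical stand-ins, not part of Scrapy or scrapy-deltafetch:

```python
class FakeSpider:
    # Hypothetical stand-in: like scrapy.Spider.__init__, copy every
    # keyword argument (i.e. every `-a name=value` pair) onto the instance.
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def reset_requested(spider, reset_setting=False):
    # Assumed check: a plain truthiness test on the spider attribute.
    return bool(reset_setting or getattr(spider, 'deltafetch_reset', False))


for args in ({'deltafetch_reset': '1'}, {'deltafetch_reset': '0'}, {}):
    print(args, '->', reset_requested(FakeSpider(**args)))
```

Both `'1'` and `'0'` print `True`; only omitting the argument prints `False`. If that assumption holds, leaving the argument out is the way to run without a reset.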

My DB file is not updated when I use this command:
$ scrapy crawl quotes
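Whether a run rewrites the database file comes down to how the file is opened. As a sketch, assuming the store is a Python `dbm` database opened with flag `'n'` when a reset is requested and `'c'` otherwise (the file name and the `open_and_list` helper are hypothetical), the two modes differ like this:

```python
import dbm
import os
import tempfile


def open_and_list(path, flag):
    # Open the key/value store with the given flag and return its keys.
    with dbm.open(path, flag) as db:
        return sorted(db.keys())


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'quotes')  # hypothetical database path
    with dbm.open(path, 'c') as db:
        db[b'seen-request'] = b'1'      # pretend one request was recorded
    kept = open_and_list(path, 'c')     # 'c': open existing file, keep keys
    wiped = open_and_list(path, 'n')    # 'n': always create a new, empty file
    print(kept, wiped)
```

Under that assumption, a reset run recreates the file every time, while a normal run only keeps what is already there plus whatever new keys actually get written.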

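Below are the changes I made in the settings.py file:

SPIDER_MIDDLEWARES = {
    # The scrapy.contrib.* paths are deprecated (and removed in later Scrapy
    # releases); values are integer orders, and 700 is Scrapy's default
    # order for RefererMiddleware, so this entry is optional.
    'scrapy.spidermiddlewares.referer.RefererMiddleware': 700,
    'scrapy_deltafetch.DeltaFetch': 100,
}

COOKIES_ENABLED = True
COOKIES_DEBUG = True
DELTAFETCH_ENABLED = True
DELTAFETCH_DIR = '/home/administrator/apps/scrapy-deltafetch/Crawling/Crawling/crawl_output'
DOTSCRAPY_ENABLED = True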
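`DELTAFETCH_DIR` only names the directory. Assuming the middleware keeps one database per spider, named after `spider.name` (the `deltafetch_db_path` helper below is hypothetical), the file to inspect depends on which spider is run:

```python
import os

DELTAFETCH_DIR = '/home/administrator/apps/scrapy-deltafetch/Crawling/Crawling/crawl_output'


def deltafetch_db_path(deltafetch_dir, spider_name):
    # Assumed naming scheme: <DELTAFETCH_DIR>/<spider.name>.db
    return os.path.join(deltafetch_dir, spider_name + '.db')


# 'quotes' is the name used on the command line; 'quotes_git' is the
# name attribute set in the spider code.
for name in ('quotes', 'quotes_git'):
    print(deltafetch_db_path(DELTAFETCH_DIR, name))
```

Depending on which `dbm` backend Python picks, the file on disk may carry an extra suffix (for example, `dbm.dumb` writes separate `.dat` and `.dir` files).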

Please find my code below:

import scrapy
from selenium import webdriver
from w3lib.url import url_query_parameter


class QuotesSpider(scrapy.Spider):
    name = 'quotes_git'

    def start_requests(self):
        urls = [
            'https://www.wikipedia.org/',
        ]
        for url in urls:
            # url_query_parameter() returns None here: this URL has no 'abc001' parameter
            yield scrapy.Request(
                url=url,
                meta={'deltafetch_key': url_query_parameter(url, 'abc001')},
                callback=self.parse,
            )

    def parse(self, response):
        print('testing')
        print(response.url)
        # Selenium 3 style: the driver path is passed positionally
        # (Selenium 4 expects it via a Service object instead)
        self.driver = webdriver.Chrome('/home/administrator/Downloads/Gopal/Crawling/Crawling/spiders/chromedriver')
        try:
            self.driver.get(response.url)
            print('check point1')

            title = self.driver.title
            print(title)
        finally:
            self.driver.quit()  # close the browser started for this response

        filename = 'sample_git.txt'
        # Text mode: writing a str to a file opened with 'wb' fails on Python 3
        with open(filename, 'w') as f:
            f.write(response.url + title)
        print('done')
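The scrapy-deltafetch README describes the middleware as storing a request's identifier only when that request's callback produces an item, and as using `meta['deltafetch_key']` as the lookup key when one is given. The `parse` method above prints output and writes a file but yields no item. Below is a simplified stdlib model of that bookkeeping, not the middleware itself: `request_key`, `record_seen`, the SHA-1-of-URL stand-in for Scrapy's request fingerprint, the fallback when the key is `None`, and the dict used as the database are all assumptions for illustration.

```python
import hashlib


def request_key(url, meta):
    # Assumed lookup: use meta['deltafetch_key'] when it is set, otherwise
    # fall back to a request fingerprint (a SHA-1 of the URL stands in here).
    return meta.get('deltafetch_key') or hashlib.sha1(url.encode()).hexdigest()


def record_seen(url, meta, callback_results, db):
    # Simplified model of the item-side bookkeeping: the key is stored only
    # when the callback yields an item (a dict here); other output passes through.
    for result in callback_results:
        if isinstance(result, dict):
            db[request_key(url, meta)] = b'1'
        yield result


url = 'https://www.wikipedia.org/'
meta = {'deltafetch_key': None}  # what url_query_parameter(url, 'abc001') gives

db_no_items = {}
list(record_seen(url, meta, [], db_no_items))  # parse() as written yields nothing

db_with_item = {}
list(record_seen(url, meta, [{'url': url, 'title': 'Wikipedia'}], db_with_item))

print(len(db_no_items), len(db_with_item))  # 0 1
```

If the model matches the middleware, ending `parse` with something like `yield {'url': response.url, 'title': title}` would give DeltaFetch an item to record for this request.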
