Silene is an open source web crawler framework built upon Pyppeteer.
You must have at least Python 3.7 installed.
To install the latest release, run pip install silene.
Each crawler must subclass the Crawler class and implement the abstract configure method. The CrawlerConfiguration specifies the initial requests to make and other properties of the crawler. Once a request is processed, the appropriate callback is invoked. By default, when a request succeeds, the on_response_success callback is executed; this is where you can interact with the page content. You can also specify custom callbacks for your requests.
Below you can find a very simple implementation.
from silene.crawl_request import CrawlRequest
from silene.crawl_response import CrawlResponse
from silene.crawler import Crawler
from silene.crawler_configuration import CrawlerConfiguration


class MyCrawler(Crawler):
    def configure(self) -> CrawlerConfiguration:
        return CrawlerConfiguration([CrawlRequest('https://example.com')])

    def on_response_success(self, response: CrawlResponse) -> None:
        # Do something with the response...
        pass
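The callback dispatch described above can be illustrated without a browser. The following is a minimal, self-contained sketch that assumes nothing about Silene's internals: BaseCrawler, Request, Response, dispatch, on_success and handle_special are hypothetical stand-ins, not part of Silene's API. It shows an abstract configure method that subclasses are forced to implement, and a success callback that falls back to on_response_success unless a request carries its own handler.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, List, Optional


# Hypothetical stand-ins for illustration only; these are NOT Silene's classes.
@dataclass
class Response:
    url: str


@dataclass
class Request:
    url: str
    # Optional custom success callback; None means "use the crawler's default".
    on_success: Optional[Callable[[Response], None]] = None


class BaseCrawler(ABC):
    @abstractmethod
    def configure(self) -> List[Request]:
        """Return the initial requests; subclasses must implement this."""

    def on_response_success(self, response: Response) -> None:
        """Default success callback; does nothing unless overridden."""

    def dispatch(self, request: Request, response: Response) -> None:
        # Hypothetical dispatch step: prefer the request's own callback,
        # otherwise fall back to the default on_response_success method.
        callback = request.on_success or self.on_response_success
        callback(response)


class DemoCrawler(BaseCrawler):
    def __init__(self) -> None:
        self.seen: List[tuple] = []

    def configure(self) -> List[Request]:
        return [
            Request("https://example.com"),
            Request("https://example.com/special", on_success=self.handle_special),
        ]

    def on_response_success(self, response: Response) -> None:
        self.seen.append(("default", response.url))

    def handle_special(self, response: Response) -> None:
        self.seen.append(("custom", response.url))


crawler = DemoCrawler()
for req in crawler.configure():
    # Pretend every fetch succeeded instead of driving a real browser.
    crawler.dispatch(req, Response(req.url))

print(crawler.seen)
# [('default', 'https://example.com'), ('custom', 'https://example.com/special')]
```

Because configure is declared with abstractmethod, instantiating a subclass that omits it raises a TypeError, which mirrors the requirement that every crawler implement configure.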
Setting up a development environment for this project requires Pipenv. Run pipenv install --dev to create a new virtual environment and install the necessary packages.
To run the tests, run pytest in the project root folder.
To run the tests with a coverage report, run pytest --cov=silene in the project root folder.
The source code of Silene is made available under the Apache License, Version 2.0.