Merge pull request #130 from yujiosaka/browser_cache_option
Browser cache option
yujiosaka authored Feb 24, 2018
2 parents 00bc9e7 + 4abf327 commit cf3bdf0
Showing 4 changed files with 17 additions and 2 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,9 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

## [Unreleased]

- Support `browserCache` for [crawler.queue()](https://github.com/yujiosaka/headless-chrome-crawler#crawlerqueueoptions)'s options.
- Support `depthPriority` option again.

## [1.3.4] - 2018-02-22

### changed
5 changes: 3 additions & 2 deletions README.md
@@ -182,7 +182,7 @@ browserWSEndpoint, ignoreHTTPSErrors
Also, the following options can be set as default values when [crawler.queue()](#crawlerqueueoptions) are executed.

```
url, allowedDomains, deniedDomains, timeout, priority, depthPriority, delay, retryCount, retryDelay, jQuery, device, username, password, evaluatePage
url, allowedDomains, deniedDomains, timeout, priority, depthPriority, delay, retryCount, retryDelay, jQuery, browserCache, device, username, password, evaluatePage
```

> **Note**: In practice, setting the options every time you queue requests is redundant. Therefore, it's recommended to set the default values and override them depending on the necessity.
@@ -222,7 +222,7 @@ ignoreHTTPSErrors, headless, executablePath, slowMo, args, ignoreDefaultArgs, ha
Also, the following options can be set as default values when [crawler.queue()](#crawlerqueueoptions) are executed.

```
url, allowedDomains, deniedDomains, timeout, priority, depthPriority, delay, retryCount, retryDelay, jQuery, device, username, password, evaluatePage
url, allowedDomains, deniedDomains, timeout, priority, depthPriority, delay, retryCount, retryDelay, jQuery, browserCache, device, username, password, evaluatePage
```

> **Note**: In practice, setting the options every time you queue the requests is redundant. Therefore, it's recommended to set the default values and override them depending on the necessity.
@@ -251,6 +251,7 @@ url, allowedDomains, deniedDomains, timeout, priority, depthPriority, delay, ret
* `retryCount` <[number]> Number of limit when retry fails, defaults to `3`.
* `retryDelay` <[number]> Number of milliseconds after each retry fails, defaults to `10000`.
* `jQuery` <[boolean]> Whether to automatically add [jQuery](https://jquery.com) tag to page, defaults to `true`.
* `browserCache` <[boolean]> Whether to enable browser cache for each request, defaults to `true`.
* `device` <[string]> Device to emulate. Available devices are listed [here](https://github.com/GoogleChrome/puppeteer/blob/master/DeviceDescriptors.js).
* `username` <[string]> Username for basic authentication. pass `null` if it's not necessary.
* `screenshot` <[Object]> Screenshot option, defaults to `null`. This option is passed to [Puppeteer's page.screenshot()](https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagescreenshotoptions). Pass `null` or leave default to disable screenshot.
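The `browserCache` option documented above is passed to [crawler.queue()](#crawlerqueueoptions) like any other per-request option. A minimal usage sketch, assuming the `HCCrawler.launch()` / `queue()` / `onIdle()` API described elsewhere in the README (the URL is illustrative):

```js
const HCCrawler = require('headless-chrome-crawler');

(async () => {
  const crawler = await HCCrawler.launch({
    onSuccess: result => console.log('Crawled', result.options.url),
  });
  // browserCache defaults to true; disable it for this request only.
  await crawler.queue({ url: 'https://example.com/', browserCache: false });
  await crawler.onIdle();
  await crawler.close();
})();
```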
10 changes: 10 additions & 0 deletions lib/crawler.js
@@ -80,6 +80,7 @@ class Crawler {
this._preventNewTabs(),
this._authenticate(),
this._emulate(),
this._setCacheEnabled(),
this._setUserAgent(),
this._setExtraHeaders(),
this._handlePageEvents(),
@@ -118,6 +119,15 @@ class Crawler {
return this._page.emulate(devices[this._options.device]);
}

/**
* @return {!Promise}
* @private
*/
_setCacheEnabled() {
if (this._options.browserCache) return Promise.resolve();
return this._page.setCacheEnabled(false);
}

/**
* @return {!Promise}
* @private
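For reference, `page.setCacheEnabled()` is Puppeteer's page-level switch for the browser cache; the new `_setCacheEnabled()` helper simply calls it with `false` when the `browserCache` option is disabled. A standalone sketch of the underlying call (not part of this commit; the URL is illustrative):

```js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Equivalent to queuing a request with browserCache: false.
  await page.setCacheEnabled(false);
  await page.goto('https://example.com/');
  await browser.close();
})();
```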
1 change: 1 addition & 0 deletions lib/hccrawler.js
@@ -111,6 +111,7 @@ class HCCrawler extends EventEmitter {
retryCount: 3,
retryDelay: 10000,
jQuery: true,
browserCache: true,
persistCache: false,
skipDuplicates: true,
depthPriority: true,
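Since `browserCache: true` now ships among the defaults above, it can also be changed once at launch time and inherited by every queued request, with values passed to `queue()` still taking precedence. A minimal sketch under the same API assumptions as the earlier example:

```js
const HCCrawler = require('headless-chrome-crawler');

(async () => {
  // Disable the browser cache for all requests queued by this crawler.
  const crawler = await HCCrawler.launch({
    browserCache: false,
    onSuccess: result => console.log('Crawled', result.options.url),
  });
  await crawler.queue('https://example.com/');
  await crawler.onIdle();
  await crawler.close();
})();
```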
