Merge pull request #32 from yujiosaka/rename_option
Rename option
yujiosaka authored Dec 10, 2017
2 parents 95d05a6 + c88a201 commit 84300a2
Showing 3 changed files with 10 additions and 10 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -89,7 +89,7 @@ function launch() {
onSuccess: (result => {
console.log('onSuccess', result);
}),
-    ensureClearCache: false, // Set false so that cache won't be cleared when closing the crawler
+    persistCache: true, // Set true so that cache won't be cleared when closing the crawler
   cache,
 });
}
@@ -154,7 +154,7 @@ HCCrawler provides methods to launch or connect to a HeadlessChrome/Chromium.
* `maxConcurrency` <[number]> Maximum number of pages to open concurrently, defaults to `10`.
* `maxRequest` <[number]> Maximum number of requests, defaults to `0`. Pass `0` to disable the limit.
* `cache` <[Cache]> A cache object which extends [BaseCache](#class-basecache) to remember and skip duplicate requests, defaults to [SessionCache](#class-sessioncache). Pass `null` if you don't want to skip duplicate requests.
- * `ensureClearCache` <[boolean]> Whether to clear cache on closing or disconnecting from the browser, defaults to `true`.
+ * `persistCache` <[boolean]> Whether to persist cache on closing or disconnecting from the browser, defaults to `false`.
* returns: <Promise<HCCrawler>> Promise which resolves to HCCrawler instance.
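The defaults listed above can be pictured as a plain options merge. This is only a sketch of the documented behavior, not the library's actual code; the `resolveCrawlerOptions` helper is hypothetical, and the option names and defaults are taken from the list above:

```javascript
// Hypothetical sketch: how the documented defaults could combine with
// user-supplied options (names and defaults from the option list above).
function resolveCrawlerOptions(userOptions = {}) {
  const defaults = {
    maxConcurrency: 10, // maximum pages opened concurrently
    maxRequest: 0,      // 0 disables the request limit
    persistCache: false // cache is cleared on close unless set to true
  };
  return Object.assign({}, defaults, userOptions);
}

const opts = resolveCrawlerOptions({ persistCache: true });
console.log(opts.persistCache);    // true
console.log(opts.maxConcurrency);  // 10
```

User-supplied keys win over the defaults, so passing only `persistCache: true` keeps the documented concurrency and request limits intact.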

This method connects to an existing Chromium instance. The following options are passed straight to [puppeteer.connect([options])](https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#puppeteerconnectoptions).
@@ -177,7 +177,7 @@ url, timeout, priority, delay, retryCount, retryDelay, jQuery, device, username,
* `maxConcurrency` <[number]> Maximum number of pages to open concurrently, defaults to `10`.
* `maxRequest` <[number]> Maximum number of requests, defaults to `0`. Pass `0` to disable the limit.
* `cache` <[Cache]> A cache object which extends [BaseCache](#class-basecache) to remember and skip duplicate requests, defaults to [SessionCache](#class-sessioncache). Pass `null` if you don't want to skip duplicate requests.
- * `ensureClearCache` <[boolean]> Whether to clear cache on closing or disconnecting from the browser, defaults to `true`.
+ * `persistCache` <[boolean]> Whether to persist cache on closing or disconnecting from the browser, defaults to `false`.
* returns: <Promise<HCCrawler>> Promise which resolves to HCCrawler instance.

The method launches a HeadlessChrome/Chromium instance. The following options are passed straight to [puppeteer.launch([options])](https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#puppeteerlaunchoptions).
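One way to picture options being "passed straight" to Puppeteer is splitting the merged options into crawler-specific keys and the remainder. This is a hedged sketch, not library code: the key list mirrors the `HCCRAWLER_OPTIONS` array that appears in `lib/hccrawler.js` later in this diff, while the `splitOptions` helper itself is hypothetical:

```javascript
// Crawler-specific keys (mirrors the HCCRAWLER_OPTIONS array in this diff).
const CRAWLER_KEYS = ['maxConcurrency', 'maxRequest', 'cache', 'persistCache'];

// Hypothetical helper: everything not crawler-specific would be forwarded
// untouched to puppeteer.launch / puppeteer.connect.
function splitOptions(options) {
  const crawler = {};
  const puppeteer = {};
  for (const [key, value] of Object.entries(options)) {
    if (CRAWLER_KEYS.includes(key)) crawler[key] = value;
    else puppeteer[key] = value;
  }
  return { crawler, puppeteer };
}

const { crawler, puppeteer } = splitOptions({
  persistCache: true,
  headless: true, // a puppeteer.launch option, forwarded as-is
});
console.log(crawler);   // { persistCache: true }
console.log(puppeteer); // { headless: true }
```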
@@ -309,7 +309,7 @@ HCCrawler.launch({ cache: null });

### class: RedisCache

- Passing a `RedisCache` object to the [HCCrawler.connect([options])](#hccrawlerconnectoptions)'s `cache` options allows you to persist requested urls in Redis and prevents from requesting same urls in a distributed servers' environment. It also works well with its `ensureClearCache` option to be false.
+ Passing a `RedisCache` object to the [HCCrawler.connect([options])](#hccrawlerconnectoptions)'s `cache` option lets you persist requested urls in Redis and avoids requesting the same urls across distributed servers. It works well with the `persistCache` option set to `true`.

Its constructor options are passed to [NodeRedis's redis.createClient([options])](https://github.com/NodeRedis/node_redis#rediscreateclient).

@@ -320,15 +320,15 @@ const RedisCache = require('headless-chrome-crawler/cache/redis');
const cache = new RedisCache({ host: '127.0.0.1', port: 6379 });

HCCrawler.launch({
-  ensureClearCache: false, // Set false so that cache won't be cleared when closing the crawler
+  persistCache: true, // Set true so that cache won't be cleared when closing the crawler
cache,
});
// ...
```

### class: BaseCache

You can create your own cache by extending the [BaseCache interfaces](https://github.com/yujiosaka/headless-chrome-crawler/blob/master/cache/base.js) and passing an instance to the [HCCrawler.connect([options])](#hccrawlerconnectoptions)'s `cache` option.

Here is an example of creating a file-based cache.

2 changes: 1 addition & 1 deletion examples/redis-cache.js
@@ -12,7 +12,7 @@ function launch() {
onSuccess: (result => {
console.log('onSuccess', result);
}),
-    ensureClearCache: false, // Set false so that cache won't be cleared when closing the crawler
+    persistCache: true, // Set true so that cache won't be cleared when closing the crawler
cache: new RedisCache(), // Passing no options expects Redis to be run in the local machine.
});
}
6 changes: 3 additions & 3 deletions lib/hccrawler.js
@@ -30,7 +30,7 @@ const HCCRAWLER_OPTIONS = [
'maxConcurrency',
'maxRequest',
'cache',
-  'clearCacheOnEnd',
+  'persistCache',
];

const deviceNames = Object.keys(devices);
@@ -84,7 +84,7 @@ class HCCrawler {
retryDelay: 10000,
jQuery: true,
cache: new SessionCache(),
-      ensureClearCache: true,
+      persistCache: false,
}, options);
this._pQueue = new PQueue({
concurrency: this._options.maxConcurrency,
@@ -342,7 +342,7 @@ class HCCrawler {
* @private
*/
_clearCacheOnEnd() {
-    if (this._options.ensureClearCache) return this._clearCache();
+    if (!this._options.persistCache) return this._clearCache();
return Promise.resolve();
}
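The intended behavior of `persistCache`, per the README comments above (clear the cache on close unless it is set to `true`), can be exercised in isolation. This is a runnable sketch only; the standalone `clearCacheOnEnd` helper and the fake cache are hypothetical stand-ins, not library code:

```javascript
// Hypothetical stand-in for the end-of-crawl decision: clear the cache
// on close unless the user asked for it to persist.
function clearCacheOnEnd(options, cache) {
  if (!options.persistCache) return cache.clear();
  return Promise.resolve();
}

// Fake cache that just records whether clear() was called.
function makeFakeCache() {
  return {
    cleared: false,
    clear() { this.cleared = true; return Promise.resolve(); },
  };
}

const transient = makeFakeCache();
clearCacheOnEnd({ persistCache: false }, transient); // default: cache is cleared

const persistent = makeFakeCache();
clearCacheOnEnd({ persistCache: true }, persistent); // opt-in: cache survives

console.log(transient.cleared, persistent.cleared); // true false
```

Keeping both branches promise-returning lets callers `await` the shutdown path uniformly whether or not a clear actually happens.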

