Commit 19d2e2d: Improve docs

manzanit0 committed Sep 13, 2020 (1 parent: 131381c)
Showing 2 changed files with 19 additions and 6 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -21,10 +21,11 @@ iex> Spidey.crawl("https://manzanit0.github.io", :crawler_pool, pool_size: 15)

 In a nutshell, the above line will:
 
-1. Spin up a new supervision tree under the `Spidey` OTP application that will contain a pool of workers for crawling.
+1. Spin up a new supervision tree under the `Spidey` OTP Application that will contain a pool of workers for crawling.
 2. Create an ETS table to store crawled urls
 3. Crawl the website
-4. Teardown the supervision tree
+4. Return all the urls as a list
+5. Teardown the supervision tree and the ETS table
 
 The function is synchronous, but if you were to call it asynchronously
 multiple times, each invocation will spin up a new supervision tree with a
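For context, a minimal usage sketch of the call this README change documents, assuming only the `Spidey.crawl/3` signature shown in the hunk header above; the pool name and option values are illustrative:

```elixir
# Illustrative call mirroring the README example; returns the crawled urls as a list.
urls = Spidey.crawl("https://manzanit0.github.io", :crawler_pool, pool_size: 15)

# The call is synchronous; by the time it returns, the supervision tree and
# ETS table from the steps listed above have already been torn down.
Enum.each(urls, &IO.puts/1)
```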
20 changes: 16 additions & 4 deletions lib/spidey.ex
@@ -9,7 +9,19 @@ defmodule Spidey do
   alias Spidey.Crawler
 
   @doc """
-  Crawls a website for all the same-domain urls, returning a list.
+  Crawls a website for all the same-domain urls, returning a list with them.
+
+  The default `pool_name` is `:default`, but a custom one can be provided.
+
+  The default filter rejects assets, Wordpress links, and others. To provide
+  custom filtering, make sure to implement the `Spidey.Filter` behaviour and
+  provide it via the `filter` option.
+
+  Furthermore, `crawl/3` accepts the following options:
+
+  * `filter`: a custom url filter
+  * `pool_size`: the amount of workers to crawl the website. Defaults to 20.
+  * `max_overflow`: the amount of workers to overflow before queueing urls. Defaults to 5.
 
   ## Examples
@@ -20,11 +32,11 @@ defmodule Spidey do
     Crawler.crawl(url, pool_name, opts)
   end
 
-  @doc "Crawls a website for all the sam-domain urls and Saves the list of urls to file"
-  def crawl_to_file(url, pool_name \\ :default, path)
+  @doc "Just like `crawl/3` but saves the list of urls to file"
+  def crawl_to_file(url, path, pool_name \\ :default, opts \\ [])
       when is_binary(url) and is_atom(pool_name) do
     url
-    |> crawl(pool_name)
+    |> crawl(pool_name, opts)
     |> File.save(path)
   end
 end
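To make the new options concrete, here is a hedged sketch of a custom filter wired in through the `filter` option, plus the reworked `crawl_to_file/4`. The callback name `filter_urls/2`, the module name, the pool name, and the file path are assumptions for illustration only; check the `Spidey.Filter` behaviour for the callback it actually requires.

```elixir
# Hypothetical filter module; `filter_urls/2` is assumed to be the callback
# required by the `Spidey.Filter` behaviour (the behaviour isn't shown in this diff).
defmodule MyApp.DocsOnlyFilter do
  @behaviour Spidey.Filter

  def filter_urls(urls, _opts) do
    # Keep only documentation pages, dropping everything else.
    Enum.filter(urls, &String.contains?(&1, "/docs"))
  end
end

# Crawl with the options documented in the new @doc above.
Spidey.crawl("https://manzanit0.github.io", :docs_pool,
  filter: MyApp.DocsOnlyFilter,
  pool_size: 30,
  max_overflow: 10
)

# After this commit, crawl_to_file/4 takes the path before the pool name and
# forwards the same options to crawl/3 before writing the urls to the file.
Spidey.crawl_to_file("https://manzanit0.github.io", "/tmp/urls.txt", :docs_pool, pool_size: 30)
```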
