Commit 19d2e2d: Improve docs

manzanit0 committed Sep 13, 2020 (1 parent: 131381c)
Showing 2 changed files with 19 additions and 6 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -21,10 +21,11 @@ iex> Spidey.crawl("https://manzanit0.github.io", :crawler_pool, pool_size: 15)

 In a nutshell, the above line will:
 
-1. Spin up a new supervision tree under the `Spidey` OTP application that will contain a pool of workers for crawling.
+1. Spin up a new supervision tree under the `Spidey` OTP Application that will contain a pool of workers for crawling.
 2. Create an ETS table to store crawled urls
 3. Crawl the website
-4. Teardown the supervision tree
+4. Return all the urls as a list
+5. Teardown the supervision tree and the ETS table
 
 The function is synchronous, but if you were to call it asynchronously
 multiple times, each invocation will spin up a new supervision tree with a
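For context, a minimal usage sketch of the call this README change documents, assuming only the `Spidey.crawl/3` signature shown in the hunk header above; the pool name and option values are illustrative:

```elixir
# Illustrative call mirroring the README example; returns the crawled urls as a list.
urls = Spidey.crawl("https://manzanit0.github.io", :crawler_pool, pool_size: 15)

# The call is synchronous; by the time it returns, the supervision tree and
# ETS table from the steps listed above have already been torn down.
Enum.each(urls, &IO.puts/1)
```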
20 changes: 16 additions & 4 deletions lib/spidey.ex
@@ -9,7 +9,19 @@ defmodule Spidey do
   alias Spidey.Crawler
 
   @doc """
-  Crawls a website for all the same-domain urls, returning a list.
+  Crawls a website for all the same-domain urls, returning a list with them.
+
+  The default `pool_name` is `:default`, but a custom one can be provided.
+
+  The default filter rejects assets, Wordpress links, and others. To provide
+  custom filtering, make sure to implement the `Spidey.Filter` behaviour and
+  provide it via the `filter` option.
+
+  Furthermore, `crawl/3` accepts the following options:
+
+  * `filter`: a custom url filter
+  * `pool_size`: the amount of workers to crawl the website. Defaults to 20.
+  * `max_overflow`: the amount of workers to overflow before queueing urls. Defaults to 5.
 
   ## Examples
@@ -20,11 +32,11 @@ defmodule Spidey do
     Crawler.crawl(url, pool_name, opts)
   end
 
-  @doc "Crawls a website for all the sam-domain urls and Saves the list of urls to file"
-  def crawl_to_file(url, pool_name \\ :default, path)
+  @doc "Just like `crawl/3` but saves the list of urls to file"
+  def crawl_to_file(url, path, pool_name \\ :default, opts \\ [])
       when is_binary(url) and is_atom(pool_name) do
     url
-    |> crawl(pool_name)
+    |> crawl(pool_name, opts)
     |> File.save(path)
   end
 end
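To make the new options concrete, here is a hedged sketch of a custom filter wired in through the `filter` option, plus the reworked `crawl_to_file/4`. The callback name `filter_urls/2`, the module name, the pool name, and the file path are assumptions for illustration only; check the `Spidey.Filter` behaviour for the callback it actually requires.

```elixir
# Hypothetical filter module; `filter_urls/2` is assumed to be the callback
# required by the `Spidey.Filter` behaviour (the behaviour isn't shown in this diff).
defmodule MyApp.DocsOnlyFilter do
  @behaviour Spidey.Filter

  def filter_urls(urls, _opts) do
    # Keep only documentation pages, dropping everything else.
    Enum.filter(urls, &String.contains?(&1, "/docs"))
  end
end

# Crawl with the options documented in the new @doc above.
Spidey.crawl("https://manzanit0.github.io", :docs_pool,
  filter: MyApp.DocsOnlyFilter,
  pool_size: 30,
  max_overflow: 10
)

# After this commit, crawl_to_file/4 takes the path before the pool name and
# forwards the same options to crawl/3 before writing the urls to the file.
Spidey.crawl_to_file("https://manzanit0.github.io", "/tmp/urls.txt", :docs_pool, pool_size: 30)
```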
