Update README.md
soooh authored Sep 11, 2024
1 parent 3e1b9fa commit b3760d3
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions README.md
@@ -1,6 +1,6 @@
# Blacklight Query

A command-line tool to fetch [Blacklight](https://themarkup.org/series/blacklight) scans for a list of urls. Directly queries the open-source [Blacklight Collector](https://github.com/the-markup/blacklight-collector) tool, and runs entirely locally.
A command-line tool to fetch [Blacklight](https://themarkup.org/series/blacklight) scans for a list of urls. Directly queries the open-source [Blacklight Collector](https://github.com/the-markup/blacklight-collector) tool and runs entirely locally.

## Prerequisites

@@ -18,33 +18,33 @@ A command-line tool to fetch [Blacklight](https://themarkup.org/series/blackligh

Write all URLs you wish to scan as **absolute URLs** (including protocol, domain, and path) in a file named `urls.txt` in the root directory. Separate urls by newline.

### Sample urls.txt file
### Sample `urls.txt` file

```text
https://www.themarkup.org
https://www.calmaterrs.org
https://www.calmatters.org
```
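
Because every entry must be an absolute URL, it can help to check the file before starting a scan. The following is a minimal sketch, not part of this tool: `parseUrlList` is a hypothetical helper that takes the text of `urls.txt` and separates entries that parse as absolute `http`/`https` URLs from those that do not, using the standard `URL` constructor.

```typescript
// Hypothetical helper (not part of blacklight-query): splits the contents of
// urls.txt into entries that are absolute http(s) URLs and entries that are not.
function parseUrlList(text: string): { valid: string[]; invalid: string[] } {
  const valid: string[] = [];
  const invalid: string[] = [];

  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue; // ignore blank lines

    try {
      // The URL constructor throws when given a relative or malformed URL
      // and no base, so a successful parse means the entry is absolute.
      const parsed = new URL(line);
      if (parsed.protocol === "http:" || parsed.protocol === "https:") {
        valid.push(line);
      } else {
        invalid.push(line); // e.g. mailto: or file: entries
      }
    } catch {
      invalid.push(line); // e.g. "www.example.org" with no protocol
    }
  }

  return { valid, invalid };
}
```

An entry such as `www.calmatters.org`, which lacks a protocol, would end up in `invalid` rather than being passed to the collector.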

### Collector Options

All of the [`blacklight-collector`](https://github.com/the-markup/blacklight-collector?tab=readme-ov-file#collector-configuration) options can be specified using this tool, by editing the `config` object in `main.ts`.

Out-of-the-box, this tool sets the following options:
Out of the box, this tool sets the following options:

- `headless: true`, this sets the collector to use a headless, behind-the-scenes browser
- `outDir: ./outputs/[URL]`, specifies which directory the collector should store it's results in. Makes use of the url being scanned
- `numPages: 0`, tells the collector not to scan an additional page. Setting this to 1, 2, or 3 scans that number of randomly-chosen pages that are accessible from the home page
- `outDir: ./outputs/[URL]`, specifies which directory the collector should store its results in. Makes use of the url being scanned
- `numPages: 0`, tells the collector not to scan an additional page. Setting this to `1`, `2`, or `3` scans that number of randomly chosen pages that are accessible from the homepage

Some other options you may find useful are:

- `emulateDevice`, this specifies which device the collector should scan as
- `headers`, allows you to set custom headers on the headless browser

Read the blacklight-collector README for a full list of options and their defaults.
Read the [`blacklight-collector` README](https://github.com/the-markup/blacklight-collector/) for a full list of options and their defaults.
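
As a rough illustration of the defaults listed above, the sketch below builds a config object and lets a caller override individual options. The `CollectorConfig` interface and the `buildConfig` helper are assumptions made for this example; the actual shape of the `config` object in `main.ts` and the authoritative option types are defined by `blacklight-collector`.

```typescript
// Illustrative local type only; the real option types come from
// blacklight-collector. The shape of `headers` is an assumption.
interface CollectorConfig {
  headless: boolean;
  outDir: string;
  numPages: number;
  headers?: Record<string, string>;
}

// Hypothetical helper: starts from the defaults described in this README
// and applies any overrides on top.
function buildConfig(
  outDir: string,
  overrides: Partial<CollectorConfig> = {}
): CollectorConfig {
  return {
    headless: true, // run the browser without a visible window
    outDir,         // where the collector stores results for this URL
    numPages: 0,    // do not scan additional pages beyond the given URL
    ...overrides,
  };
}

// Example: scan two additional pages and send a custom header.
const config = buildConfig("./outputs/www.themarkup.org", {
  numPages: 2,
  headers: { "Accept-Language": "en-US" },
});
```

Spreading `overrides` last means any option passed by the caller replaces the default while untouched options keep their default values.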

## Outputs

All scans will be saved in the `outputs` folder, in sub-folders named for the hostname of the url being scanned.
All scans will be saved in the `outputs` folder, in subdirectories named for the hostname of the url being scanned.
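
To make the naming concrete, here is a small sketch of how an output path could be derived from a URL's hostname. `outputDirFor` is a hypothetical function written for this example; the tool's own implementation may differ, for instance in how it treats a `www.` prefix or a port number.

```typescript
// Hypothetical helper: maps a scanned URL to a directory under the outputs
// folder, named after the URL's hostname. Assumes `url` is an absolute URL.
function outputDirFor(url: string, root: string = "./outputs"): string {
  const { hostname } = new URL(url); // path, query, and port are ignored
  return `${root}/${hostname}`;
}
```

Under these assumptions, two URLs on the same host, such as `https://www.themarkup.org` and `https://www.themarkup.org/series/blacklight`, map to the same directory.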

## Testing
