Update README.md #6

Merged: 1 commit, Sep 11, 2024
16 changes: 8 additions & 8 deletions README.md
@@ -1,6 +1,6 @@
# Blacklight Query

A command-line tool to fetch [Blacklight](https://themarkup.org/series/blacklight) scans for a list of URLs. It directly queries the open-source [Blacklight Collector](https://github.com/the-markup/blacklight-collector) tool and runs entirely locally.

## Prerequisites

@@ -18,33 +18,33 @@ A command-line tool to fetch [Blacklight](https://themarkup.org/series/blackligh

Write all URLs you wish to scan as **absolute URLs** (including protocol, domain, and path) in a file named `urls.txt` in the root directory, one URL per line.

### Sample `urls.txt` file

```text
https://www.themarkup.org
https://www.calmatters.org
```
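
As a rough illustration of the "absolute URL" requirement, the sketch below checks each line of a `urls.txt`-style string before scanning. The `parseUrlList` helper is hypothetical and not part of this tool; it relies only on the standard `URL` constructor, which throws when a string has no protocol.

```typescript
// Hypothetical helper (not part of this tool): splits the contents of a
// urls.txt-style file into absolute http(s) URLs and everything else.
function parseUrlList(contents: string): { valid: string[]; invalid: string[] } {
  const valid: string[] = [];
  const invalid: string[] = [];

  for (const rawLine of contents.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // skip blank lines

    try {
      // The URL constructor throws if the string is not an absolute URL.
      const parsed = new URL(line);
      if (parsed.protocol === "http:" || parsed.protocol === "https:") {
        valid.push(line);
      } else {
        invalid.push(line);
      }
    } catch {
      invalid.push(line);
    }
  }

  return { valid, invalid };
}
```

A line such as `www.calmatters.org` (no protocol) would land in `invalid`, which is one way to catch a malformed list before starting a scan.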

### Collector Options

All of the [`blacklight-collector`](https://github.com/the-markup/blacklight-collector?tab=readme-ov-file#collector-configuration) options can be specified using this tool, by editing the `config` object in `main.ts`.

Out of the box, this tool sets the following options:

- `headless: true`, sets the collector to use a headless, behind-the-scenes browser
- `outDir: ./outputs/[URL]`, specifies which directory the collector should store its results in, based on the URL being scanned
- `numPages: 0`, tells the collector not to scan any additional pages. Setting this to `1`, `2`, or `3` scans that number of randomly chosen pages that are accessible from the homepage
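
The defaults listed above might be pictured as the following object. This is only a sketch: the `ScanConfig` type and `makeConfig` helper are hypothetical names chosen for illustration, and the actual `config` object in `main.ts` may be structured differently.

```typescript
// Hypothetical shape covering only the three options described above;
// the real collector accepts more options than these.
interface ScanConfig {
  headless: boolean;
  outDir: string;
  numPages: number;
}

// Assumed helper: starts from the described defaults for one output
// directory, then applies any caller-supplied overrides.
function makeConfig(outDir: string, overrides: Partial<ScanConfig> = {}): ScanConfig {
  const defaults: ScanConfig = { headless: true, outDir, numPages: 0 };
  return { ...defaults, ...overrides };
}

// Example: keep the defaults but scan two additional pages.
const exampleConfig = makeConfig("./outputs/example.org", { numPages: 2 });
```

Spreading the overrides last means any option left unspecified keeps its default value.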

Some other options you may find useful are:

- `emulateDevice`, specifies which device the collector should scan as
- `headers`, allows you to set custom headers on the headless browser

Read the [`blacklight-collector` README](https://github.com/the-markup/blacklight-collector/) for a full list of options and their defaults.

## Outputs

All scans will be saved in the `outputs` folder, in subdirectories named for the hostname of the URL being scanned.
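
To make the naming rule concrete, here is a minimal sketch of how a scanned URL could map to its subdirectory. The `outputDirFor` function is a hypothetical name, and the tool's own path handling may differ in detail.

```typescript
// Hypothetical helper: derives the output subdirectory from a URL's hostname.
function outputDirFor(url: string): string {
  // Only the hostname is used; the protocol, path, and query are ignored.
  const { hostname } = new URL(url);
  return `outputs/${hostname}`;
}
```

Under this assumption, `https://www.themarkup.org/series/blacklight` would be stored in `outputs/www.themarkup.org`.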

## Testing
