
robertjdominguez/docusaurus-to-pdf


docusaurus-to-pdf

docusaurus-to-pdf is a CLI tool that generates a PDF from a Docusaurus-based documentation website. The tool allows customization of the scraping process via a configuration file or CLI options.

Installation

You can use npx to run the tool without installing it globally:

npx docusaurus-to-pdf

Usage

By default, the tool looks for a configuration file named scraper.config.json. However, you can override this by providing specific options through the CLI.
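The override behaviour described above can be sketched in TypeScript. This is a minimal sketch, assuming that any option passed on the CLI replaces the matching key from scraper.config.json and that keys the CLI leaves unset fall back to the file. The `ScraperConfig` type and `resolveConfig` function are hypothetical names, and the keys are inferred from the examples in this README rather than taken from the tool's source.

```typescript
// Hypothetical shape of scraper.config.json, inferred from the examples in this README.
interface ScraperConfig {
  baseUrl?: string;
  entryPoint?: string;
  requiredDirs?: string[];
  outputDir?: string;
  customStyles?: string;
  forceImages?: boolean;
}

// Hypothetical merge: a value given on the CLI wins over the config file;
// a key the CLI leaves undefined keeps the file's value.
function resolveConfig(fileConfig: ScraperConfig, cli: ScraperConfig): ScraperConfig {
  return {
    baseUrl: cli.baseUrl ?? fileConfig.baseUrl,
    entryPoint: cli.entryPoint ?? fileConfig.entryPoint,
    requiredDirs: cli.requiredDirs ?? fileConfig.requiredDirs,
    outputDir: cli.outputDir ?? fileConfig.outputDir,
    customStyles: cli.customStyles ?? fileConfig.customStyles,
    forceImages: cli.forceImages ?? fileConfig.forceImages,
  };
}

// Settings read from scraper.config.json (values from Example 1).
const fromFile: ScraperConfig = {
  baseUrl: "https://hasura.io",
  entryPoint: "https://hasura.io/docs/3.0",
  requiredDirs: ["auth", "support"],
};
// Settings passed on the command line.
const fromCli: ScraperConfig = { requiredDirs: ["auth"], forceImages: true };

console.log(resolveConfig(fromFile, fromCli));
```

Spelling out each key keeps the merge type-safe without casts; a generic loop over keys would be shorter but harder to type precisely.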

CLI Options

Option                      Description                                                         Default
--all                       Generate PDF for all directories                                    true
--baseUrl <url>             Base URL of the site to scrape
--entryPoint <url>          Entry point for scraping (starting URL)
--directories <dirs...>     Specific directories to include in the scraping process (optional)
--customStyles <styles...>  Add custom styles as a string to override defaults (optional)
--output <path>             Output path for the generated PDF                                   ./output/docs.pdf
--forceImages               Disable lazy loading for images                                     false
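Each example below pairs a config file with its CLI equivalent. The following minimal sketch builds the CLI arguments for a given config, assuming the key-to-flag mapping those examples show (`requiredDirs` maps to `--directories`, `outputDir` maps to `--output`, and the other keys share their flag's name). `ScraperConfig` and `toCliArgs` are hypothetical names, not part of the tool.

```typescript
// Hypothetical config shape, inferred from the examples in this README.
interface ScraperConfig {
  baseUrl?: string;
  entryPoint?: string;
  requiredDirs?: string[];
  outputDir?: string;
  customStyles?: string;
  forceImages?: boolean;
}

// Builds the CLI arguments that correspond to a config object,
// following the "CLI equivalent" lines in the examples.
function toCliArgs(config: ScraperConfig): string[] {
  const args: string[] = [];
  if (config.baseUrl) args.push("--baseUrl", config.baseUrl);
  if (config.entryPoint) args.push("--entryPoint", config.entryPoint);
  // --directories is variadic: each directory is its own argument.
  if (config.requiredDirs && config.requiredDirs.length > 0) {
    args.push("--directories", ...config.requiredDirs);
  }
  if (config.customStyles) args.push("--customStyles", config.customStyles);
  if (config.outputDir) args.push("--output", config.outputDir);
  // --forceImages is a boolean flag: present means true, absent means false.
  if (config.forceImages) args.push("--forceImages");
  return args;
}

const args = toCliArgs({
  baseUrl: "https://hasura.io",
  entryPoint: "https://hasura.io/docs/3.0",
  requiredDirs: ["auth", "support"],
});
console.log(["npx", "docusaurus-to-pdf", ...args].join(" "));
```

The printed command matches the CLI equivalent of Example 1. The joined string is for display only: a value containing spaces, such as a customStyles rule, needs quoting in a real shell, whereas passing the array to a process-spawning API avoids quoting altogether.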

Examples

Below, you'll find some example configurations that can be placed in a scraper.config.json file.

Example 1: Scraping specific directories

Only pages whose paths contain 'auth' or 'support' are included in the output:

CLI equivalent: npx docusaurus-to-pdf --baseUrl https://hasura.io --entryPoint https://hasura.io/docs/3.0 --directories auth support

{
  "baseUrl": "https://hasura.io",
  "entryPoint": "https://hasura.io/docs/3.0",
  "requiredDirs": ["auth", "support"]
}
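The `requiredDirs` filter can be pictured as a check on each page's URL path. This is a minimal sketch, assuming a page is kept when any of its path segments equals one of the listed directories, and that every page is kept when no directories are given (as in Example 2). `isIncluded` is a hypothetical helper and the page URLs are illustrative. The tool itself may match differently, for example with a plain substring check, which would also match a path such as `/authors`.

```typescript
// Hypothetical filter: keep a page if any segment of its URL path matches one
// of the required directories. With no directories, every page is kept.
function isIncluded(pageUrl: string, requiredDirs?: string[]): boolean {
  if (!requiredDirs || requiredDirs.length === 0) return true;
  const wanted: string[] = requiredDirs;
  // "/docs/3.0/auth/overview" -> ["docs", "3.0", "auth", "overview"]
  const segments = new URL(pageUrl).pathname.split("/").filter((s) => s.length > 0);
  return segments.some((segment) => wanted.indexOf(segment) !== -1);
}

const dirs = ["auth", "support"];
const pages = [
  "https://hasura.io/docs/3.0/auth/overview",
  "https://hasura.io/docs/3.0/support/faq",
  "https://hasura.io/docs/3.0/graphql-api/queries",
];
console.log(pages.filter((p) => isIncluded(p, dirs)));
```

Matching whole segments rather than substrings keeps `auth` from also selecting unrelated sections whose names merely start with the same letters.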

Example 2: Scraping all directories

CLI equivalent: npx docusaurus-to-pdf --baseUrl https://hasura.io --entryPoint https://hasura.io/docs/3.0 --output ./output/all-docs.pdf

{
  "baseUrl": "https://hasura.io",
  "entryPoint": "https://hasura.io/docs/3.0",
  "outputDir": "./output/all-docs.pdf"
}

Example 3: Scraping without specifying the output path

The PDF is written to the default location, ./output/docs.pdf.

CLI equivalent: npx docusaurus-to-pdf --baseUrl https://docusaurus.io --entryPoint https://docusaurus.io/docs

{
  "baseUrl": "https://docusaurus.io",
  "entryPoint": "https://docusaurus.io/docs"
}
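Filling in the documented defaults can be sketched as follows. This is a minimal sketch, assuming the defaults from the CLI options table (`./output/docs.pdf` for the output path and `false` for `forceImages`) also apply when settings come from scraper.config.json. `ScraperConfig` and `withDefaults` are hypothetical names.

```typescript
// Hypothetical config shape, inferred from the examples in this README.
interface ScraperConfig {
  baseUrl?: string;
  entryPoint?: string;
  requiredDirs?: string[];
  outputDir?: string;
  customStyles?: string;
  forceImages?: boolean;
}

// Defaults taken from the CLI options table in this README.
const DEFAULT_OUTPUT = "./output/docs.pdf";
const DEFAULT_FORCE_IMAGES = false;

// Returns a copy of the config with unset keys filled from the defaults.
function withDefaults(config: ScraperConfig): ScraperConfig {
  return {
    ...config,
    outputDir: config.outputDir ?? DEFAULT_OUTPUT,
    forceImages: config.forceImages ?? DEFAULT_FORCE_IMAGES,
  };
}

// Example 3's config, which sets no output path.
const resolved = withDefaults({
  baseUrl: "https://docusaurus.io",
  entryPoint: "https://docusaurus.io/docs",
});
console.log(resolved.outputDir); // ./output/docs.pdf
```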

Example 4: Scraping with custom styles

This overrides the existing table styles with a max-width of 3500px, a value chosen with an A4 sheet of paper in mind.

CLI equivalent: npx docusaurus-to-pdf --baseUrl https://hasura.io --entryPoint https://hasura.io/docs/3.0 --customStyles 'table { max-width: 3500px !important }'

{
  "baseUrl": "https://hasura.io",
  "entryPoint": "https://hasura.io/docs/3.0",
  "customStyles": "table { max-width: 3500px !important }"
}
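Custom styles are CSS rules layered on top of the site's own styles; the `!important` in the example helps the rule win over the site's table styles regardless of selector specificity. This is a minimal sketch, assuming the styles are injected as a `<style>` element at the end of the page's `<head>` before the PDF is rendered. `injectStyles` is a hypothetical helper working on an HTML string; the tool may instead inject styles through its browser automation layer. It takes an array because the `--customStyles <styles...>` option accepts several values.

```typescript
// Hypothetical helper: append the custom CSS in a <style> element just before
// </head>, so it follows the page's existing stylesheets.
function injectStyles(html: string, styles: string[]): string {
  if (styles.length === 0) return html;
  const styleTag = `<style>\n${styles.join("\n")}\n</style>`;
  // Case-sensitive search; a real implementation would use an HTML parser or the browser DOM.
  const headClose = html.indexOf("</head>");
  // Fall back to prepending when the document has no </head>.
  if (headClose === -1) return styleTag + html;
  return html.slice(0, headClose) + styleTag + html.slice(headClose);
}

const sampleHtml = "<html><head><title>Docs</title></head><body><table></table></body></html>";
console.log(injectStyles(sampleHtml, ["table { max-width: 3500px !important }"]));
```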

Example 5: Scraping without lazy loading on images

CLI equivalent: npx docusaurus-to-pdf --baseUrl https://docusaurus.io --entryPoint https://docusaurus.io/docs --forceImages

{
  "baseUrl": "https://docusaurus.io",
  "entryPoint": "https://docusaurus.io/docs",
  "forceImages": true
}
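Lazy-loaded images are fetched only once they come near the viewport, so a page rendered straight to PDF can end up with missing images. This is a minimal sketch of what disabling lazy loading could look like, assuming the images use the standard `loading="lazy"` attribute. `forceEagerImages` is a hypothetical helper operating on an HTML string; the tool's actual approach may differ, for example by scrolling the page or by handling JavaScript-based lazy loaders that this regex does not touch.

```typescript
// Hypothetical helper: rewrite loading="lazy" on <img> tags to loading="eager"
// so the browser fetches every image up front. Handles single or double quotes;
// [^>]* keeps each match inside a single tag.
function forceEagerImages(html: string): string {
  return html.replace(/(<img\b[^>]*\bloading=)(["'])lazy\2/gi, "$1$2eager$2");
}

const lazyHtml = '<p>Intro</p><img src="/img/diagram.png" loading="lazy" alt="Diagram">';
console.log(forceEagerImages(lazyHtml));
```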

Contributing

We welcome contributions! If you'd like to contribute, please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-branch).
  3. Make your changes.
  4. Run the tests.
  5. Commit your changes (git commit -am 'Add new feature').
  6. Push to the branch (git push origin feature-branch).
  7. Create a pull request.

About

A CLI tool for scraping Docusaurus sites into PDFs.
