Add doc generation scripts #16

Merged
merged 9 commits into from
Jul 1, 2024

Conversation

@carlopi (Collaborator) commented Jun 28, 2024

EDIT: I have refined and iterated on this. It will need another couple of iterations depending on feedback, but this looks solid.

Recap: there is a script that, by querying DuckDB metadata functions (like FROM duckdb_functions()) before and after the LOAD of a given extension, can autogenerate documentation for the interface exposed by that extension.

Path is:
extension binary -> extension_name.md

This script is run on CI on every extension build, and once again on the whole set of published extensions.

Given there is currently no stable way to provide descriptions / comments / examples in the code itself, and we plan to introduce a proper API in the run-up to v1.1, extension developers can opt in to provide a CSV file with relevant metadata fields, which will be joined (by DuckDB!) to the data already collected.


This adds two very rough bash scripts: one that INSTALLs all relevant extensions locally, and one that generates markdown for each extension found in the folder build/extension_dir/**.

Both scripts take a path to a duckdb binary for the right platform & version.

Also added descriptions for functions in the h3 extension, using information found in its README.md.


"Design" choices here are:

  • the source of truth is the DuckDB metadata functions (duckdb_functions(), duckdb_types(), duckdb_settings()), diffed between before and after loading a given extension
  • whenever those are not sufficient (for example, there is no complete parity yet for extension descriptions, and function descriptions currently require workarounds), additional CSV files can be provided to supply the extra information
  • the script needs to be of value both for autogenerating documentation for community extensions and in general. Here that means there are two scripts: one to collect extensions (and store them in a custom extension directory), and one to analyze them.
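
The before/after diff in the first bullet can be sketched as a set difference over two snapshots. This is only an illustration, not the code in generate_md.sh: the helper name new_entries, the file names, and the sample function names are hypothetical, and plain text files stand in for real DuckDB output so the sketch runs without a duckdb binary.

```shell
#!/bin/sh
set -eu

# new_entries BEFORE AFTER
# Print the lines present in AFTER but not in BEFORE, i.e. the names that
# appeared only once the extension was loaded.
new_entries() {
    before_sorted=$(mktemp)
    after_sorted=$(mktemp)
    # comm requires sorted input; sort -u also drops duplicate names.
    sort -u "$1" > "$before_sorted"
    sort -u "$2" > "$after_sorted"
    # -13 hides lines unique to BEFORE and lines common to both files.
    comm -13 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}

# Assumption: real snapshots could be captured with a duckdb binary, e.g.
#   duckdb -csv -noheader -c "SELECT DISTINCT function_name FROM duckdb_functions()" > before.txt
#   duckdb -csv -noheader -c "LOAD tpch; SELECT DISTINCT function_name FROM duckdb_functions()" > after.txt

# Stand-in snapshots (made-up contents) so the sketch is self-contained.
workdir=$(mktemp -d)
printf '%s\n' abs lower upper > "$workdir/before.txt"
printf '%s\n' abs dbgen lower tpch upper > "$workdir/after.txt"

added=$(new_entries "$workdir/before.txt" "$workdir/after.txt")
echo "$added"
rm -rf "$workdir"
```

The same comparison would apply to duckdb_types() and duckdb_settings(); only the query producing each snapshot changes.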

Example to build docs for a core extension (installing it from http://extensions.duckdb.org):

duckdb -c "SET extension_directory = 'build/extension_dir'; FORCE INSTALL postgres;"
./scripts/generate_md.sh duckdb
cat build/docs/postgres_scanner.md

Example to build tpch extension locally and then generate docs for it:

cd duckdb
echo "duckdb_extension_load(tpch DONT_LINK)" > extension/extension_config_local.cmake
GEN=ninja make
./build/release/duckdb -c "SET extension_directory = 'build/extension_dir'; FORCE INSTALL tpch FROM 'build/release/repository'"
/path/to/community-extensions/scripts/generate_md.sh ./build/release/duckdb
cat build/docs/tpch.md

Note that as of now, a build folder will be generated, polluting the current folder.

@carlopi (Collaborator, Author) commented Jun 28, 2024

Note that providing an optional descriptor like:

function, description, example, comment
dbgen, "Generate TPCH tables and data", "CALL dbgen(sf=1)",
tpch, "Run a query on the generated tables", "PRAGMA tpch(1)",

allows resulting md files to have more fields populated:

| function_name | function_type |             description             | comment |     example      |
|---------------|---------------|-------------------------------------|---------|------------------|
| dbgen         | table         | Generate TPCH tables and data       |         | CALL dbgen(sf=1) |
| tpch          | pragma        | Run a query on the generated tables |         | PRAGMA tpch(1)   |
| tpch_queries  | table         |                                     |         |                  |
| tpch_answers  | table         |                                     |         |                  |

(using tpch as an example, but the same holds for the provided h3 file)
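
The effect of the optional descriptor can be sketched as a left join keyed on the function name. In the PR the join is done by DuckDB; the awk version below is a stand-in so it runs without a duckdb binary. The helper name join_descriptions and the file names are hypothetical, and the parsing assumes fields contain no embedded commas and that the descriptor has a header row.

```shell
#!/bin/sh
set -eu

# join_descriptions FUNCTIONS DESCRIPTOR
# FUNCTIONS: one "name,type" pair per line (what the metadata diff found).
# DESCRIPTOR: CSV with header "function, description, example, comment".
# Prints one markdown row per function; functions without a descriptor row
# keep empty cells, mirroring a LEFT JOIN.
join_descriptions() {
    awk -F',' '
        # Trim surrounding whitespace and one pair of double quotes.
        function clean(s) {
            gsub(/^[ \t]+|[ \t]+$/, "", s)
            gsub(/^"|"$/, "", s)
            return s
        }
        # First file (descriptor): skip the header, index rows by name.
        NR == FNR {
            if (FNR > 1) {
                name = clean($1)
                desc[name] = clean($2)
                example[name] = clean($3)
                comment[name] = clean($4)
            }
            next
        }
        # Second file (functions): every function gets a row, matched or not.
        {
            name = clean($1)
            printf "| %s | %s | %s | %s | %s |\n", name, clean($2), desc[name], comment[name], example[name]
        }
    ' "$2" "$1"
}

workdir=$(mktemp -d)
cat > "$workdir/functions.csv" <<'EOF'
dbgen,table
tpch,pragma
tpch_queries,table
EOF
cat > "$workdir/descriptor.csv" <<'EOF'
function, description, example, comment
dbgen, "Generate TPCH tables and data", "CALL dbgen(sf=1)",
tpch, "Run a query on the generated tables", "PRAGMA tpch(1)",
EOF

rows=$(join_descriptions "$workdir/functions.csv" "$workdir/descriptor.csv")
echo "$rows"
rm -rf "$workdir"
```

A real CSV reader (such as DuckDB's) handles quoted commas and escaping, which this sketch deliberately does not.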

@isaacbrodsky (Contributor) left a comment

H3 function description table looks good to me

extensions/h3/docs/function_descriptions.csv (resolved)
scripts/generate_md.sh (resolved)
carlopi added 9 commits June 30, 2024 13:23
It has been generated from the README, using duckdb, like:
COPY (SELECT function, description, '' as comment, '' as example FROM 'README.md') TO 'function_descriptions.csv';
./scripts/fetch_extensions.sh duckdb
Will fetch each relevant extension using the provided duckdb binary.

./scripts/generate_md.sh duckdb
Will, using the just-installed extensions, generate markdown files for each extension.
@carlopi carlopi merged commit 8731113 into duckdb:main Jul 1, 2024
15 checks passed
2 participants