Adds sample usage for metadata udfs
aykut-bozkurt committed Oct 24, 2024
1 parent 0bfc8b6 commit aeaf940
Showing 2 changed files with 81 additions and 2 deletions.
5 changes: 3 additions & 2 deletions .github/workflows/ci.yml
@@ -1,8 +1,9 @@
 name: CI lints and tests
 on:
   push:
-    branches:
-      - "*"
+    branches: [ "main" ]
+  pull_request:
+    branches: [ "main" ]

 concurrency:
   group: ${{ github.ref }}
78 changes: 78 additions & 0 deletions README.md
@@ -98,13 +98,91 @@ SELECT * FROM product_example;
### Inspect Parquet schema
You can call `SELECT * FROM parquet.schema(<uri>)` to discover the schema of the Parquet file at the given URI.

```sql
SELECT * FROM parquet.schema('/tmp/product_example.parquet');
             uri              |     name     | type_name  | type_length | repetition_type | num_children |  converted_type  | scale | precision | field_id | logical_type
------------------------------+--------------+------------+-------------+-----------------+--------------+------------------+-------+-----------+----------+--------------
 /tmp/product_example.parquet | arrow_schema |            |             |                 |            5 |                  |       |           |          |
 /tmp/product_example.parquet | id           | INT32      |             | OPTIONAL        |              |                  |       |           |        0 |
 /tmp/product_example.parquet | product      |            |             | OPTIONAL        |            3 |                  |       |           |        1 |
 /tmp/product_example.parquet | id           | INT32      |             | OPTIONAL        |              |                  |       |           |        2 |
 /tmp/product_example.parquet | name         | BYTE_ARRAY |             | OPTIONAL        |              | UTF8             |       |           |        3 | STRING
 /tmp/product_example.parquet | items        |            |             | OPTIONAL        |            1 | LIST             |       |           |        4 | LIST
 /tmp/product_example.parquet | list         |            |             | REPEATED        |            1 |                  |       |           |          |
 /tmp/product_example.parquet | items        |            |             | OPTIONAL        |            3 |                  |       |           |        5 |
 /tmp/product_example.parquet | id           | INT32      |             | OPTIONAL        |              |                  |       |           |        6 |
 /tmp/product_example.parquet | name         | BYTE_ARRAY |             | OPTIONAL        |              | UTF8             |       |           |        7 | STRING
 /tmp/product_example.parquet | price        | FLOAT      |             | OPTIONAL        |              |                  |       |           |        8 |
 /tmp/product_example.parquet | products     |            |             | OPTIONAL        |            1 | LIST             |       |           |        9 | LIST
 /tmp/product_example.parquet | list         |            |             | REPEATED        |            1 |                  |       |           |          |
 /tmp/product_example.parquet | products     |            |             | OPTIONAL        |            3 |                  |       |           |       10 |
 /tmp/product_example.parquet | id           | INT32      |             | OPTIONAL        |              |                  |       |           |       11 |
 /tmp/product_example.parquet | name         | BYTE_ARRAY |             | OPTIONAL        |              | UTF8             |       |           |       12 | STRING
 /tmp/product_example.parquet | items        |            |             | OPTIONAL        |            1 | LIST             |       |           |       13 | LIST
 /tmp/product_example.parquet | list         |            |             | REPEATED        |            1 |                  |       |           |          |
 /tmp/product_example.parquet | items        |            |             | OPTIONAL        |            3 |                  |       |           |       14 |
 /tmp/product_example.parquet | id           | INT32      |             | OPTIONAL        |              |                  |       |           |       15 |
 /tmp/product_example.parquet | name         | BYTE_ARRAY |             | OPTIONAL        |              | UTF8             |       |           |       16 | STRING
 /tmp/product_example.parquet | price        | FLOAT      |             | OPTIONAL        |              |                  |       |           |       17 |
 /tmp/product_example.parquet | created_at   | INT64      |             | OPTIONAL        |              | TIMESTAMP_MICROS |       |           |       18 | TIMESTAMP
 /tmp/product_example.parquet | updated_at   | INT64      |             | OPTIONAL        |              | TIMESTAMP_MICROS |       |           |       19 | TIMESTAMP
(24 rows)
```
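
Since `parquet.schema` is queried with a plain `SELECT`, its rows can be filtered like any other result set. The sketch below lists only the leaf (primitive) columns. It assumes the blank `type_name` cells in the output above are SQL `NULL`s rather than empty strings; psql prints both the same way by default, so treat that as an assumption to verify.

```sql
-- Leaf (primitive) columns only: group nodes such as `product` and `list`
-- show no physical type above. Assumes those blank cells are NULL.
SELECT name, type_name, repetition_type, converted_type, logical_type, field_id
FROM parquet.schema('/tmp/product_example.parquet')
WHERE type_name IS NOT NULL
ORDER BY field_id;
```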

### Inspect Parquet metadata
You can call `SELECT * FROM parquet.metadata(<uri>)` to discover detailed metadata of the Parquet file at the given URI, such as column statistics.

```sql
SELECT uri, row_group_id, row_group_num_rows, row_group_num_columns, row_group_bytes, column_id, file_offset, num_values, path_in_schema, type_name FROM parquet.metadata('/tmp/product_example.parquet') LIMIT 1;
             uri              | row_group_id | row_group_num_rows | row_group_num_columns | row_group_bytes | column_id | file_offset | num_values | path_in_schema | type_name
------------------------------+--------------+--------------------+-----------------------+-----------------+-----------+-------------+------------+----------------+-----------
 /tmp/product_example.parquet |            0 |                  1 |                    13 |             842 |         0 |           0 |          1 | id             | INT32
(1 row)
```

```sql
SELECT stats_null_count, stats_distinct_count, stats_min, stats_max, compression, encodings, index_page_offset, dictionary_page_offset, data_page_offset, total_compressed_size, total_uncompressed_size FROM parquet.metadata('/tmp/product_example.parquet') LIMIT 1;
 stats_null_count | stats_distinct_count | stats_min | stats_max |    compression     |        encodings         | index_page_offset | dictionary_page_offset | data_page_offset | total_compressed_size | total_uncompressed_size
------------------+----------------------+-----------+-----------+--------------------+--------------------------+-------------------+------------------------+------------------+-----------------------+-------------------------
                0 |                      | 1         | 1         | GZIP(GzipLevel(6)) | PLAIN,RLE,RLE_DICTIONARY |                   |                      4 |               42 |                   101 |                      61
(1 row)
```
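
The output above appears to contain one row per column in each row group, so it can be aggregated to see which columns take the most space. This is a sketch that uses only the column names shown above and assumes the size and null-count columns are numeric types:

```sql
-- Per-column storage footprint, summed across all row groups.
SELECT path_in_schema,
       sum(total_compressed_size)   AS compressed_bytes,
       sum(total_uncompressed_size) AS uncompressed_bytes,
       sum(stats_null_count)        AS null_count
FROM parquet.metadata('/tmp/product_example.parquet')
GROUP BY path_in_schema
ORDER BY compressed_bytes DESC;
```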

You can call `SELECT * FROM parquet.file_metadata(<uri>)` to discover file-level metadata of the Parquet file at the given URI, such as the format version.

```sql
SELECT * FROM parquet.file_metadata('/tmp/product_example.parquet');
             uri              | created_by | num_rows | num_row_groups | format_version
------------------------------+------------+----------+----------------+----------------
 /tmp/product_example.parquet | pg_parquet |        1 |              1 | 1
(1 row)
```
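
The file-level counts can also be combined in the same query, for example to estimate how many rows each row group holds on average. This is only a sketch: the casts to `numeric` are there because the column types of `num_rows` and `num_row_groups` are not stated here.

```sql
-- Average rows per row group; NULLIF guards against a file with no row groups.
SELECT uri,
       num_rows,
       num_row_groups,
       num_rows::numeric / NULLIF(num_row_groups::numeric, 0) AS avg_rows_per_group
FROM parquet.file_metadata('/tmp/product_example.parquet');
```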

You can call `SELECT * FROM parquet.kv_metadata(<uri>)` to query the custom key-value metadata of the Parquet file at the given URI.

```sql
SELECT uri, encode(key, 'escape') as key, encode(value, 'escape') as value FROM parquet.kv_metadata('/tmp/product_example.parquet');
uri | key | value
------------------------------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
/tmp/product_example.parquet | ARROW:schema | /////5gIAAAQAAAAAAAKAAwACgAJAAQACgAAABAAAAAAAQQACAAIAAAABAAIAAAABAAAAAUAAAD0BwAAlAQAAPwAAACIAAAABAAAADL4//9IAAAAHAAAAAwAAAAAAAEKKAAAAAAAAAAIAAwACgAEAAgAAAAIAAAAAAACAAYAAAArMDA6MDAAAAoAAAB.
| |.1cGRhdGVkX2F0AAABAAAABAAAADT4//8IAAAADAAAAAIAAAAxOQAAEAAAAFBBUlFVRVQ6ZmllbGRfaWQAAAAAsvj//zgAAAAUAAAADAAAAAAAAQoYAAAAAAAAAGr7//8AAAIAAAAAAAAAAAAKAAAAY3JlYXRlZF9hdAAAAQAAAAQAAACk+P//CAAAAA.
| |.wAAAACAAAAMTgAABAAAABQQVJRVUVUOmZpZWxkX2lkAAAAACL5//9cAwAAGAAAAAwAAAAAAAEMPAMAAAEAAAAIAAAALPr//0b5///0AgAAIAAAAAwAAAAAAAEN1AIAAAMAAABoAgAABAIAAAgAAABY+v//cvn//8ABAAAYAAAADAAAAAAAAQykAQAAA.
| |.QAAAAgAAAB8+v//lvn//1wBAAAgAAAADAAAAAAAAQ1AAQAAAwAAANQAAABwAAAACAAAAKj6///C+f//LAAAABAAAAAUAAAAAAABAxAAAAB2/P//AAABAAAAAAAFAAAAcHJpY2UAAAABAAAABAAAAKj5//8IAAAADAAAAAIAAAAxNwAAEAAAAFBBUlFV.
| |.RVQ6ZmllbGRfaWQAAAAAJvr//ygAAAAUAAAADAAAAAAAAQUMAAAAAAAAACz7//8EAAAAbmFtZQAAAAABAAAABAAAAAj6//8IAAAADAAAAAIAAAAxNgAAEAAAAFBBUlFVRVQ6ZmllbGRfaWQAAAAAhvr//ywAAAAQAAAAGAAAAAAAAQIUAAAAdPr//yA.
| |.AAAAAAAABAAAAAAIAAABpZAAAAQAAAAQAAABs+v//CAAAAAwAAAACAAAAMTUAABAAAABQQVJRVUVUOmZpZWxkX2lkAAAAAAUAAABpdGVtcwAAAAEAAAAEAAAArPr//wgAAAAMAAAAAgAAADE0AAAQAAAAUEFSUVVFVDpmaWVsZF9pZAAAAAAFAAAAaX.
| |.RlbXMAAAABAAAABAAAAOz6//8IAAAADAAAAAIAAAAxMwAAEAAAAFBBUlFVRVQ6ZmllbGRfaWQAAAAAavv//ygAAAAUAAAADAAAAAAAAQUMAAAAAAAAAHD8//8EAAAAbmFtZQAAAAABAAAABAAAAEz7//8IAAAADAAAAAIAAAAxMgAAEAAAAFBBUlFVR.
| |.VQ6ZmllbGRfaWQAAAAAyvv//ywAAAAQAAAAGAAAAAAAAQIUAAAAuPv//yAAAAAAAAABAAAAAAIAAABpZAAAAQAAAAQAAACw+///CAAAAAwAAAACAAAAMTEAABAAAABQQVJRVUVUOmZpZWxkX2lkAAAAAAgAAABwcm9kdWN0cwAAAAABAAAABAAAAPT7.
| |.//8IAAAADAAAAAIAAAAxMAAAEAAAAFBBUlFVRVQ6ZmllbGRfaWQAAAAACAAAAHByb2R1Y3RzAAAAAAEAAAAEAAAAOPz//wgAAAAMAAAAAQAAADkAAAAQAAAAUEFSUVVFVDpmaWVsZF9pZAAAAAC2/P//FAMAACAAAAAMAAAAAAABDfgCAAADAAAAjAI.
| |.AACQCAAAIAAAAyP3//+L8///gAQAAGAAAAAwAAAAAAAEMxAEAAAEAAAAIAAAA7P3//wb9//98AQAAJAAAAAwAAAAAAAENYAEAAAMAAAD0AAAAkAAAACAAAAAEAAYABAAAAAAAEgAaABQAEgATAAgAAAAMAAQAEgAAADQAAAAYAAAAHAAAAAAAAQMYAA.
| |.AAAAAGAAgABgAGAAAAAAABAAAAAAAFAAAAcHJpY2UAAAABAAAABAAAADj9//8IAAAADAAAAAEAAAA4AAAAEAAAAFBBUlFVRVQ6ZmllbGRfaWQAAAAAtv3//ygAAAAUAAAADAAAAAAAAQUMAAAAAAAAALz+//8EAAAAbmFtZQAAAAABAAAABAAAAJj9/.
| |./8IAAAADAAAAAEAAAA3AAAAEAAAAFBBUlFVRVQ6ZmllbGRfaWQAAAAAFv7//ywAAAAQAAAAGAAAAAAAAQIUAAAABP7//yAAAAAAAAABAAAAAAIAAABpZAAAAQAAAAQAAAD8/f//CAAAAAwAAAABAAAANgAAABAAAABQQVJRVUVUOmZpZWxkX2lkAAAA.
| |.AAUAAABpdGVtcwAAAAEAAAAEAAAAPP7//wgAAAAMAAAAAQAAADUAAAAQAAAAUEFSUVVFVDpmaWVsZF9pZAAAAAAFAAAAaXRlbXMAAAABAAAABAAAAHz+//8IAAAADAAAAAEAAAA0AAAAEAAAAFBBUlFVRVQ6ZmllbGRfaWQAAAAA+v7//ywAAAAYAAA.
| |.ADAAAAAAAAQUQAAAAAAAAAAQABAAEAAAABAAAAG5hbWUAAAAAAQAAAAQAAADg/v//CAAAAAwAAAABAAAAMwAAABAAAABQQVJRVUVUOmZpZWxkX2lkAAAAAF7///8sAAAAEAAAABgAAAAAAAECFAAAAEz///8gAAAAAAAAAQAAAAACAAAAaWQAAAEAAA.
| |.AEAAAARP///wgAAAAMAAAAAQAAADIAAAAQAAAAUEFSUVVFVDpmaWVsZF9pZAAAAAAHAAAAcHJvZHVjdAABAAAABAAAAIT///8IAAAADAAAAAEAAAAxAAAAEAAAAFBBUlFVRVQ6ZmllbGRfaWQAABIAGAAUABIAEwAIAAAADAAEABIAAAA0AAAAGAAAA.
| |.CAAAAAAAAECHAAAAAgADAAEAAsACAAAACAAAAAAAAABAAAAAAIAAABpZAAAAQAAAAwAAAAIAAwACAAEAAgAAAAIAAAADAAAAAEAAAAwAAAAEAAAAFBBUlFVRVQ6ZmllbGRfaWQAAAAA
(1 row)
```
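
Values such as `ARROW:schema` can be long, as the output above shows. When only the list of keys matters, a variation like the following prints each value's size instead of its content. It assumes `key` and `value` are `bytea`, which the `encode(..., 'escape')` calls above suggest.

```sql
-- List the keys with the byte length of each value instead of the value itself.
SELECT encode(key, 'escape') AS key,
       octet_length(value)   AS value_bytes
FROM parquet.kv_metadata('/tmp/product_example.parquet');
```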

## Object Store Support
`pg_parquet` supports reading and writing Parquet files from/to the `S3` object store. Only URIs with the `s3://` scheme are supported.
