Add docs for Iceberg incremental MV refresh
Co-authored-by: Manfred Moser <[email protected]>
marton-bod and mosabua authored Jul 2, 2024
1 parent 3acbb30 commit 279fdd4
Showing 1 changed file with 24 additions and 9 deletions.
33 changes: 24 additions & 9 deletions docs/src/main/sphinx/connector/iceberg.md
@@ -166,6 +166,16 @@ implementation is used:
 `query_partition_filter_required` catalog session property for temporary,
 catalog specific use.
 - `false`
+* - `iceberg.incremental-refresh-enabled`
+- Set to `false` to force the materialized view refresh operation to always
+perform a full refresh. You can use the `incremental_refresh_enabled`
+catalog session property for temporary, catalog specific use. In the
+majority of cases, using incremental refresh, as compared to a full refresh,
+is beneficial since a much smaller subset of the source tables needs to be
+scanned. While incremental refresh may scan less data, it may result in the
+creation of more data files, since it uses the append operation to insert
+the new records.
+- `true`
 :::
 
 (iceberg-file-system-configuration)=
@@ -751,7 +761,7 @@ WITH (
 orc_bloom_filter_fpp = 0.05)
 ```
 
-The table definition below specifies to use Avro files, partitioning
+The table definition below specifies to use Avro files, partitioning
 by `child1` field in `parent` column:
 
 ```
@@ -1501,15 +1511,20 @@ Creating a materialized view does not automatically populate it with data. You
 must run {doc}`/sql/refresh-materialized-view` to populate data in the
 materialized view.
 
-Updating the data in the materialized view with `REFRESH MATERIALIZED VIEW`
+Updating the data in the materialized view can be achieved using the `REFRESH
+MATERIALIZED VIEW` command. This operation may perform either an incremental or
+a full refresh, depending on the complexity of the materialized view definition
+and the snapshot history of the source tables. For a full refresh, the operation
 deletes the data from the storage table, and inserts the data that is the result
-of executing the materialized view query into the existing table. Data is
-replaced atomically, so users can continue to query the materialized view while
-it is being refreshed. Refreshing a materialized view also stores the
-snapshot-ids of all Iceberg tables that are part of the materialized view's
-query in the materialized view metadata. When the materialized view is queried,
-the snapshot-ids are used to check if the data in the storage table is up to
-date.
+of executing the materialized view query into the existing table. For
+incremental refresh, the existing data is not deleted from the storage table and
+only the delta records are processed from the source tables and appended into
+the storage table as needed. In both cases, data is replaced or appended
+atomically, so users can continue to query the materialized view while it is
+being refreshed. Refreshing a materialized view also stores the snapshot-ids of
+all Iceberg tables that are part of the materialized view's query in the
+materialized view metadata. When the materialized view is queried, the
+snapshot-ids are used to check if the data in the storage table is up to date.
 
 Materialized views that use non-Iceberg tables in the query show the [default
 behavior around grace periods](mv-grace-period). If all tables are Iceberg

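The property and refresh behavior documented in this change could be exercised roughly as follows. This is a minimal sketch, not part of the commit: it assumes a hypothetical Iceberg catalog named `example` and a hypothetical materialized view `example.sales.daily_orders`. Only the `incremental_refresh_enabled` session property and the `REFRESH MATERIALIZED VIEW` statement come from the documentation above.

```sql
-- Assumption: `example` is an Iceberg catalog and `sales.daily_orders`
-- is an existing materialized view; both names are illustrative only.

-- Temporarily disable incremental refresh for the current session,
-- so that the next refresh is always a full refresh.
SET SESSION example.incremental_refresh_enabled = false;

-- Refresh the materialized view. With the property set to false,
-- the storage table data is deleted and fully recomputed.
REFRESH MATERIALIZED VIEW example.sales.daily_orders;

-- Return to the catalog default (incremental refresh enabled).
RESET SESSION example.incremental_refresh_enabled;
```

To apply the same behavior to every session, the catalog configuration property `iceberg.incremental-refresh-enabled=false` would go in the catalog properties file instead (for the assumed catalog name, typically `etc/catalog/example.properties`).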