This repo is the accompaniment to the blog post where we explore the basics of setting up Hive, JDBC, and REST Iceberg catalogs, and test writing to and reading from Iceberg using Spark and Trino.
You will need the following to get things working:
- Git
- Docker/Podman Compose
Clone this repo down, including submodules:
```bash
git clone --recurse-submodules git@github.com:binayakd/exploring-apache-iceberg.git
```
Inside the cloned folder, trigger the image builds:
```bash
docker compose build
```
This will build the following images:

- `jupyter-spark`: this is the Jupyter Lab based development environment with all the client dependencies installed
- `hive-metastore`: this will be used as the Iceberg Hive Catalog
- `iceberg-rest-catalog`: this is a Python Iceberg REST catalog by Kevin Liu, which I have forked and added to this repo as a submodule
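To give a rough idea of how these pieces fit together, a Spark session inside the `jupyter-spark` environment might be pointed at the Hive Metastore catalog along the following lines. This is only a sketch: the service names, port, warehouse bucket, and S3 endpoint are assumptions for illustration (the notebooks contain the actual configuration), and it assumes the Iceberg Spark runtime is already on the classpath.

```python
from pyspark.sql import SparkSession

# Sketch only: "hive-metastore", "minio", the port, and the warehouse bucket
# are assumed names for illustration; see the notebooks for the real values.
spark = (
    SparkSession.builder.appName("iceberg-hive-sketch")
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    .config("spark.sql.catalog.hive_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.hive_catalog.type", "hive")
    .config("spark.sql.catalog.hive_catalog.uri", "thrift://hive-metastore:9083")
    .config("spark.sql.catalog.hive_catalog.warehouse", "s3a://warehouse/")
    .config("spark.sql.catalog.hive_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.hive_catalog.s3.endpoint", "http://minio:9000")
    .getOrCreate()
)
```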
Now start all the services:
```bash
docker compose up
```
On top of the 3 images mentioned above, this will also start the following images:
- `minio`: this will be our local S3 alternative, the object storage holding the data
- `mc`: this is the MinIO client image, which is started to automatically create the initial bucket in MinIO, then shuts down
- `postgres`: this is the Postgres instance that will be used by the catalogs. An init script in the `postgres-init` folder is used to create the required databases on first startup
- `trino`: this is the Trino server, running as a single-node cluster, with all the configs in the `trino-config` folder
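Once everything is up, Trino listens on its HTTP port and can be queried from the notebooks or any other client. As a hedged sketch using the `trino` Python client (the host, port, user, and catalog/schema names below are assumptions; the actual catalog names are defined in the `trino-config` folder):

```python
from trino.dbapi import connect

# Assumed connection details: Trino's default port 8080, an "iceberg" catalog,
# and a "default" schema; adjust to match the files in trino-config.
conn = connect(host="localhost", port=8080, user="admin", catalog="iceberg", schema="default")
cur = conn.cursor()
cur.execute("SHOW TABLES")
print(cur.fetchall())
```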
Once the setup is done, the Jupyter Lab instance can be accessed at http://localhost:8888. There you will see the list of Jupyter notebooks, which you can follow along in order:
- 00-setup.ipynb
- 01-iceberg-hive.ipynb
- 02-iceberg-jdbc.ipynb
- 03-iceberg-rest.ipynb
These are located in the `workspace` folder in this repo.
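For a flavour of the write/read round trip the notebooks walk through, a minimal PySpark example against a configured Iceberg catalog might look like the following (the catalog, namespace, and table names here are made up for illustration):

```python
# Assumes a SparkSession ("spark") with an Iceberg catalog registered as
# "hive_catalog", as in the earlier sketch; names here are illustrative.
spark.sql("CREATE NAMESPACE IF NOT EXISTS hive_catalog.demo")

df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
df.writeTo("hive_catalog.demo.users").createOrReplace()

spark.table("hive_catalog.demo.users").show()
```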
The data created when running the notebooks will be saved under the `local-data` folder.
If you run into permission issues in the `workspace` and `local-data` folders, you can run the `permissions-fix.sh` script to try to fix them.