Merge branch 'main' into test_matrix
JE-Chen authored Dec 23, 2024
2 parents 5a8f1f8 + b450c1c commit d50de1e
Showing 59 changed files with 3,790 additions and 3,309 deletions.
1 change: 1 addition & 0 deletions .asf.yaml
@@ -38,6 +38,7 @@ github:
required_approving_review_count: 1

required_linear_history: true
del_branch_on_merge: true
features:
wiki: true
issues: true
30 changes: 29 additions & 1 deletion .github/ISSUE_TEMPLATE/iceberg_bug_report.yml
@@ -1,3 +1,22 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

---
name: Iceberg Bug report 🐞
description: Problems, bugs and issues with Apache Iceberg
@@ -9,7 +28,8 @@ body:
description: What Apache Iceberg version are you using?
multiple: false
options:
- "0.8.0 (latest release)"
- "0.8.1 (latest release)"
- "0.8.0"
- "0.7.1"
- "0.7.0"
- "0.6.1"
@@ -31,3 +51,11 @@ body:
You can include files by dragging and dropping them here.
validations:
required: true
- type: checkboxes
attributes:
label: Willingness to contribute
description: The Apache Iceberg community encourages bug-fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the PyIceberg codebase?
options:
- label: I can contribute a fix for this bug independently
- label: I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- label: I cannot contribute a fix for this bug at this time
19 changes: 19 additions & 0 deletions .github/ISSUE_TEMPLATE/iceberg_improvement.yml
@@ -1,3 +1,22 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

---
name: Iceberg Improvement / Feature Request
description: New features with Apache Iceberg
19 changes: 19 additions & 0 deletions .github/ISSUE_TEMPLATE/iceberg_question.yml
@@ -1,3 +1,22 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

---
name: Iceberg Question
description: Questions around Apache Iceberg
25 changes: 24 additions & 1 deletion .github/workflows/check-md-link.yml
@@ -1,12 +1,35 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: Check Markdown links

on:
push:
paths:
- mkdocs/**
- '.github/workflows/check-md-link.yml'
- 'mkdocs/**'
branches:
- 'main'
pull_request:
paths:
- '.github/workflows/check-md-link.yml'
- 'mkdocs/**'

jobs:
markdown-link-check:
13 changes: 13 additions & 0 deletions .github/workflows/python-ci.yml
@@ -24,6 +24,19 @@ on:
branches:
- 'main'
pull_request:
paths:
- '**' # Include all files and directories in the repository by default.
- '!.github/workflows/**' # Exclude all workflow files
- '.github/workflows/python-ci.yml' # except the current file.
- '!.github/ISSUE_TEMPLATE/**' # Exclude files and directories that don't impact tests or code like templates, metadata, and documentation.
- '!.gitignore'
- '!.asf.yml'
- '!mkdocs/**'
- '!.gitattributes'
- '!README.md'
- '!CONTRIBUTING.md'
- '!LICENSE'
- '!NOTICE'

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
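The ordering of these paths patterns is what makes the filter work: GitHub Actions evaluates them top to bottom, and the last pattern that matches a changed file determines whether that file counts toward triggering the workflow. That is why the broad '**' include comes first, the '!.github/workflows/**' exclude follows, and the workflow's own file is re-included afterwards. A minimal sketch of the same last-match-wins rule, using hypothetical paths that are not part of this commit:

# Sketch only, not from this commit: the last matching pattern wins.
on:
  pull_request:
    paths:
      - '**'              # include every changed file by default
      - '!docs/**'        # then exclude everything under docs/
      - 'docs/build.md'   # but re-include this single file

The python-integration.yml change below applies the same pattern to the integration workflow.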
13 changes: 13 additions & 0 deletions .github/workflows/python-integration.yml
@@ -24,6 +24,19 @@ on:
branches:
- 'main'
pull_request:
paths:
- '**' # Include all files and directories in the repository by default.
- '!.github/workflows/**' # Exclude all workflow files
- '.github/workflows/python-integration.yml' # except the current file.
- '!.github/ISSUE_TEMPLATE/**' # Exclude files and directories that don't impact tests or code like templates, metadata, and documentation.
- '!.gitignore'
- '!.asf.yml'
- '!mkdocs/**'
- '!.gitattributes'
- '!README.md'
- '!CONTRIBUTING.md'
- '!LICENSE'
- '!NOTICE'

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
18 changes: 14 additions & 4 deletions .github/workflows/python-release.yml
@@ -34,7 +34,7 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ ubuntu-22.04, windows-2022, macos-12, macos-13, macos-14 ]
os: [ ubuntu-22.04, windows-2022, macos-13, macos-14, macos-15 ]

steps:
- uses: actions/checkout@v4
@@ -63,7 +63,7 @@ jobs:
if: startsWith(matrix.os, 'ubuntu')

- name: Build wheels
uses: pypa/cibuildwheel@v2.21.3
uses: pypa/cibuildwheel@v2.22.0
with:
output-dir: wheelhouse
config-file: "pyproject.toml"
@@ -84,7 +84,17 @@ jobs:
if: startsWith(matrix.os, 'ubuntu')
run: ls -lah dist/* && cp dist/* wheelhouse/

- uses: actions/upload-artifact@v3
- uses: actions/upload-artifact@v4
with:
name: "release-${{ github.event.inputs.version }}"
name: "release-${{ matrix.os }}"
path: ./wheelhouse/*
merge:
runs-on: ubuntu-latest
needs: build_wheels
steps:
- name: Merge Artifacts
uses: actions/upload-artifact/merge@v4
with:
name: "release-${{ github.event.inputs.version }}"
pattern: release-*
delete-merged: true
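Replacing the single versioned upload with per-OS artifacts plus a merge job is consistent with the actions/upload-artifact@v4 behaviour, where several jobs can no longer write to one artifact name: each matrix job uploads its own release-${{ matrix.os }} artifact, and the merge job combines them back under the release name. A later step could then fetch the combined artifact with actions/download-artifact@v4 — a hypothetical sketch, not part of this commit:

  # Hypothetical follow-up job, for illustration only.
  publish:
    runs-on: ubuntu-latest
    needs: merge
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: "release-${{ github.event.inputs.version }}"  # the merged artifact
          path: dist
      - name: List downloaded wheels
        run: ls -lah dist/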
2 changes: 1 addition & 1 deletion Makefile
@@ -22,7 +22,7 @@ help: ## Display this help
install-poetry: ## Install poetry if the user has not done that yet.
@if ! command -v poetry &> /dev/null; then \
echo "Poetry could not be found. Installing..."; \
pip install --user poetry==1.8.4; \
pip install --user poetry==1.8.5; \
else \
echo "Poetry is already installed."; \
fi
1 change: 1 addition & 0 deletions dev/.rat-excludes
@@ -1,4 +1,5 @@
.rat-excludes
build
.git
.gitignore
poetry.lock
9 changes: 5 additions & 4 deletions dev/Dockerfile
@@ -36,21 +36,22 @@ ENV PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9.7-src.zip:$
RUN mkdir -p ${HADOOP_HOME} && mkdir -p ${SPARK_HOME} && mkdir -p /home/iceberg/spark-events
WORKDIR ${SPARK_HOME}

# Remember to also update `tests/conftest`'s spark setting
ENV SPARK_VERSION=3.5.3
ENV ICEBERG_SPARK_RUNTIME_VERSION=3.5_2.12
ENV ICEBERG_VERSION=1.6.0
ENV PYICEBERG_VERSION=0.8.0
ENV PYICEBERG_VERSION=0.8.1

RUN curl --retry 3 -s -C - https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop3.tgz -o spark-${SPARK_VERSION}-bin-hadoop3.tgz \
RUN curl --retry 5 -s -C - https://dlcdn.apache.org/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop3.tgz -o spark-${SPARK_VERSION}-bin-hadoop3.tgz \
&& tar xzf spark-${SPARK_VERSION}-bin-hadoop3.tgz --directory /opt/spark --strip-components 1 \
&& rm -rf spark-${SPARK_VERSION}-bin-hadoop3.tgz

# Download iceberg spark runtime
RUN curl -s https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}/${ICEBERG_VERSION}/iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}-${ICEBERG_VERSION}.jar -Lo iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}-${ICEBERG_VERSION}.jar \
RUN curl --retry 5 -s https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}/${ICEBERG_VERSION}/iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}-${ICEBERG_VERSION}.jar -Lo iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}-${ICEBERG_VERSION}.jar \
&& mv iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}-${ICEBERG_VERSION}.jar /opt/spark/jars

# Download AWS bundle
RUN curl -s https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/${ICEBERG_VERSION}/iceberg-aws-bundle-${ICEBERG_VERSION}.jar -Lo /opt/spark/jars/iceberg-aws-bundle-${ICEBERG_VERSION}.jar
RUN curl --retry 5 -s https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/${ICEBERG_VERSION}/iceberg-aws-bundle-${ICEBERG_VERSION}.jar -Lo /opt/spark/jars/iceberg-aws-bundle-${ICEBERG_VERSION}.jar

COPY spark-defaults.conf /opt/spark/conf
ENV PATH="/opt/spark/sbin:/opt/spark/bin:${PATH}"
4 changes: 2 additions & 2 deletions dev/check-license
@@ -58,7 +58,7 @@ else
declare java_cmd=java
fi

export RAT_VERSION=0.15
export RAT_VERSION=0.16.1
export rat_jar="$FWDIR"/lib/apache-rat-${RAT_VERSION}.jar
mkdir -p "$FWDIR"/lib

@@ -68,7 +68,7 @@ mkdir -p "$FWDIR"/lib
}

mkdir -p build
$java_cmd -jar "$rat_jar" -E "$FWDIR"/dev/.rat-excludes -d "$FWDIR" > build/rat-results.txt
$java_cmd -jar "$rat_jar" --scan-hidden-directories -E "$FWDIR"/dev/.rat-excludes -d "$FWDIR" > build/rat-results.txt

if [ $? -ne 0 ]; then
echo "RAT exited abnormally"
1 change: 0 additions & 1 deletion dev/docker-compose-azurite.yml
@@ -14,7 +14,6 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
version: "3"

services:
azurite:
1 change: 0 additions & 1 deletion dev/docker-compose-gcs-server.yml
@@ -14,7 +14,6 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
version: "3"

services:
gcs-server:
3 changes: 1 addition & 2 deletions dev/docker-compose-integration.yml
@@ -14,7 +14,6 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
version: "3"

services:
spark-iceberg:
@@ -41,7 +40,7 @@ services:
- hive:hive
- minio:minio
rest:
image: tabulario/iceberg-rest
image: apache/iceberg-rest-fixture
container_name: pyiceberg-rest
networks:
iceberg_net:
1 change: 0 additions & 1 deletion dev/docker-compose.yml
@@ -14,7 +14,6 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
version: "3"

services:
minio:
20 changes: 11 additions & 9 deletions dev/provision.py
@@ -22,7 +22,17 @@
from pyiceberg.schema import Schema
from pyiceberg.types import FixedType, NestedField, UUIDType

spark = SparkSession.builder.getOrCreate()
# The configuration is important, otherwise we get many small
# parquet files with a single row. When a positional delete
# hits the Parquet file with one row, the parquet file gets
# dropped instead of having a merge-on-read delete file.
spark = (
SparkSession
.builder
.config("spark.sql.shuffle.partitions", "1")
.config("spark.default.parallelism", "1")
.getOrCreate()
)

catalogs = {
'rest': load_catalog(
@@ -120,10 +130,6 @@
"""
)

# Partitioning is not really needed, but there is a bug:
# https://github.com/apache/iceberg/pull/7685
spark.sql(f"ALTER TABLE {catalog_name}.default.test_positional_mor_deletes ADD PARTITION FIELD years(dt) AS dt_years")

spark.sql(
f"""
INSERT INTO {catalog_name}.default.test_positional_mor_deletes
@@ -168,10 +174,6 @@
"""
)

# Partitioning is not really needed, but there is a bug:
# https://github.com/apache/iceberg/pull/7685
spark.sql(f"ALTER TABLE {catalog_name}.default.test_positional_mor_double_deletes ADD PARTITION FIELD years(dt) AS dt_years")

spark.sql(
f"""
INSERT INTO {catalog_name}.default.test_positional_mor_double_deletes
1 change: 1 addition & 0 deletions mkdocs/docs/SUMMARY.md
@@ -29,6 +29,7 @@
- Releases
- [Verify a release](verify-release.md)
- [How to release](how-to-release.md)
- [Release Notes](https://github.com/apache/iceberg-python/releases)
- [Code Reference](reference/)

<!-- markdown-link-check-enable-->