
[WIP] Add Manifest Stats in snapshot summary. #13

Closed
wants to merge 55 commits
96587ab
Flink: Remove MiniClusterResource (#10817)
tomtongue Jul 30, 2024
4d1ceac
Docs: Use link addresses instead of descriptions in releases.md (#10815)
lurnagao-dahua Jul 30, 2024
0ff90e7
Build: Declare avro as an api dependency of iceberg-core (#10573)
devinrsmith Jul 30, 2024
72b39ab
Flink: backport PR #10748 for limit pushdown (#10813)
stevenzwu Jul 30, 2024
76dba8f
Docs: Fix header for entries metadata table (#10826)
gaborkaszab Jul 31, 2024
506fee4
Spark 3.5: Support Reporting Column Stats (#10659)
huaxingao Jul 31, 2024
84c9125
Flink: Backport #10548 to v1.18 and v1.17 (#10776)
venkata91 Aug 1, 2024
806da5c
Infra: Improve feature request template (#10825)
nastra Aug 1, 2024
99b8e88
Core: Replace the duplicated ALL_DATA_FILES with ALL_DELETE_FILES (#1…
hsiang-c Aug 1, 2024
eb9d395
Core: Adds Basic Classes for Iceberg Table Version 3 (#10760)
RussellSpitzer Aug 1, 2024
39373d0
Core: Allow SnapshotProducer to skip uncommitted manifest cleanup aft…
grantatspothero Aug 1, 2024
6e7113a
Flink: a few small fixes or tuning for range partitioner (#10823)
stevenzwu Aug 1, 2024
9a67f0b
Drop support for Java 8 (#10518)
findepi Aug 2, 2024
c2db97c
Build: Bump com.adobe.testing:s3mock-junit5 from 2.11.0 to 2.17.0 (#1…
nastra Aug 2, 2024
122176a
Core: Upgrade Jetty and Servlet API (#10850)
nastra Aug 2, 2024
08aed72
Build: Configure options.release = 11 / remove com.palantir.baseline-…
snazy Aug 2, 2024
3929575
Build: Bump kafka from 3.7.1 to 3.8.0 (#10797)
dependabot[bot] Aug 2, 2024
674214c
Build: Update baseline gradle plugin to 5.58.0 (#10788)
findepi Aug 2, 2024
dc7ad71
Flink: refactor sink tests to reduce the number of combinations with …
stevenzwu Aug 2, 2024
af75440
Flink: backport PR #10823 for range partitioner fixup (#10847)
stevenzwu Aug 2, 2024
b17d1c9
Core: Remove reflection from TestParallelIterable (#10857)
findepi Aug 2, 2024
479f468
Spec: Deprecate the file system table scheme (#10833)
rdblue Aug 4, 2024
d9aacd2
Core, API: UpdatePartitionSpec: Added ability to create a new Partiti…
shanielh Aug 5, 2024
4cfa38f
Build: Bump com.palantir.baseline:gradle-baseline-java (#10864)
dependabot[bot] Aug 5, 2024
98ecc9a
Build: Bump nessie from 0.94.2 to 0.94.4 (#10869)
dependabot[bot] Aug 5, 2024
e8582c0
Build: Bump org.xerial:sqlite-jdbc from 3.46.0.0 to 3.46.0.1 (#10871)
dependabot[bot] Aug 5, 2024
1f21989
Build: Bump org.apache.commons:commons-compress from 1.26.0 to 1.26.2…
dependabot[bot] Aug 5, 2024
9b70fdf
Build: Bump software.amazon.awssdk:bom from 2.26.25 to 2.26.29 (#10866)
dependabot[bot] Aug 5, 2024
74a9adb
Build: Bump mkdocs-material from 9.5.30 to 9.5.31 (#10863)
dependabot[bot] Aug 5, 2024
722a350
Build: Fix Scala compilation (#10860)
snazy Aug 5, 2024
87537f9
Build: Enable FormatStringAnnotation error-prone check (#10856)
findepi Aug 5, 2024
5fc1413
Core: Use encoding/decoding methods for namespaces and deprecate Spli…
nastra Aug 5, 2024
04c2533
Aliyun: Replace assert usage with assertThat (#10880)
nastra Aug 5, 2024
b531e97
Core: Extract filePath comparator into it's own class (#10664)
deniskuzZ Aug 5, 2024
3d364f6
Docs: Fix SQL in branching docs (#10876)
nakaken-churadata Aug 5, 2024
e9364fa
API: Add SupportsRecoveryOperations mixin for FileIO (#10711)
amogh-jahagirdar Aug 5, 2024
525d887
Spec: Clarify identity partition edge cases (#10835)
emkornfield Aug 6, 2024
6ee6d13
Build: Bump org.testcontainers:testcontainers from 1.20.0 to 1.20.1 (…
dependabot[bot] Aug 6, 2024
93f7839
Flink: move v1.19 to v.120
stevenzwu Aug 5, 2024
fb60ecd
Flink: add v1.19 back after coping from 1.20
stevenzwu Aug 5, 2024
0d8f2c4
Flink: remove v1.17 module
stevenzwu Aug 5, 2024
38733f8
Flink: adjust code for the new 1.20 module.
stevenzwu Aug 5, 2024
257b1d7
Build: Add checkstyle rule to ban assert usage (#10886)
nastra Aug 6, 2024
86611d9
Build: Bump Apache Avro to 1.12.0 (#10879)
Fokko Aug 7, 2024
8ec65ab
Spec: Fix rendering of unified partition struct (#10896)
findepi Aug 7, 2024
71b6439
Docs: Fix catalog name for S3 MRAP example (#10897)
tomtongue Aug 7, 2024
a3cbdcb
Add Flink 1.20 & remove Flink 1.17 in stage-binaries.sh and docs (#10…
snazy Aug 7, 2024
97e034b
Flink: Remove deprecated RowDataUtil.clone method (#10902)
findepi Aug 7, 2024
3bee806
AWS: Fix flaky TestS3RestSigner (#10898)
nastra Aug 8, 2024
70c506e
AWS: Implement SupportsRecoveryOperations mixin for S3FileIO (#10721)
amogh-jahagirdar Aug 8, 2024
d17a7f1
Core: Remove deprecated APIs for 1.7.0 (#10818)
nk1506 Aug 9, 2024
79620e1
Core, Flink: Fix build warnings (#10899)
nk1506 Aug 9, 2024
ae08334
Build: Bump Spark 3.5 to 3.5.2 (#10918)
manuzhang Aug 12, 2024
6ae2956
Add Manifest Stats in snapshot summary.
nk1506 Apr 16, 2024
da834c1
Fixed conflicts
nk1506 Aug 12, 2024
3 changes: 3 additions & 0 deletions .baseline/checkstyle/checkstyle.xml
@@ -414,6 +414,9 @@
<property name="format" value="@Test\(.*expected.*\)"/>
<property name="message" value="Prefer using Assertions.assertThatThrownBy(...).isInstanceOf(...) instead."/>
</module>
<module name="IllegalToken">
<property name="tokens" value="LITERAL_ASSERT"/>
</module>
<module name="IllegalImport">
<property name="id" value="BanExpectedExceptionUsage"/>
<property name="illegalClasses" value="org.junit.rules.ExpectedException"/>
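The new `IllegalToken` module above bans the Java `assert` keyword; the later diffs in this PR convert existing `assert` statements to AssertJ's `assertThat`. As an illustrative, standalone sketch (class and method names here are made up, not PR code) of why such a rule exists: JVM assertions are disabled unless the process runs with `-ea`, so an `assert` precondition can silently pass, while an explicit check always runs.

```java
// Illustrative sketch (not PR code): why `assert` is banned in favor of
// explicit checks or AssertJ assertions. `assert b != 0` would be skipped
// entirely unless the JVM is started with -ea; the explicit check below
// always runs and fails loudly.
public class AssertPitfall {
    static int checkedDivide(int a, int b) {
        if (b == 0) {
            throw new IllegalArgumentException("division by zero");
        }
        return a / b;
    }

    public static void main(String[] args) {
        System.out.println(checkedDivide(10, 2)); // prints 5

        try {
            checkedDivide(1, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage()); // caught: division by zero
        }
    }
}
```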
10 changes: 9 additions & 1 deletion .github/ISSUE_TEMPLATE/iceberg_improvement.yml
@@ -50,4 +50,12 @@ body:
- Hive
- Other
validations:
required: false
required: false
- type: checkboxes
attributes:
label: Willingness to contribute
description: The Apache Iceberg community encourages contributions. Would you or another member of your organization be willing to contribute this improvement/feature to the Apache Iceberg codebase?
options:
- label: I can contribute this improvement/feature independently
- label: I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- label: I cannot contribute this improvement/feature at this time
4 changes: 2 additions & 2 deletions .github/workflows/delta-conversion-ci.yml
@@ -71,7 +71,7 @@ jobs:
runs-on: ubuntu-22.04
strategy:
matrix:
jvm: [8, 11, 17, 21]
jvm: [11, 17, 21]
env:
SPARK_LOCAL_IP: localhost
steps:
@@ -100,7 +100,7 @@
runs-on: ubuntu-22.04
strategy:
matrix:
jvm: [8, 11, 17, 21]
jvm: [11, 17, 21]
env:
SPARK_LOCAL_IP: localhost
steps:
11 changes: 2 additions & 9 deletions .github/workflows/flink-ci.yml
@@ -73,15 +73,8 @@ jobs:
runs-on: ubuntu-22.04
strategy:
matrix:
jvm: [8, 11, 17, 21]
flink: ['1.17', '1.18', '1.19']
exclude:
# Flink 1.17 does not support Java 17.
- jvm: 17
flink: '1.17'
# Flink 1.17 does not support Java 21.
- jvm: 21
flink: '1.17'
jvm: [11, 17, 21]
flink: ['1.18', '1.19', '1.20']
env:
SPARK_LOCAL_IP: localhost
steps:
4 changes: 2 additions & 2 deletions .github/workflows/hive-ci.yml
@@ -69,7 +69,7 @@ jobs:
runs-on: ubuntu-22.04
strategy:
matrix:
jvm: [8, 11, 17, 21]
jvm: [11, 17, 21]
env:
SPARK_LOCAL_IP: localhost
steps:
@@ -98,7 +98,7 @@
runs-on: ubuntu-22.04
strategy:
matrix:
jvm: [8, 11, 17, 21]
jvm: [11, 17, 21]
env:
SPARK_LOCAL_IP: localhost
steps:
6 changes: 3 additions & 3 deletions .github/workflows/java-ci.yml
@@ -65,7 +65,7 @@ jobs:
runs-on: ubuntu-22.04
strategy:
matrix:
jvm: [8, 11, 17, 21]
jvm: [11, 17, 21]
env:
SPARK_LOCAL_IP: localhost
steps:
@@ -94,7 +94,7 @@
runs-on: ubuntu-22.04
strategy:
matrix:
jvm: [8, 11, 17, 21]
jvm: [11, 17, 21]
steps:
- uses: actions/checkout@v4
- uses: actions/setup-java@v4
@@ -107,7 +107,7 @@
runs-on: ubuntu-22.04
strategy:
matrix:
jvm: [8, 11, 17, 21]
jvm: [11, 17, 21]
steps:
- uses: actions/checkout@v4
- uses: actions/setup-java@v4
2 changes: 1 addition & 1 deletion .github/workflows/publish-snapshot.yml
@@ -37,7 +37,7 @@ jobs:
- uses: actions/setup-java@v4
with:
distribution: zulu
java-version: 8
java-version: 11
- run: |
./gradlew printVersion
./gradlew -DallModules publishApachePublicationToMavenRepository -PmavenUser=${{ secrets.NEXUS_USER }} -PmavenPassword=${{ secrets.NEXUS_PW }}
2 changes: 1 addition & 1 deletion .github/workflows/spark-ci.yml
@@ -71,7 +71,7 @@ jobs:
runs-on: ubuntu-22.04
strategy:
matrix:
jvm: [8, 11, 17, 21]
jvm: [11, 17, 21]
spark: ['3.3', '3.4', '3.5']
scala: ['2.12', '2.13']
exclude:
81 changes: 81 additions & 0 deletions .palantir/revapi.yml
@@ -1056,6 +1056,87 @@ acceptedBreaks:
- code: "java.method.removed"
old: "method org.apache.iceberg.DataFiles.Builder org.apache.iceberg.DataFiles.Builder::withEqualityFieldIds(java.util.List<java.lang.Integer>)"
justification: "Deprecations for 1.6.0 release"
"1.6.0":
org.apache.iceberg:iceberg-common:
- code: "java.method.removed"
old: "method <T> org.apache.iceberg.common.DynFields.StaticField<T> org.apache.iceberg.common.DynFields.Builder::buildStaticChecked()\
\ throws java.lang.NoSuchFieldException"
justification: "Removing deprecated code"
- code: "java.method.removed"
old: "method java.lang.Class<? extends C> org.apache.iceberg.common.DynConstructors.Ctor<C>::getConstructedClass()"
justification: "Removing deprecated code"
- code: "java.method.removed"
old: "method org.apache.iceberg.common.DynConstructors.Builder org.apache.iceberg.common.DynConstructors.Builder::hiddenImpl(java.lang.Class<?>[])"
justification: "Removing deprecated code"
- code: "java.method.removed"
old: "method org.apache.iceberg.common.DynMethods.Builder org.apache.iceberg.common.DynMethods.Builder::ctorImpl(java.lang.Class<?>,\
\ java.lang.Class<?>[])"
justification: "Removing deprecated code"
- code: "java.method.removed"
old: "method org.apache.iceberg.common.DynMethods.Builder org.apache.iceberg.common.DynMethods.Builder::ctorImpl(java.lang.String,\
\ java.lang.Class<?>[])"
justification: "Removing deprecated code"
- code: "java.method.visibilityReduced"
old: "method <R> R org.apache.iceberg.common.DynMethods.UnboundMethod::invokeChecked(java.lang.Object,\
\ java.lang.Object[]) throws java.lang.Exception"
new: "method <R> R org.apache.iceberg.common.DynMethods.UnboundMethod::invokeChecked(java.lang.Object,\
\ java.lang.Object[]) throws java.lang.Exception"
justification: "Reduced visibility and scoped to package"
org.apache.iceberg:iceberg-core:
- code: "java.class.removed"
old: "enum org.apache.iceberg.BaseMetastoreTableOperations.CommitStatus"
justification: "Removing deprecated code"
- code: "java.method.addedToInterface"
new: "method org.apache.iceberg.metrics.CounterResult org.apache.iceberg.metrics.CommitMetricsResult::totalDataManifestFiles()"
justification: "Added new parameters for manifest stats"
- code: "java.method.addedToInterface"
new: "method org.apache.iceberg.metrics.CounterResult org.apache.iceberg.metrics.CommitMetricsResult::totalDeleteManifestFiles()"
justification: "Added new parameters for manifest stats"
- code: "java.method.removed"
old: "method java.lang.String org.apache.iceberg.FileScanTaskParser::toJson(org.apache.iceberg.FileScanTask)"
justification: "Removing deprecated code"
- code: "java.method.removed"
old: "method org.apache.iceberg.FileScanTask org.apache.iceberg.FileScanTaskParser::fromJson(java.lang.String,\
\ boolean)"
justification: "Removing deprecated code"
- code: "java.method.removed"
old: "method org.apache.iceberg.io.ContentCache.CacheEntry org.apache.iceberg.io.ContentCache::get(java.lang.String,\
\ java.util.function.Function<java.lang.String, org.apache.iceberg.io.ContentCache.FileContent>)"
justification: "Removing deprecated code"
- code: "java.method.removed"
old: "method org.apache.iceberg.io.ContentCache.CacheEntry org.apache.iceberg.io.ContentCache::getIfPresent(java.lang.String)"
justification: "Removing deprecated code"
- code: "java.method.removed"
old: "method org.apache.iceberg.io.InputFile org.apache.iceberg.io.ContentCache::tryCache(org.apache.iceberg.io.FileIO,\
\ java.lang.String, long)"
justification: "Removing deprecated code"
- code: "java.method.removed"
old: "method org.apache.iceberg.io.OutputFile org.apache.iceberg.SnapshotProducer<ThisT>::newManifestOutput()\
\ @ org.apache.iceberg.BaseOverwriteFiles"
justification: "Removing deprecated code"
- code: "java.method.removed"
old: "method org.apache.iceberg.io.OutputFile org.apache.iceberg.SnapshotProducer<ThisT>::newManifestOutput()\
\ @ org.apache.iceberg.BaseReplacePartitions"
justification: "Removing deprecated code"
- code: "java.method.removed"
old: "method org.apache.iceberg.io.OutputFile org.apache.iceberg.SnapshotProducer<ThisT>::newManifestOutput()\
\ @ org.apache.iceberg.BaseRewriteManifests"
justification: "Removing deprecated code"
- code: "java.method.removed"
old: "method org.apache.iceberg.io.OutputFile org.apache.iceberg.SnapshotProducer<ThisT>::newManifestOutput()\
\ @ org.apache.iceberg.StreamingDelete"
justification: "Removing deprecated code"
- code: "java.method.removed"
old: "method void org.apache.iceberg.rest.auth.OAuth2Util.AuthSession::<init>(java.util.Map<java.lang.String,\
\ java.lang.String>, java.lang.String, java.lang.String, java.lang.String,\
\ java.lang.String, java.lang.String)"
justification: "Removing deprecated code"
- code: "java.method.returnTypeChanged"
old: "method org.apache.iceberg.BaseMetastoreTableOperations.CommitStatus org.apache.iceberg.BaseMetastoreTableOperations::checkCommitStatus(java.lang.String,\
\ org.apache.iceberg.TableMetadata)"
new: "method org.apache.iceberg.BaseMetastoreOperations.CommitStatus org.apache.iceberg.BaseMetastoreTableOperations::checkCommitStatus(java.lang.String,\
\ org.apache.iceberg.TableMetadata)"
justification: "Removing deprecated code"
apache-iceberg-0.14.0:
org.apache.iceberg:iceberg-api:
- code: "java.class.defaultSerializationChanged"
2 changes: 1 addition & 1 deletion README.md
@@ -51,7 +51,7 @@ Community discussions happen primarily on the [dev mailing list][dev-list] or on

### Building

Iceberg is built using Gradle with Java 8, 11, 17, or 21.
Iceberg is built using Gradle with Java 11, 17, or 21.

* To invoke a build and run tests: `./gradlew build`
* To skip tests: `./gradlew build -x test -x integrationTest`
@@ -18,6 +18,8 @@
*/
package org.apache.iceberg.aliyun.oss.mock;

import static org.assertj.core.api.Assertions.assertThat;

import com.aliyun.oss.OSSErrorCode;
import com.aliyun.oss.model.Bucket;
import com.fasterxml.jackson.databind.ObjectMapper;
@@ -137,7 +139,9 @@ ObjectMetadata putObject(
Map<String, String> userMetaData)
throws IOException {
File bucketDir = new File(root, bucketName);
assert bucketDir.exists() || bucketDir.mkdirs();
assertThat(bucketDir)
.satisfiesAnyOf(
bucket -> assertThat(bucket).exists(), bucket -> assertThat(bucket.mkdirs()).isTrue());

File dataFile = new File(bucketDir, fileName + DATA_FILE);
File metaFile = new File(bucketDir, fileName + META_FILE);
@@ -170,17 +174,21 @@ ObjectMetadata putObject(

void deleteObject(String bucketName, String filename) {
File bucketDir = new File(root, bucketName);
assert bucketDir.exists();
assertThat(bucketDir).exists();

File dataFile = new File(bucketDir, filename + DATA_FILE);
File metaFile = new File(bucketDir, filename + META_FILE);
assert !dataFile.exists() || dataFile.delete();
assert !metaFile.exists() || metaFile.delete();
assertThat(dataFile)
.satisfiesAnyOf(
file -> assertThat(file).doesNotExist(), file -> assertThat(file.delete()).isTrue());
assertThat(metaFile)
.satisfiesAnyOf(
file -> assertThat(file).doesNotExist(), file -> assertThat(file.delete()).isTrue());
}

ObjectMetadata getObjectMetadata(String bucketName, String filename) throws IOException {
File bucketDir = new File(root, bucketName);
assert bucketDir.exists();
assertThat(bucketDir).exists();

File dataFile = new File(bucketDir, filename + DATA_FILE);
if (!dataFile.exists()) {
11 changes: 11 additions & 0 deletions api/src/main/java/org/apache/iceberg/UpdatePartitionSpec.java
@@ -122,4 +122,15 @@ public interface UpdatePartitionSpec extends PendingUpdate<PartitionSpec> {
* change conflicts with other additions, removals, or renames.
*/
UpdatePartitionSpec renameField(String name, String newName);

/**
* Specifies that the new partition spec will NOT be set as the table's default partition spec
* (the default behavior is to make the new spec the default).
*
* @return this for method chaining
*/
default UpdatePartitionSpec addNonDefaultSpec() {
throw new UnsupportedOperationException(
this.getClass().getName() + " doesn't implement addNonDefaultSpec()");
};
}
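The `addNonDefaultSpec()` addition above uses a throwing default method so that existing `UpdatePartitionSpec` implementations keep compiling. A minimal standalone sketch of that evolution pattern, using simplified stand-in types rather than the real Iceberg interfaces:

```java
// Sketch of the default-method pattern used by addNonDefaultSpec(): a new
// interface method gets a default that throws, so legacy implementers compile
// unchanged and only implementations that opt in override it.
public class DefaultMethodDemo {
    interface UpdateSpec {
        UpdateSpec renameField(String name, String newName);

        // New method added after implementations already existed.
        default UpdateSpec addNonDefaultSpec() {
            throw new UnsupportedOperationException(
                getClass().getName() + " doesn't implement addNonDefaultSpec()");
        }
    }

    // Legacy implementation: unaware of the new method, inherits the default.
    static class LegacyUpdate implements UpdateSpec {
        @Override
        public UpdateSpec renameField(String name, String newName) {
            return this;
        }
    }

    public static void main(String[] args) {
        UpdateSpec update = new LegacyUpdate();
        try {
            update.addNonDefaultSpec();
        } catch (UnsupportedOperationException e) {
            System.out.println("unsupported: " + e.getMessage());
        }
    }
}
```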
@@ -0,0 +1,36 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.iceberg.io;

/**
* This interface is intended as an extension for FileIO implementations to provide additional
* best-effort recovery operations that can be useful for repairing corrupted tables where
* reachable files are missing from disk (e.g. a live manifest points to a data file entry that
* no longer exists on disk).
*/
public interface SupportsRecoveryOperations {

/**
* Perform a best-effort recovery of a file at a given path
*
* @param path Absolute path of file to attempt recovery for
* @return true if recovery was successful, false otherwise
*/
boolean recoverFile(String path);
}
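To make the mixin concrete, here is a hedged, self-contained sketch of a toy store implementing `recoverFile`. The in-memory "trash" map stands in for a real mechanism such as restoring a previous object version (as the S3 implementation later in this PR does); none of these class names come from the PR.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch (not PR code): a FileIO-like store implementing the
// SupportsRecoveryOperations mixin. "Recovery" here moves bytes back from a
// trash map, standing in for e.g. restoring a previous object version.
public class RecoveryDemo {
    interface SupportsRecoveryOperations {
        boolean recoverFile(String path);
    }

    static class InMemoryFileStore implements SupportsRecoveryOperations {
        private final Map<String, byte[]> live = new HashMap<>();
        private final Map<String, byte[]> trash = new HashMap<>();

        void write(String path, byte[] data) {
            live.put(path, data);
        }

        void delete(String path) {
            byte[] data = live.remove(path);
            if (data != null) {
                trash.put(path, data); // keep a recoverable copy
            }
        }

        boolean exists(String path) {
            return live.containsKey(path);
        }

        @Override
        public boolean recoverFile(String path) {
            byte[] data = trash.remove(path);
            if (data == null) {
                return false; // best effort: nothing left to recover
            }
            live.put(path, data);
            return true;
        }
    }

    public static void main(String[] args) {
        InMemoryFileStore store = new InMemoryFileStore();
        store.write("s3://bucket/table/data/f1.parquet", new byte[] {1, 2, 3});
        store.delete("s3://bucket/table/data/f1.parquet");
        System.out.println(store.recoverFile("s3://bucket/table/data/f1.parquet")); // true
        System.out.println(store.exists("s3://bucket/table/data/f1.parquet"));      // true
    }
}
```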
41 changes: 41 additions & 0 deletions api/src/main/java/org/apache/iceberg/types/Comparators.java
@@ -173,6 +173,10 @@ public static Comparator<CharSequence> charSequences() {
return CharSeqComparator.INSTANCE;
}

public static Comparator<CharSequence> filePath() {
return FilePathComparator.INSTANCE;
}

private static class NullsFirst<T> implements Comparator<T> {
private static final NullsFirst<?> INSTANCE = new NullsFirst<>();

@@ -351,4 +355,41 @@ public int compare(CharSequence s1, CharSequence s2) {
return Integer.compare(s1.length(), s2.length());
}
}

private static class FilePathComparator implements Comparator<CharSequence> {
private static final FilePathComparator INSTANCE = new FilePathComparator();

private FilePathComparator() {}

@Override
public int compare(CharSequence s1, CharSequence s2) {
if (s1 == s2) {
return 0;
}
int count = s1.length();

int cmp = Integer.compare(count, s2.length());
if (cmp != 0) {
return cmp;
}

if (s1 instanceof String && s2 instanceof String) {
cmp = Integer.compare(s1.hashCode(), s2.hashCode());
if (cmp != 0) {
return cmp;
}
}
// File paths inside a delete file normally have more identical chars at the beginning. For
// example, a typical path is like "s3:/bucket/db/table/data/partition/00000-0-[uuid]-00001.parquet".
// The uuid is where the difference starts, so it's faster to find the first diff backward.
for (int i = count - 1; i >= 0; i--) {
cmp = Character.compare(s1.charAt(i), s2.charAt(i));
if (cmp != 0) {
return cmp;
}
}
return 0;
}
}
}
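A standalone sketch exercising the comparator's ordering. The logic is copied from the diff above, minus the `String.hashCode()` short-circuit, which is only a performance shortcut for equal-length strings; the demo class name is made up.

```java
import java.util.Comparator;

public class FilePathComparatorDemo {
    // Standalone copy of the PR's comparator logic: compare lengths first,
    // then scan characters from the end, where typical data-file paths
    // (long shared prefix, uuid near the suffix) first differ.
    static final Comparator<CharSequence> FILE_PATH =
        (s1, s2) -> {
            if (s1 == s2) {
                return 0;
            }
            int cmp = Integer.compare(s1.length(), s2.length());
            if (cmp != 0) {
                return cmp;
            }
            for (int i = s1.length() - 1; i >= 0; i--) {
                cmp = Character.compare(s1.charAt(i), s2.charAt(i));
                if (cmp != 0) {
                    return cmp;
                }
            }
            return 0;
        };

    public static void main(String[] args) {
        String prefix = "s3:/bucket/db/table/data/partition/00000-0-";
        String a = prefix + "aaaa-00001.parquet";
        String b = prefix + "bbbb-00001.parquet";
        System.out.println(FILE_PATH.compare(a, a));     // 0: identical
        System.out.println(FILE_PATH.compare(a, b) < 0); // true: differs near the end
    }
}
```

Note this is not a lexicographic order: shorter paths always sort before longer ones, which is fine for its purpose of grouping and deduplicating paths.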