Unable to create target format Delta with source format Iceberg when the source table is on S3 #431
I was looking through the documentation and understand that if the source is an Iceberg table, I need to include a catalog.yaml as well: catalogImpl: io.my.CatalogImpl
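For context, the catalog.yaml that XTable reads for an Iceberg source appears (from the XTable README, as far as we can tell) to be a small YAML file with `catalogImpl`, `catalogName`, and a pass-through `catalogOptions` map. Below is a minimal Python sketch that renders one under that assumed layout. The catalog name `my_catalog` and the `uri` value `thrift://metastore-host:9083` are hypothetical placeholders, not values from this thread.

```python
def render_catalog_yaml(catalog_impl: str, catalog_name: str, options: dict[str, str]) -> str:
    """Render a catalog.yaml for an Iceberg source (field layout is an assumption)."""
    lines = [
        f"catalogImpl: {catalog_impl}",
        f"catalogName: {catalog_name}",
        "catalogOptions:",
    ]
    # Remaining catalog properties are emitted as a flat key/value map, sorted for stable output.
    lines += [f"  {key}: {value}" for key, value in sorted(options.items())]
    return "\n".join(lines) + "\n"


# Hypothetical values: replace with your own catalog implementation, name, and metastore URI.
print(render_catalog_yaml("io.my.CatalogImpl", "my_catalog", {"uri": "thrift://metastore-host:9083"}))
```

The options are sorted only so that the generated file is deterministic. The keys inside `catalogOptions` depend on the catalog implementation you choose.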
Hi @rajender07! The error clarifies the problem. The important part to understand here is that Iceberg needs a CATALOG to get started. Your config currently connects Iceberg to a Hive catalog, but I don't see a Thrift URL or anything similar here. Can you instead use a Hadoop catalog and configure it with something like this:
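An Iceberg Hadoop catalog is usually registered in Spark through `spark.sql.catalog.<name>` properties (`type=hadoop` plus a `warehouse` location). The following minimal sketch builds the matching `pyspark --conf` arguments. It assumes a hypothetical catalog name `hadoop_cat` and a hypothetical warehouse `s3://my-bucket/warehouse`.

```python
import shlex


def hadoop_catalog_conf(catalog: str, warehouse: str) -> list[str]:
    """Build pyspark --conf arguments that register an Iceberg Hadoop catalog."""
    settings = {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        # A Hadoop catalog tracks the current metadata via version-hint.text in the table directory.
        f"spark.sql.catalog.{catalog}.type": "hadoop",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
    }
    args: list[str] = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical catalog name and warehouse path.
print(shlex.join(["pyspark", *hadoop_catalog_conf("hadoop_cat", "s3://my-bucket/warehouse")]))
```

With this layout, a table created as `hadoop_cat.db.iceberg_table` would live under `s3://my-bucket/warehouse/db/iceberg_table`. That directory is the path to point XTable at.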
@dipankarmazumdar, thank you for looking into the issue. I will try using the Hadoop catalog as you suggested and let you know my findings. Could you also guide me on solving the issue while using the Iceberg catalog? Should I use a catalog.yaml file? If so, I am confused about which catalogName should be used. FYI, I have added the Thrift-related properties to /etc/spark/conf/spark-defaults.conf and /etc/spark/conf/hive-site.xml. I have no issues connecting to my metastore and reading/writing data from it.
@rajender07 Which catalog are you using? If it is HMS, the implementation is
@dipankarmazumdar @the-other-tim-brown I used the Hadoop catalog as you suggested and created a new Iceberg table. Now I can see the version-hint.text file as well. However, when I executed the sync command, it failed with the error below. Could you please help me resolve this issue?
2024-05-13 13:43:04 INFO org.apache.xtable.conversion.ConversionController:240 - No previous InternalTable sync for target. Falling back to snapshot sync.
Here is my my_config.yaml:
sourceFormat: ICEBERG
@rajender07 - I'm not sure about this particular error. However, I tried to reproduce it on my end and was able to translate from ICEBERG to DELTA using the setup I suggested.
ICEBERG TABLE CONFIG & CREATION:
my_config.yaml
Run Sync
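The config-and-sync steps above can be sketched in Python as follows. The `my_config.yaml` fields (`sourceFormat`, `targetFormats`, `datasets`, `tableBasePath`, `tableName`) follow the layout we understand from the XTable docs. The `--icebergCatalogConfig` flag for passing a catalog.yaml is an assumption; check the bundled jar's help output. The table path is a hypothetical placeholder.

```python
import shlex
from typing import Optional

JAR = "./utilities-0.1.0-SNAPSHOT-bundled.jar"


def render_dataset_config(table_base_path: str, table_name: str) -> str:
    """my_config.yaml for an ICEBERG -> DELTA sync (field names assumed from the XTable docs)."""
    return (
        "sourceFormat: ICEBERG\n"
        "targetFormats:\n"
        "  - DELTA\n"
        "datasets:\n"
        f"  - tableBasePath: {table_base_path}\n"
        f"    tableName: {table_name}\n"
    )


def sync_command(dataset_config: str, catalog_config: Optional[str] = None) -> list[str]:
    """Build the java -jar invocation used to run the sync."""
    cmd = ["java", "-jar", JAR, "--datasetConfig", dataset_config]
    if catalog_config is not None:
        # Assumed flag: points XTable at a catalog.yaml instead of reading the table as a Hadoop table.
        cmd += ["--icebergCatalogConfig", catalog_config]
    return cmd


print(render_dataset_config("s3://my-bucket/warehouse/db/iceberg_table", "iceberg_table"))
print(shlex.join(sync_command("my_config.yaml")))
```

Keeping the command as a list (rather than one string) avoids shell-quoting surprises if it is later passed to `subprocess.run`.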
@rajender07 - LMK if you were able to get past the error with the recommendation.
@rajender07 Can you pull the latest master and try again?
@vinishjail97, thank you. I will test the fix today and share an update.
It creates an empty _delta_log directory and then errors out.
Config file:
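When a sync stops after creating `_delta_log` but before writing a commit, the directory contains no commit files. Delta commits are named with a 20-digit, zero-padded version (for example `00000000000000000000.json`). The following simplified check takes a directory listing and reports the latest committed version. It ignores checkpoint files and is a diagnostic sketch, not part of XTable.

```python
import re
from typing import Iterable, Optional

# Delta commit files: 20-digit zero-padded version followed by ".json".
_COMMIT = re.compile(r"^(\d{20})\.json$")


def latest_delta_version(log_files: Iterable[str]) -> Optional[int]:
    """Return the highest committed version in a _delta_log listing, or None if no commit exists."""
    versions = [int(m.group(1)) for name in log_files if (m := _COMMIT.match(name))]
    return max(versions, default=None)
```

For a local copy, `latest_delta_version(os.listdir("<table>/_delta_log"))` (path hypothetical) returning `None` confirms the directory was created but never committed to. On S3, the same function can be applied to the object key basenames under the `_delta_log/` prefix.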
I followed the "Creating your first interoperable table" documentation and was able to build utilities-0.1.0-SNAPSHOT-bundled.jar successfully.
I started a PySpark session using the command below. The Spark version is 3.4.1, running on Amazon EMR 6.14.
pyspark --conf "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" --conf "spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog" --conf "spark.sql.catalog.spark_catalog.type=hive"
I created an Iceberg table using the commands below:
data = [("James", "Smith", "01012020", "M", "3000"),
        ("Michael", "", "02012021", "M", "4000"),
        ("Robert", "Williams", "03012023", "M", "4000"),
        ("Maria", "Jones", "04012024", "F", "4000"),
        ("Jen", "Brown", "05012025", "F", "-1")]
columns = ["firstname", "lastname", "dob", "gender", "salary"]
df = spark.createDataFrame(data, columns)
spark.sql("""CREATE TABLE IF NOT EXISTS iceberg_table (firstname string, lastname string, dob string, gender string, salary string) USING iceberg""")
df.writeTo("iceberg_table").append()
I can see the data and metadata directories under the table name on S3.
I created my_config.yaml as described in the documentation:
my_config.txt
I then executed the command below, and it fails because metadata/version-hint.text is not available:
sudo java -jar ./utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig my_config.yaml
2024-04-30 10:24:25 INFO org.apache.xtable.conversion.ConversionController:240 - No previous InternalTable sync for target. Falling back to snapshot sync.
2024-04-30 10:24:25 WARN org.apache.iceberg.hadoop.HadoopTableOperations:325 - Error reading version hint file s3:///iceberg_table_1/metadata/version-hint.text
java.io.FileNotFoundException: No such file or directory: s3:////iceberg_table_1/metadata/version-hint.text
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3801) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3652) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at org.apache.hadoop.fs.s3a.S3AFileSystem.extractOrFetchSimpleFileStatus(S3AFileSystem.java:5288) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$executeOpen$6(S3AFileSystem.java:1578) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
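The trace shows the table being read through `HadoopTableOperations`, which expects the Hadoop-catalog layout: `v<N>.metadata.json` files plus `version-hint.text`. A table created through the Hive-type `spark_catalog` instead names its metadata files like `00000-<uuid>.metadata.json` and writes no version hint. The following simplified Python model of the lookup illustrates why such a table is not found. It roughly mirrors Iceberg's fallback of scanning for the highest `v<N>` file but is not Iceberg's code, and it ignores compressed metadata names.

```python
import re
from typing import Iterable, Optional

# Hadoop-catalog metadata files are named v<N>.metadata.json.
_HADOOP_METADATA = re.compile(r"^v(\d+)\.metadata\.json$")


def resolve_hadoop_metadata(metadata_files: Iterable[str], version_hint: Optional[str] = None) -> Optional[str]:
    """Pick the current metadata file the way a Hadoop-catalog reader would (simplified)."""
    if version_hint is not None:
        # version-hint.text holds just the current version number.
        return f"v{int(version_hint.strip())}.metadata.json"
    # Fallback when version-hint.text is missing: use the highest v<N> file, if any exists.
    versions = [int(m.group(1)) for name in metadata_files if (m := _HADOOP_METADATA.match(name))]
    return f"v{max(versions)}.metadata.json" if versions else None


# Hive-catalog style names (hypothetical UUIDs) never match the v<N> pattern.
print(resolve_hadoop_metadata(["00000-1a2b.metadata.json", "00001-3c4d.metadata.json"]))
```

This is why the thread's suggestion is either to create the table with a Hadoop catalog or to give XTable the catalog configuration so that it does not treat the table as a Hadoop table.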