java.lang.IllegalStateException: Connection pool shut down in Spark #11633
This issue is the same when you use `IcebergSink` on Flink or when you perform `expire_snapshot` on Spark.
HiveCatalog is unique among catalogs, as all the other catalogs create a `FileIO` instance inside the ops, for example:
Maybe for the `HiveCatalog`, since we reuse the `FileIO`, we could only close it when all the ops are closed; and if all the ops are done, then maybe just close it in the catalog's close?
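A minimal sketch of that idea in plain Java (not Iceberg code; `RefCountedCloseable` and `FakeIO` are hypothetical names): each ops retains the shared wrapper when it is created and releases it when it is done, and the underlying resource is only really closed when the last reference is released:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a FileIO that owns a connection pool.
class FakeIO implements AutoCloseable {
    boolean closed = false;
    @Override public void close() { closed = true; }
}

// Ref-counted close: the delegate is closed only when the count hits zero.
class RefCountedCloseable<T extends AutoCloseable> implements AutoCloseable {
    private final T delegate;
    private final AtomicInteger refCount = new AtomicInteger(1); // the catalog holds one reference

    RefCountedCloseable(T delegate) { this.delegate = delegate; }

    T get() { return delegate; }

    RefCountedCloseable<T> retain() {           // called when a new ops starts using the FileIO
        refCount.incrementAndGet();
        return this;
    }

    @Override public void close() throws Exception {
        if (refCount.decrementAndGet() == 0) {  // only the last release closes the delegate
            delegate.close();
        }
    }
}

public class RefCountDemo {
    public static void main(String[] args) throws Exception {
        FakeIO io = new FakeIO();
        RefCountedCloseable<FakeIO> shared = new RefCountedCloseable<>(io);
        shared.retain();                        // an ops starts using the shared FileIO
        shared.close();                         // that ops finishes: io stays open
        System.out.println("after ops close: closed=" + io.closed);
        shared.close();                         // catalog close: last reference, io is closed
        System.out.println("after catalog close: closed=" + io.closed);
    }
}
```

This keeps a single pooled instance alive for as long as any ops (or the catalog itself) still holds it, which is the semantics the comment above is suggesting.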
Can you please elaborate? What do you mean by "all other tasks"? The serializable table is broadcast, and when it reaches an executor, all tasks within that executor share it; ideally the table close should be called when all the tasks are done. Do you know which specific thing the task is doing (is it still writing a Parquet file)? What's the precise error you see?
I have been facing this issue for the past 6 months now. #10340 (comment)
I'm seeing the same here: apache/iceberg-python#1323. Let me dig into the details.
I did a git bisect and narrowed the issue down to this one:
Apache Iceberg version
1.7.0 (latest release)
Query engine
Spark
Please describe the bug 🐞
We have a maintenance job which runs all the necessary SparkActions on our data lake. Recently we upgraded to Iceberg 1.7.0 and switched to `org.apache.iceberg.aws.s3.S3FileIO`. After this we started to get `java.lang.IllegalStateException: Connection pool shut down` errors on both the driver and the workers. Here is an example stacktrace:
I managed to track down where the `PoolingHttpClientConnectionManager.shutdown()` method is called from: `FileIOTracker`.

iceberg/core/src/main/java/org/apache/iceberg/io/FileIOTracker.java
Lines 41 to 46 in cf02ffa

Here is a stack trace:
We use `HiveCatalog`; it creates a `FileIO` once, in the `initialize` method:

iceberg/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
Lines 122 to 125 in e449d34
And each time it creates a new `TableOperations`, it puts a new entry into the `FileIOTracker`, but with the same instance of `FileIO`:

iceberg/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
Lines 629 to 636 in e449d34
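To make that failure mode concrete, here is a minimal, self-contained simulation in plain Java (the `FakeFileIO`/`Ops` names are hypothetical, not Iceberg code): the tracker holds one entry per ops, but every entry wraps the same shared instance, so closing the instance when any single entry is evicted shuts the pool down for everyone else:

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Simulation of the bug: one FileIO shared by many ops, tracked per-ops.
public class SharedIoBugDemo {
    static class FakePool { boolean shutDown = false; }

    static class FakeFileIO implements AutoCloseable {
        final FakePool pool = new FakePool();
        // Like S3FileIO closing its pooled HTTP client.
        @Override public void close() { pool.shutDown = true; }

        byte[] read() {
            if (pool.shutDown) throw new IllegalStateException("Connection pool shut down");
            return new byte[0];
        }
    }

    static class Ops { final FakeFileIO io; Ops(FakeFileIO io) { this.io = io; } }

    public static void main(String[] args) {
        FakeFileIO sharedIo = new FakeFileIO();      // HiveCatalog creates one FileIO in initialize
        Map<Ops, FakeFileIO> tracker = new IdentityHashMap<>();

        Ops ops1 = new Ops(sharedIo);
        Ops ops2 = new Ops(sharedIo);
        tracker.put(ops1, sharedIo);                 // same instance tracked once per ops
        tracker.put(ops2, sharedIo);

        tracker.remove(ops1).close();                // "GC" of ops1 closes the SHARED instance

        try {
            ops2.io.read();                          // every other user now fails
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());      // prints "Connection pool shut down"
        }
    }
}
```

The tracker's per-ops bookkeeping is fine for catalogs that create one `FileIO` per ops; it is only the shared-instance case that turns one eviction into a global shutdown.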
When some `TableOperations` is cleaned up by the GC, the `FileIO` is closed and everything starts failing with the `Connection pool shut down` error. This is what we got on the Spark driver. I patched `FileIOTracker` not to touch `FileIO`, and the driver is fine now.

`PoolingHttpClientConnectionManager`
is closed from:

This one looks like GC on an unused `PoolingHttpClientConnectionManager` and should not cause the problem.

This one looks like GC on an unused `S3FileIO` and should not cause the problem either.

But the following one seems problematic:
When `SerializableTableWithSize` is closed, it closes the underlying `TableOperations` and the associated `FileIO`. As we have a single instance of `FileIO`, after this all the tasks on the worker start failing with `Connection pool shut down`.
I'm not sure how to handle this case.

Willingness to contribute