Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace reference of Table.identifier with Table.name #1346

Merged
merged 3 commits into from
Nov 20, 2024

Conversation

kevinjqliu
Copy link
Contributor

@kevinjqliu kevinjqliu commented Nov 20, 2024

Closes #1336, related to #1327
This PR changes the implementation of the Table.name function to use self._identifier instead of self.identifier to avoid having unnecessary deprecation warnings.
This PR also replaces all references of Table.identifier with Table.name() to prepare for the deprecation of Table.identifier and avoid emitting warnings in tests. Using VSCode, I confirmed all use of Table.identifier is replaced in the repo.

@kevinjqliu kevinjqliu changed the title Kevinjqliu/use table name Use Table.name in repo and reference identifier without catalog Nov 20, 2024
@kevinjqliu kevinjqliu changed the title Use Table.name in repo and reference identifier without catalog Replace reference of Table.identifier with Table.name Nov 20, 2024
@kevinjqliu kevinjqliu requested review from Fokko and sungwy November 20, 2024 05:15
@kevinjqliu kevinjqliu added this to the PyIceberg 0.8.1 release milestone Nov 20, 2024
@sungwy
Copy link
Collaborator

sungwy commented Nov 20, 2024

Hi @kevinjqliu thanks for bouncing on this! And nice catch on the commit_table functions... this looks good to me. It looks like python3.12 CI is failing due to a DeprecationWarning on utcfromtimestamp. Could we rebase this branch and run the CI again?

Related fix: https://github.com/apache/iceberg-python/pull/1134/files

@kevinjqliu kevinjqliu force-pushed the kevinjqliu/use-table-name branch from 9398948 to 787b64e Compare November 20, 2024 17:17
@kevinjqliu
Copy link
Contributor Author

super weird that this is only happening to 3.12

self = TimestampNTZType(), ts = 1672601100000000

    def fromInternal(self, ts: int) -> datetime.datetime:
        if ts is not None:
            # using int to avoid precision loss in float
>           return datetime.datetime.utcfromtimestamp(ts // 1000000).replace(
                microsecond=ts % 1000000
            )
E           DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).

which corresponds to pyspark's use of datetime.datetime.utcfromtimestamp, which was recently changed apache/spark@8e02a64#diff-bb02965370d515ec7add7967bc949c18c88ffc7346cceaebefb84907dfdce903L437

We remove the warning in #1134, but the tests passed then

@kevinjqliu
Copy link
Contributor Author

datetime.datetime.utcfromtimestamp is deprecated starting from 3.12

https://docs.python.org/3.12/library/datetime.html#datetime.datetime.utcfromtimestamp

@Fokko
Copy link
Contributor

Fokko commented Nov 20, 2024

@kevinjqliu Yes a newer version of Python will emit a deprecation warning, I'm okay with putting back the exclusion to fix that in another PR

@@ -801,7 +801,7 @@ def name(self) -> Identifier:
Returns:
An Identifier tuple of the table name
"""
return self.identifier
return self._identifier
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is problematic since we change the contract here. Everyone that's using 8.0.0 expects the catalog to be part of the name, with his change we remove that one without letting the user know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have any suggestions on how to clean this up? looks like we need to deprecate name as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be the correct way. I don't see any other option to just accept this. Since the .name() is just new, I think it is acceptable, but we probably should call it out at the release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you saying this change is fine as long as we call it out at the release? wdyt about adding this PR to 0.8.1 as a quick fix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecating it seems like a lot of work, and will also be confusing to the end-user. So I don't think we have many options, and I think what you're suggesting in this PR is the best path forward. Getting this out ASAP in 0.8.1 would be great 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, make sense to me!

@kevinjqliu kevinjqliu force-pushed the kevinjqliu/use-table-name branch from 787b64e to af21cdb Compare November 20, 2024 19:19
@kevinjqliu
Copy link
Contributor Author

opened #1349 and added back the warning filter, we can remove once we upgrade PySpark to 4.0

@kevinjqliu kevinjqliu merged commit 7a4734e into apache:main Nov 20, 2024
7 checks passed
@kevinjqliu kevinjqliu deleted the kevinjqliu/use-table-name branch November 20, 2024 20:46
Fokko pushed a commit to Fokko/iceberg-python that referenced this pull request Nov 21, 2024
* fix Table.name

* replace Table.identifier with Table.name

* add warning filter
kevinjqliu added a commit to kevinjqliu/iceberg-python that referenced this pull request Nov 25, 2024
* fix Table.name

* replace Table.identifier with Table.name

* add warning filter
kevinjqliu added a commit that referenced this pull request Nov 27, 2024
* use the non-deprecated func (#1326)

* 0.8.0 post release steps (#1334)

* add

* fix mkdoc

* Drop upper bounds for fsspec and it's implementations (#1341)

* Drop upper bounds for fsspec and it's implementations

* Run poetry lock

* Ignore tables without `table_type` from Glue and Hive

* Ignore tables without table_type parameters while loading all iceberg table from Glue and Hive catalog (#1331)

* Use TABLE_TYPE

---------

Co-authored-by: Wenzhuo Zhao <[email protected]>

* Replace reference of `Table.identifier` with `Table.name` (#1346)

* fix Table.name

* replace Table.identifier with Table.name

* add warning filter

* Allow leading underscore in column name used in row filter (#1358)

* Update parser.py

Allow leading underscore in column name used in row filter.

* Update test_parser.py

* Update test_parser.py

* Update test_parser.py

* Remove Python 3.13 upper bound restriction (#1355)

* Remove Python 3.13 upper bound restriction

* Fix missing poetry.lock file

* Upgrading numpy on the poetry.lock file from v1.26.0 to v1.26.4

* Improve documentation for "how to release" (#1359)

* initial update

* edits

* add gpg instructions

* verify artifacts

* add twine not

* grammar

* edits

* remove old artifacts

* update doc workflow action

* and name

* add docs on patch vs major/minor release

* fix `KeyError` raised by `add_files` when parquet file doe not have column stats (#1354)

* fix KeyError, by switching del to pop

* added unit test

* update test

* fix python 3.9 compatibility, and refactor test

* update test

* bump to 0.8.1

* Add instruction for patch release (#1373)

* add instruction for patch release

* create branch from tag

* Write `null` when there is no parent-snapshot-id (#1383)

---------

Co-authored-by: Sumanth <[email protected]>
Co-authored-by: gitzwz <[email protected]>
Co-authored-by: Wenzhuo Zhao <[email protected]>
Co-authored-by: vincenzon <[email protected]>
Co-authored-by: Luca Bigon <[email protected]>
Co-authored-by: Binayak Dasgupta <[email protected]>
Co-authored-by: Fokko Driesprong <[email protected]>
kevinjqliu added a commit to kevinjqliu/iceberg-python that referenced this pull request Nov 27, 2024
* fix Table.name

* replace Table.identifier with Table.name

* add warning filter
kevinjqliu added a commit that referenced this pull request Nov 27, 2024
* fix Table.name

* replace Table.identifier with Table.name

* add warning filter
sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Dec 7, 2024
* fix Table.name

* replace Table.identifier with Table.name

* add warning filter
sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Dec 7, 2024
* fix Table.name

* replace Table.identifier with Table.name

* add warning filter
sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Dec 24, 2024
* fix Table.name

* replace Table.identifier with Table.name

* add warning filter
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

How to get rid of the warning
3 participants