SNOW-1480718: Support Series.str.translate #1776

Merged
sfc-gh-joshi merged 10 commits into main from joshi-SNOW-1480718-str-translate
Jun 28, 2024

Conversation

sfc-gh-joshi
Contributor

@sfc-gh-joshi sfc-gh-joshi commented Jun 13, 2024

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1480718

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
  3. Please describe how your code solves the related issue.

This PR implements Series.str.translate using SQL TRANSLATE.

Unlike native pandas, SQL TRANSLATE cannot perform one-to-many character mappings (mapping a single character to a multi-character string). I considered mimicking this behavior with chained REPLACE calls, but that would produce discrepancies whenever the output of one REPLACE call contains characters that match the key of a subsequent REPLACE call. For these use cases, we recommend that users chain Series.str.replace calls manually as needed.
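To illustrate the discrepancy, here is a minimal sketch in plain Python (not the Snowpark pandas implementation). It uses the built-in `str.translate` to stand in for the simultaneous mapping that pandas performs, and a loop of `str.replace` calls to stand in for chained REPLACE. The helper names `translate_once` and `chained_replace` and the sample mapping are hypothetical, chosen only for this example.

```python
# Hypothetical mapping: "a" maps to two characters, and "b" is itself a key.
mapping = {"a": "bb", "b": "c"}


def translate_once(text, mapping):
    """Apply all mappings in a single pass over the input, like str.translate."""
    return text.translate(str.maketrans(mapping))


def chained_replace(text, mapping):
    """Apply the mappings one after another, like chained REPLACE calls."""
    for old, new in mapping.items():
        text = text.replace(old, new)
    return text


# Single pass: "a" -> "bb", original "b" -> "c".
print(translate_once("ab", mapping))   # bbc

# Chained: "ab" -> "bbb" after the first replace, then every "b" -> "c".
print(chained_replace("ab", mapping))  # ccc
```

The chained version rewrites the `b` characters that the first replacement introduced, which is why the two approaches disagree on this input.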

Slack thread discussing future work on a more robust multi-character approach using REPLACE with sentinel values: https://snowflake.slack.com/archives/C04HF38JFAQ/p1718388095381469

Contributor

@sfc-gh-mvashishtha sfc-gh-mvashishtha left a comment

some minor comments

@sfc-gh-joshi sfc-gh-joshi force-pushed the joshi-SNOW-1480718-str-translate branch from 39b0624 to 0c4dd8d Compare June 17, 2024 18:10
@sfc-gh-joshi sfc-gh-joshi requested review from a team, sfc-gh-evandenberg and sfc-gh-lmukhopadhyay and removed request for a team June 17, 2024 18:11
@sfc-gh-joshi sfc-gh-joshi force-pushed the joshi-SNOW-1480718-str-translate branch from 0c4dd8d to 9d4421b Compare June 20, 2024 18:35
Contributor

@sfc-gh-helmeleegy sfc-gh-helmeleegy left a comment

LGTM. Thanks, Jonathan! Can you please update the PR description to better describe the plan for supporting multi-character mappings in the future? You can give a short description and point to the Slack thread for more details.

@sfc-gh-joshi sfc-gh-joshi force-pushed the joshi-SNOW-1480718-str-translate branch 3 times, most recently from 30257dc to b2cc413 Compare June 24, 2024 21:14
@sfc-gh-joshi sfc-gh-joshi force-pushed the joshi-SNOW-1480718-str-translate branch from b2cc413 to 98eeec7 Compare June 27, 2024 18:55
Contributor

@sfc-gh-mvashishtha sfc-gh-mvashishtha left a comment

nice work!

@sfc-gh-joshi sfc-gh-joshi merged commit a5c807b into main Jun 28, 2024
35 checks passed
@sfc-gh-joshi sfc-gh-joshi deleted the joshi-SNOW-1480718-str-translate branch June 28, 2024 01:45
@github-actions github-actions bot locked and limited conversation to collaborators Jun 28, 2024

3 participants