Allow string literals as segmentation modes #245

Merged: mh-northlander merged 4 commits into develop from feature/arseny/mode-literals on Mar 26, 2024

@eiennohito (Collaborator) commented on Feb 29, 2024

Fixes #243

In addition to SplitMode objects, SudachiPy now accepts the string literals "A", "B", and "C" everywhere a split mode is expected.
SplitMode also gets a constructor that accepts a string literal, in addition to the existing class members.
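
For illustration, the "object or string literal" behaviour described above could be sketched in pure Python as follows. This is not SudachiPy's implementation: the SplitMode enum here is a stand-in, and resolve_mode is a hypothetical helper. The sketch also assumes only the exact upper-case names are accepted, which this PR does not state.

```python
from enum import Enum
from typing import Literal, Union


class SplitMode(Enum):
    """Stand-in for sudachipy's SplitMode (assumption, not the real class)."""
    A = "A"
    B = "B"
    C = "C"


# Same shape as the annotation discussed in this PR.
ModeArg = Union[SplitMode, Literal["A", "B", "C"]]


def resolve_mode(mode: ModeArg = SplitMode.C) -> SplitMode:
    """Hypothetical helper: accept a SplitMode member or its string name."""
    if isinstance(mode, SplitMode):
        return mode
    if isinstance(mode, str):
        try:
            # Look the member up by name; only "A", "B", "C" exist.
            return SplitMode[mode]
        except KeyError:
            raise ValueError(f"unknown split mode: {mode!r}") from None
    raise TypeError(f"expected SplitMode or str, got {type(mode).__name__}")
```

With a helper like this, resolve_mode("A") and resolve_mode(SplitMode.A) yield the same member, so callers can pass either form.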

Bonus fix: tokenization now releases the GIL.
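
Releasing the GIL during tokenization means calls made from several Python threads can overlap instead of running one at a time. A minimal sketch of that calling pattern is below. The names tokenize_all and whitespace_tokenize are hypothetical, and the placeholder tokenizer is pure Python, so it does not itself benefit from the change. Whether a single tokenizer object may be shared across threads is not stated in this PR, so the sketch takes an arbitrary callable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def tokenize_all(tokenize: Callable[[str], List[str]],
                 texts: Sequence[str],
                 workers: int = 4) -> List[List[str]]:
    """Run `tokenize` over `texts` on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order of the inputs.
        return list(pool.map(tokenize, texts))


def whitespace_tokenize(text: str) -> List[str]:
    """Placeholder tokenizer (assumption): splits on whitespace only."""
    return text.split()


results = tokenize_all(whitespace_tokenize, ["a b", "c d e"])
```

In real use, the placeholder would be replaced by a function that calls into the tokenizer.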

@eiennohito eiennohito force-pushed the feature/arseny/mode-literals branch from 27195ca to 65bab9d Compare February 29, 2024 05:17
@@ -65,7 +70,7 @@ class Dictionary:
...

    def create(self,
-              mode: SplitMode = SplitMode.C,
+              mode: Union[SplitMode, Literal["A"], Literal["B"], Literal["C"]] = SplitMode.C,
Collaborator

Let's use Union[SplitMode, Literal["A", "B", "C"]] here, as on l.199.
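
For reference, the two spellings are equivalent to a type checker (PEP 586 defines Literal["A", "B", "C"] as shorthand for the union of the single-value literals), so the suggestion is purely about brevity and consistency. A small sketch that checks both forms carry the same values at runtime is below. The aliases Verbose and Compact and the helper literal_values are names made up for this example.

```python
from typing import Literal, Union, get_args

# The spelling in the diff and the compact spelling suggested in review.
Verbose = Union[Literal["A"], Literal["B"], Literal["C"]]
Compact = Literal["A", "B", "C"]


def literal_values(tp) -> set:
    """Collect the values of a Literal, or of a Union of Literals."""
    values = set()
    for arg in get_args(tp):
        # In a Union each arg is itself a Literal[...]; in a Literal
        # each arg is already a plain value, for which get_args is empty.
        nested = get_args(arg)
        values.update(nested if nested else (arg,))
    return values
```

Both literal_values(Verbose) and literal_values(Compact) give the set of "A", "B", and "C".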

Collaborator

Dictionary.pre_tokenizer also takes a split mode as an argument (l.292), so it needs the same change.

@eiennohito
Collaborator Author

@mh-northlander fixed!

@mh-northlander mh-northlander merged commit 693a32c into develop Mar 26, 2024
8 checks passed
@eiennohito eiennohito deleted the feature/arseny/mode-literals branch March 26, 2024 23:20
Development

Successfully merging this pull request may close these issues.

Update SplitMode implementation