[pt] More verbs/nouns fixes in disambiguation.xml #11033
Walkthrough
The pull request introduces modifications to the
Changes
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml (1)
Line range hint 3959-3977: LGTM! Pattern adjustments improve the accuracy of verb disambiguation.
The changes are well thought out:
- Adding VMSP3S0 maintains consistency with other rules
- Removing VMP00SM helps prevent false positives with past participles
Consider adding a brief comment explaining why VMP00SM was removed, as it would help future maintainers understand the rationale.
- <token postag='VMSP3S0|VMIP3S0|VMM02S0|VMSP2S0|VMIP2S0|VMN02S0|VMSF2S0|VMIP1S0' postag_regexp="yes"/>
+ <!-- VMP00SM removed to prevent false positives with past participles -->
+ <token postag='VMSP3S0|VMIP3S0|VMM02S0|VMSP2S0|VMIP2S0|VMN02S0|VMSF2S0|VMIP1S0' postag_regexp="yes"/>
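The effect of the tag list can be checked outside LanguageTool with a small sketch. It copies the alternation from the `<token>` element above and assumes the pattern has to match the whole POS tag, which is why `re.fullmatch` is used. The helper name `postag_matches` and the sample noun tag `NCFP000` are illustrative assumptions, not taken from the PR.

```python
import re

# Alternation copied verbatim from the suggested <token> element.
POSTAG_PATTERN = "VMSP3S0|VMIP3S0|VMM02S0|VMSP2S0|VMIP2S0|VMN02S0|VMSF2S0|VMIP1S0"


def postag_matches(tag: str, pattern: str = POSTAG_PATTERN) -> bool:
    """Return True if the whole tag matches the alternation.

    Assumption: the pattern is applied as a full match against the tag.
    """
    return re.fullmatch(pattern, tag) is not None


if __name__ == "__main__":
    # VMP00SM is the participle tag the review says was removed;
    # NCFP000 is a hypothetical noun tag used only for contrast.
    for tag in ("VMSP3S0", "VMIP3S0", "VMP00SM", "NCFP000"):
        print(tag, postag_matches(tag))
```

With this list, a token tagged only as `VMP00SM` no longer satisfies the pattern, while `VMSP3S0` does, which is the behaviour the review describes.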
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml (4 hunks)
🔇 Additional comments (1)
languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml (1)
Line range hint 3928-3944: LGTM! The addition of the VMSP3S0 tag improves verb/noun disambiguation.
The change to include the third person singular present subjunctive form (VMSP3S0) in the pattern is a good improvement. This helps correctly identify more cases where words could be ambiguously interpreted as either verbs or nouns.
Let's verify the impact of this change:
✅ Verification successful
The VMSP3S0 tag addition in the verb-noun disambiguation rule is correctly implemented
Based on the verification results, the rule is well-structured and the examples demonstrate proper disambiguation scenarios:
- The pattern correctly matches verb forms (including VMSP3S0) followed by nouns
- The rule has appropriate examples showing disambiguation cases like "torna rosas"
- The tag is consistently used across similar patterns in the file
- The exception handling for "tá" prevents false positives
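As a rough illustration of the "verb form followed by a noun" match described in these points, the sketch below scans a list of tokens, each carrying its candidate POS tags. It models only the matching step: the rule's disambiguation action and the "tá" exception are not reproduced. The noun pattern `NC.*`, the function names, and the readings assigned to "torna" and "rosas" are assumptions made for the example, not values taken from the dictionary or the rule file.

```python
import re

# Verb alternation as quoted in the review; the noun pattern is an assumption.
VERB_PATTERN = re.compile(
    "VMSP3S0|VMIP3S0|VMM02S0|VMSP2S0|VMIP2S0|VMN02S0|VMSF2S0|VMIP1S0"
)
NOUN_PATTERN = re.compile("NC.*")


def has_reading(tags, pattern):
    """True if at least one candidate tag fully matches the pattern."""
    return any(pattern.fullmatch(tag) for tag in tags)


def find_verb_noun_pairs(tokens):
    """Return indices i where token i has a listed verb reading
    and token i + 1 has a noun reading.

    tokens is a list of (surface, [candidate tags]) pairs.
    """
    hits = []
    for i in range(len(tokens) - 1):
        _, first_tags = tokens[i]
        _, second_tags = tokens[i + 1]
        if has_reading(first_tags, VERB_PATTERN) and has_reading(second_tags, NOUN_PATTERN):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Hypothetical readings for the "torna rosas" example named in the review.
    sentence = [("torna", ["VMIP3S0", "VMM02S0"]), ("rosas", ["NCFP000"])]
    print(find_verb_noun_pairs(sentence))
```

Keeping all candidate tags per token mirrors the situation a disambiguation rule starts from: the match is on any reading, and only afterwards would a rule decide which readings to keep.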
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for cases where the new pattern would apply
# Expected: Find examples where VMSP3S0 verbs could be confused with nouns
# Search for potential verb-noun ambiguity cases
rg -U "VMSP3S0.*NC" ".*\.dict"
# Search for existing rule usage
rg -A 2 "VMSP3S0.*NC" "disambiguation.xml"
Length of output: 183
Script:
#!/bin/bash
# Let's try a more precise search in the language resources directory
# Search for the dictionary files and disambiguation rules
fd -e xml -e dict . languagetool-language-modules/pt/
# Then search for the specific pattern in the found files
rg -U "VMSP3S0" languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/
# Look for examples of verb-noun ambiguity rules
rg -A 5 "postag='VM.*NC" languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml
Length of output: 1937
Script:
#!/bin/bash
# Let's look at the specific rule context and similar patterns
rg -B 2 -A 10 "VMSP3S0\|VMIP3S0\|VMM02S0" languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml
# Also search for example sentences in comments near this pattern
rg -B 5 -A 15 "Examples:" --multiline languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml | rg -B 5 -A 15 "VMSP3S0"
Length of output: 1494
Many more fixes.