Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import galician translations #19

Closed
wants to merge 3 commits into from
Closed

import galician translations #19

wants to merge 3 commits into from

Conversation

JarbasAl
Copy link
Member

@JarbasAl JarbasAl commented Dec 2, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new script for syncing translations in pull requests.
    • Added multiple dialogue options and intents in Galician for enhanced user interaction, including:
      • Coin flipping, fortune-telling, and dice rolling functionalities.
      • New prompts for user choices and responses to number range inputs.
  • Bug Fixes

    • Updated URLs and skill names for consistency across the project.
  • Documentation

    • Added new JSON files for intents and dialogs in Galician, improving localization support.

@JarbasAl JarbasAl requested a review from mikejgray December 2, 2024 14:53
Copy link

coderabbitai bot commented Dec 2, 2024

Walkthrough

The changes in this pull request introduce a new script sync_translations.py to manage translation files for multiple languages, ensuring proper directory structure and file handling. Additionally, several updates were made to the skill.json file to standardize naming conventions. New dialogue entries and intent definitions were added for the Galician language, enhancing user interaction and localization across various scenarios in the application.

Changes

File Path Change Summary
scripts/sync_translations.py New script added to process translation files for multiple languages, handling directory structure and file content.
skill.json Updated URLs and skill names to standardize naming conventions.
skill_randomness/locale/gl-es/dialog/*.dialog New dialogue lines added for various scenarios in Galician, enhancing user interaction.
skill_randomness/locale/gl-es/intents/*.intent New intent patterns added for various commands in Galician, improving natural language understanding.
translations/gl-es/dialogs.json New JSON file created for dialog strings in Galician, facilitating localization.
translations/gl-es/intents.json New JSON file created for intent definitions in Galician, allowing flexible user input recognition.

Poem

In the fields where translations play,
A script was born to light the way.
With choices made and dice to roll,
Galician voices now fill the whole.
From greetings bright to fortunes told,
New paths of dialogue, brave and bold! 🐇✨

Warning

Rate limit exceeded

@JarbasAl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 10 minutes and 58 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between f58368b and 01ceafe.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Outside diff range and nitpick comments (3)
scripts/sync_translations.py (2)

1-3: TODO comment needs clarification

The TODO comment is vague and doesn't specify what actions need to be taken before PR merge.

Consider adding more detailed requirements or removing the TODO if it's no longer needed.


9-10: Consider making paths configurable

The hardcoded paths might cause issues if the directory structure changes.

Consider making these paths configurable through environment variables or command-line arguments:

-locale = f"{dirname(dirname(__file__))}/skill_randomness/locale"
-tx = f"{dirname(dirname(__file__))}/translations"
+SKILL_ROOT = os.getenv('SKILL_ROOT', dirname(dirname(__file__)))
+locale = os.path.join(SKILL_ROOT, "skill_randomness/locale")
+tx = os.path.join(SKILL_ROOT, "translations")
translations/gl-es/intents.json (1)

6-10: Consider adding more variations for dice rolling

The dice rolling intent could benefit from more natural language variations.

Consider adding common phrases like:

    "/intents/roll-single-die.intent": [
        "lanza (un| ) dado",
        "lanza un dado",
        "lanza dados de {faces} caras",
        "lanza un dado de {faces} caras"
+       "tira un dado",
+       "tira un dado de {faces} caras"
    ],
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f58368b and 01ceafe.

📒 Files selected for processing (21)
  • scripts/sync_translations.py (1 hunks)
  • skill.json (2 hunks)
  • skill_randomness/locale/gl-es/dialog/choice-result.dialog (1 hunks)
  • skill_randomness/locale/gl-es/dialog/coin-result.dialog (1 hunks)
  • skill_randomness/locale/gl-es/dialog/die-result.dialog (1 hunks)
  • skill_randomness/locale/gl-es/dialog/first-choice.dialog (1 hunks)
  • skill_randomness/locale/gl-es/dialog/fortune-query.dialog (1 hunks)
  • skill_randomness/locale/gl-es/dialog/fortune-result.dialog (1 hunks)
  • skill_randomness/locale/gl-es/dialog/multiple-die-result.dialog (1 hunks)
  • skill_randomness/locale/gl-es/dialog/number-range-not-specified.dialog (1 hunks)
  • skill_randomness/locale/gl-es/dialog/number-result.dialog (1 hunks)
  • skill_randomness/locale/gl-es/dialog/over-dice-limit.dialog (1 hunks)
  • skill_randomness/locale/gl-es/dialog/second-choice.dialog (1 hunks)
  • skill_randomness/locale/gl-es/intents/flip-a-coin.intent (1 hunks)
  • skill_randomness/locale/gl-es/intents/fortune-teller.intent (1 hunks)
  • skill_randomness/locale/gl-es/intents/make-a-choice.intent (1 hunks)
  • skill_randomness/locale/gl-es/intents/pick-a-number.intent (1 hunks)
  • skill_randomness/locale/gl-es/intents/roll-multiple-dice.intent (1 hunks)
  • skill_randomness/locale/gl-es/intents/roll-single-die.intent (1 hunks)
  • translations/gl-es/dialogs.json (1 hunks)
  • translations/gl-es/intents.json (1 hunks)
✅ Files skipped from review due to trivial changes (8)
  • skill_randomness/locale/gl-es/dialog/over-dice-limit.dialog
  • skill_randomness/locale/gl-es/intents/flip-a-coin.intent
  • skill_randomness/locale/gl-es/dialog/multiple-die-result.dialog
  • skill_randomness/locale/gl-es/dialog/fortune-result.dialog
  • skill_randomness/locale/gl-es/intents/make-a-choice.intent
  • skill_randomness/locale/gl-es/dialog/number-result.dialog
  • translations/gl-es/dialogs.json
  • skill_randomness/locale/gl-es/dialog/number-range-not-specified.dialog
🔇 Additional comments (11)
skill.json (3)

3-3: Verify the updated repository URL is accessible

The URL has been standardized to follow the new naming convention.

✅ Verification successful

URL is accessible and valid

The repository URL https://github.com/OpenVoiceOS/ovos-skill-randomness returns a 200 OK status code, confirming it's a valid and accessible GitHub repository.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the repository URL is accessible and returns 200 OK
curl -I https://github.com/OpenVoiceOS/ovos-skill-randomness

Length of output: 4917


56-56: Verify consistent usage of the new skill name

The skill name has been updated to follow the standardized format.

✅ Verification successful

Let me gather more information to verify the skill name consistency across all relevant files.


Based on the verification results, I can now provide the final response:

Skill name is consistently used across the codebase

The skill name "ovos-skill-randomness" is correctly used in:

  • setup.py (PYPI_NAME)
  • skill.json
  • The implementation class RandomnessSkill in init.py
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for any remaining occurrences of the old skill name
# that might need updating

echo "Searching for old skill name references..."
rg -i "skill-ovos-randomness" --type yaml --type json

echo "Checking setup.py for package name..."
fd -e py "setup.py" --exec cat {}

Length of output: 3437


Script:

#!/bin/bash
# Check for any references to skill name in various forms
echo "Checking for skill name references in all files..."
rg -i "randomness" --type json --type yaml --type python -l

echo "Checking manifest files..."
fd -e json "manifest.json" --exec cat {}

echo "Checking __init__.py for skill class name..."
fd -e py "__init__.py" --exec cat {}

Length of output: 5237


Line range hint 1-63: Consider adding language-specific fields for Galician support

Given that this PR aims to import Galician translations, consider adding language-specific fields (e.g., title, summary, description, examples) for the Galician locale (gl-es) in this configuration file.

skill_randomness/locale/gl-es/dialog/coin-result.dialog (1)

1-2: LGTM! Verify consistency with other locales.

The dialog entries follow the correct pattern with proper template variable usage. The Galician translations are appropriate for coin flip results.

✅ Verification successful

Let me gather more information to verify the locale structure and ensure we have a complete picture.


Dialog entries are consistent across locales

The verification confirms that both English (en-us) and Galician (gl-es) locales follow the same pattern:

  • Both have two lines for coin flip results
  • Both use the {result} template variable consistently
  • Both show a contextual line followed by the raw result
  • Translations are appropriate for each language ("It landed on" for English, "Foi" for Galician)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if other locales follow same pattern
# Expect similar structure across locales

# Find and show content of all coin-result.dialog files
fd "coin-result.dialog" | xargs cat

Length of output: 88


Script:

#!/bin/bash
# Get the directory structure of locale files to understand which locales these belong to
fd "coin-result.dialog" -x dirname {}

Length of output: 117

skill_randomness/locale/gl-es/dialog/die-result.dialog (1)

1-2: LGTM! Verify consistency with other locales.

The dialog entries follow the correct pattern with proper template variable usage. The Galician translation "Sacaches un" is appropriate for dice roll results.

✅ Verification successful

Let me verify the file locations to ensure we're correctly interpreting which lines belong to which locale.


Structure and template variables are consistent across locales

The Galician (gl-es) translation in die-result.dialog perfectly matches the structure of its English (en-us) counterpart:

  • Both have exactly 2 lines
  • Both use the same template variable {result}
  • Both follow the pattern of a sentence followed by raw result
  • The translation is accurate and appropriate
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if other locales follow same pattern
# Expect similar structure across locales

# Find and show content of all die-result.dialog files
fd "die-result.dialog" | xargs cat

Length of output: 217


Script:

#!/bin/bash
# Show files with their content separately
fd "die-result.dialog" | while read -r file; do
    echo "=== $file ==="
    cat "$file"
    echo -e "\n"
done

Length of output: 1185

skill_randomness/locale/gl-es/intents/pick-a-number.intent (1)

1-1: Verify intent pattern consistency across locales

The intent pattern looks well-formed with proper alternatives and placeholders. Let's verify it matches the pattern structure used in other languages.

✅ Verification successful

The previous output seems to have merged multiple files together. Let's verify the structure more carefully to ensure we see all locales properly separated.


Intent pattern is consistent across locales

The verification shows that the pattern structure is consistent between English (en-us) and Galician (gl-es) versions:

  • Both use three alternative verbs in parentheses
  • Both maintain the same placeholder structure with {lower} and {upper}
  • Both follow the same grammatical pattern: "(verbs) number between {lower} and {upper}"
  • The only differences are the expected translations of the words
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Compare intent patterns across different locales
# Expected: Similar pattern structure with translated words

# Find and display all pick-a-number.intent files
fd "pick-a-number.intent" "skill_randomness/locale" --exec cat {} \;

Length of output: 187


Script:

#!/bin/bash
# Let's verify the file structure more carefully to ensure we're seeing all locales
fd "pick-a-number.intent" "skill_randomness/locale" --exec echo "=== {} ===" \; --exec cat {} \;

# Also list all locale directories to ensure we're not missing any
ls -la skill_randomness/locale/

Length of output: 600

skill_randomness/locale/gl-es/dialog/second-choice.dialog (1)

1-2: LGTM! Translations follow proper structure and patterns

The Galician translations are well-structured and provide good natural language variations. The files follow the correct locale directory structure and maintain consistency in terminology and patterns.

skill_randomness/locale/gl-es/dialog/choice-result.dialog (1)

1-2: LGTM! Verify Galician grammar and punctuation.

The dialog patterns look well-structured with proper placeholder usage for {choice}. Both translations provide good variety in responses:

  1. "Sen dúbida {choice}." (Without doubt {choice})
  2. "Unha escolla difícil. Mais imos escoller {choice}!" (A difficult choice. But we will choose {choice}!)

Consider having a native Galician speaker verify the grammar and punctuation.

skill_randomness/locale/gl-es/intents/roll-multiple-dice.intent (1)

1-2: LGTM! Well-structured intent patterns.

The patterns properly handle:

  • Plural/singular forms with (dados|dado)
  • Number of dice with {number}
  • Number of faces with {faces}
translations/gl-es/intents.json (1)

14-16: Verify variable naming consistency

The intent uses variables {lower} and {upper}. Ensure these match the variable names expected by the skill's code.

✅ Verification successful

Variable names in the intent match the implementation

The variables {lower} and {upper} in the intent file correctly match the implementation. The skill's code in skill_randomness/__init__.py extracts these values as lower_bound = message.data.get("lower", "") and upper_bound = message.data.get("upper", ""), where "lower" and "upper" are the variable names without the curly braces, which is the correct format for intent variables.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check variable names in the skill code
rg -g '!translations/*' -g '!locale/*' '{lower}|{upper}' .

Length of output: 151


Script:

#!/bin/bash
# Search for the skill implementation to check variable names
rg -g '!translations/*' -g '!locale/*' "pick-a-number" .

Length of output: 234


Script:

#!/bin/bash
# Check the implementation of the intent handler in the Python file
cat ./skill_randomness/__init__.py

Length of output: 4828

skill_randomness/locale/gl-es/dialog/fortune-query.dialog (1)

1-2: Dialog translations look good!

The Galician translations maintain the appropriate tone and meaning while being grammatically correct.

@@ -0,0 +1 @@
Dime (o |un |o meu | a miña) (futuro|fortuna)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Fix spacing inconsistencies in both locale intent patterns

Both locale files have spacing inconsistencies in their patterns that should be standardized:

  • en-us: (the |a |my |) has trailing spaces in alternatives
  • gl-es: (o |un |o meu | a miña) has trailing spaces in alternatives
# en-us/intents/fortune-teller.intent
-Tell me (the |a |my |)(future|fortune)
+Tell me (the|a|my|)(future|fortune)

# gl-es/intents/fortune-teller.intent
-Dime (o |un |o meu | a miña) (futuro|fortuna)
+Dime (o|un|o meu|a miña) (futuro|fortuna)
🔗 Analysis chain

Fix spacing in pattern alternatives.

There's an extra space after "a miña" in the pattern alternatives which may cause inconsistent matching.

-Dime (o |un |o meu | a miña) (futuro|fortuna)
+Dime (o|un|o meu|a miña) (futuro|fortuna)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if other locales follow similar intent patterns
# Expect consistent pattern structure across locales

# Find and show content of all fortune-teller.intent files
fd "fortune-teller.intent" | xargs cat

Length of output: 122


Script:

#!/bin/bash
# Let's check the directory structure to understand locale organization better
fd "fortune-teller.intent" -l

Length of output: 247


Script:

#!/bin/bash
# Let's examine each file separately to properly see the content formatting
echo "=== English (en-us) version ==="
cat ./skill_randomness/locale/en-us/intents/fortune-teller.intent
echo -e "\n=== Galician (gl-es) version ==="
cat ./skill_randomness/locale/gl-es/intents/fortune-teller.intent

Length of output: 373

Comment on lines +1 to +2
Cal é a túa primeira opción?
Vale. Cal é a primeira escolla?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Fix formatting in first-choice.dialog files

The files appear to have formatting issues:

  • The gl-es and en-us variations are merged into single lines
  • Missing proper line breaks between dialog variations

Locations needing attention:

  • skill_randomness/locale/gl-es/dialog/first-choice.dialog: Should have two lines
  • skill_randomness/locale/en-us/dialog/first-choice.dialog: Should have two lines
🔗 Analysis chain

Verify dialog variations across locales

The dialog variations provide good natural language flexibility. Let's verify the number of variations matches other languages.


Let me check the actual content of these files to compare the dialog variations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Compare number of dialog variations across locales
# Expected: Similar number of variations per locale

# Find and count lines in all first-choice.dialog files
fd "first-choice.dialog" "skill_randomness/locale" --exec wc -l {} \;

Length of output: 188


Script:

#!/bin/bash
# Show content of first-choice.dialog files across locales
fd "first-choice.dialog" "skill_randomness/locale" --exec cat {} \;

Length of output: 183

Comment on lines +1 to +2
Cal é a túa segunda opción?
Vale. Agora cal é a segunda opción?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Terminology inconsistency found in dialog files

The verification reveals inconsistent terminology usage across the dialog files:

  • first-choice.dialog mixes "opción" and "escolla"
  • second-choice.dialog uses only "opción"
  • choice-result.dialog uses "escolla"

This inconsistency in terminology between "opción" and "escolla" should be unified for better user experience and localization quality.

🔗 Analysis chain

Verify dialog flow and terminology consistency

The dialog variations maintain consistency with first-choice.dialog and use consistent terminology ("opción"). Let's verify the complete dialog flow across files.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check terminology consistency across dialog files
# Expected: Consistent use of terms like "opción" and "escolla"

# Search for terminology usage across dialog files
echo "Checking terminology consistency:"
rg -i "opción|escolla" "skill_randomness/locale/gl-es/dialog/"

Length of output: 603

@@ -0,0 +1,4 @@
lanza (un| ) dado
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Remove duplicate intent pattern.

The patterns "lanza (un| ) dado" and "lanza un dado" are redundant since the first pattern already covers both cases with its optional "un" specified by (un| ).

 lanza (un| ) dado
 lanza dados de {faces} caras
-lanza un dado
 lanza un dado de {faces} caras

Also applies to: 3-3

Comment on lines +1 to +4
lanza (un| ) dado
lanza dados de {faces} caras
lanza un dado
lanza un dado de {faces} caras
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Intent patterns for Galician dice rolling could be enhanced

Looking at the patterns across the locale files, I can confirm that the Galician (gl-es) dice rolling intents could benefit from additional patterns to match the natural language variations:

  • Add "tira" variations as it's a common verb for throwing/tossing in Galician:
    tira (un| ) dado
    tira un dado de {faces} caras
    
  • Add patterns with inverted word order which is natural in Galician:
    un dado (lanza|tira)
    un dado de {faces} caras (lanza|tira)
    

The current patterns are functional but could be more comprehensive to better match natural Galician language usage patterns.

🔗 Analysis chain

Verify intent pattern coverage.

The patterns cover:

  • Rolling a single die without faces specified
  • Rolling dice with specific faces
  • Rolling a single die with specific faces

However, consider if you need patterns for:

  • Throwing/tossing variations (e.g., "tira un dado")
  • Different word orders natural in Galician
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for similar patterns in other locales for consistency
rg -t txt "dado" skill_randomness/locale/

Length of output: 41


Script:

#!/bin/bash
# Let's check all files in the locale directory to understand the pattern coverage across languages
fd . skill_randomness/locale/ --type f -x cat {}

Length of output: 2140

@@ -0,0 +1,3 @@
lanza {number} (dados|dado)
lanza {number} (dados|dado) de {faces} caras
lanza {number} de {faces}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Clarify ambiguous intent pattern.

The pattern "lanza {number} de {faces}" is ambiguous as it omits the word "dado/dados". This could lead to confusion in natural language processing.

-lanza {number} de {faces}
+lanza {number} dados de {faces} caras
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
lanza {number} de {faces}
lanza {number} dados de {faces} caras

Comment on lines +19 to +32
if os.path.isfile(intents):
with open(intents) as f:
data = json.load(f)
for fid, samples in data.items():

if samples:
os.makedirs(f"{locale}/{lang.lower()}", exist_ok=True)
samples = [s.strip() for s in samples
if s and s.strip() != "[UNUSED]"] # s may be None
p = f"{locale}/{lang.lower()}/{fid.lstrip('/')}"
os.makedirs(dirname(p), exist_ok=True)
with open(p, "w") as f:
f.write("\n".join(sorted(samples)))

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Refactor duplicate code blocks

There's significant code duplication in the file processing blocks. This makes maintenance difficult and increases the chance of bugs.

Consider refactoring into a common function:

def process_translation_file(file_path: str, locale_dir: str, lang: str) -> None:
    """Process a translation file and write samples to corresponding locale files.
    
    Args:
        file_path: Path to the translation JSON file
        locale_dir: Base locale directory
        lang: Language code
    """
    if not os.path.isfile(file_path):
        return
        
    try:
        with open(file_path) as f:
            data = json.load(f)
    except json.JSONDecodeError as e:
        print(f"Error parsing {file_path}: {e}")
        return
        
    for fid, samples in data.items():
        if not samples:
            continue
            
        os.makedirs(f"{locale_dir}/{lang.lower()}", exist_ok=True)
        samples = [s.strip() for s in samples if s and s.strip() != "[UNUSED]"]
        
        p = f"{locale_dir}/{lang.lower()}/{fid.lstrip('/')}"
        os.makedirs(dirname(p), exist_ok=True)
        
        with open(p, "w") as f:
            f.write("\n".join(sorted(samples)))

# Usage
for lang in os.listdir(tx):
    for file_type in ['intents', 'dialogs', 'vocabs', 'regexes']:
        file_path = f"{tx}/{lang}/{file_type}.json"
        process_translation_file(file_path, locale, lang)

Also applies to: 33-45, 47-59, 61-73

Comment on lines +5 to +8
import json
from os.path import dirname
import os

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add type hints and error handling

The script lacks type hints and proper error handling for file operations and JSON parsing.

Add type hints and try-except blocks:

from typing import Dict, List
import json
from json.decoder import JSONDecodeError
from os.path import dirname
import os
import sys
from pathlib import Path

@JarbasAl
Copy link
Member Author

JarbasAl commented Dec 2, 2024

included in #20

@JarbasAl JarbasAl closed this Dec 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant