Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(load): issue load path of dataset and virtualized=true error on csv #1474

Merged
merged 2 commits into from
Dec 13, 2024

Conversation

ArslanSaleem
Copy link
Collaborator

@ArslanSaleem ArslanSaleem commented Dec 13, 2024

Important

Fix dataset path validation and unsupported virtualization error handling, with minor refactoring and type hint correction.

  • Behavior:
    • In pandasai/__init__.py, load() now raises ValueError if dataset_path is not in 'organization/dataset' format.
    • In loader.py, load() raises ValueError for unsupported virtualization of CSV and Parquet files.
  • Refactoring:
    • Introduced _get_abs_dataset_path() in loader.py to centralize dataset path construction.
    • Replaced direct path construction with _get_abs_dataset_path() in _load_schema(), _get_cache_file_path(), and _load_from_source().
    • Added _get_table_name() in query_builder.py for table name retrieval.
  • Type Hints:
    • Changed min_version type hint in import_dependency() in environment.py from str | None to Union[str, None].

This description was created by Ellipsis for a6192ee. It will automatically update as commits are pushed.

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Dec 13, 2024
Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to a6192ee in 2 minutes and 48 seconds

More details
  • Looked at 145 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 drafted comments based on config settings.
1. pandasai/chat/code_execution/environment.py:95
  • Draft comment:
    In Python 3.10+, str | None is valid syntax and more concise than Union[str, None]. Consider using it if compatibility with older versions is not a concern.
  • Reason this comment was not posted:
    Confidence changes required: 33%
    The change from str | None to Union[str, None] is unnecessary in Python 3.10+ where the former is valid syntax. However, it might be for compatibility with older versions.
2. pandasai/data_loader/query_builder.py:24
  • Draft comment:
    The _get_table_name method is defined twice, once here and once in loader.py. Consider refactoring to avoid redundancy.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The _get_table_name method is defined twice, once in query_builder.py and once in loader.py. This is redundant and can lead to maintenance issues.
3. pandasai/__init__.py:89
  • Draft comment:
    Consider stripping leading and trailing slashes from dataset_path to avoid path format issues.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The load function in __init__.py checks for a specific path format but does not handle cases where the path might have leading or trailing slashes, which could cause issues.
4. pandasai/data_loader/loader.py:54
  • Draft comment:
    The error message should be more specific and grammatically correct.
                    raise ValueError("Virtualization is not supported for CSV and Parquet file types.")
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
5. pandasai/data_loader/query_builder.py:28
  • Draft comment:
    The error message should be more specific and grammatically correct.
            raise ValueError("Table name is not found in the schema.")
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The original error message is already clear and grammatically correct. The suggested change is more formal but does not significantly improve clarity or correctness. The comment does not point out a clear issue that requires a change.
    I might be underestimating the importance of consistency in error message phrasing across the codebase, which could be a reason to adopt the suggested change.
    While consistency is important, the current message is not incorrect or unclear, and the suggestion does not provide a substantial improvement.
    The comment does not highlight a necessary change, as the original error message is already clear and correct. The comment should be deleted.

Workflow ID: wflow_QJtoXX8eQv9npoQM


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

pandasai/__init__.py Outdated Show resolved Hide resolved
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
@gventuri gventuri merged commit 24f4636 into release/v3 Dec 13, 2024
0 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:M This PR changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants