[FEATURE] issue-44: Add support for file_encoding in sources #46

Open
wants to merge 2 commits into
base: main
Conversation

ChillarAnand
Contributor

Description

Allow users to define custom encoding for files.

Changes Made

Add file_encoding support in sources.

Definition of Done

Before submitting this pull request, please ensure that the following criteria have been met:

  • All automated tests have passed successfully.
  • All manual tests have passed successfully.
  • Code has been reviewed by at least one other team member.
  • Code has been properly documented and commented as needed.
  • All new and existing code adheres to our project's coding standards.
  • All dependencies have been added or removed from the project's README or other documentation as needed.
  • Any relevant documentation or help files have been updated to reflect the changes made in this pull request.
  • Any necessary database migrations have been run.
  • Any relevant UI changes have been reviewed and approved by the UI/UX team.

Additional Notes

closes #44

Contributor

@shpiyu shpiyu left a comment

Thanks for your contribution, Anand. I tried running the code locally but ran into an error. It'd be great if you could fix it.

@ChillarAnand
Contributor Author

Thanks for your contribution, Anand. I tried running the code locally but ran into an error. It'd be great if you could fix it.

Thanks, @shpiyu

I have updated the code to resolve the issues.

Contributor

@swarnadhakad swarnadhakad left a comment

LGTM. @shpiyu, please review and test.

@@ -31,6 +31,8 @@ def __init__(self, source, params_map):
            self._src = source
        else:
            self._src = self.format_file_path(source, params_map)
        if not source.get('file_encoding'):
Contributor

@ChillarAnand I was thinking of removing this default here and instead adding the default in the pd.read_csv calls in file_reader and xml_reader, e.g. encoding=src.get('file_encoding', 'utf-8'). The benefit is that setting the default closer to where it is used makes it easier for the reader to see that utf-8 is the default encoding. What do you think?
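
To make the suggestion concrete, here is a minimal sketch of resolving the default encoding at the call site. It is not the project's code: the `read_rows` and `write_sample` helpers and the `path` key are hypothetical, and the standard library's `csv` module stands in for `pd.read_csv` so the snippet has no dependencies. The same `encoding=` keyword would be passed to `pd.read_csv`.

```python
import csv
import os
import tempfile

DEFAULT_ENCODING = "utf-8"


def read_rows(src):
    """Read a delimited file described by a source dict (hypothetical shape)."""
    # The default is resolved right where the file is opened, mirroring
    # encoding=src.get('file_encoding', 'utf-8') from the comment above.
    encoding = src.get("file_encoding", DEFAULT_ENCODING)
    with open(src["path"], newline="", encoding=encoding) as handle:
        return list(csv.reader(handle))


def write_sample(text, encoding):
    """Write text to a temporary file in the given encoding; return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="", encoding=encoding) as handle:
        handle.write(text)
    return path


utf8_path = write_sample("id,name\n1,café\n", "utf-8")
utf16_path = write_sample("id,name\n1,café\n", "utf-16")

# No file_encoding key: falls back to utf-8.
rows_default = read_rows({"path": utf8_path})
# Explicit file_encoding: the UTF-16 file decodes to the same rows.
rows_utf16 = read_rows({"path": utf16_path, "file_encoding": "utf-16"})

os.remove(utf8_path)
os.remove(utf16_path)
```

One difference worth keeping in mind: `src.get('file_encoding', 'utf-8')` only falls back when the key is absent, whereas the `if not source.get('file_encoding')` check in the diff also covers a key that is present but empty or None.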

Development

Successfully merging this pull request may close these issues.

Can't read UTF-16 XML files
3 participants