Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for unicode utf-8 charmaps #97

Closed
thecodelearner opened this issue Jul 13, 2020 · 16 comments
Closed

Support for unicode utf-8 charmaps #97

thecodelearner opened this issue Jul 13, 2020 · 16 comments

Comments

@thecodelearner
Copy link
Contributor

thecodelearner commented Jul 13, 2020

Using mailmerge to send mass emails is better than using any gmail extensions, since i have the ability to use advanced markdown and html
however while working with a large (5k+) csv file of job applicants, i came across an issue that people had entered their names in a localized language (eg. mandarin, etc) that sometimes didn't fit the ascii charmaps.

It would be useful to open the csv file as encoding=utf8 with errors=ignore to avoid errors like this:
'charmap' codec can't decode byte 0x9d in position 1710: character maps to <undefined>
instead of just returning true if strings are ascii.

@thecodelearner
Copy link
Contributor Author

also I would like to contribute towards this but i am confused since i have not worked on any open source stuff on github, especially in python 😅

@awdeorio
Copy link
Owner

It sounds to me like perhaps there is a problem with the CSV file. I'm hesitant to ignore errors because some users might genuinely have a problem with their CSV file. Perhaps another solution is produce a better error message.

Could you post the smallest CSV file that creates this error? Maybe just one line?

@thecodelearner
Copy link
Contributor Author

thecodelearner commented Jul 13, 2020

Sure, here's a raw CSV which does throw an error

name,email
"大松 李",[email protected]
"Juán",[email protected]
"Jöse Felix",[email protected]
"Paveł",[email protected]

So anything like these might throw up the UnicodeDecodeError 😅

Yes a better error message would be ok and maybe just merge this with issue #96 to practically bypass and log errors for bad emails and/or parse errors

Or can't we just use utf8 encodings to enable support for multi languages?

@awdeorio
Copy link
Owner

Hmm, I'm having trouble replicating the problem using your database. Perhaps you could try attaching the csv file to this issue instead of copy-paste?

$ mailmerge --version
mailmerge, version 2.1.0
$ cat mailmerge_database.csv 
name,email
"大松 李",[email protected]
"Juán",[email protected]
"Jöse Felix",[email protected]
"Paveł",[email protected]
$ file mailmerge_database.csv 
mailmerge_database.csv: UTF-8 Unicode text
$ mailmerge --no-limit
>>> message 1
TO: [email protected]
SUBJECT: Testing mailmerge
FROM: My Self <[email protected]>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Date: Tue, 14 Jul 2020 11:31:47 -0000

Hi 大松 李

>>> message 1 sent
>>> message 2
TO: [email protected]
SUBJECT: Testing mailmerge
FROM: My Self <[email protected]>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Date: Tue, 14 Jul 2020 11:31:47 -0000

Hi Juán

>>> message 2 sent
>>> message 3
TO: [email protected]
SUBJECT: Testing mailmerge
FROM: My Self <[email protected]>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Date: Tue, 14 Jul 2020 11:31:47 -0000

Hi Jöse Felix

>>> message 3 sent
>>> message 4
TO: [email protected]
SUBJECT: Testing mailmerge
FROM: My Self <[email protected]>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Date: Tue, 14 Jul 2020 11:31:47 -0000

Hi Paveł

>>> message 4 sent
>>> This was a dry run.  To send messages, use the --no-dry-run option.

@thecodelearner
Copy link
Contributor Author

Seems like it must be an issue with my file.
Thanks for your help sire.
I'll test once again and if there's no issues, it would be better to close this issue

@thecodelearner
Copy link
Contributor Author

thecodelearner commented Jul 14, 2020

GitHub doesn't support csv uploads so here's a txt file of the csv:
mailmerge_database_txt.txt

edit: uploaded the mailmerge_database.csv here, just in case we have any issues with csv as a txt file
https://github.com/thecodelearner/tmp-files/blob/master/mailmerge_database.csv

also, I think that my python installation on windows is broken since the same file runs perfectly on linux, but gives the UnicodeDecodeError on Windows

---> Running mailmerge on Windows:

PS7 mailmerge-client> mailmerge
Traceback (most recent call last):
  File "c:\users\pc\appdata\local\programs\python\python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\pc\appdata\local\programs\python\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\PC\AppData\Local\Programs\Python\Python38\Scripts\mailmerge.exe\__main__.py", line 7, in <module>
  File "c:\users\pc\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\pc\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "c:\users\pc\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\pc\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "c:\users\pc\appdata\local\programs\python\python38\lib\site-packages\mailmerge\__main__.py", line 114, in main
    for _, row in enumerate_range(csv_database, start, stop):
  File "c:\users\pc\appdata\local\programs\python\python38\lib\site-packages\mailmerge\__main__.py", line 288, in enumerate_range
    for i, value in enumerate(iterable):
  File "c:\users\pc\appdata\local\programs\python\python38\lib\site-packages\mailmerge\__main__.py", line 273, in read_csv_database
    for row in reader:
  File "c:\users\pc\appdata\local\programs\python\python38\lib\csv.py", line 110, in __next__
    self.fieldnames
  File "c:\users\pc\appdata\local\programs\python\python38\lib\csv.py", line 97, in fieldnames
    self._fieldnames = next(self.reader)
  File "c:\users\pc\appdata\local\programs\python\python38\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 17: character maps to <undefined>

---> Running mailmerge on Linux with exact same database.csv:

root@Proton:/mnt/c/Users/PC/merge-test# cat mailmerge_database.csv
name,email
"大松 李",[email protected]
"Juán",[email protected]
"Jöse Felix",[email protected]
"Paveł",[email protected]
root@Proton:/mnt/c/Users/PC/merge-test# file mailmerge_database.csv
mailmerge_database.csv: CSV text
root@Proton:/mnt/c/Users/PC/merge-test# mailmerge --version
mailmerge, version 2.1.0
root@Proton:/mnt/c/Users/PC/merge-test# mailmerge
>>> message 1
TO: [email protected]
SUBJECT: Testing mailmerge
FROM: My Self <[email protected]>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Date: Tue, 14 Jul 2020 14:13:53 -0000

Hi, 大松 李

>>> message 1 sent
>>> Limit was 1 message.  To remove the limit, use the --no-limit option.
>>> This was a dry run.  To send messages, use the --no-dry-run option.

@seshrs
Copy link
Contributor

seshrs commented Jul 14, 2020

@awdeorio I wonder if this is related to the behavior of the open() method in Python. According to the docs, if no encoding is specified:

The default encoding is platform dependent (whatever locale.getpreferredencoding() returns)

On my Mac, this returns UTF-8 but I think the Windows default is something else. In fact, it looks like Python is planning to output warnings if no encoding is specified to open() (see PEP 597).


@thecodelearner do you want to try to verify if this fixes the issue for you? (It could make a great first-contribution! 😃 )

You'll need to "fork" the repository, clone your fork locally and setup your development environment as described in CONTRIBUTING.md. Then, find the spot where mailmerge opens the CSV file and change the encoding to UTF-8 as you had suggested.

Let us know if you have questions about any of this.

@thecodelearner
Copy link
Contributor Author

@seshrs awesome! I will try this.

@thecodelearner
Copy link
Contributor Author

thecodelearner commented Jul 14, 2020

Seems like some dependent modules are unix only, pytest fails at those.
As of opening the csv as UTF-8, I think it should work
As a safe measure here's the pytest log (running on windows, python 3.8 amd64)

(venv) PS7 mailmerge> pytest
=========================================================================== test session starts ===========================================================================
platform win32 -- Python 3.8.3, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: E:\CodeH\Open-Source\mailmerge
plugins: cov-2.10.0
collected 61 items / 2 errors / 59 selected

(venv) PS7 mailmerge> python3
(venv) PS7 mailmerge> pytest --cov ./mailmerge
=========================================================================== test session starts ===========================================================================
platform win32 -- Python 3.8.3, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: E:\CodeH\Open-Source\mailmerge
plugins: cov-2.10.0
collected 61 items / 2 errors / 59 selected

================================================================================= ERRORS ================================================================================== 
___________________________________________________________________ ERROR collecting tests/test_main.py ___________________________________________________________________ 
(venv) PS7 mailmerge> ls

    Directory: E:\CodeH\Open-Source\mailmerge

Mode                 LastWriteTime         Length Name
(venv) PS7 mailmerge> pytest --cov ./mailmerge
=========================================================================== test session starts ===========================================================================
platform win32 -- Python 3.8.3, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: E:\CodeH\Open-Source\mailmerge
plugins: cov-2.10.0
collected 61 items / 2 errors / 59 selected

================================================================================= ERRORS ==================================================================================
___________________________________________________________________ ERROR collecting tests/test_main.py ___________________________________________________________________
ImportError while importing test module 'E:\CodeH\Open-Source\mailmerge\tests\test_main.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests\test_main.py:13: in <module>
    import sh
c:\users\pc\appdata\local\programs\python\python38\lib\site-packages\sh.py:34: in <module>
    raise ImportError("sh %s is currently only supported on linux and osx. \
E   ImportError: sh 1.13.1 is currently only supported on linux and osx. please install pbs 0.110 (http://pypi.python.org/pypi/pbs) for windows support.
_______________________________________________________________ ERROR collecting tests/test_main_output.py ________________________________________________________________
ImportError while importing test module 'E:\CodeH\Open-Source\mailmerge\tests\test_main_output.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests\test_main_output.py:14: in <module>
    import sh
c:\users\pc\appdata\local\programs\python\python38\lib\site-packages\sh.py:34: in <module>
    raise ImportError("sh %s is currently only supported on linux and osx. \
E   ImportError: sh 1.13.1 is currently only supported on linux and osx. please install pbs 0.110 (http://pypi.python.org/pypi/pbs) for windows support.
============================================================================ warnings summary ============================================================================= 
c:\users\pc\appdata\local\programs\python\python38\lib\site-packages\future\standard_library\__init__.py:65
  c:\users\pc\appdata\local\programs\python\python38\lib\site-packages\future\standard_library\__init__.py:65: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

-- Docs: https://docs.pytest.org/en/latest/warnings.html

----------- coverage: platform win32, python 3.8.3-final-0 -----------
Name                            Stmts   Miss  Cover
---------------------------------------------------
mailmerge\__init__.py               3      0   100%
mailmerge\__main__.py             128     89    30%
mailmerge\exceptions.py             1      0   100%
mailmerge\sendmail_client.py       54     45    17%
mailmerge\template_message.py     105     81    23%
mailmerge\utils.py                 20     11    45%
---------------------------------------------------
TOTAL                             311    226    27%

========================================================================= short test summary info ========================================================================= 
ERROR tests/test_main.py
ERROR tests/test_main_output.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
====================================================================== 1 warning, 2 errors in 0.78s ======================================================================= 
(venv) PS7 mailmerge>

I'll switch to Linux and try running pytest again, with the updated encoding ofcourse.

@seshrs
Copy link
Contributor

seshrs commented Jul 14, 2020

@thecodelearner Were you able to dry-run mailmerge on Windows with the updated code? (I believe it should still work even if the unit tests don't.)

(Could you also tell us what locale.getpreferedencoding() returns for you in Python on your Windows machine?)

@thecodelearner
Copy link
Contributor Author

ah okay I'll post what locale.getpreferedencoding() returns when i log back into windows
for now, in linux there's a catch:

(env) $pip install --editable .[dev]        
zsh: no matches found: .[dev]

@seshrs
Copy link
Contributor

seshrs commented Jul 14, 2020

Based on https://stackoverflow.com/a/30539963, I think you'll need to escape the square brackets on zsh or perhaps surround the argument in quotes.

@thecodelearner
Copy link
Contributor Author

thecodelearner commented Jul 14, 2020

rookie mistake 😅

pytest report:

(env) $ pytest
'============================================ test session starts ============================================
platform linux -- Python 3.8.3, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /home/sahil/code/Open-Source/mailmerge
plugins: cov-2.10.0
collected 96 items                                                                                          

tests/test_helpers.py ..............                                                                  [ 14%]
tests/test_main.py ...........................                                                        [ 42%]
tests/test_main_output.py ........                                                                    [ 51%]
tests/test_sendmail_client.py .............                                                           [ 64%]
tests/test_template_message.py ....................                                                   [ 85%]
tests/test_template_message_encodings.py ..............                                               [100%]

============================================= warnings summary ==============================================
env/lib/python3.8/site-packages/future/standard_library/__init__.py:65
  /home/sahil/code/Open-Source/mailmerge/env/lib/python3.8/site-packages/future/standard_library/__init__.py:65: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

-- Docs: https://docs.pytest.org/en/latest/warnings.html
======================================= 96 passed, 1 warning in 6.60s =======================================

test coverage reports:

env/lib/python3.8/site-packages/future/standard_library/__init__.py:65
  /home/sahil/code/Open-Source/mailmerge/env/lib/python3.8/site-packages/future/standard_library/__init__.py:65: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

-- Docs: https://docs.pytest.org/en/latest/warnings.html

----------- coverage: platform linux, python 3.8.3-final-0 -----------
Name                            Stmts   Miss  Cover
---------------------------------------------------
mailmerge/__init__.py               3      0   100%
mailmerge/__main__.py             128     30    77%
mailmerge/exceptions.py             1      0   100%
mailmerge/sendmail_client.py       54      0   100%
mailmerge/template_message.py     105      1    99%
mailmerge/utils.py                 20      2    90%
---------------------------------------------------
TOTAL                             311     33    89%

will get back to Windows and post what locale.getpreferedencoding() returns and try a dry run for mailmerge.

@thecodelearner
Copy link
Contributor Author

thecodelearner commented Jul 14, 2020

> locale.getpreferredencoding()
'cp1252'

https://en.wikipedia.org/wiki/Windows-1252

@thecodelearner
Copy link
Contributor Author

Yes, encoding="utf-8" solves the error on Windows:
here's a dry run

(env) PS7 mailmerge> mailmerge
>>> message 1
TO: [email protected]
SUBJECT: Testing mailmerge
FROM: My Self <[email protected]>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Date: Tue, 14 Jul 2020 15:48:03 -0000

Hi, 大松 李!

>>> message 1 sent
>>> Limit was 1 message.  To remove the limit, use the --no-limit option.
>>> This was a dry run.  To send messages, use the --no-dry-run option.

@awdeorio
Copy link
Owner

Closed by #98

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants