This repository has been archived by the owner on Apr 25, 2024. It is now read-only.

gnsync has problem with mutated vowels #100

Open
niklassemmler opened this issue Sep 11, 2018 · 0 comments
Labels
unicode_issue: issues related to unicode / utf-8 encoding

Comments


niklassemmler commented Sep 11, 2018

geeknote version: 2.0.15

Hi there,

I tried uploading a batch of documents containing umlauts (mutated vowels, German Umlaute), and these did not end up in Evernote.

The problem is that gnsync first encodes the file content as ASCII:

    @log
    def _get_file_content(self, path):
        """
        Get file content.
        """
        with codecs.open(path, 'r', encoding='utf-8') as f:
            content = f.read()

        # strip unprintable characters
        content = content.encode('ascii', errors='xmlcharrefreplace') # <--- HERE!
        content = Editor.textToENML(content=content, raise_ex=True, format=self.format)
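
To illustrate what the marked `encode` call does, here is a minimal Python 3 sketch (geeknote itself is Python 2 code, and the sample string is made up for this example). With `errors='xmlcharrefreplace'`, every non-ASCII character is replaced by a numeric XML character reference, so the umlauts no longer exist as characters in the resulting byte string:

```python
# Illustration only: the sample text is hypothetical, not taken from geeknote.
text = "Grüße aus Köln"

# Same call as in _get_file_content: non-ASCII characters become
# numeric character references, e.g. "ü" (U+00FC) becomes "&#252;".
encoded = text.encode("ascii", errors="xmlcharrefreplace")
print(encoded)  # b'Gr&#252;&#223;e aus K&#246;ln'

# Decoding the bytes afterwards does not restore the umlauts;
# the references stay in the text as literal "&#...;" sequences.
decoded = encoded.decode("utf-8")
print(decoded == text)  # False
```

Whether those references survive the later Markdown/ENML conversion depends on how the downstream code treats the `&` character, which this sketch does not cover.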

And then converts it back to Unicode:

    @staticmethod
    def textToENML(content, raise_ex=False, format='markdown', rawmd=False):
        """
        Transform formatted text to ENML
        """

        if not isinstance(content, str): # <--- does not allow unicode
            content = ""                        # <--- same
        try:
            content = unicode(content, "utf-8") # <--- breaks mutated vowels
            # add 2 space before new line in paragraph for creating br tags
            content = re.sub(r'([^\r\n])([\r\n])([^\r\n])', r'\1  \n\3', content)
            # content = re.sub(r'\r\n', '\n', content)

Commenting these lines out solved the issue for me, but I did not dig deeper, so there may be other problems now.
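
A possible alternative to commenting the lines out, sketched in Python 3 under stated assumptions: accept both byte strings and text, decode only when needed, and skip the ASCII step entirely. The helper names `to_text` and `add_markdown_breaks` are hypothetical and not part of geeknote; only the regular expression is copied from `textToENML` above.

```python
import re


def to_text(content, encoding="utf-8"):
    """Hypothetical helper: return text, decoding bytes only when needed."""
    if isinstance(content, bytes):
        return content.decode(encoding)
    if isinstance(content, str):
        return content
    # Mirrors the original fallback for unexpected input types.
    return ""


def add_markdown_breaks(content):
    """Add two spaces before a single newline, using the regex from textToENML."""
    return re.sub(r'([^\r\n])([\r\n])([^\r\n])', r'\1  \n\3', content)


# Hypothetical sample input containing umlauts.
sample = "Käse\nBrötchen"
result = add_markdown_breaks(to_text(sample.encode("utf-8")))
print(result == "Käse  \nBrötchen")  # True
```

This only shows that the umlauts pass through the decoding and line-break steps unchanged; it has not been tested against the rest of the geeknote pipeline or the Evernote API.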

Cheers,
Niklas

jeffkowalski added the unicode_issue label (issues related to unicode / utf-8 encoding) on Aug 9, 2019