MAX_SIZE minor bug fix #99

parasteh · 2018-12-05T21:44:42Z

Hi,
Thanks for your work. I just used it and noticed a minor problem with max_size: it did not work, and files larger than the size I defined were still downloaded. After a while, I figured out that there should be a comparison between media.document.size and max_size. I added it and hope it works.

Again, thank you. It saved me a lot of hours.
BR

…omparison between the media size and max_size ! I just did a minor change and hope it helps.

Lonami

Undo the .gitignore change and confirm my question about None. Besides that, looks good!

Lonami · 2018-12-05T22:48:00Z

.gitignore

@@ -85,6 +85,9 @@ target/
 # pyenv
 .python-version

+#pycharm
+.idea


This belongs in your global .gitignore, so please undo this change.

Lonami · 2018-12-05T22:49:05Z

telegram_export/downloader.py

@@ -80,7 +80,8 @@ def _check_media(self, media):
        """
        Checks whether the given MessageMedia should be downloaded or not.
        """
-        if not media or not self.max_size:
+        # It is needed to chek the size with the max_size defined in config file
+        if not media or media.document.size > self.max_size:


Did you check the Telethon documentation at least twice to make sure that document is not None and size is not None (or that they can't be None or missing altogether)? We don't want an error here.

I will fix it and let you know the result.

There are some points that need consideration. Media has multiple types, and it is not easy to get the size of each object: it depends on the type, and calling media.document.size will only work for the document type and no other. I think we need to use polymorphism for that. Each type has to implement an interface for general attributes like size, so that calling something like media.getsize() returns the size of the file regardless of its type.

each type has to implement an interface for general attributes like size

More like some utils method get_size(media), unless you're willing to do this change in Telethon which would be by far harder :)

Yes, I found an easier way to do that, but I am not sure yet. So far I have realized that only document types have a size attribute, so when export_utils.get_media_type(media) returns "document" we can refer to media.document.size and compare it to max_size.
What do you think?

Also, utils already has a method named get_file_location(media), so it has already been implemented. I will use it and commit the code after testing.

Just add another function, get_file_size, to utils.py that returns the file size for a media object, checking the media type with isinstance.
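The suggestion above could be sketched roughly as follows. This is only an illustration: the dataclasses are hypothetical stand-ins for Telethon's media types so the snippet runs on its own, and the assumption that a document exposes `size` while a photo exposes a list of sized thumbnails is a simplification that should be checked against the Telethon documentation. `get_file_size` is the name proposed in the review, not an existing function.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins for Telethon types, defined here only so the
# sketch is self-contained. The real classes live in Telethon.
@dataclass
class Document:
    size: Optional[int] = None


@dataclass
class MessageMediaDocument:
    document: Optional[Document] = None


@dataclass
class PhotoSize:
    size: int


@dataclass
class Photo:
    sizes: List[PhotoSize] = field(default_factory=list)


@dataclass
class MessageMediaPhoto:
    photo: Optional[Photo] = None


def get_file_size(media):
    """Return the media size in bytes, or None when it cannot be determined."""
    if isinstance(media, MessageMediaDocument):
        # getattr guards against document being None or lacking a size.
        return getattr(media.document, 'size', None)
    if isinstance(media, MessageMediaPhoto):
        # Assumption: the largest available photo size is the one downloaded.
        sizes = getattr(media.photo, 'sizes', None) or []
        return max((s.size for s in sizes), default=None)
    # Other media types (contacts, locations, ...) have no file size.
    return None
```

Returning None instead of raising keeps the caller free to decide what an unknown size means, which addresses the earlier concern about document or size being None.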

It's done! How can I pull the changes?

parasteh · 2018-12-05T23:31:34Z

Sorry, there is a problem with media types that are not documents. I made a mistake because I only needed it for document types. I will fix it and commit the changes.

…xsize which is defined by user, and in some cases there were a minor bug with filename, it threw error when file names consist os restricted characters like :, I fixed it

parasteh

I have reverted the .gitignore change and finished the code that takes the maximum file size into account. It is now working; I have tested it both for newly exported files and with the download-past-media option. There was also a minor filename issue with restricted characters like ':', which I solved by adding a piece of code from here: filename

Lonami

After a careful review, there are a lot of other things to consider, as well as irrelevant changes that do not belong here.

Lonami · 2018-12-07T09:24:21Z

telegram_export/utils.py

+
+def format_filename(s):
+    """Take a string and return a valid filename constructed from the string.
+Uses a whitelist approach: any characters not present in valid_chars are


This formatting is terrible and not consistent with the rest at all. A simple:

"""Removes invalid file characters from the input string"""

Would be enough.

Lonami · 2018-12-07T09:24:56Z

telegram_export/utils.py

+an invalid filename.
+
+"""
+    valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)


What about people using Chinese, Japanese, Persian or any other language that doesn't use ASCII characters in the file name?

You are right, I should have considered that. Thank you for pointing it out, I will fix it.

Lonami · 2018-12-07T09:25:09Z

telegram_export/utils.py

+"""
+    valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
+    filename = ''.join(c for c in s if c in valid_chars)
+    filename = filename.replace(' ', '_')  # I don't like spaces in filenames.


We don't care about personal opinions in the code.

Yeah, I just used the code I referred to. I will write my own code to fix it. Thank you for your comments :)

Lonami · 2018-12-07T09:25:24Z

telegram_export/utils.py

+
+"""
+    valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
+    filename = ''.join(c for c in s if c in valid_chars)


A far better option is to use a regex and replace all "invalid" characters with _…

    def format_filename(s):
        """Removes invalid file characters from the input string"""
        return re.sub(r'[\\/:"*?<>|]', '_', s)

What about this code? This restriction only applies to the Windows file system. Is it necessary to make it conditional on the OS type, i.e. only do it on Windows?

That code is much better, I would not normally use those characters in any filename either way. But, I still think this should be a separate pull request.

Me neither. I think that for some reason it uses the time in the file name, so the ':' in the time format caused the exception for me. The code I submitted works; using it, I downloaded around 12000 docs from two separate groups.
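As a quick illustration of the regex approach discussed in this thread, here is a self-contained version of the proposed function applied to two sample names. The sample file names are made up for the example. Unlike the whitelist version, it leaves non-ASCII letters untouched and only replaces the characters that Windows rejects.

```python
import re


def format_filename(s):
    """Removes invalid file characters from the input string"""
    # Replace the characters that Windows forbids in file names.
    return re.sub(r'[\\/:"*?<>|]', '_', s)


# A timestamp-based name like the one that triggered the original error
# (hypothetical sample value).
timestamp_name = format_filename('2018-12-05 21:44:42.pdf')

# A non-ASCII name keeps its letters; only ':' and '?' are replaced.
unicode_name = format_filename('文件:报告?.txt')

print(timestamp_name)
print(unicode_name)
```

Applying the replacement on every OS, rather than only on Windows, keeps exported names identical across platforms, at the cost of altering a few names that would have been valid elsewhere.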

Lonami · 2018-12-07T09:25:32Z

telegram_export/utils.py

+    valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
+    filename = ''.join(c for c in s if c in valid_chars)
+    filename = filename.replace(' ', '_')  # I don't like spaces in filenames.
+    return filename


No newline at the end of the file.

Lonami · 2018-12-07T09:25:58Z

telegram_export/utils.py

@@ -259,3 +260,20 @@ def parse_proxy_str(proxy_str):
    else:
        proxy = (proxy_type, host, port)
    return proxy
+
+
+def format_filename(s):


This change is irrelevant to the pull request and should not belong here. If you want to make this change use a new pull request.

Lonami · 2018-12-07T09:26:12Z

telegram_export/downloader.py

@@ -254,6 +267,9 @@ def _get_name(self, peer_id):
        if not ext:
            ext = export_utils.get_extension(media_row[4])

+        if isinstance(filename, str):
+            filename = export_utils.format_filename(filename)


This change is irrelevant to the pull request and should not belong here. If you want to make this change use a new pull request.

Please let me know how to do this. I think I have to create a new branch, commit this part of the code there, and then create a new pull request, right?

Making a new branch is a bit harder because you have to revert the changes you have made in master first. Easiest way for you is:

1. Wait until the MAX_SIZE bug fix is merged.
2. Delete your fork.
3. Fork again.
4. Make new changes.

But you can look up online how to do it without deleting the fork if you want.


Lonami · 2018-12-07T09:26:38Z

telegram_export/downloader.py

@@ -82,9 +81,18 @@ def _check_media(self, media):
        """
        if not media or not self.max_size:
            return False
+
        if not self.types:
            return True


This bypasses the max_size check. Are you sure this is the expected behaviour?

Yes, it is. If you check the config file, it says "Setting to "0" will not download any media."
So I think there is no problem with this one.

No, that's not what I meant. I mean, if there are no types (i.e. "all media is valid"), the maximum size is not checked at all.

That's right, so the file size should be checked before this line!
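The ordering issue raised here can be sketched as below. This is not the project's actual `_check_media`; `MediaFilter` and its `check` method are hypothetical names used to show the order of the tests: the "0 disables downloads" rule first, then the size limit, and only then the shortcut for an empty type list.

```python
class MediaFilter:
    """Hypothetical, simplified stand-in for the downloader's media check."""

    def __init__(self, max_size, types=()):
        self.max_size = max_size   # in bytes; 0 means "download nothing"
        self.types = set(types)    # empty means "all media types are valid"

    def check(self, media_type, size):
        """Return True if media of this type and size should be downloaded."""
        if not self.max_size:
            return False
        # The size limit is tested before the types shortcut, so it also
        # applies when no specific types are configured.
        if size is not None and size > self.max_size:
            return False
        if not self.types:
            return True
        return media_type in self.types
```

In this sketch a media object whose size is unknown (None) passes the size test; whether that is the desired behaviour is a design decision left open in the discussion.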


parasteh added 3 commits December 6, 2018 00:58

bug fix, MAXSIZE defined in config did not work cause there were no c…

abe7caf

…omparison between the media size and max_size ! I just did a minor change and hope it helps.

Add .idea to the list

b40ec53

remove .idea from the remote repo

716c248

Lonami suggested changes Dec 5, 2018


parasteh added 2 commits December 7, 2018 00:14

2 correction, first file_size is now working, it get compared with ma…

874af06

…xsize which is defined by user, and in some cases there were a minor bug with filename, it threw error when file names consist os restricted characters like :, I fixed it

undo changes in gitignore file

f2869c3

parasteh commented Dec 6, 2018


Better formatting

0b0e3c7

Lonami suggested changes Dec 7, 2018


Lonami mentioned this pull request Dec 21, 2018

Unsafe chat names when downloading media #100

Open
