-
Notifications
You must be signed in to change notification settings - Fork 96
MAX_SIZE minor bug fix #99
base: master
Are you sure you want to change the base?
Conversation
…omparison between the media size and max_size ! I just did a minor change and hope it helps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undo the .gitignore
change and confirm my question about None
. Besides that, looks good!
.gitignore
Outdated
@@ -85,6 +85,9 @@ target/ | |||
# pyenv | |||
.python-version | |||
|
|||
#pycharm | |||
.idea |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This belongs in your global .gitignore
, so please undo this change.
telegram_export/downloader.py
Outdated
@@ -80,7 +80,8 @@ def _check_media(self, media): | |||
""" | |||
Checks whether the given MessageMedia should be downloaded or not. | |||
""" | |||
if not media or not self.max_size: | |||
# It is needed to chek the size with the max_size defined in config file | |||
if not media or media.document.size > self.max_size: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you check at least twice the Telethon documentation to make sure document is not None
and size is not None
(or they can't be None
, not present at all)? We don't want an error here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix it, let you know the result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some points need consideration, Media has multiple types and it is not easy to get the size of each object, I mean it is depended to the Type and calling media.document.size will only work for document type and no other. I think we need to use polymorphism to do that. each type has to implement an interface for general attributes like size, so calling something like media.getsize() return the size of the file regardless of the type of it,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
each type has to implement an interface for general attributes like size
More like some utils method get_size(media)
, unless you're willing to do this change in Telethon which would be by far harder :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I found an easier way to do that, but not sure yet, so far I realized that just document types have size attribute, so when calling export_utils.get_media_type(media) returns "document" we can refer to media.document.size and compare it to the max_size,
What's your idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also in utils there is a method named get_file_location(media):, So already it has been implemented, I will use it and after test I will commit the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just add another function to utils.py
to get_file_size
for media, comparing media type with isinstance
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its done !, how can I pull the changes ?
Sorry, there is a problem for media types that are not document, I made a mistake!, |
…xsize which is defined by user, and in some cases there were a minor bug with filename, it threw error when file names consist os restricted characters like :, I fixed it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the gitignore, I finished the code to consider the maxfile size, it is now working, I have tested it for both new exporting files as well as download-past-media option, Also there was a minor filename issue with restricted characters like ':' I solve it by adding a pice of code from here filename
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a careful review there are a lot other things to consider and irrelevant changes that do not belong here.
|
||
def format_filename(s): | ||
"""Take a string and return a valid filename constructed from the string. | ||
Uses a whitelist approach: any characters not present in valid_chars are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This formatting is terrible and not consistent with the rest at all. A simple:
"""Removes invalid file characters from the input string"""
Would be enough.
an invalid filename. | ||
|
||
""" | ||
valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about people using Chinese, Japanese, Persian or any other language that don't have ASCII characters in the file name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, I had to consider that, Thank you for the point, I will fix it
""" | ||
valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits) | ||
filename = ''.join(c for c in s if c in valid_chars) | ||
filename = filename.replace(' ', '_') # I don't like spaces in filenames. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't care about personal opinions in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I just used the code I refered, So I will write my own code to fix it again, Thank you for your comments :)
|
||
""" | ||
valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits) | ||
filename = ''.join(c for c in s if c in valid_chars) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A far better option is to use a regex and replace all "invalid" characters with _
…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def format_filename(s): """Removes invalid file characters from the input string""" return re.sub(r'[\\/:"*?<>|]', '_', s)
what about this code ? this restriction is only for windows file system, is it necessary to make it conditional upon os type ? I mean only do it for windows os
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That code is much better, I would not normally use those characters in any filename either way. But, I still think this should be a separate pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me neither, I think for some reasons it uses time for the file name! so the ':' in time format caused the exception for me! the code I have submitted is working and using it downloaded around 12000 docs from two separated group
valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits) | ||
filename = ''.join(c for c in s if c in valid_chars) | ||
filename = filename.replace(' ', '_') # I don't like spaces in filenames. | ||
return filename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No newline at the end of the file.
@@ -259,3 +260,20 @@ def parse_proxy_str(proxy_str): | |||
else: | |||
proxy = (proxy_type, host, port) | |||
return proxy | |||
|
|||
|
|||
def format_filename(s): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is irrelevant to the pull request and should not belong here. If you want to make this change use a new pull request.
@@ -254,6 +267,9 @@ def _get_name(self, peer_id): | |||
if not ext: | |||
ext = export_utils.get_extension(media_row[4]) | |||
|
|||
if isinstance(filename, str): | |||
filename = export_utils.format_filename(filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is irrelevant to the pull request and should not belong here. If you want to make this change use a new pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please let me know how to do this, I think I have to create a new branch and commit this part of code there and then create a new pull request? right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making a new branch is a bit harder because you have to revert the changes you have made in master
first. Easiest way for you is:
- Wait until
MAX_SIZE
bug fix is merged. - Delete your fork.
- Fork again.
- Make new changes.
But you can look up online how to do it without deleting the fork if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -82,9 +81,18 @@ def _check_media(self, media): | |||
""" | |||
if not media or not self.max_size: | |||
return False | |||
|
|||
if not self.types: | |||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bypasses the max_size
check. Are you sure this is the expected behaviour?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is if you check the config file, it says "Setting to "0" will not download any media."
So I think there is no problem with this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that's not what I meant. I mean, if there are no types (i.e. "all media is valid"), the maximum size is not checked at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, So the file size should be checked before this line!
Hi,
Thanks to your work, I just used it and noticed there is a minor problem with max_size, actually, it did not work and files with size larger than I defined just downloaded. So after a while, I figured out there should be a comparison with media.document.size and max_size. I did it, hope it works
Again thank you, It saved me a lot of hours.
BR