-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-109653: Improve the import time of email.utils
#109824
Conversation
Misc/NEWS.d/next/Library/2023-09-25-10-47-22.gh-issue-109653.TUHrId.rst
Outdated
Show resolved
Hide resolved
Co-authored-by: Nikita Sobolev <[email protected]>
Co-authored-by: Adam Turner <[email protected]>
Lib/email/utils.py
Outdated
if attr == "make_msgid": | ||
from email._msgid import make_msgid | ||
return make_msgid | ||
raise AttributeError(f"module {__name__!r} has no attribute {attr!r}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I don't like performance hacks like this, but they do have their place. I can't speak to whether or not this is a worthwhile performance hack, you should seek approval from the maintainers of the impacted modules for that.
That said, the stdlib itself makes no use of make_msgid, and email.utils is not itself considered part of the new email API. Moving it into a separate module and making that part of the non-legacy public API of the email module would actually make some sense. I guess we'd just call it 'msgid'? Then this code should issue a deprecation warning pointing to the new way to import make_msgid. It feels kind of weird to have a module with just one function, but there isn't really anything else related to it that I can think of. (I wonder...maybe make_msgid actually belongs in the UUID module? Probably not. Wrong RFC.)
This feels a little strange to me. Should we instead just defer the imports? Might even be faster to do so |
Yeah, that might be a better shout. Have been meaning to give it a go — just haven't got round to it yet :) I'm not really enthused about the idea of a painful deprecation period in order to change the place where people are "meant" to import it from. This function is pretty widely used by third-party packages, and the improvement in import time doesn't seem worth the disruption to me :) |
…/cpython into email-message-import
I've updated the PR. I abandoned the idea of adding a new submodule; now I just defer the problematic |
Co-authored-by: Adam Turner <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
@@ -94,6 +88,8 @@ def formataddr(pair, charset='utf-8'): | |||
name.encode('ascii') | |||
except UnicodeEncodeError: | |||
if isinstance(charset, str): | |||
# lazy import to improve module import time | |||
from email.charset import Charset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I like this one, only fine with it since formataddr
doesn't look too widely used (a lot of the time it's nice to pay these costs upfront, predictable performance is important, e.g. don't want the first request your webserver serves to be randomly slow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(a lot of the time it's nice to pay these costs upfront, predictable performance is important
agreed. On the other hand, though, the email
package goes in quite heavily for lazy imports in some other places, so this does seem in keeping with that general philosophy:
Lines 28 to 61 in f786029
# Some convenience routines. Don't import Parser and Message as side-effects | |
# of importing email since those cascadingly import most of the rest of the | |
# email package. | |
def message_from_string(s, *args, **kws): | |
"""Parse a string into a Message object model. | |
Optional _class and strict are passed to the Parser constructor. | |
""" | |
from email.parser import Parser | |
return Parser(*args, **kws).parsestr(s) | |
def message_from_bytes(s, *args, **kws): | |
"""Parse a bytes string into a Message object model. | |
Optional _class and strict are passed to the Parser constructor. | |
""" | |
from email.parser import BytesParser | |
return BytesParser(*args, **kws).parsebytes(s) | |
def message_from_file(fp, *args, **kws): | |
"""Read a file and parse its contents into a Message object model. | |
Optional _class and strict are passed to the Parser constructor. | |
""" | |
from email.parser import Parser | |
return Parser(*args, **kws).parse(fp) | |
def message_from_binary_file(fp, *args, **kws): | |
"""Read a binary file and parse its contents into a Message object model. | |
Optional _class and strict are passed to the Parser constructor. | |
""" | |
from email.parser import BytesParser | |
return BytesParser(*args, **kws).parse(fp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy to change it so it's imported at the top of the function if you think that'd be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah that'd be worse, module level or here. I'm fine with this as is!
This patch reduces the import time of
email.utils
by around 46%.Why do I care about
email.utils
? Well,email.utils
is imported byemail.message
, andemail.message
is imported by lots of other stdlib modules:urllib.parse
,mailbox
, andimportlib.metadata
. Improving the import time of this module has a cascading effect through lots of the rest of the standard library.A lot of the import cost of
email.utils
has to do with themake_msgid
function, which requires therandom
andsocket
modules. This function is used by many third-party libraries, but isn't used at all by any of the stdlib modules that importemail.utils
. This patch moves the function into a separate submodule.