Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-109653: typing.py: improve import time by creating soft-deprecated members on demand #109651

Merged
merged 4 commits into from
Sep 23, 2023

Conversation

AlexWaygood
Copy link
Member

@AlexWaygood AlexWaygood commented Sep 21, 2023

This cuts about 1/3 off the import time of typing.py for me locally (PGO-optimised, non-debug build). 0.012s -> 0.008s.

Benchmark script

(There's probably a better way of benchmarking import times, but this seems to work pretty well.)

import sys
import time
import statistics
import subprocess

times = []

for _ in range(500):
    ret = subprocess.run(
        [sys.executable, "-c", f"import time; t0 = time.perf_counter(); import typing; print(time.perf_counter() - t0)"],
        check=True,
        capture_output=True,
        text=True
    )
    times.append(float(ret.stdout.strip()))

print(statistics.mean(times))

@AlexWaygood AlexWaygood added the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Sep 21, 2023
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @AlexWaygood for commit 180461f 🤖

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Sep 21, 2023
Lib/typing.py Outdated Show resolved Hide resolved
@AlexWaygood AlexWaygood changed the title Improve the import time of typing.py gh-109653: Improve the import time of typing.py Sep 21, 2023
@AlexWaygood AlexWaygood changed the title gh-109653: Improve the import time of typing.py gh-109653: typing.py: improve import time by creating soft-deprecated members on demand Sep 21, 2023
@AA-Turner
Copy link
Member

Would it be going to far to use this approach for all of the _alias types?

If it would be, perhaps we could defer the collections types? By doing so (& changing _overload_registry to use .setdefault()), we'd be able to avoid importing collections and just use _collections_abc (which is already imported in e.g. os).

A

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Sep 21, 2023

Would it be going to far to use this approach for all of the _alias types?

The main reason why the changes here save us so much is because we can avoid unconditionally importing re and contextlib -- the creation of these aliases themselves is pretty cheap, I think. So it would probably only be worth it for the others if we could remove more dependencies on the rest of the stdlib. And I'm not sure about this:

changing _overload_registry to use .setdefault()

IIRC, defaultdict is a fair bit faster than setdefault(), and the _overload_registry is quite performance-sensitive. (We spent a lot of time in #31716 trying to figure out how to implement it without having a huge performance regression for typing.overload!)

@AA-Turner
Copy link
Member

I can just about see a path to inlining functools, but it would also rely on changing _overload_registry -- so I think this PR is a sensible limit for now -- there are no other 'heavy' imports, as far as I can see.

A

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Sep 21, 2023

FWIW @AA-Turner, here's a diff (relative to this PR branch) that shaves another 0.002s off the import time by avoiding the collections import. But it's a much bigger diff, for a smaller gain, so I'd prefer to defer it to its own PR and consider it separately. I'm honestly not sure it's worth the cost to readability, personally:

diff --git a/Lib/typing.py b/Lib/typing.py
index 84b741bfb0..0218b715f6 100644
--- a/Lib/typing.py
+++ b/Lib/typing.py
@@ -19,9 +19,7 @@
 """

 from abc import abstractmethod, ABCMeta
-import collections
-from collections import defaultdict
-import collections.abc
+import _collections_abc
 import copyreg
 import functools
 import operator
@@ -40,6 +38,11 @@
     Generic,
 )

+try:
+    from _collections import defaultdict
+except ImportError:
+    from collections import defaultdict
+
 # Please keep __all__ alphabetized within each category.
 __all__ = [
     # Super-special typing primitives.
@@ -219,7 +222,7 @@ def _should_unflatten_callable_args(typ, args):
     we need to unflatten it.
     """
     return (
-        typ.__origin__ is collections.abc.Callable
+        typ.__origin__ is _collections_abc.Callable
         and not (len(args) == 2 and _is_param_expr(args[0]))
     )

@@ -1317,7 +1320,7 @@ def _make_substitution(self, args, new_arg_by_param):
                             subargs.append(new_arg_by_param[x])
                     new_arg = old_arg[tuple(subargs)]

-            if self.__origin__ == collections.abc.Callable and isinstance(new_arg, tuple):
+            if self.__origin__ == _collections_abc.Callable and isinstance(new_arg, tuple):
                 # Consider the following `Callable`.
                 #   C = Callable[[int], str]
                 # Here, `C.__args__` should be (int, str) - NOT ([int], str).
@@ -1885,7 +1888,7 @@ def _proto_hook(cls, other):

             # ...or in annotations, if it is a sub-protocol.
             annotations = getattr(base, '__annotations__', {})
-            if (isinstance(annotations, collections.abc.Mapping) and
+            if (isinstance(annotations, _collections_abc.Mapping) and
                     attr in annotations and
                     issubclass(other, Generic) and getattr(other, '_is_protocol', False)):
                 break
@@ -2521,18 +2524,18 @@ class Other(Leaf):  # Error reported by type checker
 # Various ABCs mimicking those in collections.abc.
 _alias = _SpecialGenericAlias

-Hashable = _alias(collections.abc.Hashable, 0)  # Not generic.
-Awaitable = _alias(collections.abc.Awaitable, 1)
-Coroutine = _alias(collections.abc.Coroutine, 3)
-AsyncIterable = _alias(collections.abc.AsyncIterable, 1)
-AsyncIterator = _alias(collections.abc.AsyncIterator, 1)
-Iterable = _alias(collections.abc.Iterable, 1)
-Iterator = _alias(collections.abc.Iterator, 1)
-Reversible = _alias(collections.abc.Reversible, 1)
-Sized = _alias(collections.abc.Sized, 0)  # Not generic.
-Container = _alias(collections.abc.Container, 1)
-Collection = _alias(collections.abc.Collection, 1)
-Callable = _CallableType(collections.abc.Callable, 2)
+Hashable = _alias(_collections_abc.Hashable, 0)  # Not generic.
+Awaitable = _alias(_collections_abc.Awaitable, 1)
+Coroutine = _alias(_collections_abc.Coroutine, 3)
+AsyncIterable = _alias(_collections_abc.AsyncIterable, 1)
+AsyncIterator = _alias(_collections_abc.AsyncIterator, 1)
+Iterable = _alias(_collections_abc.Iterable, 1)
+Iterator = _alias(_collections_abc.Iterator, 1)
+Reversible = _alias(_collections_abc.Reversible, 1)
+Sized = _alias(_collections_abc.Sized, 0)  # Not generic.
+Container = _alias(_collections_abc.Container, 1)
+Collection = _alias(_collections_abc.Collection, 1)
+Callable = _CallableType(_collections_abc.Callable, 2)
 Callable.__doc__ = \
     """Deprecated alias to collections.abc.Callable.

@@ -2547,15 +2550,15 @@ class Other(Leaf):  # Error reported by type checker
     There is no syntax to indicate optional or keyword arguments;
     such function types are rarely used as callback types.
     """
-AbstractSet = _alias(collections.abc.Set, 1, name='AbstractSet')
-MutableSet = _alias(collections.abc.MutableSet, 1)
+AbstractSet = _alias(_collections_abc.Set, 1, name='AbstractSet')
+MutableSet = _alias(_collections_abc.MutableSet, 1)
 # NOTE: Mapping is only covariant in the value type.
-Mapping = _alias(collections.abc.Mapping, 2)
-MutableMapping = _alias(collections.abc.MutableMapping, 2)
-Sequence = _alias(collections.abc.Sequence, 1)
-MutableSequence = _alias(collections.abc.MutableSequence, 1)
+Mapping = _alias(_collections_abc.Mapping, 2)
+MutableMapping = _alias(_collections_abc.MutableMapping, 2)
+Sequence = _alias(_collections_abc.Sequence, 1)
+MutableSequence = _alias(_collections_abc.MutableSequence, 1)
 ByteString = _DeprecatedGenericAlias(
-    collections.abc.ByteString, 0, removal_version=(3, 14)  # Not generic.
+    _collections_abc.ByteString, 0, removal_version=(3, 14)  # Not generic.
 )
 # Tuple accepts variable number of parameters.
 Tuple = _TupleType(tuple, -1, inst=False, name='Tuple')
@@ -2571,20 +2574,15 @@ class Other(Leaf):  # Error reported by type checker
     To specify a variable-length tuple of homogeneous type, use Tuple[T, ...].
     """
 List = _alias(list, 1, inst=False, name='List')
-Deque = _alias(collections.deque, 1, name='Deque')
 Set = _alias(set, 1, inst=False, name='Set')
 FrozenSet = _alias(frozenset, 1, inst=False, name='FrozenSet')
-MappingView = _alias(collections.abc.MappingView, 1)
-KeysView = _alias(collections.abc.KeysView, 1)
-ItemsView = _alias(collections.abc.ItemsView, 2)
-ValuesView = _alias(collections.abc.ValuesView, 1)
+MappingView = _alias(_collections_abc.MappingView, 1)
+KeysView = _alias(_collections_abc.KeysView, 1)
+ItemsView = _alias(_collections_abc.ItemsView, 2)
+ValuesView = _alias(_collections_abc.ValuesView, 1)
 Dict = _alias(dict, 2, inst=False, name='Dict')
-DefaultDict = _alias(collections.defaultdict, 2, name='DefaultDict')
-OrderedDict = _alias(collections.OrderedDict, 2)
-Counter = _alias(collections.Counter, 1)
-ChainMap = _alias(collections.ChainMap, 2)
-Generator = _alias(collections.abc.Generator, 3)
-AsyncGenerator = _alias(collections.abc.AsyncGenerator, 2)
+Generator = _alias(_collections_abc.Generator, 3)
+AsyncGenerator = _alias(_collections_abc.AsyncGenerator, 2)
 Type = _alias(type, 1, inst=False, name='Type')
 Type.__doc__ = \
     """Deprecated alias to builtins.type.
@@ -2692,8 +2690,8 @@ def _make_nmtuple(name, types, module, defaults = ()):
     fields = [n for n, t in types]
     types = {n: _type_check(t, f"field {n} annotation must be a type")
              for n, t in types}
-    nm_tpl = collections.namedtuple(name, fields,
-                                    defaults=defaults, module=module)
+    from collections import namedtuple
+    nm_tpl = namedtuple(name, fields, defaults=defaults, module=module)
     nm_tpl.__annotations__ = nm_tpl.__new__.__annotations__ = types
     return nm_tpl

@@ -3432,6 +3430,17 @@ def __getattr__(attr):
     elif attr in {"ContextManager", "AsyncContextManager"}:
         import contextlib
         obj = _alias(getattr(contextlib, f"Abstract{attr}"), 1, name=attr)
+    elif attr in {"Deque", "DefaultDict", "OrderedDict", "Counter", "ChainMap"}:
+        import collections
+        match attr:
+            case "Deque":
+                obj = _alias(collections.deque, 1, name="Deque")
+            case "DefaultDict":
+                obj = _alias(collections.defaultdict, 2, name="DefaultDict")
+            case "OrderedDict" | "ChainMap":
+                obj = _alias(getattr(collections, attr), 2)
+            case "Counter":
+                obj = _alias(collections.Counter, 1)
     else:
         raise AttributeError(f"Module 'typing' has no attribute {attr!r}")
     globals()[attr] = obj

@AA-Turner
Copy link
Member

AA-Turner commented Sep 21, 2023

a smaller gain

Though still 25% (on the 8ms post-this PR) or 50% total incl. this PR.

I agree this PR is both bigger bang-for-buck and should come first (I've no comments save applying Tom's suggestion), but for e.g. CLIs, milli-seconds can matter in feeling responsive, so I think worth at least considering the further improvement!

A

@JelleZijlstra
Copy link
Member

I'm a bit skeptical of this because it relies on avoiding imports of common standard library modules only: how many real applications are there going to be that want to import typing but not re or contextlib?

On the other hand, the change in this PR is fairly small and self-contained, so I won't oppose it. But we shouldn't go through extremely lengths to avoid importing common stdlib modules.

Co-authored-by: Thomas Grainger <[email protected]>
@AlexWaygood
Copy link
Member Author

I'm a bit skeptical of this because it relies on avoiding imports of common standard library modules only: how many real applications are there going to be that want to import typing but not re or contextlib?

Well, for one example, from a quick grep, pluggy appears to import neither re nor contextlib in their src/pluggy/ directory. The pluggy maintainers were the original people who raised concern about the import time of typing, among other stdlib modules, in https://discuss.python.org/t/deferred-computation-evalution-for-toplevels-imports-and-dataclasses/34173

More than that, though, I think that it's generally good to do this kind of thing (where it's not significantly detrimental to code readability). It's best to reduce dependencies between stdlib modules wherever possible, and make it clearer exactly why certain stdlib modules depend on others. It helps reduce the extent to which the stdlib is a massive import cycle.

But we shouldn't go through extremely lengths to avoid importing common stdlib modules.

I definitely agree with you there :)

@JelleZijlstra
Copy link
Member

Well, for one example, from a quick grep, pluggy appears to import neither re nor contextlib in their src/pluggy/ directory.

But they do use importlib.metadata, which imports both re (in importlib/metadata/_adapters.py) and contextlib (in importlib/metadata/__init__.py). That's just one example of course, but it illustrates my point that a lot of applications won't benefit from deferring these imports.

@AlexWaygood
Copy link
Member Author

But they do use importlib.metadata, which imports both re (in importlib/metadata/_adapters.py) and contextlib (in importlib/metadata/__init__.py). That's just one example of course, but it illustrates my point that a lot of applications won't benefit from deferring these imports.

Sure, it's unlikely that this PR is going to hugely improve the import time of pluggy by itself. But the conclusion that leads me to isn't "Working on reducing the number of imports in typing is pointless", it's "Both typing and importlib.metadata should work on optimising their import times". importlib.metadata could use exactly the same rationale for not working on reducing the number of imports they have ("Well, typing still imports re, so what's the point?"). One of the two has to come first.

@AlexWaygood AlexWaygood added performance Performance or resource usage topic-typing 3.13 bugs and security fixes labels Sep 21, 2023
Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are good points. Let's do it.

Copy link
Contributor

@hauntsaninja hauntsaninja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I'm fine with this, but #109651 (comment) crosses the line for me.

@AlexWaygood AlexWaygood merged commit e8be0c9 into python:main Sep 23, 2023
25 checks passed
@AlexWaygood AlexWaygood deleted the fewer-imports-in-typing branch September 23, 2023 07:46
@AlexWaygood
Copy link
Member Author

Thanks all for the helpful reviews!

@bedevere-bot

This comment was marked as off-topic.

@bedevere-bot

This comment was marked as off-topic.

@bedevere-bot

This comment was marked as off-topic.

@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot PPC64LE Fedora Stable 3.x has failed when building commit e8be0c9.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/90/builds/3814) and take a look at the build logs.
  4. Check if the failure is related to this commit (e8be0c9) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/90/builds/3814

Failed tests:

  • test_math

Summary of the results of the build (if available):

==

Click to see traceback logs
remote: Enumerating objects: 11, done.        
remote: Counting objects:   9% (1/11)        
remote: Counting objects:  18% (2/11)        
remote: Counting objects:  27% (3/11)        
remote: Counting objects:  36% (4/11)        
remote: Counting objects:  45% (5/11)        
remote: Counting objects:  54% (6/11)        
remote: Counting objects:  63% (7/11)        
remote: Counting objects:  72% (8/11)        
remote: Counting objects:  81% (9/11)        
remote: Counting objects:  90% (10/11)        
remote: Counting objects: 100% (11/11)        
remote: Counting objects: 100% (11/11), done.        
remote: Compressing objects:  12% (1/8)        
remote: Compressing objects:  25% (2/8)        
remote: Compressing objects:  37% (3/8)        
remote: Compressing objects:  50% (4/8)        
remote: Compressing objects:  62% (5/8)        
remote: Compressing objects:  75% (6/8)        
remote: Compressing objects:  87% (7/8)        
remote: Compressing objects: 100% (8/8)        
remote: Compressing objects: 100% (8/8), done.        
remote: Total 11 (delta 3), reused 9 (delta 3), pack-reused 0        
From https://github.com/python/cpython
 * branch                  main       -> FETCH_HEAD
Note: switching to 'e8be0c9c5a7c2327b3dd64009f45ee0682322dcb'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at e8be0c9c5a gh-109653: `typing.py`: improve import time by creating soft-deprecated members on demand (#109651)
Switched to and reset branch 'main'

../Objects/longobject.c: In function ‘long_format_binary’:
../Objects/longobject.c:2120:13: warning: ‘kind’ may be used uninitialized [-Wmaybe-uninitialized]
 2120 |     else if (kind == PyUnicode_1BYTE_KIND) {
      |             ^
../Objects/longobject.c:1996:9: note: ‘kind’ was declared here
 1996 |     int kind;
      |         ^~~~
../Objects/longobject.c: In function ‘long_to_decimal_string_internal’:
../Objects/longobject.c:1943:13: warning: ‘kind’ may be used uninitialized [-Wmaybe-uninitialized]
 1943 |     else if (kind == PyUnicode_1BYTE_KIND) {
      |             ^
../Objects/longobject.c:1767:9: note: ‘kind’ was declared here
 1767 |     int kind;
      |         ^~~~

Kill <WorkerThread #2 running test=test_gdb pid=713028 time=6 min 8 sec> process group
Kill <WorkerThread #8 running test=test_tokenize pid=713190 time=5 min 57 sec> process group
Kill <WorkerThread #9 running test=test_unparse pid=715520 time=3 min 58 sec> process group
Kill <WorkerThread #10 running test=test_tarfile pid=718168 time=2 min 48 sec> process group
Warning -- Failed to join <WorkerThread #2 running test=test_gdb pid=713028 time=6 min 38 sec> in 30.0 sec
Warning -- Failed to join <WorkerThread #9 running test=test_unparse pid=715520 time=4 min 30 sec> in 31.0 sec
Warning -- Failed to join <WorkerThread #10 running test=test_tarfile pid=718168 time=3 min 20 sec> in 32.0 sec
make: *** [Makefile:2039: buildbottest] Error 5

csm10495 pushed a commit to csm10495/cpython that referenced this pull request Sep 28, 2023
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request Sep 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes performance Performance or resource usage topic-typing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants