
Share Inception v3 network between FID, KID, Inception Score, and MiFID #2099

Closed
gavrieladler opened this issue Sep 21, 2023 · 3 comments · Fixed by #2120
gavrieladler commented Sep 21, 2023

🚀 Feature

A single copy of Inception v3 loaded on a GPU and shared between all the metrics which use it.

Motivation

FID, KID, Inception Score, and MiFID all use the same Inception v3 network, which takes 304MB of GPU memory when loaded. If you want to calculate all four scores while training, that's roughly 1.2GB of GPU memory per GPU. It would be ideal if the metrics could share a single copy of the network to minimize GPU memory usage.

Pitch

Torchmetrics should know when a metric that uses Inception v3 has already been loaded and share that copy between metrics, so that creating a second metric does not create a second copy of the network.

Alternatives

  1. Document / officially support the currently "unofficial, unsupported" way to do this -- pass a reference to the network from one metric into the __init__ of another metric. For example, today you could do:
my_fid = torchmetrics.image.fid.FrechetInceptionDistance()
my_is = torchmetrics.image.inception.InceptionScore(feature=my_fid.inception)

However, I am hesitant to rely on this, as it depends on internal details of the metrics which are subject to change from release to release; I would not know from the release notes whether it had broken.

  2. A "parent metric" which creates one copy of Inception v3 together with the metrics you selected to use it with, and whose calls to update / reset / compute are passed down to each individual metric (see the sketch below). This could get a bit messy as the __init__ APIs do not exactly line up.
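
A rough sketch of what such a parent metric might look like (the SharedInceptionMetrics class is purely hypothetical and just reuses the feature= trick from option 1; the fid.inception attribute it shares is an internal detail):

import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore
from torchmetrics.image.kid import KernelInceptionDistance


class SharedInceptionMetrics(torch.nn.Module):
    """Hypothetical parent wrapper holding one Inception v3 copy shared by FID, IS, and KID."""

    def __init__(self):
        super().__init__()
        self.fid = FrechetInceptionDistance()
        # Reuse the network already loaded by FID instead of loading new copies.
        self.inception_score = InceptionScore(feature=self.fid.inception)
        self.kid = KernelInceptionDistance(feature=self.fid.inception)

    def update(self, imgs, real):
        self.fid.update(imgs, real=real)
        self.kid.update(imgs, real=real)
        if not real:  # InceptionScore only evaluates generated images
            self.inception_score.update(imgs)

    def compute(self):
        return {
            "fid": self.fid.compute(),
            "inception_score": self.inception_score.compute(),
            "kid": self.kid.compute(),
        }

    def reset(self):
        for metric in (self.fid, self.inception_score, self.kid):
            metric.reset()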

Additional context

It would be doubly awesome if the solution allowed sharing the activations as well as the network itself, as currently I am computing the same activations separately on each copy of the network.

Thanks so much!

@gavrieladler added the enhancement (New feature or request) label on Sep 21, 2023
github-actions (bot) commented

Hi! Thanks for your contribution, great first issue!

SkafteNicki (Member) commented

Hi @gavrieladler, thanks for raising this issue.
I really think what you are proposing is a good idea, as it is a bit silly that the same network is initialized multiple times and that the activations are not reused either. Reusing internal computations is already part of MetricCollection, but not in this way.

The current solution I came up with uses a caching system and a specialized MetricCollection. This collection, called FeatureShare (naming is work in progress), essentially has two features:

  • Make sure that a single copy of the network is used across all metrics. This is simply done by passing the network of the first metric in the collection to the rest, essentially overwriting theirs.
  • Secondly, this shared network gets wrapped in an lru_cache, meaning that input-output pairs will be cached. If we set the max size of the cache to the number of metrics in the collection (or a bit above), it should not impact memory that much (I hope), and the same computation should be reused when the metrics are updated one after another.

Here is some work in progress code

from functools import lru_cache
from typing import Dict, Sequence, Union

from torch.nn import Module
from torchmetrics import Metric, MetricCollection


class NetworkCache(Module):
    """Wrap a network so that repeated calls with the same input reuse the cached output."""

    def __init__(self, network, max_size=100):
        super().__init__()
        self.max_size = max_size
        self.network = lru_cache(maxsize=self.max_size)(network)

    def forward(self, *args, **kwargs):
        return self.network(*args, **kwargs)


class FeatureShare(MetricCollection):
    def __init__(
        self,
        metrics: Union[Metric, Sequence[Metric], Dict[str, Metric]],
        *additional_metrics: Metric,
        network_names: Union[str, Sequence[str]],
    ) -> None:
        super().__init__(metrics, *additional_metrics)

        if isinstance(network_names, str):
            network_names = [network_names] * len(self)
        elif len(network_names) != len(self):
            raise ValueError("The number of network names should be equal to the number of metrics.")

        # Take the network from the first metric, wrap it in a cache, and
        # overwrite the corresponding attribute on every metric in the collection.
        shared_net = getattr(getattr(self, list(self.keys())[0]), network_names[0])
        cached_net = NetworkCache(shared_net)
        for (_, metric), network_name in zip(self.items(), network_names):
            setattr(metric, network_name, cached_net)

So, a very basic analysis:

import time

import torch
from torchmetrics import MetricCollection
from torchmetrics.image import FrechetInceptionDistance, InceptionScore, KernelInceptionDistance

# FeatureShare as defined above, compared against a plain MetricCollection
fs = FeatureShare([FrechetInceptionDistance(), InceptionScore(), KernelInceptionDistance()], network_names=['inception', 'inception', 'inception'])
mc = MetricCollection([FrechetInceptionDistance(), InceptionScore(), KernelInceptionDistance()])

start = time.time()
for _ in range(10):
    x = torch.randint(255, (1, 3, 64, 64), dtype=torch.uint8)
    mc.update(x, real=True)
end = time.time()
print(end - start)

start = time.time()
for _ in range(10):
    x = torch.randint(255, (1, 3, 64, 64), dtype=torch.uint8)
    fs.update(x, real=True)
end = time.time()
print(end - start)

gives me 5.57s and 1.92s respectively for MetricCollection and FeatureShare, meaning that sharing the features is approximately 2.9x faster, which is very much what I would expect when we have three metrics that can reuse the same computation.
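
The speedup comes from cache hits: within a single update call every metric in the collection receives the exact same tensor object, and since tensors are (by default) hashed by object identity, the lru_cache can return the stored output instead of running the network again. A tiny standalone illustration (fake_network is just a stand-in, not the real Inception wrapper):

from functools import lru_cache

import torch

calls = 0

def fake_network(x):
    # stand-in for the Inception forward pass
    global calls
    calls += 1
    return x.float().mean()

cached = lru_cache(maxsize=8)(fake_network)
x = torch.randint(255, (1, 3, 64, 64), dtype=torch.uint8)
cached(x)
cached(x)
cached(x)  # three metrics updating with the same batch
print(calls)  # 1 -- the underlying network only ran once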

What do you think about this?

@SkafteNicki added this to the v1.3.0 milestone on Sep 25, 2023
gavrieladler (Author) commented Sep 26, 2023

Hi @SkafteNicki,

Thanks for the detailed reply! I think this looks good and is absolutely usable as presented.

One suggestion, and I could go either way: I do think there is an argument to be made that, because inception is an attribute name set by the library, the line

fs = FeatureShare([FrechetInceptionDistance(), InceptionScore(), KernelInceptionDistance()], network_names=['inception', 'inception', 'inception'])

may be unnecessary for the user to specify. Torchmetrics only supports a finite number of metrics and is able to "know" which of the selected metrics should share a network. Only requiring the user to specify

mc = MetricCollection([FrechetInceptionDistance(), InceptionScore(), KernelInceptionDistance()])

without detailed knowledge of the inner workings of each metric is a cleaner API. Especially since inception is an internal name not exposed by the API, asking the user to specify network_names forces them to dig through the source code to find out what the network names are. Requiring network_names also creates the risk that these internal attribute names change between releases, breaking user code.

When any of FID, IS, and KID are created, the code could automatically add their inception networks to a cache (a rough sketch of what this might look like is below).
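
For example (purely hypothetical -- the _DEFAULT_NETWORK_ATTRS mapping and infer_network_names helper do not exist in torchmetrics), FeatureShare could look the attribute names up itself:

from torchmetrics.image import FrechetInceptionDistance, InceptionScore, KernelInceptionDistance

# Hypothetical built-in mapping from metric class to the attribute holding its backbone.
_DEFAULT_NETWORK_ATTRS = {
    FrechetInceptionDistance: "inception",
    InceptionScore: "inception",
    KernelInceptionDistance: "inception",
}

def infer_network_names(metrics):
    """Look up the backbone attribute for each metric so users never pass network_names."""
    return [_DEFAULT_NETWORK_ATTRS[type(m)] for m in metrics]

# Uses the work-in-progress FeatureShare class from earlier in this thread.
metrics = [FrechetInceptionDistance(), InceptionScore(), KernelInceptionDistance()]
fs = FeatureShare(metrics, network_names=infer_network_names(metrics))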

Thoughts? Like I said, the solution presented is definitely usable as is.
