[WIP] [V1] VLM hashing and mapper caching #10868

alexm-neuralmagic · 2024-12-03T17:01:08Z

This PR is the first step towards caching VLM images and skipping encoder execution for them. Currently it adds:

- Hashing support for images
- Simple LRU cache based on OrderedDict
- Caching of MM mapper results
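To make the three items above concrete, here is a minimal sketch of how they could fit together. It is not the code from this PR: `LRUCache`, `hash_image_bytes`, and `cached_map` are hypothetical names, SHA-256 over raw bytes is an assumed hashing scheme, and the `mapper` argument stands in for whatever the MM mapper actually does.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LRUCache(Generic[K, V]):
    """Hypothetical LRU cache backed by an OrderedDict."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K) -> Optional[V]:
        if key not in self._data:
            return None
        # Mark the entry as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key: K, value: V) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            # Evict the least recently used entry (front of the dict).
            self._data.popitem(last=False)

    def __len__(self) -> int:
        return len(self._data)


def hash_image_bytes(data: bytes) -> str:
    """Assumed hashing scheme: SHA-256 over the raw image bytes."""
    return hashlib.sha256(data).hexdigest()


def cached_map(
    cache: "LRUCache[str, V]",
    image_bytes: bytes,
    mapper: Callable[[bytes], V],
) -> V:
    """Run the (stand-in) mapper only when the image hash is not cached."""
    key = hash_image_bytes(image_bytes)
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = mapper(image_bytes)
    cache.put(key, result)
    return result
```

In this sketch the image hash is the cache key, so a repeated image skips the mapper entirely. A `None` mapper result would be treated as a miss, which is acceptable for illustration but would need a sentinel in real code.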

TODO:

- Integrate with Ricky's KVCacheManager refactor
- Skip the encoder execution for cached embeds

github-actions · 2024-12-03T17:01:25Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add the ready label to the PR
- Enable auto-merge

🚀

comaniac · 2024-12-04T00:06:07Z

@alexm-neuralmagic is it possible to split this PR into two PRs so that we can merge the mm_hash part sooner and unblock the embedding cache?

The current PR already seems to have this functionality, so if you could add unit tests we should be good.

Also cc @yue-anyscale

alexm-neuralmagic · 2024-12-05T01:18:19Z

@comaniac yeah, I will try to split the PR into two pieces.

alexm-neuralmagic · 2024-12-09T13:57:56Z

Depends on #11020

alexm-neuralmagic · 2024-12-10T18:30:16Z

Not necessary anymore, replaced by #11020

[V1] VLM hashing and mapper caching

db28436

alexm-neuralmagic requested review from WoosukKwon, robertgshaw2-neuralmagic, njhill, ywang96 and comaniac as code owners December 3, 2024 17:01

alexm-neuralmagic self-assigned this Dec 3, 2024

alexm-neuralmagic mentioned this pull request Dec 3, 2024

[V1] VLM prefix caching: Add hashing of images #10497

Draft

alexm-neuralmagic marked this pull request as draft December 3, 2024 17:10

sync

4a5aecb

encoder cache

99b267c

alexm-neuralmagic closed this Dec 10, 2024
