
Performance for BGE-M3 inference dropped between 1.2.x and 1.3.x #1308

Open
ivlcic opened this issue Dec 27, 2024 · 0 comments
ivlcic commented Dec 27, 2024

Comparing 1.2.x with 1.3.x, inference shows up to a 100% performance regression. The slowdown appears on repeated calls to model.encode: M3Embedder.encode_single_device in 1.3.x is about 2x slower than the equivalent 1.2.x code.

One obvious suggestion is to remove the following calls from the encode_single_device function, since they are re-executed on every invocation:

```python
self.model.to(device)
self.model.eval()
```
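The point can be sketched as follows. This is not the library's code: FakeModel and M3EmbedderSketch are hypothetical stand-ins that only count calls, to show that hoisting the one-time setup into the constructor keeps it off the per-call hot path.

```python
# Hypothetical sketch: move one-time setup (device placement, eval mode)
# out of the per-call encode path. FakeModel stands in for a real torch
# model and merely counts how often .to()/.eval() run.

class FakeModel:
    def __init__(self):
        self.to_calls = 0
        self.eval_calls = 0

    def to(self, device):
        self.to_calls += 1  # in torch, moving weights is expensive
        return self

    def eval(self):
        self.eval_calls += 1
        return self


class M3EmbedderSketch:
    def __init__(self, model, device="cuda:0"):
        self.model = model
        # one-time setup here, instead of inside encode_single_device
        self.model.to(device)
        self.model.eval()

    def encode_single_device(self, sentences):
        # hot path: no .to()/.eval() on every call
        return [len(s) for s in sentences]  # placeholder for the forward pass


model = FakeModel()
embedder = M3EmbedderSketch(model)
for _ in range(10):
    embedder.encode_single_device(["hello", "world"])

print(model.to_calls, model.eval_calls)  # prints "1 1": setup ran once, not 10 times
```

With the 1.3.x placement, the two counters would instead reach 10 after ten encode calls, which matches the observed per-call overhead.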

The second observation is that the forward pass,

```python
self.model(...)
```

is now invoked at least twice instead of once, apparently just to adjust the batch size when an out-of-memory error occurs?
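A cheaper alternative is to retry only when an OOM error actually occurs, so the common case pays for exactly one forward pass per batch. The sketch below is an assumption about how such a backoff could look, not the library's implementation; forward() simulates a model whose memory limit is MAX_FITS items per call.

```python
# Hypothetical batch-size backoff: extra forward passes happen only on a
# (simulated) OOM error, never on the happy path.

MAX_FITS = 8        # simulated memory limit, in items per forward pass
forward_calls = 0   # counts model invocations

def forward(batch):
    global forward_calls
    forward_calls += 1
    if len(batch) > MAX_FITS:
        raise MemoryError("simulated CUDA OOM")
    return [len(x) for x in batch]  # placeholder for real embeddings

def encode_with_backoff(items, batch_size):
    results = []
    i = 0
    while i < len(items):
        batch = items[i:i + batch_size]
        try:
            results.extend(forward(batch))
            i += batch_size
        except MemoryError:
            if batch_size == 1:
                raise  # cannot shrink further
            batch_size //= 2  # retry the same slice with a smaller batch
    return results

items = [f"doc{n}" for n in range(8)]
out = encode_with_backoff(items, batch_size=8)
print(forward_calls)  # prints "1": the batch fits, so one forward pass
```

If the initial batch does not fit, the loop halves the batch size and retries the same slice, so extra model calls occur only in the failure case rather than on every encode.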

best,
Nikola
