
Performance for BGE-M3 inference dropped between 1.2.x and 1.3.x #1308

Open
ivlcic opened this issue Dec 27, 2024 · 0 comments
ivlcic commented Dec 27, 2024

Comparing 1.2.x with 1.3.x, inference shows up to a 100% performance regression. The slowdown appears on repeated calls to model.encode: M3Embedder.encode_single_device in 1.3.x is about 2x slower than the equivalent 1.2.x code.

One obvious suggestion is to remove the following calls from the encode_single_device function, since they are re-executed on every invocation:

```python
self.model.to(device)
self.model.eval()
```
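The point can be sketched as follows. This is not the library's code: FakeModel and M3EmbedderSketch are hypothetical stand-ins that only count calls, to show that hoisting the one-time setup into the constructor keeps it off the per-call hot path.

```python
# Hypothetical sketch: move one-time setup (device placement, eval mode)
# out of the per-call encode path. FakeModel stands in for a real torch
# model and merely counts how often .to()/.eval() run.

class FakeModel:
    def __init__(self):
        self.to_calls = 0
        self.eval_calls = 0

    def to(self, device):
        self.to_calls += 1  # in torch, moving weights is expensive
        return self

    def eval(self):
        self.eval_calls += 1
        return self


class M3EmbedderSketch:
    def __init__(self, model, device="cuda:0"):
        self.model = model
        # one-time setup here, instead of inside encode_single_device
        self.model.to(device)
        self.model.eval()

    def encode_single_device(self, sentences):
        # hot path: no .to()/.eval() on every call
        return [len(s) for s in sentences]  # placeholder for the forward pass


model = FakeModel()
embedder = M3EmbedderSketch(model)
for _ in range(10):
    embedder.encode_single_device(["hello", "world"])

print(model.to_calls, model.eval_calls)  # prints "1 1": setup ran once, not 10 times
```

With the 1.3.x placement, the two counters would instead reach 10 after ten encode calls, which matches the observed per-call overhead.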

The second observation is that the forward pass,

```python
self.model(...)
```

is now invoked at least twice instead of once, apparently just to adjust the batch size when an out-of-memory error occurs?
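A cheaper alternative is to retry only when an OOM error actually occurs, so the common case pays for exactly one forward pass per batch. The sketch below is an assumption about how such a backoff could look, not the library's implementation; forward() simulates a model whose memory limit is MAX_FITS items per call.

```python
# Hypothetical batch-size backoff: extra forward passes happen only on a
# (simulated) OOM error, never on the happy path.

MAX_FITS = 8        # simulated memory limit, in items per forward pass
forward_calls = 0   # counts model invocations

def forward(batch):
    global forward_calls
    forward_calls += 1
    if len(batch) > MAX_FITS:
        raise MemoryError("simulated CUDA OOM")
    return [len(x) for x in batch]  # placeholder for real embeddings

def encode_with_backoff(items, batch_size):
    results = []
    i = 0
    while i < len(items):
        batch = items[i:i + batch_size]
        try:
            results.extend(forward(batch))
            i += batch_size
        except MemoryError:
            if batch_size == 1:
                raise  # cannot shrink further
            batch_size //= 2  # retry the same slice with a smaller batch
    return results

items = [f"doc{n}" for n in range(8)]
out = encode_with_backoff(items, batch_size=8)
print(forward_calls)  # prints "1": the batch fits, so one forward pass
```

If the initial batch does not fit, the loop halves the batch size and retries the same slice, so extra model calls occur only in the failure case rather than on every encode.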

best,
Nikola
