Comparing the 1.2.x and 1.3.x code, there is up to a 100% performance regression during inference.
The slowdown shows up in repeated calls to model.encode: M3Embedder.encode_single_device is 2x slower than the original 1.2.x code.
One obvious suggestion is to remove the following from the encode_single_device function, since they only need to run once rather than on every call:

```python
self.model.to(device)
self.model.eval()
```
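A minimal sketch of that suggestion (hypothetical class names, not the actual FlagEmbedding code; a counting stand-in replaces the real torch model): do the one-time device placement and eval() switch in the constructor, so the per-call encode path stays cheap.

```python
class DummyModel:
    """Stand-in for the real torch model; counts how often setup runs."""

    def __init__(self):
        self.to_calls = 0
        self.eval_calls = 0

    def to(self, device):
        self.to_calls += 1
        return self

    def eval(self):
        self.eval_calls += 1
        return self


class Embedder:
    """Hypothetical embedder illustrating the proposed fix."""

    def __init__(self, model, device="cpu"):
        # One-time setup: moved here from the per-call encode path.
        self.model = model.to(device).eval()

    def encode(self, sentences):
        # Hot path: no device transfer or eval() toggling per call.
        return [len(s) for s in sentences]  # placeholder for real inference


emb = Embedder(DummyModel())
for _ in range(3):
    emb.encode(["hello", "world"])

# Setup ran exactly once despite three encode() calls.
assert emb.model.to_calls == 1
assert emb.model.eval_calls == 1
```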
The second observation is that self.model(...) is now invoked at least twice instead of once, apparently just to adjust the batch size on error.
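For comparison, a batch-size backoff can be written so the model runs exactly once per batch in the common case and is only re-invoked when an out-of-memory error actually occurs. This is a hedged sketch, not the library's actual code; forward() is a hypothetical stand-in for self.model(...) that simulates a CUDA OOM on oversized batches.

```python
def forward(batch, limit=8):
    """Stand-in for self.model(...); simulates OOM on large batches."""
    if len(batch) > limit:
        raise RuntimeError("CUDA out of memory (simulated)")
    return [len(x) for x in batch]


def encode_with_backoff(items, batch_size=32):
    """Call forward() once per batch; shrink the batch only on error."""
    out = []
    i = 0
    while i < len(items):
        bs = batch_size
        while True:
            try:
                out.extend(forward(items[i:i + bs]))
                break
            except RuntimeError:
                bs //= 2  # halve only when the error actually happens
                if bs == 0:
                    raise
        i += bs
    return out


# Each string "a" * n encodes to its length n.
assert encode_with_backoff(["a" * n for n in range(20)]) == list(range(20))
```

With this shape, a run that never hits the memory limit costs exactly one forward pass per batch; the extra invocations the issue describes would only occur on the error path.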
best,
Nikola