Bump sentence-transformers from 2.3.1 to 2.4.0 #5

Closed
wants to merge 1 commit into from

Conversation

dependabot[bot]
Contributor

@dependabot dependabot bot commented on behalf of github Feb 25, 2024

Bumps sentence-transformers from 2.3.1 to 2.4.0.

Release notes

Sourced from sentence-transformers's releases.

v2.4.0 - Matryoshka models, SOTA loss functions, prompt templates, INSTRUCTOR support

This release introduces numerous notable features that are well worth learning about!

Install this version with

pip install sentence-transformers==2.4.0

MatryoshkaLoss (#2485)

Dense embedding models typically produce embeddings with a fixed size, such as 768 or 1024. All further computations (clustering, classification, semantic search, retrieval, reranking, etc.) must then be done on these full embeddings. Matryoshka Representation Learning revisits this idea, and proposes a solution to train embedding models whose embeddings are still useful after truncation to much smaller sizes. This allows for considerably faster (bulk) processing.

Training

Training using Matryoshka Representation Learning (MRL) is quite elementary: rather than applying some loss function on only the full-size embeddings, we also apply that same loss function on truncated portions of the embeddings. For example, if a model has an embedding dimension of 768 by default, it can now be trained on 768, 512, 256, 128, 64 and 32. Each of these losses will be added together, optionally with some weight:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CoSENTLoss, MatryoshkaLoss
model = SentenceTransformer("microsoft/mpnet-base")
base_loss = CoSENTLoss(model=model)
loss = MatryoshkaLoss(model=model, loss=base_loss, matryoshka_dims=[768, 512, 256, 128, 64])

  • Reference: MatryoshkaLoss
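To make the "added together, optionally with some weight" step concrete, here is a minimal, library-free sketch of the idea. It is not the sentence-transformers implementation: the function names `matryoshka_style_loss` and `squared_distance` are hypothetical, and a toy squared-distance loss stands in for a real training loss such as CoSENTLoss.

```python
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Toy base loss (an assumption for illustration): squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def matryoshka_style_loss(
    emb_a: Vector,
    emb_b: Vector,
    base_loss: Callable[[Vector, Vector], float],
    dims: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Apply base_loss to the first d components for each d in dims and sum the results."""
    if weights is None:
        weights = [1.0] * len(dims)  # unweighted sum by default
    if len(weights) != len(dims):
        raise ValueError("weights and dims must have the same length")
    total = 0.0
    for d, w in zip(dims, weights):
        # Truncation here simply means keeping the leading d components.
        total += w * base_loss(emb_a[:d], emb_b[:d])
    return total


a = [1.0, 2.0, 3.0, 4.0]
b = [0.0, 0.0, 0.0, 0.0]
unweighted = matryoshka_style_loss(a, b, squared_distance, dims=[4, 2])
weighted = matryoshka_style_loss(a, b, squared_distance, dims=[4, 2], weights=[1.0, 0.5])
```

Because every truncated prefix contributes to the total, the leading components are pushed to carry useful information on their own, which is what makes later truncation viable.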

Inference

After a model has been trained using a Matryoshka loss, you can run inference with it using SentenceTransformer.encode. You must then truncate the resulting embeddings yourself, and renormalizing the truncated embeddings is recommended.

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

matryoshka_dim = 64
embeddings = model.encode(
    [
        "search_query: What is TSNE?",
        "search_document: t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map.",
        "search_document: Amelia Mary Earhart was an American aviation pioneer and writer.",
    ]
)
embeddings = embeddings[..., :matryoshka_dim]  # Shrink the embedding dimensions

# cos_sim normalizes its inputs, so no explicit renormalization is needed for this comparison
similarities = cos_sim(embeddings[0], embeddings[1:])
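
The truncate-then-renormalize step recommended above can be sketched without any library dependencies. This is an illustrative assumption, not part of the sentence-transformers API: `truncate_and_normalize` and `cosine_similarity` are hypothetical helper names, and plain Python lists stand in for embedding arrays.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(embedding: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components, then rescale the result to unit L2 norm."""
    truncated = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        return truncated  # an all-zero vector cannot be normalized
    return [x / norm for x in truncated]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


full = [3.0, 4.0, 12.0]
short = truncate_and_normalize(full, 2)
short_norm = math.sqrt(sum(x * x for x in short))
```

Truncation shrinks the vector's norm, so embeddings that were unit-length at full size no longer are. Renormalizing matters when downstream code uses dot products as a stand-in for cosine similarity; it is redundant when the similarity function normalizes its own inputs.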

... (truncated)

Commits
  • 9032631 Release v2.4.0
  • 578285d [docs] Address some small docs mistakes (#2498)
  • dbc0f16 Add F1 score evaluator for CrossEncoder. (#2493)
  • 579257a [feat] Allow saving a model to the Hub without providing a user + Upload Ma...
  • 5b24356 Move loss overview to "main" documentation (#2496)
  • 38383d5 [feat] Add prompt templates (#2477)
  • 3fc8da2 [feat] Add Matryoshka loss + examples + docs (#2485)
  • 20056c6 Ensure dtype consistency in Pooling forward method (#2492)
  • ecdda29 Slight improvements to docs phrasing (#2486)
  • 1eec036 [ci] On Ubuntu CI runner, use temporary directories as cache folders for so...
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot dependabot bot added the dependencies Pull requests that update a dependency file label Feb 25, 2024
@dependabot dependabot bot force-pushed the dependabot/pip/sentence-transformers-2.4.0 branch from 9092d8e to b12920b Compare February 26, 2024 05:39
Bumps [sentence-transformers](https://github.com/UKPLab/sentence-transformers) from 2.3.1 to 2.4.0.
- [Release notes](https://github.com/UKPLab/sentence-transformers/releases)
- [Commits](UKPLab/sentence-transformers@v2.3.1...v2.4.0)

---
updated-dependencies:
- dependency-name: sentence-transformers
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot force-pushed the dependabot/pip/sentence-transformers-2.4.0 branch from b12920b to 8a511c7 Compare February 26, 2024 05:42
Contributor Author

dependabot bot commented on behalf of github Feb 26, 2024

Looks like sentence-transformers is no longer a dependency, so this is no longer needed.

@dependabot dependabot bot closed this Feb 26, 2024
@dependabot dependabot bot deleted the dependabot/pip/sentence-transformers-2.4.0 branch February 26, 2024 05:42
Labels
dependencies Pull requests that update a dependency file