Commit

Resolved comments 2 PR#44: Missing comments, style changes, others
David-OC17 committed Nov 20, 2024
1 parent 55a342d commit d2d7d60
Showing 3 changed files with 17 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -144,7 +144,7 @@ Detailed model cards with more examples: [facebook/blaser-2.0-ref](https://huggi

### Classifying the toxicity of sentences with MuTox

-[MuTox](https://github.com/facebookresearch/seamless_communication/tree/main/src/seamless_communication/cli/toxicity/mutox), the first highly multilingual audio-based classifier (binary) and dataset with toxicity labels. The dataset consists of 20k audio utterances for English and Spanish, and 4k for the other 19 languages, and uses the multi-model and multilingual encoders from SONAR. The output of the MuTox classifier is a probability of the evaluated being _"toxic"_, according to the definition adopted in the corresponding dataset.
+[MuTox](https://github.com/facebookresearch/seamless_communication/tree/main/src/seamless_communication/cli/toxicity/mutox), the first highly multilingual audio-based classifier (binary) and dataset with toxicity labels. The dataset consists of 20k audio utterances for English and Spanish, and 4k for the other 19 languages, and uses the multi-model and multilingual encoders from SONAR. The output of the MuTox classifier is a logit of the evaluated being _"toxic"_, according to the definition adopted in the corresponding dataset.
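
The wording change above matters when reading the example outputs: values such as `-19.7812` are raw logits, not probabilities. A minimal sketch of recovering a probability from such a logit follows, assuming the usual sigmoid link for a binary classifier (an assumption, since the training objective is not stated in this diff). The helper name `logit_to_probability` is hypothetical and not part of SONAR.

```python
import math


def logit_to_probability(logit: float) -> float:
    """Map a raw logit to a probability with a numerically stable sigmoid.

    Hypothetical helper for illustration; assumes a sigmoid link.
    """
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    # For negative logits, exponentiate the logit itself to avoid overflow.
    z = math.exp(logit)
    return z / (1.0 + z)


# The logits quoted in the README examples map to probabilities very close to 0.
for value in (-19.7812, -58.0625):
    print(value, logit_to_probability(value))
```

On a tensor, `torch.sigmoid(x)` applies the same mapping elementwise.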

```Python
from sonar.models.mutox.loader import load_mutox_model
@@ -175,7 +175,7 @@ with torch.inference_mode():
     x = classifier(emb.to(device).to(dtype)) # tensor([[-19.7812]], device='cuda:0', dtype=torch.float16)

 with torch.inference_mode():
-    emb = t2vec_model.predict(["She worked hard and made a significant contribution to the team."], source_lang='fra_Latn')
+    emb = t2vec_model.predict(["She worked hard and made a significant contribution to the team."], source_lang='eng_Latn')
     x = classifier(emb.to(device).to(dtype)) # tensor([[-58.0625]], device='cuda:0', dtype=torch.float16)

with torch.inference_mode():
8 changes: 8 additions & 0 deletions sonar/cards/sonar_mutox.yaml
@@ -4,6 +4,14 @@
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

+"""
+This card is a duplicate of the original found at
+[Facebook Research's Seamless Communication repository]
+(https://github.com/facebookresearch/seamless_communication/blob/main/src/seamless_communication/cards/mutox.yaml).
+It is included here to prevent circular dependencies between the Seamless Communication
+repository and this project.
+"""
+
name: sonar_mutox
model_type: mutox_classifier
model_arch: mutox
7 changes: 7 additions & 0 deletions sonar/inference_pipelines/mutox_speech.py
@@ -39,6 +39,13 @@ def __init__(
         self.model.to(device).eval()
         self.mutox_classifier = mutox_classifier.to(device).eval()

+        if isinstance(mutox_classifier, str):
+            self.mutox_classifier = load_mutox_model(mutox_classifier, device=device,)
+        else:
+            self.mutox_classifier = mutox_classifier
+
+        self.mutox_classifier.to(device).eval()
+
     @classmethod
     def load_model_from_name(
         cls,
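
The lines added to `__init__` let the pipeline accept either a model card name or an already loaded classifier. A standalone sketch of that pattern is given below, under stated assumptions: `DummyClassifier` and `resolve_classifier` are hypothetical stand-ins written for illustration, not SONAR classes, and the `loader` argument takes the place of `load_mutox_model`.

```python
from typing import Callable, Union


class DummyClassifier:
    """Hypothetical stand-in for a loaded classifier (not the SONAR class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.device = "cpu"
        self.training = True

    def to(self, device: str) -> "DummyClassifier":
        # Mimics the chaining behavior of torch.nn.Module.to().
        self.device = device
        return self

    def eval(self) -> "DummyClassifier":
        # Mimics torch.nn.Module.eval(): switch off training mode.
        self.training = False
        return self


def resolve_classifier(
    classifier: Union[str, DummyClassifier],
    device: str,
    loader: Callable[[str], DummyClassifier] = DummyClassifier,
) -> DummyClassifier:
    """Return a ready-to-use classifier from a card name or an instance."""
    if isinstance(classifier, str):
        # A string is treated as a name and resolved through the loader.
        classifier = loader(classifier)
    # Only after the type check is it safe to call .to() and .eval().
    return classifier.to(device).eval()


by_name = resolve_classifier("sonar_mutox", "cuda:0")
by_instance = resolve_classifier(DummyClassifier("custom"), "cpu")
print(by_name.name, by_name.device, by_instance.name, by_instance.device)
```

In this sketch the type check runs before any method call on the argument, so a string never reaches `.to(device)`.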