
⬆️ Update dependencies #120

Merged
merged 13 commits into main from update-deps on Nov 29, 2024
Conversation

@tengomucho tengomucho commented Nov 27, 2024

What does this PR do?

This PR contains several dependency updates:

  • Jetstream v0.2.4
  • TGI v2.4.1
  • Transformers v4.46.3
  • Accelerate v1.1.1
  • PyTorch and PyTorch XLA v2.5.1
  • Safetensors v0.4.5

All of these are updates to newer versions; the required changes (some revealed by tests) were made accordingly.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

A warning appeared on an old version of torch xla (2.3.0), but that version is not supported anymore.
When building the docker container, a "too many open files" error sometimes occurs. Increasing the ulimit makes the error disappear.
Also align the Dockerfile with TGI's.
The config object for these models is used by the Jetstream code, but it does not completely match HF's config definitions. This adds a class that inherits from both config classes and makes the adjustments necessary to avoid errors.
It is still possible to import it through the modeling module, but this reduces the chance of importing transformers and torch xla before torch_xla2.
Scores tensors are converted from JAX to torch and back when calling the logits processors. This will be required in newer versions of transformers.
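
For illustration, a minimal sketch of such a round-trip; the helper name, the choice of processor, and the NumPy-based conversion are assumptions rather than the PR's actual code:

```python
import jax.numpy as jnp
import numpy as np
import torch
from transformers import LogitsProcessorList, TemperatureLogitsWarper


def process_scores(input_ids_jax, scores_jax):
    # Hypothetical round-trip: JAX scores -> torch for the HF logits
    # processors (newer transformers expects torch tensors) -> back to JAX.
    processors = LogitsProcessorList([TemperatureLogitsWarper(0.8)])

    # JAX -> torch, via host NumPy copies (the PR may use different helpers).
    input_ids = torch.tensor(np.asarray(input_ids_jax))
    scores = torch.tensor(np.asarray(scores_jax))

    scores = processors(input_ids, scores)

    # torch -> JAX for the rest of the Jetstream pipeline.
    return jnp.asarray(scores.numpy())
```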
This keeps dependencies coherent with accelerate and updates to a newer version.
@tengomucho tengomucho marked this pull request as ready for review November 28, 2024 16:58
@@ -4,6 +4,20 @@
from transformers import GenerationConfig, GenerationMixin, MixtralConfig


class MixtralConfigHf(MixtralConfig, mixtral_config.ModelArgs):

This is a bit brittle, as setting one argument will not update its aliased value. What is the benefit of adding this?

Collaborator Author
The Transformer class from Jetstream is the model class for Mistral, and it has a config variable that used to be an instance of ModelArgs. When I use the model in TGI and want to use the GenerationMixin methods for sampling, it ends up using the config, expecting it to be a transformers MixtralConfig instance, and it fails because it is not. This is why I came up with this solution. I could try using composition instead of inheritance to see if I can find a cleaner solution, though.
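
For illustration, a rough sketch of that inheritance idea; the ModelArgs import path and the mirrored field names are assumptions, not necessarily the PR's actual code:

```python
from transformers import MixtralConfig
# Jetstream's Mixtral ModelArgs; this import path is an assumption.
from jetstream_pt.third_party.mixtral import config as mixtral_config


class MixtralConfigHf(MixtralConfig, mixtral_config.ModelArgs):
    """A config that is both a transformers MixtralConfig (so GenerationMixin
    sampling accepts it) and a Jetstream ModelArgs (so the Transformer model
    can keep reading its usual fields)."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Mirror the HF fields onto ModelArgs-style names (illustrative names).
        self.dim = self.hidden_size
        self.n_layer = self.num_hidden_layers
        self.n_head = self.num_attention_heads
```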

Comment on lines 29 to 31
# args = GemmaConfigHf(**config.to_dict())
# args.device = device
# super().__init__(args, env)

Suggested change (remove these commented-out lines)
# args = GemmaConfigHf(**config.to_dict())
# args.device = device
# super().__init__(args, env)

Collaborator Author

This never happened 🧙‍♂️

Instead of assigning separate variables for Jetstream's config class, properties are added, so the same data is accessed and ambiguity is avoided.
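
A minimal sketch of that properties-based approach (attribute names are illustrative, not necessarily the ones the PR uses):

```python
from transformers import MixtralConfig


class MixtralConfigHf(MixtralConfig):
    # Expose ModelArgs-style names as read-only views over the HF fields,
    # so there is a single source of truth and no stale aliased copies.

    @property
    def dim(self):
        return self.hidden_size

    @property
    def n_layer(self):
        return self.num_hidden_layers

    @property
    def n_head(self):
        return self.num_attention_heads
```

This also addresses the earlier review concern: since the Jetstream-style names are computed from the HF fields, setting an argument can no longer leave an aliased copy out of sync.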
@tengomucho tengomucho merged commit a1919c2 into main Nov 29, 2024
5 checks passed
@tengomucho tengomucho deleted the update-deps branch November 29, 2024 14:57