⬆️ Update dependencies #120
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed fd862c7 to d8d290a
A warning appeared on an old version of torch xla (2.3.0), but that version is no longer supported.
When building the docker container, the build sometimes fails with a "too many open files" error. Increasing the ulimit makes the error disappear.
Also align the Dockerfile with TGI's.
Force-pushed d8d290a to 6bf378e
The config object variable for these models was used by the Jetstream code, but it does not completely match HF's config definitions. This creates a class that inherits from both classes and makes the adjustments necessary to avoid errors (sketched below).
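A minimal sketch of the idea, not the actual PR code: the Jetstream module path and the reconciliation details are assumptions.

```python
from transformers import MixtralConfig
from jetstream_pt.third_party.mixtral import config as mixtral_config


class MixtralConfigHf(MixtralConfig, mixtral_config.ModelArgs):
    """Config consumable both by Jetstream's model code and by HF's
    generation utilities (e.g. GenerationMixin)."""

    def __init__(self, **kwargs):
        # Initialize the HF side; any field where the two definitions
        # disagree would be reconciled here.
        MixtralConfig.__init__(self, **kwargs)
```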
It is still possible to import it by importing modeling, but this reduces the possibility of importing transformers and torch xla before xla2.
Conversions of score tensors from jax to torch and back are done when calling the logits processor. This will be required in newer versions of transformers (see the sketch below).
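A hedged sketch of that round-trip, using host numpy arrays as the exchange format; the function name is hypothetical and the PR may use torch_xla2's converters instead.

```python
import jax.numpy as jnp
import numpy as np
import torch


def process_scores(logits_processor, input_ids_jax, scores_jax):
    """Run an HF logits processor on jax scores by round-tripping to torch."""
    # jax -> torch (torch.tensor copies, so read-only jax buffers are fine)
    input_ids = torch.tensor(np.asarray(input_ids_jax))
    scores = torch.tensor(np.asarray(scores_jax))
    # transformers' logits processors take and return torch tensors
    scores = logits_processor(input_ids, scores)
    # torch -> jax
    return jnp.asarray(scores.numpy())
```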
This is to stay coherent with accelerate's dependencies, and to update to a newer version.
Force-pushed 6bf378e to f063f12
@@ -4,6 +4,20 @@
from transformers import GenerationConfig, GenerationMixin, MixtralConfig


class MixtralConfigHf(MixtralConfig, mixtral_config.ModelArgs):
This is a bit brittle, as setting one argument will not update its aliased value. What is the benefit of adding this?
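For illustration only, the kind of drift this comment points at; the aliased field names here are hypothetical (HF's num_hidden_layers vs a Jetstream-style n_layer).

```python
cfg = MixtralConfigHf(num_hidden_layers=32)
cfg.num_hidden_layers = 16
# With plain multiple inheritance the Jetstream-side name can be a
# separate attribute, so it may keep reporting the old value:
print(cfg.num_hidden_layers, cfg.n_layer)  # could print "16 32"
```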
The Transformer class from Jetstream is the model class for Mistral, and it has a config variable that used to be an instance of ModelArgs. When I use the model in TGI and want to use the GenerationMixin methods for sampling, they end up using the config and expect it to be a transformers MixtralConfig instance, so it fails because it is not. This is why I came up with this solution. I could try composition instead of inheritance to see if I can find a cleaner solution, though (a possible shape is sketched below).
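A hypothetical shape for that composition alternative, not code from the PR:

```python
class MixtralConfigHf:
    """Wrap an HF MixtralConfig and a Jetstream ModelArgs, delegating
    attribute lookups to whichever side defines them."""

    def __init__(self, hf_config, model_args):
        self._hf_config = hf_config
        self._model_args = model_args

    def __getattr__(self, name):
        # Called only when normal lookup fails: try HF first, then Jetstream.
        try:
            return getattr(self._hf_config, name)
        except AttributeError:
            return getattr(self._model_args, name)
```

One caveat: with composition, isinstance(config, MixtralConfig) checks would no longer pass, which is a point in favor of the inheritance approach.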
# args = GemmaConfigHf(**config.to_dict())
# args.device = device
# super().__init__(args, env)
Suggested change: delete these commented-out lines.
This never happened 🧙♂️
Instead of assigning separate variables for Jetstream's config class, properties are added, so both interfaces access the same data and ambiguity is avoided (see the sketch below).
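A minimal sketch of that change, with assumed field names:

```python
from transformers import MixtralConfig


class MixtralConfigHf(MixtralConfig):
    """Jetstream-style names exposed as properties over the HF fields,
    so there is a single source of truth."""

    @property
    def n_layer(self):
        # Jetstream reads n_layer; HF stores num_hidden_layers.
        return self.num_hidden_layers

    @n_layer.setter
    def n_layer(self, value):
        # Writing through the property updates the one underlying field,
        # so the two names can never drift apart.
        self.num_hidden_layers = value
```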
Force-pushed 9e1fead to 54183d7
What does this PR do?
This contains many dependency updates:
All of these move to newer versions. Required changes (some revealed by tests) were made accordingly.