-
Notifications
You must be signed in to change notification settings - Fork 625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Please provide an option to not depend on downloading model data #337
Comments
In the model tarball downloaded from git, the models include all float weights (e.g. for debugging), whereas for the Opus release tarball, the float data was removed to make the release as small as possible. |
What do you mean by downloading from git? The script is downloading from here https://media.xiph.org/opus/models/ Also it makes really hard to know which model tarball to update to, because the expected tarball commit is in the script only. If you are building in an environment without network access you need to know which filename to download beforehand.
Is there some kind of docs on how to do this? Also knowing the source of the model data would be great too. How is the model tarball produced? Thanks |
You can run scripts/shrink_model.sh to remove the debug float data from the extracted models. From there, you can just run "make dist" to build a tarball that has just what's needed. |
I also ran into this issue. Could you add the shrunken model data to the opus git repository, please, so that they're covered by your signed tags and opus can be built without any further downloads? The model tarballs are not an option as they're entirely unverified (and also very large). |
Maybe you can also provide an option to disable this new feature. |
The newer models are just too big for git. That being said, the main branch has been updated to verify them so they cannot be compromised on the download server. |
Looks like the process is now more or less documented. Would it be possible to still have this checksum a separate file so it wouldn't need to be parsed from a shell script? |
I'm not against, but I don't have the skills to update the Windows version of the script. @mklingb any thoughts on that? |
The batch file would probably only turn simpler by having the checksum in a separate file rather than having it parse the shell script. |
We build opus from git source and we'd like to not switch to tarballs or download the model tarball. Given the recent xz situation downloading tarballs with no trace on git is complicated and seems like a security issue.
Having an option to stop depending on that seems like it'd be great.
With the current meson configuration:
meson seems to fail on
meson.build:636:24: ERROR: File dnn/fargan_data.h does not exist.
and is not even aware of the download model script living in autogen.Whereas the download model script is passing an unknown commit https://github.com/xiph/opus/blob/ab4e83598e7fc8b2ce82dc633a0fc0c452b629aa/autogen.sh#L12C24-L12C31
What is the source of that commit? Can this model data be found on git somewhere? Can this be implemented as a git submodule?
Looking through the git history seems at some point there was a submodule, which was reverted in favour of downloading the model tarball.
Thanks!
The text was updated successfully, but these errors were encountered: