Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow multiple inputs #74

Open
schruefer opened this issue Nov 15, 2023 · 5 comments
Open

Allow multiple inputs #74

schruefer opened this issue Nov 15, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@schruefer
Copy link
Member

Currently audonnx.Model only takes a signal as input.
https://github.com/audeering/audonnx/blob/main/audonnx/core/model.py#L198-L209

For some models, however, it might be necessary to provide additional information.
E.g. in the case of an LSTM model, we would like to provide the last hidden state and the memory state as additional input.

This is currently impossible as all additional inputs are derived directly from the signal.

It would be great if we could provide an option to enable additional model inputs along side the signal.

@schruefer schruefer added the enhancement New feature or request label Nov 15, 2023
@hagenw
Copy link
Member

hagenw commented Nov 15, 2023

It's also an interesting question from the interface design perspective.
E.g. we could add additional keyword args for each input node that does not take the signal as input:

model(signal, sampling_rate, input_1='high')

The problem might be how to map the actual name of the input to the variable name as we cannot use stuff like input-with-fancy-name' as variable name.

Another solution would be that you provide a dictionary instead of the signal, then we can directly use the input names, e.g.

model({'signal': signal, 'state': 0.34}, sampling_rate)

Other ideas?

@hagenw
Copy link
Member

hagenw commented Nov 15, 2023

Or a combination of the two solutions and we provide a dict for all inputs that don't get the signal:

model(signal, sampling_rate, inputs={'state': 0.34})

@schruefer
Copy link
Member Author

I would be in favor of the second suggestion, with the dictionary as input.
If one would use onnxruntime directly, without audonnx one would use:

model_path = audeer.path(model_root, 'model.onnx')
onnx_model = ort.InferenceSession(model_path, providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])

output = self.onnx_model.run(['output_1', 'output_2'], {'signal': signal, 'additional_input': additional_input})

Which is already rather close to the dict approach.
I also think that this approach is the most future-proof, in the case at some point we don't even necessarily need a signal.

@hagenw
Copy link
Member

hagenw commented Nov 16, 2023

I'm slightly more in favor of:

model(signal, sampling_rate, inputs={'state': 0.34})

but I see your point that it does not cover the case, where we don't have an input signal.

So maybe we go indeed with

model({'signal': signal, 'state': 0.34}, sampling_rate)

In the implementation we would provide a default for sampling rate, e.g. sampling_rate=None so a user don't have to provide it if no input nodes needs it. I would set to None instead of a default 16000 as then we can raise an error if sampling rate should have been provided but wasn't.
For signal we could then support an array as input as we do now, or when a dict is provided it must contain the inputs for each input node.
Also this approach has a slight downside: we cannot support different sampling rates for different input nodes, but maybe this is ok?

@schruefer
Copy link
Member Author

I am also not fixed on my suggestion.

I'm slightly more in favor of:
model(signal, sampling_rate, inputs={'state': 0.34})
but I see your point that it does not cover the case, where we don't have an input signal.

I guess it's closer to our current solution and as we are focused on audio I also can't think of a case, where we would not start from a signal or something like a spectrogram derived from a signal.

So maybe we go indeed with
model({'signal': signal, 'state': 0.34}, sampling_rate)
In the implementation we would provide a default for sampling rate, e.g. sampling_rate=None so a user don't have to provide it if no input nodes needs it. I would set to None instead of a default 16000 as then we can raise an error if sampling rate should have been provided but wasn't.
For signal we could then support an array as input as we do now, or when a dict is provided it must contain the inputs for each input node.

Sounds great!

Also this approach has a slight downside: we cannot support different sampling rates for different input nodes, but maybe this is ok?

I also can't think of a case, where 2 different signals with 2 different sampling rates would be processed, by the same model.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants