[Feature]: Add RMSEnergyExtractor [audio feature extraction]? #698

Open
Mixomo opened this issue Sep 11, 2024 · 3 comments
Labels: enhancement, feature

Comments

@Mixomo

Mixomo commented Sep 11, 2024

Description

What Does RMSEnergyExtractor Do?

Calculates RMS Energy:

RMS (root mean square) energy measures the level of an audio signal. It is computed as the square root of the mean of the squared sample amplitudes over a given time window (frame). It gives an estimate of the "strength" or "intensity" of the sound, which can help in tasks such as volume normalization, sound event detection, or as a feature for speech synthesis and recognition models.
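
Written out for a single analysis frame, the definition looks like this. The symbols are my own assumptions and do not come from any existing implementation: x is the sample sequence, N the frame length, H the hop size between frames, and t the frame index.

```math
\mathrm{RMS}[t] = \sqrt{\frac{1}{N} \sum_{n=0}^{N-1} x[tH + n]^{2}}
```

As a sanity check, a steady sine wave of amplitude A gives an RMS of A/\sqrt{2} when the frame spans a whole number of periods.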

Usage in Model Training Context:

The RMSEnergyExtractor is used as part of the model's preprocessing pipeline. It specifically extracts energy features from the audio signal that are later used as input or auxiliary features for training purposes.
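
To make the idea concrete, here is a minimal sketch of what such an extractor could look like. It is not taken from RVC, Applio, or any other project: the class layout, the `frame_length` and `hop_length` defaults, and the zero-padding choice are all assumptions for illustration.

```python
import numpy as np


class RMSEnergyExtractor:
    """Hypothetical frame-level RMS extractor (names and defaults are illustrative)."""

    def __init__(self, frame_length: int = 2048, hop_length: int = 512):
        self.frame_length = frame_length
        self.hop_length = hop_length

    def __call__(self, audio: np.ndarray) -> np.ndarray:
        audio = np.asarray(audio, dtype=np.float64)
        # Zero-pad both ends so frame t is centred on sample t * hop_length.
        pad = self.frame_length // 2
        padded = np.pad(audio, pad, mode="constant")
        n_frames = 1 + len(audio) // self.hop_length
        rms = np.empty(n_frames)
        for t in range(n_frames):
            start = t * self.hop_length
            frame = padded[start:start + self.frame_length]
            # Root of the mean squared amplitude inside this frame.
            rms[t] = np.sqrt(np.mean(frame ** 2))
        return rms


if __name__ == "__main__":
    extractor = RMSEnergyExtractor(frame_length=4, hop_length=2)
    print(extractor(np.full(8, 0.5)))
```

The result is one energy value per hop, so in principle it could be stored next to other per-frame features (pitch, for example) if the same hop size is used. How it would actually be fed to the model is a separate question that this sketch does not address.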

Problem

Why is RMS Energy Useful in Voice-Related Model Training?

Volume Normalization: RMS energy allows the model to differentiate between audio segments with different energy or volume levels, which is crucial for generating natural and accurate voice synthesis.

Speech Feature Detection: It helps identify parts of the audio with vocal activity or silence, providing an additional signal that can improve model quality.
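
The second point can be sketched as a simple threshold on the per-frame RMS values. This is only an illustration under my own assumptions: the function name `voiced_mask` and the -40 dB default are hypothetical, and real voice activity detectors are usually more elaborate.

```python
import numpy as np


def voiced_mask(rms: np.ndarray, threshold_db: float = -40.0) -> np.ndarray:
    """Mark frames whose level is above threshold_db relative to the loudest frame."""
    rms = np.asarray(rms, dtype=np.float64)
    peak = rms.max() if rms.size else 0.0
    if peak <= 0.0:
        # Completely silent (or empty) input: nothing is voiced.
        return np.zeros(rms.shape, dtype=bool)
    # Level of each frame in dB relative to the peak; the floor avoids log(0).
    level_db = 20.0 * np.log10(np.maximum(rms, 1e-10) / peak)
    return level_db > threshold_db


if __name__ == "__main__":
    print(voiced_mask(np.array([0.0, 0.001, 0.5, 1.0])))
```

Measuring relative to the loudest frame keeps the threshold independent of the overall recording gain, at the cost of being sensitive to a single loud click.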

Proposed Solution

About a year ago, in the early days of RVC, I suggested this feature in the original repository. I had previously experimented with other SVC programs, and one of them integrated this function. It made the output much more expressive and natural, capturing the differences in timbre between soft and loud passages present in the training dataset.
Some time later, they replied that they had added something similar, but as a post-processing effect (what we know today in the interface as the input/output slider). That does not have the same effect, because it does not draw on the timbre variation and the singer's expressiveness learned from the dataset.

Here I mention the details (among other things that are no longer important):

RVC-Project/Retrieval-based-Voice-Conversion-WebUI#169 (comment)

I don't know if it will be feasible, or if the code needs a lot of tweaking, but I'm just leaving it written here. Maybe it can be implemented in future versions of Applio, not necessarily in the short term.

Thanks for reading :)

Alternatives Considered

I don't know of any other alternatives at the moment.

Mixomo added the enhancement and feature labels on Sep 11, 2024
@embis0126

This sounds like a great idea.

@blaisewf
Member

Your idea seems nice, but do you have a draft implementation of it?

@Mixomo
Author

Mixomo commented Sep 30, 2024

Unfortunately, these are the only references I can provide. It is always treated as a feature extractor.

https://github.com/search?q=RMSEnergyExtractor+NOT+language%3AHTML+NOT+language%3AreStructuredText+NOT+language%3AYAML&type=code
