I am trying to fit a Gaussian mixture model to a dataset using a weight for each sample. My weights span a very wide range, from 1 to ~1e9. The weights were calculated from a smooth function of the raw data, so there should be no abrupt changes in the weight for nearby values of the data. But the function is exponential, so the multiplicative range is unavoidably large.
If I run the fit without the weights, everything works as expected and I get a good fit to the distribution of the raw data. However, when the weights are included, I get a negative improvement on the first iteration, or within the first few iterations, in which case the negative improvement is larger than the total positive improvement up to that point, and the fit ends. The Kmeans initialization works fine and seems to incorporate the weights effectively (I know quite a bit about what the end result should look like in this case, so I can confirm that the weighted Kmeans results are close to the correct result).
I tried rescaling the weights to [0,1] but this did not improve matters. I have also tried turning up the inertia as high as I can to reduce the step size, but this seems to have no effect.
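To illustrate why rescaling may not be expected to help: dividing every weight by a constant leaves the ratio between the largest and smallest weight unchanged. Below is a minimal NumPy sketch of that check. The function name `weight_diagnostics` and the synthetic weights are illustrative assumptions, not the real data, and "rescaling to [0,1]" is assumed here to mean dividing by the maximum weight. It also reports the Kish effective sample size, (sum w)^2 / sum(w^2), which gives a rough sense of how many samples actually dominate a weighted fit.

```python
import numpy as np

def weight_diagnostics(w):
    """Return (max/min ratio, Kish effective sample size) for positive weights."""
    w = np.asarray(w, dtype=np.float64)
    dynamic_range = w.max() / w.min()
    # Kish effective sample size: invariant under multiplying w by a constant.
    ess = w.sum() ** 2 / np.sum(w ** 2)
    return dynamic_range, ess

# Synthetic stand-in for the real data (assumption): weights are a smooth
# exponential function of x, spanning exactly 1 to 1e9.
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
t = (x - x.min()) / (x.max() - x.min())   # t lies in [0, 1]
w = np.exp(np.log(1e9) * t)               # w lies in [1, 1e9]

raw_range, raw_ess = weight_diagnostics(w)
scaled_range, scaled_ess = weight_diagnostics(w / w.max())

print(f"raw:    range={raw_range:.3g}, effective samples={raw_ess:.1f}")
print(f"scaled: range={scaled_range:.3g}, effective samples={scaled_ess:.1f}")
```

Both lines should report the same range and the same effective sample size, since neither quantity depends on the overall scale of the weights. This is only a diagnostic; it does not by itself show where the fit goes wrong.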
I am using the latest version of Pomegranate on Windows 10.
I'm not quite sure how to provide a reproducible example, as the problem seems data-specific, except that I think it's the large number of orders of magnitude spanned by the weights that causes the problem. Artificially rescaling the weights to span a smaller multiplicative range alleviates the issue, but of course completely ruins the fit I am trying to achieve.
The fact that Kmeans seems to incorporate these weights just fine gives me some hope that there is a potential resolution to this issue.
Howdy. Sorry for the delay in getting back to you. Unfortunately, without an example it can be difficult for me to look deeper into the issue. I imagine that there is an overflow happening somewhere. What sorts of ranges did you find provided reasonable results?
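One possible way to get a shareable example without the original data is to generate a synthetic sample whose weights are a smooth exponential function of the data and whose multiplicative range can be dialed up or down. The sketch below is only an assumption about what such a test case might look like: the helper `make_weighted_sample`, the two-component mixture, and its parameters are all hypothetical. The fitting call itself is left out, since the exact Pomegranate version in use is not stated.

```python
import numpy as np

def make_weighted_sample(n=5_000, log10_range=9.0, seed=0):
    """Draw 1-D data from two Gaussians and attach smooth exponential weights.

    The weights run from exactly 1 up to 10**log10_range, so lowering
    log10_range shows at which range a weighted fit starts to misbehave.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical two-component mixture; replace with anything data-like.
    half = n // 2
    x = np.concatenate([
        rng.normal(loc=-2.0, scale=1.0, size=half),
        rng.normal(loc=3.0, scale=0.5, size=n - half),
    ])
    # Smooth, monotonic weight function of x, normalised to [0, 1] first.
    t = (x - x.min()) / (x.max() - x.min())
    w = 10.0 ** (log10_range * t)
    # Return X with shape (n, 1), the usual layout for mixture-model fitting.
    return x.reshape(-1, 1), w

X, w = make_weighted_sample(log10_range=9.0)
print(X.shape, w.min(), w.max())

# Sweep the range to find where the weighted fit stops converging.
for log10_range in (1.0, 3.0, 6.0, 9.0):
    X_i, w_i = make_weighted_sample(log10_range=log10_range)
    # ... pass X_i and w_i to the weighted fit here and record the result ...
```

Reporting the smallest `log10_range` at which the fit fails would also answer the question about which ranges give reasonable results.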