Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NotImplementedError: zero-inflated poisson HMM #1107

Closed
KunFang93 opened this issue Jun 20, 2024 · 4 comments
Closed

NotImplementedError: zero-inflated poisson HMM #1107

KunFang93 opened this issue Jun 20, 2024 · 4 comments

Comments

@KunFang93
Copy link

KunFang93 commented Jun 20, 2024

Hi Jacob,

I wondered if it is possible to implement zero-inflated poisson HMM in currently version of pomegranate? I saw an old unsolved issue with NotImplementedError. I tried

import numpy as np
import pandas as pd

def zero_inflated_poisson(size, lam=3, zero_prob=0.5):
    poisson_part = np.random.poisson(lam, size)
    zero_inflation = np.random.binomial(1, zero_prob, size)
    return poisson_part * zero_inflation

num_observations = 1000
num_features = 11
zero_prob = 0.5

data = np.array([zero_inflated_poisson(num_observations, zero_prob=zero_prob) for _ in range(num_features)]).T
data = data.reshape(1,1000,11)
model = DenseHMM([ZeroInflated(Poisson())]*20, verbose=True, random_state=34)
model.fit(data) 

but got same error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kfang/.conda/envs/mamba/envs/pomegranate/lib/python3.12/site-packages/pomegranate/hmm/_base.py", line 604, in fit
    logp += self.summarize(X_, sample_weight=w_, priors=p_).sum()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kfang/.conda/envs/mamba/envs/pomegranate/lib/python3.12/site-packages/pomegranate/hmm/dense_hmm.py", line 543, in summarize
    X, emissions, sample_weight = super().summarize(X,
                                  ^^^^^^^^^^^^^^^^^^^^
  File "/home/kfang/.conda/envs/mamba/envs/pomegranate/lib/python3.12/site-packages/pomegranate/hmm/_base.py", line 681, in summarize
    emissions = _check_inputs(self, X, emissions, priors)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kfang/.conda/envs/mamba/envs/pomegranate/lib/python3.12/site-packages/pomegranate/hmm/_base.py", line 28, in _check_inputs
    emissions = model._emission_matrix(X, priors=priors)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kfang/.conda/envs/mamba/envs/pomegranate/lib/python3.12/site-packages/pomegranate/hmm/_base.py", line 287, in _emission_matrix
    logp = node.log_probability(X)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kfang/.conda/envs/mamba/envs/pomegranate/lib/python3.12/site-packages/pomegranate/distributions/_distribution.py", line 64, in log_probability
    raise NotImplementedError
NotImplementedError

Thanks for your time and help!

Best,
Kun

@jmschrei
Copy link
Owner

Sorry for the delay in responding to these issues. The answer is that conceptually it's very simple to add this in but that, implementation-wise, I struggled a little bit because the zero-inflated wrapper needs access to the per-feature log probabilities but my current implementation only gives the total log probability across all features. I think I need to just implement specific ZIP and ZINB distributions, as those are the most widely used anyway. I don't know when I'd get a chance to do that, though, so I'd recommend you try implementing them yourself based on the Poisson and ZeroInflated wrapper I provided.

@KunFang93
Copy link
Author

Not a problem at all and thank you for the reply! I wondered if you could shed some lights on how to implement by the poisson and zero inflated wrapper? Is something like

states = []
for i in range(20): # assume 20 number of stae
    cur_d = ZeroInflated(Poisson())
    cur_d.fit(data)
    state.append(cur_d)
model = DenseHMM(states)
model.fit(data)

Any suggestion is greatly appreciated! Thank you so much for your time and help again~

Best,
Kun

@jmschrei
Copy link
Owner

You'd need to create a new distribution object that applies the formula in the zero inflated wrapper to the log probabilities BEFORE they are summed. The current objects only work if you only have one dimension.

@KunFang93
Copy link
Author

Got it. I will delve into it. Thanks for your instruction!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants