
Bpnet Added Channels Leads to Profile Predictions of 0 #32

Open
RachitSharma2001 opened this issue Aug 9, 2021 · 0 comments
RachitSharma2001 commented Aug 9, 2021

Hi,

I am running Bpnet with 3 added input channels (in addition to the one-hot input). The channels come from the following bigwig files: NT2 RNA-Seq rep 1, NT2 RNA-Seq rep 2, and NT2 Methyl-C-Seq. Besides adding these input channels, the only other change I made is batch-normalizing them (code provided below). The three added channels are not sparse.
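To be concrete about the input layout (a minimal sketch with made-up shapes, assuming the 3 tracks are simply concatenated after the 4 one-hot columns):

import numpy as np

batch, seq_len = 32, 1000

one_hot = np.zeros((batch, seq_len, 4), dtype=np.float32)      # DNA one-hot
added = np.random.rand(batch, seq_len, 3).astype(np.float32)   # RNA-Seq rep1/rep2 + Methyl-C tracks (made-up values)

seq = np.concatenate([one_hot, added], axis=-1)                # -> (32, 1000, 7)
print(seq.shape)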

After running Bpnet with these changes, the training results look completely normal, showing a drop in both training and validation loss. But when I then extract the model from the generated seqmodel.pkl file in the result directory and check its predictions on particular inputs (selected from the training set), I notice that all of Bpnet's profile predictions contain 0s, as shown in the example output below:


{'rep1/profile': array([[[4.6234442e-41, 1.6620867e-31],
         [1.2753405e-38, 1.9987589e-30],
         [1.3018063e-41, 2.2856440e-29],
         ...,
 
        [[0.0000000e+00, 0.0000000e+00],
         [0.0000000e+00, 0.0000000e+00],
         [0.0000000e+00, 0.0000000e+00],
         ...,
         [0.0000000e+00, 0.0000000e+00],
         [0.0000000e+00, 0.0000000e+00],
         [0.0000000e+00, 0.0000000e+00]]], dtype=float32),

This doesn't make sense: a predicted probability of 0 at an observed position should drive the profile log-likelihood to -inf (an infinite profile loss), yet during training the profile loss never blew up and kept improving. Is there a reason that these three added input channels, from the particular experiment that generated them, cause this issue?
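To spell out the reasoning (a minimal sketch, assuming the profile loss is a multinomial negative log-likelihood over the predicted per-position probabilities, which is only an approximation of Bpnet's actual loss):

import numpy as np

def profile_nll(pred_probs, obs_counts):
    # Multinomial negative log-likelihood (up to a constant): -sum_i counts_i * log(p_i).
    # A predicted probability of exactly 0 at an observed position makes this infinite.
    return -np.sum(obs_counts * np.log(pred_probs))

obs = np.array([3.0, 5.0, 2.0])
print(profile_nll(np.array([0.3, 0.5, 0.2]), obs))  # finite, ~10.3
print(profile_nll(np.array([0.0, 0.8, 0.2]), obs))  # inf: log(0) hits an observed position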

I've checked whether any divide-by-zero or NaN shows up during the batchnorm and found no such occurrence. I've also tested Bpnet without batch-normalizing the inputs and still see the same behavior. Finally, I've tested with other added channels (from files not produced by the experiment that generated the three troublesome files), and there Bpnet gives the expected predictions after training.
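The NaN/Inf check I mean is along these lines (a minimal sketch, not my exact code; begin/end mark the added channels):

import numpy as np

def check_finite(seq, begin, end):
    # Count NaNs/Infs in the normalized channels (columns begin..end of the 'seq' array).
    block = seq[:, :, begin:end + 1]
    print("NaNs:", np.isnan(block).sum(), "Infs:", np.isinf(block).sum())

check_finite(np.random.rand(32, 1000, 7), 4, 6)  # prints "NaNs: 0 Infs: 0" on this toy batch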

Here is the added batch norm code:

    # Normalize a specific column (channel) of a minibatch: subtract the
    # per-position mean over the batch and divide by the standard deviation.
    def normalize_column(self, np_row, col):
        row = np_row[:, :, col]
        mean = np.mean(row, axis=0)
        var = np.var(row, axis=0)
        row = np.subtract(row, mean)
        return np.divide(row, np.sqrt(var) + 1e-6)

    # Normalize every added channel of a minibatch.
    def normalize(self, data):
        # Copy the batch to a numpy array so the statistics are computed
        # on the original (un-normalized) values.
        np_row = np.array(data["seq"])

        # Normalize each specified column (batchnorm_begin..batchnorm_end are the added channels).
        for i in range(self.batchnorm_begin, self.batchnorm_end + 1):
            data["seq"][:, :, i] = self.normalize_column(np_row, i)

        return data
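
As a self-contained illustration of what this does to one added channel (a minimal sketch with made-up shapes, mirroring normalize_column rather than calling it):

import numpy as np

# Fake minibatch: (batch, sequence length, channels); pretend column 4 is one
# of the added bigwig tracks (shapes and values are made up for illustration).
seq = np.random.rand(32, 1000, 7).astype(np.float32)

col = seq[:, :, 4]
normalized = (col - col.mean(axis=0)) / (np.sqrt(col.var(axis=0)) + 1e-6)

# After normalization the channel has roughly zero mean and unit variance
# at every position across the batch.
print(normalized.mean(axis=0).max(), normalized.std(axis=0).mean())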