
Transfer learning process incorrect #5

Closed
mjanonis opened this issue Feb 26, 2019 · 3 comments

Comments

@mjanonis

mjanonis commented Feb 26, 2019

As far as I can tell, during the transfer learning process, you're already trying to make the network classify the videos.

Instead, the 2D CNN should take 32 RGB frames from a video at a certain timestamp, and the 3D CNN should take a video clip from the same timestamp. The network should then tell whether the 32 frames and the video clip match. That way you can use a huge dataset of unlabeled videos, because the labels don't matter.

It shouldn't be too difficult to fix. The video generator could pick a random video with a random timestamp and extract 32 frames from it. Then it could either feed that same data to the 3D CNN with a label of '1', since the data is from the same video, or pick another video and feed that data to the 3D CNN with a label of '0'. Something along the lines of the sketch below.
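
Just a rough sketch of what I mean, not the repo's code: `load_clip`, `resize_frames`, the shapes, and the 50/50 positive/negative split are placeholders I made up.

```python
import numpy as np

def load_clip(path, num_frames=32, size=256):
    # Placeholder: should return `num_frames` consecutive RGB frames taken at a
    # random timestamp of the video at `path`. Stubbed with random data here.
    return np.random.rand(num_frames, size, size, 3).astype("float32")

def resize_frames(clip, size=224):
    # Placeholder for resizing each frame to the 2D CNN's input resolution
    # (a simple crop here, just to keep the shapes consistent).
    return clip[:, :size, :size, :]

def pair_generator(video_paths, batch_size=4):
    while True:
        frames_2d, clips_3d, labels = [], [], []
        for _ in range(batch_size):
            anchor_clip = load_clip(np.random.choice(video_paths))
            frames_2d.append(resize_frames(anchor_clip))       # 32 frames for the 2D CNN
            if np.random.rand() < 0.5:
                clips_3d.append(anchor_clip)                   # same video/timestamp -> label 1
                labels.append(1.0)
            else:
                other = np.random.choice(video_paths)          # a different video -> label 0
                clips_3d.append(load_clip(other))
                labels.append(0.0)
        yield [np.stack(frames_2d), np.stack(clips_3d)], np.array(labels)
```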

@rekon
Owner

rekon commented Feb 26, 2019

Thanks for the correction. I have updated the development branch to do something quite similar; I just need to do a sanity check. I'll let you know once I've checked it on proper GPU machines.

@mjanonis
Author

I still don't understand how the authors fed the 2D CNN 32 images while inputting a video clip to the 3D CNN at the same time.

The 2D input has the shape (224, 224, 3), but it has to take 32 images, so the shape should then be (32, 224, 224, 3). The 3D CNN also takes an input shape of (32, 224, 224, 3).

Is there a way to feed the 2D CNN the 32 images one by one while only feeding the 3D CNN one video clip in a single batch? If that's impossible, you'd have to replace the input layer of the 2D CNN to match the one of the 3D CNN.

@rekon
Owner

rekon commented Feb 27, 2019

@MartynasJanonis, I agree with you.
The input shape to the 2D CNN is (?, 224, 224, 3) and to the 3D CNN is (?, 32, 256, 256, 3), where ? denotes the batch size. My initial approach was to use a batch size of 32 for the 2D CNN and 1 for the 3D CNN, but that didn't work and raised an error:
All input arrays (x) should have same number of samples. Got array of shapes : [(32, 224, 224, 3), (1, 32, 256, 256, 3)]
One way to tackle this is to use keras.layers.TimeDistributed on the input layer of the 2D CNN, roughly as in the sketch below. I have to do further digging on this before I can come up with a solution. If you find any other way, please comment.
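
A rough sketch of the idea (the encoders here are tiny stand-ins, not the actual 2D/3D networks from this repo; only the input shapes follow the ones above):

```python
from keras.models import Model, Sequential
from keras.layers import (Input, Conv2D, Conv3D, Dense, Concatenate,
                          GlobalAveragePooling1D, GlobalAveragePooling2D,
                          GlobalAveragePooling3D, TimeDistributed)

# Per sample: 32 RGB frames for the 2D branch, one clip for the 3D branch.
frames_in = Input(shape=(32, 224, 224, 3))
clip_in = Input(shape=(32, 256, 256, 3))

# Per-frame 2D encoder (stand-in for the real 2D CNN), applied to each of the
# 32 frames via TimeDistributed so both branches share the same batch axis.
frame_encoder = Sequential([
    Conv2D(16, 3, strides=2, activation='relu', input_shape=(224, 224, 3)),
    GlobalAveragePooling2D(),
])
frame_feats = TimeDistributed(frame_encoder)(frames_in)  # (batch, 32, 16)
frame_feats = GlobalAveragePooling1D()(frame_feats)      # (batch, 16)

# Stand-in 3D encoder for the clip branch.
clip_feats = Conv3D(16, 3, strides=2, activation='relu')(clip_in)
clip_feats = GlobalAveragePooling3D()(clip_feats)        # (batch, 16)

# Correspondence head: do the frames and the clip come from the same video?
match = Dense(1, activation='sigmoid')(Concatenate()([frame_feats, clip_feats]))

model = Model([frames_in, clip_in], match)
model.compile(optimizer='adam', loss='binary_crossentropy')
```

This way both inputs have the same number of samples per batch, which avoids the error above.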

Edit: Other people have expressed the same opinion in the original repo.
