
Transfer learning process incorrect #5

Closed
mjanonis opened this issue Feb 26, 2019 · 3 comments

Comments

@mjanonis

mjanonis commented Feb 26, 2019

As far as I can tell, during the transfer learning process, you're already trying to make the network classify the videos.

Instead, the 2D CNN should take 32 RGB frames from a video at a certain timestamp, and the 3D CNN should take a video clip from the same timestamp. The network should then tell whether the 32 frames and the video clip match. That way you can use a huge dataset of unlabeled videos, because the labels don't matter.

It shouldn't be too difficult to fix. The video generator could pick a random video with a random timestamp and extract 32 frames from it. Then it could either feed that same data to the 3D CNN with a label of '1', since the data is from the same video, or pick another video and feed that data to the 3D CNN with a label of '0'. Something along the lines of the sketch below.
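
Just a rough sketch of what I mean, not the repo's code: `load_clip`, `resize_frames`, the shapes, and the 50/50 positive/negative split are placeholders I made up.

```python
import numpy as np

def load_clip(path, num_frames=32, size=256):
    # Placeholder: should return `num_frames` consecutive RGB frames taken at a
    # random timestamp of the video at `path`. Stubbed with random data here.
    return np.random.rand(num_frames, size, size, 3).astype("float32")

def resize_frames(clip, size=224):
    # Placeholder for resizing each frame to the 2D CNN's input resolution
    # (a simple crop here, just to keep the shapes consistent).
    return clip[:, :size, :size, :]

def pair_generator(video_paths, batch_size=4):
    while True:
        frames_2d, clips_3d, labels = [], [], []
        for _ in range(batch_size):
            anchor_clip = load_clip(np.random.choice(video_paths))
            frames_2d.append(resize_frames(anchor_clip))       # 32 frames for the 2D CNN
            if np.random.rand() < 0.5:
                clips_3d.append(anchor_clip)                   # same video/timestamp -> label 1
                labels.append(1.0)
            else:
                other = np.random.choice(video_paths)          # a different video -> label 0
                clips_3d.append(load_clip(other))
                labels.append(0.0)
        yield [np.stack(frames_2d), np.stack(clips_3d)], np.array(labels)
```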

@rekon
Owner

rekon commented Feb 26, 2019

Thanks for the correction. I have updated the development branch to do something quite similar; I just need to do a sanity check. I'll let you know once I've checked it on proper GPU machines.

@mjanonis
Author

I still don't understand how the authors fed the 2D CNN 32 images while inputting a video clip to the 3D CNN at the same time.

The 2D input has the shape (224, 224, 3), but it has to take 32 images, so the shape should then be (32, 224, 224, 3). The 3D CNN also takes an input shape of (32, 224, 224, 3).

Is there a way to feed the 2D CNN the 32 images one by one while only feeding the 3D CNN one video clip in a single batch? If that's impossible, you'd have to replace the input layer of the 2D CNN to match the one of the 3D CNN.

@rekon
Owner

rekon commented Feb 27, 2019

@MartynasJanonis, I agree with you.
The input shape to the 2D CNN is (?, 224, 224, 3) and to the 3D CNN is (?, 32, 256, 256, 3), where ? denotes the batch size. My initial approach was to use a batch size of 32 for the 2D CNN and 1 for the 3D CNN, but that didn't work and raised an error:
All input arrays (x) should have same number of samples. Got array of shapes : [(32, 224, 224, 3), (1, 32, 256, 256, 3)]
One way to tackle this is to use keras.layers.TimeDistributed on the input layer of the 2D CNN, roughly as in the sketch below. I have to do further digging on this before I can come up with a solution. If you find any other way, please comment.
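
A rough sketch of the idea (the encoders here are tiny stand-ins, not the actual 2D/3D networks from this repo; only the input shapes follow the ones above):

```python
from keras.models import Model, Sequential
from keras.layers import (Input, Conv2D, Conv3D, Dense, Concatenate,
                          GlobalAveragePooling1D, GlobalAveragePooling2D,
                          GlobalAveragePooling3D, TimeDistributed)

# Per sample: 32 RGB frames for the 2D branch, one clip for the 3D branch.
frames_in = Input(shape=(32, 224, 224, 3))
clip_in = Input(shape=(32, 256, 256, 3))

# Per-frame 2D encoder (stand-in for the real 2D CNN), applied to each of the
# 32 frames via TimeDistributed so both branches share the same batch axis.
frame_encoder = Sequential([
    Conv2D(16, 3, strides=2, activation='relu', input_shape=(224, 224, 3)),
    GlobalAveragePooling2D(),
])
frame_feats = TimeDistributed(frame_encoder)(frames_in)  # (batch, 32, 16)
frame_feats = GlobalAveragePooling1D()(frame_feats)      # (batch, 16)

# Stand-in 3D encoder for the clip branch.
clip_feats = Conv3D(16, 3, strides=2, activation='relu')(clip_in)
clip_feats = GlobalAveragePooling3D()(clip_feats)        # (batch, 16)

# Correspondence head: do the frames and the clip come from the same video?
match = Dense(1, activation='sigmoid')(Concatenate()([frame_feats, clip_feats]))

model = Model([frames_in, clip_in], match)
model.compile(optimizer='adam', loss='binary_crossentropy')
```

This way both inputs have the same number of samples per batch, which avoids the error above.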

Edit: Other people have expressed the same opinion in the original repo.
