Changed video processing section (Unit 7 - CNN Based Video Model) #356

Merged
merged 6 commits into from
Nov 13, 2024

Conversation

cjfghk5697
Contributor

Changed chapters/en/unit7/video-processing/overview-of-previous-sota-models.mdx to chapters/en/unit7/video-processing/cnn-based-video-model.mdx in the video processing section. This document provides an overview of various CNN video architectures that integrate different kinds of modalities into a unified representation space.

Part of #348

Who can review? (Initial)
@jungnerd @cjfghk5697 @mreraser and anyone who wants to review!

@mreraser
Contributor

mreraser commented Oct 6, 2024

Hello @cjfghk5697 😄 I think you haven't updated the _toctree.yml file yet.

RuntimeError: The following files are not present in the table of contents:
- unit7/video-processing/cnn-based-video-model
Add them to ../computer-vision-course/chapters/en/_toctree.yml.
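
For illustration, here is a minimal sketch of the kind of check that raises this error. It assumes the table of contents has already been parsed into a list of sections whose pages carry a `local` key. The helper names (`collect_locals`, `find_missing_pages`) and the existing page name are hypothetical; this is not the doc builder's actual code.

```python
def collect_locals(toctree):
    """Recursively gather every `local` value from a nested toctree structure."""
    found = set()
    for entry in toctree:
        if "local" in entry:
            found.add(entry["local"])
        # Nested sections (assumed key name) are searched the same way.
        found |= collect_locals(entry.get("sections", []))
    return found


def find_missing_pages(toctree, pages):
    """Return the pages that exist on disk but are not listed in the toctree."""
    return sorted(set(pages) - collect_locals(toctree))


# Hypothetical parsed toctree: one unit with a single page already listed.
toctree = [
    {
        "title": "Unit 7",
        "sections": [
            {"local": "unit7/video-processing/some-existing-page", "title": "Existing page"},
        ],
    },
]

# Pages found in the chapters directory, including the newly added one.
pages = [
    "unit7/video-processing/some-existing-page",
    "unit7/video-processing/cnn-based-video-model",
]

missing = find_missing_pages(toctree, pages)
print(missing)
```

Under these assumptions, adding an entry with `local: unit7/video-processing/cnn-based-video-model` to the toctree makes the list of missing pages empty.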

Owner

@johko johko left a comment

Thank you for the contribution, it was a pleasure to read and learn 🙂

I just left a few small suggestions, mainly formatting-related.

chapters/en/_toctree.yml (outdated, resolved)

In this paper, the authors applied the ResNet architecture to the 3D CNNs. This approach introduces deeper models for 3D CNNs and achieves higher accuracy.

Experiments showed that the 3D ResNets (especially deeper ones like the ResNet-34) outperform models like the C3D, particularly on larger datasets. Pretrained models like Sports-1M C3D can help mitigate overfitting on smaller datasets. Overall, 3D ResNets effectively leverage deeper architectures to capture complex spatiotemporal patterns in the video data.
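
To make the idea of a residual connection around a 3D convolution concrete, here is a heavily simplified sketch in plain Python. It assumes a single channel, a cubic kernel of odd size, zero padding, and one convolution per block. Real 3D ResNet blocks stack several convolutions with normalization and are normally built with a deep learning framework. The function names are illustrative only.

```python
def conv3d_same(volume, kernel):
    """Single-channel 3D cross-correlation with zero padding (output keeps the input size).

    `volume` is indexed as [time][height][width]; `kernel` is a cubic nested list.
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    k = len(kernel)
    r = k // 2  # kernel radius, assuming an odd kernel size
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                acc = 0.0
                for dt in range(k):
                    for dy in range(k):
                        for dx in range(k):
                            tt, yy, xx = t + dt - r, y + dy - r, x + dx - r
                            # Positions outside the clip count as zeros (padding).
                            if 0 <= tt < T and 0 <= yy < H and 0 <= xx < W:
                                acc += kernel[dt][dy][dx] * volume[tt][yy][xx]
                out[t][y][x] = acc
    return out


def residual_block_3d(volume, kernel):
    """Compute ReLU(conv3d(volume) + volume): the identity shortcut of a residual block."""
    conv = conv3d_same(volume, kernel)
    return [
        [[max(0.0, c + v) for c, v in zip(conv_row, vol_row)]
         for conv_row, vol_row in zip(conv_plane, vol_plane)]
        for conv_plane, vol_plane in zip(conv, volume)
    ]


# A tiny clip: 2 frames of 2x2 pixels.
clip = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [2.0, 3.0]]]

# With an all-zero 3x3x3 kernel the block passes the input through the shortcut unchanged.
zero_kernel = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
passthrough = residual_block_3d(clip, zero_kernel)
print(passthrough == clip)
```

The shortcut is what lets deeper stacks of such blocks train: even when a convolution contributes nothing, the block still forwards its input across both the spatial and temporal dimensions.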
Owner

Here you mention C3D as if it is already known. Maybe briefly introduce it or link to the paper?

Contributor Author

Thank you for reviewing it! I’ve added a link to the C3D research paper.

Collaborator

@ATaylorAerospace ATaylorAerospace left a comment

LGTM

Owner

@johko johko left a comment

Thanks for the fixes. LGTM 👍

@johko johko merged commit 0a8c4fd into johko:stage Nov 13, 2024
2 checks passed