confusion about VIDAL-10M video-text data #44

Open
wli333 opened this issue Mar 13, 2024 · 0 comments

Thanks for your effort in pushing MLLMs to the next stage. I would like to follow your work, and I recently downloaded the VIDAL-10M video-text data file id2title_folder_raw_ofa_mplug_gpt_sound10076613.json.

I found that it contains around 10M video-text pairs. I have the following questions and hope you can give me some hints.

  1. What is the difference between this 10M video-text set and the 3M video-text set mentioned in your ICLR paper?
  2. In this 10M video-text set, I found that the raw field (title plus hashtags) of many videos still contains words like youtube and shorts. Take YouTube ID LbxMRY4_W10 for example: its raw is "I kicked this ball higher than Ja Morant can jump! #shorts #youtubeshorts #youtube #shortclips". But in your paper, you mention "we removed irrelevant words and hashtags, such as "youtube", "fyp", "shorts", etc". Was this filtering applied to the released raw field, or only to other fields?
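
For reference, here is a rough sketch of how I checked the second point. It assumes the JSON maps each video ID to a record with a "raw" string; that layout is my guess from the file name, not something documented. The word list contains only the three examples quoted from the paper, and the helper names are my own.

```python
import re

# Only the three examples quoted from the paper; the full list is unknown.
IRRELEVANT_WORDS = ("youtube", "fyp", "shorts")


def flagged_hashtags(raw: str) -> list[str]:
    """Return the hashtags in `raw` that contain any of the listed words."""
    tags = re.findall(r"#(\w+)", raw.lower())
    return [t for t in tags if any(w in t for w in IRRELEVANT_WORDS)]


def count_flagged(records: dict) -> int:
    """Count records whose assumed "raw" field has at least one flagged hashtag."""
    count = 0
    for record in records.values():
        raw = record.get("raw", "") if isinstance(record, dict) else ""
        if isinstance(raw, str) and flagged_hashtags(raw):
            count += 1
    return count


# Tiny in-memory stand-in for the real file (the second entry is made up).
sample = {
    "LbxMRY4_W10": {
        "raw": "I kicked this ball higher than Ja Morant can jump! "
               "#shorts #youtubeshorts #youtube #shortclips"
    },
    "hypothetical_id": {"raw": "A cat playing the piano"},
}
print(count_flagged(sample), "of", len(sample), "sample records are flagged")
```

On the real file, one would load it with json.load and pass the resulting dict to count_flagged, assuming the top-level structure matches the guess above. This only looks at hashtags, so plain words such as "youtube" inside a title are not counted.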

Thanks in advance.
