Thanks for your effort in pushing MLLMs to the next stage. I would like to follow up on your work, so I recently downloaded the VIDAL-10M video-text data file id2title_folder_raw_ofa_mplug_gpt_sound10076613.json.
I found that it contains around 10M video-text pairs. I have the following questions and hope you can give me some hints.
What is the difference between this 10M video-text set and the 3M video-text set mentioned in your ICLR paper?
Regarding this 10M video-text set, I found that the raw field (title plus hashtags) of many videos still contains words like "youtube" and "shorts". Take YouTube ID LbxMRY4_W10 for example: its raw field is "I kicked this ball higher than Ja Morant can jump! #shorts #youtubeshorts #youtube #shortclips". But your paper says "we removed irrelevant words and hashtags, such as "youtube", "fyp", "shorts", etc". Was this cleaning applied to the released file, or only to the subset used in the paper?
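For reference, here is a minimal sketch of the kind of check and cleanup I had in mind. It works on a raw string only and does not assume anything about the layout of the JSON file. The word list `IRRELEVANT` and the helpers `has_irrelevant_hashtag` and `strip_hashtags` are my own hypothetical names, not taken from your code, and the actual filtering used in the paper may well differ.

```python
import re

# Assumed word list: the paper names "youtube", "fyp", "shorts" and then says "etc",
# so the real list is probably longer.
IRRELEVANT = ("youtube", "fyp", "shorts")


def has_irrelevant_hashtag(raw: str) -> bool:
    """Return True if any hashtag in `raw` contains one of the assumed irrelevant words."""
    tags = re.findall(r"#(\w+)", raw.lower())
    return any(word in tag for tag in tags for word in IRRELEVANT)


def strip_hashtags(raw: str) -> str:
    """Drop every hashtag and collapse the leftover whitespace (a guess at the cleaning step)."""
    without_tags = re.sub(r"#\w+", "", raw)
    return re.sub(r"\s+", " ", without_tags).strip()


# The raw text of YouTube ID LbxMRY4_W10 as it appears in the released file.
raw = ("I kicked this ball higher than Ja Morant can jump! "
       "#shorts #youtubeshorts #youtube #shortclips")

flagged = has_irrelevant_hashtag(raw)
cleaned = strip_hashtags(raw)
```

Running `has_irrelevant_hashtag` over the raw field of each entry would give a rough count of how many entries still carry such hashtags.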
Thanks in advance.