Merge pull request #4 from tsaishien-chen/main: Push code
Showing 386 changed files with 44,352 additions and 168 deletions.
`README.md`:

Ming-Hsuan Yang,
Sergey Tulyakov
[![arXiv](https://img.shields.io/badge/arXiv-2402.19479-b31b1b.svg)](https://arxiv.org/abs/2402.19479)
[![Project Page](https://img.shields.io/badge/Project-Website-green)](https://snap-research.github.io/Panda-70M)
## Introduction
Panda-70M is a large-scale dataset with 70M high-quality video-caption pairs.
This repository has three sections:
- [Dataset Dataloading](./dataset_dataloading) includes the csv files listing the data of Panda-70M and the code to download the dataset.
- [Splitting](./splitting) includes the code to split a long video into multiple semantically consistent short clips.
- [Captioning](./captioning) includes the proposed video captioning model trained on Panda-70M.
## Dataset
### Collection Pipeline
<p align="center" width="100%">
<a target="_blank"><img src="assets/collection_pipeline.gif" style="width: 100%; min-width: 200px; display: block; margin: auto;"></a>
</p>
### Download
| Split | Download | # Source Videos | # Samples | Video Duration | Storage Space |
|-----------------|----------|-----------------|-----------|----------------|---------------|
| Training (full) | [link](https://drive.google.com/file/d/1DeODUcdJCEfnTjJywM-ObmrlVg-wsvwz/view?usp=sharing) (2.01 GB) | 3,779,763 | 70,723,513 | 167 khrs | ~36 TB |
| Training (10M) | [link](https://drive.google.com/file/d/1Lrsb65HTJ2hS7Iuy6iPCmjoc3abbEcAX/view?usp=sharing) (381 MB) | 3,755,240 | 10,473,922 | 37.0 khrs | ~8.0 TB |
| Training (2M) | [link](https://drive.google.com/file/d/1jWTNGjb-hkKiPHXIbEA5CnFwjhA-Fq_Q/view?usp=sharing) (86.5 MB) | 800,000 | 2,400,000 | 7.56 khrs | ~1.6 TB |
| Validation | [link](https://drive.google.com/file/d/1cTCaC7oJ9ZMPSax6I4ZHvUT-lqxOktrX/view?usp=sharing) (803 KB) | 2,000 | 6,000 | 18.5 hrs | ~4.0 GB |
| Testing | [link](https://drive.google.com/file/d/1ee227tHEO-DT8AkX7y2q6-bfAtUL-yMI/view?usp=sharing) (803 KB) | 2,000 | 6,000 | 18.5 hrs | ~4.0 GB |

More details can be found in the [Dataset Dataloading](./dataset_dataloading) section.
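Before kicking off a multi-terabyte download, it can help to inspect the csv first. Below is a minimal sketch using pandas; the file name and column names (`videoID`, `url`, `timestamp`, `caption`) are assumptions based on the dataset description, so check the actual csv header and the [Dataset Dataloading](./dataset_dataloading) docs for the authoritative schema:

```python
# Peek at a Panda-70M csv before downloading any videos.
# File and column names here are assumptions; verify against the real csv header.
import pandas as pd

df = pd.read_csv("panda70m_training_2m.csv")  # hypothetical name for the 2M split
print(df.columns.tolist())                    # confirm the real schema first
print(len(df), "clips")
print(df.head(3))                             # e.g. videoID, url, timestamp, caption
```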
## Demonstration
### Video-Caption Pairs in Panda-70M
<table class="center">
<tr>
<td width=33.3% style="border: none"><img src="./assets/aIPu1xGNbhc.49.gif"></td>
<td width=33.3% style="border: none"><img src="./assets/AIyw1FO1aqs.57.gif"></td>
<td width=33.3% style="border: none"><img src="./assets/Kb8ON0iCs38.97.gif"></td>
</tr>
<tr style="text-align: center;">
<td width=33.3% style="border: none">A rhino and a lion are fighting in the dirt.</td>
<td width=33.3% style="border: none">A person is holding a long haired dachshund in their arms.</td>
<td width=33.3% style="border: none">A rocket launches into space on the launch pad.</td>
</tr>
</table>

<table class="center">
<tr>
<td width=33.3% style="border: none"><img src="./assets/AvVDsFBc6bA.0.gif"></td>
<td width=33.3% style="border: none"><img src="./assets/S-1NdEjjg7c.58.gif"></td>
<td width=33.3% style="border: none"><img src="./assets/10Y6wIEuG00.62.gif"></td>
</tr>
<tr style="text-align: center;">
<td width=33.3% style="border: none">A person is kneading dough and putting jam on it.</td>
<td width=33.3% style="border: none">A little boy is playing with a basketball in the city.</td>
<td width=33.3% style="border: none">A 3d rendering of a zoo with animals and a train.</td>
</tr>
</table>

<table class="center">
<tr>
<td width=33.3% style="border: none"><img src="./assets/_uQs-YDb5VA.9.gif"></td>
<td width=33.3% style="border: none"><img src="./assets/CgcadSRtAag.140.gif"></td>
<td width=33.3% style="border: none"><img src="./assets/1NMpoAqzJfY.25.gif"></td>
</tr>
<tr style="text-align: center;">
<td width=33.3% style="border: none">A person in blue gloves is connecting an electrical supply to an injector.</td>
<td width=33.3% style="border: none">There is a beach with waves and rocks in the foreground, and a city skyline in the background.</td>
<td width=33.3% style="border: none">It is a rally car driving on a dirt road in the countryside, with people watching from the side of the road.</td>
</tr>
</table>
<sup>**We will remove any video sample from our dataset / GitHub / project webpage upon request. Please contact tsaishienchen at gmail dot com to make a request.</sup>

Please check [here](https://snap-research.github.io/Panda-70M/more_samples) for more samples.
### Long Video Splitting and Captioning
https://github.com/tsaishien-chen/Panda-70M/assets/43384650/481b369a-122b-4571-a83e-416201ebd6c9

https://github.com/tsaishien-chen/Panda-70M/assets/43384650/fee5468d-815f-41a7-8202-bdb3b60fcac7
## License of Panda-70M
See [license](https://github.com/tsaishien-chen/Panda-70M/blob/main/LICENSE).
The video samples are collected from a publicly available dataset.
Users must follow [the related license](https://raw.githubusercontent.com/microsoft/XPretrain/main/hd-vila-100m/LICENSE) to use these video samples.
## Citation
If you find this project useful for your research, please cite our paper. :blush:

```bibtex
@article{chen2024panda70M,
  title = {Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers},
  author = {Chen, Tsai-Shien and Siarohin, Aliaksandr and Menapace, Willi and Deyneka, Ekaterina and Chao, Hsiang-wei and Jeon, Byung Eun and Fang, Yuwei and Lee, Hsin-Ying and Ren, Jian and Yang, Ming-Hsuan and Tulyakov, Sergey},
  journal = {arXiv preprint arXiv:2402.19479},
  year = {2024}
}
```
## Contact Information
**Tsai-Shien Chen**: [[email protected]](mailto:[email protected])
`captioning/README.md`:
# 🐼 Panda-70M: Video Captioning

## Introduction
We propose a video captioning model that generates a caption for a short video clip.
The model includes a vision branch (green) and a textual branch (blue), so captioning can benefit from both video and text inputs.
We release the checkpoint trained on Panda-70M.
<p align="center" width="100%">
<a target="_blank"><img src="assets/architecture.png" style="width: 60%; min-width: 200px; display: block; margin: auto;"></a>
</p>
## Preparations
### Setup Repository and Environment
```
git clone https://github.com/tsaishien-chen/Panda-70M.git
cd Panda-70M/captioning
# create a conda environment
conda create --name panda70m_captioning python=3.9 -y
conda activate panda70m_captioning
pip install -r requirements.txt
# install a Java runtime (required by the PTB tokenizer used for evaluation)
apt-get update -y
apt-get install -y default-jre
```
### Download Checkpoint
You can manually download the file [here](https://drive.google.com/file/d/1Gjp5LrgGJobcFi3AaXvLnzlY7IWXyaI5/view?usp=sharing) (3.82 GB) and move it to the `checkpoint` folder, or run:
```
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1Gjp5LrgGJobcFi3AaXvLnzlY7IWXyaI5' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1Gjp5LrgGJobcFi3AaXvLnzlY7IWXyaI5" -O checkpoint/checkpoint_best.pth && rm -rf /tmp/cookies.txt
```
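If the cookie-based `wget` one-liner stops working (Google Drive changes its download interstitial from time to time), the `gdown` package is a possible alternative. This is a suggestion rather than part of the official instructions; the file ID is taken from the link above:

```python
# Alternative checkpoint download via gdown (pip install gdown).
# Not part of the official setup; the file ID comes from the Google Drive link above.
import gdown

gdown.download(
    "https://drive.google.com/uc?id=1Gjp5LrgGJobcFi3AaXvLnzlY7IWXyaI5",
    "checkpoint/checkpoint_best.pth",
    quiet=False,
)
```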
### Prepare Vicuna
- Please follow the [instructions](https://github.com/lm-sys/FastChat/blob/main/docs/vicuna_weights_version.md) from FastChat to install the **vicuna-7b-v0** weights.
- **[Note]** Vicuna v0 is distributed as delta weights, so you need to apply them to the original LLaMA weights first; once processed, move the merged weights to the `vicuna_weights/vicuna-7b-v0` folder with a file list like [this](https://github.com/tsaishien-chen/Panda-70M/blob/main/captioning/vicuna_weights/vicuna-7b-v0/README.md). A sketch of the delta-application step follows below.
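Here is a minimal sketch of applying the delta with FastChat's `apply_delta` helper. The import path, function signature, and local paths are assumptions based on FastChat's documented workflow; if your FastChat version differs, use the equivalent CLI form (`python3 -m fastchat.model.apply_delta ...`) described in the linked instructions:

```python
# Merge the Vicuna v0 delta into base LLaMA weights (a sketch; verify against FastChat docs).
from fastchat.model.apply_delta import apply_delta

apply_delta(
    base_model_path="path/to/llama-7b-hf",            # assumption: LLaMA-7B weights you obtained separately
    target_model_path="vicuna_weights/vicuna-7b-v0",  # where this repo expects the merged weights
    delta_path="lmsys/vicuna-7b-delta-v0",            # delta weights hosted on the Hugging Face Hub
)
```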
## Quick Demo
```
python inference.py --video-list inputs/video_list.txt --prompt-list inputs/prompt_list.txt
```
The code will caption two test videos listed in `video_list.txt`, using the extra textual information in `prompt_list.txt` as additional input. Here are some output examples:
<table class="center">
<tr style="line-height: 0">
<td width=30% style="border: none; text-align: center"><b>Input Video</b></td>
<td width=50% style="border: none; text-align: center"><b>Input Text</b></td>
<td width=20% style="border: none; text-align: center"><b>Output Caption</b></td>
</tr>
<tr>
<td width=30% style="border: none"><img src="assets/video1.gif" style="width:100%"></td>
<td width=50% style="border: none; text-align: center"><sup>
Some information about a video you will get:<br>
Transcription: Today we're gonna take a quick look at the 1966 Ford Mustang GT 289 v8 under the hood.<br>
Metadata: ['Old VS New - 1966 Ford Mustang GT & 2018 Ford Mustang | Just a Quick Look', 'Lets check out this beautiful 1966 Ford Mustang GT 289 in the showroom with the 2018 Ford Mustang!']<br>
Please look at the video and faithfully summarize it in one sentence.</sup></td>
<td width=20% style="border: none; text-align: center">A red mustang parked in a showroom with american flags hanging from the ceiling.</td>
</tr>
<tr>
<td width=30% style="border: none"><img src="assets/video2.gif" style="width:100%"></td>
<td width=50% style="border: none; text-align: center">Please faithfully summarize the following video in one sentence.</td>
<td width=20% style="border: none; text-align: center">An aerial view of a city with a river running through it.</td>
</tr>
</table>
<sup>**We will remove any video sample from our dataset / GitHub / project webpage upon request. Please contact tsaishienchen at gmail dot com to make a request.</sup>

- **[Note]** You might get different outputs due to the randomness of the LLM's generation.
## Evaluation
### Zero-shot Captioning Performance
| | BLEU-4 | ROUGE-L | METEOR | CIDEr | BertScore |
|------------|--------|---------|--------|-------|-----------|
| **MSRVTT** | 25.4% | 50.1% | 27.7% | 31.5% | 87.9% |
| **MSVD** | 32.8% | 61.2% | 35.3% | 49.2% | 90.2% |

- **[Note]** The results might not be perfectly reproducible due to the randomness of the LLM's generation and can deviate by around ±0.5%.
### Prepare Testing Data
- You can download the video samples here [[MSRVTT](https://www.robots.ox.ac.uk/~maxbain/frozen-in-time/data/MSRVTT.zip) / [MSVD](https://www.cs.utexas.edu/users/ml/clamp/videoDescription/)] and move them to the `test_datasets/video_samples/MSRVTT` or `test_datasets/video_samples/MSVD` folder.
- The caption annotations of the testing samples are already saved in the `test_datasets/anno_downstream` folder.
### Evaluation
```
# MSRVTT
python inference.py --video-list test_datasets/video_list/msrvtt_test.txt --output-json msrvtt_caption.json
python compute_results.py --predict-json msrvtt_caption.json --target-json test_datasets/anno_downstream/msrvtt_caption_test.json
# MSVD
python inference.py --video-list test_datasets/video_list/msvd_test.txt --output-json msvd_caption.json
python compute_results.py --predict-json msvd_caption.json --target-json test_datasets/anno_downstream/msvd_caption_test.json
```
## Acknowledgements
The code for video captioning is built upon [Video-LLaMA](https://github.com/DAMO-NLP-SG/Video-LLaMA).
Thanks for sharing the great work!
`checkpoint/README.md`:

Put the model checkpoint here
`compute_results.py`:
from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider
from bert_score import score as bert_score_compute
from collections import defaultdict
import argparse
import json


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Evaluation")
    parser.add_argument("--predict-json", required=True, help="prediction json file.")
    parser.add_argument("--target-json", required=True, help="ground truth json file.")
    args = parser.parse_args()

    # predictions: {video_name: caption}; targets: [{"video": ..., "caption": [...]}, ...]
    preds = json.load(open(args.predict_json))
    gt = json.load(open(args.target_json))
    pds = defaultdict(list)
    gts = defaultdict(list)
    pds_all = []
    gts_all = []

    for i, data in enumerate(gt):
        video, captions = data["video"], data["caption"]
        pds[i].append({"image_id": video, "caption": preds[video]})
        # repeat the prediction once per reference caption for corpus-level BERTScore
        pds_all += [preds[video]] * len(captions)

        for caption in captions:
            gts[i].append({"image_id": video, "caption": caption})
        gts_all += captions

    tokenizer = PTBTokenizer()
    pds = tokenizer.tokenize(pds)
    gts = tokenizer.tokenize(gts)
    scorers = [(Bleu(4), ["Bleu_1", "Bleu_2", "Bleu_3", "Bleu_4"]),
               (Meteor(), "METEOR"),
               (Rouge(), "ROUGE_L"),
               (Cider(), "CIDEr")]

    eval_dict = {}
    for scorer, method in scorers:
        score, scores = scorer.compute_score(gts, pds)
        if scorer.method() == "Bleu":
            eval_dict["BLEU4"] = score[3]  # keep only BLEU-4 from the four BLEU-n scores
        else:
            eval_dict[scorer.method()] = score

    _, _, score = bert_score_compute(pds_all, gts_all, lang='en', verbose=False)
    eval_dict["BERTScore"] = score.mean().item()

    for k, v in eval_dict.items():
        print("%s: %.2f%%" % (k, v * 100))
`eval_configs/panda70M_eval.yaml`:
model:
  arch: video_llama
  model_type: pretrain_vicuna
  input_prompt: True
  ckpt: "checkpoint/checkpoint_best.pth"

  # Q-Former
  num_query_token: 32

  # Vicuna
  llama_model: "vicuna_weights/vicuna-7b-v0"

  # Branch
  fusion_head_layers: 2
  max_frame_pos: 32
  fusion_header_type: "seqTransf"
  num_video_query_token: 32
  num_text_query_token: 32
  input_vid2tex_query_embed: True
  detach_video_query_embed: True

  max_caption_len: 48
  max_prompt_len: 200
  start_sym: "<s>"
  end_sym: "</s>"

datasets:
  hdvila:
    vis_processor:
      train:
        name: "alpro_video_eval"
        n_frms: 8
        image_size: 224
    text_processor:
      train:
        name: "blip_caption"

run:
  task: video_text_pretrain
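`inference.py` hands this file to Video-LLaMA's `Config` wrapper via `--cfg-path`; if you just want to inspect or tweak a field outside that wrapper, plain PyYAML is enough. A minimal sketch:

```python
# Inspect the eval config without going through the Video-LLaMA Config wrapper.
import yaml  # pip install pyyaml

with open("eval_configs/panda70M_eval.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["model"]["ckpt"])         # checkpoint/checkpoint_best.pth
print(cfg["model"]["llama_model"])  # vicuna_weights/vicuna-7b-v0
```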
`inference.py`:
import argparse
import json
import torch
from video_llama.common.config import Config
from video_llama.common.registry import registry
from video_llama.processors.video_processor import load_video
from tqdm import tqdm


class DotDict(dict):
    """dot.notation access to dictionary attributes"""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Inference")
    parser.add_argument("--cfg-path", default="eval_configs/panda70M_eval.yaml", help="path to configuration file.")
    parser.add_argument("--video-list", required=True, help="list of input videos.")
    parser.add_argument("--output-json", default=None, help="output json file. Leave none to print out the results.")
    parser.add_argument("--prompt-list", default=None, help="list of corresponding input prompts. Leave none if no prompt input.")
    args = parser.parse_args()
    cfg = Config(args)

    # build the captioning model from the config and move it to the GPU
    model_config = cfg.model_cfg
    model_cls = registry.get_model_class(model_config.arch)
    model = model_cls.from_config(model_config).to("cuda")
    model.eval()

    vis_processor_cfg = DotDict({"name": "alpro_video_eval", "n_frms": 8, "image_size": 224})
    vis_processor = registry.get_processor_class(vis_processor_cfg.name).from_config(vis_processor_cfg)
    text_processor_cfg = DotDict({"name": "blip_caption", "max_words": 100})
    text_processor = registry.get_processor_class(text_processor_cfg.name).from_config(text_processor_cfg)

    batch_size = 16

    videos = open(args.video_list, "r").read().splitlines()
    if args.prompt_list:
        # prompts are separated by blank lines in the prompt file
        prompts = open(args.prompt_list, "r").read().split("\n\n")

    results = {}
    for i in tqdm(range(0, len(videos), batch_size)):
        video_batch = []
        video_path_batch = []
        prompt_batch = []

        for j in range(i, min(i + batch_size, len(videos))):
            try:
                video_path = videos[j]
                video = load_video(video_path=video_path, n_frms=8, sampling="uniform")
                video = vis_processor.transform(video)
                assert video.shape == torch.Size([3, 8, 224, 224])
            except Exception as e:
                print(e)
                continue

            video_batch.append(video)
            video_path_batch.append(video_path.split('/')[-1])
            prompt_batch.append(prompts[j] if args.prompt_list else "Please faithfully summarize the following video in one sentence.")

        if not video_batch:
            # skip the batch entirely if every video in it failed to load
            continue

        video_batch = torch.stack(video_batch).to("cuda")
        outputs = model.inference(video_batch, prompt_batch)

        # zip keeps videos, prompts, and captions aligned even when some videos failed to load
        for video_path, prompt, output in zip(video_path_batch, prompt_batch, outputs):
            output = output.capitalize() + "."
            if args.output_json:
                results[video_path] = output
            else:
                print("=====" * 20)
                print("[Input video]", video_path)
                print("[Input prompt]")
                print(prompt)
                print("[Output caption]", output)

    if args.output_json:
        with open(args.output_json, "w") as f:
            f.write(json.dumps(results, indent=4))
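With `--output-json`, the script stores a flat `{video file name: caption}` mapping (see `results[video_path] = output` above). A quick way to eyeball a finished run, assuming the MSRVTT command from the Evaluation section produced `msrvtt_caption.json`:

```python
# Print a few captions from an inference run saved with --output-json.
import json

with open("msrvtt_caption.json") as f:   # produced by the MSRVTT command above
    captions = json.load(f)

for name, caption in list(captions.items())[:5]:
    print(f"{name} -> {caption}")
```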
`inputs/prompt_list.txt`:
Some information about a video you will get:
Transcription: Today we're gonna take a quick look at the 1966 Ford Mustang GT 289 v8 under the hood.
Metadata: ['Old VS New - 1966 Ford Mustang GT & 2018 Ford Mustang | Just a Quick Look', 'Lets check out this beautiful 1966 Ford Mustang GT 289 in the showroom with the 2018 Ford Mustang!']
Please look at the video and faithfully summarize it in one sentence.

Please faithfully summarize the following video in one sentence.
`inputs/video_list.txt`:
inputs/video1.mp4
inputs/video2.mp4