
Commit

Merge pull request #24 from tsaishien-chen/main
Update readme to fix link errors
AliaksandrSiarohin authored Mar 12, 2024
2 parents 70226bd + a3e3028 commit 4051ae4
Showing 5 changed files with 33 additions and 14 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -97,7 +97,7 @@ https://github.com/tsaishien-chen/Panda-70M/assets/43384650/fee5468d-815f-41a7-8

## License of Panda-70M

-See [license](https://github.com/tsaishien-chen/Panda-70M/blob/main/LICENSE).
+See [license](https://github.com/snap-research/Panda-70M/blob/main/LICENSE).
The video samples are collected from a publicly available dataset.
Users must follow [the related license](https://raw.githubusercontent.com/microsoft/XPretrain/main/hd-vila-100m/LICENSE) to use these video samples.

10 changes: 5 additions & 5 deletions captioning/README.md
@@ -11,17 +11,17 @@ We release the checkpoint trained on Panda-70M.
## Preparations
### Setup Repository and Environment
```
-git clone https://github.com/tsaishien-chen/Panda-70M.git
+git clone https://github.com/snap-research/Panda-70M.git
cd Panda-70M/captioning
# create a conda environment
conda create --name panda70m_captioning python=3.9 -y
conda activate panda70m_captioning
pip install -r requirements.txt
-# install ffmpeg
-apt-get update -y
-apt-get install -y default-jre
+# install default JRE
+apt update
+apt install default-jre
```
### Download Checkpoint
You can manually download the file [here](https://drive.google.com/file/d/1Gjp5LrgGJobcFi3AaXvLnzlY7IWXyaI5/view?usp=sharing) (3.82GB) and move it to the `checkpoint` folder or run:
@@ -30,7 +30,7 @@ wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download
```
### Prepare Vicuna:
- Please follow the [instructions](https://github.com/lm-sys/FastChat/blob/main/docs/vicuna_weights_version.md) from FastChat to install the **vicuna-7b-v0** weights.
-- **[Note]** You need to apply delta weights and after processed, the weights should be moved to `vicuna_weights/vicuna-7b-v0` folder with the file list like [this](https://github.com/tsaishien-chen/Panda-70M/blob/main/captioning/vicuna_weights/vicuna-7b-v0/README.md).
+- **[Note]** You need to apply delta weights; once processed, move the weights to the `vicuna_weights/vicuna-7b-v0` folder with a file list like [this](https://github.com/snap-research/Panda-70M/blob/main/captioning/vicuna_weights/vicuna-7b-v0/README.md).

## Quick Demo
```
26 changes: 21 additions & 5 deletions dataset_dataloading/README.md
@@ -1,7 +1,7 @@
# 🐼 Panda-70M: Dataset Dataloading
This section includes the csv files listing the data samples in Panda-70M and the code to download the videos.

-**[Note] Please use the video2dataset tool from this repository to download the dataset. As the video2dataset from [the official repository](https://github.com/iejMac/video2dataset) cannot work with our csv format for now. We are working on making Panda-70M downloadable through the official video2dataset.**
+**[Note] Please use the video2dataset tool from this repository to download the dataset, as the video2dataset from [the official repository](https://github.com/iejMac/video2dataset) cannot work with our csv format for now.**

## Data Splitting and Download Link
| Split | Download | # Source Videos | # Samples | Video Duration | Storage Space|
@@ -20,12 +20,11 @@ The section includes the csv files listing the data samples in Panda-70M and the
## Download Dataset
### Setup Repository and Environment
```
-git clone https://github.com/tsaishien-chen/Panda-70M.git
+git clone https://github.com/snap-research/Panda-70M.git
cd Panda-70M/dataset_dataloading/video2dataset
pip install -e .
cd ..
```

### Download Dataset
Download the csv files and change `<csv_file>` and `<output_folder>` arguments to download corresponding data.
```
@@ -37,8 +36,25 @@ video2dataset --url_list="<csv_file>" \
--save_additional_columns="[matching_score]" \
--config="video2dataset/video2dataset/configs/panda_70M.yaml"
```
-- **[Note 1]** If you get `HTTP Error 403: Forbidden` error, it might because your IP got banned. Please refer [this issue](https://github.com/yt-dlp/yt-dlp/issues/8785) and try to download the data by another IP.
-- **[Note 2]** You will get `"status": "failed_to_download"` and `"error_message": "[Errno 2] No such file or directory: '/tmp/*.mp4'"`, if the YouTube video has been set to private or removed.
+### Common Errors
+<table class="center">
+<tr style="line-height: 0">
+<td width=40% style="border: none; text-align: center"><b>Error Message</b></td>
+<td width=60% style="border: none; text-align: center"><b>Solution</b></td>
+</tr>
+<tr style="line-height: 0">
+<td width=40% style="border: none; text-align: center"><pre>HTTP Error 403: Forbidden</pre></td>
+<td width=60% style="border: none; text-align: center">Your IP got blocked. Please use a proxy for downloading. Refer to <a href="https://github.com/yt-dlp/yt-dlp/issues/8785">this issue</a>.</td>
+</tr>
+<tr style="line-height: 0">
+<td width=40% style="border: none; text-align: center"><pre>HTTP Error 429: Too Many Requests</pre></td>
+<td width=60% style="border: none; text-align: center">Your download request has hit a rate limit. Please slow down the download speed. Refer to <a href="https://github.com/iejMac/video2dataset/issues/267">this issue</a>.</td>
+</tr>
+<tr style="line-height: 0">
+<td width=40% style="border: none; text-align: center">In the json file:<pre>"status": "failed_to_download" & "error_message":<br>"[Errno 2] No such file or directory: '/tmp/...'"</pre></td>
+<td width=60% style="border: none; text-align: center">The YouTube video has been set to private or removed. Please skip this sample.</td>
+</tr>
+</table>
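After a batch finishes, the per-sample metadata can be scanned programmatically for the failure cases listed above. A minimal sketch, assuming only the `status` and `error_message` fields quoted in the table; the helper name and sample ids are hypothetical:

```python
import json

def collect_failures(metadata_blobs):
    """Map sample id -> error message for every sample whose metadata
    does not report success.

    `metadata_blobs` maps a sample id to the JSON text of its metadata
    file; the field names follow the messages quoted above.
    """
    failed = {}
    for sample_id, blob in metadata_blobs.items():
        meta = json.loads(blob)
        if meta.get("status") != "success":
            failed[sample_id] = meta.get("error_message", "")
    return failed

# Example metadata mimicking the "failed_to_download" case above.
blobs = {
    "00001": '{"status": "success"}',
    "00002": ('{"status": "failed_to_download", '
              '"error_message": "[Errno 2] No such file or directory"}'),
}
failed = collect_failures(blobs)
print(failed)  # {'00002': '[Errno 2] No such file or directory'}
```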

### Dataset Format
The code will download and store the data in the following format:
7 changes: 5 additions & 2 deletions splitting/README.md
@@ -4,13 +4,16 @@ The section includes the code to split a long video into multiple semantics-cons
## Video Splitting and Quick Demo
### Setup Repository and Environment
```
-git clone https://github.com/tsaishien-chen/Panda-70M.git
+git clone https://github.com/snap-research/Panda-70M.git
cd Panda-70M/splitting
# create a conda environment
conda create --name panda70m_splitting python=3.8.16 -y
conda activate panda70m_splitting
pip install -r requirements.txt
+# install ffmpeg
+apt install ffmpeg
```

### Step 1: Shot Boundary Detection
@@ -37,7 +40,7 @@ The code will process the videos listed in `video_list.txt` and stitch the seman
"video2.mp4": [["0:00:00.000", "0:00:23.723"], ["0:00:23.723", "0:00:52.685"], ["0:00:52.685", "0:01:22.682"], ["0:01:22.682", "0:02:00.019"]]
}
```
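The timecodes follow an `H:MM:SS.mmm` layout, so converting them to seconds (for example, to compute clip durations) is straightforward. A small sketch; the helper below is ours, not part of the repository:

```python
def timecode_to_seconds(tc: str) -> float:
    """Convert an 'H:MM:SS.mmm' timecode, as in event_timecode.json, to seconds."""
    hours, minutes, seconds = tc.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# Durations of the first two events of video2.mp4 above.
events = [["0:00:00.000", "0:00:23.723"], ["0:00:23.723", "0:00:52.685"]]
durations = [round(timecode_to_seconds(end) - timecode_to_seconds(start), 3)
             for start, end in events]
print(durations)  # [23.723, 28.962]
```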
-- **[Note]** We make several changes of the parameters for better splitting results. If you want to use the same parameters as we collect Panda-70M, you can run [this line](https://github.com/tsaishien-chen/Panda-70M/blob/039da730b38de93e40e1a2f0ed5653cf93edf89c/splitting/event_stitching.py#L195) and comment out line 194.
+- **[Note]** We made several changes to the parameters for better splitting results. If you want the same parameters used when collecting Panda-70M, run [line 200](https://github.com/snap-research/Panda-70M/blob/70226bd6d8ce3fc35b994b2d13273b57d5469da5/splitting/event_stitching.py#L200) and comment out [line 199](https://github.com/snap-research/Panda-70M/blob/70226bd6d8ce3fc35b994b2d13273b57d5469da5/splitting/event_stitching.py#L199).
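Step 3 below performs the actual cutting with `video_splitting.py`. As a rough illustration only, splitting one video at the event timecodes amounts to one `ffmpeg` invocation per event; the flags and output naming here are our assumptions, not the repository's actual behavior:

```python
def build_split_commands(video, events, output_folder="outputs"):
    """Build one ffmpeg command line per event (purely illustrative)."""
    stem = video.rsplit(".", 1)[0]
    commands = []
    for i, (start, end) in enumerate(events):
        out = f"{output_folder}/{stem}.{i}.mp4"
        # Stream-copy the segment between the two event timecodes.
        commands.append(["ffmpeg", "-i", video, "-ss", start, "-to", end,
                        "-c", "copy", out])
    return commands

cmds = build_split_commands("video2.mp4", [["0:00:00.000", "0:00:23.723"]])
print(cmds[0])
```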
### Step 3: Video Splitting
```
python video_splitting.py --video-list video_list.txt --event-timecode event_timecode.json --output-folder outputs
2 changes: 1 addition & 1 deletion splitting/event_stitching.py
@@ -197,7 +197,7 @@ def repl_func(match: re.Match):
cutscenes, cutscene_feature = verify_cutscene(cutscene, cutscene_raw_feature, cutscene_raw_status, transition_threshold=1.)
events_raw, event_feature_raw = cutscene_stitching(cutscenes, cutscene_feature, eventcut_threshold=0.6)
events, event_feature = verify_event(events_raw, event_feature_raw, fps, min_event_len=2.0, max_event_len=1200, redundant_event_threshold=0.0, trim_begin_last_percent=0.0, still_event_threshold=0.15)
-# events, event_feature = verify_event(events_raw, event_feature_raw, min_event_len=2.5, max_event_len=60, redundant_event_threshold=0.3, trim_begin_last_percent=0.1, still_event_threshold=0.15)
+# events, event_feature = verify_event(events_raw, event_feature_raw, fps, min_event_len=2.5, max_event_len=60, redundant_event_threshold=0.3, trim_begin_last_percent=0.1, still_event_threshold=0.15)
video_events[video_path.split("/")[-1]] = transfer_timecode(events, fps)

write_json_file(video_events, args.output_json_file)
