Problems when Downloading the Italian Dataset #12

david-gimeno · 2023-11-12T12:08:41Z

Hi,

I ran the following command to download the Italian dataset from MuAViC:

python get_data.py --root-path ./esperanza/ --src-lang it

However, at some point during the run the script was interrupted. Here is the full error trace:

Traceback (most recent call last):
  File "/home/dgimeno/phd/muavic/utils.py", line 62, in download_file
    wget.download(url, out=str(download_path / filename), bar=custom_bar)
  File "/home/dgimeno/anaconda3/envs/muavic/lib/python3.8/site-packages/wget.py", line 506, in download
    (fd, tmpfile) = tempfile.mkstemp(".tmp", prefix=prefix, dir=".")
  File "/home/dgimeno/anaconda3/envs/muavic/lib/python3.8/tempfile.py", line 331, in mkstemp
    return _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/home/dgimeno/anaconda3/envs/muavic/lib/python3.8/tempfile.py", line 250, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
FileNotFoundError: [Errno 2] No such file or directory: './esperanza/metadata/it_metadata.tgz88g65ab3.tmp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "get_data.py", line 115, in <module>
    main(args)
  File "get_data.py", line 84, in main
    prepare_mtedx(args)
  File "get_data.py", line 26, in prepare_mtedx
    preprocess_mtedx_video(
  File "/home/dgimeno/phd/muavic/mtedx_utils.py", line 220, in preprocess_mtedx_video
    video_metadata = load_video_metadata(
  File "/home/dgimeno/phd/muavic/utils.py", line 110, in load_video_metadata
    download_extract_file_if_not(
  File "/home/dgimeno/phd/muavic/utils.py", line 89, in download_extract_file_if_not
    download_file(url, download_path)
  File "/home/dgimeno/phd/muavic/utils.py", line 65, in download_file
    raise HTTPError(e.url, e.code, message, e.hdrs, e.fp)
AttributeError: 'FileNotFoundError' object has no attribute 'url'
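
The second exception in the trace suggests that the except branch in download_file re-wraps whatever went wrong as an HTTPError, which fails when the original error is a FileNotFoundError with no url attribute. Below is a minimal sketch of a more defensive variant. It is not the MuAViC implementation: the `fetch` parameter is a hypothetical stand-in for `wget.download`, and creating the target directory up front is an assumption about what might avoid the missing-path error.

```python
from pathlib import Path
from urllib.error import HTTPError


def download_file(url, download_path, fetch):
    """Hypothetical variant; fetch(url, out) stands in for wget.download."""
    download_path = Path(download_path)
    # Assumption: make sure the target directory exists before downloading,
    # so temp-file creation cannot fail on a missing folder.
    download_path.mkdir(parents=True, exist_ok=True)
    filename = url.rsplit("/", 1)[-1]
    target = download_path / filename
    try:
        fetch(url, str(target))
    except HTTPError as e:
        # Only HTTP errors carry url/code/hdrs/fp, so only they are re-wrapped.
        message = f"Could not download {url}: {e.reason}"
        raise HTTPError(e.url, e.code, message, e.hdrs, e.fp) from e
    # Any other exception (e.g. FileNotFoundError) propagates unchanged,
    # so the original cause stays visible in the traceback.
    return target
```

With this shape, a filesystem problem surfaces as the original FileNotFoundError instead of being masked by an AttributeError.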


Anwarvic · 2024-01-05T22:19:13Z

Hi @david-gimeno ,

Thank you for raising this issue and so sorry for the late reply!

I couldn't replicate your error on my machine. However, I would suggest deleting the leftover it_metadata.tgz88g65ab3.tmp file from your download directory (./esperanza/metadata/ in your trace). I think this file wasn't fully downloaded, which is why it has the .tmp suffix. Once it is deleted, the script should recognize that the file is missing and try to download it again.
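
If several interrupted downloads have left files behind, the cleanup can be scripted. This is only a rough sketch: `remove_stale_tmp_files` is a hypothetical helper, not part of the MuAViC scripts, and it assumes that every .tmp file under the root path is a leftover from an interrupted download.

```python
from pathlib import Path


def remove_stale_tmp_files(root):
    """Delete leftover *.tmp files under root and return the removed paths."""
    removed = []
    # rglob searches the root directory and all of its subdirectories.
    for tmp in sorted(Path(root).rglob("*.tmp")):
        if tmp.is_file():
            tmp.unlink()
            removed.append(tmp)
    return removed
```

For example, calling remove_stale_tmp_files("./esperanza") before re-running get_data.py would clear such files under the root path used in the original command.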

Hope this helps!

david-gimeno · 2024-01-14T13:34:41Z

No worries for the late reply :) Thanks, your suggestion worked!

However, I would like to point out that the number of videos available for download is decreasing. Consequently, one day there will not be enough videos left for further research to provide fair comparisons with previous studies in audio-visual or visual-only settings. The audio waveforms are not affected, since they come from the MTEDx corpus.

Although I understand what this means and the infrastructure it could require, I think the database should be shared in a different way, similar to LRS3, so that it does not depend on the availability of the video clips on YouTube.

Kind regards!
