
speeding up create_raw_dataset.py #56

Open
ljj7975 opened this issue Feb 25, 2021 · 4 comments
ljj7975 commented Feb 25, 2021

create_raw_dataset.py takes quite a long time to generate datasets.

I think multi-threading the AudioDatasetMetadataWriter writes would do the job.

Also, this process terminates with a segfault.
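The threading idea above could be sketched roughly like this (all names here are hypothetical stand-ins, not howl's actual AudioDatasetMetadataWriter API): offload the JSON encoding of each metadata entry to a small thread pool, and guard the shared output file with a lock.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor


class ThreadedMetadataWriter:
    """Hypothetical sketch: encode metadata entries on a thread pool
    and serialize the actual file writes with a lock."""

    def __init__(self, path, num_workers=4):
        self.file = open(path, "w")
        self.lock = threading.Lock()
        self.pool = ThreadPoolExecutor(max_workers=num_workers)

    def write(self, metadata: dict) -> None:
        # Hand the entry off to a worker thread and return immediately
        self.pool.submit(self._write_one, metadata)

    def _write_one(self, metadata: dict) -> None:
        line = json.dumps(metadata)  # encoding happens off the caller's thread
        with self.lock:              # only one thread touches the file at a time
            self.file.write(line + "\n")

    def close(self) -> None:
        self.pool.shutdown(wait=True)  # drain pending writes before closing
        self.file.close()
```

Note the output order is not guaranteed to match submission order; if the dataset format requires ordered entries, the writes would need a sequencing step on top of this.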

@ljj7975 ljj7975 self-assigned this Feb 25, 2021

ljj7975 commented Mar 7, 2021

The segfault was caused by numba:
numba/numba#4323


ljj7975 commented Mar 12, 2021

I spent some time applying one of the multiprocessing packages, but the results weren't that good.
Please refer to https://github.com/castorini/howl/tree/multi_processing_test


ljj7975 commented Apr 20, 2021

When writing a dataset, the process function should also take in the sample (AudioClipExample) and use sample.audio_data when metadata.path does not exist (https://github.com/castorini/howl/blob/master/howl/data/dataset/serialize.py#L67-L72)
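A minimal sketch of that fallback, using a simplified stand-in type (the real AudioClipExample and the signature in serialize.py differ):

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AudioClipExample:
    """Simplified stand-in for howl's sample type."""
    metadata_path: Path   # where the audio is expected on disk
    audio_data: bytes     # in-memory audio, used as a fallback


def process(sample: AudioClipExample) -> bytes:
    # Prefer the audio file on disk; fall back to the in-memory
    # audio_data when the path from the metadata does not exist.
    if sample.metadata_path.exists():
        return sample.metadata_path.read_bytes()
    return sample.audio_data
```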

@ColonelThirtyTwo
A simple way to speed this up is to call out to ffmpeg in AudioDatasetWriter rather than doing the conversions in Python (which is slow).
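One way to sketch that (a hypothetical helper, assuming an `ffmpeg` binary on PATH; the flag choices are illustrative defaults, not what howl actually needs):

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src: Path, dst: Path, sample_rate: int = 16000) -> list:
    """Assemble an ffmpeg invocation converting src to mono audio at the
    target sample rate; the output format is inferred from dst's extension."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", str(src),           # input audio file
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the target rate
        str(dst),
    ]


def convert(src: Path, dst: Path, sample_rate: int = 16000) -> None:
    # check=True raises CalledProcessError if ffmpeg exits non-zero
    subprocess.run(build_ffmpeg_cmd(src, dst, sample_rate), check=True)
```

Since ffmpeg does the decode/resample in native code, this avoids the slow Python-side conversion entirely; the cost is a subprocess spawn per file, which could be amortized by batching if needed.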

@jacobk52 jacobk52 removed their assignment Sep 1, 2021