slow speed #3

Open
zwbx opened this issue Mar 23, 2024 · 1 comment

zwbx commented Mar 23, 2024

Hi, thanks for your work. I have generated my own dataset successfully. However, I notice that the processing speed is a bit slow.
Since I am not an expert in TensorFlow, I would like to ask whether all the settings are correct. I followed the instructions to enable parallelized data processing, but I am not sure whether it is actually working.

(rlds_env) wenbo@wenbo-4090:~/Documents/data/rlds_dataset_builder/RLBench_dataset$ tfds build --overwrite --beam_pipeline_options="direct_running_mode=multi_processing,direct_num_workers=10"
INFO[build.py]: Loading dataset from path: /media/wenbo/12T/rlds_dataset_builder/RLBench_dataset/RLBench_dataset_dataset_builder.py
2024-03-23 20:01:38.474182: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-03-23 20:01:38.495278: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-23 20:01:38.753751: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-23 20:01:38.853793: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:38.867960: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:38.868039: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:39.080119: W tensorflow/tsl/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata.google.internal".
INFO[resolver.py]: Using /tmp/tfhub_modules to cache modules.
2024-03-23 20:01:40.125738: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:40.125853: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:40.125898: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:40.167775: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:40.167863: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:40.167917: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:40.167973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21885 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:01:00.0, compute capability: 8.9
INFO[load.py]: Fingerprint not found. Saved model loading will continue.
INFO[build.py]: download_and_prepare for dataset rl_bench_dataset/1.0.0...
INFO[native_type_compatibility.py]: Using Any for unsupported type: typing.Sequence[~T]
INFO[bigquery.py]: No module named google.cloud.bigquery_storage_v1. As a result, the ReadFromBigQuery transform CANNOT be used with method=DIRECT_READ.
INFO[dataset_builder.py]: Generating dataset rl_bench_dataset (/home/wenbo/tensorflow_datasets/rl_bench_dataset/1.0.0)
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /home/wenbo/tensorflow_datasets/rl_bench_dataset/1.0.0...
Generating splits...: 0%| 2024-03-23 20:01:46.127202: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2024-03-23 20:01:46.328955: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:606] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
Generating train examples...: 5 examples [00:32, 6.87s/ examples]
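For reference, the same direct-runner settings can also be passed programmatically instead of via the CLI flag. This is only a minimal sketch, assuming the standard tfds / apache_beam APIs; the builder import is hypothetical and should match your generated builder class. Note that these Beam options only take effect if the builder's _generate_examples actually returns a Beam pipeline; a plain Python generator is processed sequentially regardless.

```python
import apache_beam as beam
import tensorflow_datasets as tfds

# Hypothetical import: use the builder class from your own
# RLBench_dataset_dataset_builder.py file.
from RLBench_dataset_dataset_builder import RLBenchDataset

# Same settings as --beam_pipeline_options="direct_running_mode=multi_processing,direct_num_workers=10"
beam_options = beam.options.pipeline_options.PipelineOptions(
    direct_running_mode="multi_processing",  # worker processes instead of threads
    direct_num_workers=10,                   # match your CPU budget
)

builder = RLBenchDataset()
builder.download_and_prepare(
    download_config=tfds.download.DownloadConfig(beam_options=beam_options)
)
```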

@kpertsch (Owner)

Hi,
thanks for your interest! I suggest switching to the multi-threaded branch; you can likely copy over most of your code changes. The parallelization on the main branch turned out not to work very well, and in the multi-threaded branch I manually parallelize the processing, which seems to work more reliably! A rough sketch of that kind of manual parallelization is shown below.
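
For illustration only, manual parallelization of episode processing could look roughly like the following. This is a minimal sketch with hypothetical file paths and helper names, not the actual code from the multi-threaded branch: episode files are split across worker processes and converted independently before being handed to the dataset builder.

```python
from concurrent.futures import ProcessPoolExecutor
import glob

N_WORKERS = 10  # hypothetical setting; tune to your CPU count


def process_episode(path):
    """Hypothetical helper: load one raw episode file and convert it into
    the per-step dicts expected by the RLDS builder."""
    # Real conversion (image decoding, action extraction, ...) would go here.
    return {"episode_path": path}


def process_chunk(paths):
    # Each worker converts its own chunk of episode paths.
    return [process_episode(p) for p in paths]


if __name__ == "__main__":
    # Hypothetical data layout; adjust the glob to your raw episode files.
    paths = sorted(glob.glob("data/train/episode_*.npy"))
    chunks = [paths[i::N_WORKERS] for i in range(N_WORKERS)]
    with ProcessPoolExecutor(max_workers=N_WORKERS) as pool:
        episodes = [ep for chunk in pool.map(process_chunk, chunks) for ep in chunk]
    print(f"Processed {len(episodes)} episodes with {N_WORKERS} workers")
```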
