Unable to install TensorRT after recent CM changes #914

Closed
WarrenSchultz opened this issue Aug 27, 2023 · 4 comments

WarrenSchultz commented Aug 27, 2023

I recently used this script on a couple of other machines, but on a new, clean machine I just spun up (Ubuntu 22.04 on WSL2, like the other machines) it fails during the TensorRT install.
It also detects the wrong number of GPUs (4 instead of 2), but I can't diagnose that until I'm sure all the prerequisites are installed properly.

This is the output when I run the install script (either get tensorrt _dev or just get tensorrt).
get-cudnn still works fine.

I haven't been able to find anything in the docs about whether there's a way to pull a specific timestamp, git commit, etc., so that multiple machines run identical code. (I thought I saw it before, but it's not coming up in searches now, if it exists.)

Thanks



cmr "get tensorrt _dev" --tar_file=/home/ptuser/nvmestorage/nvidia-prereq/TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-12.0.tar
* cm run script "get tensorrt _dev"
  * cm run script "detect os"
  * cm run script "get python3"
      - More than 1 cached script output found for "get,python3":

        0) /home/ptuser/CM/repos/local/cache/9e5a860af737467d (get,python3,python,get-python,get-python3,virtual,name-mlperf-tests,script-artifact-d0b5dd74373f4a62,version-3.10.12) (Version 3.10.12)
        1) /home/ptuser/CM/repos/local/cache/40f241a1ea414e4c (get,python,python3,get-python,get-python3,script-artifact-d0b5dd74373f4a62,version-3.10.12,non-virtual) (Version 3.10.12)

        Make your selection or press Enter for 0 or use -1 to skip:

        Selected 0: /home/ptuser/CM/repos/local/cache/9e5a860af737467d
Traceback (most recent call last):
  File "/home/ptuser/.local/bin/cmr", line 8, in <module>
    sys.exit(run_script())
  File "/home/ptuser/.local/lib/python3.10/site-packages/cmind/cli.py", line 76, in run_script
    return run(['run', 'script'] + argv)
  File "/home/ptuser/.local/lib/python3.10/site-packages/cmind/cli.py", line 35, in run
    r = cm.access(argv, out='con')
  File "/home/ptuser/.local/lib/python3.10/site-packages/cmind/core.py", line 546, in access
    r = action_addr(i)
  File "/home/ptuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 1322, in run
    r=utils.load_python_module({'path':path, 'name':'customize'})
  File "/home/ptuser/.local/lib/python3.10/site-packages/cmind/utils.py", line 1300, in load_python_module
    code = imp.load_module(code_uid, full_name, full_path, found_module[2])
  File "/usr/lib/python3.10/imp.py", line 235, in load_module
    return load_source(name, filename, file)
  File "/usr/lib/python3.10/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 719, in _load
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 879, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1017, in get_code
  File "<frozen importlib._bootstrap_external>", line 947, in source_to_code
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/ptuser/CM/repos/mlcommons@ck/cm-mlops/script/get-tensorrt/customize.py", line 89
    return {'return': 1, 'error': 'Please envoke cm run script ' + " ".join(tags) + " --tar_file={full path to the TensorRT tar file}'}
                                                                                    ^
SyntaxError: unterminated string literal (detected at line 89)
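
Editor's note on the traceback above: the last string literal on line 89 opens with a double quote and closes with a single quote, so Python never sees the end of the string. The snippet below is a minimal sketch of a matched-quote version, not the actual change from the PR (which is not shown in this thread). The wrapper name `missing_tar_file_error` is hypothetical and exists only to make the example runnable, and the spelling of "invoke" is corrected here as well.

```python
def missing_tar_file_error(tags):
    """Build the error dictionary returned when --tar_file is not given.

    Hypothetical wrapper for illustration only; in customize.py the
    statement is a bare `return` inside a larger function.
    """
    # The final literal now opens and closes with the same quote character.
    return {
        'return': 1,
        'error': 'Please invoke cm run script ' + " ".join(tags) +
                 ' --tar_file={full path to the TensorRT tar file}',
    }


if __name__ == "__main__":
    print(missing_tar_file_error(['get', 'tensorrt', '_dev'])['error'])
```

Changing only the opening quote to a single quote, as described later in the thread, gives the same result.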
@arjunsuresh
Contributor

Sorry for the trouble, @WarrenSchultz. This PR should fix it. @gfursin, can you please merge it?

@WarrenSchultz
Author

WarrenSchultz commented Aug 27, 2023

@arjunsuresh Thanks for the quick turnaround. I changed the double quote to a single quote locally, but now I get the error below.

EDIT: I left a note on your PR at the spot where the colon mentioned in the error below is missing. The process went through this time.

Traceback (most recent call last):
  File "/home/ptuser/.local/bin/cmr", line 8, in <module>
    sys.exit(run_script())
  File "/home/ptuser/.local/lib/python3.10/site-packages/cmind/cli.py", line 76, in run_script
    return run(['run', 'script'] + argv)
  File "/home/ptuser/.local/lib/python3.10/site-packages/cmind/cli.py", line 35, in run
    r = cm.access(argv, out='con')
  File "/home/ptuser/.local/lib/python3.10/site-packages/cmind/core.py", line 546, in access
    r = action_addr(i)
  File "/home/ptuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 1322, in run
    r=utils.load_python_module({'path':path, 'name':'customize'})
  File "/home/ptuser/.local/lib/python3.10/site-packages/cmind/utils.py", line 1300, in load_python_module
    code = imp.load_module(code_uid, full_name, full_path, found_module[2])
  File "/usr/lib/python3.10/imp.py", line 235, in load_module
    return load_source(name, filename, file)
  File "/usr/lib/python3.10/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 719, in _load
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 879, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1017, in get_code
  File "<frozen importlib._bootstrap_external>", line 947, in source_to_code
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/ptuser/CM/repos/mlcommons@ck/cm-mlops/script/get-tensorrt/customize.py", line 87
    if env.get('CM_TENSORRT_REQUIRE_DEV', '') != 'yes'
                                                      ^
SyntaxError: expected ':'
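
Editor's note: both failures in this thread are syntax errors, so they can be caught by compiling the file before any script is run. The sketch below uses Python's built-in `compile()` for that. The helper name `syntax_error_of` is hypothetical, and the `pass` body stands in for whatever follows the `if` in the real customize.py, which the traceback does not show.

```python
def syntax_error_of(source, filename="customize.py"):
    """Return the SyntaxError message for `source`, or None if it compiles."""
    try:
        # compile() parses the source without executing it.
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


# Line 87 as shown in the traceback, without the trailing colon.
broken = (
    "env = {}\n"
    "if env.get('CM_TENSORRT_REQUIRE_DEV', '') != 'yes'\n"
    "    pass  # placeholder body, not from the original file\n"
)

# The same statement with the colon added.
fixed = broken.replace("!= 'yes'\n", "!= 'yes':\n")

if __name__ == "__main__":
    print("broken:", syntax_error_of(broken))
    print("fixed: ", syntax_error_of(fixed))
```

From a shell, `python3 -m py_compile` followed by the path to customize.py performs the same check on the file itself.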

@arjunsuresh
Contributor

Thank you @WarrenSchultz for fixing that

@gfursin
Contributor

gfursin commented Aug 28, 2023

Thank you @WarrenSchultz and @arjunsuresh - I have merged the PR.
