Flashing of presigned artifacts fails on an Orin NX board #1813

Closed
akfaro opened this issue Jan 17, 2025 · 6 comments · Fixed by #1815

Comments


akfaro commented Jan 17, 2025

This is a follow-up from this comment in issue #1698, which IMO was closed prematurely:
#1698 (comment).
Feel free to close this issue if you feel that it's better to reopen #1698.

Describe the bug

If the image is built with TEGRA_SIGNING_ARGS set, then initrd-flash crashes before starting to write data to the target.
More details are given below.

I also observed the same symptom before burning the PKC fuse on the Jetson board.

Just like the OP in #1698, I can flash the image successfully if the artifacts are not signed during the BitBake build and initrd-flash is called with the -u parameter (obviously, only on a board with the PKC fuse burnt).

To Reproduce
Steps to reproduce the behavior:

  1. Build meta-tegra branch 'scarthgap' with MACHINE based on 'conf/machine/include/orin-nx.inc' (Orin NX 8GB)
  2. Build with bitbake argument
    No specific arguments, TEGRA_SIGNING_ARGS="-u ${SECUREBOOT_KEYS_DIR}/${PKC_FILENAME}" is set in local.conf
  3. Deploy to hardware with method initrd-flash, without specifying the keys
  4. The flashing aborts with error messages, as seen below.

Additional context

With a cleanly unpacked tegraflash.tar.gz package, running initrd-flash shows this output:

$ sudo ./initrd-flash
Starting at 2025-01-17T14:16:17+01:00
Machine:       my_machine
Rootfs device: nvme0n1p1
Waiting for Jetson to appear on USB.......
[found: 1-2]
== Step 1: Signing binaries at 2025-01-17T14:16:22+01:00 ==
== Step 2: Boot Jetson via RCM at 2025-01-17T14:16:22+01:00 ==
Found Jetson device in recovery mode at USB 1-2
ERR: did not get device serial number at 2025-01-17T14:16:27+01:00

It seems that the variable $serial_number would be written to ./boardvars.sh by the flash helper within sign_binaries(), but the call is skipped when $PRESIGNED is set.
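The suspected control flow can be modeled with a small stand-alone script. This is only a sketch: the function body is not copied from initrd-flash, the serial value is a placeholder, the get_serial wrapper is hypothetical, and treating "$PRESIGNED set" as "non-empty" is an assumption. Only the names sign_binaries, PRESIGNED, serial_number and boardvars.sh come from the report above.

```shell
#!/bin/sh
# Hypothetical, simplified model of the suspected early exit.
workdir=$(mktemp -d)

sign_binaries() {
    if [ -n "$PRESIGNED" ]; then
        # Early exit: nothing below runs, so boardvars.sh is never written.
        return 0
    fi
    # Stand-in for the flash helper that would record board details.
    echo 'serial_number="0123456789"' > "$workdir/boardvars.sh"
}

# Hypothetical wrapper: run sign_binaries with the given PRESIGNED value
# and print the serial number that ends up being visible afterwards.
get_serial() {
    serial_number=
    PRESIGNED="$1"
    rm -f "$workdir/boardvars.sh"
    sign_binaries
    if [ -f "$workdir/boardvars.sh" ]; then
        . "$workdir/boardvars.sh"
    fi
    echo "${serial_number:-<empty>}"
}

unsigned_result=$(get_serial "")
presigned_result=$(get_serial "yes")
echo "unsigned:  $unsigned_result"
echo "presigned: $presigned_result"
rm -rf "$workdir"
```

In this model the presigned run leaves serial_number empty, which would match the "did not get device serial number" error if the real script behaves the same way.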

I can hard-code the serial number of my particular Jetson device into the initrd-flash script, before the line if [ -z "$serial_number" ]; then.
In this case, the flashpkg data is written to the target, which proceeds with its bootloader sequence and exposes the external nvme0n1 device, which is seen on the host as /dev/sdb. However, partitioning the external device and writing the images to it fails on the host:

$ sudo ./initrd-flash
Starting at 2025-01-17T14:52:37+01:00
Machine:       my_machine
Rootfs device: nvme0n1p1
Found Jetson device in recovery mode at USB 1-2
== Step 1: Signing binaries at 2025-01-17T14:52:37+01:00 ==
== Step 2: Boot Jetson via RCM at 2025-01-17T14:52:38+01:00 ==
Found Jetson device in recovery mode at USB 1-2
== Step 3: Sending flash sequence commands at 2025-01-17T14:52:43+01:00 ==
Waiting for USB storage device flashpkg from <my_serial>........[/dev/sdb]
Device size in blocks: 262144
Unmounted /dev/sdb.
== Step 4: Writing partitions on external storage device at 2025-01-17T14:53:12+01:00 ==
Waiting for USB storage device nvme0n1 from <my_serial>...[/dev/sdb]
Traceback (most recent call last):
  File "....tegraflash/nvflashxmlparse", line 483, in <module>
    ret = main()
          ^^^^^^
  File "....tegraflash/nvflashxmlparse", line 408, in main
    rewrite_layout(args.filename, args.rewrite_contents_from.split(','), outf)
  File "....tegraflash/nvflashxmlparse", line 266, in rewrite_layout
    maptree = ET.parse(mapfile)
              ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/xml/etree/ElementTree.py", line 1204, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.12/xml/etree/ElementTree.py", line 558, in parse
    source = open(source, "rb")
             ^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'internal-secureflash.xml'
Traceback (most recent call last):
  File "....tegraflash/nvflashxmlparse", line 483, in <module>
    ret = main()
          ^^^^^^
  File "....tegraflash/nvflashxmlparse", line 426, in main
    layout = PartitionLayout(args.filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "....tegraflash/nvflashxmlparse", line 142, in __init__
    tree = ET.parse(configfile)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/xml/etree/ElementTree.py", line 1204, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.12/xml/etree/ElementTree.py", line 569, in parse
    self._root = parser._parse_whole(source)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
No partition definitions found in initrd-flash.xml
ERR: write failure to external storage at 2025-01-17T14:53:15+01:00

During this time, the target completes writing the bootloader part to QSPI, and proceeds to boot into the unchanged rootfs after I disconnect the USB cable.

I suppose the reason for this kind of failure is that the command mv secureflash.xml internal-secureflash.xml is not executed, due to the early exit from sign_binaries() when $PRESIGNED is non-zero.
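A minimal sketch of a pre-flight check for this situation is shown below. It is untested against real hardware and is not the change made in the eventual fix. ensure_internal_layout is a hypothetical helper, the XML content is a placeholder, and whether a presigned package contains secureflash.xml at all is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: make sure internal-secureflash.xml exists and is
# non-empty before any layout parsing runs, performing the rename if it
# was skipped.
ensure_internal_layout() {
    dir="$1"
    if [ -s "$dir/internal-secureflash.xml" ]; then
        echo "present"
    elif [ -s "$dir/secureflash.xml" ]; then
        mv "$dir/secureflash.xml" "$dir/internal-secureflash.xml"
        echo "renamed"
    else
        echo "missing"
        return 1
    fi
}

# Demonstration in a scratch directory with placeholder content.
demo=$(mktemp -d)
first=$(ensure_internal_layout "$demo" || true)
echo '<partition_layout/>' > "$demo/secureflash.xml"
second=$(ensure_internal_layout "$demo")
third=$(ensure_internal_layout "$demo")
echo "$first $second $third"
rm -rf "$demo"
```

Failing early with a clear "missing" status would at least replace the two Python tracebacks with a single readable error.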

@ichergui
Member

Which OS are you using? It looks like you are not using the right wheel package.

@akfaro
Author

akfaro commented Jan 17, 2025

> Which OS are you using? It looks like you are not using the right wheel package.

My OS is Ubuntu 24.04.1
wheel version seems to be 0.42.0-2:

$ apt list | grep wheel
python-wheel-common/noble 0.42.0-2 all
python3-wheel-whl/noble 0.42.0-2 all
python3-wheel/noble 0.42.0-2 all

@ichergui
Member

Is it mandatory to use Ubuntu 24.04?

@akfaro
Author

akfaro commented Jan 17, 2025

> Is it mandatory to use Ubuntu 24.04?

I guess I can switch to whatever you'd recommend, but I suspect that this is not really the root cause of the symptom. It looks to me as if the symptoms can be traced to places in the flashing scripts that are skipped specifically in the presigned case.

In case the versions in the build system are relevant: I forgot to specify above that I have different machines for building and flashing.
We run the Yocto builds (on the CI server and in developer WSL VM) with Docker containers based on Ubuntu 22.04.02, and containing wheel packages in version 0.37.1-2ubuntu0.22.04.1.

Since on the flashing machine with Ubuntu 24.04

  • the operation without secure-boot features worked fine
  • the image can be properly flashed if the secure-boot signing happens post-build,
  • and burning the fuses with tools in the nvidia-sdk worked fine...

I doubt that changing the Ubuntu version would fix the symptoms.

@madisongh
Member

@akfaro Yes, and thanks for opening a new issue for this. Getting initrd-flash to play nicely with pre-signed binaries needs more work.

@madisongh
Member

Should be fixed now. I fused one of my Orins for secure boot so I could test the changes myself, and with the latest changes I was able to flash and boot successfully with both secured and non-secured devices.
