-
Notifications
You must be signed in to change notification settings - Fork 403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Virtual Boot can easily brick the chip. #340
Comments
I don't think I had registered the seriousness of the problem. "Under some circumstances, an upload destroys the bootloader as well as the app, and you need to revert to ISP to recover" didn't sound awful. I will re-examine the issue. |
I, too, would be interested in helping fix this. I'm also stuck on determining the start/end address of the bootloader. Another challenge I'm unsure about: if my (very hazy and new) understanding of the protocol is correct, optiboot doesn't rely on the programmer telling it that finishing is done; it just lets the WDT cause a reset after an idle period. Since there's no reliable point at which optiboot knows for sure that programming is done, I'm not sure about when optiboot would perform the final write to the last byte in flash. |
OH! So is THAT how it communicates that it's done? Shit, no wonder I couldn't figure out how it told it to leave programming mode! |
It does detect "leave Progmode", at which point it shortens the WDT to run the applications "sooner":
|
I wonder whether almost everything could be fixed by using a program other than avrdude for the upload? |
Well to optimize the bootloader size, like... you to make it very tightly optimized to the minimal protocol used. But that is irrelevant for two reasons:
I don't see any other way around it. The only other popular bootloader already does it - it erases the whole non-bootloader flash area and the Edit - to be clear I hate avrdude, for dozens of reasons beyond the scope of this thread. But don't think I can pin this one on avrdude. It is a failure in the design of the STK500 protocol itself because it wasn't meant to program through a bootloader it was meant to control an external programmer. In any event, the to the extent that I can lay blame for the extent of the problem you cite, avrdude being written in C. is my top vote. Without instructions to make a working build environment, thus ensuring that only the High Priests of Arduino can make changes and actually distribute them without everyone yelling and screaming "my platform isn't supported", progress is greatly impeded |
Interesting discussion. I think avrdude has made a lot of progress here. With the deprecation of STK500v1 HW programmer (avrdudes/avrdude#1046), stk500.c in avrude git main is only for bootloaders and things like "Arduino as ISP". If necessary, we can also create a new programmer for optiboot if the existing |
The above does create a regression for optiboot_x and optiboot_dx. As mentioned by @stefanrueger, if you (@WestfW and @SpenceKonde) can specify what optiboot (and optiboot_x and optiboot_dx) needs, then the issue can be fixed. |
If you think Arduino guys are too slow to package good Arduino binaries, you can probably add your voice to the following avrdude issue. Personally I always try to build avrdude myself (and it is rather easy under Windows using MSYS2 mingw 32/64, Homebrew under macOS, and Linux), but I also understand why people would like to have an official binary from avrdude project. I can see that MegaCoreX from has moved to Arduino's packaging of avrdude 7.0. The users of MegaCoreX will potentially face an issue here (not yet fixed by Arduino) but the workaround is pretty simple.. Of course you can still wait until Arduino formally releases their packaging of avrdude 7.0. |
The issue has been fixed in avrdude git main. |
For avrdude 7.0 release, this should be good enough for production. You can also try avrdude git main snapshot (which will lead to 7.1 release). For those who want to be more on the cutting edge, you can try my snapshot here. |
I belive one potential solution is to go to urboot which has pretty solid vector bootloader support and @MCUdude and I were not able to break it. urboot: It needs to work together with |
@mcuee Thanks for advertising urboot. However, @SpenceKonde described the cause of the problem as | Vcc isn't settled or that the battery voltage plunges under the slight increase in current That isn't really something that the bootloader can do a lot about, or? It is true that when the reset vector no longer points to the bootloader that's when the bootloader bricks, but it does not matter where the application vector points to. (Only when all flash below the bootloader is 0xff does the reset vector not matter). Yes, although bootloaders can protect the reset vector (urboot does this automatically if there is space left, but also on explicit compile-time option), I see it more a responsibility of the uploader to ensure the reset vector keeps pointing to the bootloader. @WestfW seems to agree here, as he asked
AVRDUDE now has a dedicated bootloader programmer: Once v7.1 is released give When writing the AVRDUDE programmer Finally, I am unclear about in which situation a board with a bootloader can be borderline and dipping Vcc while the uploader host is a happy bunny? Over the air programming? Inaccessible devices? Does optibbot even do OTA? |
Fair enough.
It may be difficult for the device or the host to detect power/battery low case. Sometimes it is due to cost reasons that the user may not afford to have additional power fail detection circuitry. Sometimes the power fail detection may not be so reliable. Sometimes the contact may get loose. Sometimes it is due to not enough hold-up capacitors to bridge the short power interruption. So it can be all kind of reasons. Still I understand there is so much the bootloader firmware can do when there is HW issues like the above. What about a noisy line? I can see quite some issues here when people use RS-485 to have better noise immunity. As for OTA, it seems that people use bluetooth with optiboot. Does optiboot/urboot have retry mechanism against serial communication errors? How robust will it be? |
Is this a good reason to recommend HW-supported bootloaders against vector bootloader in use case like the above? Ref: |
| recommend HW-supported bootloaders against vector bootloader in use case like the above I suspect this happened on a part without hardware boot section support, so there was no choice. There are not many cases where it makes sense to use Optiboot's virtual boot feature on a part with hardware boot section support as this costs some 110 additional code bytes. Urboot bootloaders in contrast let |
So, the board might not have held enough charge in caps near the MCU's Vcc lines? Good board design has a suitable cap vvv close to every Vcc line of the chip. |
Maybe it is good to mention that in the documentation.
Indeed ATtinyCore has to use virtual bootloader as the core supports chips without the hardware boot section.
Nice one. |
Most of the properly designed board will have the necessary decoupling capacitors. I actually mean to say different thing -- hold-up capacitors against power interruption (eg: 10ms hold-up time is a common requirement, then plus x ms for the firmware upgrade). I tend to think this is not really a hard requirement in many product categories. But then it does create issues for situations like this. |
This conversation has made me think, it's better for the urboot bootloaders to erase flash from top to bottom. And guess what, saves two bytes code too. Have just created a PR over at urboot |
|
I belive I communicated this serious defect to @WestfW in passing, but I realized it's not listed here. This has caused intense pain for someone (cost me thousands of dollars by losing me a very lucrative client, and i'm sure it cost him much more than that, because he had thousands of vulnerable devices in the field, and he'd lose a substantial fraction if them every season.)
This issue keeps coming up in the context of ATTinyCore, whch despite it's age is still extremely popular. Almost every ATTinyCore part's optiboot is impacted by this bug. Almost everyone working with an impacted configuration has had this happen to them at least once. Had I know about this bug when I added the support for optiboot - I'm not sure I would have added it.
In a nutshell It is possible to end up in a situation where neither the app nor the bootloader is capable of functioning and the bootloader is no longer run nor is a working app present. This occurs when a programming command on the first page is initiated, but for some unknown reason, fails to complete. The problem is that Optiboot with Virtual boot proceeds like normal optiboot, but this is completely inappropriate behavior. Why?
Consider when an interruption happens (brownout, reset, etc). This will most likely happen where? In the first page, because there's the highest likelihood that Vcc isn't settled or that the battery voltage plunges under the slight increase in current. Or a bungled handlining of the start of the upload (ex, another twiddle of reset after the erase starts). The most likely corrupted page will be the first page, the one where the vectors are both stored. What happens when this occurs?
Now - thje bootloader will never run, because the vector pointing to it is gone. The app will no longer run, because the vector to it is likewise gone. Hence execution skids along empty flash until the end of the erased section and then starts executing code from the middle of the sketch, with unpredictable but never correct results. The board is bricked from the persective of someone who can't reburn bootloader easily.
There is a very simple solution, but somewhat tricky to implement.
The application vector must be loacated in the last word of the last page of application flash, not somewhere else in the first page or two.
Whenever the first erase command is sent, instead of that one page, insead, all pages not earmarked for the bootlootloader must ALL be erased in reverse order, so the first thing to go is the page with the application vector relocated to, and the first thing last thing is the page with the reset vector.
Then, we proceed to write pages normally, with the reset vector rewritten so it's pointing to start of bootloader, Finally at the end (last page if we completely filled the flash, otherwise skip a bunch of middle pages to get to the last page, then write a jmp or rjmp to the application vector in the last byte of that flash.
This is 100% safe. An interruption before flash is done erasing will merely break the application. Since the first page erased is the one with app vector, it would simply run the bootloader so a new sketch could be uploaded.
An interruption at the point of 100% complete erase of the flash would have nothing but empty flash untll it hit the bootloader, Execution would skid along the 0xFFFF instructions (which as I understand is interpreted as a SBRS r31,7 - skip if bt in register 31 is set). This does not modify r31, hence depending on the value in that register, either every instruction is executed, or alternating SBRS and skipped SBRS. are. Either way, we ae guaranteed that hen t hts the even numbered start of the boot loader, t will not skip it, because stating from 0, with a constant r31, it is impossible that any even address would be skipped and the bootrloader starts at such an address.
Finally, there is the case of interruption while writing the flash,. In this case, the first page with bootloader vector is intact but some portion of later pages are not and the application vector is not written. In this final case, there is also no problem,. The bootloader can be reached normally, but the application, not uploaded completely, cannot,because the vector will not have been written.
This method is used by micronucleus and is rock-solid stable there, such that they have no qualms about disabling reset and relying on being able to update the bootloader with code that takes command of the reset vector itself, erases the bootloader, copies it's payload (new bootloadeer) in it's place, and then deletes itself.
So why haven't I fixed it?
In short, because I just can't figure out how to know when it's time to find the last page, what the last page is, and how to write to only those two (or 4) bytes. The version of the ATtiny Optimized codebase I am starting from is here: https://github.com/SpenceKonde/ATTinyCore/blob/v2.0.0-dev/avr/bootloaders/optiboot/source/optiboot.c - nothing terribly notable (do note the 2.0.0-dev branch. Everything in master is getting flushed in not very long)
Thanks
This is something I'd love to have a solution for - I don't even need someone else to do it for me, just to outline how to approach the task - right now the gulf between what I uinderstand and what is needed for this is such that I can't see it from my vantage point.. I would love to be able to drop my warnings against use of optiboot in critical use cases.
The text was updated successfully, but these errors were encountered: