Data written to EEPROM gets randomly wiped out (rewritten as 0xff) after reset, wake from sleep, or power off/on #9047
Comments
Can you try these changes to the EEPROM source? Enable verbose output in the IDE preferences menu, and then read the build log to find out where our Core and EEPROM lib files are located.
Then, remove the reinterpret_cast calls as shown in this diff:
diff --git a/libraries/EEPROM/EEPROM.cpp b/libraries/EEPROM/EEPROM.cpp
index e193237d..2f361ac7 100644
--- a/libraries/EEPROM/EEPROM.cpp
+++ b/libraries/EEPROM/EEPROM.cpp
@@ -65,7 +65,7 @@ void EEPROMClass::begin(size_t size) {
_size = size;
- if (!ESP.flashRead(_sector * SPI_FLASH_SEC_SIZE, reinterpret_cast<uint32_t*>(_data), _size)) {
+ if (!ESP.flashRead(_sector * SPI_FLASH_SEC_SIZE, _data, _size)) {
DEBUGV("EEPROMClass::begin flash read failed\n");
}
@@ -132,7 +132,7 @@ bool EEPROMClass::commit() {
return false;
if (ESP.flashEraseSector(_sector)) {
- if (ESP.flashWrite(_sector * SPI_FLASH_SEC_SIZE, reinterpret_cast<uint32_t*>(_data), _size)) {
+ if (ESP.flashWrite(_sector * SPI_FLASH_SEC_SIZE, _data, _size)) {
_dirty = false;
return true;
} |
I need a way to reliably test whether or not the proposed fix (once I apply it) is working. That is, to reproduce the issue; either at will or with a reasonable likelihood so that I can repeat the test many many times and see if it ever fails, first without the fix and then with the fix. Doing that with my current real-life sketch many times and wait for the issue to happen is too time-consuming. So, I need a test that I can rapidly run hundreds or thousands of times and see if it ever fails. Or better yet would be, if possible at all, a test that would provoke the issue systematically. Any ideas on that? I've tried with a very minimal sketch but it's never triggering the issue in the first place. I guess if the program is too trivial, it'll never cause the kind of "unexpected" alignment of data in memory that supposedly causes the issue (assuming the issue is what you think it is). |
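To probe the alignment guess mentioned above, one option is to check whether a buffer address is a multiple of 4 before it is handed to a word-oriented flash routine. The sketch below is host-side C++ and only an assumption about what "alignment" refers to here; `isWordAligned` and `demoAlignment` are hypothetical helper names, not part of the ESP8266 core.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p sits on a uint32_t boundary.
bool isWordAligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint32_t) == 0;
}

// A buffer forced onto a word boundary is aligned; one byte past it is not.
bool demoAlignment() {
    alignas(std::uint32_t) static std::uint8_t buffer[16];
    return isWordAligned(buffer) && !isWordAligned(buffer + 1);
}
```

In a real sketch, the same check could be logged for the buffer in question at startup, to see whether failing runs correlate with an unaligned address.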
The only way I've been able to kind-of-reproduce the issue at will has been: with a minimal sketch basically like the example code I posted above (except I write and read a bunch of bytes instead of one), I reset several times in very, very rapid succession. But I guess that's because I manage to reset right while the EEPROM is being written, and that's definitely not what happens when I observe the issue spontaneously in real life. |
Guess above is about sizing, mostly. You'll have to at least share your EEPROM class setup, not a random sketch that does nothing for our tests :/
Note that eeprom class has two steps - erase sector, then write. Mayhaps you reset some time between these two operations. |
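The erase-then-write sequence described above can be modeled on a host machine to show why a reset between the two steps leaves 0xff behind. This is a minimal simulation, not the core's actual code: `FakeSector`, `commitModel` and `demoInterruptedCommit` are hypothetical names, and the only assumption about the hardware is that erased flash reads back as 0xFF.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical host-side model of one flash sector.
constexpr std::size_t kSectorSize = 4096;

struct FakeSector {
    std::array<std::uint8_t, kSectorSize> bytes{};

    // Erased flash reads back as 0xFF in every byte.
    void erase() { bytes.fill(0xFF); }

    void write(const std::uint8_t* src, std::size_t n) {
        std::memcpy(bytes.data(), src, n);
    }
};

// Models a commit as "erase, then write". When resetAfterErase is true,
// the simulated chip resets between the two steps and the write is lost.
bool commitModel(FakeSector& flash, const std::uint8_t* ram, std::size_t n,
                 bool resetAfterErase) {
    flash.erase();
    if (resetAfterErase) return false;
    flash.write(ram, n);
    return true;
}

// Returns what the next boot would read at address 0.
std::uint8_t demoInterruptedCommit() {
    FakeSector flash;
    std::uint8_t ram[4] = {'W', 1, 2, 3};
    commitModel(flash, ram, sizeof ram, false);  // first commit completes
    ram[1] = 42;
    commitModel(flash, ram, sizeof ram, true);   // reset lands after the erase
    return flash.bytes[0];
}
```

In this model the magic byte 'W' at address 0 is gone after the interrupted commit, even though that byte was never changed in RAM, because the whole sector is erased before anything is written back.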
What is EEPROM class size? What setup do you need? The code I posted is literally how I read from and write to the EEPROM, except I do that with several bytes (one at a time) at several positions. There's no "new" involved in my code. Whatever EEPROM does internally, I don't know.
Maybe when I tried the trivial read-and-write test and I reset multiple times very quickly, yes. I don't reset at all during execution. My code runs and goes to sleep. Then it either wakes up when the timer goes off or I wake it up manually with RTS, and then when the first value is read from EEPROM, sometimes the value 0xff is found instead of what was supposed to be written. In order to test the changes you suggested to the core (removing the reinterpret_cast), I need a suggestion on how to trigger the issue with a higher probability (based on whatever your guess is about what might be causing the issue). If I do the changes and just try it with my real code, I will need to run it for a looooong time before I can be remotely confident that it's working. |
If you construct EEPROMClass manually, the address becomes different. I don't know if you do that. |
I don't. I use the global instance
I don't. I call The code I posted is LITERALLY all I do in my real code with
The first read I do is at address 0, and if that's not the expected value (which is a "magic character", namely literally 'W'), I stop and don't read anything else, so I don't know whether or not the issue happens at other addresses as well. I will retry the "trivial test" with address 0, and I'll also add more debugging to my real code to see whether, when the byte at address 0 is erased, other stuff also is. |
I've been able to watch more closely, and it turns out that when the issue happens, there's an unexpected reset very shortly after the normal reset. This might be a hardware issue after all. Context: I have a "reset" button that, when pressed, connects the RST pin to GND, causing a reset. So here's what happens when the issue happens (which, again, happens randomly and sporadically):
This by itself still doesn't explain why the EEPROM bytes would be erased. The unexpected reset is NOT happening during write/commit, nor during |
The unexpected reset happens very shortly after the initial reset from a push button. |
Turns out I was wrong about that! It may be happening during a write/commit.
Apparently, long enough to execute at least all of this (plus whatever comes before the
and possibly part of this but no more:
so, it could be during the commit. 🤦
I think that's unlikely. I tried quickly pressing the button twice and I'm very easily able to push it the second time before it gets to print "Setup" as per the code above. I don't think an electrical bouncing can be longer than a human pressing a button twice (a relatively big plastic mechanical button, like 5mm thick). However, a poorly soldered connection in the button is a possibility. 🤦 |
If I used SPIFFS instead of EEPROM, would that be safe against a reset/power-off in the middle of writing? By "safe" I mean you end up with either the new data completely written or the old data still intact, but not with corrupt data or all 0xff. |
Hi, did you find a solution yet? I have the exact same problem and I can't figure out why this happens or how to fix it. |
Yes, I use My issue was that there sometimes is a sudden reset very soon after startup, whose root cause I've never figured out for sure but it could be hardware and it could be just a (unusually late) bounce from the reset button. I haven't fixed the reset itself, but as long as the reset only happens within a limited, short interval of time from startup, the only problem it causes for me is the flash data loss, so EEPROM_Rotate avoids that (and other problems). Their readme explains how. |
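The general idea of surviving a reset in the middle of a commit can be sketched with two slots, a sequence number and a checksum: the new copy always goes to the slot that does not hold the newest valid data, so an interrupted commit leaves the previous copy intact. The code below is a host-side illustration of that idea only. It is not EEPROM_Rotate's actual implementation, and `Slot`, `RotatingStore`, `checksum` and `demoRotatingSurvivesReset` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPayload = 8;
using Payload = std::array<std::uint8_t, kPayload>;

// Simple XOR checksum with a nonzero seed, so an erased (all-0xFF) slot
// does not validate. A real design would likely use a CRC.
std::uint8_t checksum(std::uint32_t seq, const Payload& d) {
    std::uint8_t c = 0x5A;
    for (int i = 0; i < 4; ++i) c ^= static_cast<std::uint8_t>(seq >> (8 * i));
    for (std::uint8_t b : d) c ^= b;
    return c;
}

struct Slot {
    std::uint32_t seq = 0xFFFFFFFF;  // erased state
    Payload data;
    std::uint8_t check = 0xFF;
    Slot() { data.fill(0xFF); }
    void erase() { *this = Slot(); }
    bool valid() const { return check == checksum(seq, data); }
};

struct RotatingStore {
    std::array<Slot, 2> slots;

    // Index of the valid slot with the highest sequence number, or -1.
    // Sequence wrap-around is ignored in this sketch.
    int newest() const {
        int best = -1;
        for (int i = 0; i < 2; ++i)
            if (slots[i].valid() && (best < 0 || slots[i].seq > slots[best].seq))
                best = i;
        return best;
    }

    // Erases and rewrites the slot that is NOT the newest valid one.
    // resetAfterErase models a reset in the middle of the commit.
    bool commit(const Payload& d, bool resetAfterErase) {
        const int cur = newest();
        const int target = (cur < 0) ? 0 : 1 - cur;
        const std::uint32_t nextSeq = (cur < 0) ? 0 : slots[cur].seq + 1;
        slots[target].erase();
        if (resetAfterErase) return false;
        slots[target].seq = nextSeq;
        slots[target].data = d;
        slots[target].check = checksum(nextSeq, d);
        return true;
    }

    bool read(Payload& out) const {
        const int cur = newest();
        if (cur < 0) return false;
        out = slots[cur].data;
        return true;
    }
};

// Returns the byte at address 0 as seen after an interrupted second commit.
std::uint8_t demoRotatingSurvivesReset() {
    RotatingStore store;
    Payload v{};                // all zeros
    v[0] = 'W';
    store.commit(v, false);     // lands in slot 0
    v[1] = 42;
    store.commit(v, true);      // erases slot 1, then "resets"
    Payload out{};
    return store.read(out) ? out[0] : 0xFF;
}
```

In this model the interrupted commit only destroys the spare slot, so the next read still returns the previous data with the magic byte intact. The cost is twice the flash space and, in this simplified form, no protection against a reset that lands partway through the write itself beyond what the checksum catches.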
Basic Infos
Platform
Settings in IDE
Problem Description
My code writes to the (fake?) EEPROM with EEPROM.write() and EEPROM.commit() and reads with EEPROM.read() the data that has been written in previous "sessions".
Sometimes, utterly randomly, after waking up from sleep, when my sketch reads a byte from a given position, instead of the value that it had previously written, it finds a 0xff.
My code had been running for literally years without issues on a dozen identical boards.
Recently I re-compiled and re-uploaded with the last version of the Arduino core on a new unit of the same identical board. I made no changes to the relevant part of the code, but the version compiled with the latest core has this issue.
Note that I'm observing the issue on a brand new chip, so this is not the Flash memory getting damaged by too many write cycles. Also, the old boards that are still running the code that was compiled with the old core, are still having no issue (they haven't got anywhere near 10k write cycles).
MCVE Sketch
NOTE: I cannot share the original sketch. I haven't written and tested a minimal sketch. The issue is hard to reproduce at will because it occurs randomly, but it happens often. I'm writing a minimal code example just to explain the issue, but I have not run and tested the code below; consider it as explanatory pseudo-code, it may very well contain mistakes.
Debug Messages