Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialization functions in constructors are causing undefined behaviour, crashes ESP32 #16

Open
KenWillmott opened this issue Sep 7, 2021 · 30 comments
Assignees
Labels
bug Something isn't working

Comments

@KenWillmott
Copy link

When I upload TM1637 example sketches to the ESP32 Devkit V1, the program crashes with a watchdog reset. I already found the cause, and the fix. The problem is, the fix requires a big change to several example and driver files.

It's a simple issue, actually. In some of the constructors of inherited classes like TM1637, there is initialization code that uses 'micros()'. In the Arduino environment, you can not safely call system functions from a constructor, because 'init()' which sets up functions like that, does not run until just before setup(). The reason it runs on the AVR and not on the ESP, is because the behaviour of such functions is "undefined" and so can be processor dependent.

There are two global changes that have to be applied:

  1. All class constructors must be modified to remove all display initialization code.
  2. All example sketches must be modified to call setupDisplay() in setup() before any other display code is run, instead of depending on the constructor to do it.

I have already forked the library and tested the change locally, it is the only way to use the ESP32 (and possibly the ESP8266, not tested yet) with this display library. I'm able and willing to write the changes, but I thought I should invite some discussion first, as more than one file needs updating.

@maxint-rd maxint-rd self-assigned this Sep 8, 2021
@maxint-rd maxint-rd added the bug Something isn't working label Sep 8, 2021
@maxint-rd
Copy link
Owner

Hello @KenWillmott , thank you for opening this issue. In the past year I mainly used Atmel processors (and the LGT8F328P). Although I have tested this library on ESP8266, it was a while ago and subsequent changes may have introduced the issue you encountered. BTW. the ESP32 has also been used, but that was even longer ago, so I don't know quite which specific display was used then. It could very well be that at that time the ESP core that I used behaved differently than what you have used now.

I think your findings are valuable and your suggestion sound like the way to go. I think I should perform some more thorough testing (especially with ESP32), but I don't know yet when I can find the time. As you mention, the suggested change involves each display specific library file as well as the examples. I always try to test any change I publish. As this would involve many display modules, you can imagine that it will take some time. Thank you for offering to help. I may need it!

I'm not quite sure how I would like to address this overhaul. My first thought is to make the changes myself and ask you to assist in testing. Do you only have that TM1637 or do you also have other TM16xx displays?
The current projects I have laying on my desk are a TM1638 10x8 led-matrix and a TM1650 4x7 segment display (both with buttons), so a good first step for me could be to reproduce your issue on an ESP8266 and on ESP32.

@KenWillmott
Copy link
Author

KenWillmott commented Sep 8, 2021

That's great. I agree about the testing. May I suggest, since I already performed and tested one change for the TM1637, that I update my branch and open a pull request for only that one instance? Then you could evaluate and test it on your end. For that one, here the test is based on a ESP32 DevKit V1 board (ESP-WROOM-32 IC) connected to a Chinese "DM" TM1637 4 digit with colon display. I downloaded the latest ESP32 core from Espressif.

I think I may have overstated the scale of the changes. I did only a brief survey of the files with this issue in view, I need to spend longer, but it is my impression that only a few classes have the problem. I will make a more careful survey this week.

For testing, I have AVR based boards including Mega2560, also 32U4 based, Due, ESP32, ESP8266, and STM32F1, F3, F4, H7. For displays, I have TM1637 4 and 6 digit version modules, TM1638 8 digit with 8 switches. Also I have a rail of TM1636 DIP IC's that I haven't tested. I haven't seen any TM1650 or others on AliExpress where I mostly find modules, but I will look again. Those would take a month to reach me, though.

Edit - I ordered TM1636, TM1650 modules, and TM1668 IC's for testing.

@maxint-rd
Copy link
Owner

maxint-rd commented Sep 8, 2021

Hi Ken, thank you for your contributions. Go ahead to open the pull request so I can see the precise changes you suggest. I won't immediately commit your pull to the master tree without further work.

From reading your suggestions, the change would involve changes to the base class, to all chip specific inherited classes and to all examples. As the change would break existing code that doesn't call setupDisplay(), I'm thinking of using ifdefs to make it only required on platforms that require this specific behavior. I'm curious to see how the ESP32 behaves in my current setup. My ESP32 board is quite old, probably also an early revision.

I don't have a 32U4, most STM's you mention ora TM1636, so testing those would be quite interesting.
For the TM1650 I designed this TM1650 4-digit display module and purchased the chips via Ali. I also made some other modules. I submitted all of them to the public domain and added links to the project pages on JLCPCB, where you can order them.

While ordering: may I also suggest the TM1640? I find that chip great fun to work with for LED matrices and there are some modules available on Ali based on the TM1640.

Edit: on my YouTube channel you can find various videos about the TM16xx chips, including this video and this video featuring the TM1640. Note that the first link shows a D1 mini -based project using the TM1640. The project in this video is also D1 mini based and includes a TM1637, so somewhere in the past I did manage to have the library working with an ESP8266.

@KenWillmott
Copy link
Author

KenWillmott commented Sep 8, 2021

No need to thank me, thank you for considering my input. Today, I went back and repeated my work in order to discover the absolute minimum changes. In the process, I found another issue, already flagged in the code - concerning the implementation of min() and max(). More about that later. With new as well as existing architectures, there is a problem of evolving tool chains, especially C++ compilers. So it is better to steer away from tool chain specific accommodations, rather better to attempt to choose methods that are more universal to most tool chains.

To address the specific issue of initialization, I think the really best approach is to implement a 'begin()' method, which calls 'clearDisplay()' and 'setupDisplay()'. It's more explicit and intuitive than calling 'setupDisplay()'. Actually, many device libraries use this paradigm, so Arduino programmers are quite used to it. It also provides default values for 'setupDisplay()' so the user doesn't actually have to know anything about the parameters that are passed to it. It becomes an optional method.

It may be convenient to perform initialization in the constructor, but then you are faced with conditional compilation as you mentioned. That may leave an opportunity for problems with future tool chains or processors. I suggest removing initialization from the constructor, and allowing the 'begin()' method to do it in setup(), for all sub-classes.

I don't mean to overlook your concern about existing code. But, I ask, "would it really be broken?". It would depend on the default state of the display IC at power up time. I suppose some experiments could be run to witness the effect of not calling 'setupDisplay()' prior to using other functions. If the power up state is "on" with "intensity=7" and the internal RAM cleared, then the first 'setupDisplay()' call in the constructor, actually doesn't have any effect. Anyway, I don't believe someone who has already written a sketch, and updates the library, would have a real problem with the change if it's well documented.

I guess you can see from such verbosity, that I have a strong opinion on the matter. Perhaps it is too soon to conclude definitely one way or the other. It's also possible that both approaches could be compatible, that there could be a 'begin()' method and also conditional compilation in the constructor for architectures that are already proven to work. I don't see any basic incompatibility there. To be perfectly honest, I don't really know how to implement such conditional compilation, beyond the basics. I believe there are some variables that reveal to the compiler, which processor is in use. You must know more about that than I do.

Unfortunately, in the case of the ESP32, I am faced with both the initialization problem and also the min() macro problem. So I guess there should be two pull requests, one for each. The min() macro is only used to limit the range of a brightness parameter, but has a really deep problem - the compiler has already defined min() and max() as functions, not macros. Therefore, an '#if defined' block will not work, and the compilation fails with "multiple definition" errors. When the macros are disabled, the functions fail because of mismatched parameter types. It's just not worth using type casts to force it to work.

Probably, someone knows a really elegant way to fix that - I don't, but in my opinion it would not be harmful to simply use bit masking instead of the min() macro/function. It's only used in TM16xx.cpp and in TM1650.cpp, so if it's changed in those places, the issue will never come up again. That's another thing, the ESP32 compiler is choking on things in classes that the user has not even referenced - because the errors related to TM1650.cpp came up even though I only included "TM1637.h".

Today, I notated and tested all the changes locally, but I can see that they need to be cleaned up more before an upload to the forked code base. At least, it does fully work with my NTP driven clock sketch.

Considering the testing, I would like to include headings in every example sketch, defining the pins used with each different processor or board. Not only will it help keep track of what has been tested, but a user can more easily adapt the sketch to the hardware on hand, by simply commenting or uncommenting the processor/board in use. Because, by the time I've tested everything with my menagerie of boards, there will be a lot of entries there.

@KenWillmott
Copy link
Author

Update - I opened pull request #17 on this project, to allow compilation with ESP32. It is a prerequisite for any testing because nothing will compile without it. This morning, I found an interesting result, after compiling a test program for my TM1638 "LED-KEY" board. Although an instantiation of TM1637 previously crashed the ESP, the unmodified TM1638 code in this library runs without crashing. This means that the other problem, of uninitialized functions called in the constructor, is intermittent.

So, I suggest, apply the pull or download my fork, which has no changes apart from the same pull request. You can use it to test various display drivers with ESP32.

I watched some of your videos, nice work. I am also fond of the mini-matrix displays. I tried to build one myself since there are none on the market. My 16x8 worked but when placed alongside another one, there was beat interference. I never did solve that one, so I gave up. I have other custom wired displays, but none as difficult as yours.

@maxint-rd
Copy link
Owner

Hi @KenWillmott , I'm sorry I didn't have time to spend on this issue yet. Hopefully I can find some time later this week to get into this. I wonder what causes the intermittent behavior. I intend to first try to see if I can reproduce it with the versions I currently have installed, and then retest with the most recent versions of Arduino IDE and ESP8266/ESP32 cores.
Besides all the testing I want to take an effort to clean up the examples. For some basic stuff I now refer to the original library examples, but this library has evolved far enough to warrant independent examples showing the basic features (and specifics).,

Good to read you liked my videos. I found the 8x8 mini-matrix displays on Ali and designed my own PCB to make 16x8 modules. The modified Brian Lough clock uses three 16x8 TM1640 mini matrix modules using my library. I have had it running continuously for the past few years and I haven't seen any interference.

For my current project I've made a custom 10x8 matrix on perfboard, driven by a TM1638. I found that green double sided 7x9 board has just the right number of holes ( (26x31) to put the leds diagonally with transformer wire on the back. Soldering that board took quite some patience, but it wasn't too difficult.

@KenWillmott
Copy link
Author

KenWillmott commented Sep 15, 2021

Some updates for you - I'm convinced that the ESP compiler behaviour is incorrect. A numeric literal like '7' is supposed to be represented by the smallest available type. That is an unsigned char, or "byte". Your comparison with min() is then valid, the compiler is wrong. But a workaround of some kind is needed to compile. I suppose this problem, if it is true, should be submitted as a bug report of the compiler. It is not the first time the "helpful" macros provided by Arduino, began to create problems as compilers changed.

I also tested the TM1638, as I told you. But, I observed display glitches every hour or so. Then I went to the data sheet. 3.3V operation is not indicated. The TM1637 specification is not perfectly clear, but lists 5V as "typical" instead of minimum or maximum. That is why I began running the modules directly from the ESP32 3.3V supply, and eliminates the need for level translators. So while the 1638 may work with the ESP, I need to try it again with level translation and running on 5V to see if it helps.

I did notice that there are no single TM1638 examples. It wasn't very hard to produce one from the TM1637+TM1638 example, though... also there is an opportunity (although perhaps not a need) to support specific commercial modules. I haven't mentioned it yet, but I have integrated some TM1637 6 digit modules from Robodyn. They have mixed up the digit locations, so I had to over-ride a method to re-arrange the digits. I can show you all that later, but let's see what you can discover.

By the way, my Arduino core is a little out of date because I use the Mint Linux distribution, I'm too lazy to attempt an install of a more recent version. But I don't think it should really make much difference - at least not as much as the Espressif code.

EDIT - the min() function works when the redefinition
#if !defined(max) // MMOLE 180325: // min, max are no macro in ESP core 2.3.9 libraries, see https://github.com/esp8266/Arduino/issues/398 #define min(a,b) ((a)<(b)?(a):(b)) #define max(a,b) ((a)>(b)?(a):(b)) #endif
is commented out, and the parameter type of 'intensity' is changed from byte to int:
void TM16xx::setupDisplay(boolean active, int intensity) { sendCommand(TM16XX_CMD_DISPLAY | (active ? 8 : 0) | min(7, intensity)); }

@maxint-rd
Copy link
Owner

maxint-rd commented Sep 19, 2021

Hi Ken,

Today I performed some tests on a fresh ESP8266 (model ESP-12F). I installed the current version of the TM16xx library in the latest version of the Arduino IDE (1.8.16) and the ESP8266 core from this boardmanager:
https://arduino.esp8266.com/stable/package_esp8266com_index.json

I tested the TM16xxDisplay_Print example with a common TM1637 4-digit LED display using DIO=GPIO0 and CLK=GPIO4. That example has no explicit call to the setupDisplay() method.
In the IDE I kept most options to their default value. It lists Espressif FW nonos-SDK 2.2,1+100 (190703).
In that setup I could not reproduce any crashes and compilation didn't fail on the min()/max(). I will do a bit more testing, but based on my earlier experience with the ESP8266 I assume that my library is still compatible with the current ESP8266 core.
Edit: FYI: the TM1637 module ran on the 5V pin of my module, without level-shifters. The ESP used the onboard 3v3 regulator.

BTW. I still need to test the ESP32. As I have not used that for a long while, I first need to find a board I can use and then I intend to also do a fresh install of the most recent versions.

@maxint-rd
Copy link
Owner

maxint-rd commented Sep 21, 2021

Hello @KenWillmott ,

Just now I did some testing on my 2019 TinyPICO ESP32 board (model ESP32-PICO-D4 Rev 1, 2 cores), using the same (Windows) Arduino IDE (1.8.16) and the ESP32 core for TinyPICO using this board manager URL:
https://dl.espressif.com/dl/package_esp32_index.json

I tested the same TM16xxDisplay_Print example with the same TM1637 4-digit LED display using DIO=GPIO4 and CLK=GPIO14, powered by the 3v3 line. Upon compilation the console mentions it's using packages\esp32\hardware \esp32\1.0.6

In that setup I also could not reproduce any crashes and compilation didn't fail on the min()/max().
I'll do some more testing and let you know if I find any issues...

EDIT: During some further testing with my TinyPICO I saw that its helper library (to use the onboard DotStar RGB LED) also has some initialization in its constructor:
https://github.com/tinypico/tinypico-arduino/tree/master/TinyPICO-Helper
It includes a call to millis() as well as calls to some other system functions. I wonder if those could also cause crashes, but at the moment I have not been able to reproduce any crashes yet.

BTW, that library also implements a way to avoid potential initialization problems which I intended to try out too. Using an init flag it delays some initialization to execute until the first call to the show() method. Similarly I could call a begin() function at the first call to setSegments() or something similar. That way existing code would not need to explicitly call begin().
At the moment I'm still testing and trying to reproduce your issues. Next on my list is to test the TM1638 and to add a minimal example that shows calling base library methods and includes some examples for various chips/MCUs.

@KenWillmott
Copy link
Author

Great. It's possible that the difference is due to the fact that my platform is Linux. That may have resulted in a different tool chain being used. I'm assuming you are using Apple OS or Windows 10. If it works there, it makes my problem very rare. Who knows, it might even disappear with subsequent releases.

It is difficult to find information about the constructor issue. But this thread does have some comments on that:
https://arduino.stackexchange.com/questions/30233/compiles-but-doesnt-print-anything/30234#30234
The TinyPICO constructor does call millis(), but only to obtain one value. Just as is the case in an ISR, it is not used to implement any delays, but only once, to capture a value. I suspect that it is fortuitous that it returns a useful value before init() is called. The postponement method you describe is interesting, I've never seen that before.

I think the most useful thing I can do while you are investigating, is to explore the STM32 series, so I will begin on that.

@maxint-rd
Copy link
Owner

Hi Ken, I'm still using windows 7 (eeeks!...). I do have one laptop running windows 10, but it's not my favorite. I use linux on servers and the Raspberry, and still intend to try Linux on the desktop. So far that never got high enough attention though...

Today I tested the TM1638QYF module on the ESP32 and encountered some issues. At first my TM16xxDisplay_Print setup remained dark. That was due to the setupDisplay() method not being called in the constructor of that driver.

Then I went on to test the buttons using the TM16xxButtons_clicks example and got some erroneous readings. After lengthy debugging I found that the ESP32 (@240MHz) ran too fast. The CLK line exceeded 1.6 MHz and that was too much. So I took a closer look at the datasheet and indeed... It lists a maximum clockspeed of 1MHz. After adding a tiny delay it worked fine and the button events were successfully recognized. I intend to repeat that test on the ESP8266 and most likely will put a suitable delay in the base class. I also want to do a bit more ESP32 testing to see if more issues arise.

I also have some changes waiting for release to improve support for >8 segments (e.g. the TM1638 supports 10x8).
I needed that for a led-matrix project, but also have some 13 segment LED modules which I haven't tried out yet.

I'm looking forward to read your findings on the STM32 MCU in the Arduino environment. I have bought a blue-pill module some years ago, but have not tried that one either...

@maxint-rd
Copy link
Owner

Hello Ken, yesterday and today I was testing the TM1638QYF module and struggled with an exception that occurred only on the ESP8266 and only in the TM1638 QYF driver. The ESP would crash at the very first call to display some segments.
After isolating the issue and installing the Exception Decoder I found out it had to do with byte alignement of the 64-bit buffer that is used to transpose the common anode segments. Today I published a fix for this issue, as well as a fix for the problem I had due to the high speed of the ESP32 when reading key presses.

A first question: have you installed the Exception Decoder and what does it say when your ESP32 crashes? Is the stacktrace useful?

While digging into the code, I kept thinking of an easy way to implement a fix to your initialization issue (which I still can't reproduce). I noticed that the activateDisplay parameter of the constructor could be a used for an easy intermediate improvement. If that parameter is set to false, the user intends to activate the display at a later stage and no clearing or setup of the display is required. What do you think about using that parameter? Simple examples can then be kept at minimal code, with the notice that

@KenWillmott
Copy link
Author

No, I haven't installed the Exception Decoder. I am quite new to the ESP32 environment, and when things fail, I almost always go back to the source and documentation to troubleshoot (as well as implementing in program debug steps).

I considered the initialization issue again. Disregarding for a moment the importance of it and only looking at impacts... at the moment the TM1637_extended example sketch, and I think others, do not depend on the constructor to initialize the display, but do so explicitly by calling setupDisplay() and clearDisplay() in the sketch setup() or loop().

I didn't consider this before, but doesn't that mean that if legacy users followed the pattern of the example sketches, removing initialization from the constructor would actually have no effect? I think it would only create a mystery if the user started from scratch with a sketch that doesn't include those. I think it's reasonable to expect users to follow the design patterns in the example sketches - they are, after all, test sketches as well. So they can be a baseline reference if they have a problem somewhere.

It is both good and bad that you can't reproduce the initialization issue. Good because it shows that the issue has no immediate impact. Bad because it creates a false sense of security. The problem itself is not well known because 99.999% of C++ programs run in an OS that has performed all the necessary hardware and system initialization a long time before any user program is called. A big part of the embedded programming world is more C focused and has no constructors because there are no classes. I'm just trying to explain why it's not better documented and known. It's better to understand and acknowledge the mechanism by which it works, and the nature of the assertions that guarantee correct operation on any platform, than it is to observe and judge the end results experimentally. That is because, it is impossible to perform an experiment that is guaranteed to trigger the undesired behaviour in all circumstances.

For example, it's possible that many of the compilers we're using, are actually calling init() after the constructors are called. In that case, all legitimate code would function 100% perfectly in the constructor body. However, this is not guaranteed by the language specifications so it is possible (and I have read some anecdotes of actual instances) that only some, or none of the initialization code has run. The crux of the issue, is not that it works or doesn't work, it's that it is unpredictable.

While I consider it quite important, as I pointed out at the beginning of this message, the initialization inside the constructor may not actually be necessary even for legacy code patterned after the examples you've created.

I think that your idea of making activateDisplay=false is a good one, but when compared with typical example sketches, actually would not make any difference. Again, because the examples don't make the assumption of activateDisplay=true, instead they call setupDisplay() explicitly.

To summarize that point, I doubt that removing the constructor initialization would have any real impact. Doing so is a good idea because it future-proofs the code against what is a known general problem with class initialization in stand alone embedded code such as this. Thus I don't see that efforts to accommodate it somehow, are really valuable.

Regarding testing. It's my fault, really, that I haven't fully explored the problem. I don't even have a list of the exact environment, versions, etc. Also I really need to try the same thing on Windows 10. On that note, since you say you have Windows 7, it's really important that someone try it. Here, I do have some W10 machines but they got caught in a flurry of pandemic shuffling and got offline. I plan to set one of them up, both for the ESP32 testing, and also for the STM32 testing that I plan to do. To be honest, things are kind of a mess over here, and also due to time constraints, it will take me a while to set up.

Here is a quick roadmap of my intentions:

  • TM1650 4 digit display is arriving soon. I've never had one before, so I will test it.
  • Set up W10 machine and try to compile the failed sketch there and see if it's different (I suspect the answer is yes)
  • build a 74HCT244 level converter board and test TM1638 with ESP32 and STM32 (Blue Pill first)

@maxint-rd
Copy link
Owner

maxint-rd commented Sep 28, 2021

Hi Ken, thank you for your elaborate response, Due to lack of time I have to keep this response short (Edit: it got longer... ;-).

  • Thanks for your explanation. I think I follow your line of thought and see your points of attention. As my spare time is limited I like to prioritize on issues that I can reproduce and that have the highest impact. Without reproduction I cannot properly test.
  • I've tested the TM1638 without any level conversion on both ESP32 (5V and 3v3 power to module) and on ESP8266 (5V) without level converters. In the early days it was unclear, but I recall statements about ESP8266 data pins being 5v tolerant and I thought ESP32 was 5v tolerant as well.
  • I use the portable installation of the Arduino IDE, allowing me to have multiple versions on my harddisk. Making it portable is easy: just create a directory called "Portable" below the directory that has arduino.exe. After that all sketches and libraries will be installed in that directory.

Edit - a few more remarks:

  • While testing the ESP32/ESP8266 I found some other issues, resulting in some updates in the past days. Thank you for initiating my retests! (I also had a bricked ESP12-F module which miraculously revived after using the latest core)
  • I'm still looking into the initialization issues. I'd like this library to be compatible with all Arduino platforms I have available. Although the library may assume some AVR specific behavior from historic legacy, this should be corrected when incompatibility is shown. I'm very interested to your findings with the STM32. Hopefully you can find some time to test using that MCU!
    BTW. I did find these decade old possibly relevant topic1 and topic2 on the Arduino forum.
    There may be different behavior when using delayMicroseconds() vs. delay(), as the first one doen't rely on interrupts to count the time.
  • There are some more interesting modules using TM16xx chips. Today I tried to buy the 16 digit TM1640 display module on Ali, but could only find an old listing. I did order a 4 character 16-segment display using the HT16K33 which I heard is compatible with the TM1640. I do have some of those bare LED components on shelf, and still want to make a similar module on perfboard. There is also a variety of 16x8 and 8x8 LED matrix boards available.
  • I already retested two different TM1638 modules, a TM1628, my TM1650 module and some TM1637 modules. when I had problems reading buttons, I used the logical analyzer to assess and fix timing issues.

And a final note:

  • Your first remark in this issue is "When I upload TM1637 example sketches to the ESP32 Devkit V1, the program crashes with a watchdog reset." This reminds me of some similar experience I had back in the days with the ESP8266. My program did too much without calling yield(). Could the reason for your crash be related to this? Was your sketch doing anything else than mine? Perhaps in your environment the core takes more time to initialize prior to first constructor, leaving less time for initialization?
    If so: a potential - albeit perhaps quite ugly - workaround might be to first call yield() in the constructor...

@KenWillmott
Copy link
Author

KenWillmott commented Sep 30, 2021

Okay, there is getting to be a lot going on here. Should I open another topic to only focus on testing?

I just finished a successful test of the TM1637 with the STMF103C8T6 on a "Blue Pill" board. Environment:

  • OS - Windows 10 Home version 21H1
  • pins - 3.3V, gnd, PB8->DIO, PB9->CLK
  • display: DM 6 digit TM1637 (basically Robodyn clone)
  • IDE: Arduino 1.8.12
  • core: stm32duino core 2.1.0
  • test program: TM1637_extended

Issues:
First compile failed because AdafruitGFX.h reported missing. So I installed Adafruit GFX. Then it failed because Adafruit_I2CDevice.h not found. Installed Adafruit_BusIO to fix that and it worked.

Hmmm... actually this test sketch doesn't use any GFX functions (I think). Why is it complaining about this?

@maxint-rd
Copy link
Owner

maxint-rd commented Sep 30, 2021

Hi Ken,

Thank you for hanging on and for giving such clear responses.
If you want you can open another issue about testing the STM environment.
With regards to the Adafruit_GFX dependency; this is a known issue that has been reported some time ago. In short: It's only used for LED matrix modules, but the Arduino IDE isn't aware of that and first compiles the entire library.

I haven't been able to make that section conditional in an easy way. Unless I find a better way to deal with dependencies, I'll probably separate this part and make it a separate library. In fact, I already did that in my ESP32 installation. Easiest way to continue testing without the Adafruit library is to remove the TM166xxMatrixGFX files (both .cpp and .h). The library will compile fine without them.

Final note (unrelated): while working on some TM1650 modules, I implemented a way to combine multiple modules into one. With only sleight modifications and sharing some pins, I now have the TM16xxDisplay_ScrolledPrint example running a ticker tape (on an ESP12-F), that starts on a 8-digit TM1638 and continues on two 4-digit TM1650's. After cleanup and adding an example I intend to publish this new version within a week or so...

@KenWillmott
Copy link
Author

Today, I finally managed to set up a crucial test - compiling a test sketch for the ESP32 with TM1650 and TM1637 but on a Windows 10 platform instead of Mint Linux. I used your latest build and today's ESP32 core direct from Espressif.

The results mirror exactly the behaviour mentioned before, using Mint Linux. With the TM1650, the sketch compiles and runs perfectly. With the TM1650, the ESP32 exhibits a watchdog timeout as indicated by its debug messages.

The sketches and drivers are the same ones that have also been tested and run perfectly with no modifications when using the AVR and STM32 cores.

The main difference between the TM1650 and TM1637 extensions to the TM16xx class, is the use of initialization within the constructor. Previously, I proved this by simply changing that in the TM1637 sub class, which caused it to work. I didn't do that in this case because I already have pretty good evidence for the cause. This just rules out the Linux tool chain.

The problem could be considered active, as many users will be compiling using the ESP32 core in the Windows OS environment.

@maxint-rd
Copy link
Owner

maxint-rd commented Oct 24, 2021

Hi Ken, I consider that good news! So far I have not been able to reproduce this issue. Can I assume you used your test-sketch for this?

I have an ESP32 with the most recent core and TM1650 ready on my desk and I'm anxious to reproduce your issue in order to fix this properly. I also have a Windows 10 laptop to which I can easily perform a fresh installation of the Arduino IDE.
In the past days I didn't have time to look at your sketch, but should have some spare time during the coming week).
{I will keep this issue open as long as needed).

@KenWillmott
Copy link
Author

KenWillmott commented Oct 24, 2021

Yes, I did use my test sketch. I did copy freely from your sketches, with a few modifications and additions. However, you will find that the TM1650 sketches (or my test sketch configured for TM1650) compile and work perfectly. It's the TM1637 sketches that crash the ESP32. I think you can use the existing example sketches and you will discover the same results.

Edit - later in the day, I tried running a clock sketch on the TM1650 base, on the ESP32 DevKit. But this time, from Linux. That produced a crash, and it ran only when I removed the intitialization from the TM1650 subclass. So what I said about the subclass isn't right. I guess more like luck that TM1650 ran when compiled in W10 vs. Linux. At least, that is my interpretation. But no doubt, it's easier for you to do your own investigations.

@maxint-rd
Copy link
Owner

maxint-rd commented Oct 26, 2021

Hello Ken,

I really appreciate your efforts in making the test sketch and doing the testing. Today I'm doing some ESP32 testing using your TM1637_test_7segment_generic sketch and I will use this post to share my findings (to be updated as results poor in):

  1. Setup: Windows 7, Arduino IDE v1.8.16, TinyPico ESP32, core esp32\1.0.6, Mx TM1650 4 digits, 3 buttons on DIO=15, CLK=27, LED=25). Result: sketch works fine, measured speed 222. I restarted the MCU quite a few times, but no crash whatsoever,
  2. Setup: Windows 7, Arduino IDE v1.8.16, TinyPico ESP32, core esp32\1.0.6, Noname TM1637 4 digits, no buttons on DIO=15, CLK=27, LED=25). Result: sketch works fine, measured speed 355. I restarted the MCU a few times, but no crash,
  3. Setup: Windows 7, Arduino IDE v1.8.16, TinyPico ESP32, core esp32\1.0.6, TM1638 LED & KEY 8-digit module 8 buttons on DIO=15, CLK=27, STB=26, LED=25). Result: sketch works, measured speed 153 (l printed on pos 5 due to incorrect sprintf formatting option..). No crash,
  4. Setup: Windows 10, Arduino IDE v1.8.16, TinyPico ESP32, core esp32\1.0.6, TM1638 LED & KEY 8-digit module 8 buttons on DIO=15, CLK=27, STB=26, LED=25). Result: sketch works, measured speed 152 (1ms faster?). No crash,
  5. Setup: Windows 10, Arduino IDE v1.8.16, TinyPico ESP32, core esp32\1.0.6, , Noname TM1637 4 digits, no buttons on DIO=15, CLK=27, LED=25). Result: sketch works, measured speed 353. No crash, Measured speed 355 on compilation with TinyPICO board definition instead of ESP32 Dev Module.

@maxint-rd
Copy link
Owner

maxint-rd commented Oct 26, 2021

Hi Ken, While testing I've made some changes/additions to your sketch. Would you like me to publish my version in your branch? That way we can work together on making further revisions of the test sketch to support more modules, prior to including it in the master branch.

Some first comments about the sketch:

  • Perhaps a more generic name for the sketch can be used? You very nicely made the sketch to also support other TM16xx chips with only minor changes in displayTypes.h
  • My TinyPico has no LED_BUILTIN, it has an RGB LED. On AVR arduinos the LED is usually on when HIGH. I suggest to implement a setLed() function to make it easier to customize the sketch.
  • The speed of the TM16xx is currently limited using a bitDelay() method. Most chips require this delay, especially when reading buttons. I found this when testing the ESP32 (running at 240 MHz). Many TM16xx chips do support higher write speeds than the datasheet describes, but reading is fairly slow. The required delay varies per chip. Currently the library is optimized for size and compatibility rather than speed.
  • The TM1638 and others also use a STB pin. Using define in displayTypes.h the test sketch can select which parameter the MODULE needs.

@KenWillmott
Copy link
Author

Yes, I'd like to see your changes. Let me know where it ends up, unless the Github will notify me. It would be great to have two minds looking at it.

Replies:

  • yes, a better name would be good
  • yes, and each MCU board uses a different pin for LED_BUILTIN and one may or may not exist. Also consider, not all devices have serial during testing. For example, the STM32 might not have a convenient serial port to work with, while it is programmed via STLINK debugging tool.
  • device interface speed can be influenced by more than just the IC specification. For example, the DiyMore modules have 100pF caps on the DIO and CLK lines to limit EMI emissions. It would be nice to test, but I haven't looked to see whether the data/clock speed can be adjusted at the application level. This would be a good test to add to the test program.
  • yes, in my current scheme I would code a variable like this:

const int hasSTBpin = true;
const int STB = xx;

@maxint-rd
Copy link
Owner

Hi Ken, I think I'll upload my version of your sketch in a few days.

Today I've tested my TinyPICO ESP32 module on windows 10 (64bit). I first tested your sketch with the TM1638 and then with the TM1637 which crashes on you. So far I've not seen it crash on either Windows 7 (32bit) or on Windows 10 . Just to be sure I tested with board selection ESP32 Dev Module as well as the TinyPICO board selection. (TinyPICO works with both). However, both use the same core esp32\1.0.6 and xtensa-esp32-elf-gcc\1.22.0-97.

Could it be that you are using a different core? Perhaps you can also try the Portable installation of Arduino IDE 1.8.16 with the same core as I have?
To make a portable installation you just unzip the downloaded archive and create a directory named Portable under the directory that has arduino.exe. Then you run arduino.exe and install the ESP32 board and the TM16xx library (and delete the TM16xxMatrixGFX files).

@KenWillmott
Copy link
Author

KenWillmott commented Oct 30, 2021

I upgraded the PC to Windows 11.
Edition Windows 11 Home
Version 21H2
Installed on ‎2021-‎10-‎27
OS build 22000.258
Serial number xxxxxxxxx [redacted]
Experience Windows Feature Experience Pack 1000.22000.258.0

System:
Device name DESKTOP-xxxxxxxxxxx [redacted]
Processor Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz 2.90 GHz
Installed RAM 12.0 GB (11.8 GB usable)
Device ID [redacted]
Product ID [redacted]
System type 64-bit operating system, x64-based processor
Pen and touch No pen or touch input is available for this display

I downloaded and installed a fresh copy of the library directly from this repository. I have an ESP32 DevKit V1, display is DiyMore "4 digit LED display" which uses TM1637. The test sketch is "TM1637_test_7segment_generic" which I uploaded to my fork. ESP core installed is version 2.0.0, the latest release. Chip is ESP32-D0WDQ6 (revision 1). Arduino core is 1.8.16.

The ESP32 produces this watchdog message every few seconds:
`ets Jun 8 2016 00:22:57

rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0030,len:1240
load:0x40078000,len:13012
load:0x40080400,len:3648
entry 0x400805f8
`

Then, I comment out the initialization in the TM1637 class constructor in TM1637.cpp:

TM1637::TM1637(byte dataPin, byte clockPin, byte numDigits, boolean activateDisplay, byte intensity)
: TM16xx(dataPin, clockPin, dataPin, TM1637_MAX_POS, numDigits, activateDisplay, intensity)
{ // NOTE: Like the TM1640, the TM1637 only has DIO and CLK. Therefor the DIO-pin is initialized as strobe in the constructor
// clearDisplay();
// setupDisplay(activateDisplay, intensity);
//delay(1); // NOTE: using delay in constructor (called prior to Setup) may hang the MCU!
}

After that, it uploads and runs perfectly!

@maxint-rd
Copy link
Owner

maxint-rd commented Nov 2, 2021

Hi Ken, thanks for posting these details. From what I see the main difference is the version of the ESP32 core. I've been using this URL in the board manager:
https://dl.espressif.com/dl/package_esp32_index.json
That URL gives me version 1.0.6 as the latest stable version. Using that version I haven't been able to reproduce the issue.

Following you mentioning you're using version 2.0, I found newer releases on https://github.com/espressif/arduino-esp32
The 2.0 version mentioned there is a development release which supports ESP32-S2 and ESP32-C3 (both I don't have)
I think I'll give that release a test as soon as I have time and expect to be able to reproduce the crashes then.
The whole reason I want to reproduce this behavior is that I want to fix it in a way that is compatible with existing code and that keeps memory footprint low for small MCU's, such as ATtiny. I also need to spend some time to update my knowledge and see what changes this new version brings. (I'm thinking of ordering one of these newer ESP32's for further testing).

@maxint-rd
Copy link
Owner

maxint-rd commented Nov 2, 2021

FYI: I found a post of someone encountering the same issue in the LiquidCrystal library using the latest ESP32 core. In this case the root cause is the delay() and as remedy they state there should be no call to any begin() in a contstructor. If I remember well using delayMicroseconds() should be okay, as long as yield() is used when the length of the delay combined with other code could trigger the watchdog.

@maxint-rd
Copy link
Owner

Hi again,

I have good news! (sort of...)
This afternoon I installed the 2.0.0 development release in Arduino IDE 1.8.16 using this board manager URL:
https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_dev_index.json
I first removed the old core installation, but kept the library and sketches as is.

Testing your sketch with my 4-digit TM1637 module finally gave me the looping reset that you encounter:
rst:0x10 (RTCWDT_RTC_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)
I then modified the TM1637 according your post and saw that the sketch then works as it should.
My current interpretation is that the 2.0.0 implementation of delayMicroseconds() acts more like delay(), of which I already discovered years ago that it cannot be used in a constructor.

Next step is to implement a compact, compatible fix that works well for all supported MCU's. Prior to releasing that fix, I'd like to also test it on my STM32 board, but that doesn't have top priority now.

@KenWillmott
Copy link
Author

KenWillmott commented Nov 2, 2021

Did you highlight the wrong board manager URL? The one you posted is for the ESP8266. I assume that your comments apply to an ESP32 board manager instead. I deliberately avoided newer releases, instead downloaded the "stable" version to help rule out unknown problems. Or, am I mistaken about that? It was dumb of me to not post the URL, it's
https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_dev_index.json
I admit to some confusion about versions, there. What I did was to act like a naive user, follow the Espressif site links to download, and then chose what seemed to be the most obvious general purpose core, hoping it would be the most commonly used.

That's probably what you have, but I'll wait to see what you say. If so, it's certainly weird that you don't see the issue. My Windows PC was purchased for mostly development, and has very minimal customization and complement of applications. It's also fairly new. So I am quite surprised about your report. It's inconvenient, too, since there really isn't much hope until you can see it on your end.
[edit - posted this only a few seconds after your last post - it appears you did reproduce it]

Another attempt could be made if you have a Blue Pill, I seem to recall you were in the process of obtaining one. I could repeat the above test with STM32F103C8, in that case a completely different tool chain. You could then reproduce that test over there.

Generally, ESP32 is too good a device, to be missing. It's not useful for everything, but the things it does, it does brilliantly. I even tried the on chip Hall effect sensor, it works! Kind of a strange afterthought from the designers, but there it is. The BT is useful for device network configuration as well as other things. I have about 6 of them now, 2 are already installed in projects.

I've been researching other STM lines, because I have a need for non-wifi devices. I want to future-proof myself so I'm trying to guess where things will be in a few years after some shake-out of the F103 line due to obsolescence. STM promotes the G0 and G4 series for such low cost application. Some modules just came on the market last week so I ordered a bunch of G0's. My expectation is that the G0 will eventually surpass the F0 in commercial appllications, the sales will increase and the prices will become extremely low. When those arrive, I can also test them with this library.

Also the G0 are available as 20 pin and I think 6 pin IC's. Doesn't that suggest something like an ATTiny? :)

That's an interesting post from the Arduino forum, it lends some weight to my theory. Another interesting thing is the work-around suggested by forum member "guix" about instantiating the driver object inside the setup() function. I think any such problem could be solved that way without making any changes to the library. But it's not exactly typical code. Still, it's useful to have a "hammer fix" that will always solve the problem. It's not going to make users happy, though, because the methods have to be called with pointers, the example given was:
lcd = new LiquidCrystal( ... );
lcd->begin( ... );
All the other methods would also have to be called via pointer, not very convenient.

@maxint-rd
Copy link
Owner

maxint-rd commented Nov 2, 2021

Aha, our posts crossed. I saw you still read my previous post. (I should have pressed save a few minutes earlier,...)
Sorry, my first board URL contained both ESP8266 and ESP32, I edited that post...

I have an STM32 blue pill, but just haven't used it yet and still need to make a working Arduino installation for it.
Thanks for your suggestions. Have you already done a project with the STM8? I bought a couple (because of the price), but also didn't look at them. Would be nice to have an STM8 core for Arduino, but don't even know if it exists.

I have a lot of admiration for the people that made ESP32 as well as it predecessor. In my opinion one should choose the MCU that fits the need. Personally I also like to struggle with more limited MCUs, such as the ATtiny44 and ATtiny13, but all MCU's have their limits.

With regards to the fix I'd like to see minimal or no changes in existing sketches. I think that that's doable, although it may require an additional flag and some extra code.

@KenWillmott
Copy link
Author

KenWillmott commented Nov 2, 2021

Yes, I played with STM8. IIRC there isn't an Arduino core. There is a slight problem, though not impossible to overcome. You can freely download the STM8Cube (I think, because it was a few years ago I did that). However, the compiler isn't available freely from STM, but from Cosmic. They offer it for free, but you have to jump through some administrative and technical hoops to get it installed. You get a free license to use it only on one machine.

In my opinion, the best performance/price in the sphere of small STM modules, are the STMF030F4P6 Demo boards: https://www.aliexpress.com/item/32828346568.html
I only just now noticed that those are also offered right there, as G0 boards. You could migrate the 20 pin chip to a really small board... have to order a few of those as well.
But now, many F4 boards with a Blue Pill pinout are emerging, at incredible prices compared with last year. So that is a trend to watch as well. The F4 has all kinds of incredible goodies inside.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants