-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crash on Startup on Vista / fb2k 1.4.4 / ChronFlow 0.5.0 #2
Comments
That is a really odd crash. In case you're interested: The crash log points to an OpenGL helper library I'm using, more specifically a part that handles something about monitors. So my first question would be whether you happen to have an unusual monitor setup. Next, did you insert multiple foo_chronflow panels into your UI, or only one? My strongest guess is that this is a bad interaction with some other component. But the only good candidate I found in your list is And last but not least, here's a special build for you: foo_chronflow.zip. This shouldn't crash, but you should get an error instead. What does it say? |
Hmm... looks interesting. My monitor setup is a pretty standard dual head setup, fb2k runs on the secondary monitor, so in that array the monitor fb2k renders on would come second at i == 1. So let's try the naive stuff first: Obviously, the first thing I'll try is weather moving it to the primary monitor changes the behaviour:
So it's not dependent on which monitor the window opens up on. So far so bad. :|
That was a even a bit surprising. I hoped that since the window position is irrelevant, that at least the number of actual monitors would change things, like not causing the array index to go out of bounds, but it seems that's either not the problem, or it's caused by the first monitor, and doesn't really need the help of another being there. Now windows counts monitors 1-based, so _glfw.monitors[0] would be monitor #1. But oh well, that's how it's been all along.
Upon closer inspection though, the problem is not the array index but rather the _GLFWmonitor* pointer being taken from that array. If that points where it shouldn't, it would explain the access violation seen in the dump. So the trap is probably set up where _gflw.monitors gets initialized with a bogus pointer, the dereference is really just the trap snapping at the touch.
Finally,
Hmm... That's not really making a light bulb appear over my head. ;) Are you using GFLW only starting with 0.5.0? If so, this is likely a bug stemming from it, as I said the pointer dereference in line 79 of the file you linked will cause an access violation if the pointer is incorrect. I doubt that you can do much about it, GFLW will likely self-initialize without much you could do here. What's odd is that the error seems to occur in the Do you ever call this method directly? Can you trace if you call it indirectly and if so, how often? Anyway I doubt that it's a problem involving some other module unless you are only now using GFLW, possibly now causing dual initialization and/or termination and/or unless it's related to something else that's new in 0.5.0, since with 0.4.5 the problem doesn't occur with all other modules being the same. I'll back up my configuration and try what happens if I run a plain fb2k with only ChronFlow added and will report what I see. Thanks for your quick responses BTW, much appreciated. The work week has started, so I hope I can keep up with you testing stuff... ;) |
So deinstalling everything but the bundled plug-ins and ChronFlow didn't change the behaviour, 0.4.5 works, 0.5.0 crashes, the snapshot you provided prints a similar error. I'll attach new dumps in case that helps due to less clutter from other loaded modules. |
This looks suspicious: EngineWindow.cpp#L30. Can the count appear to be equal to 0 to different threads or EngineWindow instances or the like causing gflwTerminate() to be called repeatedly, causing the pointers to be dereferenced into freed memory on the second invocation? My C/C++ times are long ago, so I am sorry, I cannot quite pinpoint the issue, but using invocation counts to make something a singleton could be a problem, especially when the code is invoked concurrently. Also is the GLFW engine supposed to be a singleton throughout a process? In that case I'd expect there to be some factory that either returns the global instance, initializing it only when required. But again, I am just shooting from the hip here, I don't know the fb2k SDK, nor GLFW, nor did I really study your code enough. |
Looks like you're properly calling the factory, but I don't really see a need for the invocation counter, since I think you should check for initialization errors using Even better, I think would be to simply call In that case the access violation might be prevented (if I am right about multiple invocations causing it in the first place), but GLFW may not initialize either. But that's another story. 0.4.5 works, so I assume there is a way GL works on my dated system. :) |
Almost oversaw that question... I've only got one ChronFlow panel in my layout. |
No, but I'm using a newer version and linking it statically instead of using a dll.
Not really. As you saw this is only called on window destruction, which should only happen once since a window is also only created once. But it should also be fine to call the function repeatedly. For the monitors this is easy to see:
No, enginewindows are only ever created and destroyed in the mainthread. I'm still really unsure as to why glfwTerminate could fail. But it's not actually necessary to call it at all – the build I gave you simply has that call removed :).
I also got confused by the many numbers at first, but it actually points to the line that is failing: Lines 173 to 174 in f6e76cd
And it turns out that |
Hi Chronial, thanks for the analysis. I've got the platform update installed (not just now, but tried a reinstall), and it won't solve the issue. But I don't think you have a chance to do much about it. I'll keep using 0.4.5 for the time being, and it's time to get off Vista anyway in the near future. It won't be Windows 10 and thus Microsoft breaking my install, or deleting my files, so I'll probably be moving on to Linux for my main workstation. Too many things won't work on Vista any more, like the new Picard 2.0.4. (It crashes on startup, too.) You might want to officially drop support for Vista, though, pointing to this issue, in case there's another guy like me still using it. ;) |
As I said earlier, there is a race condition if two threads call it concurrently, since in the code
the memory allocated to _glfw.monitors is freed before the reference is set to null and the count is set to zero, so at least theoretically, if another invocation occurs and reaches the loop between the free call and the counter being zeroed, an illegal access can occur. A terminate call happens implicitly if the initialization fails, and then calling it again might cause the issue. I think, in any event, they should restructure their code to first set I also didn't study the GLFW code enough to see if perhaps The one thing I am curious, and I think you should be, too, is whether the crash/illegal access could be triggered in the same way by other failures than mine. Generally, as a user, I would very much prefer a starting application with the error logged, over a crash-on-startup, so unless you decide to leave out the What I'd suggest is that you inspect the whole module for runtime exceptions being thrown explicitly by your own code, and ensuring that they are caught at the latest in your entry point (i. e. the method lowest on the stack that is under your control) and then terminating / disabling / or simply "neutering" your plug-in by logging to the console, that you couldn't initialize and doing nothing further. You must not allow your plug-in to take out the whole process by allowing an uncaught exception to escape further up on the stack beyond your control. The same applies for any external routines you call that may cause runtime exceptions. It's your responsibility to catch and log these errors, then terminate / deactivate gracefully, and to not let them escape beyond your control. I am not sure if access_violation is catchable though? My C++ times are to long gone. ;) |
BTW, reading through this, and the link it points to in the 2018-12-31 comment, the actual reason for the access violation might even be the Nvidia driver version I run, which is "the latest" but as outdated as my OS is... |
I am not 100% convinced that the plugin can't work on Vista yet – I'll wait for a second confirmation.
I am aware of that, but the only thread this function is ever called from is the main thread.
glwf.init actually seems works fine. It is a completely different part of the code that fails for you. There seem to be two separate issues here – one is kind of expected because of your OS (the initial failure), the other is not (the crash within glfw).
I can not easily prevent that. The only way to prevent it is to actually go bughunting within gflw which is hard for me, as I can't reproduce the error. I tried running the plugin in a Vista VM, but it just reports the lack of a proper GPU, as it should – without any crasing.
If you can't open basic win32 windows, then you have way bigger problems than my component :). But logging to the console is still a good idea – that makes it easier to copy the errors.
That is already the case.
Access violations are not trivially catchable. I think there might be a way to do that, but that is also not a good idea, because most of all they are not recoverable. An access violation tells you that the process is in an invalid state and the only sensible course of action is to terminate the process – which foobar is already doing. Thanks a lot for your help with this. If you are willing to test one more version, you could please give this a try: foo_chronflow.zip. I removed the |
Oh, and could you please run |
I managed to get Vista running in my VM, and I think I found the issue with Vista. This version works for me on Vista: foo_chronflow.zip. Care to give it try? |
Yup, looks good! Congratulations! Just looking at your commit... Oh really!? LOL. I wonder if mankind will ever agree on which side to drive our cars on and where to put the green subpixel and such. ;)
Did you mean to write "BGR" there? Sorry I didn't have time to check it early, I had a couple of buy workweeks and didn't even start my PC at all... |
Regarding the GLFW crash on the terminate call, I guess that's now simply circumvented, of course. Which is nice for you and me, still I wonder if we should let upstream know about it, and whether there is some smaller snippet of code that might reproduce the problem. Since you're right: of course, an Access Violation is a bug in their code in any event when they're on the top of the stack when it occurs, and yup, I remember now that they told us that they can't / shouldn't be caught... |
Nice to hear :).
I did, thx.
I can't really do anything about this, as I can't reproduce the problem on my end. If you want to help me investigate further, you could give this version a try: |
See also my comment on the hydrogenaudio forum
Unfortunately on my Vista installation of fb2k 1.4.4, chronflow 0.5.0, fb2k crashes as soon as it starts (the UI comes up for a moment, then closes and the crash report window appears). The dumps point to ChronFlow being involved. I have a ChronFlow panel in my layout. When I remove ChronFlow 0.5.0 using the safe mode, fb2k starts just fine. I also tried removing the foo_chronflow.dll.cfg file from my profile, but it didn't fix the error.
I've since verified that the old version 0.4.5 works with fb2k 1.4.4 (so removing 0.5.0 and reinstalling 0.4.5 allows fb2k to start normally and ChronFlow works as usual), and it used to work with 1.4.1, from which I upgraded today. So it must be a regression in 0.5.0.
I'm happy to try a fix, but I'm not set up to build it myself, so I'd need binaries / an fb2k component archive.
failure_00000002.txt
failure_00000002.dmp.txt
Cheers,
Flexx
The text was updated successfully, but these errors were encountered: