Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fatal crash in databrowser on Linux #2660

Open
willrogers opened this issue Oct 19, 2020 · 21 comments
Open

Fatal crash in databrowser on Linux #2660

willrogers opened this issue Oct 19, 2020 · 21 comments

Comments

@willrogers
Copy link
Contributor

Having built CS-Studio against the latest Eclipse version, we have a serious problem that I have not seen before.

Simply opening the databrowser causes CS-Studio to disappear immediately, leaving only the following error:

[xcb] Unknown request in queue while dequeuing
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
java: xcb_io.c:165: dequeue_pending_request: Assertion `!xcb_xlib_unknown_req_in_deq' failed.

@kasemir I realise that you won't have much interest in fixing this, but do you have any ideas as to where we might look? I expect the problem is somewhere in org.csstudio.swt.rtplot.

@kasemir
Copy link
Contributor

kasemir commented Oct 19, 2020

The same happens in phoebus, ControlSystemStudio/phoebus#705

With phoebus, I've occasionally seen it on startup. Googling around led me to believe that it can be caused by calling UI code off the UI thread, but unclear where that would be taking place.

@GuillemCastro
Copy link
Contributor

We had the same error in ITER. Every time we'd open the databrowser CS-Studio would crash.

Debugging I found that these lines aren't executed in a UI thread. We solved it by wrapping them in a Display.syncExec.

@kasemir
Copy link
Contributor

kasemir commented Oct 19, 2020

The whole point of the data browser is that the image is created in a background thread, so moving updateImageBuffer() to the UI thread means the UI is slowing down as you plot more and more data.
In the phoebus/JavaFX version, there were occasional issues with obtaining an in-memory image buffer, so the code to allocate an in-memory image buffer moved to the UI thread, but the code which then keeps re-using that buffer to draw into the buffer runs in background threads.

@willrogers
Copy link
Contributor Author

@GuillemCastro lucky you saw this ticket, thanks. That gives us a lot more information.

@willrogers
Copy link
Contributor Author

We have tried putting updateBufferImage in syncExec and it certainly appears to fix the problems we are seeing. I can't yet assess the performance impact.

I can't find any way to make a smaller change that fixes the problems. The code as it is will fail in a number of ways and even if I change updateBufferImage just to draw a small red square, CS-Studio will often deadlock if I play around for a while.

@willrogers
Copy link
Contributor Author

See DiamondLightSource@a18f978 for that change.

@kasemir
Copy link
Contributor

kasemir commented Oct 20, 2020

Can you tell where it's locking/failing?
Is it the initial new Image call?
In the JavaFX version, the comparable call to get an in-memory image was problematic and had to happen on the UI thread, so some https://github.com/ControlSystemStudio/phoebus/blob/master/core/ui/src/main/java/org/phoebus/ui/javafx/BufferUtil.java is used to handle that, and the rest of the actual drawing code can then run in the background.

@willrogers
Copy link
Contributor Author

I made a minimal copy of your Phoebus code and then caught a deadlock in the debugger, you can see quite a lot in this image. It appears to have locked at the display.syncExec(), oddly enough, before it tries to create the image. I didn't set a breakpoint, I waited until it locked up and then suspended the thread to see the stack trace.

I notice that there are two "RTPlotUpdateThrottle" threads, I don't know whether that is correct.

deadlock

@kasemir
Copy link
Contributor

kasemir commented Oct 21, 2020

For what it's worth, in the current phoebus incarnation the "RTPlotUpdateThrottle" is different. There's just one global "UpdateThrottle" thread (since there's also just one UI, after all), and an "RTPlot" pool sized to match the CPU core count. The images are also created in a double-buffer scheme, i.e. 2 per plot which are then re-used. So it's all quite different and doesn't help at all to explain what you're seeing in the SWT code.

I can't tell what's blocking here, i.e. who's holding the lock, but I think the approach could be changed.
'sync' exec blocks, so with that you wouldn't even need a CompletableFuture. I would use asyncExec, and then actually utilize the CompletableFuture to await the result:

// On background thread
// TODO Add some check to assert that we _are_ on a background thread
CompletableFuture<BufferUtil> result = new CompletableFuture<>();

// On display thread, at some time in the near future, create the buffer
Display.asyncExec( () -> result.complete(new BufferUtil(...    );

// Back off the display thread, await the result
BufferUtil buffer = result.get(500, TimeUnit.MILLISECONDS);

@kasemir
Copy link
Contributor

kasemir commented Oct 21, 2020

The way you use syncExec, you might as well put the result in an AtomicReference:

AtomicReference <BufferUtil> result = new AtomicReference <>();

// On display thread, at some time in the near future, create the buffer
Display.syncExec( () -> result.set(new BufferUtil(...    );

// As we return from the blocking syncExec, result has been set.
// ... or we block for a while, because blocking is what syncExec is meant to do
BufferUtil buffer = result.get();

@willrogers
Copy link
Contributor Author

Aha yes AtomicReference will be what I want.

@kasemir
Copy link
Contributor

kasemir commented Oct 21, 2020

Actually, I'd try asyncExec to decouple the threads.

@kasemir
Copy link
Contributor

kasemir commented Oct 21, 2020

The 'syncExec' can block forever (or until the next power outage). With asyncExec, the CompletableFuture.get(..) will time out after 500ms, so you skip one update, but you're not stuck.

@willrogers
Copy link
Contributor Author

I didn't actually understand that asyncExec() was also set to run on the UI thread, I was thinking that it meant run a task in the background. I guess that's a fairly fundamental misunderstanding. Anyway, now I have moved it to an asyncExec as you suggest and it tells me a bit more - once the UI locks up it is indeed timing out after 500ms every time, presumably because the UI thread is blocked. I assume that Thread [main] is the UI thread.

@willrogers
Copy link
Contributor Author

It seems that any use of the GC in a background thread is causing the problems.

@kasemir
Copy link
Contributor

kasemir commented Oct 21, 2020

Bummer. It's certainly possible that the latest version of SWT limits all access to the UI thread, i.e. can no longer use SWT to prepare images in the background. So you'll have to run updateImageBuffer() in the UI thread. Or use databrowser3, or phoebus, which actually draw in the background with AWT.

@willrogers
Copy link
Contributor Author

I asked on an Eclipse mailing list but it hasn't got through moderation, and I asked on the Eclipse forums but I haven't had an answer.

I think we are going to have to move more processing to the UI thread, and then revisit this either if we see performance problems or if I find out any more information about SWT.

@kasemir
Copy link
Contributor

kasemir commented Nov 11, 2020

Not sure if there is an officially blessed SWT example for preparing images in background threads, then loading them on the SWT thread.
There are various examples, like https://www.eclipse.org/forums/index.php/t/168168/ first running into Invalid thread access, then followed by an example that basically does offScreenImage = new Image(display, 400, 300); ... draw ... in a background thread, ending in display.asyncExec(... show offScreenImage ..), which is what we've been doing.

But there is no such example in the official https://www.eclipse.org/swt/snippets/

@willrogers
Copy link
Contributor Author

I have opened an Eclipse bug here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=568859. Here is a relevant reply from the mailing list.

Hi Will,

In general most of SWT code expects to be executed on UI (main) thread only, but you seem to work on non UI thread?
If you can provide a standalone reproducer example, please create bug for that, we should discuss on bug.
I guess the new behavior is related to removing locks around native GTK calls.
See
https://bugs.eclipse.org/bugs/show_bug.cgi?id=546743#c17
https://git.eclipse.org/r/c/platform/eclipse.platform.swt/+/155907
https://git.eclipse.org/r/c/platform/eclipse.platform.swt/+/155989 .

Kind regards,
Andrey Loskutov

@kasemir
Copy link
Contributor

kasemir commented Nov 16, 2020

Try Snippet141 from https://www.eclipse.org/forums/index.php/t/168168/ :

new Thread("Animation")
{
                        public void run()
                        {
                            /* Create an off-screen image to draw on, and fill it with the shell background. */
                            Image offScreenImage = new Image(display, 400, 300);
                            GC offScreenImageGC = new GC(offScreenImage);
....

combined with the fix suggested further down in that thread to then use getDisplay().asyncExec(.. to place that image created in a background thread in the UI.

@willrogers
Copy link
Contributor Author

It seems that there has been a bug introduced that causes problems when creating an Image on a background thread. I expect that I will need to keep the workaround until that bug is fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants