Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cover:export/1 never returns in Erlang 27 #8661

Open
arcusfelis opened this issue Jul 11, 2024 · 7 comments · Fixed by #8742
Open

cover:export/1 never returns in Erlang 27 #8661

arcusfelis opened this issue Jul 11, 2024 · 7 comments · Fixed by #8742
Assignees
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM

Comments

@arcusfelis
Copy link

arcusfelis commented Jul 11, 2024

Describe the bug
This call never returns in Erlang 27 (even after 3 hours):

cover:export("/tmp/mongoose_combined.coverdata").

To Reproduce
Tricky setup: we get it on MongooseIM test CI. So, we have 5 nodes we have to collect coverage from. We have a lot of tests running before, so there could be a lot of coverage data.

Expected behavior
Same as in Erlang 26: returns after a couple of second.

Affected versions
27

Additional context
I would expect something is wrong with async/remote_calls in the cover.erl. It has a lot of places where it spawns a new process, which would send a reply. If something wrong happens in that new process, caller would never be notified.
It is a bit tricky to figure out what is wrong because of a lot of pmap and remote calls (and the fact that it is hard to get current_stacktraces of these spawned processes). Hope to add more info after debugging.

@arcusfelis arcusfelis added the bug Issue is reported as a bug label Jul 11, 2024
@jhogberg jhogberg added team:VM Assigned to OTP team VM stalled waiting for input by the Erlang/OTP team labels Jul 15, 2024
@jhogberg
Copy link
Contributor

Thanks for your report! Unfortunately most of us are on vacation right now, we'll revisit this once we're back. :-)

@arcusfelis
Copy link
Author

arcusfelis commented Jul 15, 2024

We meck mongoose_tcp_listener:

(mongooseim@localhost)4> code:get_coverage(cover_id_line, mongoose_acc).
[{1,0},
 {3,0},
...
 {28,...},
 {...}|...]
 
(mongooseim@localhost)5> code:get_coverage(cover_id_line, mongoose_tcp_listener).
** exception error: bad argument
     in function  code:get_coverage/2
        called as code:get_coverage(cover_id_line,mongoose_tcp_listener)
        *** argument 1: must be one of: function or line

so, cover crashes when trying to collect data from the remote node.

btw, calling mongoose_tcp_listener coverage on the node that was not ever mecked works fine:

rpc:call('reg1@localhost', code, get_coverage, [cover_id_line, mongoose_tcp_listener]).
[{1,0},
 {2,0},
 {3,0}
 ...

code:get_coverage/2 is called from cover.erl when exporting coverage report.

@arcusfelis
Copy link
Author

I've added a bug report to Meck eproxus/meck#248

@arcusfelis
Copy link
Author

arcusfelis commented Jul 15, 2024

sadly, cover error handling is bad.
For example, that crash just crashes a new spawned process, so the caller waits forever for response.
We should add try..catch at least, or cover_server process should monitor their spawned processes.

Using spawn_link instead of spawn in pmap and other places could prevent unreported errors (though would crash cover_server).

@arcusfelis
Copy link
Author

Basically, this happens if cover is started in the distributed mode, i.e. cover:start([Node1, Node2...]).
Sadly, meck was broken for this mode before, it was not correctly restoring the modules in meck:unload/1.

Starting cover in local_only mode and importing from all nodes manually (i.e. not using cover distributed mode), "solves" the issue.

@bjorng
Copy link
Contributor

bjorng commented Aug 23, 2024

There does not seem to be a good way to report problem via the current cover API functions, so #8742 avoids the hanging by catching the call, logging that the call failed using logger, and otherwise ignore the failure.

@arcusfelis
Copy link
Author

Yeah, it looks like the best way for Meck to support multinode mocking and cover, is to make an explicit PR to OTP with missing functionality. Meck is doing mocking for cover in a bit of a hacky way to begin with. Remote cover nodes not exposing any API to load a mocked version just make stuff worse in this particular case.

We managed to start cover in distributed mode on first node-under-test. Good that we only need to mock stuff on that particular node.

bjorng added a commit that referenced this issue Aug 28, 2024
…/OTP-19203

cover: Gracefully handle failures to collect coverage data
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM
Projects
None yet
4 participants