PID mode not working with Erlang beam #294

Open
hkrutzer opened this issue Aug 20, 2023 · 9 comments
Labels: bug, good first issue


@hkrutzer

I am using Elixir. I have exported ERL_FLAGS="+JPperf map", and a perf map file is present in /tmp.
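
For reference, this is how I checked the map file (the /tmp/perf-<pid>.map filename and the "STARTADDR SIZE symbolname" line format are the standard perf map convention the BEAM JIT follows; the example symbol below is illustrative only):

ls /tmp/perf-$(pidof beam.smp).map
head -n 2 /tmp/perf-$(pidof beam.smp).map
# e.g. 7f2b4c041000 1a0 $global::process_main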

The following works:

sudo ./bin/ddprof --environment test --service test-service --preset cpu_only -l debug iex

but starting iex separately, and running

sudo ./bin/ddprof --environment test --service test-service --preset cpu_only -l debug -p $(pidof beam.smp)

does not.

<NOTICE>Aug 20 12:27:51.522223 ddprof[6194]: datadog.profiling.native.event.count: 0
<NOTICE>Aug 20 12:27:51.522230 ddprof[6194]: datadog.profiling.native.event.lost: 0
<NOTICE>Aug 20 12:27:51.522237 ddprof[6194]: datadog.profiling.native.sample.count: 0
<NOTICE>Aug 20 12:27:51.522243 ddprof[6194]: datadog.profiling.native.target_process.cpu_usage.millicores: 0
<NOTICE>Aug 20 12:27:51.522249 ddprof[6194]: datadog.profiling.native.unwind.avg_time_ns: -1
<NOTICE>Aug 20 12:27:51.522254 ddprof[6194]: datadog.profiling.native.unwind.frames: 0
<NOTICE>Aug 20 12:27:51.522260 ddprof[6194]: datadog.profiling.native.unwind.errors: 0
<NOTICE>Aug 20 12:27:51.522265 ddprof[6194]: datadog.profiling.native.unwind.stack.truncated_input: 0
<NOTICE>Aug 20 12:27:51.522271 ddprof[6194]: datadog.profiling.native.unwind.stack.truncated_output: 0
<NOTICE>Aug 20 12:27:51.522278 ddprof[6194]: datadog.profiling.native.unwind.stack.incomplete: 0
<NOTICE>Aug 20 12:27:51.522285 ddprof[6194]: datadog.profiling.native.unwind.stack.avg_size: -1
<NOTICE>Aug 20 12:27:51.522291 ddprof[6194]: datadog.profiling.native.unwind.stack.avg_depth: -1
<NOTICE>Aug 20 12:27:51.522297 ddprof[6194]: datadog.profiling.native.symbols.jit.reads: 0
<NOTICE>Aug 20 12:27:51.522303 ddprof[6194]: datadog.profiling.native.symbols.jit.failed_lookups: 0
<NOTICE>Aug 20 12:27:51.522308 ddprof[6194]: datadog.profiling.native.symbols.jit.symbol_count: 0
<NOTICE>Aug 20 12:27:51.522314 ddprof[6194]: datadog.profiling.native.profiler.rss: 3399680
<NOTICE>Aug 20 12:27:51.522320 ddprof[6194]: datadog.profiling.native.profiler.cpu_usage.millicores: 1
<NOTICE>Aug 20 12:27:51.522326 ddprof[6194]: datadog.profiling.native.dso.unhandled_sections: 0
<NOTICE>Aug 20 12:27:51.522330 ddprof[6194]: datadog.profiling.native.dso.new: 0
<NOTICE>Aug 20 12:27:51.522335 ddprof[6194]: datadog.profiling.native.dso.size: 0
<NOTICE>Aug 20 12:27:51.522341 ddprof[6194]: datadog.profiling.native.pprof.size: 0
<NOTICE>Aug 20 12:27:51.522347 ddprof[6194]: datadog.profiling.native.profile.duration_ms: 59001
<NOTICE>Aug 20 12:27:51.522353 ddprof[6194]: datadog.profiling.native.aggregation.avg_time_ns: -1
<NOTICE>Aug 20 12:27:51.522357 ddprof[6194]: No statsd socket provided
<NOTICE>Aug 20 12:27:51.522360 ddprof[6194]: DWFL_SYMB NO CALLS
<NOTICE>Aug 20 12:27:51.522363 ddprof[6194]: DSO_SYMB  |       SIZE | 0
<NOTICE>Aug 20 12:27:51.522370 ddprof[6194]: DWFL_HDR  |    NB MODS | 0
<NOTICE>Aug 20 12:27:51.522580 ddprof[6194]: [EXPORTER] Export buffer of size 121

I'm unsure if this is the same issue as the other PID-related issues, as in this case nothing shows up at all.

@aristotelhs
Contributor

It might be a bit of a similar case. We run Julia, and we are using the global profiling option on the node (since we have hit another bug in the PID code, #283). In global mode the frames from the JITted functions are either missing or they appear as anonymous. You can check the last comments on #276.

@r1viollet
Collaborator

If the first command is working, it should be a dupe of this issue: #283.
It should not relate to #276, which now seems to be an issue with JITDump; in this case we are using perf maps.

To help us prioritize this: is the wrapper workaround (first command) OK to use, or do you need PID mode?

@hkrutzer
Author

I wasn't sure whether the issue I was seeing was the same, because in my case no data at all shows up, whereas in the other issues it looks like at least something shows up.

> To help us prioritize this: is the wrapper workaround (first command) OK to use, or do you need PID mode?

For not entirely rational reasons, PID mode would be required for me to convince people to try this out: then there is no "scary wrapper" required, and no modification of the existing methods used to deploy services.

@r1viollet
Collaborator

The bug is that we only follow the initial TID. If no activity happens on that TID, we won't show anything.
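
To illustrate (assuming a stock BEAM; thread names can vary across OTP versions): the initial TID of beam.smp is mostly idle, and the CPU work happens on the scheduler threads, so sampling only the initial TID yields no samples.

# List beam.smp threads: the initial TID (== PID) sits mostly idle,
# while erts_sched_* threads do the actual work.
ps -L -o tid,comm -p $(pidof beam.smp)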

Erlang might not be easy to visualize through call stacks. With the working wrapper command, were the graphs helpful? Was the activity understandable?

@hkrutzer
Author

I only ran a simple web server through it, but I would say they were reasonably understandable.

Here is a screenshot of a profile:
[screenshot of a profile]

Due to how Erlang processes work, profiling might not be as easy to work with as in some other languages, but spawn_link definitely has meaning to an Erlang/Elixir developer.

Over at parca-dev/parca-agent#145 (comment) someone has been able to get application function calls as well.

@r1viollet
Collaborator

Thanks for the screenshot. The risk I see is that a lot of the activity will relate to the framework rather than to user activity, such as scheduling logic (example: epoll and sched_yield stacks). Tracing might offer a more intuitive approach.

Could I trouble you to push the simple web server to a public repo (in a Docker image?) so I can run it on my side and check how/where the perf maps are published?

@hkrutzer
Author

The Parca Dockerfile from their example should work, I think; if not, let me know and I will create something.

You are right that the trace above shows a lot of epoll etc. However, I also found https://blog.rabbitmq.com/posts/2022/05/flame-graphs/, which shows that you can get a graph of the application code with perf. You are not using perf of course, but the same data should be available, right?
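
For reference, the workflow from that post is roughly the following (a sketch; it assumes perf and Brendan Gregg's FlameGraph scripts are on the PATH, and that the node was started with ERL_FLAGS="+JPperf map"):

# Sample the running node; perf resolves JIT frames via /tmp/perf-<pid>.map.
perf record -F 99 -g -p $(pidof beam.smp) -- sleep 30
perf script | stackcollapse-perf.pl | flamegraph.pl > beam.svg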

@r1viollet
Collaborator

Thanks, I tried it in wrapper mode. I am seeing stacks with JIT information from Erlang.
I am not sure how helpful this would be to users.

[screenshot: stacks with Erlang JIT frames]

In --global mode, we do not match the correct filename (due to a PID namespace issue).

@r1viollet
Collaborator

For whole-host mode: the fix is to retrieve the namespaced PID and look for the perf map with the correct name.
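
For the record, the namespaced PID is readable from the host side via /proc (the NSpid line lists the PID in each nested namespace, outermost first; available since kernel 4.1):

# From the host: the last NSpid entry is the process's PID inside its
# namespace, i.e. the <pid> in the container's /tmp/perf-<pid>.map.
grep NSpid /proc/$(pidof beam.smp)/status
# NSpid:  12345   1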

@r1viollet added the bug and good first issue labels on Sep 20, 2023