Agent fails with ERR_DLOPEN_FAILED on Windows #331
Comments
Probably a new native library failed to be bundled. Also, isn't the CI supposed to test basic functionality cross-platform to prevent regressions? This should have been caught by the CI. Please check @tegefaulkes @aryanjassal
Well, we only check whether the help text shows up fine, which it did. The first agent ran just fine; the issues came when running another agent, or running the same agent again. So a bundling issue seems implausible. I will look into this further. A way to ensure cross-platform compatibility is to run the entire test suite on each platform instead of just printing the help text. Maybe @brynblack can help with that.
pkg unpacks and extracts its native binaries to a temporary location. When that error occurs, check whether that path still exists and interrogate its contents. This may be a misnomer.
Yes, we did that. The
Did you try that in the midst of a second execution, when the process failed?
The process fails literally instantly, so I can't do this. Moreover, the error message contains the actual path I followed to locate the relevant binary, so I cannot follow the path before the error message gets printed.
Yes, I mean follow the path after the error is printed.
And you'd want to make sure no state changes occurred after the fact. While doing it, use strace or an equivalent on Windows to check. Sanity-check with ChatGPT whether there are known issues with loading the same DLL for some reason.
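One way to check that "no state changes occurred" is to snapshot the extraction directory after the first (working) run and again after the failing run, then diff the two. This is a minimal sketch, assuming you know the temporary directory pkg extracts into (for example, the directory of the path printed in the error message). `snapshotDir` and `diffSnapshots` are hypothetical helpers, not part of Polykey or pkg.

```typescript
import * as crypto from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

// One entry per regular file: size in bytes and SHA-256 of the contents.
type FileInfo = { size: number; sha256: string };
type Snapshot = Map<string, FileInfo>;

// Recursively record every regular file under `root`, keyed by its path relative to `root`.
function snapshotDir(root: string): Snapshot {
  const snap: Snapshot = new Map();
  const walk = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile()) {
        const data = fs.readFileSync(full);
        snap.set(path.relative(root, full), {
          size: data.length,
          sha256: crypto.createHash("sha256").update(data).digest("hex"),
        });
      }
    }
  };
  walk(root);
  return snap;
}

// List files that were removed, changed (different hash), or added between two snapshots.
function diffSnapshots(before: Snapshot, after: Snapshot): string[] {
  const changes: string[] = [];
  for (const [file, info] of before) {
    const now = after.get(file);
    if (now === undefined) {
      changes.push(`removed  ${file}`);
    } else if (now.sha256 !== info.sha256) {
      changes.push(`changed  ${file} (${info.size} -> ${now.size} bytes)`);
    }
  }
  for (const file of after.keys()) {
    if (!before.has(file)) changes.push(`added    ${file}`);
  }
  return changes;
}
```

An empty diff suggests the extracted binaries themselves did not change between runs. In that case, the cause is more likely to be in how the second process loads them than in the files on disk.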
You can always attach a trace to the process as it is running; on Windows it should be possible too. See this ChatGPT transcript (learn to do this when investigating problems):
Prompt:
I need the equivalent of strace for Windows. How can I use it, and what is it?
Response:
The equivalent of
What is Process Monitor?
Process Monitor is a real-time monitoring tool for Windows that tracks:
How to Use Process Monitor
Examples of Use Cases
Key Differences Between
Feature | strace (Linux) | Procmon (Windows)
---|---|---
Primary Focus | System call tracing | File, registry, and process activity
Granularity | Detailed system calls | Broader OS-level activities
Output Format | Text-based logs | GUI with filters and visualizations
For lightweight alternatives to Procmon, you can explore Sysmon or DebugView from the Sysinternals suite, but Procmon remains the most versatile and closest to strace.
Prompt:
I need to attach to a node process to trace what is happening prior to ERR_DLOPEN_FAILED.
Response:
Tracing what is happening in a Node.js process before encountering an `ERR_DLOPEN_FAILED` error requires a combination of debugging and system monitoring tools. Here's a step-by-step approach to identify the root cause using tools available on your operating system.
Understanding ERR_DLOPEN_FAILED
This error typically occurs when Node.js fails to dynamically load a native module. Common causes include:
- Missing or incorrect file paths.
- Architecture or ABI mismatch.
- Missing dependencies (e.g., shared libraries).
- Permission issues.
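To tell these causes apart from inside Node, one option is to load the addon in a try/catch and inspect `err.code`. `MODULE_NOT_FOUND` comes from path resolution, meaning the file is not where Node looked. `ERR_DLOPEN_FAILED` means the file was found but the operating system's loader rejected it. This is a minimal sketch: `tryLoadNative` is a hypothetical helper, and the path you pass would be the `.node` file named in the error message.

```typescript
import * as fs from "node:fs";
import { createRequire } from "node:module";
import * as path from "node:path";

type LoadResult =
  | { ok: true }
  | { ok: false; code: string | undefined; message: string; existsOnDisk: boolean };

// createRequire yields a CommonJS require() in both CJS and ESM builds; the anchor
// filename only has to be an absolute path, the file itself does not need to exist.
const requireNative = createRequire(path.join(process.cwd(), "noop.js"));

// Attempt to load a .node addon and report why it failed, if it did.
function tryLoadNative(file: string): LoadResult {
  const absolute = path.resolve(file);
  try {
    requireNative(absolute);
    return { ok: true };
  } catch (err) {
    const e = err as NodeJS.ErrnoException;
    return {
      ok: false,
      code: e.code, // e.g. "MODULE_NOT_FOUND" or "ERR_DLOPEN_FAILED"
      message: e.message, // for dlopen failures, the OS loader's own message
      existsOnDisk: fs.existsSync(absolute),
    };
  }
}
```

If the result has `code: "ERR_DLOPEN_FAILED"` together with `existsOnDisk: true`, the file itself is present. In that case, the OS loader message (for example, a missing dependent DLL or a wrong architecture) is the next thing to read.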
Steps to Trace the Error
1. Enable Node.js Debugging
Run the Node.js process with additional debugging options to capture internal activity.
- Use the `--trace-warnings` flag to get detailed stack traces for warnings:
```bash
node --trace-warnings app.js
```
- Enable verbose logging for module loading:
```bash
NODE_DEBUG=module node app.js
```
This will log information about how modules are resolved and loaded, which can help pinpoint the problem.
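`NODE_DEBUG=module node app.js` is POSIX-shell syntax; PowerShell and cmd do not accept a `VAR=value command` prefix. A cross-platform alternative is to set the variable from a small launcher. This is a minimal sketch: `runWithModuleDebug` is a hypothetical helper and `app.js` is a placeholder. The module loader's debug lines go to stderr.

```typescript
import { spawnSync } from "node:child_process";

type RunResult = { status: number | null; stdout: string; stderr: string };

// Run a Node.js script with NODE_DEBUG=module, without relying on shell-specific syntax.
function runWithModuleDebug(args: string[]): RunResult {
  const result = spawnSync(process.execPath, args, {
    env: { ...process.env, NODE_DEBUG: "module" }, // inherit everything else
    encoding: "utf8",
  });
  return { status: result.status, stdout: result.stdout, stderr: result.stderr };
}

// Usage (placeholder script name):
// const { stderr } = runWithModuleDebug(["app.js"]);
// console.error(stderr);
```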
2. Attach a Debugger
Node.js has built-in debugging capabilities. You can use the `--inspect` flag to attach a debugger:
- Start the process with:
```bash
node --inspect-brk app.js
```
This will pause execution at the beginning of the script, allowing you to inspect the environment.
- Connect to the debugger using a compatible tool:
  - Chrome DevTools: Open `chrome://inspect` in Chrome.
  - VSCode: Use the "Attach to Node.js" configuration.
Place breakpoints near the code triggering the `ERR_DLOPEN_FAILED` error to inspect variables and stack traces.
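A lighter alternative to a full debugger is to wrap `process.dlopen`. Node's CommonJS loader calls this function for every `.node` file, so the wrapper records each native load attempt and its outcome before the error propagates. This is a minimal sketch under that assumption. `traceDlopen` is hypothetical, and it must be installed before the code that loads the addon runs.

```typescript
type DlopenRecord = { filename: string; ok: boolean; code?: string; ms: number };

// Wrap process.dlopen so every native-addon load attempt is recorded; failures are
// re-thrown unchanged. Returns a function that restores the original dlopen.
function traceDlopen(records: DlopenRecord[]): () => void {
  const proc = process as unknown as { dlopen: (...args: unknown[]) => unknown };
  const original = proc.dlopen;
  proc.dlopen = function (...args: unknown[]): unknown {
    const filename = String(args[1]); // dlopen(module, filename, flags?)
    const start = Date.now();
    try {
      const result = original.apply(process, args);
      records.push({ filename, ok: true, ms: Date.now() - start });
      return result;
    } catch (err) {
      const code = (err as NodeJS.ErrnoException).code;
      records.push({ filename, ok: false, code, ms: Date.now() - start });
      throw err;
    }
  };
  return () => {
    proc.dlopen = original;
  };
}
```

On Windows, the recorded filename is the namespaced form (`\\?\C:\...`) that the loader actually passes to `dlopen`. That form can be compared directly against the path in the error message.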
3. Use Process Monitor (Windows) or strace (Linux)
If you suspect system-level issues, trace the process's system calls.
- On Linux: Use `strace` to monitor file system and library calls (the `file` class covers open, openat, stat, access, and other calls that take a path):
```bash
strace -f -e trace=file node app.js
```
This will show attempts to open files or libraries, revealing missing dependencies or incorrect paths.
- On Windows: Use Process Monitor (Procmon):
  - Filter for your `node.exe` process.
  - Look for `NAME NOT FOUND`, `PATH NOT FOUND`, or `ACCESS DENIED` results on `.node` files or shared libraries.
4. Check for Missing Shared Libraries
Native modules may depend on system-level shared libraries. Use tools to verify dependencies:
- On Linux: Use `ldd` to inspect dependencies of the problematic module:
```bash
ldd path/to/module.node
```
Look for missing libraries marked as `not found`.
- On macOS: Use `otool` to inspect dependencies:
```bash
otool -L path/to/module.node
```
- On Windows: Use Dependency Walker or an ldd-like tool such as lucasg/Dependencies to check for missing DLLs.
5. Verify the Node.js ABI Compatibility
Node.js native modules are compiled against a specific Node.js ABI version. Ensure the module matches your Node.js version:
- Find the Node.js ABI version:
```bash
node -p 'process.versions.modules'
```
- Compare it with the compiled module's ABI version. If they differ, recompile the module:
```bash
npm rebuild
```
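Two caveats apply to the ABI check. First, an addon built against Node-API (a filename such as `node.napi.node` usually indicates a Node-API prebuild) is designed to be ABI-stable across Node versions, so `process.versions.modules` matters less for it. Second, on Windows an architecture mismatch (ia32 vs x64 vs arm64) also makes the load fail.

The sketch below reads the 16-bit machine field from a PE header and prints it next to `process.arch`. The PE layout is: `MZ` DOS header, `e_lfanew` offset at 0x3C, `PE\0\0` signature, then the machine field. `peMachine` and `checkAddonArch` are hypothetical helpers, and only the three most common machine codes are mapped.

```typescript
import * as fs from "node:fs";

// IMAGE_FILE_MACHINE_* codes from the PE/COFF spec, mapped to process.arch names.
const MACHINE_TO_ARCH: Record<number, string> = {
  0x014c: "ia32",
  0x8664: "x64",
  0xaa64: "arm64",
};

// Return the architecture a PE image (.dll/.exe/.node) targets, "unknown(0x...)" for
// unmapped machine codes, or null if the buffer is not a PE image.
function peMachine(buf: Buffer): string | null {
  if (buf.length < 0x40 || buf.toString("latin1", 0, 2) !== "MZ") return null;
  const peOffset = buf.readUInt32LE(0x3c); // e_lfanew
  if (peOffset + 6 > buf.length) return null;
  if (buf.toString("latin1", peOffset, peOffset + 4) !== "PE\0\0") return null;
  const machine = buf.readUInt16LE(peOffset + 4);
  return MACHINE_TO_ARCH[machine] ?? `unknown(0x${machine.toString(16)})`;
}

// Summarise an addon's target architecture next to the running process's.
function checkAddonArch(file: string): string {
  const arch = peMachine(fs.readFileSync(file));
  return [
    `addon=${arch ?? "not a PE file"}`,
    `process=${process.arch}`,
    `modules=${process.versions.modules}`,
    `napi=${process.versions.napi}`,
  ].join(" ");
}
```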
6. Use GDB for Low-Level Debugging (Advanced)
If debugging native code, attach `gdb` to the running process:
- Start your process:
```bash
node app.js
```
- Attach `gdb` to the running process (`pgrep -n` picks the newest match, since several `node` processes may be running):
```bash
gdb -p "$(pgrep -n node)"
```
- Set breakpoints in the loading code or shared libraries.
7. Check Permissions
Ensure the `.node` file and any dependencies have the correct permissions:
- Verify ownership and access:
```bash
ls -l path/to/module.node
```
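`ls -l` has no direct equivalent for Windows ACLs. Still, the question that matters here is whether the current process can open and read the file, and Node can check that on any platform. This is a minimal sketch: `checkReadable` is a hypothetical helper. A successful read suggests that neither permissions nor a lock are blocking reads, which matches the bug report's observation; it does not rule out other loader failures.

```typescript
import * as fs from "node:fs";

type ReadCheck = { readable: boolean; size?: number; firstBytesHex?: string; error?: string };

// Open the file read-only, report its size and first bytes, and surface any error code.
function checkReadable(file: string): ReadCheck {
  let fd: number | undefined;
  try {
    fs.accessSync(file, fs.constants.R_OK);
    fd = fs.openSync(file, "r");
    const size = fs.fstatSync(fd).size;
    const head = Buffer.alloc(Math.min(16, size));
    fs.readSync(fd, head, 0, head.length, 0);
    // A Windows DLL/.node file should start with "4d5a" ("MZ").
    return { readable: true, size, firstBytesHex: head.toString("hex") };
  } catch (err) {
    const e = err as NodeJS.ErrnoException;
    return { readable: false, error: `${e.code ?? "UNKNOWN"}: ${e.message}` };
  } finally {
    if (fd !== undefined) fs.closeSync(fd);
  }
}
```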
8. Review Your Build Environment
If the module is custom-built, ensure the build environment matches your runtime environment:
- Check `node-gyp` logs for errors during the build:
```bash
npm rebuild --verbose
```
Combining Techniques
For comprehensive debugging:
- Use `--trace-warnings` and `NODE_DEBUG=module` for application-level logs.
- Use `strace` or Procmon to monitor system calls.
- Inspect native module dependencies with `ldd` or Dependency Walker.
- Recompile or rebuild the module to ensure compatibility.
This multi-level approach will allow you to pinpoint the exact issue causing the `ERR_DLOPEN_FAILED` error. Let me know if you'd like help interpreting specific traces or outputs!
Prompt:
No, this is happening on Windows, so I want to know all the system calls that occurred prior to the process crashing. That's why I want to attach strace here.
Response:
To achieve the functionality of `strace` on Windows, where you can capture system calls leading up to a crash (e.g., `ERR_DLOPEN_FAILED`), you can use the following tools and approaches:
1. Process Monitor (Procmon)
Procmon is the closest equivalent to `strace` on Windows. It allows you to monitor all file, registry, and process/thread activity.
Steps to Use Procmon:
- Download and launch Procmon:
  - Download it from Microsoft Sysinternals.
  - Run `Procmon.exe` with administrator privileges.
- Set up filters for the Node.js process:
  - Click the Filter menu and set: `Process Name is node.exe`
  - Add additional filters for:
    - File system activity (`Path contains .node` if debugging native modules).
    - Registry activity (e.g., `RegQueryKey` for environment variables).
- Reproduce the error:
  - Start capturing by clicking the magnifying glass icon.
  - Reproduce the process crash (`ERR_DLOPEN_FAILED`).
- Analyze the logs:
  - Look for failed operations (e.g., `ACCESS DENIED`, `NAME NOT FOUND`, `PATH NOT FOUND`) or other anomalies leading up to the crash.
  - Save the logs for deeper inspection or sharing (File > Save).
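The File > Save dialog can also export a capture as CSV, which makes it easy to pull out the failures around native binaries programmatically. This is a minimal sketch, assuming the export has the default `Process Name`, `Operation`, `Path`, and `Result` header columns and that no field contains a line break. `failedLoads` and `parseCsvLine` are hypothetical helpers, and the `.node`/`.dll` filter is only a starting point.

```typescript
// Split one CSV line into fields, honouring double quotes and "" escapes.
function parseCsvLine(line: string): string[] {
  const fields: string[] = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(field);
      field = "";
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

type ProcmonEvent = { process: string; operation: string; path: string; result: string };

// Keep events on .node/.dll paths whose Result is anything other than SUCCESS.
function failedLoads(csv: string): ProcmonEvent[] {
  const lines = csv.replace(/^\uFEFF/, "").split(/\r?\n/).filter((l) => l.length > 0);
  if (lines.length === 0) return [];
  const header = parseCsvLine(lines[0]);
  const col = (name: string): number => {
    const idx = header.indexOf(name);
    if (idx < 0) throw new Error(`column not found: ${name}`);
    return idx;
  };
  const [p, o, f, r] = [col("Process Name"), col("Operation"), col("Path"), col("Result")];
  return lines
    .slice(1)
    .map(parseCsvLine)
    .filter((row) => /\.(node|dll)$/i.test(row[f] ?? "") && row[r] !== "SUCCESS")
    .map((row) => ({
      process: row[p] ?? "",
      operation: row[o] ?? "",
      path: row[f] ?? "",
      result: row[r] ?? "",
    }));
}
```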
2. Windows Performance Recorder and Analyzer (WPR/WPA)
Windows Performance Recorder (WPR) and Windows Performance Analyzer (WPA) can provide detailed tracing of system calls and performance metrics.
Steps to Use WPR:
- Install Windows Performance Toolkit:
  - Part of the Windows 10/11 SDK (get it from the Microsoft Developer Center).
- Record traces:
  - Open Command Prompt as Administrator.
  - Start recording:
```cmd
wpr -start GeneralProfile -start CPU
```
  - Reproduce the error.
  - Stop recording:
```cmd
wpr -stop output.etl
```
- Analyze traces with WPA:
  - Open `output.etl` in Windows Performance Analyzer.
  - Focus on:
    - CPU activity.
    - File I/O and registry traces.
  - Drill into the Node.js process to identify anomalies.
3. Debugging Tools for Windows (WinDbg)
If you want to trace module loading and debug the process crash directly, you can use WinDbg.
Steps to Use WinDbg:
- Install WinDbg:
  - Part of the Windows 10 SDK.
- Attach to the process:
  - Launch your Node.js process:
```cmd
node app.js
```
  - Open WinDbg and attach to the process: File > Attach to Process > select `node.exe`.
- Trace DLL loading:
  - Break on every module load (`sxe ld`) and enable the loader's diagnostic output, "loader snaps" (`!gflag +sls`):
```
sxe ld
!gflag +sls
```
  - Step through the module loads up to the crash.
- Analyze crash dumps (optional):
  - If the process crashes, write a full dump and analyze it for detailed insights:
```
.dump /ma crash.dmp
```
4. Sysinternals Tools
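a. DebugView
Use DebugView to view debugging output if the application emits debug logs via `OutputDebugString`.
b. ProcDump
Capture a memory dump when the process exits. `-t` writes the dump on termination, which also covers a JavaScript-level error that makes Node exit; `-e` alone only fires on unhandled native exceptions:
```cmd
procdump -ma -t -w node.exe dump.dmp
```
Analyze the dump in WinDbg or a similar tool.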
Combined Workflow for Best Results:
- Use Procmon for high-level system activity tracing leading up to the crash.
- Use WPR/WPA or WinDbg for low-level system call tracing and debugging.
- Analyze dependencies with tools like Dependency Walker or lucasg/Dependencies.
Let me know if you need guidance on setting up or interpreting results from any of these tools!
Prompt:
Yeah, so how would I attach Procmon to the node process? Assume I'm running the entire program from an admin PowerShell.
Response:
To attach Procmon to your Node.js process while running it via an elevated PowerShell session, follow these steps:
1. Launch Process Monitor with Administrator Privileges
- Start Procmon:
  - Open it by running `Procmon.exe` directly from your Sysinternals folder.
- When prompted, ensure Procmon runs with administrator privileges (this is necessary for tracing system-level activity).
2. Apply a Filter for the Node.js Process
Procmon will capture everything by default, which can be overwhelming. Narrow the scope to just your Node.js process:
- In Procmon, click Filter > Filter... to open the filter configuration.
- Add the following filter conditions:
  - `Process Name` is `node.exe` → Include.
  - Optionally, add specific paths if you are debugging `.node` native modules or specific files: `Path` contains `.node` → Include.
- Click Add, then Apply to activate the filters.
3. Run Your Node.js Script via PowerShell
- In your elevated PowerShell session, run the Node.js script as you normally would. For example:
```powershell
node app.js
```
If you're troubleshooting, you can also add the `--trace-warnings` flag or set the `NODE_DEBUG` environment variable (in PowerShell: `$env:NODE_DEBUG = "module"`).
- Procmon will automatically pick up the Node.js process and start capturing its events based on the filters you applied.
4. Monitor and Capture the Events
- Observe the live stream of events in Procmon.
  - Look for:
    - `NAME NOT FOUND` or `PATH NOT FOUND` (missing files or modules).
    - `ACCESS DENIED` (permission issues).
    - Registry operations (e.g., environment variables being queried).
- Pause logging after the error occurs to avoid unnecessary data capture:
  - Click the magnifying glass icon or press Ctrl + E to stop capturing events.
5. Save and Analyze the Log
- Save the filtered log:
  - Go to File > Save and save it as a `.PML` file for later analysis or sharing.
- Analyze the sequence of events leading up to the error:
  - Look for failed operations (`Result` = `NAME NOT FOUND`, `PATH NOT FOUND`, `ACCESS DENIED`, etc.).
  - Focus on `.node` file operations, registry reads, and process threads.
6. Additional Tips
- If Node.js is launched dynamically or spawns child processes:
  - Add the `Parent PID` column (Options > Select Columns), or open Tools > Process Tree, to trace spawned processes.
- Use Procmon's Highlighting feature:
  - Highlight events like `NAME NOT FOUND` or `ACCESS DENIED` for quick identification.
Example Workflow Summary
- Start Procmon.
- Filter for `Process Name is node.exe`.
- Run `node app.js` in an elevated PowerShell session.
- Observe and capture the system calls/events leading to `ERR_DLOPEN_FAILED`.
- Save the log and analyze the failure.
This approach ensures a clear view of what system-level interactions Node.js attempted before encountering the error. Let me know if you need help interpreting specific logs!
Describe the bug
On Windows, when running a fresh install by downloading the executable directly from GitHub, the executable works perfectly. However, after closing the agent and trying to open it again, an `ERR_DLOPEN_FAILED` error is raised. Note that the process is being run in a VirtualBox VM.
I have verified that all the files exist where they should be. The path `C:\Users\...\node.napi.node` is an actual binary which exists and can be opened. This happened after I attempted to run a node while one was already running, using an alternative node path.
It doesn't matter how I reset the state (removing temp files, removing the cache, reinstalling the binary, reinstalling node); nothing works.
To Reproduce
Expected behavior
Polykey should launch as expected every time
Screenshots
Platform
Additional context
Error: unknown memory
.node to run it.
node.napi.node file gave us the expected MODULE_NOT_FOUND error, and the file is also not locked, as we can still open and read the file.
Notify maintainers
This is a general bug, so there is no one maintainer.
@aryanjassal @tegefaulkes