Retry execution of firmadyne console after error #2

Open. AndrewFasano wants to merge 1 commit into base: firmadyne-v4.1.17

@AndrewFasano AndrewFasano commented Oct 16, 2022

For some firmware images, I see the execution of /firmadyne/console fail with error -2 (-ENOENT). I suspect this is caused by the filesystem not being fully set up by the time the 5th execve is triggered. This PR detects these failures by calling call_usermodehelper with UMH_WAIT_EXEC instead of UMH_NO_WAIT and checking the return code.

If there are systems where the console binary is missing, this design would log a warning every 5 execs, which might be too verbose. But it should make getting a shell much more reliable. I'm happy to adjust this change if anyone has suggestions for another way to ensure the shell gets started after an initial failure (perhaps increasing the number of execs it waits for on each retry?).
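A minimal userspace sketch of the retry idea described above, not the PR's actual kernel code. The names `console_state`, `on_execve`, `launch_fn`, and `fake_launch` are hypothetical; `launch_fn` stands in for a call to call_usermodehelper with UMH_WAIT_EXEC, which reports a negative errno when the exec fails.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define EXEC_THRESHOLD 5 /* the PR describes attempting the launch at the 5th execve */

/* Hypothetical launcher: returns 0 on success or a negative errno. */
typedef int (*launch_fn)(void);

struct console_state {
    unsigned int execs; /* execve calls seen since the last attempt */
    bool started;       /* true once the console launched successfully */
};

/* Call once per observed execve. Returns true once the console is running. */
static bool on_execve(struct console_state *st, launch_fn launch)
{
    int ret;

    if (st->started)
        return true;
    if (++st->execs < EXEC_THRESHOLD)
        return false;

    st->execs = 0; /* wait another EXEC_THRESHOLD execs before the next try */
    ret = launch();
    if (ret == 0) {
        st->started = true;
        return true;
    }
    fprintf(stderr, "console launch failed (%d), will retry\n", ret);
    return false;
}

/* Stand-in launcher for experimenting: fails once with -ENOENT, then succeeds. */
static int fake_attempts;
static int fake_launch(void)
{
    return ++fake_attempts < 2 ? -ENOENT : 0;
}
```

With `fake_launch`, the first attempt at the 5th execve fails and the second attempt at the 10th succeeds. If the binary is permanently missing, this version logs a warning every 5 execs, which is the verbosity concern raised above.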

@ddcc
Collaborator

ddcc commented Oct 17, 2022

An alternative approach might be to make the execute threshold configurable, instead of fixed at five. So if the user sets it to 0, it'll try launching the console at the first execve, whereas if the user sets it to like -1 or something else, it'll disable launching the console entirely. Also, it'd probably be useful to always print the return value of the execution to syslog.
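One way to read the configurable threshold suggested above, as a small sketch. The function name `should_launch_console` and the exact counting convention (number of execve calls seen before the current one) are assumptions for illustration; the thread only specifies that 0 launches at the first execve and a negative value disables the launch.

```c
#include <stdbool.h>

/*
 * threshold:    hypothetical user setting.
 *               < 0  -> never launch the console
 *               0    -> launch at the first execve
 *               n>0  -> launch once n earlier execve calls have been seen
 * execs_before: number of execve calls observed before the current one.
 */
static bool should_launch_console(int threshold, unsigned int execs_before)
{
    if (threshold < 0)
        return false;
    return execs_before >= (unsigned int)threshold;
}
```

Under this convention, a threshold of 0 fires on the very first execve (zero earlier calls), and -1 never fires no matter how many execs occur.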

I'm a little less convinced about always re-executing on every fifth execution if the return value is nonzero. Perhaps only if the return value is ENOENT, but then how do you distinguish the file actually being missing from it just not being mounted yet? I guess I don't understand why the file would be missing in the first place. Isn't it included in the initial initramfs, so it should always be present?

@AndrewFasano
Author

Sure, that could definitely work too and I'd be open to making that change if that's how you want it.

But in my mind, a typical Firmadyne user probably doesn't want to do any manual analysis or trial and error to figure out how many execs to wait for; they just want a shell to show up when they set execute=1. Since we can detect when the shell fails to start and retry later, it seems like we could design this to just give users what they want.

Since opening this PR, I did a bit more analysis to figure out what's going on and think I have a better solution.

For some context: I'm working with various ARM firmware images (using PANDA as the emulator, since it can support multiple serial devices, which is unfortunately not supported in QEMU). I'm not sure whether this sort of failure could also happen with MIPS firmware images.

For these firmware images, a variety of kworkers are started, which increments the execute count up to 6. At this point, /firmadyne/console is not yet available in the filesystem. Shortly after the swapper process runs, the file becomes available.

Now that I've figured this out, I can see a few better approaches. Do any of these sound good to you?

  1. Only increment execute if the process name isn't kworker
  2. Don't increment execute until seeing a process named swapper
  3. Retry with a back-off, e.g., after 5 execs, 10 execs, 50 execs, then print an error if we still haven't run successfully
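Option 3 above could look roughly like the following userspace sketch. It assumes the 5/10/50 figures are cumulative execve counts, which the list does not spell out, and all names (`retry_at`, `backoff`, `backoff_on_execve`) are hypothetical.

```c
#include <stddef.h>

/* Cumulative execve counts at which to attempt the launch (assumed reading). */
static const unsigned int retry_at[] = { 5, 10, 50 };
#define NUM_RETRIES (sizeof(retry_at) / sizeof(retry_at[0]))

enum backoff_action {
    BACKOFF_WAIT,   /* not time to try yet */
    BACKOFF_TRY,    /* caller should attempt to launch the console now */
    BACKOFF_GIVE_UP /* schedule exhausted; caller should print an error */
};

struct backoff {
    unsigned int execs; /* total execve calls observed */
    size_t next;        /* index of the next entry in retry_at */
};

/* Call once per execve until the console has launched successfully. */
static enum backoff_action backoff_on_execve(struct backoff *b)
{
    if (b->next >= NUM_RETRIES)
        return BACKOFF_GIVE_UP;
    if (++b->execs < retry_at[b->next])
        return BACKOFF_WAIT;
    b->next++;
    return BACKOFF_TRY;
}
```

The caller stops invoking `backoff_on_execve` once a launch succeeds. If all three attempts fail, the next call returns `BACKOFF_GIVE_UP`, which bounds the number of warnings to a fixed count instead of one every 5 execs.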

@ddcc
Collaborator

ddcc commented Oct 19, 2022

Hmm, I think it would still be useful to dig a bit deeper. AFAIK, the kworker is just a generic kernel worker thread, and swapper is the idle process that gets scheduled when nothing else is runnable. So while there may be some correlation here, I don't think they're the root cause.

Another approach that may make more sense is to only attempt execution of the shell after the kernel performs a successful mount system call, since that means a new filesystem has been mounted. That way, normal process execution is not affected by re-execution of the shell.
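A rough userspace model of the mount-triggered approach, under the assumption that a hook runs whenever a mount system call returns. The names `mount_hook_state`, `on_mount_return`, `mount_launch_fn`, and `stub_launch` are hypothetical, and the launcher again stands in for the real console launch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical launcher: returns 0 on success or a negative errno. */
typedef int (*mount_launch_fn)(void);

struct mount_hook_state {
    bool started; /* true once the console launched successfully */
};

/*
 * Call when a mount syscall returns mount_ret (0 means success).
 * Only a successful mount triggers a launch attempt, so ordinary
 * execve calls are never affected. Returns true once the console runs.
 */
static bool on_mount_return(struct mount_hook_state *st, int mount_ret,
                            mount_launch_fn launch)
{
    if (st->started)
        return true;
    if (mount_ret != 0)
        return false; /* failed mount: no new filesystem, nothing to try */
    if (launch() == 0)
        st->started = true;
    return st->started;
}

/* Stand-in launcher: fails with -ENOENT until the console file is visible. */
static bool console_visible;
static int stub_attempts;
static int stub_launch(void)
{
    stub_attempts++;
    return console_visible ? 0 : -ENOENT;
}
```

In this model, a successful mount that does not expose the console simply results in one failed attempt, and the next successful mount tries again.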

@AndrewFasano
Author

Didn't mean to close this, sorry! I haven't had time to try out your suggestions, but they seem like good ideas.

@AndrewFasano AndrewFasano reopened this Nov 21, 2022