puppet agent works interactively but refuses to start service #2459

Closed
rismoney opened this issue Jan 13, 2024 · 15 comments
Labels: bug (Something isn't working), triaged (Jira issue has been created for this)

Comments

@rismoney commented Jan 13, 2024

After upgrading to 8.x, either by uninstalling 7.x first or by installing 8.x over the top, Puppet installs successfully on Windows Server 2019/2022, but the puppet service fails to start. It happens more often on Server Core, but it also occurs on Desktop Experience installations. Occasionally the service starts after 5-10 attempts, but once stopped it usually refuses to start again.

Failed to transition the puppet service to the SERVICE_RUNNING state. Detail: Failed to start the service: The service did not respond to the start or control request in a timely fashion.

Running puppet agent interactively works perfectly fine. Rolling back to 7.x (even 7.27.0) works, and the 7.x service starts normally.
The problem is limited to the service failing to start after installation. Rebooting does not fix it, nor does removing C:\programdata\puppetlabs or refreshing the certs.

If any additional information is needed, let me know.

rismoney added the bug label on Jan 13, 2024
@joshcooper (Contributor)

I haven't been able to reproduce this with 8.3.1. I've started and stopped the puppet service many times using the Service Control Manager and net start, without any failures.

Is it possible the service is loading a different puppet.conf than is used when running puppet agent -t?

Are you running the service as LocalSystem or as a domain user account?

You might also check whether Windows Defender or other AV software is blocking the service.
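To try reproducing the intermittent start failure, a repeated stop/start loop can help. The following is a minimal Ruby sketch, not anything shipped with Puppet: `stress_service` is a hypothetical helper, and the start/stop actions are passed in as lambdas so the loop itself can be exercised without Windows. The example wiring assumes that `net start`/`net stop` wait for the service to reach the requested state and exit non-zero on failure; that is why they are used here rather than `sc.exe start`, which returns as soon as the start request is accepted.

```ruby
# Hypothetical helper (not part of Puppet): stop and start a service
# `attempts` times and return how many start attempts failed.
# `start` and `stop` are callables; `start` must return a truthy value on success.
def stress_service(attempts, start:, stop:)
  failures = 0
  attempts.times do
    stop.call                       # result ignored: the service may already be stopped
    failures += 1 unless start.call # false or nil counts as a failed start
  end
  failures
end

if Gem.win_platform? && __FILE__ == $PROGRAM_NAME
  # Assumption: `net start`/`net stop` block until the service reaches the
  # requested state and exit non-zero on failure. Kernel#system returns
  # true only for a zero exit status (false or nil otherwise).
  start = -> { system('net', 'start', 'puppet') }
  stop  = -> { system('net', 'stop', 'puppet') }
  puts "failed starts: #{stress_service(20, start: start, stop: stop)} of 20"
end
```

Injecting the actions keeps the counting logic separate from the Windows-specific commands, so the same loop could drive `Restart-Service` or another tool instead.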

@rismoney (Author) commented Jan 18, 2024

I am still able to reproduce it. I have ruled out Defender. I have tried both LocalSystem and a domain user. I don't know how it would reference a different puppet.conf; it is the only one in the ProgramData directory. I have also tried removing the node from the domain to eliminate GPO.

The node I have been testing on most is Server Core, but I also see it on Windows Server 2019 Desktop Experience.

Most of the nodes do not have internet access, but that doesn't seem to matter either, as I see it on internet-accessible nodes as well (no proxy). I am going to try a clean ISO install of Windows next. It is frustrating that I cannot get a real error message, as it seems to fail before the service launches.

@rismoney (Author)

This appears to be an issue with our NTP software and not the Puppet agent. I am not entirely sure why, but upgrading the NTP software to the latest version seems to have resolved it. If the problem returns, I will reopen. Thank you for your time!

@joshcooper (Contributor)

Thanks for letting us know!

rismoney reopened this on Jan 18, 2024
@rismoney (Author)

I think there is an issue between Puppet and the NTP software, but I am not sure what it is. It seems as though something in the FFI code under Ruby 3.2 is querying services and getting hung up on this one. I don't know why, or what changed in Puppet 8.x to cause this condition. It definitely doesn't happen with 7.x, or when I stop the NTP client. I think something Puppet does is causing the start to fail, though not on every Restart-Service.

I am running the client available here: https://www.greyware.com/download/shareware.asp

@rismoney (Author)

All right: the difference between the NTP software and all the other services is that this particular service name contains spaces.

I believe there may be a bug in Puppet::Util::Services whereby, if the service name (effectively the registry key under HKLM:\SYSTEM\CurrentControlSet\Services) contains a space, the daemon fails to initialize.

This was easy to reproduce using the software above, and I confirmed it by editing the registry key so the name has no space (renaming it from "Domain Time Client" to "DomainTimeClient") and rebooting. This is a guess, but I believe any service with a space in its name that sorts alphabetically before puppet might cause the issue.

My understanding is that this is a valid service name and Puppet isn't handling it properly. I could ask the vendor, Greyware, to change their service name, but I think it should be fixed on Puppet's side.
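If the problem is tied to service names containing spaces, a first diagnostic step is to list which installed services have such names. This is a sketch under assumptions: `service_names_with_whitespace` is a hypothetical helper, and it takes a Hash keyed by service name (the registry key name, not the display name), which is the shape returned by `Puppet::Util::Windows::Service.services`.

```ruby
# Hypothetical helper: return the sorted service names (Hash keys) that
# contain any whitespace character. Display names are ignored, since only
# the service name itself is in question here.
def service_names_with_whitespace(services)
  services.keys.select { |name| name.match?(/\s/) }.sort
end

if Gem.win_platform? && __FILE__ == $PROGRAM_NAME
  require 'puppet'
  Puppet.initialize_settings
  p service_names_with_whitespace(Puppet::Util::Windows::Service.services)
end
```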

@joshcooper (Contributor)

I don't think I've ever seen a service with a space in its service name before (not to be confused with its display name). I'm pretty sure that's asking for trouble, but the docs don't say you can't:

https://learn.microsoft.com/en-us/windows/win32/api/winsvc/nf-winsvc-createservicew

> [in] lpServiceName
>
> The name of the service to install. The maximum string length is 256 characters. The service control manager database preserves the case of the characters, but service name comparisons are always case insensitive. Forward-slash (/) and backslash (\) are not valid service name characters.

That said, puppet does not have trouble resolving the service:

```
C:\>irb
irb(main):001:0> require 'puppet'
=> true
irb(main):002:0> Puppet.initialize_settings
=> {}
...
irb(main):006:0> Puppet::Util::Windows::Service.services.select {|k,v| k.match? 'Domain Time Client'}
=> {"Domain Time Client"=>{:display_name=>"Domain Time Client", :service_status_process=>#<Puppet::FFI::Windows::Structs::SERVICE_STATUS_PROCESS:0x000000000b8a0998>}}
irb(main):007:0> Puppet::Util::Windows::Service.service_start_type('Domain Time Client')
=> :SERVICE_AUTO_START
...
irb(main):011:0> Puppet::Util::Windows::Service.logon_account('Domain Time Client')
=> "LocalSystem"
```

Is it possible the service is protected?
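To extend the irb checks above to several services at once, the same `service_start_type` and `logon_account` calls can be wrapped in a loop that records any exception instead of aborting. This is a hypothetical helper, not Puppet code; the service module is passed in as an argument so the helper can be exercised with a stub, and on Windows you would pass `Puppet::Util::Windows::Service`.

```ruby
# Hypothetical helper: query start type and logon account for each service
# name using any object that responds to #service_start_type and
# #logon_account (e.g. Puppet::Util::Windows::Service). Errors are recorded
# per service rather than raised, so one bad entry does not hide the rest.
def describe_services(names, svc)
  names.to_h do |name|
    info =
      begin
        { start_type: svc.service_start_type(name),
          logon_account: svc.logon_account(name) }
      rescue StandardError => e
        { error: "#{e.class}: #{e.message}" }
      end
    [name, info]
  end
end

if Gem.win_platform? && __FILE__ == $PROGRAM_NAME
  require 'puppet'
  Puppet.initialize_settings
  p describe_services(['Domain Time Client'], Puppet::Util::Windows::Service)
end
```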

@rismoney (Author) commented Jan 19, 2024

I went down the exact same path today, actually, and I agree with everything you wrote.
I agree it's asking for trouble; however, the Domain Time vendor reported back that they cannot make the change because it would break their compatibility promises. I have seen one other service, from OneDrive Updater, with a space in its name. So it is definitely an edge case, but not illegal according to the Win32 API docs.

I also confirmed that querying the service (and all services, for that matter) via irb through Puppet::Util::Windows::Service enumerates properly.

Based on Process Monitor (procmon), enumeration of all services definitely happens when this particular service is stopped, but not when it is running.

To my knowledge, the service is not protected. I thought protected services were used mostly for drivers with a digitally signed .sys file; I believe this is a 'simple' exe. I wasn't able to follow all of the daemon code, but is it possible the values are getting munged between the FFI code and the service name containing a space, raising an exception that kills the daemon? As mentioned, this behavior started with Puppet 8 and does not occur on 7.x or earlier.

The weirdest part is that it will "sometimes" actually start, but I think it's really a bad start, because after a runinterval cycle the agent effectively dies.

@rismoney (Author)

So can this be fixed in the daemon, or do you need more information? Were you able to reproduce it? I can provide any info needed, e.g. perfmon captures.

Thank you for all your help in this matter.

@joshcooper (Contributor)

I installed the service you mentioned and couldn't reproduce the failure.

@rismoney (Author)

Which OS?

@rismoney (Author) commented Feb 2, 2024

I think the daemon might not be using Puppet::Util::Windows::Service; it may be doing something different using FFI. I still can't get to the bottom of this.

github-actions bot commented May 7, 2024

Migrated issue to PA-6394

@joshcooper (Contributor)

This should hopefully be fixed by puppetlabs/puppet#9338

@joshcooper (Contributor)

This was fixed in puppetlabs/puppet#9386 and backported to 7.x in puppetlabs/puppet#9389
