Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add signals API to send a signal to one or more services #92

Merged
merged 9 commits into from
Dec 8, 2021

Conversation

benhoyt
Copy link
Contributor

@benhoyt benhoyt commented Dec 2, 2021

This implements spec JU029 Pebble: sending signals to services. Specifically:

  • An API to send a signal to one or more running services:
    • POST /v1/signals {"signal": "SIGHUP", "services": ["s1", "s2"]}
  • A Go client to send a signal:
    • c.SendSignal(&client.SendSignalOptions{Signal: "SIGHUP", Services: []string{"s1, "s2"})
  • A CLI command to send a signal:
    • pebble signal HUP s1 s2 ...

One question I had was whether the API should return an error if the service being signaled is not running. After discussion with Harry we decided to mimic the behavior of the Unix kill command, which gives an error if the pid is not active. But that's open for debate -- it will mean if you call send_signal() in a charm it'll raise an exception if the service hasn't been started yet (maybe that's a good thing).

@@ -168,11 +168,15 @@ type helpCategory struct {
var helpCategories = []helpCategory{{
Label: "Basics",
Description: "basic management",
Commands: []string{"run", "autostart", "start", "stop", "exec", "services", "logs"},
Commands: []string{"run", "exec"},
Copy link
Contributor Author

@benhoyt benhoyt Dec 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of things here:

  • I added the new signal command for this commit.
  • I'm adding in restart and replan, missed here when those were added.
  • "basic management" is getting too long, especially with the checks command that will be added in another PR. I think the category "service management" is good, though I'm not sure about the "basic management" name anymore -- suggestions for a better name/grouping?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing my mistake with restart/replan

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I like this grouping

@@ -321,15 +320,3 @@ func (s *apiSuite) TestServicesReplan(c *C) {
c.Check(tasks[0].Summary(), Equals, `Start service "test1"`)
c.Check(tasks[1].Summary(), Equals, `Start service "test2"`)
}

type wrappedServiceManager struct {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't being used anywhere, deleting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I don't even know why this was here.

@@ -499,14 +499,18 @@ func getAction(config *plan.Service, success bool) (action plan.ServiceAction, o
func (s *serviceData) sendSignal(signal string) error {
switch s.state {
case stateStarting, stateRunning:
sig := unix.SignalNum(signal)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The send-signal functionality already existed via the restart/backoff work; I'm just tweaking the error behavior here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to double check, what's the API we intend to offer here? "HUP", or "SIGHUP"? I suggest enforcing one of them only, and perhaps only "SIGHUP" here as it's the more formal name, which we're opting not to use in the CLI as we have "signal HUP" already.

if err := decoder.Decode(&payload); err != nil {
return statusBadRequest("cannot decode request body: %v", err)
}
if len(payload.Services) <= 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other APIs do we not deal with the wildcard case (might just be autostart?)? Here it probably doesn't make sense just wondering if we should seek consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, replan and autostart error if you do provide service names, and start, stop, and restart error if you don't (which makes sense for each of those, I think). I don't think it's a good idea (or makes a lot of sense) to send a signal to all services. If someone really wants to they can do something like this:

service_names = [s.name for s in container.get_services()]
container.send_signal('SIGHUP', service_names)


type signalsPayload struct {
Signal string `json:"signal"`
Services []string `json:"services"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should look at adding a ProcessGroup/Group/SignalGroup bool, where if true the signal is sent to the process group of the service rather than just the pid of the process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That way a SIGUSR1 can just be sent to the service process and not children in the same process group.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I was wondering whether it should send to children or not. I guess there are two questions:

  1. What should the default be, send to just the service, or send to group? I think your approach of sending to just the service is a reasonable default (least potentially destructive).
  2. What should the field be named? I think group is not great, as that indicates a group name (and is used as a group name in other Pebble APIs and the plan config). Maybe SendToGroup bool -- longer but nice and clear?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure if we need to signal children or not, but I'm pretty sure we should not have that as the default behavior. We have no idea about what sub-processes the given service will have, and signaling the entire group seems somewhat of a bad practice. The one exception being SIGKILL, probably, which we don't want to be sent this way in any case.

As such, I suggest just limiting to the process itself for now. As we learn why people want to signal the group, we can design the API and implement it accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've limited it to just the process for now, not the group.

logger.Noticef("Sending %s to service %q", signal, s.config.Name)
err := syscall.Kill(-s.cmd.Process.Pid, unix.SignalNum(signal))
err := syscall.Kill(-s.cmd.Process.Pid, sig)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect for your usage, this signals the process group (-pid is a process group), which I think only makes sense in service stop.

I think we need to review this anyway for service stop:

  • Stop should first signal SIGTERM to the service process (not the group, the process should SIGTERM its child processes)
  • Wait the grace time
  • Always send SIGKILL to the process group (even if the process exits, we need to signal the whole process group to ensure all child processes are killed).

@niemeyer does this sound right to you?

Copy link
Contributor Author

@benhoyt benhoyt Dec 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good call -- I fairly new to this signal handling stuff and hadn't considered that. What you propose sounds reasonable. I'll try to do a bit more reading on it, and probably add this to the agenda for tomorrow night's spec review meeting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this on video and have a solution -- will follow up in a separate PR as it's not related to the send-signal API.

Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

Assuming these comments make sense, LGTM once they are sorted. Otherwise, happy to chat further.

client/signals.go Outdated Show resolved Hide resolved
}, {
Label: "Services",
Description: "service management",
Commands: []string{"autostart", "start", "stop", "restart", "services", "logs", "signal"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels worth reorganizing indeed, but the groups feels a bit blurred, in the sense that it's not clear what goes inside which given their descriptions. Also, as a previous problem we already had, the descriptions are currently not worded evenly: some of them are actions ("view changes") while others are passive ("basic/service management"). Finally, "basic" now has "run" and "exec", which are completely unlike each other, and the proximity of their names doesn't help either.

So maybe we should dedicate a further moment to try to help the situation here and have more distinctive "buckets".

Here is an initial attempt:

"Run", "run pebble"
    {"run", "help", "version"}
"Plan", "view and change configuration"
    {"add", "plan", "replan"}
"Services", "manage services"
    {"services", "logs", "start", "restart", "signal", "stop", "replan"}
"Changes", "manage changes and their tasks"
    {"changes", "tasks"}
"Warnings", "manage warnings"
    {"warnings", "okay"}

Some notes:

  • Isn't "autostart" the old version of "replan"? That means we can drop the old one.
  • If "autostart" and "replan" are indeed friends, it's curious we had them in different groups. What's the right one?
  • I've tried to order services inside the groups using some subjective logic while taking into account their meaning, relevance, and proximity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above is missing "exec", which doesn't really fit anywhere. I've updated suggesting adding "Operations" which will have "exec" and (soon) "checks". Let's discuss tonight.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I've added a "Files" group for files and exec.

cmd/pebble/cmd_signal.go Outdated Show resolved Hide resolved
name must be uppercase to distinguish it from the service names, for example:

pebble signal HUP mysql nginx
pebble signal SIGHUP mysql nginx # full signal name is fine too
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not strongly against the optional SIG prefix, but can we please wait a bit before we support it? It feels like consistency generally wins, and from the two lines above the first one reads the best. This way we also don't even need to document the exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed it from the docs. However, it's still supported as we have to add the SIG prefix for the API anyway, so it would be explicit code to prevent it from working. I think consistently using HUP (for example) in the docs but allowing the alternative is fine.

@@ -321,15 +320,3 @@ func (s *apiSuite) TestServicesReplan(c *C) {
c.Check(tasks[0].Summary(), Equals, `Start service "test1"`)
c.Check(tasks[1].Summary(), Equals, `Start service "test2"`)
}

type wrappedServiceManager struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I don't even know why this was here.


type signalsPayload struct {
Signal string `json:"signal"`
Services []string `json:"services"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure if we need to signal children or not, but I'm pretty sure we should not have that as the default behavior. We have no idea about what sub-processes the given service will have, and signaling the entire group seems somewhat of a bad practice. The one exception being SIGKILL, probably, which we don't want to be sent this way in any case.

As such, I suggest just limiting to the process itself for now. As we learn why people want to signal the group, we can design the API and implement it accordingly.

@@ -499,14 +499,18 @@ func getAction(config *plan.Service, success bool) (action plan.ServiceAction, o
func (s *serviceData) sendSignal(signal string) error {
switch s.state {
case stateStarting, stateRunning:
sig := unix.SignalNum(signal)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to double check, what's the API we intend to offer here? "HUP", or "SIGHUP"? I suggest enforcing one of them only, and perhaps only "SIGHUP" here as it's the more formal name, which we're opting not to use in the CLI as we have "signal HUP" already.

if err != nil {
return err
}

case stateBackoff, stateTerminating, stateKilling, stateStopped:
return fmt.Errorf("cannot send signal while service is stopped or stopping")
return fmt.Errorf("service is not running")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to lack the context ("cannot send signal ...").

The message below seems to lack a verb as well ("is invalid" or similar).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, given the context added below this is okay.

internal/overlord/servstate/manager.go Show resolved Hide resolved
@benhoyt
Copy link
Contributor Author

benhoyt commented Dec 7, 2021

Just to double check, what's the API we intend to offer here? "HUP", or "SIGHUP"? I suggest enforcing one of them only, and perhaps only "SIGHUP" here as it's the more formal name, which we're opting not to use in the CLI as we have "signal HUP" already.

Yep, it's always the full SIGHUP in the API (also in the existing exec API).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants