Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Consumer] Prevent event bus recursion #79

Closed
timmc-edx opened this issue Jul 20, 2022 · 9 comments
Closed

[Consumer] Prevent event bus recursion #79

timmc-edx opened this issue Jul 20, 2022 · 9 comments
Labels
event-bus Work related to the Event Bus.

Comments

@timmc-edx
Copy link
Contributor

timmc-edx commented Jul 20, 2022

It is possible (though moderately unlikely) that an IDA could inadvertently be configured to both produce a signal to the event bus and also consume that signal from the event bus. This would result in any such event being reproduced and circulated indefinitely on a tight loop, causing quadratically increasing load and eventually an outage.

There are a couple of ways we could avoid this:

  1. When converting an event to a Django signal-send, mark it as having been consumed from the event bus, and do not allow such marked signal-sends to be produced back onto the event bus. This is the simplest option and would only require having a special keyword arg that won't collide with other kwargs on the signal-send.
    • We may eventually start attaching event envelope/header information to signal-sends (e.g. to preserve and convey timestamps) and the mere presence of this information would suffice as a "mark".
  2. Have openedx-events refuse to configure both a consumer and a producer for the same topic in the same IDA (raise exception on startup). This is a bigger hammer and may not be what we want.
  3. Record the IDA-of-origin on any OpenEdxPublicSignal send and copy that onto events and signal-sends when producing and consuming, then in the consumer drop any event that originated on the IDA on a previous iteration. Much more complicated, but would handle the (less likely) case of mutual recursion between 2+ IDAs.

Any such safeguard would not be able to detect anything that explicitly receives, recreates, and re-sends a signal, but that sort of thing is probably not worth worrying about.

@timmc-edx timmc-edx added event-bus Work related to the Event Bus. backlog To be put on a team's backlog or wishlist labels Jul 20, 2022
@timmc-edx timmc-edx moved this to Todo in Arch-BOM Jul 20, 2022
@timmc-edx timmc-edx added this to the [Event Bus] Future milestone Jul 20, 2022
@ormsbee
Copy link

ormsbee commented Aug 4, 2022

Some random thoughts:

CloudEvents does have a source field that we could use to see where something came from.

That being said, I think that option (2) here is a nice, big, simple hammer and we should use it until it starts getting in someone's way and then re-evaluate at that time. If a service is catching an event that it emitted, it might as well use the signal version of it instead, right?

In general, I think we should try to make it so that events either have one producer and many receivers, or many producers and one receiver--partly so we can avoid cycle issues. I'm sure there are valid use cases outside of those two modes, but I feel like any such use cases should be strongly examined to check for alternatives.

@timmc-edx
Copy link
Contributor Author

I was somewhat partial to option 2, but one wrinkle is that in our current semi-prototype code we're making explicit producer and consumer calls, which would bypass the protection that that option contemplates. (It would only work if all "signal teleporting" is fully configuration-based, so that openedx-events would have complete visibility and control.)

Option 1 (and maybe option 3) works even if producer/consumer calls are in code rather than config, since it happens at the signal/event translation layer, under the control of openedx-events or its implementations. So I'm leaning that direction at the moment.

@robrap robrap changed the title Prevent event bus recursion [Defer] Prevent event bus recursion Aug 9, 2022
@ormsbee
Copy link

ormsbee commented Aug 15, 2022

I was somewhat partial to option 2, but one wrinkle is that in our current semi-prototype code we're making explicit producer and consumer calls, which would bypass the protection that that option contemplates. (It would only work if all "signal teleporting" is fully configuration-based, so that openedx-events would have complete visibility and control.)

Sorry if this is a silly question, but why are we making explicit producer/consumer calls?

@timmc-edx
Copy link
Contributor Author

timmc-edx commented Aug 16, 2022

Oh, just because we don't have the config-based produce/consume code yet! That code should go away relatively soon.

EDIT: See #89

@robrap robrap changed the title [Defer] Prevent event bus recursion [Consumer] Prevent event bus recursion Nov 15, 2022
@robrap
Copy link
Contributor

robrap commented Jan 10, 2023

  • I could see a service wanting to consume its own events for async processing as an alternative to celery. This is not an urgent requirement, but seems like a reasonable possible future use case once events are flying.
  • This is a downside of reusing the same signal on both sides of the bus.
  • After Discovery: Signal metadata, event bus consumers, and timestamps #162 is completed, we will have two different send_event calls, one for each side of the bus, so we could introduce a metadata field that details which side of the bus we are on.
  • Preventing ourselves from re-sending the same events over the bus multiple times is probably the simple part, because it could be encapsulated in the libraries. The annoying part would be that service plugins would want to avoid listening to the out-of-process copy of the signal.

@robrap robrap removed the status in Arch-BOM Mar 23, 2023
@robrap robrap removed the backlog To be put on a team's backlog or wishlist label Apr 20, 2023
@robrap robrap removed this from the [Event Bus] Future milestone Jun 8, 2023
mariajgrimaldi pushed a commit that referenced this issue Feb 12, 2024
@mariajgrimaldi
Copy link
Member

mariajgrimaldi commented Feb 15, 2024

Hi there, @timmc-edx @robrap. Was this solved by #312? If so, should we close this, or is there any other work to do?

@timmc-edx
Copy link
Contributor Author

It should be solved by that PR. I'd prefer to test it locally before closing out the issue, but if I'm being honest, that's lower priority than a lot of stuff I have on my plate at the moment -- so it might make sense to just close this out.

(I didn't entirely follow the event-tracking data flow, but it sounds like they maybe needed that PR in order to stop recursion from happening in another PR, so... I guess it worked? Maybe that's all the testing that's needed!)

@bmtcril
Copy link
Contributor

bmtcril commented Mar 5, 2024

This was definitely triggered in the "consumer & producer in the same IDA" scenario for event-tracking, and #312 for sure solved it there. I don't know of any other scenarios that would result in that behavior so I'd suggest this is good to close.

@timmc-edx
Copy link
Contributor Author

Perfect, thanks!

There are more complicated situations that could cause a recursive cascade, but they're much less likely, and impossible to prevent in all of their variations -- so, closing out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
event-bus Work related to the Event Bus.
Projects
None yet
Development

No branches or pull requests

5 participants