-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
automod: mentions spike rule #482
Conversation
automod/rules/spammentions.go
Outdated
if !newMentions { | ||
return nil | ||
} | ||
if mentionHourlyThreshold <= evt.GetCountDistinct("mentions", did, automod.PeriodHour) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, I think it's basically impossible for something to trigger on a single message, due to the order of events: counter increments get buffered until persist time, so if we're reading the counts on the same topic like this during the same rule function, we're getting the data from before we started.
That's maybe fine, but seems like maybe we should spawn a todo to write something up about that in a doc block somewhere about things to be expecting as a rule author?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! And in particular with limits you'll hit most rules on "limit plus one" not the limit number directly. Agree this should be included in rule-writing guide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! It is basically how I would write it.
Before i'd deploy this, i'd probably want to test it a bit, either with a capture of real-world behavior, or by reducing the threshold and testing against local or staging.
automod/rules/spammentions.go
Outdated
if !newMentions { | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this skip-a-network-fetch-if-nothing-new is an interesting little idiom. I think my rules are mostly the other way around (read first, increment later), but this is potentially more efficient.
automod/rules/spammentions.go
Outdated
|
||
var _ automod.PostRuleFunc = SpamMentionsRule | ||
|
||
var mentionHourlyThreshold = 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, my intuition is that we'll need to bump this a lot. one example is follow-friday type threads where people list out a bunch of recommended follows, easily a few dozen.
in theory we could do some data science, but I think what probably works best is what you're doing here implicitly: start with just a "flag" which will result in a slack notification, and we can see how legit the hits are.
one workflow thing that will need improvement is flag naming. right now once a flag gets set, it sticks around forever. if we change the theshold, or re-write the rule entirely, it would make sense to purge all the old flags. we could use manually process to work around this (prefix new flags with "dev-" or "beta-"), but there are probably better options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, daily numbers might be useful too then. Follow-friday behaviors could tip the hourly count is that number is relatively low, but if we also do a daily number that's considerably higher, and it should be easy for follow-friday to stay below that while real spam still goes above it.
A mechanism for counting hours-today-that-crossed-the-line would also buff that out. Follow-friday shouldn't go above 1 or 2.
automod/rules/spammentions.go
Outdated
for _, facet := range post.Facets { | ||
for _, feature := range facet.Features { | ||
mention := feature.RichtextFacet_Mention | ||
if mention == nil { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is an ExtractFacets
helper function, though in this case the code looks pretty tight without it:
https://pkg.go.dev/github.com/bluesky-social/indigo/automod/rules#ExtractFacets
automod/rules/spammentions.go
Outdated
if mention == nil { | ||
continue | ||
} | ||
evt.IncrementDistinct("mentions", did, mention.Did) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we use the IncrementDistinctPeriod
variant to reduce the number of hyperloglog data counters in memory. on the other hand, distinct mentions do seem like the kind of thing we might want to persist long-term, or change our mind amount, so seems good to keep as-is.
A rule's name should describe what it does, not presumptively describe the valence of what it hopes that it's detecting.
Introduce a simple new rule for detecting and flagging spikes of mentions within a period.