Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

automod: mentions spike rule #482

Merged
merged 2 commits into from
Dec 19, 2023
Merged

Conversation

warpfork
Copy link
Contributor

Introduce a simple new rule for detecting and flagging spikes of mentions within a period.

@warpfork warpfork requested a review from bnewbold December 15, 2023 13:44
if !newMentions {
return nil
}
if mentionHourlyThreshold <= evt.GetCountDistinct("mentions", did, automod.PeriodHour) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, I think it's basically impossible for something to trigger on a single message, due to the order of events: counter increments get buffered until persist time, so if we're reading the counts on the same topic like this during the same rule function, we're getting the data from before we started.

That's maybe fine, but seems like maybe we should spawn a todo to write something up about that in a doc block somewhere about things to be expecting as a rule author?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup! And in particular with limits you'll hit most rules on "limit plus one" not the limit number directly. Agree this should be included in rule-writing guide.

Copy link
Collaborator

@bnewbold bnewbold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! It is basically how I would write it.

Before i'd deploy this, i'd probably want to test it a bit, either with a capture of real-world behavior, or by reducing the threshold and testing against local or staging.

Comment on lines 30 to 32
if !newMentions {
return nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this skip-a-network-fetch-if-nothing-new is an interesting little idiom. I think my rules are mostly the other way around (read first, increment later), but this is potentially more efficient.


var _ automod.PostRuleFunc = SpamMentionsRule

var mentionHourlyThreshold = 20
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, my intuition is that we'll need to bump this a lot. one example is follow-friday type threads where people list out a bunch of recommended follows, easily a few dozen.

in theory we could do some data science, but I think what probably works best is what you're doing here implicitly: start with just a "flag" which will result in a slack notification, and we can see how legit the hits are.

one workflow thing that will need improvement is flag naming. right now once a flag gets set, it sticks around forever. if we change the theshold, or re-write the rule entirely, it would make sense to purge all the old flags. we could use manually process to work around this (prefix new flags with "dev-" or "beta-"), but there are probably better options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, daily numbers might be useful too then. Follow-friday behaviors could tip the hourly count is that number is relatively low, but if we also do a daily number that's considerably higher, and it should be easy for follow-friday to stay below that while real spam still goes above it.

A mechanism for counting hours-today-that-crossed-the-line would also buff that out. Follow-friday shouldn't go above 1 or 2.

Comment on lines 18 to 23
for _, facet := range post.Facets {
for _, feature := range facet.Features {
mention := feature.RichtextFacet_Mention
if mention == nil {
continue
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is an ExtractFacets helper function, though in this case the code looks pretty tight without it:
https://pkg.go.dev/github.com/bluesky-social/indigo/automod/rules#ExtractFacets

if mention == nil {
continue
}
evt.IncrementDistinct("mentions", did, mention.Did)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use the IncrementDistinctPeriod variant to reduce the number of hyperloglog data counters in memory. on the other hand, distinct mentions do seem like the kind of thing we might want to persist long-term, or change our mind amount, so seems good to keep as-is.

A rule's name should describe what it does, not presumptively describe
the valence of what it hopes that it's detecting.
@bnewbold bnewbold merged commit 998c066 into main Dec 19, 2023
7 checks passed
@bnewbold bnewbold deleted the warpfork/automod-mentions-spike-rule branch December 19, 2023 07:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants