automod: test capture framework #470

bnewbold · 2023-12-09T06:35:04Z

This PR is currently rebased on top of #466, to demonstrate testing that rule. UPDATE: that PR has merged, so this is now against main.

Adds a hepa command to "capture" the current state of a real-world account: currently some account metadata (identity, profile, etc), plus some recent post records. This gets serialized to JSON for easy dumping to file, like:

go run ./cmd/hepa/ capture-recent atproto.com > automod/testdata/capture_atprotocom.json

It also adds a test helper function which loads this file and processes all the post records using an engine fixture.

Combined, these fixtures make it easy to do test-driven development of new rules. You find an account which recently sent spam or violated some policy, take a capture snapshot, set up a test case, and then write a rule which triggers and satisfies the test.
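A minimal sketch of that test-first loop, using hypothetical types (`Capture`, `Post`, `Rule`, `processCapture`, `spamLinkRule`) rather than the real automod engine fixture: decode a captured JSON snapshot, run every post through every rule, and collect the resulting flags so a test can assert on them.

```go
package main

import (
	"encoding/json"
	"strings"
)

// Hypothetical minimal capture shape, for illustration only.
type Post struct {
	Text string `json:"text"`
}

type Capture struct {
	DID   string `json:"did"`
	Posts []Post `json:"posts"`
}

// Rule inspects one post and returns any flags it wants to set.
type Rule func(did string, p Post) []string

// processCapture decodes a capture and runs each post through each rule,
// collecting flags per account DID for the test to check.
func processCapture(data []byte, rules []Rule) (map[string][]string, error) {
	var c Capture
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	flags := make(map[string][]string)
	for _, p := range c.Posts {
		for _, r := range rules {
			flags[c.DID] = append(flags[c.DID], r(c.DID, p)...)
		}
	}
	return flags, nil
}

// spamLinkRule is a toy rule written to satisfy a capture-based test.
func spamLinkRule(did string, p Post) []string {
	if strings.Contains(p.Text, "free-followers.example") {
		return []string{"spam-link"}
	}
	return nil
}
```

In a real test the JSON would be read from a file under `automod/testdata/`, and the assertion would be written before the rule exists, so it fails until the rule is implemented.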

Some notes:

tried moving the "test helpers" into a sub-package (indigo/automod/automodtest) but hit a circular import, so left them where they are
this won't work with all rule types, and some captures/rules may need additional mocking (eg, additional identities in the mock directory), but that should be fine
it usually isn't appropriate to capture real-world content into public code. we can be careful about what we add in this repo (indigo); the "hackerdarkweb" example included in this PR seems fine to snapshot to me. the code does strip "Private" account metadata by default.
probably could use docs/comments. i'm not sure where best to put effort, feedback welcome!

warpfork · 2023-12-11T16:35:26Z

fwiw about packages and cyclic imports: since that issue comes up a lot as code grows, and moving files is such a bummer for git history, I have kind of a semi-standard convention for preemptively avoiding it:

foo package -- as empty as possible! contains mostly helper functions, usage examples, and aliases. top (i.e. most transitive imports) of the import graph.
foo/core package -- most of the type definitions and interfaces. bottom (i.e. fewest transitive imports) of the import graph.
foo/feature_a -- imports foo/core. Maybe imported and used directly by library consumers; maybe just exposed via foo's use of it and aliasing of it.

There's lots of ways to lay out packages, but I like the above because people usually start looking for docs and examples at the shorter paths into the package tree.

Putting all the most essential types in the rootwards packages is a natural thing to do... but almost invariably creates frustration in the long run. Having the highest level usages and the coremost type defns in the same package is pretty much guaranteed to result in cycle issues when attempting to extract things.

Aliasing (especially nowadays that we have type aliasing) also makes it pretty easy to have the rootmost package expose "everything" a downstream consumer needs, without necessarily exposing them to the internal package graph details. (So for example, feature_a package refers to core.WhateverType a lot, but... foo can still expose func(WhateverType) without tossing more package imports at its consumers by also exporting type WhateverType = core.WhateverType.)
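The aliasing trick can be shown in a single file. In the layout described above, `coreWhateverType` would be `core.WhateverType` in `foo/core`, and the alias plus the `Describe` helper would live in `foo`; this is a hypothetical collapse of those packages so the example compiles on its own.

```go
package main

import "fmt"

// Stands in for core.WhateverType in foo/core.
type coreWhateverType struct {
	Name string
}

// In package foo: an alias (note the "="), not a new defined type,
// so consumers importing only foo get the exact same type as core's.
type WhateverType = coreWhateverType

// Describe is a foo-level helper consumers can call without importing core.
func Describe(w WhateverType) string {
	return fmt.Sprintf("whatever:%s", w.Name)
}
```

Because `WhateverType` is declared with `=`, it is identical to `coreWhateverType`, so values pass between the two without conversion. Dropping the `=` would make it a distinct defined type, and passing a `coreWhateverType` value to `Describe` would no longer compile.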

(Maybe this is all familiar old hat to you, but, 2c :))

bnewbold · 2023-12-14T16:12:45Z

package structure: makes sense! I'd be open to refactoring automod to fit that pattern. I think we should wait until a moment when there are not other PRs in flight though.

Manually resolved conflicts: automod/engine_test.go automod/rules/fixture_test.go

bnewbold · 2023-12-14T17:26:59Z

Merged main, resolving conflicts.

warpfork

LGTM, including all the caveats you mentioned. One question about the schema.

warpfork · 2023-12-13T13:51:57Z

cmd/hepa/main.go

@@ -198,12 +200,42 @@ var runCmd = &cli.Command{
 	},
 }

+// for simple commands, not long-running daemons
+func configEphemeralServer(cctx *cli.Context) (*Server, error) {


warpfork · 2023-12-15T12:31:31Z

automod/testing.go

+// Test helper which processes all the records from a capture. Intentionally exported, for use in other packages.
+//
+// This method replaces any pre-existing directory on the engine with a mock directory.
+func ProcessCaptureRules(e *Engine, capture AccountCapture) error {


Definitely feeling the lack of packages here or other strong organizational cues for what's testing-land and what's not. Understood that that's a future PR topic, but just want to ratify that out loud :)

The comment being explicit that this function is rewiring the engine is definitely very very good and appreciated 👍 , because I wouldn't necessarily presume that from the signature or name otherwise.
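To make that side effect concrete, here is a sketch of a helper that, like the documented `ProcessCaptureRules`, swaps in a mock directory before processing. The `Engine`, `Directory`, `Capture`, `mapDirectory`, and `processCaptureRules` names are simplified hypotheticals, not the automod API.

```go
package main

import "errors"

type Identity struct{ DID, Handle string }

// Directory is a hypothetical stand-in for the engine's identity directory.
type Directory interface {
	LookupDID(did string) (Identity, error)
}

// mapDirectory is a trivial mock directory backed by a map.
type mapDirectory map[string]Identity

func (m mapDirectory) LookupDID(did string) (Identity, error) {
	if id, ok := m[did]; ok {
		return id, nil
	}
	return Identity{}, errors.New("not found")
}

type Engine struct {
	Directory Directory
	Processed int
}

type Capture struct {
	Account Identity
	Records []string
}

// processCaptureRules REPLACES any existing directory on e with a mock
// seeded from the capture (the side effect worth documenting), then
// "processes" each record; here that is just a lookup plus a counter.
func processCaptureRules(e *Engine, c Capture) error {
	e.Directory = mapDirectory{c.Account.DID: c.Account}
	for range c.Records {
		if _, err := e.Directory.LookupDID(c.Account.DID); err != nil {
			return err
		}
		e.Processed++
	}
	return nil
}
```

An alternative that avoids mutating the caller's engine would be to build a fresh engine per capture; which is preferable depends on how much other state the fixture sets up.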

warpfork · 2023-12-15T12:34:16Z

automod/fetch.go

+
+type AccountCapture struct {
+	CapturedAt  syntax.Datetime                     `json:"capturedAt"`
+	AccountMeta AccountMeta                         `json:"accountMeta"`


Okay, main question: if there will be rules that want to look at multiple identities, should we brace for that now by making this a list or a map? Or is that overkill?

That is a reasonable question, but I think it is less ergonomic and maybe overkill. For simple rules it is nice to have a direct struct field to get access to the account meta for the owner/creator/author of the repo/record, which is by far the common access case.

For fetching more account meta, I think the pattern we should use is tiered caching. The rule code should just call evt.GetAccountMeta(did-or-handle), and the engine should read-through whatever layers of cache to get that info. That caching may even include per-event caching (not implemented currently!) which would be 1) fast and 2) ensure consistent behavior between rules executed on the same event (aka, the whole event should see the same account meta for all accounts).

This follows the Facebook FXL functional pattern of basically memoizing all function calls within the scope of an event.
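A minimal sketch of the tiered read-through lookup described above, assuming two tiers: a per-event memo map in front of a process-wide shared cache, which in turn sits in front of a slow fetch function. `SharedCache`, `Event`, `FetchFunc`, and this `GetAccountMeta` signature are hypothetical; as noted, per-event caching is not implemented in automod at this point.

```go
package main

import "sync"

type AccountMeta struct {
	DID    string
	Handle string
}

// FetchFunc is the slow tier (e.g. network lookups); hypothetical signature.
type FetchFunc func(id string) (AccountMeta, error)

// SharedCache is a process-wide tier shared across events.
type SharedCache struct {
	mu    sync.Mutex
	items map[string]AccountMeta
	fetch FetchFunc
}

func NewSharedCache(fetch FetchFunc) *SharedCache {
	return &SharedCache{items: map[string]AccountMeta{}, fetch: fetch}
}

// Get reads through to fetch on a miss. The lock is held during fetch to
// keep the sketch short; a real cache would avoid that and add expiry.
func (s *SharedCache) Get(id string) (AccountMeta, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if am, ok := s.items[id]; ok {
		return am, nil
	}
	am, err := s.fetch(id)
	if err != nil {
		return AccountMeta{}, err
	}
	s.items[id] = am
	return am, nil
}

// Event memoizes lookups for its lifetime, so every rule run on the same
// event sees the same account meta even if the shared tier changes.
type Event struct {
	memo   map[string]AccountMeta
	shared *SharedCache
}

func NewEvent(shared *SharedCache) *Event {
	return &Event{memo: map[string]AccountMeta{}, shared: shared}
}

func (e *Event) GetAccountMeta(id string) (AccountMeta, error) {
	if am, ok := e.memo[id]; ok {
		return am, nil
	}
	am, err := e.shared.Get(id)
	if err != nil {
		return AccountMeta{}, err
	}
	e.memo[id] = am
	return am, nil
}
```

The per-event map is what provides the consistency property; the shared tier only makes repeated lookups across events cheap.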

(side note: still mulling a better name than "event")

💭 I guess we could have an internal accountMetaMap and have a helper method (CurrentAccount()) which would access the current account?
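A sketch of the `accountMetaMap` plus `CurrentAccount()` idea, with hypothetical names (`AccountSet`, `NewAccountSet`, `Account`): accounts are keyed by DID, and the common case of the record's author stays a single method call.

```go
package main

import "fmt"

type AccountMeta struct {
	DID    string
	Handle string
}

// AccountSet is a hypothetical container holding several accounts by DID,
// remembering which one is the author of the current repo/record.
type AccountSet struct {
	currentDID     string
	accountMetaMap map[string]AccountMeta
}

func NewAccountSet(current AccountMeta, others ...AccountMeta) *AccountSet {
	m := map[string]AccountMeta{current.DID: current}
	for _, o := range others {
		m[o.DID] = o
	}
	return &AccountSet{currentDID: current.DID, accountMetaMap: m}
}

// CurrentAccount returns the meta for the account that authored the record.
func (s *AccountSet) CurrentAccount() AccountMeta {
	return s.accountMetaMap[s.currentDID]
}

// Account looks up any other account included in the set.
func (s *AccountSet) Account(did string) (AccountMeta, error) {
	am, ok := s.accountMetaMap[did]
	if !ok {
		return AccountMeta{}, fmt.Errorf("account not in set: %s", did)
	}
	return am, nil
}
```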

Manually resolved conflicts: automod/event.go

bnewbold added 5 commits December 8, 2023 19:43

automod: move content fetching to new file

6052e0f

automod: content capture framework for testing rules

b4a5c15

automod: refactor engine test fixture in to exported code

5711e36

rule test for multi-identical-reply

ccf1dbe

automod: always persist flags (redis or in-process, not mod API)

02aa19c

bnewbold requested a review from warpfork December 9, 2023 06:35

Merge branch 'main' into bnewbold/test-capture

6e13b3d

Manually resolved conflicts: automod/engine_test.go automod/rules/fixture_test.go

warpfork approved these changes Dec 15, 2023


bnewbold added 2 commits December 15, 2023 19:57

Merge branch 'main' into bnewbold/test-capture

1b0e668

Manually resolved conflicts: automod/event.go

automod: resolve more merge conflicts

0a4a842

bnewbold merged commit 705a15d into main Dec 15, 2023
7 checks passed

bnewbold deleted the bnewbold/test-capture branch December 15, 2023 13:02
