Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(table-test): make guidance less presciptive #174

Merged
merged 7 commits into from
May 9, 2023

Conversation

tyler-french
Copy link
Contributor

Resolves #173

This PR adjusts the recommendations in the Table Test section to be less prescriptive and promotes usage that better aligns with readability and maintainability.

@CLAassistant
Copy link

CLAassistant commented Mar 15, 2023

CLA assistant check
All committers have signed the CLA.

@tyler-french tyler-french force-pushed the table-test branch 2 times, most recently from 2011d04 to 3690657 Compare March 15, 2023 19:44
Copy link
Contributor

@prashantv prashantv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw, I disagree with this recommendation.

I find it useful to have error and non-error cases in the same test table, so when looking at cases, you can see the small changes that cause something to become invalid/valid.

I think having simple test-logic is important to ensure the test is correct, and don't think that wantErr is complex enough to break correctness -- it's also a commonly used pattern in the stdlib.

I don't think the "Bad" example is bad in this case, it's pretty reflective of real usage of test tables.

That said, the test runner should be simple: avoiding nested ifs, many ifs. Basically, the test runner shouldn't be so complex that it's not trivial to follow.

Comment on lines 6 to 8
If a system under test needs to be tested against _multiple conditions_ where
*only* the **inputs** and **outputs** change, a table-driven test *can* and **should**
be used to reduce redundancy and improve readability.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of nits:

  • I think we can tone it down a little (e.g., there's a lot of emphasis only, can, should)
  • I think it's useful to provide some context for why

@abhinav
Copy link
Collaborator

abhinav commented Mar 16, 2023

Fwiw, I disagree with this recommendation.

I find it useful to have error and non-error cases in the same test table, so when looking at cases, you can see the small changes that cause something to become invalid/valid.

I agree that just forbidding sharing of tables for success and failure cases is bad.
But I think it might be possible to express that nuance here better while pushing the policy of clean tables.

@mway
Copy link
Contributor

mway commented Mar 16, 2023

I would focus more on having concise "test goals" and test scope in general. For example, it's probably uncontroversial that tests should avoid branching (or should branch as little as possible) and unnecessary complexity, but I think that's probably more tractable through the lens of simpler ideas like test goals and scope.

A contrived but not unrealistic example, assuming we have the following API:

package foo

type Foo interface {
  Name() string
  HandleBar(Bar) error // Bar.Name() must match Foo.Name()
}

func NewFoo(name string) (Foo, error) {
  if len(name) == 0 {
    return nil, ErrEmptyName
  }

  return foo{
    name: strings.ToLower(name),
  }, nil
}

type Bar interface {
  Name() string
}

One might write a "comprehensive table test" like the following:

package foo_test

func TestFoo(t *testing.T) {
  cases := map[string]struct{
    giveName           string
    giveLogger         *zap.Logger
    wantError          error
    giveBar            foo.Bar
    wantHandleBarError error
  } {
    "nominal": {
      giveName:           "hello",
      giveLogger:         zap.NewNop(),
      wantError:          nil,
      giveBar:            foo.NewBar("hello"),
      wantHandleBarError: nil,
    },
    "empty name": {
      giveName:           "",
      giveLogger:         zap.NewNop(),
      wantError:          foo.ErrEmptyName,
      giveBar:            nil, // unreachable in this case
      wantHandleBarError: nil, // unreachable in this case
    },
    "nil logger": {
      giveName:           "hello",
      giveLogger:         nil,
      wantError:          foo.ErrNilLogger,
      giveBar:            nil, // unreachable in this case
      wantHandleBarError: nil, // unreachable in this case
    },
    "nil bar": {
      giveName:           "hello",
      giveLogger:         zap.NewNop(),
      wantError:          nil,
      giveBar:            nil,
      wantHandleBarError: foo.ErrNilBar,
    },
    "mismatched names": {
      giveName:           "hello",
      giveLogger:         zap.NewNop(),
      wantError:          nil,
      giveBar:            foo.NewBar("world"),
      wantHandleBarError: foo.ErrNameMismatch,
    },
  }

  for name, tt := range cases {
    t.Run(name, func(t *testing.T) {
      foo, err := foo.NewFoo(tt.giveName, tt.giveLogger)
      require.ErrorIs(t, err, tt.wantError)
      
      if tt.wantError != nil {
        require.Nil(t, foo)
        return
      }

      require.Equal(t, strings.ToLower(tt.giveName), foo.Name())
      require.ErrorIs(t, foo.HandleBar(tt.giveBar), tt.wantHandleBarError)
    })
  }
}

This example is suboptimal primarily because it's doing mixed-mode testing of Foo: it tests (1) nominal construction, (2) construction errors, (3) construction side effects (Name() check), and both (4) nominal and (5) error cases for a method (HandleFoo).

This is part of what we're trying to hedge against; there's probably too much responsibility here for a single test, and it may become either (a) complex to the point that the test logic may itself need testing, and/or (b) difficult to read and/or maintain, especially over time and multiple feature changes/additions.

However, it's hard to communicate that effectively through "how/when/where to table test" guidance, which probably won't apply everywhere (and may be polarizing enough that it's ignored). If we focus on scope, however, this helps us understand (1) what should be in each table, and (2) by extension, what the test body should do with the table.

For example, assuming we said:

  • Tests should focus on the narrowest unit of behavior possible
  • Tests should contain no if statements if at all possible
  • All table fields should be used in all table tests
  • All test logic should be evaluated for all table cases

then we have something that we can start to apply more generally. Obviously there will still be exceptions, but ideally they'd represent a minority of cases. If we applied that to our scenario:

  • We can split into (1) nominal construction vs errors, (2) input/output translations, and (3) a specific method
  • We can avoid the "if an error is expected return early, else test more things" branching and complexity
  • Because we've split the test into discrete concerns, we can consume all fields in each relevant test, because we're not doing anything conditionally
  • We don't have any skipped lines for any test cases, because we're not doing anything conditionally

which ends up looking like:

func TestNewFoo(t *testing.T) {
  cases := map[string]struct{
    giveName   string
    giveLogger *zap.Logger
    wantFoo    foo.Foo
    wantError  error
  } {
    "nominal": {
      giveName:   "hello",
      giveLogger: zap.NewNop(),
      wantFoo:    foo.NewFoo("hello"),
      wantError:  nil,
    },
    "empty name": {
      giveName:   "",
      giveLogger: zap.NewNop(),
      wantFoo:    nil,
      wantError:  foo.ErrEmptyName,
    },
    "nil logger": {
      giveName:   "hello",
      giveLogger: nil,
      wantFoo:    nil,
      wantError:  foo.ErrNilLogger,
    },
  }

  for name, tt := range cases {
    t.Run(name, func(t *testing.T) {
      foo, err := foo.NewFoo(tt.giveName, tt.giveLogger)
      require.ErrorIs(t, err, tt.wantError)
      require.Equal(t, foo, tt.wantFoo)
    })
  }
}

func TestFoo_Name(t *testing.T) {
  cases := map[string]struct{
    giveName string
    wantName string
  }{
    "lowercase": {
      giveName: "hello",
      wantName: "hello",
    },
    "uppercase": {
      giveName: "HELLO",
      wantName: "hello",
    },
    "mixed case": {
      giveName: "hElLo",
      wantName: "hello",
    },
    "utf8": {
      giveName: "Über",
      wantName: "über",
    },
    // ...
  }

  for name, tt := range cases {
    t.Run(name, func(t *testing.T) {
      x, err := foo.NewFoo(tt.giveName)
      require.NoError(t, err)
      require.Equal(t, tt.wantName, x.Name())
    })
  }
}

func TestFoo_HandleBar(t *testing.T) {
  newTestFoo := func(t *testing.T, name string) foo.Foo {
    x, err := foo.NewFoo(name)
    require.NoError(t, err)
    return x
  }

  cases := map[string]struct{
    foo       foo.Foo
    giveBar   foo.Bar
    wantError error
  } {
    "nominal": {
      foo:       newTestFoo("hello"),
      giveBar:   foo.NewBar("hello"),
      wantError: nil,
    },
    "empty bar name": {
      foo:       newTestFoo("hello"),
      giveBar:   foo.NewBar(""),
      wantError: foo.ErrNameMismatch,
    },
    "mismatched foo and bar names": {
      foo:       newTestFoo("hello"),
      giveBar:   foo.NewBar("hello"),
      wantError: foo.ErrNameMismatch,
    },
    "nil bar": {
      foo:       newTestFoo("hello"),
      giveBar:   nil,
      wantError: foo.ErrNilBar,
    },
  }

  for name, tt := range cases {
    t.Run(name, func(t *testing.T) {
      require.ErrorIs(t, tt.foo.HandleBar(tt.giveBar), tt.wantError)
    })
  }
}

which is much clearer and easier to understand, is more likely to be correct (due to conciseness and the lack of "test business logic"), allows us to exercise more nuanced cases without either table or branch bloat, requires less cognitive overhead, etc.

@tyler-french tyler-french marked this pull request as draft March 16, 2023 17:40
@tyler-french
Copy link
Contributor Author

Converting to draft to address feedback

Copy link
Collaborator

@abhinav abhinav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Thanks for the update @tyler-french.
I realize that this is still a draft, but I gave it a quick pass before we post it for others to review.

src/test-table.md Outdated Show resolved Hide resolved
src/test-table.md Outdated Show resolved Hide resolved
src/test-table.md Outdated Show resolved Hide resolved
src/test-table.md Outdated Show resolved Hide resolved
src/test-table.md Show resolved Hide resolved
src/test-table.md Outdated Show resolved Hide resolved
src/test-table.md Outdated Show resolved Hide resolved
src/test-table.md Outdated Show resolved Hide resolved
@tyler-french
Copy link
Contributor Author

tyler-french commented Apr 18, 2023

Very nice! Thanks for the update @tyler-french. I realize that this is still a draft, but I gave it a quick pass before we post it for others to review.

Thanks for the review @abhinav! I left it as draft so that I could try to incorporate more feedback from @mway

Copy link
Contributor

@mway mway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great, thanks @tyler-french!

Left a few preemptive nits - feel free to use/amend what you think makes sense.

@@ -100,6 +104,87 @@ for _, tt := range tests {
}
```

### Avoid Messy Table Tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about:

Avoid Unnecessary Complexity in Table Tests

Mostly just thinking about "messiness" probably being more subjective than "complexity".

This can apply to tests more broadly as well, not just table tests, so in the future we could further generalize this section (i.e., to refer to test complexity in general, not only that of table tests) if we want.

Copy link
Contributor Author

@tyler-french tyler-french Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that "Complexity" would probably be more clear/concrete from the perspective of a reader trying to understand when to use this.

Comment on lines 109 to 111
Table tests become difficult to read and maintain if there becomes significant
logic forks within the subtest `for` loop. Table tests should **NOT** be used whenever
there needs to be complex logic added inside this execution loop.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

  • s/if there becomes significant/if there are significant/
  • s/within the subtest for/within the subtest's for/

Alternatively:

Suggested change
Table tests become difficult to read and maintain if there becomes significant
logic forks within the subtest `for` loop. Table tests should **NOT** be used whenever
there needs to be complex logic added inside this execution loop.
Table tests can be difficult to read or maintain if their subtests contain
conditional assertions or other branching logic. Table tests should **NOT** be
used whenever there needs to be complex or conditional logic inside subtests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to take your suggestions but adjusted the wording a bit. I still think it might be more clear to mention that the for loop is generally where the logic that becomes messy ends up.

The word subtest could be a bit ambiguous because it could refer to an entry in the tests slice or the t.Run( call within the for loop.

src/test-table.md Outdated Show resolved Hide resolved
src/test-table.md Outdated Show resolved Hide resolved
src/test-table.md Outdated Show resolved Hide resolved
src/test-table.md Outdated Show resolved Hide resolved
src/test-table.md Outdated Show resolved Hide resolved
@tyler-french tyler-french requested a review from prashantv April 18, 2023 16:07
@tyler-french tyler-french marked this pull request as ready for review April 18, 2023 16:07
Copy link
Collaborator

@abhinav abhinav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM w minor comments.

@tyler-french thanks a lot for doing this.
Be sure to run make style.md to update the top-level style guide file.

This is good as-is, but if you'd prefer for this item to appear in the Table of Contents, you'll want to move it to its own file (test-table-simple.md?) but that's not necessary: we don't have the other subsection (parallel tests) in a separate file either.

Please also wait for approval from @mway.

src/test-table.md Outdated Show resolved Hide resolved
src/test-table.md Outdated Show resolved Hide resolved
@tyler-french tyler-french force-pushed the table-test branch 2 times, most recently from cb12dc0 to f9645e8 Compare April 18, 2023 16:30
@tyler-french
Copy link
Contributor Author

This is good as-is, but if you'd prefer for this item to appear in the Table of Contents, you'll want to move it to its own file (test-table-simple.md?) but that's not necessary: we don't have the other subsection (parallel tests) in a separate file either.

Yeah, I think it's ok to keep this under the "Table Tests" section.

Copy link
Contributor

@prashantv prashantv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree in principle with the guidance (keep test bodies simple), I worry that this guidance is pushing for separate test methods to avoid ifs in the test body, but doesn't take into account the impact to readability by splitting tests:

  • Test cases split across different test methods, making it hard to see differences between cases that are leading to different behaviour
  • Test runners will be duplicated, but likely some small differences, but in way that's harder to identify the differences.

A test with multiple mock methods is because the code under test is calling multiple methods -- have we tried going through an exercise of this guidance would affect tests for methods like that, and whether that's better for readability?

My personal preference is to group related test cases together to make it clear how a change to a test case affects behaviour. The test struct should make the runner logic obvious, if that's not the case, then it's probably too complex.

* Ensure that all test logic runs for all table cases

Concretely, that means that you should avoid
having multiple branching pathways (e.g. `shouldError`, `expectCall`, etc.),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: typically it's called wantErr (and the example below uses that too)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intent was different than wantErr (conventionally speaking) - wantErr is generally an error that you assert (e.g. with errors.Is), but shouldError is probably a boolean that implies branching for conditional assertion.

Having thought about it a bit more, I think we can probably remove this
paragraph and instead call out the things that are bad in the example below; that would help tie them to a concrete example ("how these things can be bad") without necessarily over-prescribing ("these things are always bad").

})
}
}
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add the "Good" example of the above that allows testing the different cases too -- my worry is we're not seriously considering what the alternative would look like to test the same cases, and whether it's actually better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, we should have both good/bad examples where possible, if only to illustrate contrast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a "good" example and tried to emphasize that this isn't a one-size-fits-all situation. The guidance should be more about avoiding complexity and confusion rather than any specific implementation in particular.

@mway
Copy link
Contributor

mway commented Apr 25, 2023

While I agree in principle with the guidance (keep test bodies simple), I worry that this guidance is pushing for separate test methods to avoid ifs in the test body, but doesn't take into account the impact to readability by splitting tests:

I think it's OK to encourage decomposing tests into smaller units, but maybe we need a caveat? Something like:

Be careful not to over-correct: for example, when testing behavior that
changes based on input, it may be preferable to group similar cases
together to better illustrate how behavior changes across all inputs,
rather than splitting otherwise comparable units into separate tests
and making them harder to compare and contrast.

This also brings to mind another subtlety that's been implied (sort of) but not directly discussed: test "depth", by which I mean "within a given test, the number of things successively asserted that require previous assertions to hold". Basically cyclomatic complexity, but specifically about relationships between assertions (which implies conditional assertion). Trivial example:

func(t *testing.T) {
  // depth = 0; no dependency on previous assertions
  foo, err := NewFoo(tt.giveFooParam)
  require.ErrorIs(t, err, tt.wantFooError)

  if tt.wantFooError != nil {
    return
  }

  // depth = 1; depends on foo
  bar, err := foo.GetBar(tt.giveBarParam)
  require.ErrorIs(t, err, tt.wantBarError)

  if tt.wantBarError != nil {
    return
  }

  // depth = 2; depends on bar
  require.ErrorIs(t, bar.Baz(tt.giveBazParam), tt.wantBazError)
}

Through that lens, I think the goal is actually twofold: (1) minimizing depth, and by extension (2) avoiding conditional assertions. Conditions are probably fine if they're terminal, but once you start increasing depth (read: adding conditional assertions), that's when it gets hairy.

Some ideals to aim for are:

* Focus on the narrowest unit of behavior
* Minimize `if` statements and other control flow (if possible)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on this comment:

Suggested change
* Minimize `if` statements and other control flow (if possible)
* Minimize "test depth", and avoid conditional assertions (see below)

and then expand later, after the list:

In this context, "test depth" means "within a given test, the number of
successive assertions that require previous assertions to hold" (similar
to cyclomatic complexity, except that it also describes prerequisite
relationships between assertions).

Having "shallower" tests means that there are fewer relationships between
assertions and, more importantly, that those assertions are less likely
to be conditional by default.

That might better clarify the actual goal (versus banning conditions in general). I think that this would also allow the guidance to be (1) applied more consistently and (2) more flexible around what a test should look like.

Thoughts?

@tyler-french
Copy link
Contributor Author

@mway @prashantv @abhinav I tried to update the wording with some of @mway's suggestions, while also keeping some examples with the if.

I worry that getting too "theoretical" will hurt the experience for some readers. It's good to have a concrete sense of how something could look (e.g. multiple ifs in the loop). Conversely, I do agree that saying "avoid ifs" goes against the original intent of being less prescriptive.

Thanks again for the detailed comments and suggestions here, and thanks for the patience with the long turnaround 😄

Copy link
Collaborator

@abhinav abhinav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @tyler-french. I don't think we should spend any more cycles making this guidance more abstract or higher level. The style guide is intended to be accessible and something folks can link to in discussions: concrete Dos and Don'ts with a simple caveats where applicable.

We don't need 100% coverage of nuanced corners, just more than what is currently there. I don't see anything in the proposed change that is false, so I consider this a net positive. I propose that we merge this.

src/test-table.md Outdated Show resolved Hide resolved
src/test-table.md Outdated Show resolved Hide resolved
abhinav added a commit that referenced this pull request May 9, 2023
The operation failed in #174 because the PR was made from a fork.
Looking at the [documentation for git-auto-commit-acton](https://github.com/stefanzweifel/git-auto-commit-action), it's possible to support this for forks as well if we use the `pull_request_target` event (which runs in the context of the base branch), and check out the PR repository (which may be a fork).
See also https://github.com/stefanzweifel/git-auto-commit-action#workflow-should-run-in-base-repository for details.
Copy link
Contributor

@mway mway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you @tyler-french!

Once we sort CI this can be merged.

@abhinav
Copy link
Collaborator

abhinav commented May 9, 2023

Do not merge yet. PR green is a false. Needs #183 first.

@tyler-french
Copy link
Contributor Author

Thanks for the reviews! And I don't have merge permission anyway here, so best of luck with the CI!

@mway mway merged commit e0d2d1a into uber-go:master May 9, 2023
abhinav pushed a commit that referenced this pull request May 9, 2023
The tables added in #174 are a bit wide and hard to fit on smaller screens.
By changing the `\t` to `  `, and splitting a line, we can reduce this width.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

proposal: table test guidance is too prescriptive
5 participants