Add a benchmark for read_schema #472

Merged
merged 6 commits into from
Nov 20, 2024
7 changes: 4 additions & 3 deletions README.md
@@ -169,9 +169,10 @@ For more advanced usage, a tutorial, and detailed options refer to the full [Doc

Some performance benchmarks are run on each commit to `main` in order to track performance over time. Each benchmark is run against Postgres 14.8, 15.3, 16.4, 17.0 and "latest". Each line on the chart represents the number of rows the benchmark was run against, currently 10k, 100k and 300k rows.

* Backfill: Rows/s to backfill a text column with the value `placeholder`. We use our default batching strategy of 10k rows per batch with no backoff.
* WriteAmplification/NoTrigger: Baselines rows/s when writing data to a table without a `pgroll` trigger.
* WriteAmplificationWithTrigger: Rows/s when writing data to a table when a `pgroll` trigger has been set up.
* `Backfill:` Rows/s to backfill a text column with the value `placeholder`. We use our default batching strategy of 10k rows per batch with no backoff.
* `WriteAmplification/NoTrigger:` Baseline rows/s when writing data to a table without a `pgroll` trigger.
* `WriteAmplification/WithTrigger:` Rows/s when writing data to a table when a `pgroll` trigger has been set up.
* `ReadSchema:` Checks the number of executions per second of the `read_schema` function which is a core function executed frequently during migrations.

They can be seen [here](https://xataio.github.io/pgroll/benchmarks.html).

8 changes: 6 additions & 2 deletions dev/benchmark-results/build.go
@@ -226,9 +226,13 @@ func loadData(filename string) (allReports []BenchmarkReports, err error) {
}

// Benchmarks are grouped by the number of rows they were tested against. We need to trim this off
// the end.
// the end if it exists.
func trimName(name string) string {
return strings.TrimPrefix(name[:strings.LastIndex(name, "/")], "Benchmark")
name = strings.TrimPrefix(name, "Benchmark")
if i := strings.LastIndex(name, "/"); i != -1 {
name = name[:i]
}
return name
}

// First 7 characters
6 changes: 6 additions & 0 deletions dev/benchmark-results/build_test.go
@@ -18,3 +18,9 @@ func TestBuildChartsRegression(t *testing.T) {
// 5 versions * 3 benchmarks
assert.Len(t, generated, 15)
}

func TestTrimName(t *testing.T) {
assert.Equal(t, "Test1", trimName("BenchmarkTest1/1000"))
assert.Equal(t, "Test1/Case2", trimName("BenchmarkTest1/Case2/1000"))
assert.Equal(t, "Test1", trimName("BenchmarkTest1"))
}
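The reason for the `trimName` change can be reproduced in isolation. The sketch below copies the old and new bodies from the diff above into a standalone program; `trimNameOld` and the `panics` helper are names introduced here purely for illustration, and the benchmark names passed in are hypothetical. It assumes the motivation is that `BenchmarkReadSchema` has no per-row-count sub-benchmark, so its name contains no `/`.

```go
package main

import (
	"fmt"
	"strings"
)

// trimNameOld mirrors the pre-change logic (name is illustrative): it slices
// at the last "/" unconditionally, so a name without "/" yields name[:-1].
func trimNameOld(name string) string {
	return strings.TrimPrefix(name[:strings.LastIndex(name, "/")], "Benchmark")
}

// trimName mirrors the updated logic: strip the "Benchmark" prefix first,
// then drop the trailing "/<rows>" segment only if a "/" is present.
func trimName(name string) string {
	name = strings.TrimPrefix(name, "Benchmark")
	if i := strings.LastIndex(name, "/"); i != -1 {
		name = name[:i]
	}
	return name
}

// panics is a hypothetical helper that reports whether calling f panics.
func panics(f func()) (didPanic bool) {
	defer func() {
		if r := recover(); r != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	// A name without a row-count suffix is returned with only the prefix removed.
	fmt.Println(trimName("BenchmarkReadSchema"))
	// A name with a row-count suffix loses both the prefix and the suffix.
	fmt.Println(trimName("BenchmarkBackfill/10000"))
	// The old version panics on a name without "/" (slice bounds out of range).
	fmt.Println(panics(func() { trimNameOld("BenchmarkReadSchema") }))
}
```

Because `strings.LastIndex` returns -1 when the separator is missing, the guard on `i != -1` is what keeps the slice expression valid for suffix-free names.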
37 changes: 36 additions & 1 deletion internal/benchmarks/benchmarks_test.go
@@ -21,7 +21,10 @@ import (
"github.com/xataio/pgroll/pkg/roll"
)

const unitRowsPerSecond = "rows/s"
const (
unitRowsPerSecond = "rows/s"
unitExecutionsPerSecond = "executions/s"
)

var (
rowCounts = []int{10_000, 100_000, 300_000}
@@ -154,6 +157,38 @@ func BenchmarkWriteAmplification(b *testing.B) {
})
}

func BenchmarkReadSchema(b *testing.B) {
ctx := context.Background()
testSchema := testutils.TestSchema()
var opts []roll.Option

testutils.WithMigratorInSchemaAndConnectionToContainerWithOptions(b, testSchema, opts, func(mig *roll.Roll, db *sql.DB) {
b.Cleanup(func() {
require.NoError(b, mig.Close())
})

setupInitialTable(b, ctx, testSchema, mig, db, 1)
Member

It seems clear that, for the sake of this test and the others, we probably want a more complex schema here, right? Not for this PR.

Collaborator Author

Yeah, agree. I'm not sure exactly how the performance of `read_schema` would change depending on the complexity of the schema, but I guess that's something we could actually benchmark too :)

Member

++

b.ResetTimer()

// We don't want this benchmark to test the network so instead we run the actual function in a tight
// loop within a single execution.
executions := 10000
q := fmt.Sprintf(`SELECT %s.read_schema($1) FROM generate_series(1, $2);`, pq.QuoteIdentifier(mig.State().Schema()))
_, err := db.ExecContext(ctx, q, testSchema, executions)
b.StopTimer()
require.NoError(b, err)
perSecond := float64(executions) / b.Elapsed().Seconds()
b.ReportMetric(perSecond, unitExecutionsPerSecond)

reports.AddReport(BenchmarkReport{
Name: b.Name(),
Unit: unitExecutionsPerSecond,
RowCount: executions,
Result: perSecond,
})
})
}

func setupInitialTable(tb testing.TB, ctx context.Context, testSchema string, mig *roll.Roll, db *sql.DB, rowCount int) {
tb.Helper()
