[cmds][scd/store] Add 'db-manager scd-evict' subcommand enabling listing and deletion of expired SCD entities #1116

Merged
merged 4 commits into from
Oct 15, 2024
Changes from 3 commits
2 changes: 2 additions & 0 deletions Makefile
@@ -114,8 +114,10 @@ test-go-units-crdb: cleanup-test-go-units-crdb
	@docker run -d --name dss-crdb-for-testing -p 26257:26257 -p 8080:8080 cockroachdb/cockroach:v24.1.3 start-single-node --insecure > /dev/null
	@until [ -n "`docker logs dss-crdb-for-testing | grep 'nodeID'`" ]; do echo "Waiting for CRDB to be ready"; sleep 3; done;
	go run ./cmds/db-manager/main.go migrate --schemas_dir ./build/db_schemas/rid --db_version latest --cockroach_host localhost
	go run ./cmds/db-manager/main.go migrate --schemas_dir ./build/db_schemas/scd --db_version latest --cockroach_host localhost
	go test -count=1 -v ./pkg/rid/store/cockroach --cockroach_host localhost --cockroach_port 26257 --cockroach_ssl_mode disable --cockroach_user root --cockroach_db_name rid
	go test -count=1 -v ./pkg/rid/application --cockroach_host localhost --cockroach_port 26257 --cockroach_ssl_mode disable --cockroach_user root --cockroach_db_name rid
	go test -count=1 -v ./pkg/scd/store/cockroach --cockroach_host localhost --cockroach_port 26257 --cockroach_ssl_mode disable --cockroach_user root --cockroach_db_name scd
	@docker stop dss-crdb-for-testing > /dev/null
	@docker rm dss-crdb-for-testing > /dev/null

67 changes: 67 additions & 0 deletions cmds/db-manager/cleanup/README.md
@@ -0,0 +1,67 @@
# DB Cleanup

## scd-evict
CLI tool that lists and deletes expired entities in the DSS store.
At the time of writing this README, the entities supported by this tool are:
- SCD operational intents;
- SCD subscriptions.

Using this tool is potentially dangerous: passing wrong parameters may result in data loss.
It is therefore strongly recommended to always review and validate the list of entities identified as expired, and to
ensure that a backup of the data is available before deleting anything with the `--delete` flag.

### Usage
Extract from running `db-manager scd-evict --help`:
```
List and evict SCD expired entities

Usage:
db-manager scd-evict [flags]

Flags:
--delete set this flag to true to delete the expired entities
-h, --help help for scd-evict
--op_intents set this flag to true to list expired operational intents (default true)
--scd_subs set this flag to true to list expired SCD subscriptions (default true)
--ttl duration time-to-live duration used for determining expiration, defaults to 2*56 days which should be a safe value in most cases (default 2688h0m0s)

Global Flags:
--cockroach_application_name string application name for tagging the connection to cockroach (default "dss")
--cockroach_db_name string application name for tagging the connection to cockroach (default "dss")
--cockroach_host string cockroach host to connect to
--cockroach_max_retries int maximum number of attempts to retry a query in case of contention, default is 100 (default 100)
--cockroach_port int cockroach port to connect to (default 26257)
--cockroach_ssl_dir string directory to ssl certificates. Must contain files: ca.crt, client.<user>.crt, client.<user>.key
--cockroach_ssl_mode string cockroach sslmode (default "disable")
--cockroach_user string cockroach user to authenticate as (default "root")
--max_conn_idle_secs int maximum amount of time in seconds a connection may be idle, default is 30 seconds (default 30)
--max_open_conns int maximum number of open connections to the database, default is 4 (default 4)

```

Do note:
- by default, expired entities are only listed, not deleted; the `--delete` flag is required to delete them;
- expiration is determined from an entity's end time when it has one, and from its last update time otherwise;
- the `--ttl` flag accepts durations formatted as [Go `time.Duration` strings](https://pkg.go.dev/time#ParseDuration), e.g. `24h`;
- the CockroachDB cluster connection flags are the same as those of [the `core-service` command](../../core-service/README.md).

### Examples
The following examples assume a running DSS deployed locally through [the `run_locally.sh` script](../../../build/dev/standalone_instance.md).

#### List all entities older than 1 week
```shell
docker compose -f docker-compose_dss.yaml -p dss_sandbox exec local-dss-core-service db-manager scd-evict \
    --cockroach_host=local-dss-crdb --ttl=168h
```

#### List operational intents older than 1 week
```shell
docker compose -f docker-compose_dss.yaml -p dss_sandbox exec local-dss-core-service db-manager scd-evict \
    --cockroach_host=local-dss-crdb --ttl=168h --op_intents=true --scd_subs=false
```

#### Delete all entities older than 30 days
```shell
docker compose -f docker-compose_dss.yaml -p dss_sandbox exec local-dss-core-service db-manager scd-evict \
    --cockroach_host=local-dss-crdb --ttl=720h --delete
```
127 changes: 127 additions & 0 deletions cmds/db-manager/cleanup/scd-evict.go
@@ -0,0 +1,127 @@
package cleanup

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/interuss/dss/pkg/cockroach"
	crdbflags "github.com/interuss/dss/pkg/cockroach/flags"
	dssmodels "github.com/interuss/dss/pkg/models"
	scdmodels "github.com/interuss/dss/pkg/scd/models"
	"github.com/interuss/dss/pkg/scd/repos"
	scdc "github.com/interuss/dss/pkg/scd/store/cockroach"
	"github.com/spf13/cobra"
	"github.com/spf13/pflag"
)

var (
	ScdEvictCmd = &cobra.Command{
		Use:   "scd-evict",
		Short: "List and evict SCD expired entities",
		RunE:  scdEvict,
	}
	flags         = pflag.NewFlagSet("scd-evict", pflag.ExitOnError)
	listOpIntents = flags.Bool("op_intents", true, "set this flag to true to list expired operational intents")
	listScdSubs   = flags.Bool("scd_subs", true, "set this flag to true to list expired SCD subscriptions")
	ttl           = flags.Duration("ttl", time.Hour*24*112, "time-to-live duration used for determining expiration, defaults to 2*56 days which should be a safe value in most cases")
	deleteExpired = flags.Bool("delete", false, "set this flag to true to delete the expired entities")
)

func init() {
	ScdEvictCmd.Flags().AddFlagSet(flags)
}

func scdEvict(cmd *cobra.Command, _ []string) error {
	var (
		ctx       = cmd.Context()
		threshold = time.Now().Add(-*ttl)
	)

	scdStore, err := getSCDStore(ctx)
	if err != nil {
		return err
	}

	var (
		expiredOpIntents []*scdmodels.OperationalIntent
		expiredSubs      []*scdmodels.Subscription
	)
	action := func(ctx context.Context, r repos.Repository) (err error) {
		if *listOpIntents {
			expiredOpIntents, err = r.ListExpiredOperationalIntents(ctx, threshold)
			if err != nil {
				return fmt.Errorf("listing expired operational intents: %w", err)
			}
			if *deleteExpired {
				for _, opIntent := range expiredOpIntents {
					if err = r.DeleteOperationalIntent(ctx, opIntent.ID); err != nil {
						return fmt.Errorf("deleting expired operational intents: %w", err)
					}
				}
			}
		}

		if *listScdSubs {
			expiredSubs, err = r.ListExpiredSubscriptions(ctx, threshold)
			if err != nil {
				return fmt.Errorf("listing expired subscriptions: %w", err)
			}
			if *deleteExpired {
				for _, sub := range expiredSubs {
					if err = r.DeleteSubscription(ctx, sub.ID); err != nil {
						return fmt.Errorf("deleting expired subscriptions: %w", err)
					}
				}
			}
		}

		return nil
	}
	if err = scdStore.Transact(ctx, action); err != nil {
		return fmt.Errorf("failed to execute CRDB transaction: %w", err)
	}

	for _, opIntent := range expiredOpIntents {
		logExpiredEntity("operational intent", opIntent.ID, threshold, *deleteExpired, opIntent.EndTime != nil)
	}
	for _, sub := range expiredSubs {
		logExpiredEntity("subscription", sub.ID, threshold, *deleteExpired, sub.EndTime != nil)
	}
	if len(expiredOpIntents) == 0 && len(expiredSubs) == 0 {
		log.Printf("no entity older than %s found", threshold.String())
	} else if !*deleteExpired {
		log.Printf("no entity was deleted, run the command again with the `--delete` flag to do so")
	}
	return nil
}

func getSCDStore(ctx context.Context) (*scdc.Store, error) {
	connectParameters := crdbflags.ConnectParameters()
	connectParameters.ApplicationName = "db-manager"
	connectParameters.DBName = scdc.DatabaseName
	scdCrdb, err := cockroach.Dial(ctx, connectParameters)
	if err != nil {
		return nil, fmt.Errorf("failed to connect to database with %+v: %w", connectParameters, err)
	}

	scdStore, err := scdc.NewStore(ctx, scdCrdb)
	if err != nil {
		return nil, fmt.Errorf("failed to create strategic conflict detection store with %+v: %w", connectParameters, err)
	}
	return scdStore, nil
}

func logExpiredEntity(entity string, entityID dssmodels.ID, threshold time.Time, deleted, hasEndTime bool) {
	logMsg := "found"
	if deleted {
		logMsg = "deleted"
	}

	expMsg := "last update before %s (missing end time)"
	if hasEndTime {
		expMsg = "end time before %s"
	}
	log.Printf("%s %s %s; expired due to %s", logMsg, entity, entityID.String(), fmt.Sprintf(expMsg, threshold.String()))
}
2 changes: 2 additions & 0 deletions cmds/db-manager/main.go
@@ -5,6 +5,7 @@ import (
	"log"
	"os"

	"github.com/interuss/dss/cmds/db-manager/cleanup"
	"github.com/interuss/dss/cmds/db-manager/migration"
	"github.com/spf13/cobra"
)
@@ -19,6 +20,7 @@ var (
func init() {
	DBManagerCmd.PersistentFlags().AddGoFlagSet(flag.CommandLine) // enable support for flags not yet migrated to using pflag (e.g. crdb flags)
	DBManagerCmd.AddCommand(migration.MigrationCmd)
	DBManagerCmd.AddCommand(cleanup.ScdEvictCmd)
}

func main() {
10 changes: 10 additions & 0 deletions pkg/scd/repos/repos.go
@@ -2,6 +2,8 @@ package repos

import (
	"context"
	"time"

	"github.com/golang/geo/s2"
	dssmodels "github.com/interuss/dss/pkg/models"
	scdmodels "github.com/interuss/dss/pkg/scd/models"
@@ -27,6 +29,10 @@ type OperationalIntent interface {
	// GetDependentOperationalIntents returns IDs of all operations dependent on
	// subscription identified by "subscriptionID".
	GetDependentOperationalIntents(ctx context.Context, subscriptionID dssmodels.ID) ([]dssmodels.ID, error)

	// ListExpiredOperationalIntents lists all operational intents older than the threshold.
	// Their age is determined by their end time, or by their update time if they do not have an end time.
	ListExpiredOperationalIntents(ctx context.Context, threshold time.Time) ([]*scdmodels.OperationalIntent, error)
}

// Subscription abstracts subscription-specific interactions with the backing repository.
@@ -54,6 +60,10 @@ type Subscription interface {

	// LockSubscriptionsOnCells locks the subscriptions of interest on specific cells.
	LockSubscriptionsOnCells(ctx context.Context, cells s2.CellUnion) error

	// ListExpiredSubscriptions lists all subscriptions older than the threshold.
	// Their age is determined by their end time, or by their update time if they do not have an end time.
	ListExpiredSubscriptions(ctx context.Context, threshold time.Time) ([]*scdmodels.Subscription, error)
}

type UssAvailability interface {
26 changes: 26 additions & 0 deletions pkg/scd/store/cockroach/operational_intents.go
@@ -341,3 +341,29 @@ func (s *repo) GetDependentOperationalIntents(ctx context.Context, subscriptionI

	return dependentOps, nil
}

// ListExpiredOperationalIntents lists all operational intents older than the threshold.
// Their age is determined by their end time, or by their last update time if they do not have an end time.
func (s *repo) ListExpiredOperationalIntents(ctx context.Context, threshold time.Time) ([]*scdmodels.OperationalIntent, error) {
	expiredOpIntentsQuery := fmt.Sprintf(`
	SELECT
		%s
	FROM
		scd_operations
	WHERE
		scd_operations.ends_at IS NOT NULL AND scd_operations.ends_at <= $1
		OR
		scd_operations.ends_at IS NULL AND scd_operations.updated_at <= $1 -- use last update time as reference if there is no end time
	LIMIT $2`, operationFieldsWithPrefix)

	result, err := s.fetchOperationalIntents(
		ctx, s.q, expiredOpIntentsQuery,
		threshold,
		dssmodels.MaxResultLimit,
	)
	if err != nil {
		return nil, stacktrace.Propagate(err, "Error fetching Operations")
	}

	return result, nil
}