From d9d9ebd29d013ac37df414d364dd8b5c491313ca Mon Sep 17 00:00:00 2001 From: Zirko <64951262+QuantumEnigmaa@users.noreply.github.com> Date: Mon, 12 Feb 2024 23:03:57 +0100 Subject: [PATCH 1/7] Helm: fix egress-dicovery netpols (#11838) **What this PR does / why we need it**: This PR fix an issue happening when deploying the `egress-discovery` networkPolicy or ciliumNetworkPolicy. When one wants to deploy the `egress-discovery` netpol, one only has to specify the `networkPolicy.discovery.port` field in the values. However, with the current state of the templates, when this field is specified, **both** the networkPolicy and the ciliumNetworkPolicy are generated since the only condition for their creation is `if .Values.networkPolicy.discovery.port`. I thus added an additional condition : `eq .Values.networkPolicy.flavor "cilium"` (or `"kubernetes"` for the networkPolicy) so that only one of these is generated according to the `flavor` specified in the values. **Special notes for your reviewer**: **Checklist** - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [x] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md` - [x] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](https://github.com/grafana/loki/commit/d10549e3ece02120974929894ee333d07755d213) - [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. [Example PR](https://github.com/grafana/loki/pull/10840/commits/0d4416a4b03739583349934b96f272fb4f685d15) --------- Signed-off-by: QuantumEnigmaa Co-authored-by: Michel Hollands <42814411+MichelHollands@users.noreply.github.com> --- production/helm/loki/CHANGELOG.md | 1 + production/helm/loki/templates/ciliumnetworkpolicy.yaml | 2 +- production/helm/loki/templates/networkpolicy.yaml | 2 +- 3 files changed, 3 insertions(+), 2 deletions(-) diff --git a/production/helm/loki/CHANGELOG.md b/production/helm/loki/CHANGELOG.md index 9a5b2a3cb4f27..e849918585ea2 100644 --- a/production/helm/loki/CHANGELOG.md +++ b/production/helm/loki/CHANGELOG.md @@ -15,6 +15,7 @@ Entries should include a reference to the pull request that introduced the chang ## 5.42.2 +- [BUGFIX] Added condition for `egress-discovery` networkPolicies and ciliumNetworkPolicies. - [BUGFIX] Remove trailing tab character in statefulset templates ## 5.42.1 diff --git a/production/helm/loki/templates/ciliumnetworkpolicy.yaml b/production/helm/loki/templates/ciliumnetworkpolicy.yaml index 5633ae1945206..ddcef3b61a8ec 100644 --- a/production/helm/loki/templates/ciliumnetworkpolicy.yaml +++ b/production/helm/loki/templates/ciliumnetworkpolicy.yaml @@ -156,7 +156,7 @@ spec: {{- end }} -{{- if .Values.networkPolicy.discovery.port }} +{{- if and .Values.networkPolicy.discovery.port (eq .Values.networkPolicy.flavor "cilium") }} --- apiVersion: cilium.io/v2 kind: CiliumNetworkPolicy diff --git a/production/helm/loki/templates/networkpolicy.yaml b/production/helm/loki/templates/networkpolicy.yaml index 27c85280eb08c..5052e81162b3d 100644 --- a/production/helm/loki/templates/networkpolicy.yaml +++ b/production/helm/loki/templates/networkpolicy.yaml @@ -172,7 +172,7 @@ spec: {{- end }} -{{- if .Values.networkPolicy.discovery.port }} +{{- if and .Values.networkPolicy.discovery.port (eq .Values.networkPolicy.flavor "kubernetes") }} --- apiVersion: networking.k8s.io/v1 kind: NetworkPolicy From c277158f515f0ba4ad27e7189dea4da9f50e7e4c Mon Sep 17 00:00:00 2001 From: Paul Rogers <129207811+paul1r@users.noreply.github.com> Date: Mon, 12 Feb 2024 18:52:27 -0500 Subject: [PATCH 2/7] Cleanup data race associated with workerID var (#11922) **What this PR does / why we need it**: A data race existed with the workerID variable, as it could be modified by multiple goroutines. Relates to: https://github.com/grafana/loki/issues/8586 -- Before fix: ``` go test -count=1 -race ./pkg/querier/worker ================== WARNING: DATA RACE Read at 0x00c000494108 by goroutine 229: github.com/grafana/loki/pkg/querier/worker.(*processorManager).concurrency.func1() /Users/progers/dev/src/github.com/grafana/loki/pkg/querier/worker/processor_manager.go:81 +0x118 Previous write at 0x00c000494108 by goroutine 222: github.com/grafana/loki/pkg/querier/worker.(*processorManager).concurrency() /Users/progers/dev/src/github.com/grafana/loki/pkg/querier/worker/processor_manager.go:70 +0x108 github.com/grafana/loki/pkg/querier/worker.(*querierWorker).resetConcurrency() /Users/progers/dev/src/github.com/grafana/loki/pkg/querier/worker/worker.go:267 +0x10c github.com/grafana/loki/pkg/querier/worker.(*querierWorker).AddressAdded() /Users/progers/dev/src/github.com/grafana/loki/pkg/querier/worker/worker.go:219 +0x868 github.com/grafana/loki/pkg/querier/worker.TestResetConcurrency.func1() /Users/progers/dev/src/github.com/grafana/loki/pkg/querier/worker/worker_test.go:64 +0x1c8 testing.tRunner() /opt/homebrew/Cellar/go/1.21.6/libexec/src/testing/testing.go:1595 +0x1b0 testing.(*T).Run.func1() /opt/homebrew/Cellar/go/1.21.6/libexec/src/testing/testing.go:1648 +0x40 Goroutine 229 (running) created at: github.com/grafana/loki/pkg/querier/worker.(*processorManager).concurrency() /Users/progers/dev/src/github.com/grafana/loki/pkg/querier/worker/processor_manager.go:75 +0xcc github.com/grafana/loki/pkg/querier/worker.(*querierWorker).resetConcurrency() /Users/progers/dev/src/github.com/grafana/loki/pkg/querier/worker/worker.go:267 +0x10c github.com/grafana/loki/pkg/querier/worker.(*querierWorker).AddressAdded() /Users/progers/dev/src/github.com/grafana/loki/pkg/querier/worker/worker.go:219 +0x868 github.com/grafana/loki/pkg/querier/worker.TestResetConcurrency.func1() /Users/progers/dev/src/github.com/grafana/loki/pkg/querier/worker/worker_test.go:64 +0x1c8 testing.tRunner() /opt/homebrew/Cellar/go/1.21.6/libexec/src/testing/testing.go:1595 +0x1b0 testing.(*T).Run.func1() /opt/homebrew/Cellar/go/1.21.6/libexec/src/testing/testing.go:1648 +0x40 Goroutine 222 (running) created at: testing.(*T).Run() /opt/homebrew/Cellar/go/1.21.6/libexec/src/testing/testing.go:1648 +0x5e8 github.com/grafana/loki/pkg/querier/worker.TestResetConcurrency() /Users/progers/dev/src/github.com/grafana/loki/pkg/querier/worker/worker_test.go:52 +0x1b0 testing.tRunner() /opt/homebrew/Cellar/go/1.21.6/libexec/src/testing/testing.go:1595 +0x1b0 testing.(*T).Run.func1() /opt/homebrew/Cellar/go/1.21.6/libexec/src/testing/testing.go:1648 +0x40 ================== --- FAIL: TestResetConcurrency (0.02s) --- FAIL: TestResetConcurrency/concurrency_is_correct_when_numTargets_does_not_divide_evenly_into_maxConcurrent (0.01s) testing.go:1465: race detected during execution of test testing.go:1465: race detected during execution of test FAIL FAIL github.com/grafana/loki/pkg/querier/worker 4.626s FAIL ``` -- After fix: ``` go clean -testcache go test -count=1 -race ./pkg/querier/worker ok github.com/grafana/loki/pkg/querier/worker 6.034s ``` **Which issue(s) this PR fixes**: Fixes # **Special notes for your reviewer**: **Checklist** - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](https://github.com/grafana/loki/commit/d10549e3ece02120974929894ee333d07755d213) - [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. [Example PR](https://github.com/grafana/loki/pull/10840/commits/0d4416a4b03739583349934b96f272fb4f685d15) --- pkg/querier/worker/processor_manager.go | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/pkg/querier/worker/processor_manager.go b/pkg/querier/worker/processor_manager.go index 3a2c8c338865d..74c9517f86c24 100644 --- a/pkg/querier/worker/processor_manager.go +++ b/pkg/querier/worker/processor_manager.go @@ -65,21 +65,20 @@ func (pm *processorManager) concurrency(n int) { n = 0 } - workerID := 0 for len(pm.cancels) < n { - workerID++ + workerID := len(pm.cancels) + 1 ctx, cancel := context.WithCancel(pm.ctx) pm.cancels = append(pm.cancels, cancel) pm.wg.Add(1) - go func() { + go func(workerID int) { defer pm.wg.Done() pm.currentProcessors.Inc() defer pm.currentProcessors.Dec() pm.p.processQueriesOnSingleStream(ctx, pm.conn, pm.address, strconv.Itoa(workerID)) - }() + }(workerID) } for len(pm.cancels) > n { From 75600674397f270e9d4a57303de9dc468a58faaf Mon Sep 17 00:00:00 2001 From: Owen Diehl Date: Mon, 12 Feb 2024 16:31:05 -0800 Subject: [PATCH 3/7] Bloom/running (#11918) Includes a few small fixes in order to get the compactor running, including * Wires up i/o from `v1` pkg and the `bloomshipper` pkg in order to archive+write blocks remotely after building * config error handling * fix an error iterating chunks * handles gzipping when loading tsdbs from storage * bypasses internal cache in `BloomTSDBStore`'s internal `indexshipper/storage.Client` to get up to date results, which is easier for prototyping * `ClosableReadSeekerAdapter` --------- Signed-off-by: Christian Haudum Signed-off-by: Owen Diehl Co-authored-by: Christian Haudum --- docs/sources/configure/_index.md | 23 ++++++++------- integration/cluster/cluster.go | 1 - pkg/bloomcompactor/bloomcompactor.go | 16 ++++++++-- pkg/bloomcompactor/config.go | 29 +++++++++++-------- pkg/bloomcompactor/controller.go | 11 +++++-- pkg/bloomcompactor/spec.go | 23 ++++++++++++--- pkg/bloomcompactor/tsdb.go | 23 +++++++++++---- pkg/loki/loki.go | 5 +++- pkg/loki/modules.go | 2 +- pkg/storage/bloom/v1/block.go | 4 +++ pkg/storage/bloom/v1/bloom_tokenizer.go | 2 +- .../stores/shipper/bloomshipper/client.go | 29 +++++++++++++++++-- 12 files changed, 124 insertions(+), 44 deletions(-) diff --git a/docs/sources/configure/_index.md b/docs/sources/configure/_index.md index e79a2503176fc..d5dd9b43bd146 100644 --- a/docs/sources/configure/_index.md +++ b/docs/sources/configure/_index.md @@ -2641,21 +2641,22 @@ ring: # CLI flag: -bloom-compactor.enabled [enabled: | default = false] -# Directory where files can be downloaded for compaction. -# CLI flag: -bloom-compactor.working-directory -[working_directory: | default = ""] - # Interval at which to re-run the compaction operation. # CLI flag: -bloom-compactor.compaction-interval [compaction_interval: | default = 10m] -# Minimum age of a table before it is considered for compaction. -# CLI flag: -bloom-compactor.min-compaction-age -[min_compaction_age: | default = 24h] - -# Maximum age of a table before it is considered for compaction. -# CLI flag: -bloom-compactor.max-compaction-age -[max_compaction_age: | default = 168h] +# How many index periods (days) to wait before compacting a table. This can be +# used to lower cost by not re-writing data to object storage too frequently +# since recent data changes more often. +# CLI flag: -bloom-compactor.min-table-compaction-period +[min_table_compaction_period: | default = 1] + +# How many index periods (days) to wait before compacting a table. This can be +# used to lower cost by not trying to compact older data which doesn't change. +# This can be optimized by aligning it with the maximum +# `reject_old_samples_max_age` setting of any tenant. +# CLI flag: -bloom-compactor.max-table-compaction-period +[max_table_compaction_period: | default = 7] # Number of workers to run in parallel for compaction. # CLI flag: -bloom-compactor.worker-parallelism diff --git a/integration/cluster/cluster.go b/integration/cluster/cluster.go index 831da46f2cb99..7e978b84eb326 100644 --- a/integration/cluster/cluster.go +++ b/integration/cluster/cluster.go @@ -84,7 +84,6 @@ bloom_gateway: bloom_compactor: enabled: false - working_directory: {{.dataPath}}/bloom-compactor compactor: working_directory: {{.dataPath}}/compactor diff --git a/pkg/bloomcompactor/bloomcompactor.go b/pkg/bloomcompactor/bloomcompactor.go index 8a3e7c6266c1d..5cece24172526 100644 --- a/pkg/bloomcompactor/bloomcompactor.go +++ b/pkg/bloomcompactor/bloomcompactor.go @@ -205,9 +205,19 @@ func (c *Compactor) runOne(ctx context.Context) error { } func (c *Compactor) tables(ts time.Time) *dayRangeIterator { - from := model.TimeFromUnixNano(ts.Add(-c.cfg.MaxCompactionAge).UnixNano() / int64(config.ObjectStorageIndexRequiredPeriod)) - through := model.TimeFromUnixNano(ts.Add(-c.cfg.MinCompactionAge).UnixNano() / int64(config.ObjectStorageIndexRequiredPeriod)) - return newDayRangeIterator(DayTable(from), DayTable(through)) + // adjust the minimum by one to make it inclusive, which is more intuitive + // for a configuration variable + adjustedMin := min(c.cfg.MinTableCompactionPeriod - 1) + minCompactionPeriod := time.Duration(adjustedMin) * config.ObjectStorageIndexRequiredPeriod + maxCompactionPeriod := time.Duration(c.cfg.MaxTableCompactionPeriod) * config.ObjectStorageIndexRequiredPeriod + + from := ts.Add(-maxCompactionPeriod).UnixNano() / int64(config.ObjectStorageIndexRequiredPeriod) * int64(config.ObjectStorageIndexRequiredPeriod) + through := ts.Add(-minCompactionPeriod).UnixNano() / int64(config.ObjectStorageIndexRequiredPeriod) * int64(config.ObjectStorageIndexRequiredPeriod) + + fromDay := DayTable(model.TimeFromUnixNano(from)) + throughDay := DayTable(model.TimeFromUnixNano(through)) + return newDayRangeIterator(fromDay, throughDay) + } func (c *Compactor) loadWork(ctx context.Context, ch chan<- tenantTable) error { diff --git a/pkg/bloomcompactor/config.go b/pkg/bloomcompactor/config.go index 37aac3310829a..dd821d81c906b 100644 --- a/pkg/bloomcompactor/config.go +++ b/pkg/bloomcompactor/config.go @@ -20,15 +20,14 @@ type Config struct { // section and the ingester configuration by default). Ring ring.RingConfig `yaml:"ring,omitempty" doc:"description=Defines the ring to be used by the bloom-compactor servers. In case this isn't configured, this block supports inheriting configuration from the common ring section."` // Enabled configures whether bloom-compactors should be used to compact index values into bloomfilters - Enabled bool `yaml:"enabled"` - WorkingDirectory string `yaml:"working_directory"` - CompactionInterval time.Duration `yaml:"compaction_interval"` - MinCompactionAge time.Duration `yaml:"min_compaction_age"` - MaxCompactionAge time.Duration `yaml:"max_compaction_age"` - WorkerParallelism int `yaml:"worker_parallelism"` - RetryMinBackoff time.Duration `yaml:"compaction_retries_min_backoff"` - RetryMaxBackoff time.Duration `yaml:"compaction_retries_max_backoff"` - CompactionRetries int `yaml:"compaction_retries"` + Enabled bool `yaml:"enabled"` + CompactionInterval time.Duration `yaml:"compaction_interval"` + MinTableCompactionPeriod int `yaml:"min_table_compaction_period"` + MaxTableCompactionPeriod int `yaml:"max_table_compaction_period"` + WorkerParallelism int `yaml:"worker_parallelism"` + RetryMinBackoff time.Duration `yaml:"compaction_retries_min_backoff"` + RetryMaxBackoff time.Duration `yaml:"compaction_retries_max_backoff"` + CompactionRetries int `yaml:"compaction_retries"` MaxCompactionParallelism int `yaml:"max_compaction_parallelism"` } @@ -37,23 +36,29 @@ type Config struct { func (cfg *Config) RegisterFlags(f *flag.FlagSet) { cfg.Ring.RegisterFlagsWithPrefix("bloom-compactor.", "collectors/", f) f.BoolVar(&cfg.Enabled, "bloom-compactor.enabled", false, "Flag to enable or disable the usage of the bloom-compactor component.") - f.StringVar(&cfg.WorkingDirectory, "bloom-compactor.working-directory", "", "Directory where files can be downloaded for compaction.") f.DurationVar(&cfg.CompactionInterval, "bloom-compactor.compaction-interval", 10*time.Minute, "Interval at which to re-run the compaction operation.") f.IntVar(&cfg.WorkerParallelism, "bloom-compactor.worker-parallelism", 1, "Number of workers to run in parallel for compaction.") - f.DurationVar(&cfg.MinCompactionAge, "bloom-compactor.min-compaction-age", 24*time.Hour, "Minimum age of a table before it is considered for compaction.") + f.IntVar(&cfg.MinTableCompactionPeriod, "bloom-compactor.min-table-compaction-period", 1, "How many index periods (days) to wait before compacting a table. This can be used to lower cost by not re-writing data to object storage too frequently since recent data changes more often.") // TODO(owen-d): ideally we'd set this per tenant based on their `reject_old_samples_max_age` setting, // but due to how we need to discover tenants, we can't do that yet. Tenant+Period discovery is done by // iterating the table periods in object storage and looking for tenants within that period. // In order to have this done dynamically, we'd need to account for tenant specific overrides, which are also // dynamically reloaded. // I'm doing it the simple way for now. - f.DurationVar(&cfg.MaxCompactionAge, "bloom-compactor.max-compaction-age", 7*24*time.Hour, "Maximum age of a table before it is considered for compaction.") + f.IntVar(&cfg.MaxTableCompactionPeriod, "bloom-compactor.max-table-compaction-period", 7, "How many index periods (days) to wait before compacting a table. This can be used to lower cost by not trying to compact older data which doesn't change. This can be optimized by aligning it with the maximum `reject_old_samples_max_age` setting of any tenant.") f.DurationVar(&cfg.RetryMinBackoff, "bloom-compactor.compaction-retries-min-backoff", 10*time.Second, "Minimum backoff time between retries.") f.DurationVar(&cfg.RetryMaxBackoff, "bloom-compactor.compaction-retries-max-backoff", time.Minute, "Maximum backoff time between retries.") f.IntVar(&cfg.CompactionRetries, "bloom-compactor.compaction-retries", 3, "Number of retries to perform when compaction fails.") f.IntVar(&cfg.MaxCompactionParallelism, "bloom-compactor.max-compaction-parallelism", 1, "Maximum number of tables to compact in parallel. While increasing this value, please make sure compactor has enough disk space allocated to be able to store and compact as many tables.") } +func (cfg *Config) Validate() error { + if cfg.MinTableCompactionPeriod > cfg.MaxTableCompactionPeriod { + return fmt.Errorf("min_compaction_age must be less than or equal to max_compaction_age") + } + return nil +} + type Limits interface { downloads.Limits BloomCompactorShardSize(tenantID string) int diff --git a/pkg/bloomcompactor/controller.go b/pkg/bloomcompactor/controller.go index cf6fff090f0ae..38831ef932e69 100644 --- a/pkg/bloomcompactor/controller.go +++ b/pkg/bloomcompactor/controller.go @@ -179,13 +179,20 @@ func (s *SimpleBloomController) buildBlocks( closePreExistingBlocks() return errors.Wrap(err, "failed to get client") } - for newBlocks.Next() { + + for newBlocks.Next() && newBlocks.Err() == nil { blockCt++ blk := newBlocks.At() + built, err := bloomshipper.BlockFrom(tenant, table.String(), blk) + if err != nil { + level.Error(logger).Log("msg", "failed to build block", "err", err) + return errors.Wrap(err, "failed to build block") + } + if err := client.PutBlock( ctx, - bloomshipper.BlockFrom(tenant, table.String(), blk), + built, ); err != nil { level.Error(logger).Log("msg", "failed to write block", "err", err) closePreExistingBlocks() diff --git a/pkg/bloomcompactor/spec.go b/pkg/bloomcompactor/spec.go index 70ea71c4e605f..d9d9c68947a73 100644 --- a/pkg/bloomcompactor/spec.go +++ b/pkg/bloomcompactor/spec.go @@ -275,7 +275,7 @@ func NewStoreChunkLoader(fetcherProvider fetcherProvider, metrics *Metrics) *Sto } func (s *StoreChunkLoader) Load(ctx context.Context, userID string, series *v1.Series) (*ChunkItersByFingerprint, error) { - // NB(owen-d): This is probalby unnecessary as we should only have one fetcher + // NB(owen-d): This is probably unnecessary as we should only have one fetcher // because we'll only be working on a single index period at a time, but this should protect // us in the case of refactoring/changing this and likely isn't a perf bottleneck. chksByFetcher := make(map[chunkFetcher][]chunk.Chunk) @@ -338,9 +338,7 @@ func newBatchedLoader(ctx context.Context, work []chunkWork, batchSize int, metr func (b *batchedLoader) Next() bool { if len(b.batch) > 0 { - b.cur, b.err = b.format(b.batch[0]) - b.batch = b.batch[1:] - return b.err == nil + return b.prepNext(false) } if len(b.work) == 0 { @@ -357,7 +355,24 @@ func (b *batchedLoader) Next() bool { b.work = b.work[1:] } + if len(toFetch) == 0 { + return false + } + b.batch, b.err = next.fetcher.FetchChunks(b.ctx, toFetch) + if b.err != nil { + return false + } + + return b.prepNext(true) +} + +func (b *batchedLoader) prepNext(checkLen bool) bool { + if checkLen && len(b.batch) == 0 { + return false + } + b.cur, b.err = b.format(b.batch[0]) + b.batch = b.batch[1:] return b.err == nil } diff --git a/pkg/bloomcompactor/tsdb.go b/pkg/bloomcompactor/tsdb.go index be45d293f6286..e6fd92961c46c 100644 --- a/pkg/bloomcompactor/tsdb.go +++ b/pkg/bloomcompactor/tsdb.go @@ -12,6 +12,7 @@ import ( "github.com/prometheus/common/model" "github.com/prometheus/prometheus/model/labels" + "github.com/grafana/loki/pkg/chunkenc" baseStore "github.com/grafana/loki/pkg/storage" v1 "github.com/grafana/loki/pkg/storage/bloom/v1" "github.com/grafana/loki/pkg/storage/config" @@ -49,12 +50,12 @@ func NewBloomTSDBStore(storage storage.Client) *BloomTSDBStore { } func (b *BloomTSDBStore) UsersForPeriod(ctx context.Context, table DayTable) ([]string, error) { - _, users, err := b.storage.ListFiles(ctx, table.String(), false) + _, users, err := b.storage.ListFiles(ctx, table.String(), true) // bypass cache for ease of testing return users, err } func (b *BloomTSDBStore) ResolveTSDBs(ctx context.Context, table DayTable, tenant string) ([]tsdb.SingleTenantTSDBIdentifier, error) { - indices, err := b.storage.ListUserFiles(ctx, table.String(), tenant, false) + indices, err := b.storage.ListUserFiles(ctx, table.String(), tenant, true) // bypass cache for ease of testing if err != nil { return nil, errors.Wrap(err, "failed to list user files") } @@ -84,16 +85,25 @@ func (b *BloomTSDBStore) LoadTSDB( id tsdb.Identifier, bounds v1.FingerprintBounds, ) (v1.CloseableIterator[*v1.Series], error) { - data, err := b.storage.GetUserFile(ctx, table.String(), tenant, id.Name()) + withCompression := id.Name() + gzipExtension + + data, err := b.storage.GetUserFile(ctx, table.String(), tenant, withCompression) if err != nil { return nil, errors.Wrap(err, "failed to get file") } + defer data.Close() + + decompressorPool := chunkenc.GetReaderPool(chunkenc.EncGZIP) + decompressor, err := decompressorPool.GetReader(data) + if err != nil { + return nil, errors.Wrap(err, "failed to get decompressor") + } + defer decompressorPool.PutReader(decompressor) - buf, err := io.ReadAll(data) + buf, err := io.ReadAll(decompressor) if err != nil { return nil, errors.Wrap(err, "failed to read file") } - _ = data.Close() reader, err := index.NewReader(index.RealByteSlice(buf)) if err != nil { @@ -226,7 +236,8 @@ func NewTSDBStores( if err != nil { return nil, errors.Wrap(err, "failed to create object client") } - res.stores[i] = NewBloomTSDBStore(storage.NewIndexStorageClient(c, cfg.IndexTables.PathPrefix)) + prefix := path.Join(cfg.IndexTables.PathPrefix, cfg.IndexTables.Prefix) + res.stores[i] = NewBloomTSDBStore(storage.NewIndexStorageClient(c, prefix)) } } diff --git a/pkg/loki/loki.go b/pkg/loki/loki.go index a83f8ba43394f..75401decb8fc0 100644 --- a/pkg/loki/loki.go +++ b/pkg/loki/loki.go @@ -248,6 +248,9 @@ func (c *Config) Validate() error { if err := c.QueryRange.Validate(); err != nil { return errors.Wrap(err, "invalid query_range config") } + if err := c.BloomCompactor.Validate(); err != nil { + return errors.Wrap(err, "invalid bloom_compactor config") + } if err := ValidateConfigCompatibility(*c); err != nil { return err @@ -648,7 +651,7 @@ func (t *Loki) setupModuleManager() error { Write: {Ingester, Distributor}, Backend: {QueryScheduler, Ruler, Compactor, IndexGateway, BloomGateway, BloomCompactor}, - All: {QueryScheduler, QueryFrontend, Querier, Ingester, Distributor, Ruler, Compactor}, + All: {QueryScheduler, QueryFrontend, Querier, Ingester, Distributor, Ruler, Compactor, BloomCompactor}, } if t.Cfg.Querier.PerRequestLimitsEnabled { diff --git a/pkg/loki/modules.go b/pkg/loki/modules.go index 23e59e711f29b..111d313956881 100644 --- a/pkg/loki/modules.go +++ b/pkg/loki/modules.go @@ -691,7 +691,7 @@ func (t *Loki) updateConfigForShipperStore() { t.Cfg.StorageConfig.TSDBShipperConfig.Mode = indexshipper.ModeWriteOnly t.Cfg.StorageConfig.TSDBShipperConfig.IngesterDBRetainPeriod = shipperQuerierIndexUpdateDelay(t.Cfg.StorageConfig.IndexCacheValidity, t.Cfg.StorageConfig.TSDBShipperConfig.ResyncInterval) - case t.Cfg.isModuleEnabled(Querier), t.Cfg.isModuleEnabled(Ruler), t.Cfg.isModuleEnabled(Read), t.Cfg.isModuleEnabled(Backend), t.isModuleActive(IndexGateway), t.isModuleActive(BloomCompactor): + case t.Cfg.isModuleEnabled(Querier), t.Cfg.isModuleEnabled(Ruler), t.Cfg.isModuleEnabled(Read), t.Cfg.isModuleEnabled(Backend), t.isModuleActive(IndexGateway), t.Cfg.isModuleEnabled(BloomCompactor): // We do not want query to do any updates to index t.Cfg.StorageConfig.BoltDBShipperConfig.Mode = indexshipper.ModeReadOnly t.Cfg.StorageConfig.TSDBShipperConfig.Mode = indexshipper.ModeReadOnly diff --git a/pkg/storage/bloom/v1/block.go b/pkg/storage/bloom/v1/block.go index 09cc5fa4866e7..84bc71a6b203c 100644 --- a/pkg/storage/bloom/v1/block.go +++ b/pkg/storage/bloom/v1/block.go @@ -32,6 +32,10 @@ func NewBlock(reader BlockReader) *Block { } } +func (b *Block) Reader() BlockReader { + return b.reader +} + func (b *Block) LoadHeaders() error { // TODO(owen-d): better control over when to decode if !b.initialized { diff --git a/pkg/storage/bloom/v1/bloom_tokenizer.go b/pkg/storage/bloom/v1/bloom_tokenizer.go index 7dd0d8ae44974..59bb2644f87e8 100644 --- a/pkg/storage/bloom/v1/bloom_tokenizer.go +++ b/pkg/storage/bloom/v1/bloom_tokenizer.go @@ -94,7 +94,7 @@ func (bt *BloomTokenizer) Populate(swb *SeriesWithBloom, chks Iterator[ChunkRefW var tokenBuf []byte var prefixLn int - for chks.Err() == nil && chks.Next() { + for chks.Next() && chks.Err() == nil { chk := chks.At() itr := chk.Itr tokenBuf, prefixLn = prefixedToken(bt.lineTokenizer.N, chk.Ref, tokenBuf) diff --git a/pkg/storage/stores/shipper/bloomshipper/client.go b/pkg/storage/stores/shipper/bloomshipper/client.go index 2e31106548d1a..80eba70d18cdb 100644 --- a/pkg/storage/stores/shipper/bloomshipper/client.go +++ b/pkg/storage/stores/shipper/bloomshipper/client.go @@ -149,7 +149,20 @@ type Block struct { Data io.ReadSeekCloser } -func BlockFrom(tenant, table string, blk *v1.Block) Block { +// CloseableReadSeekerAdapter is a wrapper around io.ReadSeeker to make it io.Closer +// if it doesn't already implement it. +type ClosableReadSeekerAdapter struct { + io.ReadSeeker +} + +func (c ClosableReadSeekerAdapter) Close() error { + if closer, ok := c.ReadSeeker.(io.Closer); ok { + return closer.Close() + } + return nil +} + +func BlockFrom(tenant, table string, blk *v1.Block) (Block, error) { md, _ := blk.Metadata() ref := Ref{ TenantID: tenant, @@ -159,9 +172,21 @@ func BlockFrom(tenant, table string, blk *v1.Block) Block { EndTimestamp: md.Series.ThroughTs, Checksum: md.Checksum, } + + // TODO(owen-d): pool + buf := bytes.NewBuffer(nil) + err := v1.TarGz(buf, blk.Reader()) + + if err != nil { + return Block{}, errors.Wrap(err, "archiving+compressing block") + } + + reader := bytes.NewReader(buf.Bytes()) + return Block{ BlockRef: BlockRef{Ref: ref}, - } + Data: ClosableReadSeekerAdapter{reader}, + }, nil } type BlockClient interface { From 0bb257404029529e316f359454209ea3a72ef8bc Mon Sep 17 00:00:00 2001 From: Owen Diehl Date: Mon, 12 Feb 2024 23:35:39 -0800 Subject: [PATCH 4/7] makes batchedLoader generic + removes unnecessary interfaces & adapters (#11924) While reviewing https://github.com/grafana/loki/pull/11919, I figured it'd be nice to make `batchedLoader` generic so we can reuse it's logic. This let me test it easier and remove a lot of now-unnecessary adapter code (interfaces, types) --- pkg/bloomcompactor/bloomcompactor.go | 2 +- pkg/bloomcompactor/spec.go | 206 ++++++++++++++------------- pkg/bloomcompactor/spec_test.go | 127 +++++++++++++++++ 3 files changed, 235 insertions(+), 100 deletions(-) diff --git a/pkg/bloomcompactor/bloomcompactor.go b/pkg/bloomcompactor/bloomcompactor.go index 5cece24172526..ed1f50ae72582 100644 --- a/pkg/bloomcompactor/bloomcompactor.go +++ b/pkg/bloomcompactor/bloomcompactor.go @@ -90,7 +90,7 @@ func New( c.metrics = NewMetrics(r, c.btMetrics) chunkLoader := NewStoreChunkLoader( - NewFetcherProviderAdapter(fetcherProvider), + fetcherProvider, c.metrics, ) diff --git a/pkg/bloomcompactor/spec.go b/pkg/bloomcompactor/spec.go index d9d9c68947a73..58dd2674895ed 100644 --- a/pkg/bloomcompactor/spec.go +++ b/pkg/bloomcompactor/spec.go @@ -16,6 +16,7 @@ import ( logql_log "github.com/grafana/loki/pkg/logql/log" v1 "github.com/grafana/loki/pkg/storage/bloom/v1" "github.com/grafana/loki/pkg/storage/chunk" + "github.com/grafana/loki/pkg/storage/chunk/fetcher" "github.com/grafana/loki/pkg/storage/stores" "github.com/grafana/loki/pkg/storage/stores/shipper/bloomshipper" "github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb" @@ -235,39 +236,13 @@ type ChunkLoader interface { Load(ctx context.Context, userID string, series *v1.Series) (*ChunkItersByFingerprint, error) } -// interface modeled from `pkg/storage/stores/composite_store.ChunkFetcherProvider` -type fetcherProvider interface { - GetChunkFetcher(model.Time) chunkFetcher -} - -// interface modeled from `pkg/storage/chunk/fetcher.Fetcher` -type chunkFetcher interface { - FetchChunks(ctx context.Context, chunks []chunk.Chunk) ([]chunk.Chunk, error) -} - -// Adapter turning `stores.ChunkFetcherProvider` into `fetcherProvider` -// The former returns a concrete type and is heavily used externally -// while the latter returns an interface for better testing and -// is used internally -type FetcherProviderAdapter struct { - root stores.ChunkFetcherProvider -} - -func NewFetcherProviderAdapter(root stores.ChunkFetcherProvider) *FetcherProviderAdapter { - return &FetcherProviderAdapter{root: root} -} - -func (f *FetcherProviderAdapter) GetChunkFetcher(t model.Time) chunkFetcher { - return f.root.GetChunkFetcher(t) -} - // StoreChunkLoader loads chunks from a store type StoreChunkLoader struct { - fetcherProvider fetcherProvider + fetcherProvider stores.ChunkFetcherProvider metrics *Metrics } -func NewStoreChunkLoader(fetcherProvider fetcherProvider, metrics *Metrics) *StoreChunkLoader { +func NewStoreChunkLoader(fetcherProvider stores.ChunkFetcherProvider, metrics *Metrics) *StoreChunkLoader { return &StoreChunkLoader{ fetcherProvider: fetcherProvider, metrics: metrics, @@ -278,7 +253,7 @@ func (s *StoreChunkLoader) Load(ctx context.Context, userID string, series *v1.S // NB(owen-d): This is probably unnecessary as we should only have one fetcher // because we'll only be working on a single index period at a time, but this should protect // us in the case of refactoring/changing this and likely isn't a perf bottleneck. - chksByFetcher := make(map[chunkFetcher][]chunk.Chunk) + chksByFetcher := make(map[*fetcher.Fetcher][]chunk.Chunk) for _, chk := range series.Chunks { fetcher := s.fetcherProvider.GetChunkFetcher(chk.Start) chksByFetcher[fetcher] = append(chksByFetcher[fetcher], chunk.Chunk{ @@ -292,119 +267,152 @@ func (s *StoreChunkLoader) Load(ctx context.Context, userID string, series *v1.S }) } - work := make([]chunkWork, 0, len(chksByFetcher)) + var ( + fetchers = make([]Fetcher[chunk.Chunk, chunk.Chunk], 0, len(chksByFetcher)) + inputs = make([][]chunk.Chunk, 0, len(chksByFetcher)) + ) for fetcher, chks := range chksByFetcher { - work = append(work, chunkWork{ - fetcher: fetcher, - chks: chks, - }) + fn := FetchFunc[chunk.Chunk, chunk.Chunk](fetcher.FetchChunks) + fetchers = append(fetchers, fn) + inputs = append(inputs, chks) } return &ChunkItersByFingerprint{ fp: series.Fingerprint, - itr: newBatchedLoader(ctx, work, batchedLoaderDefaultBatchSize, s.metrics), + itr: newBatchedChunkLoader(ctx, fetchers, inputs, s.metrics, batchedLoaderDefaultBatchSize), }, nil } -type chunkWork struct { - fetcher chunkFetcher - chks []chunk.Chunk +type Fetcher[A, B any] interface { + Fetch(ctx context.Context, inputs []A) ([]B, error) +} + +type FetchFunc[A, B any] func(ctx context.Context, inputs []A) ([]B, error) + +func (f FetchFunc[A, B]) Fetch(ctx context.Context, inputs []A) ([]B, error) { + return f(ctx, inputs) } // batchedLoader implements `v1.Iterator[v1.ChunkRefWithIter]` in batches // to ensure memory is bounded while loading chunks // TODO(owen-d): testware -type batchedLoader struct { +type batchedLoader[A, B, C any] struct { metrics *Metrics batchSize int ctx context.Context - work []chunkWork + fetchers []Fetcher[A, B] + work [][]A - cur v1.ChunkRefWithIter - batch []chunk.Chunk - err error + mapper func(B) (C, error) + cur C + batch []B + err error } const batchedLoaderDefaultBatchSize = 50 -func newBatchedLoader(ctx context.Context, work []chunkWork, batchSize int, metrics *Metrics) *batchedLoader { - return &batchedLoader{ - metrics: metrics, - batchSize: batchSize, +func newBatchedLoader[A, B, C any]( + ctx context.Context, + fetchers []Fetcher[A, B], + inputs [][]A, + mapper func(B) (C, error), + batchSize int, +) *batchedLoader[A, B, C] { + return &batchedLoader[A, B, C]{ + batchSize: max(batchSize, 1), ctx: ctx, - work: work, + fetchers: fetchers, + work: inputs, + mapper: mapper, } } -func (b *batchedLoader) Next() bool { - if len(b.batch) > 0 { - return b.prepNext(false) - } +func (b *batchedLoader[A, B, C]) Next() bool { - if len(b.work) == 0 { - return false - } + // iterate work until we have non-zero length batch + for len(b.batch) == 0 { - // setup next batch - next := b.work[0] - batchSize := min(b.batchSize, len(next.chks)) - toFetch := next.chks[:batchSize] - // update work - b.work[0].chks = next.chks[batchSize:] - if len(b.work[0].chks) == 0 { - b.work = b.work[1:] - } + // empty batch + no work remaining = we're done + if len(b.work) == 0 { + return false + } - if len(toFetch) == 0 { - return false - } + // setup next batch + next := b.work[0] + batchSize := min(b.batchSize, len(next)) + toFetch := next[:batchSize] + fetcher := b.fetchers[0] + + // update work + b.work[0] = b.work[0][batchSize:] + if len(b.work[0]) == 0 { + // if we've exhausted work from this set of inputs, + // set pointer to next set of inputs + // and their respective fetcher + b.work = b.work[1:] + b.fetchers = b.fetchers[1:] + } - b.batch, b.err = next.fetcher.FetchChunks(b.ctx, toFetch) - if b.err != nil { - return false + // there was no work in this batch; continue (should not happen) + if len(toFetch) == 0 { + continue + } + + b.batch, b.err = fetcher.Fetch(b.ctx, toFetch) + // error fetching, short-circuit iteration + if b.err != nil { + return false + } } - return b.prepNext(true) + return b.prepNext() } -func (b *batchedLoader) prepNext(checkLen bool) bool { - if checkLen && len(b.batch) == 0 { - return false - } - b.cur, b.err = b.format(b.batch[0]) +func (b *batchedLoader[_, B, C]) prepNext() bool { + b.cur, b.err = b.mapper(b.batch[0]) b.batch = b.batch[1:] return b.err == nil } -func (b *batchedLoader) format(c chunk.Chunk) (v1.ChunkRefWithIter, error) { - chk := c.Data.(*chunkenc.Facade).LokiChunk() - b.metrics.chunkSize.Observe(float64(chk.UncompressedSize())) - itr, err := chk.Iterator( - b.ctx, - time.Unix(0, 0), - time.Unix(0, math.MaxInt64), - logproto.FORWARD, - logql_log.NewNoopPipeline().ForStream(c.Metric), - ) +func newBatchedChunkLoader( + ctx context.Context, + fetchers []Fetcher[chunk.Chunk, chunk.Chunk], + inputs [][]chunk.Chunk, + metrics *Metrics, + batchSize int, +) *batchedLoader[chunk.Chunk, chunk.Chunk, v1.ChunkRefWithIter] { + + mapper := func(c chunk.Chunk) (v1.ChunkRefWithIter, error) { + chk := c.Data.(*chunkenc.Facade).LokiChunk() + metrics.chunkSize.Observe(float64(chk.UncompressedSize())) + itr, err := chk.Iterator( + ctx, + time.Unix(0, 0), + time.Unix(0, math.MaxInt64), + logproto.FORWARD, + logql_log.NewNoopPipeline().ForStream(c.Metric), + ) - if err != nil { - return v1.ChunkRefWithIter{}, err - } + if err != nil { + return v1.ChunkRefWithIter{}, err + } - return v1.ChunkRefWithIter{ - Ref: v1.ChunkRef{ - Start: c.From, - End: c.Through, - Checksum: c.Checksum, - }, - Itr: itr, - }, nil + return v1.ChunkRefWithIter{ + Ref: v1.ChunkRef{ + Start: c.From, + End: c.Through, + Checksum: c.Checksum, + }, + Itr: itr, + }, nil + } + return newBatchedLoader(ctx, fetchers, inputs, mapper, batchSize) } -func (b *batchedLoader) At() v1.ChunkRefWithIter { +func (b *batchedLoader[_, _, C]) At() C { return b.cur } -func (b *batchedLoader) Err() error { +func (b *batchedLoader[_, _, _]) Err() error { return b.err } diff --git a/pkg/bloomcompactor/spec_test.go b/pkg/bloomcompactor/spec_test.go index 798d65e2f2bcd..44b1fa26a4d1f 100644 --- a/pkg/bloomcompactor/spec_test.go +++ b/pkg/bloomcompactor/spec_test.go @@ -3,6 +3,7 @@ package bloomcompactor import ( "bytes" "context" + "errors" "testing" "github.com/go-kit/log" @@ -155,3 +156,129 @@ func TestSimpleBloomGenerator(t *testing.T) { }) } } + +func TestBatchedLoader(t *testing.T) { + errMapper := func(i int) (int, error) { + return 0, errors.New("bzzt") + } + successMapper := func(i int) (int, error) { + return i, nil + } + + expired, cancel := context.WithCancel(context.Background()) + cancel() + + for _, tc := range []struct { + desc string + ctx context.Context + batchSize int + mapper func(int) (int, error) + err bool + inputs [][]int + exp []int + }{ + { + desc: "OneBatch", + ctx: context.Background(), + batchSize: 2, + mapper: successMapper, + err: false, + inputs: [][]int{{0, 1}}, + exp: []int{0, 1}, + }, + { + desc: "ZeroBatchSizeStillWorks", + ctx: context.Background(), + batchSize: 0, + mapper: successMapper, + err: false, + inputs: [][]int{{0, 1}}, + exp: []int{0, 1}, + }, + { + desc: "OneBatchLessThanFull", + ctx: context.Background(), + batchSize: 2, + mapper: successMapper, + err: false, + inputs: [][]int{{0}}, + exp: []int{0}, + }, + { + desc: "TwoBatches", + ctx: context.Background(), + batchSize: 2, + mapper: successMapper, + err: false, + inputs: [][]int{{0, 1, 2, 3}}, + exp: []int{0, 1, 2, 3}, + }, + { + desc: "MultipleBatchesMultipleLoaders", + ctx: context.Background(), + batchSize: 2, + mapper: successMapper, + err: false, + inputs: [][]int{{0, 1}, {2}, {3, 4, 5}}, + exp: []int{0, 1, 2, 3, 4, 5}, + }, + { + desc: "HandlesEmptyInputs", + ctx: context.Background(), + batchSize: 2, + mapper: successMapper, + err: false, + inputs: [][]int{{0, 1, 2, 3}, nil, {4}}, + exp: []int{0, 1, 2, 3, 4}, + }, + { + desc: "Timeout", + ctx: expired, + batchSize: 2, + mapper: successMapper, + err: true, + inputs: [][]int{{0}}, + }, + { + desc: "MappingFailure", + ctx: context.Background(), + batchSize: 2, + mapper: errMapper, + err: true, + inputs: [][]int{{0}}, + }, + } { + t.Run(tc.desc, func(t *testing.T) { + fetchers := make([]Fetcher[int, int], 0, len(tc.inputs)) + for range tc.inputs { + fetchers = append( + fetchers, + FetchFunc[int, int](func(ctx context.Context, xs []int) ([]int, error) { + if ctx.Err() != nil { + return nil, ctx.Err() + } + return xs, nil + }), + ) + } + + loader := newBatchedLoader[int, int, int]( + tc.ctx, + fetchers, + tc.inputs, + tc.mapper, + tc.batchSize, + ) + + got, err := v1.Collect[int](loader) + if tc.err { + require.Error(t, err) + return + } + require.NoError(t, err) + require.Equal(t, tc.exp, got) + + }) + } + +} From eb8464a64117a070d9e1baf784b9ee59fca1911f Mon Sep 17 00:00:00 2001 From: Christian Haudum Date: Tue, 13 Feb 2024 09:10:56 +0100 Subject: [PATCH 5/7] Bloom compactor: Load blocks lazily in batches (#11919) To avoid loading possibly lots of blocks upfront, this PR introduces lazy loading of blocks in batches using an iterator that loads blocks on demand. Signed-off-by: Christian Haudum --- pkg/bloomcompactor/batch.go | 95 ++++++++++++++++++++++++++++++++ pkg/bloomcompactor/batch_test.go | 37 +++++++++++++ pkg/bloomcompactor/controller.go | 68 +++++++++++++++-------- pkg/bloomcompactor/spec.go | 30 ++++++---- pkg/bloomcompactor/spec_test.go | 5 +- pkg/storage/bloom/v1/util.go | 12 ++++ 6 files changed, 212 insertions(+), 35 deletions(-) create mode 100644 pkg/bloomcompactor/batch.go create mode 100644 pkg/bloomcompactor/batch_test.go diff --git a/pkg/bloomcompactor/batch.go b/pkg/bloomcompactor/batch.go new file mode 100644 index 0000000000000..2d43f83219df9 --- /dev/null +++ b/pkg/bloomcompactor/batch.go @@ -0,0 +1,95 @@ +package bloomcompactor + +import ( + "context" + + "github.com/grafana/dskit/multierror" + + "github.com/grafana/loki/pkg/storage/stores/shipper/bloomshipper" +) + +// interface modeled from `pkg/storage/stores/shipper/bloomshipper.Fetcher` +type blocksFetcher interface { + FetchBlocks(context.Context, []bloomshipper.BlockRef) ([]*bloomshipper.CloseableBlockQuerier, error) +} + +func newBatchedBlockLoader(ctx context.Context, fetcher blocksFetcher, blocks []bloomshipper.BlockRef) (*batchedBlockLoader, error) { + return &batchedBlockLoader{ + ctx: ctx, + batchSize: 10, // make configurable? + source: blocks, + fetcher: fetcher, + }, nil +} + +type batchedBlockLoader struct { + ctx context.Context + batchSize int + + source []bloomshipper.BlockRef + fetcher blocksFetcher + + batch []*bloomshipper.CloseableBlockQuerier + cur *bloomshipper.CloseableBlockQuerier + err error +} + +// At implements v1.CloseableIterator. +func (b *batchedBlockLoader) At() *bloomshipper.CloseableBlockQuerier { + return b.cur +} + +// Close implements v1.CloseableIterator. +func (b *batchedBlockLoader) Close() error { + if b.cur != nil { + return b.cur.Close() + } + return nil +} + +// CloseBatch closes the remaining items from the current batch +func (b *batchedBlockLoader) CloseBatch() error { + var err multierror.MultiError + for _, cur := range b.batch { + err.Add(cur.Close()) + } + if len(b.batch) > 0 { + b.batch = b.batch[:0] + } + return err.Err() +} + +// Err implements v1.CloseableIterator. +func (b *batchedBlockLoader) Err() error { + return b.err +} + +// Next implements v1.CloseableIterator. +func (b *batchedBlockLoader) Next() bool { + if len(b.batch) > 0 { + return b.setNext() + } + + if len(b.source) == 0 { + return false + } + + // setup next batch + batchSize := min(b.batchSize, len(b.source)) + toFetch := b.source[:batchSize] + + // update source + b.source = b.source[batchSize:] + + b.batch, b.err = b.fetcher.FetchBlocks(b.ctx, toFetch) + if b.err != nil { + return false + } + return b.setNext() +} + +func (b *batchedBlockLoader) setNext() bool { + b.cur, b.err = b.batch[0], nil + b.batch = b.batch[1:] + return true +} diff --git a/pkg/bloomcompactor/batch_test.go b/pkg/bloomcompactor/batch_test.go new file mode 100644 index 0000000000000..a1922bf931b86 --- /dev/null +++ b/pkg/bloomcompactor/batch_test.go @@ -0,0 +1,37 @@ +package bloomcompactor + +import ( + "context" + "testing" + + "github.com/stretchr/testify/require" + "go.uber.org/atomic" + + "github.com/grafana/loki/pkg/storage/stores/shipper/bloomshipper" +) + +type dummyBlocksFetcher struct { + count *atomic.Int32 +} + +func (f *dummyBlocksFetcher) FetchBlocks(_ context.Context, blocks []bloomshipper.BlockRef) ([]*bloomshipper.CloseableBlockQuerier, error) { + f.count.Inc() + return make([]*bloomshipper.CloseableBlockQuerier, len(blocks)), nil +} + +func TestBatchedBlockLoader(t *testing.T) { + ctx := context.Background() + f := &dummyBlocksFetcher{count: atomic.NewInt32(0)} + + blocks := make([]bloomshipper.BlockRef, 25) + blocksIter, err := newBatchedBlockLoader(ctx, f, blocks) + require.NoError(t, err) + + var count int + for blocksIter.Next() && blocksIter.Err() == nil { + count++ + } + + require.Equal(t, len(blocks), count) + require.Equal(t, int32(len(blocks)/blocksIter.batchSize+1), f.count.Load()) +} diff --git a/pkg/bloomcompactor/controller.go b/pkg/bloomcompactor/controller.go index 38831ef932e69..47d9627d92e1a 100644 --- a/pkg/bloomcompactor/controller.go +++ b/pkg/bloomcompactor/controller.go @@ -4,6 +4,7 @@ import ( "bytes" "context" "fmt" + "io" "sort" "github.com/go-kit/log" @@ -138,45 +139,36 @@ func (s *SimpleBloomController) buildBlocks( for _, gap := range plan.gaps { // Fetch blocks that aren't up to date but are in the desired fingerprint range // to try and accelerate bloom creation - seriesItr, preExistingBlocks, err := s.loadWorkForGap(ctx, table, tenant, plan.tsdb, gap) + seriesItr, blocksIter, err := s.loadWorkForGap(ctx, table, tenant, plan.tsdb, gap) if err != nil { level.Error(logger).Log("msg", "failed to get series and blocks", "err", err) return errors.Wrap(err, "failed to get series and blocks") } - // Close all remaining blocks on exit - closePreExistingBlocks := func() { - var closeErrors multierror.MultiError - for _, block := range preExistingBlocks { - closeErrors.Add(block.Close()) - } - if err := closeErrors.Err(); err != nil { - level.Error(s.logger).Log("msg", "failed to close blocks", "err", err) - } - } gen := NewSimpleBloomGenerator( tenant, blockOpts, seriesItr, s.chunkLoader, - preExistingBlocks, + blocksIter, s.rwFn, s.metrics, - log.With(logger, "tsdb", plan.tsdb.Name(), "ownership", gap, "blocks", len(preExistingBlocks)), + log.With(logger, "tsdb", plan.tsdb.Name(), "ownership", gap), ) - _, newBlocks, err := gen.Generate(ctx) + _, loaded, newBlocks, err := gen.Generate(ctx) + if err != nil { // TODO(owen-d): metrics level.Error(logger).Log("msg", "failed to generate bloom", "err", err) - closePreExistingBlocks() + s.closeLoadedBlocks(loaded, blocksIter) return errors.Wrap(err, "failed to generate bloom") } client, err := s.bloomStore.Client(table.ModelTime()) if err != nil { level.Error(logger).Log("msg", "failed to get client", "err", err) - closePreExistingBlocks() + s.closeLoadedBlocks(loaded, blocksIter) return errors.Wrap(err, "failed to get client") } @@ -195,7 +187,7 @@ func (s *SimpleBloomController) buildBlocks( built, ); err != nil { level.Error(logger).Log("msg", "failed to write block", "err", err) - closePreExistingBlocks() + s.closeLoadedBlocks(loaded, blocksIter) return errors.Wrap(err, "failed to write block") } } @@ -203,12 +195,12 @@ func (s *SimpleBloomController) buildBlocks( if err := newBlocks.Err(); err != nil { // TODO(owen-d): metrics level.Error(logger).Log("msg", "failed to generate bloom", "err", err) - closePreExistingBlocks() + s.closeLoadedBlocks(loaded, blocksIter) return errors.Wrap(err, "failed to generate bloom") } // Close pre-existing blocks - closePreExistingBlocks() + s.closeLoadedBlocks(loaded, blocksIter) } } @@ -226,19 +218,49 @@ func (s *SimpleBloomController) loadWorkForGap( tenant string, id tsdb.Identifier, gap gapWithBlocks, -) (v1.CloseableIterator[*v1.Series], []*bloomshipper.CloseableBlockQuerier, error) { +) (v1.CloseableIterator[*v1.Series], v1.CloseableIterator[*bloomshipper.CloseableBlockQuerier], error) { // load a series iterator for the gap seriesItr, err := s.tsdbStore.LoadTSDB(ctx, table, tenant, id, gap.bounds) if err != nil { return nil, nil, errors.Wrap(err, "failed to load tsdb") } - blocks, err := s.bloomStore.FetchBlocks(ctx, gap.blocks) + // load a blocks iterator for the gap + fetcher, err := s.bloomStore.Fetcher(table.ModelTime()) if err != nil { - return nil, nil, errors.Wrap(err, "failed to get blocks") + return nil, nil, errors.Wrap(err, "failed to get fetcher") } - return seriesItr, blocks, nil + blocksIter, err := newBatchedBlockLoader(ctx, fetcher, gap.blocks) + if err != nil { + return nil, nil, errors.Wrap(err, "failed to load blocks") + } + + return seriesItr, blocksIter, nil +} + +func (s *SimpleBloomController) closeLoadedBlocks(toClose []io.Closer, it v1.CloseableIterator[*bloomshipper.CloseableBlockQuerier]) { + // close loaded blocks + var err multierror.MultiError + for _, closer := range toClose { + err.Add(closer.Close()) + } + + switch itr := it.(type) { + case *batchedBlockLoader: + // close remaining loaded blocks from batch + err.Add(itr.CloseBatch()) + default: + // close remaining loaded blocks + for itr.Next() && itr.Err() == nil { + err.Add(itr.At().Close()) + } + } + + // log error + if err.Err() != nil { + level.Error(s.logger).Log("msg", "failed to close blocks", "err", err) + } } type gapWithBlocks struct { diff --git a/pkg/bloomcompactor/spec.go b/pkg/bloomcompactor/spec.go index 58dd2674895ed..4a1125082ca54 100644 --- a/pkg/bloomcompactor/spec.go +++ b/pkg/bloomcompactor/spec.go @@ -3,6 +3,7 @@ package bloomcompactor import ( "context" "fmt" + "io" "math" "time" @@ -39,7 +40,7 @@ func (k Keyspace) Cmp(other Keyspace) v1.BoundsCheck { // Store is likely bound within. This allows specifying impls like ShardedStore // to only request the shard-range needed from the existing store. type BloomGenerator interface { - Generate(ctx context.Context) (skippedBlocks []*v1.Block, results v1.Iterator[*v1.Block], err error) + Generate(ctx context.Context) (skippedBlocks []v1.BlockMetadata, toClose []io.Closer, results v1.Iterator[*v1.Block], err error) } // Simple implementation of a BloomGenerator. @@ -47,9 +48,7 @@ type SimpleBloomGenerator struct { userID string store v1.Iterator[*v1.Series] chunkLoader ChunkLoader - // TODO(owen-d): blocks need not be all downloaded prior. Consider implementing - // as an iterator of iterators, where each iterator is a batch of overlapping blocks. - blocks []*bloomshipper.CloseableBlockQuerier + blocksIter v1.CloseableIterator[*bloomshipper.CloseableBlockQuerier] // options to build blocks with opts v1.BlockOptions @@ -71,7 +70,7 @@ func NewSimpleBloomGenerator( opts v1.BlockOptions, store v1.Iterator[*v1.Series], chunkLoader ChunkLoader, - blocks []*bloomshipper.CloseableBlockQuerier, + blocksIter v1.CloseableIterator[*bloomshipper.CloseableBlockQuerier], readWriterFn func() (v1.BlockWriter, v1.BlockReader), metrics *Metrics, logger log.Logger, @@ -81,7 +80,7 @@ func NewSimpleBloomGenerator( opts: opts, store: store, chunkLoader: chunkLoader, - blocks: blocks, + blocksIter: blocksIter, logger: log.With(logger, "component", "bloom_generator"), readWriterFn: readWriterFn, metrics: metrics, @@ -108,9 +107,15 @@ func (s *SimpleBloomGenerator) populator(ctx context.Context) func(series *v1.Se } -func (s *SimpleBloomGenerator) Generate(ctx context.Context) (skippedBlocks []v1.BlockMetadata, results v1.Iterator[*v1.Block], err error) { - blocksMatchingSchema := make([]*bloomshipper.CloseableBlockQuerier, 0, len(s.blocks)) - for _, block := range s.blocks { +func (s *SimpleBloomGenerator) Generate(ctx context.Context) ([]v1.BlockMetadata, []io.Closer, v1.Iterator[*v1.Block], error) { + skippedBlocks := make([]v1.BlockMetadata, 0) + toClose := make([]io.Closer, 0) + blocksMatchingSchema := make([]*bloomshipper.CloseableBlockQuerier, 0) + + for s.blocksIter.Next() && s.blocksIter.Err() == nil { + block := s.blocksIter.At() + toClose = append(toClose, block) + logger := log.With(s.logger, "block", block.BlockRef) md, err := block.Metadata() schema := md.Options.Schema @@ -130,11 +135,16 @@ func (s *SimpleBloomGenerator) Generate(ctx context.Context) (skippedBlocks []v1 blocksMatchingSchema = append(blocksMatchingSchema, block) } + if s.blocksIter.Err() != nil { + // should we ignore the error and continue with the blocks we got? + return skippedBlocks, toClose, v1.NewSliceIter([]*v1.Block{}), s.blocksIter.Err() + } + level.Debug(s.logger).Log("msg", "generating bloom filters for blocks", "num_blocks", len(blocksMatchingSchema), "skipped_blocks", len(skippedBlocks), "schema", fmt.Sprintf("%+v", s.opts.Schema)) series := v1.NewPeekingIter(s.store) blockIter := NewLazyBlockBuilderIterator(ctx, s.opts, s.populator(ctx), s.readWriterFn, series, blocksMatchingSchema) - return skippedBlocks, blockIter, nil + return skippedBlocks, toClose, blockIter, nil } // LazyBlockBuilderIterator is a lazy iterator over blocks that builds diff --git a/pkg/bloomcompactor/spec_test.go b/pkg/bloomcompactor/spec_test.go index 44b1fa26a4d1f..bb4fde6cc2359 100644 --- a/pkg/bloomcompactor/spec_test.go +++ b/pkg/bloomcompactor/spec_test.go @@ -71,13 +71,14 @@ func dummyBloomGen(opts v1.BlockOptions, store v1.Iterator[*v1.Series], blocks [ BlockQuerier: v1.NewBlockQuerier(b), }) } + blocksIter := v1.NewCloseableIterator(v1.NewSliceIter(bqs)) return NewSimpleBloomGenerator( "fake", opts, store, dummyChunkLoader{}, - bqs, + blocksIter, func() (v1.BlockWriter, v1.BlockReader) { indexBuf := bytes.NewBuffer(nil) bloomsBuf := bytes.NewBuffer(nil) @@ -130,7 +131,7 @@ func TestSimpleBloomGenerator(t *testing.T) { ) gen := dummyBloomGen(tc.toSchema, storeItr, sourceBlocks) - skipped, results, err := gen.Generate(context.Background()) + skipped, _, results, err := gen.Generate(context.Background()) require.Nil(t, err) require.Equal(t, tc.numSkipped, len(skipped)) diff --git a/pkg/storage/bloom/v1/util.go b/pkg/storage/bloom/v1/util.go index d980a9ecc4df1..3b9e0631b715d 100644 --- a/pkg/storage/bloom/v1/util.go +++ b/pkg/storage/bloom/v1/util.go @@ -247,6 +247,18 @@ type CloseableIterator[T any] interface { Close() error } +func NewCloseableIterator[T io.Closer](itr Iterator[T]) *CloseIter[T] { + return &CloseIter[T]{itr} +} + +type CloseIter[T io.Closer] struct { + Iterator[T] +} + +func (i *CloseIter[T]) Close() error { + return i.At().Close() +} + type PeekingCloseableIterator[T any] interface { PeekingIterator[T] CloseableIterator[T] From 6c5c347270e7f41ed4240945fcd567c7e4630921 Mon Sep 17 00:00:00 2001 From: Robert Jacob Date: Tue, 13 Feb 2024 09:23:20 +0100 Subject: [PATCH 6/7] operator: Refactor handling of credentials in managed-auth mode (#11920) --- operator/CHANGELOG.md | 1 + .../apis/config/v1/projectconfig_types.go | 9 +- operator/apis/loki/v1/lokistack_types.go | 27 +++ .../loki-operator.clusterserviceversion.yaml | 3 +- .../loki.grafana.com_lokistacks.yaml | 8 + .../loki-operator.clusterserviceversion.yaml | 3 +- .../loki.grafana.com_lokistacks.yaml | 8 + .../loki-operator.clusterserviceversion.yaml | 3 +- .../loki.grafana.com_lokistacks.yaml | 8 + .../bases/loki.grafana.com_lokistacks.yaml | 8 + operator/config/rbac/role.yaml | 1 + .../loki/credentialsrequests_controller.go | 82 --------- .../credentialsrequests_controller_test.go | 164 ------------------ .../lokistack/credentialsrequest_discovery.go | 30 ---- .../credentialsrequest_discovery_test.go | 98 ----------- .../controllers/loki/lokistack_controller.go | 60 ++----- .../loki/lokistack_controller_test.go | 30 ++-- operator/docs/operator/api.md | 48 +++++ operator/docs/operator/feature-gates.md | 3 +- operator/internal/config/managed_auth.go | 48 +++++ operator/internal/config/options.go | 13 +- .../handlers/credentialsrequest_create.go | 80 ++++++--- .../credentialsrequest_create_test.go | 157 +++++++++++++---- .../handlers/credentialsrequest_delete.go | 43 ----- .../credentialsrequest_delete_test.go | 47 ----- .../handlers/internal/storage/secrets.go | 32 ++-- .../handlers/internal/storage/secrets_test.go | 72 +++++--- .../handlers/internal/storage/storage_test.go | 75 -------- .../handlers/lokistack_create_or_update.go | 32 ++-- .../lokistack_create_or_update_test.go | 18 +- operator/internal/manifests/mutate.go | 10 ++ .../manifests/openshift/credentialsrequest.go | 62 +------ .../openshift/credentialsrequest_test.go | 42 ++--- .../internal/manifests/openshift/options.go | 19 +- operator/internal/manifests/openshift/var.go | 2 - .../internal/manifests/storage/configure.go | 55 +++--- .../manifests/storage/configure_test.go | 76 +++----- .../internal/manifests/storage/options.go | 35 ++++ operator/internal/manifests/storage/var.go | 36 ++-- operator/internal/status/status.go | 3 +- operator/internal/status/status_test.go | 8 +- operator/main.go | 24 +-- 42 files changed, 621 insertions(+), 962 deletions(-) delete mode 100644 operator/controllers/loki/credentialsrequests_controller.go delete mode 100644 operator/controllers/loki/credentialsrequests_controller_test.go delete mode 100644 operator/controllers/loki/internal/lokistack/credentialsrequest_discovery.go delete mode 100644 operator/controllers/loki/internal/lokistack/credentialsrequest_discovery_test.go create mode 100644 operator/internal/config/managed_auth.go delete mode 100644 operator/internal/handlers/credentialsrequest_delete.go delete mode 100644 operator/internal/handlers/credentialsrequest_delete_test.go diff --git a/operator/CHANGELOG.md b/operator/CHANGELOG.md index 2a1ebc2f5d362..59afb29708782 100644 --- a/operator/CHANGELOG.md +++ b/operator/CHANGELOG.md @@ -1,5 +1,6 @@ ## Main +- [11920](https://github.com/grafana/loki/pull/11920) **xperimental**: Refactor handling of credentials in managed-auth mode - [11869](https://github.com/grafana/loki/pull/11869) **periklis**: Add support for running with Google Workload Identity - [11868](https://github.com/grafana/loki/pull/11868) **xperimental**: Integrate support for OpenShift-managed credentials in Azure - [11854](https://github.com/grafana/loki/pull/11854) **periklis**: Allow custom audience for managed-auth on STS diff --git a/operator/apis/config/v1/projectconfig_types.go b/operator/apis/config/v1/projectconfig_types.go index 06ff8cb090598..8e510b5d3ab79 100644 --- a/operator/apis/config/v1/projectconfig_types.go +++ b/operator/apis/config/v1/projectconfig_types.go @@ -52,16 +52,11 @@ type OpenShiftFeatureGates struct { // Dashboards enables the loki-mixin dashboards into the OpenShift Console Dashboards bool `json:"dashboards,omitempty"` - // ManagedAuthEnv enabled when the operator installation is on OpenShift STS clusters. + // ManagedAuthEnv is true when OpenShift-functions are enabled and the operator has detected + // that it is running with some kind of "workload identity" (AWS STS, Azure WIF) enabled. ManagedAuthEnv bool } -// ManagedAuthEnabled returns true when OpenShift-functions are enabled and the operator has detected that it is -// running with some kind of "workload identity" (AWS STS, Azure WIF) enabled. -func (o *OpenShiftFeatureGates) ManagedAuthEnabled() bool { - return o.Enabled && o.ManagedAuthEnv -} - // FeatureGates is the supported set of all operator feature gates. type FeatureGates struct { // ServiceMonitors enables creating a Prometheus-Operator managed ServiceMonitor diff --git a/operator/apis/loki/v1/lokistack_types.go b/operator/apis/loki/v1/lokistack_types.go index a50fb48b187ea..b652ba0c7a4d9 100644 --- a/operator/apis/loki/v1/lokistack_types.go +++ b/operator/apis/loki/v1/lokistack_types.go @@ -1174,6 +1174,27 @@ type LokiStackComponentStatus struct { Ruler PodStatusMap `json:"ruler,omitempty"` } +// CredentialMode represents the type of authentication used for accessing the object storage. +// +// +kubebuilder:validation:Enum=static;token;managed +type CredentialMode string + +const ( + // CredentialModeStatic represents the usage of static, long-lived credentials stored in a Secret. + // This is the default authentication mode and available for all supported object storage types. + CredentialModeStatic CredentialMode = "static" + // CredentialModeToken represents the usage of short-lived tokens retrieved from a credential source. + // In this mode the static configuration does not contain credentials needed for the object storage. + // Instead, they are generated during runtime using a service, which allows for shorter-lived credentials and + // much more granular control. This authentication mode is not supported for all object storage types. + CredentialModeToken CredentialMode = "token" + // CredentialModeManaged represents the usage of short-lived tokens retrieved from a credential source. + // This mode is similar to CredentialModeToken,but instead of having a user-configured credential source, + // it is configured by the environment, for example the Cloud Credential Operator in OpenShift. + // This mode is only supported for certain object storage types in certain runtime environments. + CredentialModeManaged CredentialMode = "managed" +) + // LokiStackStorageStatus defines the observed state of // the Loki storage configuration. type LokiStackStorageStatus struct { @@ -1183,6 +1204,12 @@ type LokiStackStorageStatus struct { // +optional // +kubebuilder:validation:Optional Schemas []ObjectStorageSchema `json:"schemas,omitempty"` + + // CredentialMode contains the authentication mode used for accessing the object storage. + // + // +optional + // +kubebuilder:validation:Optional + CredentialMode CredentialMode `json:"credentialMode,omitempty"` } // LokiStackStatus defines the observed state of LokiStack diff --git a/operator/bundle/community-openshift/manifests/loki-operator.clusterserviceversion.yaml b/operator/bundle/community-openshift/manifests/loki-operator.clusterserviceversion.yaml index 6854bf38ff661..ad2b2e1bc93b4 100644 --- a/operator/bundle/community-openshift/manifests/loki-operator.clusterserviceversion.yaml +++ b/operator/bundle/community-openshift/manifests/loki-operator.clusterserviceversion.yaml @@ -150,7 +150,7 @@ metadata: categories: OpenShift Optional, Logging & Tracing certified: "false" containerImage: docker.io/grafana/loki-operator:0.5.0 - createdAt: "2024-01-31T16:48:07Z" + createdAt: "2024-02-12T14:48:52Z" description: The Community Loki Operator provides Kubernetes native deployment and management of Loki and related logging components. features.operators.openshift.io/disconnected: "true" @@ -1472,6 +1472,7 @@ spec: - delete - get - list + - update - watch - apiGroups: - config.openshift.io diff --git a/operator/bundle/community-openshift/manifests/loki.grafana.com_lokistacks.yaml b/operator/bundle/community-openshift/manifests/loki.grafana.com_lokistacks.yaml index a8033e692214e..e1a7e5578965a 100644 --- a/operator/bundle/community-openshift/manifests/loki.grafana.com_lokistacks.yaml +++ b/operator/bundle/community-openshift/manifests/loki.grafana.com_lokistacks.yaml @@ -4064,6 +4064,14 @@ spec: description: Storage provides summary of all changes that have occurred to the storage configuration. properties: + credentialMode: + description: CredentialMode contains the authentication mode used + for accessing the object storage. + enum: + - static + - token + - managed + type: string schemas: description: Schemas is a list of schemas which have been applied to the LokiStack. diff --git a/operator/bundle/community/manifests/loki-operator.clusterserviceversion.yaml b/operator/bundle/community/manifests/loki-operator.clusterserviceversion.yaml index f8c37162b5a44..b372a29504e3a 100644 --- a/operator/bundle/community/manifests/loki-operator.clusterserviceversion.yaml +++ b/operator/bundle/community/manifests/loki-operator.clusterserviceversion.yaml @@ -150,7 +150,7 @@ metadata: categories: OpenShift Optional, Logging & Tracing certified: "false" containerImage: docker.io/grafana/loki-operator:0.5.0 - createdAt: "2024-01-31T16:48:04Z" + createdAt: "2024-02-12T14:48:49Z" description: The Community Loki Operator provides Kubernetes native deployment and management of Loki and related logging components. operators.operatorframework.io/builder: operator-sdk-unknown @@ -1452,6 +1452,7 @@ spec: - delete - get - list + - update - watch - apiGroups: - config.openshift.io diff --git a/operator/bundle/community/manifests/loki.grafana.com_lokistacks.yaml b/operator/bundle/community/manifests/loki.grafana.com_lokistacks.yaml index 8b86ddfff8bbf..f92665f5095d2 100644 --- a/operator/bundle/community/manifests/loki.grafana.com_lokistacks.yaml +++ b/operator/bundle/community/manifests/loki.grafana.com_lokistacks.yaml @@ -4064,6 +4064,14 @@ spec: description: Storage provides summary of all changes that have occurred to the storage configuration. properties: + credentialMode: + description: CredentialMode contains the authentication mode used + for accessing the object storage. + enum: + - static + - token + - managed + type: string schemas: description: Schemas is a list of schemas which have been applied to the LokiStack. diff --git a/operator/bundle/openshift/manifests/loki-operator.clusterserviceversion.yaml b/operator/bundle/openshift/manifests/loki-operator.clusterserviceversion.yaml index 234ddb423a3aa..8026bbcd0fc4c 100644 --- a/operator/bundle/openshift/manifests/loki-operator.clusterserviceversion.yaml +++ b/operator/bundle/openshift/manifests/loki-operator.clusterserviceversion.yaml @@ -150,7 +150,7 @@ metadata: categories: OpenShift Optional, Logging & Tracing certified: "false" containerImage: quay.io/openshift-logging/loki-operator:0.1.0 - createdAt: "2024-01-31T16:48:10Z" + createdAt: "2024-02-12T14:48:55Z" description: | The Loki Operator for OCP provides a means for configuring and managing a Loki stack for cluster logging. ## Prerequisites and Requirements @@ -1457,6 +1457,7 @@ spec: - delete - get - list + - update - watch - apiGroups: - config.openshift.io diff --git a/operator/bundle/openshift/manifests/loki.grafana.com_lokistacks.yaml b/operator/bundle/openshift/manifests/loki.grafana.com_lokistacks.yaml index f121699ec6fb8..3163752ad36f0 100644 --- a/operator/bundle/openshift/manifests/loki.grafana.com_lokistacks.yaml +++ b/operator/bundle/openshift/manifests/loki.grafana.com_lokistacks.yaml @@ -4064,6 +4064,14 @@ spec: description: Storage provides summary of all changes that have occurred to the storage configuration. properties: + credentialMode: + description: CredentialMode contains the authentication mode used + for accessing the object storage. + enum: + - static + - token + - managed + type: string schemas: description: Schemas is a list of schemas which have been applied to the LokiStack. diff --git a/operator/config/crd/bases/loki.grafana.com_lokistacks.yaml b/operator/config/crd/bases/loki.grafana.com_lokistacks.yaml index 4661097811b75..d603ef2a9b644 100644 --- a/operator/config/crd/bases/loki.grafana.com_lokistacks.yaml +++ b/operator/config/crd/bases/loki.grafana.com_lokistacks.yaml @@ -4046,6 +4046,14 @@ spec: description: Storage provides summary of all changes that have occurred to the storage configuration. properties: + credentialMode: + description: CredentialMode contains the authentication mode used + for accessing the object storage. + enum: + - static + - token + - managed + type: string schemas: description: Schemas is a list of schemas which have been applied to the LokiStack. diff --git a/operator/config/rbac/role.yaml b/operator/config/rbac/role.yaml index 766a6d7d191e6..072efd5b99128 100644 --- a/operator/config/rbac/role.yaml +++ b/operator/config/rbac/role.yaml @@ -56,6 +56,7 @@ rules: - delete - get - list + - update - watch - apiGroups: - config.openshift.io diff --git a/operator/controllers/loki/credentialsrequests_controller.go b/operator/controllers/loki/credentialsrequests_controller.go deleted file mode 100644 index efd0226c6a340..0000000000000 --- a/operator/controllers/loki/credentialsrequests_controller.go +++ /dev/null @@ -1,82 +0,0 @@ -package controllers - -import ( - "context" - - "github.com/go-logr/logr" - corev1 "k8s.io/api/core/v1" - apierrors "k8s.io/apimachinery/pkg/api/errors" - "k8s.io/apimachinery/pkg/runtime" - ctrl "sigs.k8s.io/controller-runtime" - "sigs.k8s.io/controller-runtime/pkg/client" - - lokiv1 "github.com/grafana/loki/operator/apis/loki/v1" - "github.com/grafana/loki/operator/controllers/loki/internal/lokistack" - "github.com/grafana/loki/operator/controllers/loki/internal/management/state" - "github.com/grafana/loki/operator/internal/external/k8s" - "github.com/grafana/loki/operator/internal/handlers" -) - -// CredentialsRequestsReconciler reconciles a single CredentialsRequest resource for each LokiStack request. -type CredentialsRequestsReconciler struct { - client.Client - Scheme *runtime.Scheme - Log logr.Logger -} - -// Reconcile creates a single CredentialsRequest per LokiStack for the OpenShift cloud-credentials-operator (CCO) to -// provide a managed cloud credentials Secret. On successful creation, the LokiStack resource is annotated -// with `loki.grafana.com/credentials-request-secret-ref` that refers to the secret provided by CCO. If the LokiStack -// resource is not found its accompanying CredentialsRequest resource is deleted. -func (r *CredentialsRequestsReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) { - var stack lokiv1.LokiStack - if err := r.Client.Get(ctx, req.NamespacedName, &stack); err != nil { - if apierrors.IsNotFound(err) { - return ctrl.Result{}, handlers.DeleteCredentialsRequest(ctx, r.Client, req.NamespacedName) - } - return ctrl.Result{}, err - } - - managed, err := state.IsManaged(ctx, req, r.Client) - if err != nil { - return ctrl.Result{}, err - } - if !managed { - r.Log.Info("Skipping reconciliation for unmanaged LokiStack resource", "name", req.String()) - // Stop requeueing for unmanaged LokiStack custom resources - return ctrl.Result{}, nil - } - - storageSecretName := client.ObjectKey{ - Namespace: req.Namespace, - Name: stack.Spec.Storage.Secret.Name, - } - storageSecret := &corev1.Secret{} - err = r.Client.Get(ctx, storageSecretName, storageSecret) - if err != nil { - return ctrl.Result{}, err - } - - secretRef, err := handlers.CreateCredentialsRequest(ctx, r.Client, req.NamespacedName, storageSecret) - if err != nil { - return ctrl.Result{}, err - } - - if err := lokistack.AnnotateForCredentialsRequest(ctx, r.Client, req.NamespacedName, secretRef); err != nil { - return ctrl.Result{}, err - } - - return ctrl.Result{}, nil -} - -// SetupWithManager sets up the controller with the Manager. -func (r *CredentialsRequestsReconciler) SetupWithManager(mgr ctrl.Manager) error { - b := ctrl.NewControllerManagedBy(mgr) - return r.buildController(k8s.NewCtrlBuilder(b)) -} - -func (r *CredentialsRequestsReconciler) buildController(bld k8s.Builder) error { - return bld. - For(&lokiv1.LokiStack{}). - Complete(r) -} diff --git a/operator/controllers/loki/credentialsrequests_controller_test.go b/operator/controllers/loki/credentialsrequests_controller_test.go deleted file mode 100644 index 3c91ee2275e97..0000000000000 --- a/operator/controllers/loki/credentialsrequests_controller_test.go +++ /dev/null @@ -1,164 +0,0 @@ -package controllers - -import ( - "context" - "testing" - - cloudcredentialsv1 "github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1" - "github.com/stretchr/testify/require" - corev1 "k8s.io/api/core/v1" - apierrors "k8s.io/apimachinery/pkg/api/errors" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/apimachinery/pkg/runtime/schema" - "k8s.io/apimachinery/pkg/types" - ctrl "sigs.k8s.io/controller-runtime" - "sigs.k8s.io/controller-runtime/pkg/client" - - lokiv1 "github.com/grafana/loki/operator/apis/loki/v1" - "github.com/grafana/loki/operator/internal/external/k8s/k8sfakes" - "github.com/grafana/loki/operator/internal/manifests/storage" -) - -func TestCredentialsRequestController_RegistersCustomResource_WithDefaultPredicates(t *testing.T) { - b := &k8sfakes.FakeBuilder{} - k := &k8sfakes.FakeClient{} - c := &CredentialsRequestsReconciler{Client: k, Scheme: scheme} - - b.ForReturns(b) - b.OwnsReturns(b) - - err := c.buildController(b) - require.NoError(t, err) - - // Require only one For-Call for the custom resource - require.Equal(t, 1, b.ForCallCount()) - - // Require For-call with LokiStack resource - obj, _ := b.ForArgsForCall(0) - require.Equal(t, &lokiv1.LokiStack{}, obj) -} - -func TestCredentialsRequestController_DeleteCredentialsRequest_WhenLokiStackNotFound(t *testing.T) { - k := &k8sfakes.FakeClient{} - c := &CredentialsRequestsReconciler{Client: k, Scheme: scheme} - r := ctrl.Request{ - NamespacedName: types.NamespacedName{ - Name: "my-stack", - Namespace: "ns", - }, - } - - // Set managed auth environment - t.Setenv("ROLEARN", "a-role-arn") - - k.GetStub = func(_ context.Context, key types.NamespacedName, _ client.Object, _ ...client.GetOption) error { - if key.Name == r.Name && key.Namespace == r.Namespace { - return apierrors.NewNotFound(schema.GroupResource{}, "lokistack not found") - } - return nil - } - - res, err := c.Reconcile(context.Background(), r) - require.NoError(t, err) - require.Equal(t, ctrl.Result{}, res) - require.Equal(t, 1, k.DeleteCallCount()) -} - -func TestCredentialsRequestController_CreateCredentialsRequest_WhenLokiStackNotAnnotated(t *testing.T) { - k := &k8sfakes.FakeClient{} - c := &CredentialsRequestsReconciler{Client: k, Scheme: scheme} - r := ctrl.Request{ - NamespacedName: types.NamespacedName{ - Name: "my-stack", - Namespace: "ns", - }, - } - s := lokiv1.LokiStack{ - ObjectMeta: metav1.ObjectMeta{ - Name: "my-stack", - Namespace: "ns", - }, - Spec: lokiv1.LokiStackSpec{ - ManagementState: lokiv1.ManagementStateManaged, - }, - } - secret := &corev1.Secret{} - - // Set managed auth environment - t.Setenv("ROLEARN", "a-role-arn") - - k.GetStub = func(_ context.Context, key types.NamespacedName, out client.Object, _ ...client.GetOption) error { - switch out.(type) { - case *lokiv1.LokiStack: - if key.Name == r.Name && key.Namespace == r.Namespace { - k.SetClientObject(out, &s) - return nil - } - return apierrors.NewNotFound(schema.GroupResource{}, "lokistack not found") - case *corev1.Secret: - k.SetClientObject(out, secret) - return nil - } - return nil - } - - k.CreateStub = func(_ context.Context, o client.Object, _ ...client.CreateOption) error { - _, isCredReq := o.(*cloudcredentialsv1.CredentialsRequest) - if !isCredReq { - return apierrors.NewBadRequest("something went wrong creating a credentials request") - } - return nil - } - - k.UpdateStub = func(_ context.Context, o client.Object, _ ...client.UpdateOption) error { - stack, ok := o.(*lokiv1.LokiStack) - if !ok { - return apierrors.NewBadRequest("something went wrong creating a credentials request") - } - - _, hasSecretRef := stack.Annotations[storage.AnnotationCredentialsRequestsSecretRef] - if !hasSecretRef { - return apierrors.NewBadRequest("something went updating the lokistack annotations") - } - return nil - } - - res, err := c.Reconcile(context.Background(), r) - require.NoError(t, err) - require.Equal(t, ctrl.Result{}, res) - require.Equal(t, 1, k.CreateCallCount()) - require.Equal(t, 1, k.UpdateCallCount()) -} - -func TestCredentialsRequestController_SkipsUnmanaged(t *testing.T) { - k := &k8sfakes.FakeClient{} - c := &CredentialsRequestsReconciler{Client: k, Scheme: scheme} - r := ctrl.Request{ - NamespacedName: types.NamespacedName{ - Name: "my-stack", - Namespace: "ns", - }, - } - - s := lokiv1.LokiStack{ - ObjectMeta: metav1.ObjectMeta{ - Name: "my-stack", - Namespace: "ns", - }, - Spec: lokiv1.LokiStackSpec{ - ManagementState: lokiv1.ManagementStateUnmanaged, - }, - } - - k.GetStub = func(_ context.Context, key types.NamespacedName, out client.Object, _ ...client.GetOption) error { - if key.Name == s.Name && key.Namespace == s.Namespace { - k.SetClientObject(out, &s) - return nil - } - return apierrors.NewNotFound(schema.GroupResource{}, "something not found") - } - - res, err := c.Reconcile(context.Background(), r) - require.NoError(t, err) - require.Equal(t, ctrl.Result{}, res) -} diff --git a/operator/controllers/loki/internal/lokistack/credentialsrequest_discovery.go b/operator/controllers/loki/internal/lokistack/credentialsrequest_discovery.go deleted file mode 100644 index c911c1196eed4..0000000000000 --- a/operator/controllers/loki/internal/lokistack/credentialsrequest_discovery.go +++ /dev/null @@ -1,30 +0,0 @@ -package lokistack - -import ( - "context" - - "github.com/ViaQ/logerr/v2/kverrors" - "sigs.k8s.io/controller-runtime/pkg/client" - - "github.com/grafana/loki/operator/internal/external/k8s" - "github.com/grafana/loki/operator/internal/manifests/storage" -) - -// AnnotateForCredentialsRequest adds the `loki.grafana.com/credentials-request-secret-ref` annotation -// to the named Lokistack. If no LokiStack is found, then skip reconciliation. Or else return an error. -func AnnotateForCredentialsRequest(ctx context.Context, k k8s.Client, key client.ObjectKey, secretRef string) error { - stack, err := getLokiStack(ctx, k, key) - if stack == nil || err != nil { - return err - } - - if val, ok := stack.Annotations[storage.AnnotationCredentialsRequestsSecretRef]; ok && val == secretRef { - return nil - } - - if err := updateAnnotation(ctx, k, stack, storage.AnnotationCredentialsRequestsSecretRef, secretRef); err != nil { - return kverrors.Wrap(err, "failed to update lokistack `credentialsRequestSecretRef` annotation", "key", key) - } - - return nil -} diff --git a/operator/controllers/loki/internal/lokistack/credentialsrequest_discovery_test.go b/operator/controllers/loki/internal/lokistack/credentialsrequest_discovery_test.go deleted file mode 100644 index ef073ca853ba5..0000000000000 --- a/operator/controllers/loki/internal/lokistack/credentialsrequest_discovery_test.go +++ /dev/null @@ -1,98 +0,0 @@ -package lokistack - -import ( - "context" - "testing" - - "github.com/stretchr/testify/require" - apierrors "k8s.io/apimachinery/pkg/api/errors" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/apimachinery/pkg/types" - "sigs.k8s.io/controller-runtime/pkg/client" - - lokiv1 "github.com/grafana/loki/operator/apis/loki/v1" - "github.com/grafana/loki/operator/internal/external/k8s/k8sfakes" - "github.com/grafana/loki/operator/internal/manifests/storage" -) - -func TestAnnotateForCredentialsRequest_ReturnError_WhenLokiStackMissing(t *testing.T) { - k := &k8sfakes.FakeClient{} - annotationVal := "ns-my-stack-aws-creds" - stackKey := client.ObjectKey{Name: "my-stack", Namespace: "ns"} - - k.GetStub = func(_ context.Context, _ types.NamespacedName, out client.Object, _ ...client.GetOption) error { - return apierrors.NewBadRequest("failed to get lokistack") - } - - err := AnnotateForCredentialsRequest(context.Background(), k, stackKey, annotationVal) - require.Error(t, err) -} - -func TestAnnotateForCredentialsRequest_DoNothing_WhenAnnotationExists(t *testing.T) { - k := &k8sfakes.FakeClient{} - - annotationVal := "ns-my-stack-aws-creds" - s := &lokiv1.LokiStack{ - ObjectMeta: metav1.ObjectMeta{ - Name: "my-stack", - Namespace: "ns", - Annotations: map[string]string{ - storage.AnnotationCredentialsRequestsSecretRef: annotationVal, - }, - }, - } - stackKey := client.ObjectKeyFromObject(s) - - k.GetStub = func(_ context.Context, key types.NamespacedName, out client.Object, _ ...client.GetOption) error { - if key.Name == stackKey.Name && key.Namespace == stackKey.Namespace { - k.SetClientObject(out, s) - return nil - } - return nil - } - - err := AnnotateForCredentialsRequest(context.Background(), k, stackKey, annotationVal) - require.NoError(t, err) - require.Equal(t, 0, k.UpdateCallCount()) -} - -func TestAnnotateForCredentialsRequest_UpdateLokistack_WhenAnnotationMissing(t *testing.T) { - k := &k8sfakes.FakeClient{} - - annotationVal := "ns-my-stack-aws-creds" - s := &lokiv1.LokiStack{ - ObjectMeta: metav1.ObjectMeta{ - Name: "my-stack", - Namespace: "ns", - Annotations: map[string]string{}, - }, - } - stackKey := client.ObjectKeyFromObject(s) - - k.GetStub = func(_ context.Context, key types.NamespacedName, out client.Object, _ ...client.GetOption) error { - if key.Name == stackKey.Name && key.Namespace == stackKey.Namespace { - k.SetClientObject(out, s) - return nil - } - return nil - } - - k.UpdateStub = func(_ context.Context, o client.Object, _ ...client.UpdateOption) error { - stack, ok := o.(*lokiv1.LokiStack) - if !ok { - return apierrors.NewBadRequest("failed conversion to *lokiv1.LokiStack") - } - val, ok := stack.Annotations[storage.AnnotationCredentialsRequestsSecretRef] - if !ok { - return apierrors.NewBadRequest("missing annotation") - } - if val != annotationVal { - return apierrors.NewBadRequest("annotations does not match input") - } - return nil - } - - err := AnnotateForCredentialsRequest(context.Background(), k, stackKey, annotationVal) - require.NoError(t, err) - require.Equal(t, 1, k.UpdateCallCount()) -} diff --git a/operator/controllers/loki/lokistack_controller.go b/operator/controllers/loki/lokistack_controller.go index 40e7691bd1a2b..eb30a1a9bf555 100644 --- a/operator/controllers/loki/lokistack_controller.go +++ b/operator/controllers/loki/lokistack_controller.go @@ -3,7 +3,6 @@ package controllers import ( "context" "errors" - "strings" "time" "github.com/go-logr/logr" @@ -16,7 +15,6 @@ import ( corev1 "k8s.io/api/core/v1" networkingv1 "k8s.io/api/networking/v1" rbacv1 "k8s.io/api/rbac/v1" - apierrors "k8s.io/apimachinery/pkg/api/errors" "k8s.io/apimachinery/pkg/runtime" "k8s.io/apimachinery/pkg/types" ctrl "sigs.k8s.io/controller-runtime" @@ -31,6 +29,7 @@ import ( configv1 "github.com/grafana/loki/operator/apis/config/v1" lokiv1 "github.com/grafana/loki/operator/apis/loki/v1" "github.com/grafana/loki/operator/controllers/loki/internal/management/state" + "github.com/grafana/loki/operator/internal/config" "github.com/grafana/loki/operator/internal/external/k8s" "github.com/grafana/loki/operator/internal/handlers" manifestsocp "github.com/grafana/loki/operator/internal/manifests/openshift" @@ -111,6 +110,7 @@ type LokiStackReconciler struct { Log logr.Logger Scheme *runtime.Scheme FeatureGates configv1.FeatureGates + AuthConfig *config.ManagedAuthConfig } // +kubebuilder:rbac:groups=loki.grafana.com,resources=lokistacks,verbs=get;list;watch;create;update;patch;delete @@ -128,7 +128,7 @@ type LokiStackReconciler struct { // +kubebuilder:rbac:groups=policy,resources=poddisruptionbudgets,verbs=get;list;watch;create;update // +kubebuilder:rbac:groups=config.openshift.io,resources=dnses;apiservers;proxies,verbs=get;list;watch // +kubebuilder:rbac:groups=route.openshift.io,resources=routes,verbs=get;list;watch;create;update;delete -// +kubebuilder:rbac:groups=cloudcredential.openshift.io,resources=credentialsrequests,verbs=get;list;watch;create;delete +// +kubebuilder:rbac:groups=cloudcredential.openshift.io,resources=credentialsrequests,verbs=get;list;watch;create;update;delete // Reconcile is part of the main kubernetes reconciliation loop which aims to // move the current state of the cluster closer to the desired state. @@ -150,7 +150,7 @@ func (r *LokiStackReconciler) Reconcile(ctx context.Context, req ctrl.Request) ( } var degraded *status.DegradedError - err = r.updateResources(ctx, req) + credentialMode, err := r.updateResources(ctx, req) switch { case errors.As(err, °raded): // degraded errors are handled by status.Refresh below @@ -158,7 +158,7 @@ func (r *LokiStackReconciler) Reconcile(ctx context.Context, req ctrl.Request) ( return ctrl.Result{}, err } - err = status.Refresh(ctx, r.Client, req, time.Now(), degraded) + err = status.Refresh(ctx, r.Client, req, time.Now(), credentialMode, degraded) if err != nil { return ctrl.Result{}, err } @@ -172,18 +172,25 @@ func (r *LokiStackReconciler) Reconcile(ctx context.Context, req ctrl.Request) ( return ctrl.Result{}, nil } -func (r *LokiStackReconciler) updateResources(ctx context.Context, req ctrl.Request) error { +func (r *LokiStackReconciler) updateResources(ctx context.Context, req ctrl.Request) (lokiv1.CredentialMode, error) { if r.FeatureGates.BuiltInCertManagement.Enabled { if err := handlers.CreateOrRotateCertificates(ctx, r.Log, req, r.Client, r.Scheme, r.FeatureGates); err != nil { - return err + return "", err } } - if err := handlers.CreateOrUpdateLokiStack(ctx, r.Log, req, r.Client, r.Scheme, r.FeatureGates); err != nil { - return err + if r.FeatureGates.OpenShift.ManagedAuthEnv { + if err := handlers.CreateCredentialsRequest(ctx, r.Log, r.Scheme, r.AuthConfig, r.Client, req); err != nil { + return "", err + } + } + + credentialMode, err := handlers.CreateOrUpdateLokiStack(ctx, r.Log, req, r.Client, r.Scheme, r.FeatureGates) + if err != nil { + return "", err } - return nil + return credentialMode, nil } // SetupWithManager sets up the controller with the Manager. @@ -216,7 +223,7 @@ func (r *LokiStackReconciler) buildController(bld k8s.Builder) error { if r.FeatureGates.OpenShift.Enabled { bld = bld. Owns(&routev1.Route{}, updateOrDeleteOnlyPred). - Watches(&cloudcredentialv1.CredentialsRequest{}, r.enqueueForCredentialsRequest(), updateOrDeleteOnlyPred) + Owns(&cloudcredentialv1.CredentialsRequest{}, updateOrDeleteOnlyPred) if r.FeatureGates.OpenShift.ClusterTLSPolicy { bld = bld.Watches(&openshiftconfigv1.APIServer{}, r.enqueueAllLokiStacksHandler(), updateOrDeleteOnlyPred) @@ -358,34 +365,3 @@ func (r *LokiStackReconciler) enqueueForStorageCA() handler.EventHandler { return requests }) } - -func (r *LokiStackReconciler) enqueueForCredentialsRequest() handler.EventHandler { - return handler.EnqueueRequestsFromMapFunc(func(ctx context.Context, obj client.Object) []reconcile.Request { - a := obj.GetAnnotations() - owner, ok := a[manifestsocp.AnnotationCredentialsRequestOwner] - if !ok { - return nil - } - - var ( - ownerParts = strings.Split(owner, "/") - namespace = ownerParts[0] - name = ownerParts[1] - key = client.ObjectKey{Namespace: namespace, Name: name} - ) - - var stack lokiv1.LokiStack - if err := r.Client.Get(ctx, key, &stack); err != nil { - if !apierrors.IsNotFound(err) { - r.Log.Error(err, "failed retrieving CredentialsRequest owning Lokistack", "key", key) - } - return nil - } - - return []reconcile.Request{ - { - NamespacedName: key, - }, - } - }) -} diff --git a/operator/controllers/loki/lokistack_controller_test.go b/operator/controllers/loki/lokistack_controller_test.go index 515d829766aa1..6be22022c19db 100644 --- a/operator/controllers/loki/lokistack_controller_test.go +++ b/operator/controllers/loki/lokistack_controller_test.go @@ -161,7 +161,18 @@ func TestLokiStackController_RegisterOwnedResourcesForUpdateOrDeleteOnly(t *test { obj: &routev1.Route{}, index: 10, - ownCallsCount: 11, + ownCallsCount: 12, + featureGates: configv1.FeatureGates{ + OpenShift: configv1.OpenShiftFeatureGates{ + Enabled: true, + }, + }, + pred: updateOrDeleteOnlyPred, + }, + { + obj: &cloudcredentialv1.CredentialsRequest{}, + index: 11, + ownCallsCount: 12, featureGates: configv1.FeatureGates{ OpenShift: configv1.OpenShiftFeatureGates{ Enabled: true, @@ -203,20 +214,9 @@ func TestLokiStackController_RegisterWatchedResources(t *testing.T) { } table := []test{ { - src: &cloudcredentialv1.CredentialsRequest{}, + src: &openshiftconfigv1.APIServer{}, index: 3, watchesCallsCount: 4, - featureGates: configv1.FeatureGates{ - OpenShift: configv1.OpenShiftFeatureGates{ - Enabled: true, - }, - }, - pred: updateOrDeleteOnlyPred, - }, - { - src: &openshiftconfigv1.APIServer{}, - index: 4, - watchesCallsCount: 5, featureGates: configv1.FeatureGates{ OpenShift: configv1.OpenShiftFeatureGates{ Enabled: true, @@ -227,8 +227,8 @@ func TestLokiStackController_RegisterWatchedResources(t *testing.T) { }, { src: &openshiftconfigv1.Proxy{}, - index: 4, - watchesCallsCount: 5, + index: 3, + watchesCallsCount: 4, featureGates: configv1.FeatureGates{ OpenShift: configv1.OpenShiftFeatureGates{ Enabled: true, diff --git a/operator/docs/operator/api.md b/operator/docs/operator/api.md index 92f93dd970224..48fbe0c8a7e48 100644 --- a/operator/docs/operator/api.md +++ b/operator/docs/operator/api.md @@ -1100,6 +1100,40 @@ string +## CredentialMode { #loki-grafana-com-v1-CredentialMode } +(string alias) +

+(Appears on:LokiStackStorageStatus) +

+
+

CredentialMode represents the type of authentication used for accessing the object storage.

+
+ + + + + + + + + + + + + + +
ValueDescription

"managed"

CredentialModeManaged represents the usage of short-lived tokens retrieved from a credential source. +This mode is similar to CredentialModeToken,but instead of having a user-configured credential source, +it is configured by the environment, for example the Cloud Credential Operator in OpenShift. +This mode is only supported for certain object storage types in certain runtime environments.

+

"static"

CredentialModeStatic represents the usage of static, long-lived credentials stored in a Secret. +This is the default authentication mode and available for all supported object storage types.

+

"token"

CredentialModeToken represents the usage of short-lived tokens retrieved from a credential source. +In this mode the static configuration does not contain credentials needed for the object storage. +Instead, they are generated during runtime using a service, which allows for shorter-lived credentials and +much more granular control. This authentication mode is not supported for all object storage types.

+
+ ## HashRingSpec { #loki-grafana-com-v1-HashRingSpec }

(Appears on:LokiStackSpec) @@ -2152,6 +2186,20 @@ the Loki storage configuration.

to the LokiStack.

+ + +credentialMode
+ + +CredentialMode + + + + +(Optional) +

CredentialMode contains the authentication mode used for accessing the object storage.

+ + diff --git a/operator/docs/operator/feature-gates.md b/operator/docs/operator/feature-gates.md index 34fbdf4b69a4d..189b72e4ddb12 100644 --- a/operator/docs/operator/feature-gates.md +++ b/operator/docs/operator/feature-gates.md @@ -417,7 +417,8 @@ bool -

ManagedAuthEnv enabled when the operator installation is on OpenShift STS clusters.

+

ManagedAuthEnv is true when OpenShift-functions are enabled and the operator has detected +that it is running with some kind of “workload identity” (AWS STS, Azure WIF) enabled.

diff --git a/operator/internal/config/managed_auth.go b/operator/internal/config/managed_auth.go new file mode 100644 index 0000000000000..73598e7032f8f --- /dev/null +++ b/operator/internal/config/managed_auth.go @@ -0,0 +1,48 @@ +package config + +import "os" + +type AWSEnvironment struct { + RoleARN string +} + +type AzureEnvironment struct { + ClientID string + SubscriptionID string + TenantID string + Region string +} + +type ManagedAuthConfig struct { + AWS *AWSEnvironment + Azure *AzureEnvironment +} + +func discoverManagedAuthConfig() *ManagedAuthConfig { + // AWS + roleARN := os.Getenv("ROLEARN") + + // Azure + clientID := os.Getenv("CLIENTID") + tenantID := os.Getenv("TENANTID") + subscriptionID := os.Getenv("SUBSCRIPTIONID") + + switch { + case roleARN != "": + return &ManagedAuthConfig{ + AWS: &AWSEnvironment{ + RoleARN: roleARN, + }, + } + case clientID != "" && tenantID != "" && subscriptionID != "": + return &ManagedAuthConfig{ + Azure: &AzureEnvironment{ + ClientID: clientID, + SubscriptionID: subscriptionID, + TenantID: tenantID, + }, + } + } + + return nil +} diff --git a/operator/internal/config/options.go b/operator/internal/config/options.go index 7ed9abb526a7b..dc54404f22450 100644 --- a/operator/internal/config/options.go +++ b/operator/internal/config/options.go @@ -17,19 +17,24 @@ import ( // LoadConfig initializes the controller configuration, optionally overriding the defaults // from a provided configuration file. -func LoadConfig(scheme *runtime.Scheme, configFile string) (*configv1.ProjectConfig, ctrl.Options, error) { +func LoadConfig(scheme *runtime.Scheme, configFile string) (*configv1.ProjectConfig, *ManagedAuthConfig, ctrl.Options, error) { options := ctrl.Options{Scheme: scheme} if configFile == "" { - return &configv1.ProjectConfig{}, options, nil + return &configv1.ProjectConfig{}, nil, options, nil } ctrlCfg, err := loadConfigFile(scheme, configFile) if err != nil { - return nil, options, fmt.Errorf("failed to parse controller manager config file: %w", err) + return nil, nil, options, fmt.Errorf("failed to parse controller manager config file: %w", err) + } + + managedAuth := discoverManagedAuthConfig() + if ctrlCfg.Gates.OpenShift.Enabled && managedAuth != nil { + ctrlCfg.Gates.OpenShift.ManagedAuthEnv = true } options = mergeOptionsFromFile(options, ctrlCfg) - return ctrlCfg, options, nil + return ctrlCfg, managedAuth, options, nil } func mergeOptionsFromFile(o manager.Options, cfg *configv1.ProjectConfig) manager.Options { diff --git a/operator/internal/handlers/credentialsrequest_create.go b/operator/internal/handlers/credentialsrequest_create.go index 6074e10b2d5af..50e06375ffd8b 100644 --- a/operator/internal/handlers/credentialsrequest_create.go +++ b/operator/internal/handlers/credentialsrequest_create.go @@ -3,64 +3,102 @@ package handlers import ( "context" "errors" + "fmt" "github.com/ViaQ/logerr/v2/kverrors" + "github.com/go-logr/logr" corev1 "k8s.io/api/core/v1" apierrors "k8s.io/apimachinery/pkg/api/errors" + "k8s.io/apimachinery/pkg/runtime" + ctrl "sigs.k8s.io/controller-runtime" "sigs.k8s.io/controller-runtime/pkg/client" + ctrlutil "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil" + lokiv1 "github.com/grafana/loki/operator/apis/loki/v1" + "github.com/grafana/loki/operator/internal/config" "github.com/grafana/loki/operator/internal/external/k8s" + "github.com/grafana/loki/operator/internal/manifests" "github.com/grafana/loki/operator/internal/manifests/openshift" "github.com/grafana/loki/operator/internal/manifests/storage" ) -var ( - errAzureNoSecretFound = errors.New("can not create CredentialsRequest: no azure secret found") - errAzureNoRegion = errors.New("can not create CredentialsRequest: missing secret field: region") -) +var errAzureNoRegion = errors.New("can not create CredentialsRequest: missing secret field: region") // CreateCredentialsRequest creates a new CredentialsRequest resource for a Lokistack // to request a cloud credentials Secret resource from the OpenShift cloud-credentials-operator. -func CreateCredentialsRequest(ctx context.Context, k k8s.Client, stack client.ObjectKey, secret *corev1.Secret) (string, error) { - managedAuthEnv := openshift.DiscoverManagedAuthEnv() - if managedAuthEnv == nil { - return "", nil +func CreateCredentialsRequest(ctx context.Context, log logr.Logger, scheme *runtime.Scheme, managedAuth *config.ManagedAuthConfig, k k8s.Client, req ctrl.Request) error { + ll := log.WithValues("lokistack", req.NamespacedName, "event", "createCredentialsRequest") + + var stack lokiv1.LokiStack + if err := k.Get(ctx, req.NamespacedName, &stack); err != nil { + if apierrors.IsNotFound(err) { + // maybe the user deleted it before we could react? Either way this isn't an issue + ll.Error(err, "could not find the requested LokiStack", "name", req.String()) + return nil + } + return kverrors.Wrap(err, "failed to lookup LokiStack", "name", req.String()) } - if managedAuthEnv.Azure != nil && managedAuthEnv.Azure.Region == "" { + if managedAuth.Azure != nil && managedAuth.Azure.Region == "" { // Managed environment for Azure does not provide Region, but we need this for the CredentialsRequest. // This looks like an oversight when creating the UI in OpenShift, but for now we need to pull this data // from somewhere else -> the Azure Storage Secret - if secret == nil { - return "", errAzureNoSecretFound + storageSecretName := client.ObjectKey{ + Namespace: stack.Namespace, + Name: stack.Spec.Storage.Secret.Name, + } + storageSecret := &corev1.Secret{} + if err := k.Get(ctx, storageSecretName, storageSecret); err != nil { + if apierrors.IsNotFound(err) { + // Skip this error here as it will be picked up by the LokiStack handler instead + ll.Error(err, "could not find secret for LokiStack", "name", req.String()) + return nil + } + return err } - region := secret.Data[storage.KeyAzureRegion] + region := storageSecret.Data[storage.KeyAzureRegion] if len(region) == 0 { - return "", errAzureNoRegion + return errAzureNoRegion } - managedAuthEnv.Azure.Region = string(region) + managedAuth.Azure.Region = string(region) } opts := openshift.Options{ BuildOpts: openshift.BuildOptions{ LokiStackName: stack.Name, LokiStackNamespace: stack.Namespace, + RulerName: manifests.RulerName(stack.Name), }, - ManagedAuthEnv: managedAuthEnv, + ManagedAuth: managedAuth, } credReq, err := openshift.BuildCredentialsRequest(opts) if err != nil { - return "", err + return err } - if err := k.Create(ctx, credReq); err != nil { - if !apierrors.IsAlreadyExists(err) { - return "", kverrors.Wrap(err, "failed to create credentialsrequest", "key", client.ObjectKeyFromObject(credReq)) - } + err = ctrl.SetControllerReference(&stack, credReq, scheme) + if err != nil { + return kverrors.Wrap(err, "failed to set controller owner reference to resource") + } + + desired := credReq.DeepCopyObject().(client.Object) + mutateFn := manifests.MutateFuncFor(credReq, desired, map[string]string{}) + + op, err := ctrl.CreateOrUpdate(ctx, k, credReq, mutateFn) + if err != nil { + return kverrors.Wrap(err, "failed to configure CredentialRequest") + } + + msg := fmt.Sprintf("Resource has been %s", op) + switch op { + case ctrlutil.OperationResultNone: + ll.V(1).Info(msg) + default: + ll.Info(msg) } - return credReq.Spec.SecretRef.Name, nil + return nil } diff --git a/operator/internal/handlers/credentialsrequest_create_test.go b/operator/internal/handlers/credentialsrequest_create_test.go index df903eaec662f..626302a113274 100644 --- a/operator/internal/handlers/credentialsrequest_create_test.go +++ b/operator/internal/handlers/credentialsrequest_create_test.go @@ -8,51 +8,108 @@ import ( "github.com/stretchr/testify/require" corev1 "k8s.io/api/core/v1" "k8s.io/apimachinery/pkg/api/errors" + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" "k8s.io/apimachinery/pkg/runtime/schema" + "k8s.io/apimachinery/pkg/types" + ctrl "sigs.k8s.io/controller-runtime" "sigs.k8s.io/controller-runtime/pkg/client" + lokiv1 "github.com/grafana/loki/operator/apis/loki/v1" + "github.com/grafana/loki/operator/internal/config" "github.com/grafana/loki/operator/internal/external/k8s/k8sfakes" ) -func TestCreateCredentialsRequest_DoNothing_WhenManagedAuthEnvMissing(t *testing.T) { +func credentialsRequestFakeClient(cr *cloudcredentialv1.CredentialsRequest, lokistack *lokiv1.LokiStack, secret *corev1.Secret) *k8sfakes.FakeClient { k := &k8sfakes.FakeClient{} - key := client.ObjectKey{Name: "my-stack", Namespace: "ns"} + k.GetStub = func(_ context.Context, name types.NamespacedName, object client.Object, _ ...client.GetOption) error { + switch object.(type) { + case *cloudcredentialv1.CredentialsRequest: + if cr == nil { + return errors.NewNotFound(schema.GroupResource{}, name.Name) + } + k.SetClientObject(object, cr) + case *lokiv1.LokiStack: + if lokistack == nil { + return errors.NewNotFound(schema.GroupResource{}, name.Name) + } + k.SetClientObject(object, lokistack) + case *corev1.Secret: + if secret == nil { + return errors.NewNotFound(schema.GroupResource{}, name.Name) + } + k.SetClientObject(object, secret) + } + return nil + } - secretRef, err := CreateCredentialsRequest(context.Background(), k, key, nil) - require.NoError(t, err) - require.Empty(t, secretRef) + return k } func TestCreateCredentialsRequest_CreateNewResource(t *testing.T) { - k := &k8sfakes.FakeClient{} - key := client.ObjectKey{Name: "my-stack", Namespace: "ns"} + wantServiceAccountNames := []string{ + "my-stack", + "my-stack-ruler", + } + + lokistack := &lokiv1.LokiStack{ + ObjectMeta: metav1.ObjectMeta{ + Name: "my-stack", + Namespace: "ns", + }, + } - t.Setenv("ROLEARN", "a-role-arn") + k := credentialsRequestFakeClient(nil, lokistack, nil) + req := ctrl.Request{ + NamespacedName: client.ObjectKey{Name: "my-stack", Namespace: "ns"}, + } + + managedAuth := &config.ManagedAuthConfig{ + AWS: &config.AWSEnvironment{ + RoleARN: "a-role-arn", + }, + } - secretRef, err := CreateCredentialsRequest(context.Background(), k, key, nil) + err := CreateCredentialsRequest(context.Background(), logger, scheme, managedAuth, k, req) require.NoError(t, err) - require.NotEmpty(t, secretRef) require.Equal(t, 1, k.CreateCallCount()) + + _, obj, _ := k.CreateArgsForCall(0) + credReq, ok := obj.(*cloudcredentialv1.CredentialsRequest) + require.True(t, ok) + + require.Equal(t, wantServiceAccountNames, credReq.Spec.ServiceAccountNames) } func TestCreateCredentialsRequest_CreateNewResourceAzure(t *testing.T) { wantRegion := "test-region" - k := &k8sfakes.FakeClient{} - key := client.ObjectKey{Name: "my-stack", Namespace: "ns"} + lokistack := &lokiv1.LokiStack{ + ObjectMeta: metav1.ObjectMeta{ + Name: "my-stack", + Namespace: "ns", + }, + } secret := &corev1.Secret{ Data: map[string][]byte{ "region": []byte(wantRegion), }, } - t.Setenv("CLIENTID", "test-client-id") - t.Setenv("TENANTID", "test-tenant-id") - t.Setenv("SUBSCRIPTIONID", "test-subscription-id") + k := credentialsRequestFakeClient(nil, lokistack, secret) + req := ctrl.Request{ + NamespacedName: client.ObjectKey{Name: "my-stack", Namespace: "ns"}, + } - secretRef, err := CreateCredentialsRequest(context.Background(), k, key, secret) + managedAuth := &config.ManagedAuthConfig{ + Azure: &config.AzureEnvironment{ + ClientID: "test-client-id", + SubscriptionID: "test-tenant-id", + TenantID: "test-subscription-id", + }, + } + + err := CreateCredentialsRequest(context.Background(), logger, scheme, managedAuth, k, req) require.NoError(t, err) - require.NotEmpty(t, secretRef) require.Equal(t, 1, k.CreateCallCount()) _, obj, _ := k.CreateArgsForCall(0) @@ -66,17 +123,20 @@ func TestCreateCredentialsRequest_CreateNewResourceAzure(t *testing.T) { } func TestCreateCredentialsRequest_CreateNewResourceAzure_Errors(t *testing.T) { - k := &k8sfakes.FakeClient{} - key := client.ObjectKey{Name: "my-stack", Namespace: "ns"} + lokistack := &lokiv1.LokiStack{ + ObjectMeta: metav1.ObjectMeta{ + Name: "my-stack", + Namespace: "ns", + }, + } + req := ctrl.Request{ + NamespacedName: client.ObjectKey{Name: "my-stack", Namespace: "ns"}, + } tt := []struct { secret *corev1.Secret wantError string }{ - { - secret: nil, - wantError: errAzureNoSecretFound.Error(), - }, { secret: &corev1.Secret{}, wantError: errAzureNoRegion.Error(), @@ -86,29 +146,52 @@ func TestCreateCredentialsRequest_CreateNewResourceAzure_Errors(t *testing.T) { for _, tc := range tt { tc := tc t.Run(tc.wantError, func(t *testing.T) { - // Not parallel (environment variables) - t.Setenv("CLIENTID", "test-client-id") - t.Setenv("TENANTID", "test-tenant-id") - t.Setenv("SUBSCRIPTIONID", "test-subscription-id") - - _, err := CreateCredentialsRequest(context.Background(), k, key, tc.secret) + t.Parallel() + + managedAuth := &config.ManagedAuthConfig{ + Azure: &config.AzureEnvironment{ + ClientID: "test-client-id", + SubscriptionID: "test-tenant-id", + TenantID: "test-subscription-id", + }, + } + k := credentialsRequestFakeClient(nil, lokistack, tc.secret) + + err := CreateCredentialsRequest(context.Background(), logger, scheme, managedAuth, k, req) require.EqualError(t, err, tc.wantError) }) } } func TestCreateCredentialsRequest_DoNothing_WhenCredentialsRequestExist(t *testing.T) { - k := &k8sfakes.FakeClient{} - key := client.ObjectKey{Name: "my-stack", Namespace: "ns"} + req := ctrl.Request{ + NamespacedName: client.ObjectKey{Name: "my-stack", Namespace: "ns"}, + } - t.Setenv("ROLEARN", "a-role-arn") + managedAuth := &config.ManagedAuthConfig{ + AWS: &config.AWSEnvironment{ + RoleARN: "a-role-arn", + }, + } - k.CreateStub = func(_ context.Context, _ client.Object, _ ...client.CreateOption) error { - return errors.NewAlreadyExists(schema.GroupResource{}, "credentialsrequest exists") + cr := &cloudcredentialv1.CredentialsRequest{ + ObjectMeta: metav1.ObjectMeta{ + Name: "my-stack", + Namespace: "ns", + }, + } + lokistack := &lokiv1.LokiStack{ + ObjectMeta: metav1.ObjectMeta{ + Name: "my-stack", + Namespace: "ns", + }, } - secretRef, err := CreateCredentialsRequest(context.Background(), k, key, nil) + k := credentialsRequestFakeClient(cr, lokistack, nil) + + err := CreateCredentialsRequest(context.Background(), logger, scheme, managedAuth, k, req) require.NoError(t, err) - require.NotEmpty(t, secretRef) - require.Equal(t, 1, k.CreateCallCount()) + require.Equal(t, 2, k.GetCallCount()) + require.Equal(t, 0, k.CreateCallCount()) + require.Equal(t, 1, k.UpdateCallCount()) } diff --git a/operator/internal/handlers/credentialsrequest_delete.go b/operator/internal/handlers/credentialsrequest_delete.go deleted file mode 100644 index edf05fcb205d0..0000000000000 --- a/operator/internal/handlers/credentialsrequest_delete.go +++ /dev/null @@ -1,43 +0,0 @@ -package handlers - -import ( - "context" - - "github.com/ViaQ/logerr/v2/kverrors" - "k8s.io/apimachinery/pkg/api/errors" - "sigs.k8s.io/controller-runtime/pkg/client" - - "github.com/grafana/loki/operator/internal/external/k8s" - "github.com/grafana/loki/operator/internal/manifests/openshift" -) - -// DeleteCredentialsRequest deletes a LokiStack's accompanying CredentialsRequest resource -// to trigger the OpenShift cloud-credentials-operator to wipe out any credentials related -// Secret resource on the LokiStack namespace. -func DeleteCredentialsRequest(ctx context.Context, k k8s.Client, stack client.ObjectKey) error { - managedAuthEnv := openshift.DiscoverManagedAuthEnv() - if managedAuthEnv == nil { - return nil - } - - opts := openshift.Options{ - BuildOpts: openshift.BuildOptions{ - LokiStackName: stack.Name, - LokiStackNamespace: stack.Namespace, - }, - ManagedAuthEnv: managedAuthEnv, - } - - credReq, err := openshift.BuildCredentialsRequest(opts) - if err != nil { - return kverrors.Wrap(err, "failed to build credentialsrequest", "key", stack) - } - - if err := k.Delete(ctx, credReq); err != nil { - if !errors.IsNotFound(err) { - return kverrors.Wrap(err, "failed to delete credentialsrequest", "key", client.ObjectKeyFromObject(credReq)) - } - } - - return nil -} diff --git a/operator/internal/handlers/credentialsrequest_delete_test.go b/operator/internal/handlers/credentialsrequest_delete_test.go deleted file mode 100644 index 57f1c005ee706..0000000000000 --- a/operator/internal/handlers/credentialsrequest_delete_test.go +++ /dev/null @@ -1,47 +0,0 @@ -package handlers - -import ( - "context" - "testing" - - "github.com/stretchr/testify/require" - "k8s.io/apimachinery/pkg/api/errors" - "k8s.io/apimachinery/pkg/runtime/schema" - "sigs.k8s.io/controller-runtime/pkg/client" - - "github.com/grafana/loki/operator/internal/external/k8s/k8sfakes" -) - -func TestDeleteCredentialsRequest_DoNothing_WhenManagedAuthEnvMissing(t *testing.T) { - k := &k8sfakes.FakeClient{} - key := client.ObjectKey{Name: "my-stack", Namespace: "ns"} - - err := DeleteCredentialsRequest(context.Background(), k, key) - require.NoError(t, err) -} - -func TestDeleteCredentialsRequest_DeleteExistingResource(t *testing.T) { - k := &k8sfakes.FakeClient{} - key := client.ObjectKey{Name: "my-stack", Namespace: "ns"} - - t.Setenv("ROLEARN", "a-role-arn") - - err := DeleteCredentialsRequest(context.Background(), k, key) - require.NoError(t, err) - require.Equal(t, 1, k.DeleteCallCount()) -} - -func TestDeleteCredentialsRequest_DoNothing_WhenCredentialsRequestNotExists(t *testing.T) { - k := &k8sfakes.FakeClient{} - key := client.ObjectKey{Name: "my-stack", Namespace: "ns"} - - t.Setenv("ROLEARN", "a-role-arn") - - k.DeleteStub = func(_ context.Context, _ client.Object, _ ...client.DeleteOption) error { - return errors.NewNotFound(schema.GroupResource{}, "credentials request not found") - } - - err := DeleteCredentialsRequest(context.Background(), k, key) - require.NoError(t, err) - require.Equal(t, 1, k.DeleteCallCount()) -} diff --git a/operator/internal/handlers/internal/storage/secrets.go b/operator/internal/handlers/internal/storage/secrets.go index 21cd58b7c3c25..99bafb911ec26 100644 --- a/operator/internal/handlers/internal/storage/secrets.go +++ b/operator/internal/handlers/internal/storage/secrets.go @@ -59,15 +59,7 @@ func getSecrets(ctx context.Context, k k8s.Client, stack *lokiv1.LokiStack, fg c } if fg.OpenShift.ManagedAuthEnv { - secretName, ok := stack.Annotations[storage.AnnotationCredentialsRequestsSecretRef] - if !ok { - return nil, nil, &status.DegradedError{ - Message: "Missing OpenShift cloud credentials request", - Reason: lokiv1.ReasonMissingCredentialsRequest, - Requeue: true, - } - } - + secretName := storage.ManagedCredentialsSecretName(stack.Name) managedAuthCredsKey := client.ObjectKey{Name: secretName, Namespace: stack.Namespace} if err := k.Get(ctx, managedAuthCredsKey, &managedAuthSecret); err != nil { if apierrors.IsNotFound(err) { @@ -100,7 +92,7 @@ func extractSecrets(secretType lokiv1.ObjectStorageSecretType, objStore, managed SharedStore: secretType, } - if fg.OpenShift.ManagedAuthEnabled() { + if fg.OpenShift.ManagedAuthEnv { var managedAuthHash string managedAuthHash, err = hashSecretData(managedAuth) if err != nil { @@ -190,11 +182,18 @@ func extractAzureConfigSecret(s *corev1.Secret, fg configv1.FeatureGates) (*stor // Extract and validate optional fields endpointSuffix := s.Data[storage.KeyAzureStorageEndpointSuffix] audience := s.Data[storage.KeyAzureAudience] + region := s.Data[storage.KeyAzureRegion] if !workloadIdentity && len(audience) > 0 { return nil, fmt.Errorf("%w: %s", errSecretFieldNotAllowed, storage.KeyAzureAudience) } + if fg.OpenShift.ManagedAuthEnv { + if len(region) == 0 { + return nil, fmt.Errorf("%w: %s", errSecretMissingField, storage.KeyAzureRegion) + } + } + return &storage.AzureStorageConfig{ Env: string(env), Container: string(container), @@ -210,12 +209,7 @@ func validateAzureCredentials(s *corev1.Secret, fg configv1.FeatureGates) (workl tenantID := s.Data[storage.KeyAzureStorageTenantID] subscriptionID := s.Data[storage.KeyAzureStorageSubscriptionID] - if fg.OpenShift.ManagedAuthEnabled() { - region := s.Data[storage.KeyAzureRegion] - if len(region) == 0 { - return false, fmt.Errorf("%w: %s", errSecretMissingField, storage.KeyAzureRegion) - } - + if fg.OpenShift.ManagedAuthEnv { if len(accountKey) > 0 || len(clientID) > 0 || len(tenantID) > 0 || len(subscriptionID) > 0 { return false, errAzureManagedIdentityNoOverride } @@ -282,8 +276,8 @@ func extractGCSConfigSecret(s *corev1.Secret) (*storage.GCSStorageConfig, error) return nil, fmt.Errorf("%w: %s", errSecretMissingField, storage.KeyGCPWorkloadIdentityProviderAudience) } - if credentialsFile.CredentialsSource.File != storage.GCPDefautCredentialsFile { - return nil, fmt.Errorf("%w: %s", errGCPWrongCredentialSourceFile, storage.GCPDefautCredentialsFile) + if credentialsFile.CredentialsSource.File != storage.ServiceAccountTokenFilePath { + return nil, fmt.Errorf("%w: %s", errGCPWrongCredentialSourceFile, storage.ServiceAccountTokenFilePath) } } @@ -330,7 +324,7 @@ func extractS3ConfigSecret(s *corev1.Secret, fg configv1.FeatureGates) (*storage ) switch { - case fg.OpenShift.ManagedAuthEnabled(): + case fg.OpenShift.ManagedAuthEnv: cfg.STS = true cfg.Audience = string(audience) // Do not allow users overriding the role arn provided on Loki Operator installation diff --git a/operator/internal/handlers/internal/storage/secrets_test.go b/operator/internal/handlers/internal/storage/secrets_test.go index cc18360232315..1363cd4a660a6 100644 --- a/operator/internal/handlers/internal/storage/secrets_test.go +++ b/operator/internal/handlers/internal/storage/secrets_test.go @@ -71,11 +71,12 @@ func TestUnknownType(t *testing.T) { func TestAzureExtract(t *testing.T) { type test struct { - name string - secret *corev1.Secret - managedSecret *corev1.Secret - featureGates configv1.FeatureGates - wantError string + name string + secret *corev1.Secret + managedSecret *corev1.Secret + featureGates configv1.FeatureGates + wantError string + wantCredentialMode lokiv1.CredentialMode } table := []test{ { @@ -224,6 +225,7 @@ func TestAzureExtract(t *testing.T) { "account_key": []byte("secret"), }, }, + wantCredentialMode: lokiv1.CredentialModeStatic, }, { name: "mandatory for workload-identity set", @@ -239,6 +241,7 @@ func TestAzureExtract(t *testing.T) { "region": []byte("test-region"), }, }, + wantCredentialMode: lokiv1.CredentialModeToken, }, { name: "mandatory for managed workload-identity set", @@ -252,7 +255,14 @@ func TestAzureExtract(t *testing.T) { }, }, managedSecret: &corev1.Secret{ - Data: map[string][]byte{}, + ObjectMeta: metav1.ObjectMeta{ + Name: "managed-secret", + }, + Data: map[string][]byte{ + "azure_client_id": []byte("test-client-id"), + "azure_tenant_id": []byte("test-tenant-id"), + "azure_subscription_id": []byte("test-subscription-id"), + }, }, featureGates: configv1.FeatureGates{ OpenShift: configv1.OpenShiftFeatureGates{ @@ -260,6 +270,7 @@ func TestAzureExtract(t *testing.T) { ManagedAuthEnv: true, }, }, + wantCredentialMode: lokiv1.CredentialModeManaged, }, { name: "all set including optional", @@ -273,6 +284,7 @@ func TestAzureExtract(t *testing.T) { "endpoint_suffix": []byte("suffix"), }, }, + wantCredentialMode: lokiv1.CredentialModeStatic, }, } for _, tst := range table { @@ -285,7 +297,8 @@ func TestAzureExtract(t *testing.T) { require.NoError(t, err) require.NotEmpty(t, opts.SecretName) require.NotEmpty(t, opts.SecretSHA1) - require.Equal(t, opts.SharedStore, lokiv1.ObjectStorageSecretAzure) + require.Equal(t, lokiv1.ObjectStorageSecretAzure, opts.SharedStore) + require.Equal(t, tst.wantCredentialMode, opts.CredentialMode()) } else { require.EqualError(t, err, tst.wantError) } @@ -295,9 +308,10 @@ func TestAzureExtract(t *testing.T) { func TestGCSExtract(t *testing.T) { type test struct { - name string - secret *corev1.Secret - wantError string + name string + secret *corev1.Secret + wantError string + wantCredentialMode lokiv1.CredentialMode } table := []test{ { @@ -332,10 +346,10 @@ func TestGCSExtract(t *testing.T) { Data: map[string][]byte{ "bucketname": []byte("here"), "audience": []byte("test"), - "key.json": []byte("{\"type\": \"external_account\", \"credential_source\": {\"file\": \"/custom/path/to/secret/gcp/serviceaccount/token\"}}"), + "key.json": []byte("{\"type\": \"external_account\", \"credential_source\": {\"file\": \"/custom/path/to/secret/storage/serviceaccount/token\"}}"), }, }, - wantError: "credential source in secret needs to point to token file: /var/run/secrets/gcp/serviceaccount/token", + wantError: "credential source in secret needs to point to token file: /var/run/secrets/storage/serviceaccount/token", }, { name: "all set", @@ -346,6 +360,7 @@ func TestGCSExtract(t *testing.T) { "key.json": []byte("{\"type\": \"service_account\"}"), }, }, + wantCredentialMode: lokiv1.CredentialModeStatic, }, { name: "mandatory for workload-identity set", @@ -354,9 +369,10 @@ func TestGCSExtract(t *testing.T) { Data: map[string][]byte{ "bucketname": []byte("here"), "audience": []byte("test"), - "key.json": []byte("{\"type\": \"external_account\", \"credential_source\": {\"file\": \"/var/run/secrets/gcp/serviceaccount/token\"}}"), + "key.json": []byte("{\"type\": \"external_account\", \"credential_source\": {\"file\": \"/var/run/secrets/storage/serviceaccount/token\"}}"), }, }, + wantCredentialMode: lokiv1.CredentialModeToken, }, } for _, tst := range table { @@ -364,9 +380,10 @@ func TestGCSExtract(t *testing.T) { t.Run(tst.name, func(t *testing.T) { t.Parallel() - _, err := extractSecrets(lokiv1.ObjectStorageSecretGCS, tst.secret, nil, configv1.FeatureGates{}) + opts, err := extractSecrets(lokiv1.ObjectStorageSecretGCS, tst.secret, nil, configv1.FeatureGates{}) if tst.wantError == "" { require.NoError(t, err) + require.Equal(t, tst.wantCredentialMode, opts.CredentialMode()) } else { require.EqualError(t, err, tst.wantError) } @@ -376,9 +393,10 @@ func TestGCSExtract(t *testing.T) { func TestS3Extract(t *testing.T) { type test struct { - name string - secret *corev1.Secret - wantError string + name string + secret *corev1.Secret + wantError string + wantCredentialMode lokiv1.CredentialMode } table := []test{ { @@ -456,6 +474,7 @@ func TestS3Extract(t *testing.T) { "sse_kms_key_id": []byte("kms-key-id"), }, }, + wantCredentialMode: lokiv1.CredentialModeStatic, }, { name: "all set with SSE-KMS with encryption context", @@ -471,6 +490,7 @@ func TestS3Extract(t *testing.T) { "sse_kms_encryption_context": []byte("kms-encryption-ctx"), }, }, + wantCredentialMode: lokiv1.CredentialModeStatic, }, { name: "all set with SSE-S3", @@ -484,6 +504,7 @@ func TestS3Extract(t *testing.T) { "sse_type": []byte("SSE-S3"), }, }, + wantCredentialMode: lokiv1.CredentialModeStatic, }, { name: "all set without SSE", @@ -496,6 +517,7 @@ func TestS3Extract(t *testing.T) { "access_key_secret": []byte("secret"), }, }, + wantCredentialMode: lokiv1.CredentialModeStatic, }, { name: "STS missing region", @@ -518,6 +540,7 @@ func TestS3Extract(t *testing.T) { "region": []byte("here"), }, }, + wantCredentialMode: lokiv1.CredentialModeToken, }, { name: "STS all set", @@ -530,6 +553,7 @@ func TestS3Extract(t *testing.T) { "audience": []byte("audience"), }, }, + wantCredentialMode: lokiv1.CredentialModeToken, }, } for _, tst := range table { @@ -542,7 +566,8 @@ func TestS3Extract(t *testing.T) { require.NoError(t, err) require.NotEmpty(t, opts.SecretName) require.NotEmpty(t, opts.SecretSHA1) - require.Equal(t, opts.SharedStore, lokiv1.ObjectStorageSecretS3) + require.Equal(t, lokiv1.ObjectStorageSecretS3, opts.SharedStore) + require.Equal(t, tst.wantCredentialMode, opts.CredentialMode()) } else { require.EqualError(t, err, tst.wantError) } @@ -616,10 +641,11 @@ func TestS3Extract_WithOpenShiftManagedAuth(t *testing.T) { require.NoError(t, err) require.NotEmpty(t, opts.SecretName) require.NotEmpty(t, opts.SecretSHA1) - require.Equal(t, opts.SharedStore, lokiv1.ObjectStorageSecretS3) + require.Equal(t, lokiv1.ObjectStorageSecretS3, opts.SharedStore) require.True(t, opts.S3.STS) - require.Equal(t, opts.OpenShift.CloudCredentials.SecretName, tst.managedAuthSecret.Name) + require.Equal(t, tst.managedAuthSecret.Name, opts.OpenShift.CloudCredentials.SecretName) require.NotEmpty(t, opts.OpenShift.CloudCredentials.SHA1) + require.Equal(t, lokiv1.CredentialModeManaged, opts.CredentialMode()) } else { require.EqualError(t, err, tst.wantError) } @@ -767,7 +793,8 @@ func TestSwiftExtract(t *testing.T) { require.NoError(t, err) require.NotEmpty(t, opts.SecretName) require.NotEmpty(t, opts.SecretSHA1) - require.Equal(t, opts.SharedStore, lokiv1.ObjectStorageSecretSwift) + require.Equal(t, lokiv1.ObjectStorageSecretSwift, opts.SharedStore) + require.Equal(t, lokiv1.CredentialModeStatic, opts.CredentialMode()) } else { require.EqualError(t, err, tst.wantError) } @@ -840,7 +867,8 @@ func TestAlibabaCloudExtract(t *testing.T) { require.NoError(t, err) require.NotEmpty(t, opts.SecretName) require.NotEmpty(t, opts.SecretSHA1) - require.Equal(t, opts.SharedStore, lokiv1.ObjectStorageSecretAlibabaCloud) + require.Equal(t, lokiv1.ObjectStorageSecretAlibabaCloud, opts.SharedStore) + require.Equal(t, lokiv1.CredentialModeStatic, opts.CredentialMode()) } else { require.EqualError(t, err, tst.wantError) } diff --git a/operator/internal/handlers/internal/storage/storage_test.go b/operator/internal/handlers/internal/storage/storage_test.go index 9e041bf99a23a..45f5b0f2865ba 100644 --- a/operator/internal/handlers/internal/storage/storage_test.go +++ b/operator/internal/handlers/internal/storage/storage_test.go @@ -17,7 +17,6 @@ import ( configv1 "github.com/grafana/loki/operator/apis/config/v1" lokiv1 "github.com/grafana/loki/operator/apis/loki/v1" "github.com/grafana/loki/operator/internal/external/k8s/k8sfakes" - "github.com/grafana/loki/operator/internal/manifests/storage" "github.com/grafana/loki/operator/internal/status" ) @@ -135,77 +134,6 @@ func TestBuildOptions_WhenMissingSecret_SetDegraded(t *testing.T) { require.Equal(t, degradedErr, err) } -func TestBuildOptions_WhenMissingCloudCredentialsRequest_SetDegraded(t *testing.T) { - sw := &k8sfakes.FakeStatusWriter{} - k := &k8sfakes.FakeClient{} - r := ctrl.Request{ - NamespacedName: types.NamespacedName{ - Name: "my-stack", - Namespace: "some-ns", - }, - } - - fg := configv1.FeatureGates{ - OpenShift: configv1.OpenShiftFeatureGates{ - ManagedAuthEnv: true, - }, - } - - degradedErr := &status.DegradedError{ - Message: "Missing OpenShift cloud credentials request", - Reason: lokiv1.ReasonMissingCredentialsRequest, - Requeue: true, - } - - stack := &lokiv1.LokiStack{ - TypeMeta: metav1.TypeMeta{ - Kind: "LokiStack", - }, - ObjectMeta: metav1.ObjectMeta{ - Name: "my-stack", - Namespace: "some-ns", - UID: "b23f9a38-9672-499f-8c29-15ede74d3ece", - Annotations: map[string]string{}, - }, - Spec: lokiv1.LokiStackSpec{ - Size: lokiv1.SizeOneXExtraSmall, - Storage: lokiv1.ObjectStorageSpec{ - Schemas: []lokiv1.ObjectStorageSchema{ - { - Version: lokiv1.ObjectStorageSchemaV11, - EffectiveDate: "2020-10-11", - }, - }, - Secret: lokiv1.ObjectStorageSecretSpec{ - Name: defaultManagedAuthSecret.Name, - Type: lokiv1.ObjectStorageSecretS3, - }, - }, - }, - } - - k.GetStub = func(_ context.Context, name types.NamespacedName, object client.Object, _ ...client.GetOption) error { - _, isLokiStack := object.(*lokiv1.LokiStack) - if r.Name == name.Name && r.Namespace == name.Namespace && isLokiStack { - k.SetClientObject(object, stack) - return nil - } - if name.Name == defaultManagedAuthSecret.Name { - k.SetClientObject(object, &defaultManagedAuthSecret) - return nil - } - return apierrors.NewNotFound(schema.GroupResource{}, "something is not found") - } - - k.StatusStub = func() client.StatusWriter { return sw } - - _, err := BuildOptions(context.TODO(), k, stack, fg) - - // make sure error is returned - require.Error(t, err) - require.Equal(t, degradedErr, err) -} - func TestBuildOptions_WhenMissingCloudCredentialsSecret_SetDegraded(t *testing.T) { sw := &k8sfakes.FakeStatusWriter{} k := &k8sfakes.FakeClient{} @@ -236,9 +164,6 @@ func TestBuildOptions_WhenMissingCloudCredentialsSecret_SetDegraded(t *testing.T Name: "my-stack", Namespace: "some-ns", UID: "b23f9a38-9672-499f-8c29-15ede74d3ece", - Annotations: map[string]string{ - storage.AnnotationCredentialsRequestsSecretRef: "my-stack-aws-creds", - }, }, Spec: lokiv1.LokiStackSpec{ Size: lokiv1.SizeOneXExtraSmall, diff --git a/operator/internal/handlers/lokistack_create_or_update.go b/operator/internal/handlers/lokistack_create_or_update.go index 2f78f75d02c5b..47e7a309bf8b9 100644 --- a/operator/internal/handlers/lokistack_create_or_update.go +++ b/operator/internal/handlers/lokistack_create_or_update.go @@ -36,7 +36,7 @@ func CreateOrUpdateLokiStack( k k8s.Client, s *runtime.Scheme, fg configv1.FeatureGates, -) error { +) (lokiv1.CredentialMode, error) { ll := log.WithValues("lokistack", req.NamespacedName, "event", "createOrUpdate") var stack lokiv1.LokiStack @@ -44,9 +44,9 @@ func CreateOrUpdateLokiStack( if apierrors.IsNotFound(err) { // maybe the user deleted it before we could react? Either way this isn't an issue ll.Error(err, "could not find the requested loki stack", "name", req.NamespacedName) - return nil + return "", nil } - return kverrors.Wrap(err, "failed to lookup lokistack", "name", req.NamespacedName) + return "", kverrors.Wrap(err, "failed to lookup lokistack", "name", req.NamespacedName) } img := os.Getenv(manifests.EnvRelatedImageLoki) @@ -61,21 +61,21 @@ func CreateOrUpdateLokiStack( objStore, err := storage.BuildOptions(ctx, k, &stack, fg) if err != nil { - return err + return "", err } baseDomain, tenants, err := gateway.BuildOptions(ctx, ll, k, &stack, fg) if err != nil { - return err + return "", err } if err = rules.Cleanup(ctx, ll, k, &stack); err != nil { - return err + return "", err } alertingRules, recordingRules, ruler, ocpOptions, err := rules.BuildOptions(ctx, ll, k, &stack) if err != nil { - return err + return "", err } certRotationRequiredAt := "" @@ -86,7 +86,7 @@ func CreateOrUpdateLokiStack( timeoutConfig, err := manifests.NewTimeoutConfig(stack.Spec.Limits) if err != nil { ll.Error(err, "failed to parse query timeout") - return &status.DegradedError{ + return "", &status.DegradedError{ Message: fmt.Sprintf("Error parsing query timeout: %s", err), Reason: lokiv1.ReasonQueryTimeoutInvalid, Requeue: false, @@ -116,13 +116,13 @@ func CreateOrUpdateLokiStack( if optErr := manifests.ApplyDefaultSettings(&opts); optErr != nil { ll.Error(optErr, "failed to conform options to build settings") - return optErr + return "", optErr } if fg.LokiStackGateway { if optErr := manifests.ApplyGatewayDefaultOptions(&opts); optErr != nil { ll.Error(optErr, "failed to apply defaults options to gateway settings") - return optErr + return "", optErr } } @@ -140,13 +140,13 @@ func CreateOrUpdateLokiStack( if optErr := manifests.ApplyTLSSettings(&opts, tlsProfile); optErr != nil { ll.Error(optErr, "failed to conform options to tls profile settings") - return optErr + return "", optErr } objects, err := manifests.BuildAll(opts) if err != nil { ll.Error(err, "failed to build manifests") - return err + return "", err } ll.Info("manifests built", "count", len(objects)) @@ -158,7 +158,7 @@ func CreateOrUpdateLokiStack( // a user possibly being unable to read logs. if err := status.SetStorageSchemaStatus(ctx, k, req, objStore.Schemas); err != nil { ll.Error(err, "failed to set storage schema status") - return err + return "", err } var errCount int32 @@ -182,7 +182,7 @@ func CreateOrUpdateLokiStack( depAnnotations, err := dependentAnnotations(ctx, k, obj) if err != nil { l.Error(err, "failed to set dependent annotations") - return err + return "", err } desired := obj.DeepCopyObject().(client.Object) @@ -205,7 +205,7 @@ func CreateOrUpdateLokiStack( } if errCount > 0 { - return kverrors.New("failed to configure lokistack resources", "name", req.NamespacedName) + return "", kverrors.New("failed to configure lokistack resources", "name", req.NamespacedName) } // 1x.demo is used only for development, so the metrics will not @@ -214,7 +214,7 @@ func CreateOrUpdateLokiStack( metrics.Collect(&opts.Stack, opts.Name) } - return nil + return objStore.CredentialMode(), nil } func dependentAnnotations(ctx context.Context, k k8s.Client, obj client.Object) (map[string]string, error) { diff --git a/operator/internal/handlers/lokistack_create_or_update_test.go b/operator/internal/handlers/lokistack_create_or_update_test.go index 4ba9a9affc369..bef5ffc9efb70 100644 --- a/operator/internal/handlers/lokistack_create_or_update_test.go +++ b/operator/internal/handlers/lokistack_create_or_update_test.go @@ -108,7 +108,7 @@ func TestCreateOrUpdateLokiStack_WhenGetReturnsNotFound_DoesNotError(t *testing. k.StatusStub = func() client.StatusWriter { return sw } - err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) + _, err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) require.NoError(t, err) // make sure create was NOT called because the Get failed @@ -132,7 +132,7 @@ func TestCreateOrUpdateLokiStack_WhenGetReturnsAnErrorOtherThanNotFound_ReturnsT k.StatusStub = func() client.StatusWriter { return sw } - err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) + _, err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) require.Equal(t, badRequestErr, errors.Unwrap(err)) @@ -219,7 +219,7 @@ func TestCreateOrUpdateLokiStack_SetsNamespaceOnAllObjects(t *testing.T) { k.StatusStub = func() client.StatusWriter { return sw } - err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) + _, err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) require.NoError(t, err) // make sure create was called @@ -327,7 +327,7 @@ func TestCreateOrUpdateLokiStack_SetsOwnerRefOnAllObjects(t *testing.T) { k.StatusStub = func() client.StatusWriter { return sw } - err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) + _, err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) require.NoError(t, err) // make sure create was called @@ -387,7 +387,7 @@ func TestCreateOrUpdateLokiStack_WhenSetControllerRefInvalid_ContinueWithOtherOb k.StatusStub = func() client.StatusWriter { return sw } - err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) + _, err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) // make sure error is returned to re-trigger reconciliation require.Error(t, err) @@ -490,7 +490,7 @@ func TestCreateOrUpdateLokiStack_WhenGetReturnsNoError_UpdateObjects(t *testing. k.StatusStub = func() client.StatusWriter { return sw } - err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) + _, err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) require.NoError(t, err) // make sure create not called @@ -556,7 +556,7 @@ func TestCreateOrUpdateLokiStack_WhenCreateReturnsError_ContinueWithOtherObjects k.StatusStub = func() client.StatusWriter { return sw } - err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) + _, err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) // make sure error is returned to re-trigger reconciliation require.Error(t, err) @@ -663,7 +663,7 @@ func TestCreateOrUpdateLokiStack_WhenUpdateReturnsError_ContinueWithOtherObjects k.StatusStub = func() client.StatusWriter { return sw } - err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) + _, err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) // make sure error is returned to re-trigger reconciliation require.Error(t, err) @@ -734,7 +734,7 @@ func TestCreateOrUpdateLokiStack_WhenInvalidQueryTimeout_SetDegraded(t *testing. k.StatusStub = func() client.StatusWriter { return sw } - err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) + _, err := CreateOrUpdateLokiStack(context.TODO(), logger, r, k, scheme, featureGates) // make sure error is returned require.Error(t, err) diff --git a/operator/internal/manifests/mutate.go b/operator/internal/manifests/mutate.go index 27421750bf2cc..63308bb9ceb62 100644 --- a/operator/internal/manifests/mutate.go +++ b/operator/internal/manifests/mutate.go @@ -6,6 +6,7 @@ import ( "github.com/ViaQ/logerr/v2/kverrors" "github.com/imdario/mergo" routev1 "github.com/openshift/api/route/v1" + cloudcredentialv1 "github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1" monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1" appsv1 "k8s.io/api/apps/v1" corev1 "k8s.io/api/core/v1" @@ -123,6 +124,11 @@ func MutateFuncFor(existing, desired client.Object, depAnnotations map[string]st wantRt := desired.(*routev1.Route) mutateRoute(rt, wantRt) + case *cloudcredentialv1.CredentialsRequest: + cr := existing.(*cloudcredentialv1.CredentialsRequest) + wantCr := desired.(*cloudcredentialv1.CredentialsRequest) + mutateCredentialRequest(cr, wantCr) + case *monitoringv1.PrometheusRule: pr := existing.(*monitoringv1.PrometheusRule) wantPr := desired.(*monitoringv1.PrometheusRule) @@ -213,6 +219,10 @@ func mutateRoute(existing, desired *routev1.Route) { existing.Spec = desired.Spec } +func mutateCredentialRequest(existing, desired *cloudcredentialv1.CredentialsRequest) { + existing.Spec = desired.Spec +} + func mutatePrometheusRule(existing, desired *monitoringv1.PrometheusRule) { existing.Annotations = desired.Annotations existing.Labels = desired.Labels diff --git a/operator/internal/manifests/openshift/credentialsrequest.go b/operator/internal/manifests/openshift/credentialsrequest.go index 2962b61d0d1ef..0e97dd97c2b19 100644 --- a/operator/internal/manifests/openshift/credentialsrequest.go +++ b/operator/internal/manifests/openshift/credentialsrequest.go @@ -1,10 +1,6 @@ package openshift import ( - "fmt" - "os" - "path" - "github.com/ViaQ/logerr/v2/kverrors" cloudcredentialv1 "github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1" corev1 "k8s.io/api/core/v1" @@ -12,32 +8,26 @@ import ( "k8s.io/apimachinery/pkg/runtime" "sigs.k8s.io/controller-runtime/pkg/client" + "github.com/grafana/loki/operator/internal/config" "github.com/grafana/loki/operator/internal/manifests/storage" ) -const ( - ccoNamespace = "openshift-cloud-credential-operator" -) - func BuildCredentialsRequest(opts Options) (*cloudcredentialv1.CredentialsRequest, error) { stack := client.ObjectKey{Name: opts.BuildOpts.LokiStackName, Namespace: opts.BuildOpts.LokiStackNamespace} - providerSpec, secretName, err := encodeProviderSpec(opts.BuildOpts.LokiStackName, opts.ManagedAuthEnv) + providerSpec, err := encodeProviderSpec(opts.ManagedAuth) if err != nil { return nil, kverrors.Wrap(err, "failed encoding credentialsrequest provider spec") } return &cloudcredentialv1.CredentialsRequest{ ObjectMeta: metav1.ObjectMeta{ - Name: fmt.Sprintf("%s-%s", stack.Namespace, secretName), - Namespace: ccoNamespace, - Annotations: map[string]string{ - AnnotationCredentialsRequestOwner: stack.String(), - }, + Name: stack.Name, + Namespace: stack.Namespace, }, Spec: cloudcredentialv1.CredentialsRequestSpec{ SecretRef: corev1.ObjectReference{ - Name: secretName, + Name: storage.ManagedCredentialsSecretName(stack.Name), Namespace: stack.Namespace, }, ProviderSpec: providerSpec, @@ -45,16 +35,13 @@ func BuildCredentialsRequest(opts Options) (*cloudcredentialv1.CredentialsReques stack.Name, rulerServiceAccountName(opts), }, - CloudTokenPath: path.Join(storage.AWSTokenVolumeDirectory, "token"), + CloudTokenPath: storage.ServiceAccountTokenFilePath, }, }, nil } -func encodeProviderSpec(stackName string, env *ManagedAuthEnv) (*runtime.RawExtension, string, error) { - var ( - spec runtime.Object - secretName string - ) +func encodeProviderSpec(env *config.ManagedAuthConfig) (*runtime.RawExtension, error) { + var spec runtime.Object switch { case env.AWS != nil: @@ -73,7 +60,6 @@ func encodeProviderSpec(stackName string, env *ManagedAuthEnv) (*runtime.RawExte }, STSIAMRoleARN: env.AWS.RoleARN, } - secretName = fmt.Sprintf("%s-aws-creds", stackName) case env.Azure != nil: azure := env.Azure @@ -101,38 +87,8 @@ func encodeProviderSpec(stackName string, env *ManagedAuthEnv) (*runtime.RawExte AzureSubscriptionID: azure.SubscriptionID, AzureTenantID: azure.TenantID, } - secretName = fmt.Sprintf("%s-azure-creds", stackName) } encodedSpec, err := cloudcredentialv1.Codec.EncodeProviderSpec(spec.DeepCopyObject()) - return encodedSpec, secretName, err -} - -func DiscoverManagedAuthEnv() *ManagedAuthEnv { - // AWS - roleARN := os.Getenv("ROLEARN") - - // Azure - clientID := os.Getenv("CLIENTID") - tenantID := os.Getenv("TENANTID") - subscriptionID := os.Getenv("SUBSCRIPTIONID") - - switch { - case roleARN != "": - return &ManagedAuthEnv{ - AWS: &AWSSTSEnv{ - RoleARN: roleARN, - }, - } - case clientID != "" && tenantID != "" && subscriptionID != "": - return &ManagedAuthEnv{ - Azure: &AzureWIFEnvironment{ - ClientID: clientID, - SubscriptionID: subscriptionID, - TenantID: tenantID, - }, - } - } - - return nil + return encodedSpec, err } diff --git a/operator/internal/manifests/openshift/credentialsrequest_test.go b/operator/internal/manifests/openshift/credentialsrequest_test.go index 21b193c8c7d7e..36c6e2331f7e5 100644 --- a/operator/internal/manifests/openshift/credentialsrequest_test.go +++ b/operator/internal/manifests/openshift/credentialsrequest_test.go @@ -1,40 +1,22 @@ package openshift import ( - "strings" "testing" "github.com/stretchr/testify/require" + "github.com/grafana/loki/operator/internal/config" "github.com/grafana/loki/operator/internal/manifests/storage" ) -func TestBuildCredentialsRequest_HasOwnerAnnotation(t *testing.T) { - opts := Options{ - BuildOpts: BuildOptions{ - LokiStackName: "a-stack", - LokiStackNamespace: "ns", - }, - ManagedAuthEnv: &ManagedAuthEnv{ - AWS: &AWSSTSEnv{ - RoleARN: "role-arn", - }, - }, - } - - credReq, err := BuildCredentialsRequest(opts) - require.NoError(t, err) - require.Contains(t, credReq.Annotations, AnnotationCredentialsRequestOwner) -} - func TestBuildCredentialsRequest_HasSecretRef_MatchingLokiStackNamespace(t *testing.T) { opts := Options{ BuildOpts: BuildOptions{ LokiStackName: "a-stack", LokiStackNamespace: "ns", }, - ManagedAuthEnv: &ManagedAuthEnv{ - AWS: &AWSSTSEnv{ + ManagedAuth: &config.ManagedAuthConfig{ + AWS: &config.AWSEnvironment{ RoleARN: "role-arn", }, }, @@ -51,8 +33,8 @@ func TestBuildCredentialsRequest_HasServiceAccountNames_ContainsAllLokiStackServ LokiStackName: "a-stack", LokiStackNamespace: "ns", }, - ManagedAuthEnv: &ManagedAuthEnv{ - AWS: &AWSSTSEnv{ + ManagedAuth: &config.ManagedAuthConfig{ + AWS: &config.AWSEnvironment{ RoleARN: "role-arn", }, }, @@ -70,8 +52,8 @@ func TestBuildCredentialsRequest_CloudTokenPath_MatchinOpenShiftSADirectory(t *t LokiStackName: "a-stack", LokiStackNamespace: "ns", }, - ManagedAuthEnv: &ManagedAuthEnv{ - AWS: &AWSSTSEnv{ + ManagedAuth: &config.ManagedAuthConfig{ + AWS: &config.AWSEnvironment{ RoleARN: "role-arn", }, }, @@ -79,7 +61,7 @@ func TestBuildCredentialsRequest_CloudTokenPath_MatchinOpenShiftSADirectory(t *t credReq, err := BuildCredentialsRequest(opts) require.NoError(t, err) - require.True(t, strings.HasPrefix(credReq.Spec.CloudTokenPath, storage.AWSTokenVolumeDirectory)) + require.Equal(t, storage.ServiceAccountTokenFilePath, credReq.Spec.CloudTokenPath) } func TestBuildCredentialsRequest_FollowsNamingConventions(t *testing.T) { @@ -96,14 +78,14 @@ func TestBuildCredentialsRequest_FollowsNamingConventions(t *testing.T) { LokiStackName: "a-stack", LokiStackNamespace: "ns", }, - ManagedAuthEnv: &ManagedAuthEnv{ - AWS: &AWSSTSEnv{ + ManagedAuth: &config.ManagedAuthConfig{ + AWS: &config.AWSEnvironment{ RoleARN: "role-arn", }, }, }, - wantName: "ns-a-stack-aws-creds", - wantSecretName: "a-stack-aws-creds", + wantName: "a-stack", + wantSecretName: "a-stack-managed-credentials", }, } for _, test := range tests { diff --git a/operator/internal/manifests/openshift/options.go b/operator/internal/manifests/openshift/options.go index 9bc2e4faae36e..572db7fe64453 100644 --- a/operator/internal/manifests/openshift/options.go +++ b/operator/internal/manifests/openshift/options.go @@ -6,6 +6,7 @@ import ( "time" lokiv1 "github.com/grafana/loki/operator/apis/loki/v1" + "github.com/grafana/loki/operator/internal/config" ) // Options is the set of internal template options for rendering @@ -14,7 +15,7 @@ type Options struct { BuildOpts BuildOptions Authentication []AuthenticationSpec Authorization AuthorizationSpec - ManagedAuthEnv *ManagedAuthEnv + ManagedAuth *config.ManagedAuthConfig } // AuthenticationSpec describes the authentication specification @@ -55,22 +56,6 @@ type TenantData struct { CookieSecret string } -type AWSSTSEnv struct { - RoleARN string -} - -type AzureWIFEnvironment struct { - ClientID string - SubscriptionID string - TenantID string - Region string -} - -type ManagedAuthEnv struct { - AWS *AWSSTSEnv - Azure *AzureWIFEnvironment -} - // NewOptions returns an openshift options struct. func NewOptions( stackName, stackNamespace string, diff --git a/operator/internal/manifests/openshift/var.go b/operator/internal/manifests/openshift/var.go index 84928c48d7e28..5e3ac6300e3eb 100644 --- a/operator/internal/manifests/openshift/var.go +++ b/operator/internal/manifests/openshift/var.go @@ -48,8 +48,6 @@ var ( MonitoringSVCUserWorkload = "alertmanager-user-workload" MonitoringUserWorkloadNS = "openshift-user-workload-monitoring" - - AnnotationCredentialsRequestOwner = "loki.grafana.com/credentialsrequest-owner" ) func authorizerRbacName(componentName string) string { diff --git a/operator/internal/manifests/storage/configure.go b/operator/internal/manifests/storage/configure.go index 49958ebec7b9c..ede098425323d 100644 --- a/operator/internal/manifests/storage/configure.go +++ b/operator/internal/manifests/storage/configure.go @@ -13,6 +13,18 @@ import ( lokiv1 "github.com/grafana/loki/operator/apis/loki/v1" ) +var ( + managedAuthConfigVolumeMount = corev1.VolumeMount{ + Name: managedAuthConfigVolumeName, + MountPath: managedAuthConfigDirectory, + } + + saTokenVolumeMount = corev1.VolumeMount{ + Name: saTokenVolumeName, + MountPath: saTokenVolumeMountPath, + } +) + // ConfigureDeployment appends additional pod volumes and container env vars, args, volume mounts // based on the object storage type. Currently supported amendments: // - All: Ensure object storage secret mounted and auth projected as env vars. @@ -127,11 +139,11 @@ func ensureObjectStoreCredentials(p *corev1.PodSpec, opts Options) corev1.PodSpe if managedAuthEnabled(opts) { container.Env = append(container.Env, managedAuthCredentials(opts)...) volumes = append(volumes, saTokenVolume(opts)) - container.VolumeMounts = append(container.VolumeMounts, saTokenVolumeMount(opts)) + container.VolumeMounts = append(container.VolumeMounts, saTokenVolumeMount) - if opts.OpenShift.ManagedAuthEnabled() { - volumes = append(volumes, managedAuthVolume(opts)) - container.VolumeMounts = append(container.VolumeMounts, managedAuthVolumeMount(opts)) + if opts.OpenShift.ManagedAuthEnabled() && opts.S3 != nil && opts.S3.STS { + volumes = append(volumes, managedAuthConfigVolume(opts)) + container.VolumeMounts = append(container.VolumeMounts, managedAuthConfigVolumeMount) } } else { container.Env = append(container.Env, staticAuthCredentials(opts)...) @@ -183,13 +195,13 @@ func managedAuthCredentials(opts Options) []corev1.EnvVar { case lokiv1.ObjectStorageSecretS3: if opts.OpenShift.ManagedAuthEnabled() { return []corev1.EnvVar{ - envVarFromValue(EnvAWSCredentialsFile, path.Join(managedAuthSecretDirectory, KeyAWSCredentialsFilename)), + envVarFromValue(EnvAWSCredentialsFile, path.Join(managedAuthConfigDirectory, KeyAWSCredentialsFilename)), envVarFromValue(EnvAWSSdkLoadConfig, "true"), } } else { return []corev1.EnvVar{ envVarFromSecret(EnvAWSRoleArn, opts.SecretName, KeyAWSRoleArn), - envVarFromValue(EnvAWSWebIdentityTokenFile, path.Join(AWSTokenVolumeDirectory, "token")), + envVarFromValue(EnvAWSWebIdentityTokenFile, ServiceAccountTokenFilePath), } } case lokiv1.ObjectStorageSecretAzure: @@ -199,7 +211,7 @@ func managedAuthCredentials(opts Options) []corev1.EnvVar { envVarFromSecret(EnvAzureClientID, opts.OpenShift.CloudCredentials.SecretName, azureManagedCredentialKeyClientID), envVarFromSecret(EnvAzureTenantID, opts.OpenShift.CloudCredentials.SecretName, azureManagedCredentialKeyTenantID), envVarFromSecret(EnvAzureSubscriptionID, opts.OpenShift.CloudCredentials.SecretName, azureManagedCredentialKeySubscriptionID), - envVarFromValue(EnvAzureFederatedTokenFile, path.Join(azureTokenVolumeDirectory, "token")), + envVarFromValue(EnvAzureFederatedTokenFile, ServiceAccountTokenFilePath), } } @@ -208,7 +220,7 @@ func managedAuthCredentials(opts Options) []corev1.EnvVar { envVarFromSecret(EnvAzureClientID, opts.SecretName, KeyAzureStorageClientID), envVarFromSecret(EnvAzureTenantID, opts.SecretName, KeyAzureStorageTenantID), envVarFromSecret(EnvAzureSubscriptionID, opts.SecretName, KeyAzureStorageSubscriptionID), - envVarFromValue(EnvAzureFederatedTokenFile, path.Join(azureTokenVolumeDirectory, "token")), + envVarFromValue(EnvAzureFederatedTokenFile, ServiceAccountTokenFilePath), } case lokiv1.ObjectStorageSecretGCS: return []corev1.EnvVar{ @@ -301,22 +313,6 @@ func managedAuthEnabled(opts Options) bool { } } -func saTokenVolumeMount(opts Options) corev1.VolumeMount { - var tokenPath string - switch opts.SharedStore { - case lokiv1.ObjectStorageSecretS3: - tokenPath = AWSTokenVolumeDirectory - case lokiv1.ObjectStorageSecretAzure: - tokenPath = azureTokenVolumeDirectory - case lokiv1.ObjectStorageSecretGCS: - tokenPath = gcpTokenVolumeDirectory - } - return corev1.VolumeMount{ - Name: saTokenVolumeName, - MountPath: tokenPath, - } -} - func saTokenVolume(opts Options) corev1.Volume { var audience string storeType := opts.SharedStore @@ -352,16 +348,9 @@ func saTokenVolume(opts Options) corev1.Volume { } } -func managedAuthVolumeMount(opts Options) corev1.VolumeMount { - return corev1.VolumeMount{ - Name: opts.OpenShift.CloudCredentials.SecretName, - MountPath: managedAuthSecretDirectory, - } -} - -func managedAuthVolume(opts Options) corev1.Volume { +func managedAuthConfigVolume(opts Options) corev1.Volume { return corev1.Volume{ - Name: opts.OpenShift.CloudCredentials.SecretName, + Name: managedAuthConfigVolumeName, VolumeSource: corev1.VolumeSource{ Secret: &corev1.SecretVolumeSource{ SecretName: opts.OpenShift.CloudCredentials.SecretName, diff --git a/operator/internal/manifests/storage/configure_test.go b/operator/internal/manifests/storage/configure_test.go index f17a9af6c3524..2cd7b079a4b4a 100644 --- a/operator/internal/manifests/storage/configure_test.go +++ b/operator/internal/manifests/storage/configure_test.go @@ -206,7 +206,7 @@ func TestConfigureDeploymentForStorageType(t *testing.T) { { Name: saTokenVolumeName, ReadOnly: false, - MountPath: "/var/run/secrets/azure/serviceaccount", + MountPath: saTokenVolumeMountPath, }, }, Env: []corev1.EnvVar{ @@ -256,7 +256,7 @@ func TestConfigureDeploymentForStorageType(t *testing.T) { }, { Name: EnvAzureFederatedTokenFile, - Value: "/var/run/secrets/azure/serviceaccount/token", + Value: "/var/run/secrets/storage/serviceaccount/token", }, }, }, @@ -331,7 +331,7 @@ func TestConfigureDeploymentForStorageType(t *testing.T) { { Name: saTokenVolumeName, ReadOnly: false, - MountPath: "/var/run/secrets/azure/serviceaccount", + MountPath: saTokenVolumeMountPath, }, }, Env: []corev1.EnvVar{ @@ -381,7 +381,7 @@ func TestConfigureDeploymentForStorageType(t *testing.T) { }, { Name: EnvAzureFederatedTokenFile, - Value: "/var/run/secrets/azure/serviceaccount/token", + Value: "/var/run/secrets/storage/serviceaccount/token", }, }, }, @@ -462,11 +462,7 @@ func TestConfigureDeploymentForStorageType(t *testing.T) { { Name: saTokenVolumeName, ReadOnly: false, - MountPath: "/var/run/secrets/azure/serviceaccount", - }, - { - Name: "cloud-credentials", - MountPath: managedAuthSecretDirectory, + MountPath: saTokenVolumeMountPath, }, }, Env: []corev1.EnvVar{ @@ -516,7 +512,7 @@ func TestConfigureDeploymentForStorageType(t *testing.T) { }, { Name: EnvAzureFederatedTokenFile, - Value: "/var/run/secrets/azure/serviceaccount/token", + Value: "/var/run/secrets/storage/serviceaccount/token", }, }, }, @@ -546,14 +542,6 @@ func TestConfigureDeploymentForStorageType(t *testing.T) { }, }, }, - { - Name: "cloud-credentials", - VolumeSource: corev1.VolumeSource{ - Secret: &corev1.SecretVolumeSource{ - SecretName: "cloud-credentials", - }, - }, - }, }, }, }, @@ -655,7 +643,7 @@ func TestConfigureDeploymentForStorageType(t *testing.T) { { Name: saTokenVolumeName, ReadOnly: false, - MountPath: "/var/run/secrets/gcp/serviceaccount", + MountPath: saTokenVolumeMountPath, }, }, Env: []corev1.EnvVar{ @@ -810,7 +798,7 @@ func TestConfigureDeploymentForStorageType(t *testing.T) { { Name: saTokenVolumeName, ReadOnly: false, - MountPath: "/var/run/secrets/aws/serviceaccount", + MountPath: saTokenVolumeMountPath, }, }, Env: []corev1.EnvVar{ @@ -827,7 +815,7 @@ func TestConfigureDeploymentForStorageType(t *testing.T) { }, { Name: "AWS_WEB_IDENTITY_TOKEN_FILE", - Value: "/var/run/secrets/aws/serviceaccount/token", + Value: "/var/run/secrets/storage/serviceaccount/token", }, }, }, @@ -908,13 +896,9 @@ func TestConfigureDeploymentForStorageType(t *testing.T) { { Name: saTokenVolumeName, ReadOnly: false, - MountPath: "/var/run/secrets/aws/serviceaccount", - }, - { - Name: "cloud-credentials", - ReadOnly: false, - MountPath: "/etc/storage/managed-auth", + MountPath: saTokenVolumeMountPath, }, + managedAuthConfigVolumeMount, }, Env: []corev1.EnvVar{ { @@ -954,7 +938,7 @@ func TestConfigureDeploymentForStorageType(t *testing.T) { }, }, { - Name: "cloud-credentials", + Name: managedAuthConfigVolumeName, VolumeSource: corev1.VolumeSource{ Secret: &corev1.SecretVolumeSource{ SecretName: "cloud-credentials", @@ -1340,7 +1324,7 @@ func TestConfigureStatefulSetForStorageType(t *testing.T) { { Name: saTokenVolumeName, ReadOnly: false, - MountPath: "/var/run/secrets/azure/serviceaccount", + MountPath: saTokenVolumeMountPath, }, }, Env: []corev1.EnvVar{ @@ -1390,7 +1374,7 @@ func TestConfigureStatefulSetForStorageType(t *testing.T) { }, { Name: EnvAzureFederatedTokenFile, - Value: "/var/run/secrets/azure/serviceaccount/token", + Value: "/var/run/secrets/storage/serviceaccount/token", }, }, }, @@ -1465,7 +1449,7 @@ func TestConfigureStatefulSetForStorageType(t *testing.T) { { Name: saTokenVolumeName, ReadOnly: false, - MountPath: "/var/run/secrets/azure/serviceaccount", + MountPath: saTokenVolumeMountPath, }, }, Env: []corev1.EnvVar{ @@ -1515,7 +1499,7 @@ func TestConfigureStatefulSetForStorageType(t *testing.T) { }, { Name: EnvAzureFederatedTokenFile, - Value: "/var/run/secrets/azure/serviceaccount/token", + Value: "/var/run/secrets/storage/serviceaccount/token", }, }, }, @@ -1596,11 +1580,7 @@ func TestConfigureStatefulSetForStorageType(t *testing.T) { { Name: saTokenVolumeName, ReadOnly: false, - MountPath: "/var/run/secrets/azure/serviceaccount", - }, - { - Name: "cloud-credentials", - MountPath: managedAuthSecretDirectory, + MountPath: saTokenVolumeMountPath, }, }, Env: []corev1.EnvVar{ @@ -1650,7 +1630,7 @@ func TestConfigureStatefulSetForStorageType(t *testing.T) { }, { Name: EnvAzureFederatedTokenFile, - Value: "/var/run/secrets/azure/serviceaccount/token", + Value: "/var/run/secrets/storage/serviceaccount/token", }, }, }, @@ -1680,14 +1660,6 @@ func TestConfigureStatefulSetForStorageType(t *testing.T) { }, }, }, - { - Name: "cloud-credentials", - VolumeSource: corev1.VolumeSource{ - Secret: &corev1.SecretVolumeSource{ - SecretName: "cloud-credentials", - }, - }, - }, }, }, }, @@ -1789,7 +1761,7 @@ func TestConfigureStatefulSetForStorageType(t *testing.T) { { Name: saTokenVolumeName, ReadOnly: false, - MountPath: "/var/run/secrets/gcp/serviceaccount", + MountPath: saTokenVolumeMountPath, }, }, Env: []corev1.EnvVar{ @@ -1950,13 +1922,9 @@ func TestConfigureStatefulSetForStorageType(t *testing.T) { { Name: saTokenVolumeName, ReadOnly: false, - MountPath: "/var/run/secrets/aws/serviceaccount", - }, - { - Name: "cloud-credentials", - ReadOnly: false, - MountPath: "/etc/storage/managed-auth", + MountPath: saTokenVolumeMountPath, }, + managedAuthConfigVolumeMount, }, Env: []corev1.EnvVar{ { @@ -1996,7 +1964,7 @@ func TestConfigureStatefulSetForStorageType(t *testing.T) { }, }, { - Name: "cloud-credentials", + Name: managedAuthConfigVolumeName, VolumeSource: corev1.VolumeSource{ Secret: &corev1.SecretVolumeSource{ SecretName: "cloud-credentials", diff --git a/operator/internal/manifests/storage/options.go b/operator/internal/manifests/storage/options.go index e525640da6c0c..6693d2261e978 100644 --- a/operator/internal/manifests/storage/options.go +++ b/operator/internal/manifests/storage/options.go @@ -23,12 +23,47 @@ type Options struct { OpenShift OpenShiftOptions } +// CredentialMode returns which mode is used by the current storage configuration. +// This defaults to CredentialModeStatic, but can be CredentialModeToken +// or CredentialModeManaged depending on the object storage provide, the provided +// secret and whether the operator is running in a managed-auth cluster. +func (o Options) CredentialMode() lokiv1.CredentialMode { + if o.Azure != nil { + if o.OpenShift.ManagedAuthEnabled() { + return lokiv1.CredentialModeManaged + } + + if o.Azure.WorkloadIdentity { + return lokiv1.CredentialModeToken + } + } + + if o.GCS != nil { + if o.GCS.WorkloadIdentity { + return lokiv1.CredentialModeToken + } + } + + if o.S3 != nil { + if o.OpenShift.ManagedAuthEnabled() { + return lokiv1.CredentialModeManaged + } + + if o.S3.STS { + return lokiv1.CredentialModeToken + } + } + + return lokiv1.CredentialModeStatic +} + // AzureStorageConfig for Azure storage config type AzureStorageConfig struct { Env string Container string EndpointSuffix string Audience string + Region string WorkloadIdentity bool } diff --git a/operator/internal/manifests/storage/var.go b/operator/internal/manifests/storage/var.go index 49ec0b0a16ae1..cbd944a821c34 100644 --- a/operator/internal/manifests/storage/var.go +++ b/operator/internal/manifests/storage/var.go @@ -1,5 +1,7 @@ package storage +import "fmt" + const ( // EnvAlibabaCloudAccessKeyID is the environment variable to specify the AlibabaCloud client id to access S3. EnvAlibabaCloudAccessKeyID = "ALIBABA_CLOUD_ACCESS_KEY_ID" @@ -127,27 +129,29 @@ const ( // KeySwiftUsername is the secret data key for the OpenStack Swift password. KeySwiftUsername = "username" - saTokenVolumeK8sDirectory = "/var/run/secrets/kubernetes.io/serviceaccount" - saTokenVolumeName = "bound-sa-token" - saTokenExpiration int64 = 3600 + saTokenVolumeName = "bound-sa-token" + saTokenExpiration int64 = 3600 + saTokenVolumeMountPath = "/var/run/secrets/storage/serviceaccount" + + ServiceAccountTokenFilePath = saTokenVolumeMountPath + "/token" + + secretDirectory = "/etc/storage/secrets" + storageTLSVolume = "storage-tls" + caDirectory = "/etc/storage/ca" - secretDirectory = "/etc/storage/secrets" - managedAuthSecretDirectory = "/etc/storage/managed-auth" - storageTLSVolume = "storage-tls" - caDirectory = "/etc/storage/ca" + managedAuthConfigVolumeName = "managed-auth-config" + managedAuthConfigDirectory = "/etc/storage/managed-auth" - awsDefaultAudience = "sts.amazonaws.com" - AWSTokenVolumeDirectory = "/var/run/secrets/aws/serviceaccount" + awsDefaultAudience = "sts.amazonaws.com" - azureDefaultAudience = "api://AzureADTokenExchange" - azureTokenVolumeDirectory = "/var/run/secrets/azure/serviceaccount" + azureDefaultAudience = "api://AzureADTokenExchange" azureManagedCredentialKeyClientID = "azure_client_id" azureManagedCredentialKeyTenantID = "azure_tenant_id" azureManagedCredentialKeySubscriptionID = "azure_subscription_id" - - gcpTokenVolumeDirectory = "/var/run/secrets/gcp/serviceaccount" - GCPDefautCredentialsFile = gcpTokenVolumeDirectory + "/token" - - AnnotationCredentialsRequestsSecretRef = "loki.grafana.com/credentials-request-secret-ref" ) + +// ManagedCredentialsSecretName returns the name of the secret holding the managed credentials. +func ManagedCredentialsSecretName(stackName string) string { + return fmt.Sprintf("%s-managed-credentials", stackName) +} diff --git a/operator/internal/status/status.go b/operator/internal/status/status.go index 281a167355c37..c544695d3d2ea 100644 --- a/operator/internal/status/status.go +++ b/operator/internal/status/status.go @@ -17,7 +17,7 @@ import ( // Refresh executes an aggregate update of the LokiStack Status struct, i.e. // - It recreates the Status.Components pod status map per component. // - It sets the appropriate Status.Condition to true that matches the pod status maps. -func Refresh(ctx context.Context, k k8s.Client, req ctrl.Request, now time.Time, degradedErr *DegradedError) error { +func Refresh(ctx context.Context, k k8s.Client, req ctrl.Request, now time.Time, credentialMode lokiv1.CredentialMode, degradedErr *DegradedError) error { var stack lokiv1.LokiStack if err := k.Get(ctx, req.NamespacedName, &stack); err != nil { if apierrors.IsNotFound(err) { @@ -45,6 +45,7 @@ func Refresh(ctx context.Context, k k8s.Client, req ctrl.Request, now time.Time, statusUpdater := func(stack *lokiv1.LokiStack) { stack.Status.Components = *cs stack.Status.Conditions = mergeConditions(stack.Status.Conditions, activeConditions, metaTime) + stack.Status.Storage.CredentialMode = credentialMode } statusUpdater(&stack) diff --git a/operator/internal/status/status_test.go b/operator/internal/status/status_test.go index c7895cbe8020e..32ef892ed1bde 100644 --- a/operator/internal/status/status_test.go +++ b/operator/internal/status/status_test.go @@ -54,7 +54,9 @@ func TestRefreshSuccess(t *testing.T) { Gateway: map[corev1.PodPhase][]string{corev1.PodRunning: {"lokistack-gateway-pod-0"}}, Ruler: map[corev1.PodPhase][]string{corev1.PodRunning: {"ruler-pod-0"}}, }, - Storage: lokiv1.LokiStackStorageStatus{}, + Storage: lokiv1.LokiStackStorageStatus{ + CredentialMode: lokiv1.CredentialModeStatic, + }, Conditions: []metav1.Condition{ { Type: string(lokiv1.ConditionReady), @@ -68,7 +70,7 @@ func TestRefreshSuccess(t *testing.T) { k, sw := setupListClient(t, stack, componentPods) - err := Refresh(context.Background(), k, req, now, nil) + err := Refresh(context.Background(), k, req, now, lokiv1.CredentialModeStatic, nil) require.NoError(t, err) require.Equal(t, 1, k.GetCallCount()) @@ -130,7 +132,7 @@ func TestRefreshSuccess_ZoneAwarePendingPod(t *testing.T) { return nil } - err := Refresh(context.Background(), k, req, now, nil) + err := Refresh(context.Background(), k, req, now, lokiv1.CredentialModeStatic, nil) require.NoError(t, err) require.Equal(t, 1, k.GetCallCount()) diff --git a/operator/main.go b/operator/main.go index a88a857bcee44..e212c268cbad8 100644 --- a/operator/main.go +++ b/operator/main.go @@ -21,7 +21,6 @@ import ( lokiv1beta1 "github.com/grafana/loki/operator/apis/loki/v1beta1" lokictrl "github.com/grafana/loki/operator/controllers/loki" "github.com/grafana/loki/operator/internal/config" - manifestsocp "github.com/grafana/loki/operator/internal/manifests/openshift" "github.com/grafana/loki/operator/internal/metrics" "github.com/grafana/loki/operator/internal/operator" "github.com/grafana/loki/operator/internal/validation" @@ -60,12 +59,16 @@ func main() { var err error - ctrlCfg, options, err := config.LoadConfig(scheme, configFile) + ctrlCfg, managedAuth, options, err := config.LoadConfig(scheme, configFile) if err != nil { logger.Error(err, "failed to load operator configuration") os.Exit(1) } + if managedAuth != nil { + logger.Info("Discovered OpenShift Cluster within a managed authentication environment") + } + if ctrlCfg.Gates.LokiStackAlerts && !ctrlCfg.Gates.ServiceMonitors { logger.Error(kverrors.New("LokiStackAlerts flag requires ServiceMonitors"), "") os.Exit(1) @@ -95,16 +98,12 @@ func main() { os.Exit(1) } - if ctrlCfg.Gates.OpenShift.Enabled && manifestsocp.DiscoverManagedAuthEnv() != nil { - logger.Info("discovered OpenShift Cluster within a managed authentication environment") - ctrlCfg.Gates.OpenShift.ManagedAuthEnv = true - } - if err = (&lokictrl.LokiStackReconciler{ Client: mgr.GetClient(), Log: logger.WithName("controllers").WithName("lokistack"), Scheme: mgr.GetScheme(), FeatureGates: ctrlCfg.Gates, + AuthConfig: managedAuth, }).SetupWithManager(mgr); err != nil { logger.Error(err, "unable to create controller", "controller", "lokistack") os.Exit(1) @@ -129,17 +128,6 @@ func main() { } } - if ctrlCfg.Gates.OpenShift.ManagedAuthEnabled() { - if err = (&lokictrl.CredentialsRequestsReconciler{ - Client: mgr.GetClient(), - Scheme: mgr.GetScheme(), - Log: logger.WithName("controllers").WithName("lokistack-credentialsrequest"), - }).SetupWithManager(mgr); err != nil { - logger.Error(err, "unable to create controller", "controller", "lokistack-credentialsrequest") - os.Exit(1) - } - } - if ctrlCfg.Gates.LokiStackWebhook { v := &validation.LokiStackValidator{} if err = v.SetupWebhookWithManager(mgr); err != nil { From 85908fafdfc9261c2f12f630726fa0dad514f886 Mon Sep 17 00:00:00 2001 From: Zirko <64951262+QuantumEnigmaa@users.noreply.github.com> Date: Tue, 13 Feb 2024 13:40:46 +0100 Subject: [PATCH 7/7] Helm: fix changelog entry and bump chart version (#11925) **What this PR does / why we need it**: This PR fixes the changelog by moving the `bugfix` entry for the cilium network policies to a separate chart version. The 5.42.2 had already been released when the said entry was added to it. **Special notes for your reviewer**: **Checklist** - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [x] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](https://github.com/grafana/loki/commit/d10549e3ece02120974929894ee333d07755d213) - [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. [Example PR](https://github.com/grafana/loki/pull/10840/commits/0d4416a4b03739583349934b96f272fb4f685d15) Signed-off-by: QuantumEnigmaa --- production/helm/loki/CHANGELOG.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/production/helm/loki/CHANGELOG.md b/production/helm/loki/CHANGELOG.md index e849918585ea2..068d37a495530 100644 --- a/production/helm/loki/CHANGELOG.md +++ b/production/helm/loki/CHANGELOG.md @@ -13,9 +13,12 @@ Entries should include a reference to the pull request that introduced the chang [//]: # ( : do not remove this line. This locator is used by the CI pipeline to automatically create a changelog entry for each new Loki release. Add other chart versions and respective changelog entries bellow this line.) -## 5.42.2 +## 5.42.3 - [BUGFIX] Added condition for `egress-discovery` networkPolicies and ciliumNetworkPolicies. + +## 5.42.2 + - [BUGFIX] Remove trailing tab character in statefulset templates ## 5.42.1