feat: ingester-RF1 Push endpoint #13315
Conversation
@@ -4,6 +4,7 @@ server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: debug
  grpc_server_max_concurrent_streams: 1000
This is required because the server is handling far more requests concurrently now that they wait for a segment flush before returning. It'll need to be even higher in real environments!
@@ -540,7 +578,25 @@ func (i *Ingester) loop() {
	for {
		select {
		case <-flushTicker.C:
			i.sweepUsers(false, true)
			//i.logger.Log("msg", "starting periodic flush")
			i.flushCtx.lock.Lock() // Stop new chunks being written while we swap destinations - we'll never unlock as this flushctx can no longer be used.
This section is the key part of the new approach: it is responsible for rotating the WAL segment every 500ms.
The general idea is to grab a Write lock (preventing API callers from using the current segment) and to create a new one for clients as fast as possible. The flush of the old segment happens in the background.
Closing of channels is used for signalling the waiting API callers, because we can select on them or on the calling context to handle timeouts.
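Roughly, that rotation step looks like the sketch below. This is only an illustration: the flushCtx field names follow the diff, while the `ingester` shape, the `flushQueue` handoff, and the `segmentWriter` stand-in are assumptions.

```go
package sketch

import "sync"

// Assumed shape of the flush context, following the fields shown in the diff.
type flushCtx struct {
	lock            *sync.RWMutex
	flushDone       chan struct{} // closed once this segment has been flushed
	newCtxAvailable chan struct{} // closed once a replacement flushCtx is installed
	segmentWriter   *segmentWriter
}

type segmentWriter struct{} // stand-in for the WAL segment writer

type ingester struct {
	flushCtx   *flushCtx
	flushQueue chan *flushCtx // drained by the flush loops (assumed handoff)
}

// rotate runs on each flush tick: block new appends to the old segment, swap
// in a fresh flushCtx as fast as possible, then hand the old one off to be
// flushed in the background.
// NOTE: synchronisation of the i.flushCtx pointer read/write is elided here.
func (i *ingester) rotate() {
	old := i.flushCtx
	old.lock.Lock() // never unlocked again: this flushCtx can no longer be used

	i.flushCtx = &flushCtx{
		lock:            &sync.RWMutex{},
		flushDone:       make(chan struct{}),
		newCtxAvailable: make(chan struct{}),
		segmentWriter:   &segmentWriter{},
	}
	close(old.newCtxAvailable) // wake any Push that raced with the swap

	i.flushQueue <- old // the flush loops write the segment and close flushDone
}
```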
I think the Timer must be created as part of the context/request so that we can handle multiple flush timeouts for different sets of requests.
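For concreteness, the wait on the API-caller side in this scheme is a per-request select (a sketch reusing the flushCtx shape above; the `context` import is assumed), so each request's own deadline bounds the wait rather than a shared timer:

```go
// waitForFlush blocks until the segment this request appended to has been
// flushed, or until the request's own context times out or is cancelled.
func waitForFlush(ctx context.Context, fc *flushCtx) error {
	select {
	case <-fc.flushDone: // closed by the background flush of this segment
		return nil
	case <-ctx.Done(): // per-request deadline; no shared timer needed
		return ctx.Err()
	}
}
```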
I think we can actually forget the flush loop and enqueueing for now.
We probably also want to flush based on input size to limit block size? I think we can start with just this for now.
See my comment below on the multiple flushes.
I do need to add the input size to this, yes! That was my next task, but I wanted to get early feedback on this XXL PR first - ideally we can parallelise more work on top of this starting point.
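A size-based trigger on top of the periodic one could be as simple as the hypothetical check below; the threshold and the bytesWritten accounting are not part of this PR and are shown only to illustrate the idea, building on the rotate() sketch above.

```go
const maxSegmentBytes = 8 << 20 // purely illustrative threshold

// maybeRotateBySize is a hypothetical early-rotation check, driven by how
// much has been appended to the current segment.
func (i *ingester) maybeRotateBySize(bytesWritten int64) {
	if bytesWritten >= maxSegmentBytes {
		i.rotate()
	}
}
```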
// The only time the Write Lock is held is when this context is no longer usable and a new one is being created.
// In this case, we need to re-read i.flushCtx in order to fetch the new one as soon as it's available.
// The newCtxAvailable chan is closed as soon as the new one is available to avoid a busy loop.
currentFlushCtx := i.flushCtx
This is the other main piece of the new approach:
API callers grab the current flush context (which includes a reference to the current WAL segment) by obtaining a Read lock.
The RWMutex is used backwards, which could be improved, but it allows many API clients to write to the WAL segment at once (wal.Append is thread-safe) and lets the ingester block new appends by grabbing the Write lock.
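Put concretely, the append side does roughly the following (a sketch reusing the flushCtx shape above; TryRLock stands in for however the real code detects that the write lock is held):

```go
// getFlushCtx returns a flushCtx that is safe to append to. If the current one
// is being retired (its write lock is held and will never be released), wait
// for the replacement to be published and re-read i.flushCtx.
func (i *ingester) getFlushCtx() *flushCtx {
	for {
		cur := i.flushCtx
		if cur.lock.TryRLock() {
			return cur // read lock held: rotation waits until we RUnlock
		}
		<-cur.newCtxAvailable // closed once the new flushCtx exists; avoids a busy loop
	}
}
```

The caller would then Append to cur.segmentWriter, release the read lock, and wait on flushDone as in the select sketched earlier.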
sortedLabels := i.index.Add(logproto.FromLabelsToLabelAdapters(labels), fp)

chunkfmt, headfmt, err := i.chunkFormatAt(minTs(&pushReqStream))
I was unsure what most of this chunk-related code did - it can almost certainly be removed, but the key functionality, such as tracking metrics, should be retained.
			entries: make([]*logproto.Entry, 0, 4096),
		}
	},
}
tenantLabel = "__loki_tenant__"
Had to copy this from tsdb.TenantLabel because of a circular import between tsdb & wal via indexWriter.
flushCtx: &flushCtx{
	lock:            &sync.RWMutex{},
	flushDone:       make(chan struct{}),
	newCtxAvailable: make(chan struct{}),
	segmentWriter:   segmentWriter,
},
Interesting choice here.
I thought we would need multiple flush routines per ingester to be able to flush faster; right now, if a single flush slows down (which happens), all other requests will be affected.
Not a problem though, let's put it to the test in dev.
We do have multiple flush routines - the idea was to have one "active" flushCtx at a time, which all Pushes are Appending to.
Every time the timer fires, it will swap the flushCtx for a new one (new WAL segment, etc.) as fast as possible - it's just a variable assignment - and then hand off the old flushCtx to be flushed async.
I fully expect flushing to storage to be the slowest part of this, so I'm reusing the existing flush loops. We can have multiple flushes happening in parallel, and only the Pushes that appended to that flush would be waiting. New Pushes would be unaffected, as they'd be Appending to a completely different flushCtx.
Definitely open to suggestions if you have better ideas though!
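Shape-wise, that handoff amounts to a queue of retired flushCtxs drained by several workers (a sketch; the queue and worker count are assumptions about how the existing flush loops are reused):

```go
// runFlushLoops drains retired flushCtxs. A slow flush only delays the Pushes
// that appended to that particular segment, because each Push waits on the
// flushDone channel of its own flushCtx.
func (i *ingester) runFlushLoops(workers int, flush func(*segmentWriter) error) {
	for w := 0; w < workers; w++ {
		go func() {
			for fc := range i.flushQueue {
				if err := flush(fc.segmentWriter); err != nil {
					// the real code would retry and record metrics here
				}
				close(fc.flushDone) // release only this segment's waiting Pushes
			}
		}()
	}
}
```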
if err != nil {
	return err
}
return o.store.PutObject(ctx, "wal-segment-"+time.Now().UTC().Format(time.RFC3339Nano), bytes.NewReader(buffer.Bytes()))
Can the new ingester deal with this interface instead of PutObject? This way we don't leak our storage into object storage clients.
I think that's a good idea, but for now I was just copying the existing interfaces, which use "PutChunks" everywhere. There is a slightly lower-level interface which only has PutObject, but our storage layer seems to switch between them frequently.
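The separation being discussed (having the WAL writer depend on a narrow interface rather than a concrete storage client) could look like the hypothetical sketch below; the name and method are illustrative, not the actual change made in b4ca98b.

```go
import (
	"context"
	"io"
)

// Hypothetical narrow interface: the WAL writer only needs "store a segment",
// so object-storage client details stay behind whatever adapter implements it.
type walSegmentStore interface {
	PutWALSegment(ctx context.Context, name string, data io.Reader) error
}
```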
I addressed it in b4ca98b
LGTM
Let's merge and improve from there. Build is still failing.
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes https://github.com/grafana/loki-private/issues/1016