-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use time-based event index for app session events #38495
Use time-based event index for app session events #38495
Conversation
The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with |
lib/srv/app/server.go
Outdated
@@ -1187,3 +1205,13 @@ func newGetConfigForClientFn(log logrus.FieldLogger, client auth.AccessCache, tl | |||
return tlsCopy, nil | |||
} | |||
} | |||
|
|||
// certFromConnState returns certificate from the connection. | |||
func certFromConn(tlsConn *tls.Conn) *x509.Certificate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func certFromConn(tlsConn *tls.Conn) *x509.Certificate { | |
func leafCertFromConn(tlsConn *tls.Conn) *x509.Certificate { |
Overall looks good, added a few questions as I'm not super familiar with these parts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach looks good to me. The key index is used in app sessions recordings as well. After this change, ppl will see big numbers as key index instead of 1,2,3, so we need to document this behavior somewhere.
TableName: aws.String(l.Tablename), | ||
Item: av, | ||
TableName: aws.String(l.Tablename), | ||
ConditionExpression: aws.String("attribute_not_exists(SessionID) AND attribute_not_exists(EventIndex)"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this break existing environments when auth is upgraded but app server is not yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as soon as they update their auth servers and events from older app servers, they will hit this, at least for the first app.session.chunk
(which conflicts with app.session.start
). As mentioned in the PR description, if the app server serving the requests changes, there will be more conflicts (generating more errors).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we release this check with the patch, customers's auth will be "spammed" with this before app servers are upgraded. As discussed offline, it is preferred to handle this separately and ideally release it in a new major version. By that time, app servers should have been upgraded with this patch already and we can likely catch other events that have the same problem during the release testing.
Though, since this check does not block using the app, and we are losing events before app servers are upgraded anyway, I don't mind if we really want to release this together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer having an error log to silently overwriting audit events (current behavior).
@codingllama, Any thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid I don't have enough context to have a strong opinion here.
// valid date. | ||
func (s *Server) sessionStartTime(ctx context.Context) time.Time { | ||
var startTime time.Time | ||
if userCert, err := authz.UserCertificateFromContext(ctx); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed interesting approach. Chunks are every five minutes if I recall. So using nanoseconds ensures chunks are relatively in order. However in HA case, time on different app servers may not be perfectly in sync.
We could also use UnixNano for "smaller" numbers, but it'll still be quite large compared to "1,2,3". |
Co-authored-by: Alan Parra <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested today and worked as expected 👍 . Do we expect this change affect other event backend?
TableName: aws.String(l.Tablename), | ||
Item: av, | ||
TableName: aws.String(l.Tablename), | ||
ConditionExpression: aws.String("attribute_not_exists(SessionID) AND attribute_not_exists(EventIndex)"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we release this check with the patch, customers's auth will be "spammed" with this before app servers are upgraded. As discussed offline, it is preferred to handle this separately and ideally release it in a new major version. By that time, app servers should have been upgraded with this patch already and we can likely catch other events that have the same problem during the release testing.
Though, since this check does not block using the app, and we are losing events before app servers are upgraded anyway, I don't mind if we really want to release this together.
@gabrielcorado See the table below for backport results.
|
Closes #37049
As described on the issue, the events were being overwritten on DynamoDB events storage because they had the same key pair (SessionID and Index). The problem is that every time a new session chunk is initialized, it uses a new recorder, which would then have the index start at 0. This would cause any previous event from the session to be overwritten.
The solution proposed on this PR is, for app sessions, to generate an event index based on the session start (here, the time the certificate was issued), removing the necessity of a distributed counter.
In addition, the solution also needed to take into account if different servers handle an app session. For app access, the sessions are not ended if the server handling it changes (HA). In case that happens, the recorder would start over at index 0 and overwrite all the events emitted by the proxy and other app server.
NOTE: This change only affects app session recordings; other resources will still use the incremental method.
This PR also adds a DynamoDB condition expression to guarantee events are not being overwritten on the storage layer.
changelog: Fix application access events being overwritten when using DynamoDB as event storage.