-
Notifications
You must be signed in to change notification settings - Fork 374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fqdnCacheMinTTL configuration parameter #6808
Conversation
Signed-off-by: Hemant <[email protected]>
Signed-off-by: Hemant <[email protected]>
cmd/antrea-agent/options.go
Outdated
// If minTTL is greater than 1, it indicates that the value has been set externally by a cluster admin, and should be respected. | ||
if o.config.MinTTL > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 1 and not 0?
# The minTTL setting helps address the problem of applications caching DNS response IPs indefinitely. | ||
# The Cluster administrators should configure this value, ideally setting it to be equal to or greater than the maximum TTL | ||
# value of the application's DNS cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove "indefinitely". If the application does indeed cache the DNS entry forever, there is not much we can do.
We should also mention that this is for FQDN policy enforcement.
So maybe something like this:
The minTTL setting helps address the problem of applications caching DNS response IPs beyond the TTL value for the DNS record.
It is used to enforce FQDN policy rules, ensuring that resolved IPs are included in datapath rules for as long as the application is caching them.
This value should ideally be set to the maximum caching duration across all applications.
cmd/antrea-agent/options.go
Outdated
@@ -90,6 +90,7 @@ type Options struct { | |||
nplEndPort int | |||
dnsServerOverride string | |||
nodeType config.NodeType | |||
minTTL uint32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this field doesn't seem necessary?
getMaxTTL := func(ttl1, ttl2 uint32) uint32 { | ||
if ttl1 > ttl2 { | ||
return ttl1 | ||
} else { | ||
return ttl2 | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a max
built in in Golang: https://pkg.go.dev/builtin#max
It was introduced in Go 1.21
currentTime := f.clock.Now() | ||
for _, ans := range msg.Answer { | ||
switch r := ans.(type) { | ||
case *dns.A: | ||
if f.ipv4Enabled { | ||
responseIPs[r.A.String()] = ipWithExpiration{ | ||
ip: r.A, | ||
expirationTime: currentTime.Add(time.Duration(r.Header().Ttl) * time.Second), | ||
expirationTime: currentTime.Add(time.Duration(getMaxTTL(f.minTTL, r.Header().Ttl)) * time.Second), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like technically, minTTL
should only apply to DNS responses which were not initiated by Antrea, but intercepted by Antrea (responses to DNS queries generated by the application). However, this code applies to responses to DNS queries sent by Antrea (when an override DNS server is configured). Maybe we need to introduce a flag to distinguish between the 2 cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually spent the length of last syncup meeting to discuss this, because I had the same suggestion but @tnqn’s opinion was that we don’t have to differentiate between the two cases. For security purpose we advise users to use FQDN rules only in allowlists for Antrea-native policies, and he thinks it’s okay that the clients’ TTL for a FQDN goes “out of sync” with the antrea agent since that’s not the gaurentee we want: we only want to enforce that client cannot access unintended addresses. So having an address for a domain which has longer TTL in antrea cache compared to the client is ok. I’ll let Quan chime in to see if I’m summarizing this correctly, but the end result of the discussion was we told Hemant to not worry about differentiating Antrea and client initiated dns queries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For security purpose we advise users to use FQDN rules only in allowlists for Antrea-native policies
Got it, makes sense to me
@hkiiita please ignore this comment
- Remove the redundant minTTL field from Options struct. - Use in built max function for comparison of max TTL values. - Improve descriptive comment about minTTL in config file. Signed-off-by: Hemant <[email protected]>
Signed-off-by: Hemant <[email protected]>
# The minTTL setting helps address the problem of applications caching DNS response IPs beyond the TTL value for the DNS record. | ||
# It is used to enforce FQDN policy rules, ensuring that resolved IPs are included in datapath rules for as long as the application is caching them. | ||
# This value should ideally be set to the maximum caching duration across all applications. | ||
minTTL: {{ .Values.minTTL }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name is a bit generic, maybe we should go with fqdnCacheMinTTL
?
cmd/antrea-agent/options.go
Outdated
// Ensure that the minTTL is not negative. | ||
o.config.MinTTL = max(o.config.MinTTL, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be better to return an error if o.config.MinTTL < 0
IMO
while this code is in validateK8sNodeOptions
, I am not sure this configuration parameter is specific to the "K8sNode" case. Maybe the check should be in the parent function? cc @tnqn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can be used in ExternalNode too but guess no one validates it. I'm fine with keeping it in the parent function for now. We might create a validate function for networkpolicy related configurations later.
@@ -160,7 +161,7 @@ type fqdnController struct { | |||
clock clock.Clock | |||
} | |||
|
|||
func newFQDNController(client openflow.Client, allocator *idAllocator, dnsServerOverride string, dirtyRuleHandler func(string), v4Enabled, v6Enabled bool, gwPort uint32, clock clock.WithTicker) (*fqdnController, error) { | |||
func newFQDNController(client openflow.Client, allocator *idAllocator, dnsServerOverride string, dirtyRuleHandler func(string), v4Enabled, v6Enabled bool, gwPort uint32, clock clock.WithTicker, minTTL int) (*fqdnController, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if minTTL
should be uint32
here. This way we know it is a positive number. The conversion from int
to uint32
can happen in cmd/antrea-agent
.
Signed-off-by: Hemant <[email protected]>
# It is used to enforce FQDN policy rules, ensuring that resolved IPs are included in datapath rules for as long as the application is caching them. | ||
# This value should ideally be set to the maximum caching duration across all applications. | ||
minTTL: {{ .Values.minTTL }} | ||
fqdnCacheMinTTL: {{ .Values.minTTL }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fqdnCacheMinTTL: {{ .Values.minTTL }} | |
fqdnCacheMinTTL: {{ .Values.fqdnCacheMinTTL }} |
This is why the manifests are not generated correctly (fqdnCacheMinTTL:
instead of fqdnCacheMinTTL: 0
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry , my bad. Will correct that.
cmd/antrea-agent/options.go
Outdated
switch o.config.NodeType { | ||
case config.ExternalNode.String(): | ||
o.nodeType = config.ExternalNode | ||
return o.validateExternalNodeOptions() | ||
} else if o.config.NodeType == config.K8sNode.String() { | ||
case config.K8sNode.String(): | ||
o.nodeType = config.K8sNode | ||
return o.validateK8sNodeOptions() | ||
} else { | ||
default: | ||
return fmt.Errorf("unsupported nodeType %s", o.config.NodeType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a bad change, but I would avoid doing it in this PR as it is unrelated
cmd/antrea-agent/options.go
Outdated
if o.config.NodeType == config.ExternalNode.String() { | ||
// validate FqdnCacheMinTTL | ||
if o.config.FqdnCacheMinTTL < 0 { | ||
return fmt.Errorf("fqdnCacheMinTTL set to an invalid value, its must be a positive integer") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return fmt.Errorf("fqdnCacheMinTTL set to an invalid value, its must be a positive integer") | |
return fmt.Errorf("fqdnCacheMinTTL must be greater than or equal to 0") |
pkg/config/agent/config.go
Outdated
@@ -158,7 +158,7 @@ type AgentConfig struct { | |||
// The minTTL setting helps address the problem of applications caching DNS response IPs indefinitely. | |||
// The Cluster administrators should configure this value, ideally setting it to be equal to or greater than the maximum TTL | |||
// value of the application's DNS cache. | |||
MinTTL int `yaml:"minTTL,omitempty"` | |||
FqdnCacheMinTTL int `yaml:"fqdnCacheMinTTL,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the field name here should be FQDNCacheMinTTL
per our conventions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment not addressed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also add unit test cases to verify that with fqdnController with minTTL set will set correct dnsEntry expiration times in cache
… per review. Signed-off-by: Hemant <[email protected]>
Signed-off-by: Hemant <[email protected]>
@@ -306,6 +306,11 @@ kubeAPIServerOverride: {{ .Values.kubeAPIServerOverride | quote }} | |||
# 10.96.0.10:53, [fd00:10:96::a]:53). | |||
dnsServerOverride: {{ .Values.dnsServerOverride | quote }} | |||
|
|||
# The fqdnCacheMinTTL setting helps address the problem of applications caching DNS response IPs beyond the TTL value for the DNS record. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/The fqdnCacheMinTTL setting helps address/fqdnCacheMinTTL helps address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hkiiita not addressed correctly, the current sentence is not grammatically correct
@@ -744,3 +747,140 @@ func TestOnDNSResponse(t *testing.T) { | |||
}) | |||
} | |||
} | |||
func TestFQDNCacheMinTTL(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like in theory we only need to test parseDNSResponse
, because this is the only place where minTTL
is actually used. We just need to make sure that minTTL
can override the TTL included in the DNS response. cc @Dyanngg
However, I don't feel super strongly about it, so if others think it is better to test onDNSResponseMsg
"end-to-end", it is fine by me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am awaiting a comment on this one to implement the nit change asked above in previous comment , moreover, i had also pushed the commit below refactor to test just the parseDNSResponse considering the feedback.
Signed-off-by: Hemant <[email protected]>
name: "Response TTL less than FQDNCacheTTL", | ||
expectedTTL: currentTime.Add(10 * time.Second), | ||
fqdnCacheMinTTL: 10, | ||
dnsMsg: &dns.Msg{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot of repetition: we use the same DNS message every time as far as I can tell (with the only change being the ttl value).
One option is to use a closure such as this one:
getDNSMsg := func(ttl in) *dns.Msg {
return &dns.Msg{ ... }
}
fakeClock := newFakeClock(currentTime) | ||
controller := gomock.NewController(t) | ||
f, _ := newMockFQDNController(t, controller, nil, fakeClock, tc.fqdnCacheMinTTL) | ||
require.Zero(t, fakeClock.TimersAdded()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this useful in the context of this test? Please add a short comment if you think it is necessary.
f, _ := newMockFQDNController(t, controller, nil, fakeClock, tc.fqdnCacheMinTTL) | ||
require.Zero(t, fakeClock.TimersAdded()) | ||
_, responseIPs, err := f.parseDNSResponse(tc.dnsMsg) | ||
assert.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one should be require.NoError
Signed-off-by: Hemant <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a couple of nits, otherwise LGTM
Edit: the e2e test is not correct, it doesn't actually set "minTTL"
please resolve conflicts as well
build/charts/antrea/values.yaml
Outdated
@@ -186,6 +186,10 @@ kubeAPIServerOverride: "" | |||
# -- Address of DNS server, to override the kube-dns Service. It's used to | |||
# resolve hostnames in a FQDN policy. | |||
dnsServerOverride: "" | |||
# -- The minTTL setting helps address the problem of applications caching DNS response IPs indefinitely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment out of date
ttl uint32 | ||
}{ | ||
{ | ||
name: "Response TTL less than FQDNCacheTTL", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename all occurrences of FQDNCacheTTL
to fqdnCacheMinTTL
}{ | ||
{ | ||
name: "Response TTL less than FQDNCacheTTL", | ||
expectedTTL: currentTime.Add(10 * time.Second), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to improve readability of the test case definitions, it would be nice if expectedTTL
followed the same "format" as fqdnCacheMinTTL
and ttl
I would suggest using a plain integer for expectedTTL
(like for the other 2 fields), and making the necessary conversion in the test implementation (t.Run
)
name string | ||
expectedTTL time.Time | ||
fqdnCacheMinTTL uint32 | ||
ttl uint32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename ttl
to responseTTL
- Added mocking of fqdnCacheMinTTl set and unset in the e2e test. - Refactored fqdn test by enclosing the DNS message construction in a closure and - setting the TTL value in DNS message in t.Run() - updated outdated comment in values.yaml for the fqdnCacheMinTTL. Signed-off-by: Hemant <[email protected]>
build/yamls/antrea.yml
Outdated
@@ -5383,8 +5388,8 @@ spec: | |||
# Using "kubectl logs/exec/attach/cp" doesn't have to specify "-c antrea-agent" when troubleshooting. | |||
kubectl.kubernetes.io/default-container: antrea-agent | |||
# Automatically restart Pods with a RollingUpdate if the ConfigMap changes | |||
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments | |||
checksum/config: ee43fc8c8c5ac8097a757da0545e43b25f98d8f2d831842f7ee76e1ed7581267 | |||
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deploymentstl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is an intentional change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not addressed
dnsMsg := getDNSMsg(tc.responseTTL) | ||
_, responseIPs, err := f.parseDNSResponse(dnsMsg) | ||
require.NoError(t, err) | ||
expectedTTL := currentTime.Add(tc.expectedTTL * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/expectedTTL/expectedExpirationTime for this variable name
test/e2e/antreapolicy_test.go
Outdated
testWithFQDNCacheMinTTL(t, 0) | ||
testWithFQDNCacheMinTTL(t, 10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make them independent subtests with their own name. For example:
t.Run("FQDNCacheMinTTL unset", func(t *testing.T) { testWithFQDNCacheMinTTL(t, 0) })
t.Run("FQDNCacheMinTTL set to 10s", func(t *testing.T) { testWithFQDNCacheMinTTL(t, 10) })
test/e2e/antreapolicy_test.go
Outdated
if fqdnCacheMinTTL == 0 { | ||
t.Logf("Running the test with FQDNCacheMinTTL unset") | ||
} else { | ||
t.Logf("Running the test with FQDNCacheMinTTL set to %d ", fqdnCacheMinTTL) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this, each case should be a subtest, with its own name (see comment above)
After making that change, you can remove these logs
test/e2e/antreapolicy_test.go
Outdated
@@ -5342,6 +5359,7 @@ func TestFQDNCacheMinTTL(t *testing.T) { | |||
} | |||
return stdout, nil | |||
} | |||
setFQDNCacheMinTTLInAntrea(t, data, fqdnCacheMinTTL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's better if you only mutate the Antrea config once. So you need to find a way to "merge" this with setDNSServerAddressInAntrea
additionally, the config should be restored at the end of the test
test/e2e/antreapolicy_test.go
Outdated
assert.EventuallyWithT(t, func(t *assert.CollectT) { | ||
_, err := curlFQDN(fqdnIP) | ||
assert.Error(t, err) | ||
}, 10*time.Second, 1*time.Second) | ||
}, 20*time.Second, 1*time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the assertion should be different based on whether fqdnCacheMinTTL
is set or not? otherwise there is something wrong with the test.
test/e2e/antreapolicy_test.go
Outdated
// The wait time here should be slightly longer than the reload value specified in the custom DNS configuration. | ||
// TODO: This assertion currently verifies the issue described in https://github.com/antrea-io/antrea/issues/6229. | ||
// It will need to be updated once minTTL support is implemented. | ||
// TODO: This assertion verifies the fix to the issue described in https://github.com/antrea-io/antrea/issues/6229. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is no longer a TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not addressed
cmd/antrea-agent/options.go
Outdated
@@ -155,6 +155,11 @@ func (o *Options) validate(args []string) error { | |||
return fmt.Errorf("nodeType %s requires feature gate ExternalNode to be enabled", o.config.NodeType) | |||
} | |||
|
|||
// validate FqdnCacheMinTTL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to have the comment as the code is obvious and the comment doesn't provide more information
@@ -196,7 +196,7 @@ func NewNetworkPolicyController(antreaClientGetter client.AntreaClientProvider, | |||
gwPort, tunPort uint32, | |||
nodeConfig *config.NodeConfig, | |||
podNetworkWait *utilwait.Group, | |||
l7Reconciler *l7engine.Reconciler) (*Controller, error) { | |||
l7Reconciler *l7engine.Reconciler, fqdnCacheMinTTL uint32) (*Controller, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a separate line for the new argument like other args.
pkg/config/agent/config.go
Outdated
@@ -158,7 +158,7 @@ type AgentConfig struct { | |||
// The minTTL setting helps address the problem of applications caching DNS response IPs indefinitely. | |||
// The Cluster administrators should configure this value, ideally setting it to be equal to or greater than the maximum TTL | |||
// value of the application's DNS cache. | |||
MinTTL int `yaml:"minTTL,omitempty"` | |||
FqdnCacheMinTTL int `yaml:"fqdnCacheMinTTL,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment not addressed
test/e2e/antreapolicy_test.go
Outdated
// applications might cache them, thereby preventing intermittent network connectivity issues to the FQDN concerned. | ||
// Actual test logic runs in testWithFQDNCacheMinTTL, which gets called by TestFQDNCacheMinTTL with 2 fqdnCacheMinTTL values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two lines are obviously longer than other lines.
And remove the extra space between "by TestFQDNCacheMinTTL"
Signed-off-by: Hemant <[email protected]>
build/yamls/antrea.yml
Outdated
@@ -5383,8 +5388,8 @@ spec: | |||
# Using "kubectl logs/exec/attach/cp" doesn't have to specify "-c antrea-agent" when troubleshooting. | |||
kubectl.kubernetes.io/default-container: antrea-agent | |||
# Automatically restart Pods with a RollingUpdate if the ConfigMap changes | |||
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments | |||
checksum/config: ee43fc8c8c5ac8097a757da0545e43b25f98d8f2d831842f7ee76e1ed7581267 | |||
# See https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deploymentstl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not addressed
# fqdnCacheMinTTL helps address the issue of applications caching DNS response IPs beyond the TTL value for the DNS record. | ||
# It is used to enforce FQDN policy rules, ensuring that resolved IPs are included in datapath rules for as long as the application caches them. | ||
# Ideally, this value should be set to the maximum caching duration across all applications. | ||
fqdnCacheMinTTL: {{ .Values.fqdnCacheMinTTL }} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be good to wrap the lines here as well
build/charts/antrea/values.yaml
Outdated
# fqdnCacheMinTTL helps address the issue of applications caching DNS response IPs beyond the TTL value for the DNS record. | ||
# It is used to enforce FQDN policy rules, ensuring that resolved IPs are included in datapath rules for as long as the application caches them. | ||
# Ideally, this value should be set to the maximum caching duration across all applications. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
test/e2e/antreapolicy_test.go
Outdated
t.Run("FQDNCacheMinTTL-unset", func(t *testing.T) { testWithFQDNCacheMinTTL(t, 0) }) | ||
t.Run("FQDNCacheMinTTL-set-to-10s", func(t *testing.T) { testWithFQDNCacheMinTTL(t, 10) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you adding dashes to the sub-test names?
Note that the go testing framework takes care of escaping whitespaces automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoninbas Actually , i was getting the following error , hence i had done the said change .
fixtures.go:288: Creating 'testfqdncacheminttl-fqdncacheminttl_unset-c2njzgdw' K8s Namespace
fixtures.go:355: Exporting test logs to '/tmp/antrea-test-2503355158/TestFQDNCacheMinTTL_FQDNCacheMinTTL_unset/afterSetupTest.Dec04-00-40-40'
antreapolicy_test.go:5301: Error when setting up test: error when creating 'testfqdncacheminttl-fqdncacheminttl_unset-c2njzgdw'
Namespace: Namespace "testfqdncacheminttl-fqdncacheminttl_unset-c2njzgdw" is invalid: metadata.name: Invalid value: "testfqdncacheminttl-fqdncacheminttl_unset-c2njzgdw":
a lowercase RFC 1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')
After you pointed this out, Initially i thought maybe i should look at the test framework code to check if we are doing this replacement of whitespace with underscores but i found its not being done there as the setupTest function gets the name from *testing.T type, hence its being modified by go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved as per guidance in slack.
test/e2e/antreapolicy_test.go
Outdated
// The wait time here should be slightly longer than the reload value specified in the custom DNS configuration. | ||
// TODO: This assertion currently verifies the issue described in https://github.com/antrea-io/antrea/issues/6229. | ||
// It will need to be updated once minTTL support is implemented. | ||
// TODO: This assertion verifies the fix to the issue described in https://github.com/antrea-io/antrea/issues/6229. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not addressed
test/e2e/antreapolicy_test.go
Outdated
@@ -5322,8 +5334,8 @@ func TestFQDNCacheMinTTL(t *testing.T) { | |||
createCustomDNSPod(t, data, configMap.Name) | |||
|
|||
// set the custom DNS server IP address in Antrea ConfigMap. | |||
setDNSServerAddressInAntrea(t, data, dnsServiceIP) | |||
defer setDNSServerAddressInAntrea(t, data, "") //reset after the test. | |||
setDNSServerAddressInAntrea(t, data, dnsServiceIP, fqdnCacheMinTTL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename setDNSServerAddressInAntrea
to configureFQDNPolicyEnforcement
, which is more generic and more suitable since you are not configuring 2 different things
test/e2e/antreapolicy_test.go
Outdated
assert.EventuallyWithT(t, func(t *assert.CollectT) { | ||
_, err := curlFQDN(fqdnIP) | ||
assert.NoError(t, err) | ||
}, time.Duration(fqdnCacheMinTTL)*time.Second, 1*time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it feels like this should be assert.Never
. I don't understand how assert.Eventually
is valid.
- Fixed final e2e assertion by using assert.Never - Fixed various formatting issues as pointe di review - Renamed the function which modifies DNS ConfigMap Signed-off-by: Hemant <[email protected]>
test/e2e/antreapolicy_test.go
Outdated
assert.Never(t, func() bool { | ||
_, err := curlFQDN(fqdnIP) | ||
return err != nil | ||
}, time.Duration(fqdnCacheMinTTL)*time.Second, 1*time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like the test is likely to be "flaky" in practice. The assertion will verify that the condition is not met for fqdnCacheMinTTL
seconds. However, by the time we start running the assertion, Antrea may have cached the IP for a few seconds already? So assuming that fqdnCacheMinTTL
is 10s (as is the case currently), could the assertion actually fail after 7 or 8s?
Unless I am missing something, we should probably have some logic like this:
// get timestamp before the Pod resolves the FQDN for the first time
startCacheTime := time.Now()
// this is the first time we make the Pod curl the FQDN
assert.EventuallyWithT(t, func(t *assert.CollectT) {
_, err := curlFQDN(testFQDN)
assert.NoError(t, err)
}, 2*time.Second, 100*time.Millisecond, "failed to curl test FQDN: ", testFQDN)
// rest of the test code
// when we get to the "Never" assertion, we need to deduct some time from fqdnCacheMinTTL
// the extra 1s is for "safety"
waitFor := time.Duration(fqdnCacheMinTTL)*time.Second - time.Since(startCacheTime) - 1*time.Second
assert.Never(t, func() bool {
_, err := curlFQDN(fqdnIP)
return err != nil
}, waitFor, 1*time.Second)
It may make sense to increase the minTTL value from 10s to 20s for the test IMO. I am not exactly sure how much time elapses between startCacheTime
and this assertion being executed.
@@ -5368,26 +5380,34 @@ func TestFQDNCacheMinTTL(t *testing.T) { | |||
require.NoError(t, data.setPodAnnotation(data.testNamespace, "custom-dns-server", "test.antrea.io/random-value", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not introduced in this PR, but I just noticed the following assertion above:
assert.EventuallyWithT(t, func(t *assert.CollectT) {
_, err := curlFQDN(testFQDN)
assert.NoError(t, err)
}, 2*time.Second, 1*time.Millisecond, "failed to curl test FQDN: ", testFQDN)
1ms is much too frequent to be executing the condition function. Please update to 100ms.
test/e2e/antreapolicy_test.go
Outdated
@@ -5368,26 +5380,34 @@ func TestFQDNCacheMinTTL(t *testing.T) { | |||
require.NoError(t, data.setPodAnnotation(data.testNamespace, "custom-dns-server", "test.antrea.io/random-value", | |||
randSeq(8)), "failed to update custom DNS Pod annotation.") | |||
|
|||
// finally verify that Curling the previously cached IP fails after DNS update. | |||
// finally verify that Curling the previously cached IP does not fail after DNS update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// finally verify that Curling the previously cached IP does not fail after DNS update. | |
// finally verify that Curling the previously cached IP does not fail after DNS update, as long as fqdnCacheMinTTL is set. |
- added a waitFor time equivalent to the difference of time elapsed since last DNS request. - updated manifests using `make manifests` for build fail. - updated helm chart docs. Signed-off-by: Hemant <[email protected]>
test/e2e/antreapolicy_test.go
Outdated
// Calculate `waitFor` to determine the duration to wait for the 'Never' assertion. | ||
// This accounts for the elapsed time since the initial DNS request was made from the Pod | ||
// and the start of the FQDN cache's minimum TTL (fqdnCacheMinTTL). The duration is reduced | ||
// by 1 second as a buffer acting as a safety margin. | ||
safetyMargin := 1 * time.Second | ||
waitFor := (time.Duration(fqdnCacheMinTTL)*time.Second - time.Since(startCacheTime)) - safetyMargin | ||
|
||
if waitFor <= 0 { | ||
t.Logf("Invalid waitFor computed: %v. Clamping to 1 second.", waitFor) | ||
waitFor = 1 * time.Second | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be moved to the else
block of the code below. We only need that for the case where fqdnCacheMinTTL
is greater than 0.
I would also remove the sanity checking code for waitFor
and replace it with require.GreaterOrEqual(t, waitFor, 5*time.Second)
. I think the test is only meaningful if waitFor
is large enough. If this requirement fails, it either means that there is an issue with the test or that fqdnCacheMinTTL
should be set to a larger value.
Signed-off-by: Hemant <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test/e2e/antreapolicy_test.go
Outdated
func TestFQDNCacheMinTTL(t *testing.T) { | ||
t.Run("minTTLUnset", func(t *testing.T) { testWithFQDNCacheMinTTL(t, 0) }) | ||
t.Run("minTTL10s", func(t *testing.T) { testWithFQDNCacheMinTTL(t, 10) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t.Run("minTTL10s", func(t *testing.T) { testWithFQDNCacheMinTTL(t, 10) }) | |
t.Run("minTTL20s", func(t *testing.T) { testWithFQDNCacheMinTTL(t, 20) }) |
In #6808 (comment), this is what I was actually recommending
test/e2e/antreapolicy_test.go
Outdated
assert.EventuallyWithT(t, func(t *assert.CollectT) { | ||
_, err := curlFQDN(fqdnIP) | ||
assert.Error(t, err) | ||
}, 20*time.Second, 1*time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can keep the timeout as 10s here. I think there was some confusion with my request in #6808 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, got it.
test/e2e/antreapolicy_test.go
Outdated
waitFor := (time.Duration(fqdnCacheMinTTL)*time.Second - time.Since(startCacheTime)) - safetyMargin | ||
require.GreaterOrEqual(t, waitFor, 5*time.Second) | ||
|
||
// fqdnCacheMinTTL is set hence we expect no error at least till the period equivalent to fqdnCacheMinTTL's value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// fqdnCacheMinTTL is set hence we expect no error at least till the period equivalent to fqdnCacheMinTTL's value. | |
// fqdnCacheMinTTL is set hence we expect no error at least until fqdnCacheMinTTL expires. |
@@ -350,7 +353,7 @@ func TestDeleteFQDNRule(t *testing.T) { | |||
func TestLookupIPFallback(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hkiiita this test is failing for your PR. The test itself can probably be improved, but right now it is failing because you are enabling IPv6 unconditionally in newMockFQDNController
, and this test cannot handle IPv6.
My recommendation for this PR would be to revert the change in newMockFQDNController
so that IPv6 stays disabled by default. Then in your new test function (TestParseDNSResponseOnFQDNCacheMinTTL
), which requires IPv6, you manually enabled it like this:
f, _ := newMockFQDNController(t, controller, nil, fakeClock, tc.fqdnCacheMinTTL)
f.ipv6Enabled = true // this test needs IPv6 enabled
I think this is good enough for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, did the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The integration test seems to be failing , am not sure but it seems to be failing at
gobgp_test.go:108: Getting peers of BGP server1 and verifying them
while verifying peers for BGP Server 1 , i checked the test and though i cant find it but i am wondering if its related to above change in any way ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can ignore the integration test failure here. It is unrelated to your change. The test is known to be flaky and we already have an issue to track it: #6501
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall once Antonin's comments are addressed
@@ -155,6 +155,10 @@ func (o *Options) validate(args []string) error { | |||
return fmt.Errorf("nodeType %s requires feature gate ExternalNode to be enabled", o.config.NodeType) | |||
} | |||
|
|||
if o.config.FQDNCacheMinTTL < 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: not necessarily need to be part of this PR, but I just want to note @salv-orlando's comment in the community meeting: we should consider setting a sensible max value for this field, so that Antrea will not just cache each FQDN ip forever
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking on it (i am inexperienced in a real world scenario for this though) and thought that IPs mapped to FQDN's wont probably update/change at a rate that it might make us ending up with too many datapath rules for each IP ? and maybe can this be left to the cluster admin to determine and set it ? What if the admin wishes to place a large value , for his own use case, which surpasses the 'max' that we set ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know about a max value for FQDNCacheMinTTL
, but we should have a upper bound on the size of the cache (which would also be a upper bound on the number of datapath rules). This is independent of this feature IMO, because I could also have a DNS server returning records with a very high TTL, and it seems that the issue would be the same.
Signed-off-by: Hemant <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for all the work
/test-all |
This PR introduces a
fqdnCacheMinTTL
setting which would help address the problem of applications caching DNS response IPs indefinitely. Cluster administrators should be able to configure this value, ideally setting it to be equal to or greater than the maximum TTL value of the application's DNS cache.This feature is a work towards resolving the issue of indefinite caching of DNS response IPs by certain applications.