Trigger reconciling NetworkPolicy rules upon CNI Add #222
Conversation
Thanks for your PR. The following commands are available:
- /test-e2e

Force-pushed from 24f9f44 to f2a6c5e

/test-e2e
I added a couple questions. I think it would be good to add to this PR description which assumptions (if any) we make regarding ordering of events. We can add this information later in developer documentation.
```go
for group, podSet := range c.podSetByGroup {
	_, exists := podSet[pod]
	if exists {
		c.onAppliedToGroupUpdate(group)
```
Maybe a stupid question. Are we assuming that by the time the CNI server notifies us on the channel and we execute this code, we have already received the relevant created / updated AppliedToGroups from the Antrea NP controller? If yes, it may be worth documenting this in the comment for this function (and explaining why that is the case). If no, then I think I would benefit from an explanation, because in my mind the sequence of events is as follows:
1. Antrea NP controller is notified by the apiserver that a new Pod is being created. The IP address is not known yet, but we update / create AppliedToGroups as needed based on existing NetworkPolicies.
2. Agent NP controller is notified by the Antrea NP controller; flows cannot be installed yet because the IP address is not known.
3. CNI Add is invoked, Pod networking is configured, and the IP address is determined. The CNI server can notify the Agent NP controller using the `podUpdates` channel. `processPodUpdates` processes the notification from the CNI server. The IP address is now retrievable from the interface store, and we can install NP flows.

So I guess my question is: is this the correct sequence of events, and are we assuming that 2) happens before 3) (and if yes, why can we make this assumption)?
The NetworkPolicy implementation is entirely asynchronous from Antrea Controller to Antrea Agent and best-effort. Your understanding of the sequence of events is correct, but we don't assume it must happen in that order; it just does in most cases.
That's because of the way Kubernetes works: kube-scheduler schedules a Pod onto a Node by setting its NodeName field via the API, then kubelet and antrea-controller receive the Pod update event at almost the same time; kubelet starts creating containers for the Pod while antrea-controller starts updating the Group info and notifying antrea-agent. It's observed that the former is slower than the latter, so the PR works in that case. But it's hard to say this is consistent, given differing network conditions, antrea-controller workload, kubelet workload, and cluster scale.
This PR enhances the best effort by triggering the realization as early as possible. Even if a NetworkPolicy is created before a Pod, we can't say antrea-controller must receive the policy from the apiserver first, as they arrive over two separate API connections. But this slight difference should make no difference to users, as they reach eventual consistency in a very short time.
I will add comments to explain which situations this processing helps with.
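For context, here is a minimal, self-contained sketch of the consumer side being discussed. Only `podSetByGroup` and `onAppliedToGroupUpdate` come from the quoted diff; every other name and type is an illustrative assumption, not Antrea's actual code.

```go
// Package sketch is a hypothetical illustration, not Antrea's actual code.
package sketch

// podReference identifies a Pod by name and namespace (stands in for the
// real PodReference type sent on the podUpdates channel).
type podReference struct {
	Name      string
	Namespace string
}

// ruleCache is a stand-in for the agent-side cache of computed rules.
type ruleCache struct {
	// podSetByGroup maps an AppliedToGroup name to the set of Pods it contains.
	podSetByGroup map[string]map[podReference]struct{}
	// dirtyGroups records groups whose rules need to be reconciled again.
	dirtyGroups []string
}

// onAppliedToGroupUpdate marks a group so its rules get re-reconciled.
func (c *ruleCache) onAppliedToGroupUpdate(group string) {
	c.dirtyGroups = append(c.dirtyGroups, group)
}

// processPodUpdates drains Pod update events sent by the CNI server and
// triggers reconciliation for every AppliedToGroup containing the Pod,
// without waiting for kube-apiserver to report the Pod's IP.
func (c *ruleCache) processPodUpdates(podUpdates <-chan podReference, stopCh <-chan struct{}) {
	for {
		select {
		case pod := <-podUpdates:
			for group, podSet := range c.podSetByGroup {
				if _, exists := podSet[pod]; exists {
					c.onAppliedToGroupUpdate(group)
				}
			}
		case <-stopCh:
			return
		}
	}
}
```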
```diff
@@ -403,6 +406,10 @@ func (s *CNIServer) CmdAdd(ctx context.Context, request *cnipb.CniCmdRequest) (
 		klog.Errorf("Failed to configure container %s interface: %v", cniConfig.ContainerId, err)
 		return s.configInterfaceFailureResponse(err), nil
 	}
+
+	// Notify the Pod update event to required components.
+	s.podUpdates <- v1beta1.PodReference{Name: podName, Namespace: podNamespace}
```
So as a follow-up to your comment there: #197 (comment)
We are assuming that it takes less time to 1) notify on the channel + reconcile rules + install flows than it takes for 2) kubelet to receive the response from the CNI and create the workload container.
If 1) is very fast, wouldn't it be acceptable to just block until we know it has completed before returning from CNI Add? This way we would have a guarantee that known NetworkPolicies are enforced before the workload container is created. I know that we don't have a notification mechanism to determine when flows have been installed, but that's something we may have in the future. The only issue I see is that we need to know the OF port for NP flows, which means we cannot start installing flows until we have added the new port to the vSwitch.
Even if we block there, there is no guarantee that the Controller has already computed and pushed all policies applied to the Pod. As most event processing is already async, I prefer to keep the async behavior here too.
Agree with @jianjuns. Also, as I explained in #222 (comment), it's not even guaranteed that antrea-controller will receive the NetworkPolicy and handle it first, unless we sort events of different resources, which makes things more complicated. I think the most important question is whether it's worth doing if it makes no difference to users.
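To illustrate the asynchronous hand-off being kept here, a hypothetical sketch of the producer side. Apart from the `podUpdates` channel send shown in the diff, all names and the buffer size are illustrative assumptions, not taken from the patch.

```go
// Package sketch is a hypothetical illustration, not Antrea's actual code.
package sketch

// podReference identifies a Pod by name and namespace.
type podReference struct {
	Name      string
	Namespace string
}

// cniServer stands in for the CNI server owning the producer end of the channel.
type cniServer struct {
	// A buffered channel lets CmdAdd publish the event and return without
	// waiting for the NetworkPolicy controller to reconcile and install flows.
	podUpdates chan podReference
}

func newCNIServer() *cniServer {
	return &cniServer{podUpdates: make(chan podReference, 100)}
}

// notifyPodUpdate would be called from CmdAdd after the container interface
// has been configured, i.e. once the OF port and IP are in the interface store.
func (s *cniServer) notifyPodUpdate(podName, podNamespace string) {
	s.podUpdates <- podReference{Name: podName, Namespace: podNamespace}
}
```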
```diff
@@ -1080,8 +1080,8 @@ func (n *NetworkPolicyController) syncAppliedToGroup(key string) error {
 		// Retrieve all Pods matching the podSelector.
 		pods, err = n.podLister.Pods(appliedToGroup.Selector.Namespace).List(selector)
 		for _, pod := range pods {
-			if pod.Status.PodIP == "" {
-				// No need to process Pod when IPAddress is unset.
+			if pod.Spec.NodeName == "" {
```
If the Controller typically receives the first Pod event without NodeName (and IP), maybe we should optimize updatePod() to ignore such events?
I do not mean we need to make the change in this PR.
Makes sense. I will create an issue to track the optimization, as there could be more cases: if NodeName is not set, there is no need to trigger the AppliedToGroup; if PodIP is not set, there is no need to trigger the AddressGroup.
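A rough sketch of what that follow-up optimization could look like (not part of this PR; the trigger helpers and the updatePod signature are placeholders, not Antrea's actual functions):

```go
// Package sketch is a hypothetical illustration of the suggested follow-up,
// not code from this PR.
package sketch

import v1 "k8s.io/api/core/v1"

type networkPolicyController struct{}

// triggerAppliedToGroupSync and triggerAddressGroupSync are placeholders for
// whatever enqueues the affected groups for re-computation.
func (n *networkPolicyController) triggerAppliedToGroupSync(pod *v1.Pod) {}
func (n *networkPolicyController) triggerAddressGroupSync(pod *v1.Pod)   {}

func (n *networkPolicyController) updatePod(oldPod, curPod *v1.Pod) {
	// An unscheduled Pod (no NodeName) cannot be realized by any agent yet,
	// so skip AppliedToGroup updates for it.
	if curPod.Spec.NodeName != "" {
		n.triggerAppliedToGroupSync(curPod)
	}
	// A Pod without an IP cannot appear in any AddressGroup yet.
	if curPod.Status.PodIP != "" {
		n.triggerAddressGroupSync(curPod)
	}
}
```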
/test-e2e
LGTM. Added one minor comment and one question.
/test-e2e
/test-e2e
Currently, only Pods that have IPs are included in AppliedToGroups. This means a Pod's NetworkPolicies are not enforced until it has an IP assigned in kube-apiserver, which always introduces a window during which the Pod is not isolated by NetworkPolicies created before it.
To enforce NetworkPolicies on Pods, antrea-agent doesn't need PodIPs from kube-apiserver, only the OFPort and IP from the internal InterfaceCache.
This patch uses a channel to receive Pod update events from the CNIServer and notify the NetworkPolicyController to reconcile rules related to the updated Pods.
Fixes #197
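Putting the pieces of the patch description together, here is a runnable toy illustration of the channel-based hand-off. All names are made up for illustration; only the producer/consumer pattern mirrors the patch.

```go
// A toy end-to-end illustration; not Antrea code.
package main

import "fmt"

// podReference identifies a Pod by name and namespace.
type podReference struct {
	Name, Namespace string
}

func main() {
	// Shared between the CNI server (producer) and the agent's
	// NetworkPolicyController (consumer).
	podUpdates := make(chan podReference, 100)
	done := make(chan struct{})

	// Consumer: reconcile rules for every group containing the updated Pod.
	go func() {
		pod := <-podUpdates
		fmt.Printf("reconciling NetworkPolicy rules for %s/%s\n", pod.Namespace, pod.Name)
		close(done)
	}()

	// Producer: after CmdAdd configures the Pod's interface (OF port and IP
	// are now in the interface store), publish the update and return without
	// waiting for flow installation.
	podUpdates <- podReference{Name: "nginx", Namespace: "default"}
	<-done
}
```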