[BUG] Priority issue between security groups and subnet ACLs #4607

Open
zcq98 opened this issue Oct 15, 2024 · 5 comments
Labels
bug Something isn't working subnet

Comments

@zcq98
Member

zcq98 commented Oct 15, 2024

Kube-OVN Version

v1.13.0

Kubernetes Version

v1.21.5

Operation-system/Kernel Version

"BigCloud Enterprise Linux For Euler 21.10 LTS" 4.19.90-2107.6.0.0192.8.oe1.bclinux.x86_64

Description

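1. Set the ACL on a VLAN subnet to drop all traffic within the same subnet, with priority 1590.
2. Create two Pods in this subnet on different nodes.
3. Bind security group 1 to Pod1 and security group 2 to Pod2. Both security groups allow all ingress and egress traffic.

The ACL rules generated from the security groups should all have a higher priority than the subnet ACL's 1590, so in theory Pod1 and Pod2 should be able to reach each other once the security groups are bound. In practice, in a VLAN subnet, they cannot ping each other.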
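
The expectation here rests on the OVN rule that, among the ACLs matching a packet in the same direction, the one with the highest priority decides the outcome. A minimal Python sketch of that reasoning follows. It is a simplified model and not Kube-OVN code: the `Acl` class and the `evaluate` function are hypothetical, and the priority 2290 given to the security-group ACL is an assumed placeholder chosen only to be higher than 1590.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Acl:
    priority: int
    direction: str  # "from-lport" or "to-lport"
    dst: str        # destination CIDR the ACL matches on
    action: str     # "allow", "allow-related" or "drop"


def evaluate(acls, direction, dst_ip, default="allow"):
    """Return the action of the highest-priority ACL matching the packet."""
    matching = [
        acl for acl in acls
        if acl.direction == direction
        and ip_address(dst_ip) in ip_network(acl.dst)
    ]
    if not matching:
        return default
    return max(matching, key=lambda acl: acl.priority).action


# Subnet ACL from the description: drop traffic to the subnet's own CIDR.
SUBNET_DROP = Acl(1590, "from-lport", "2.2.99.0/24", "drop")
# Security-group ACL; the priority value is an assumption for illustration.
SG_ALLOW = Acl(2290, "from-lport", "0.0.0.0/0", "allow-related")

print(evaluate([SUBNET_DROP], "from-lport", "2.2.99.10"))
print(evaluate([SUBNET_DROP, SG_ALLOW], "from-lport", "2.2.99.10"))
```

Under this model the first call yields `drop` and the second yields `allow-related`, which is the behaviour the report expects but does not observe across nodes in the VLAN subnet.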

Steps To Reproduce

1. Create the provider-network; adjust defaultInterface to match your environment.

apiVersion: kubeovn.io/v1
kind: ProviderNetwork
metadata:
  name: bussinessnet
spec:
  defaultInterface: bond1

2. Create the VLAN; adjust the vlan id to match your environment.

apiVersion: kubeovn.io/v1
kind: Vlan
metadata:
  name: bussinessnet-vlan2398
spec:
  id: 2398
  provider: bussinessnet

3. Create the subnet; adjust the CIDR, gateway, and ACL configuration to match your environment.

apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: subnet2398
spec:
  acls:
  - action: drop
    direction: from-lport
    match: ip4.src==0.0.0.0/0 && ip4.dst==2.2.99.0/24
    priority: 1599
  cidrBlock: 2.2.99.0/24
  default: false
  disableGatewayCheck: true
  enableDHCP: true
  enableLb: false
  excludeIps:
  - 2.2.99.254
  gateway: 2.2.99.254
  gatewayNode: ""
  gatewayType: distributed
  natOutgoing: false
  private: false
  protocol: IPv4
  provider: ovn
  vlan: bussinessnet-vlan2398
  vpc: ovn-cluster

4. Create the security groups.

apiVersion: kubeovn.io/v1
kind: SecurityGroup
metadata:
  name: sg1
spec:
  allowSameGroupTraffic: true
  egressRules:
  - ipVersion: ipv4
    policy: allow
    priority: 10
    protocol: all
    remoteAddress: 0.0.0.0/0
    remoteType: address
  ingressRules:
  - ipVersion: ipv4
    policy: allow
    priority: 10
    protocol: all
    remoteAddress: 0.0.0.0/0
    remoteType: address
---

apiVersion: kubeovn.io/v1
kind: SecurityGroup
metadata:
  name: sg2
spec:
  allowSameGroupTraffic: true
  egressRules:
  - ipVersion: ipv4
    policy: allow
    priority: 10
    protocol: all
    remoteAddress: 0.0.0.0/0
    remoteType: address
  ingressRules:
  - ipVersion: ipv4
    policy: allow
    priority: 10
    protocol: all
    remoteAddress: 0.0.0.0/0
    remoteType: address

5. Create the Pods.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: netshoot
  name: netshoot
  namespace: default
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: netshoot
  template:
    metadata:
      annotations:
        ovn.kubernetes.io/logical_switch: subnet2398
      creationTimestamp: null
      labels:
        app: netshoot
    spec:
      containers:
      - command:
        - sleep
        - infinity
        image: nicolaka/netshoot
        imagePullPolicy: IfNotPresent
        name: netshoot
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate

Bind sg1 to one of the created Pods (ovn.kubernetes.io/security_groups: sg1) and sg2 to the other (ovn.kubernetes.io/security_groups: sg2).
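
Because both Pods come from the same DaemonSet template, the annotation has to be set on each Pod individually. One possible way, sketched below in Python, is to build a JSON merge patch per Pod. The helper `security_group_patch` is hypothetical, and the issue names only this one annotation; whether other annotations are also needed is not stated here.

```python
import json

# Annotation key quoted in the issue.
SG_ANNOTATION = "ovn.kubernetes.io/security_groups"


def security_group_patch(group: str) -> str:
    """Build a JSON merge patch that sets the security-group annotation."""
    if not group:
        raise ValueError("a security group name is required")
    patch = {"metadata": {"annotations": {SG_ANNOTATION: group}}}
    return json.dumps(patch)


print(security_group_patch("sg1"))
print(security_group_patch("sg2"))
```

The resulting string could then be passed to `kubectl patch pod <pod-name> --type merge -p '<json>'`, where `<pod-name>` is a placeholder for one of the netshoot Pods.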

Current Behavior

image
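1. The two Pods are bound to security groups sg1 and sg2, respectively.
image
2. The ping results show that they cannot reach each other.
image
3. However, the trace shows that, according to the OVN flow tables, the packet is delivered to the destination Pod, yet I captured no ICMP request packets inside the destination Pod.
image
4. The request packets can be captured on the node hosting the destination Pod, which means the node hosting the source Pod did send them out through the physical machine. Further captures showed that the packets reached bond1, but the OVS bridge that bond1 belongs to (br-bussinessnet) did not forward them to br-int through patch-localnet.subnet2398-to-br-int.

It appears that OVN's logical flows consider the traffic reachable, while the actual OVS forwarding does not deliver it.

This only happens for cross-node communication on underlay networks. I tested 1.12 and the master branch: overlay networks do not have this problem, and on an underlay network Pods on the same node are not affected either (because forwarding stays on br-int?). The only failing case is an underlay subnet with Pods on different nodes, where traffic has to pass through br-bussinessnet (the OVS bridge created by the provider-network).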

Expected Behavior

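For cross-node communication between Pods in a VLAN subnet, when the security group priority is higher than the subnet ACL (priority=1590), the Pods should be able to reach each other.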

@zcq98 zcq98 added the bug Something isn't working label Oct 15, 2024
@dosubot dosubot bot added the subnet label Oct 15, 2024

dosubot bot commented Oct 15, 2024

I found related issues that might be helpful:

To continue talking to Dosu, mention @dosu.

@bobz965
Collaborator

bobz965 commented Oct 15, 2024

image

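After adding the from-lport drop ACL whose destination is the subnet's own CIDR, the packets can be seen leaving the physical node of the source Pod, and they can be seen arriving at the physical NIC of the destination Pod's node, but they are then dropped by the table 16 drop rule shown in the image above.

Adding the ACL creates two additional table 16 rules, but only the drop rule is matched; the ct rule is never matched.

In the same scenario, a VPC does work.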
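
To check which table 16 rules are actually being hit, one option is to save the output of `ovs-ofctl dump-flows br-int` on the destination node and filter it for drop flows with a non-zero packet counter. The Python sketch below shows one way to do that. The function `hit_drop_flows` is hypothetical, and the sample flow lines are made up for illustration; they are not taken from the cluster in this issue.

```python
import re

# Matches the fields of interest in one line of `ovs-ofctl dump-flows` output.
FLOW_RE = re.compile(
    r"table=(?P<table>\d+),\s*n_packets=(?P<packets>\d+),"
    r".*?priority=(?P<priority>\d+)(?P<match>\S*)\s+actions=(?P<actions>.+)$"
)


def hit_drop_flows(dump: str, table: int):
    """Return (priority, n_packets, match) for drop flows that matched packets."""
    hits = []
    for line in dump.splitlines():
        m = FLOW_RE.search(line.strip())
        if m is None:
            continue
        if (int(m["table"]) == table
                and m["actions"].strip() == "drop"
                and int(m["packets"]) > 0):
            hits.append(
                (int(m["priority"]), int(m["packets"]), m["match"].lstrip(","))
            )
    return sorted(hits, reverse=True)


# Made-up sample in the dump-flows line format, for illustration only.
SAMPLE = """\
 cookie=0x1, duration=10.0s, table=16, n_packets=0, n_bytes=0, idle_age=10, priority=2591,ct_state=+est+trk,ip,metadata=0x5 actions=resubmit(,17)
 cookie=0x2, duration=10.0s, table=16, n_packets=12, n_bytes=1176, idle_age=3, priority=2590,ip,metadata=0x5,nw_dst=2.2.99.0/24 actions=drop
 cookie=0x3, duration=10.0s, table=17, n_packets=4, n_bytes=392, idle_age=3, priority=100,ip actions=drop
"""

for priority, packets, match in hit_drop_flows(SAMPLE, table=16):
    print(priority, packets, match)
```

Comparing the packet counters of the drop flow and the ct flow before and after a ping would show which of the two rules the cross-node traffic matches.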

@bobz965
Collaborator

bobz965 commented Oct 15, 2024

@oilbeater, does this look like something that needs to be raised with the OVN community?

@oilbeater
Collaborator

Try changing allow to allow-related.

@zcq98
Member Author

zcq98 commented Oct 15, 2024

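After changing the action of the ACLs generated from the security groups from allow to allow-related, the problem is the same: kubectl ko trace shows the traffic as reachable, but in practice it is not.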
