Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev port disaggregation #320

Merged
merged 29 commits into from
Jun 23, 2023
Merged

Dev port disaggregation #320

merged 29 commits into from
Jun 23, 2023

Conversation

anjan-keysight
Copy link
Contributor

@anjan-keysight anjan-keysight commented May 25, 2023

Support to address issue #319 (Metrics response disaggregation based on Rx ports).

This is Rx port level disaggregation of metrics response required for multi Rx port device (and raw) traffic. Also addressed is support for specifying Rx/Tx frames ratio.

Example code snippet

config.Ports().Add().SetName("p1").SetLocation("localhost:5555")
config.Ports().Add().SetName("p2").SetLocation("localhost:5555")
config.Ports().Add().SetName("p3").SetLocation("localhost:5555")

For control plane BGP sessions configured on P1, P2 and P3 with device names
p1d1peer1rrv4, p2d1peer1rrv4 and p3d1peer1rrv4 respectively.
Following is the data plane configuration

flow := config.Flows().Add()
flow.Metrics().SetEnable(true).PredefinedMetricTags().SetRxName(true)

We are expecting Tx and Rx packets to match in this example and hence the Tx Rx ratio is set to "one"

flow.Metrics().RxTxRatio().SetChoice("value")
flow.Duration().FixedPackets().SetPackets(1000)

flow.SetName("f1").TxRx().Device().
	SetTxNames([]string{"p1d1peer1rrv4"}).
	SetRxNames([]string{"p2d1peer1rrv4", "p3d1peer1rrv4"})

flowEth := flow.Packet().Add().Ethernet()
flowEth.Src().SetValue("00:00:01:01:01:01")
flowEth.Dst().SetChoice("auto")

flowIp := flow.Packet().Add().Ipv4()
flowIp.Src().SetValue("10.10.10.1")
flowIp.Dst().SetValues([]string{"20.20.20.1", "30.30.30.1"})

client.SetConfig(config)

Query for metrics results

client.GetFlowMetrics(nil, nil)

Flow      Frames Tx      Bytes Tx      Frames Rx     Bytes Rx      Tags
f1        1000           64000         1000          64000
f1                                     500           32000         rx_name: p2
f1                                     500           32000         rx_name: p3

Note: The “rx_name” would be in-built tag name for Rx ports and cannot be used for any egress tracking name.

@parthpower
Copy link

@anjan-keysight

The response format has been captured here.

linked document is not accessible to public

@anjan-keysight
Copy link
Contributor Author

@parthpower I have shared the document.

flow/metrics.yaml Outdated Show resolved Hide resolved
flow/metrics.yaml Outdated Show resolved Hide resolved
@@ -30,6 +37,12 @@ components:
Latency metrics.
$ref: '#/components/schemas/Flow.Latency.Metrics'
x-field-uid: 4
rx_per_port:
description: |-
Enables Rx port level disaggregation with metrics tag name set as "rx_port_name".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... metrics tag name automatically set by implementation as "rx_port_name" in the flow metric_response ( use exact attribute names from metric response )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also , do we need updates in otg gnmi model to specify that response can be filtered for specific port by specifing the filter for rx_port_name: in this exact format . Otherwise, how will gnmi user know what filter to use or expect which has to be parsed to set expectation in tests when checking stats using gnmi ( or otg metrics)

Copy link
Contributor

@PrasenjitAdhikary PrasenjitAdhikary Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For gNMI query path: “/flows/flow[name=f1]/”
expected gNMI response (for the metric response example in PR description):

"updates": [
  {
	"Path": "flows/flow[name=f1]",
	"values": {
	  "flows/flow": {
		"open-traffic-generator-flow:name": "f1",
		"open-traffic-generator-flow:state": {                     // Contains the aggregated per-flow stats
		  "counters": {
			"in-pkts": "1000",
			"in-octets": "8000",
			"out-pkts": "1000",
			"out-octets": "8000"
		  },
		  "in-frame-rate": "QKAAAA==",
		  "name": "f1",
		  "out-frame-rate": "QCAAAA==",
		  "transmit": true
		},
		"open-traffic-generator-flow:tag-metrics": {              // Contains the disaggregated per-flow stats
		  "tag-metric": [
			{
			  "name-value": "rx_name=p2",
			  "state": {
				"counters": {
				  "in-octets": "4000",
				  "in-pkts": "500"
				},
				"name-value": "rx_name=p2",
				"tags": [
				  {
					"tag-name": "rx_name",
					"tag-value": {
					  "value-as-string": "p2",
					  "value-type": "STRING"
					}
				  }
				]
			  }
			},
			{
			  "name-value": "rx_name=p3",
			  "state": {
				"counters": {
				  "in-octets": "4000",
				  "in-pkts": "500"
				},
				"name-value": "rx_name=p3",
				"tags": [
				  {
					"tag-name": "rx_name",
					"tag-value": {
					  "value-as-string": "p3",
					  "value-type": "STRING"
					}
				  }
				]
			  }
			}
		  ]
		}
	  }
	}
  }
]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description modified to reflect Rx port name location in tagged response.

@anjan-keysight anjan-keysight requested a review from parthpower June 7, 2023 12:14
flow/metrics.yaml Show resolved Hide resolved
flow/metrics.yaml Outdated Show resolved Hide resolved
flow/metrics.yaml Outdated Show resolved Hide resolved
Copy link

@bortok bortok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a more general question. A device's ethernet can have a connection to a port, lag or vxlan tunnel. As a result, a flow that is received by multiple devices can come through all of these types of connections. Shall we disaggregate Rx metrics accordingly, per lag or vxlan when configured as such, instead of port-level disaggregation?

@@ -30,6 +35,13 @@ components:
Latency metrics.
$ref: '#/components/schemas/Flow.Latency.Metrics'
x-field-uid: 4
rx_per_port:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also have rx_per_lag and rx_per_vxlan? Or maybe if we want to support those in future, shall we do rx_per_connection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we do not have support for drill down stats for lag or vxlan in TE. So we will support ports level disaggregation for now. But the model is very extensible and when we have the underlying support for those, we can extend the model without any breaking change.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you give an example of how the model would be extended? it would be good to see how that would be aligned with rx_per_port parameter. i'm thinking maybe we should use choice here and set it as port first and then extend to lag and vxlan later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have restructured this part, now all additional tracking will come under "predefined_metric_tags" and we can add new properties under this. Current property "rx_name" will apply for both raw port and lag.

@@ -69,3 +81,44 @@ components:
x-field-uid: 1
cut_through:
x-field-uid: 2
Flow.TxRxReplication:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't Replication imply additional copies? Meaning, if we have Replication = 1 it means we expect to receive twice more packets that we sent? Maybe we could use TxRxRatio instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I have updated it to TxRxRatio.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we change the reference as well from $ref: '#/components/schemas/Flow.TxRxReplication' to $ref: '#/components/schemas/Flow.TxRxRatio'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

flow/metrics.yaml Outdated Show resolved Hide resolved
flow/metrics.yaml Outdated Show resolved Hide resolved
@anjan-keysight anjan-keysight requested a review from ajbalogh June 8, 2023 13:29
Copy link
Contributor

@apratimmukherjee apratimmukherjee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor typo.

x-field-uid: 3
Flow.RxTxRatio.RxCount:
description: |-
This is for cases where one copy each Tx packet is received on all Rx ports and so the sum total of Rx packets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one copy of

type: array
items:
description: |-
The unique name of a port that is the intended receive port.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we correct 'port' to 'port or lag'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not seeing the updated change?

x-field-uid: 2
value:
description: |-
Should be a positive, non-zero value. A custom integer value (>1) can be specified for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to explain the scenario with the default value (i.e. == 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added description for default value.

Copy link
Contributor

@PrasenjitAdhikary PrasenjitAdhikary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@jasdeep-hundal jasdeep-hundal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still thinking about GNMI outputs a bit here, but I'm wondering if we should just use the port name directly as the tag ("" vs. "rx_port_name=".) (It should be a unique name in the config, but I'm also not sure if there are any other future 'built-in' ones that could not be specified this way.)

Unrelated clarification question: These tags will be included in disaggregated GNMI stats in the same way/alongside the user-defined ones correct?

properties:
rx_name:
description: |-
Enables Rx port or lag level disaggregation with predefined metrics tag name set as "rx_name".
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need an update from rx_name->rx_port_name?

Copy link
Contributor Author

@anjan-keysight anjan-keysight Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both otg and gnmi tag name is going to be rx_name (rx_port_name was the earlier proposal). The rx_name is generic tag name, referring to both port and lag as rx endpoints.
So if we have a flow configured from lag L1 to lag L2 and a separate port P5, the metrics response will look like:
F1: L1 => L2, P5

Flow    Frames Tx    Bytes Tx    Frames Rx    Bytes Rx    Tags
f1      1000         64000       1000         64000
f1		                 400          25600       rx_name: p5
f1		                 600          38400       rx_name: l2

There can be other built-in tag names in future, like "mem_name" (as an example), if we want to disaggregate based on lag member ports also.
And yes, these tag names will be alongside user defined ones.

format: float
default: 1.0
x-field-uid: 3
Flow.RxTxRatio.RxCount:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my current understanding, I'm wondering if we need this option. I'm not sure it's more convenient/explicit for test writer to use vs. just setting the port count explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User can use RxCount, so even if the rx port count changes in config, user does not have to update it.
But I leave it to you, if you feel otherwise.

type: array
items:
description: |-
The unique name of a port that is the intended receive port.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not seeing the updated change?

@anjan-keysight anjan-keysight merged commit 6c6d5f6 into master Jun 23, 2023
@anjan-keysight anjan-keysight deleted the dev-port-disaggregation branch June 23, 2023 08:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants