Skip to content
This repository has been archived by the owner on Sep 14, 2022. It is now read-only.

Add negative_packet_loss_mode pass option for Mellanox nic as traffic… #172

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

wanghekai
Copy link

… sender

@wanghekai
Copy link
Author

Add negative_packet_loss_mode pass option for Mellanox nic as traffic sender

Copy link
Collaborator

@k-rister k-rister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functionality that this patch is attempting to add is something that we have not previously seen much value in which is why we do not have this capability already. In almost every scenario where we use trafficgen these days we use stream statistics to be able to only calculate packet loss on packets that our code generates -- meaning we ignore broadcast traffic and other "noise" that exists on the network fabric when doing our packet loss calculations. The only way that negative packet loss should be observed is the following scenarios:

  1. You use --use-device-stats because you have to and you receive packets that you did not transmit -- so RX-TX equals a negative number. The question is why do you have to use --use-device-stats since this is often a crutch solving a different problem entirely than what is addressed in this patch.
  2. Something is wrong with your network topology/packet forwarding setup such that you receive more packets than you transmit. If this occurs with latency packets (which are uniquely identifiable then you will get a different error (see

    trafficgen/binary-search.py

    Lines 1712 to 1719 in 6cbf236

    if 'latency_duplicate_error' in trial_stats[dev_pair['rx']]:
    if trial_params['duplicate_packet_failure_mode'] == 'quit':
    trial_result = 'abort'
    bs_logger("\t(critical requirement failure, duplicate latency packets detected, device pair: %d -> %d, pg_ids: [%s], trial result: %s)" %
    (dev_pair['tx'],
    dev_pair['rx'],
    trial_stats[dev_pair['rx']]['latency_duplicate_error'],
    trial_result))
    ) which can be avoided using a configuration parameter (see

    trafficgen/binary-search.py

    Lines 212 to 217 in 6cbf236

    parser.add_argument('--duplicate-packet-failure',
    dest='duplicate_packet_failure_mode',
    help='What to do when a duplicate packet failure is encountered',
    default = 'quit',
    choices = [ 'fail', 'quit', 'retry-to-fail', 'retry-to-quit' ]
    )
    ). If you disable latency packets and are still seeing negative packet loss while not using --use-device-stats then you cannot trust any of the packet loss calculations since the bulk packets are not uniquely identifiable -- ie. you cannot tell when you transmit X packets and receive X packets if you received X unique packets or you receive 1 packet X times and dropped X-1 packets. For this reason all of the results that are generated are questionable and you have to ask yourself what you are really determining by running the test.

Ultimately, before relying on the functionality that this patch is trying to add I would take a long, hard look at why you need this because this may just mask an issue that should be resolved in some other way.

Also, if we were going to merge this (assuming the requested changes are addressed), I would prefer that rather than adding pass as an option we do something like adding ignore as the option. Then, if --negative-packet-loss mode is set to ignore you go ahead and log the detection of this result but do not modify the value of trial_result.

@@ -230,8 +230,8 @@ def process_options ():
parser.add_argument('--negative-packet-loss',
dest='negative_packet_loss_mode',
help='What to do when negative packet loss is encountered',
default = 'quit',
choices = [ 'fail', 'quit', 'retry-to-fail', 'retry-to-quit' ]
default = 'pass',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot change the default behavior for this result.

Comment on lines 1889 to +1893
trial_result = trial_params['negative_packet_loss_mode']
bs_logger("\t(trial failed requirement, negative direction packet loss, direction: %s, trial result status: modified, trial result: %s)" %
(direction,
trial_result))
if trial_result != 'pass':
bs_logger("\t(trial failed requirement, negative direction packet loss, direction: %s, trial result status: modified, trial result: %s)" %
(direction,
trial_result))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way this is written it is possible that if other trial criteria would have already resulted in a failing trial result that encountering a negative packet loss "error" would negate that existing failure when --negative-packet-loss=pass was used because it does not factor in the previous state of trial_result.

Comment on lines +1909 to +1910
if trial_result == 'fail':
return trial_result
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this prevents other code in the evaluate_trial function from running which is not something we want to do. We want all criteria to be evaluated so that the user has as clear a picture as possible to understand the behavior observed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants