dcnm_vrf fix issues #351, #356, #357 #354
base: develop
Conversation
The fix entails a modification to `wait_for_vrf_del_ready()`. In the legitimate case (the user is deleting a VRF after having removed all network attachments), `lanAttachState` very briefly transitions to DEPLOY before transitioning to its final state of NA; however, in this case, `isLanAttached` (in the same data structure) is False. In the illegitimate case (the user hasn't removed the network attachments), `isLanAttached` is True. Hence, we can leverage `isLanAttached` to differentiate between the legitimate and illegitimate cases by adding a conditional that checks whether `lanAttachState` == DEPLOY AND `isLanAttached` == True. If this is the case, the user is trying to delete a VRF that still contains network attachments, and we now fail immediately with an appropriate error message (a sketch of this check follows the list below).

Other changes:

1. Add standard Python logging.
2. Use `ControllerVersion()` to retrieve the NDFC version and remove the import of `dcnm_version_supported`.
3. Use `FabricDetails()` to retrieve the fabric type.
4. Modify `update_attach_params()` to improve readability by first populating the neighbor dictionary and then appending it. This way, we avoid a lot of unsightly accesses to element 0 of the list. For example:

   ```python
   if a_l["peer_vrf"]:
       vrflite_con["VRF_LITE_CONN"][0]["PEER_VRF_NAME"] = a_l["peer_vrf"]
   else:
       vrflite_con["VRF_LITE_CONN"][0]["PEER_VRF_NAME"] = ""
   ```

   Becomes:

   ```python
   if a_l["peer_vrf"]:
       nbr_dict["PEER_VRF_NAME"] = a_l["peer_vrf"]
   else:
       nbr_dict["PEER_VRF_NAME"] = ""
   ```

5. diff_for_attach_deploy() - Reduce indentation by reversing the logic of a conditional. The following:

   ```python
   if wlite["IF_NAME"] == hlite["IF_NAME"]:
       # Lots of indented code
       ...
   ```

   Becomes:

   ```python
   if wlite["IF_NAME"] != hlite["IF_NAME"]:
       continue
   # unindent the above code
   ```

6. get_have() - Reduce indentation levels by reversing conditional logic (similar to item 5 above).
7. Add method want_and_have_vrf_template_configs_differ(); see the next item.
8. diff_for_create() - Leverage want_and_have_vrf_template_configs_differ() to simplify.
9. Add method to_bool(); see the next item.
10. diff_for_attach_deploy() - Simplify/shorten by leveraging to_bool().
11. In multiple places, ensure that a key exists before accessing it or deleting it.
12. Run through black.
13. Several minor formatting changes for improved readability.
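A minimal sketch of the new guard, assuming the attach-list shape described above (the surrounding polling loop, exact field spellings, and fail_json() wiring are simplified):

```python
# Hypothetical fragment of wait_for_vrf_del_ready(); "attach" is one
# entry from the controller's lanAttachList response.
for attach in attach_list:
    if attach["lanAttachState"] == "DEPLOYED" and attach["isLanAttached"] is True:
        # Illegitimate case: the VRF still has live network attachments.
        self.module.fail_json(
            msg="VRF cannot be deleted until its network attachments are removed."
        )
    # Legitimate case: lanAttachState briefly shows DEPLOYED with
    # isLanAttached False, then settles to NA; keep waiting.
```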
The initial implementation would return True for, e.g., "false", since bool(non-null-string) is always True.

1. Modify to explicitly compare against known boolean-like strings, i.e. "false", "False", "true", and "True".
2. Add the caller to the error message for better debugging in the future.
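A minimal sketch of the corrected comparison logic (the real method's signature and failure handling may differ):

```python
def to_bool(value, caller=""):
    # Hypothetical sketch: accept only known boolean-like values;
    # never rely on bool(str), since bool("false") is True.
    if value in ("true", "True", True):
        return True
    if value in ("false", "False", False):
        return False
    # Include the caller in the message for easier debugging.
    raise ValueError(f"{caller}: cannot convert {value!r} to boolean")
```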
* Fix for issue 347

  Manually tested this to verify. Still need to update integration and unit tests.

* dcnm_image_policy: Update integration test

  Update integration test for overridden state.

  1. playbooks/roles/dcnm_image_policy/dcnm_tests.yaml
     - Add vars: install_package_1, uninstall_package_1
  2. test/integration/targets/dcnm_image_policy/tests/dcnm_image_policy_overridden.yaml
     - Add packages.install and packages.uninstall configuration
     - Verify that merged state adds these packages to the controller config
     - Verify that overridden state removes packages.install and packages.uninstall
     - Verify that overridden state metadata.action is "replace" instead of "update"
Two bits of vulnerable code found when porting to ndfc-python:

1. plugins/modules/dcnm_fabric.py - Accessing a dictionary key directly can lead to a KeyError exception.
2. plugins/module_utils/fabric/replaced.py - If the user omits the DEPLOY parameter from their playbook (ndfc-python), the DEPLOY key would be None and would not get popped from the payload. This would cause NDFC to complain about an invalid key in the payload. We need to unconditionally pop DEPLOY here, if it's present; hence, we've removed the value check (if DEPLOY is not None). A minimal illustration follows this list.
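Assuming the payload is a plain dict, the replaced.py change amounts to something like:

```python
# Before: DEPLOY was popped only when its value was set, so a DEPLOY
# key holding None leaked into the payload sent to NDFC.
if payload.get("DEPLOY") is not None:
    payload.pop("DEPLOY")

# After: pop DEPLOY unconditionally if the key is present.
payload.pop("DEPLOY", None)
```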
1. Removed all instances where values were cast to bool. These could potentially produce bad results, e.g. bool("false") returns True.
2. Renamed and fixed want_and_have_vrf_template_configs_differ(). Renamed to dict_values_differ(). The fix was to add a skip_keys parameter so that we can skip vrfVlanId in one of the elif branches (a sketch follows this list).
3. Added some debugging statements.
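A minimal sketch of what dict_values_differ() might look like with the skip_keys parameter (names and exact semantics are assumptions):

```python
def dict_values_differ(dict1, dict2, skip_keys=None):
    # Hypothetical sketch: True if any non-skipped key's value differs
    # between the two dicts, e.g. skip_keys=["vrfVlanId"].
    if skip_keys is None:
        skip_keys = []
    for key in set(dict1) | set(dict2):
        if key in skip_keys:
            continue
        if dict1.get(key) != dict2.get(key):
            return True
    return False
```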
1. find_dict_in_list_by_key_value(): new method to generalize and consolidate duplicate code (sketched below).
2. Remove a few cases of single-use vars.
3. Run through black.
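A minimal sketch of the consolidated helper (parameter names are assumptions; see also the Python 3.9 type-hint note later in this thread):

```python
def find_dict_in_list_by_key_value(search, key, value):
    # Hypothetical sketch: return the first dict in the search list
    # whose key matches value, or None if there is no match.
    found = (d for d in search if d.get(key) == value)
    return next(found, None)
```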
I opened issue #352 to track what this comment describes, so we can remove the comment from the module.
1. Replace several bits of code with calls to get_vrf_lite_objects().
2. Fix a few pylint f-string complaints. There are many more of these, which we'll address in the next commit. One of these fixes required a change to an associated unit test.
1. Appease pylint f-string complaints.
2. Optimize a couple of conditionals.
3. Change an "== True" to the preferred "is True".
4. Add a few TODO comments.
Unit tests pass locally if the tests in the following file are disabled: ~/test/unit/module_utils/common/test_log_v2.py. Temporarily disabling these to see whether the same failure occurs when running the unit tests on GitHub. If it does, I will debug why this is happening.
Fix bare-except and dangerous-default-value errors.
test_dcnm_vrf.py: Removed two (out of four) contiguous blank lines.
Python 3.9 doesn't like:

```python
def find_dict_in_list_by_key_value( ... ) -> dict | None:
```

Removed the type hint:

```python
def find_dict_in_list_by_key_value( ... ):
```
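For context, the `X | Y` union syntax in annotations requires Python 3.10 (PEP 604). An alternative that keeps the hint on 3.9 (not what this commit does) would be `typing.Optional`:

```python
from typing import Optional

def find_dict_in_list_by_key_value(search, key, value) -> Optional[dict]:
    ...
```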
If we fail_json(), or even sys.exit(), in main() logging setup, the unit tests fail. The failure is a KeyError in logging.config.dictConfig when disabling logging in log_v2.py:

```python
def disable_logging(self):
    logger = logging.getLogger()
    for handler in logger.handlers.copy():
        try:
            logger.removeHandler(handler)
        except ValueError:  # if handler already removed
            pass
    logger.addHandler(logging.NullHandler())
    logger.propagate = False
```

Above, the KeyError happens at:

```python
logger.removeHandler(handler)
```

The value of handler when this happens is "standard". I'm not sure why this happens ONLY when the log_v2.py unit tests are run prior to the dcnm_vrf.py unit tests (running these tests separately works). For now, a "fix" is to pass in the except portion of the try/except block in dcnm_vrf.py main():

```python
def main():
    try:
        log = Log()
        log.commit()
    except (TypeError, ValueError) as error:
        pass
```

Will investigate further, but the above works, and logging is enabled with no issue in normal use. Renaming __DISABLE_test_log_v2.py back to test_log_v2.py.
Remove an unused import (sys, which had been added while testing fixes for the unit test failures). Remove extra lines.
Modify another OR-conditional to use the preferred form: if value in (x, y, z):
Use generic names for the two dicts.
NOTE: All three of these tests are working with the current set of changes up to this point.

1. Add REQUIRED VARS section.
2. Update task titles for consistency.
3. Add a wait_for task to the SETUP section, in case any VRFs exist with vrf-lite extensions prior to running the tests. This wait_for is skipped if the preceding "Delete all VRFs" task's result.changed value is false.
4. Standardize var names:
   - switch_1 - border switch
   - switch_2 - border switch
   - switch_3 - non-border switch
   - interface_1
1. Add an example dynamic inventory: playbooks/roles/dynamic_inventory.py
2. Update dcnm_vrf/dcnm_tests.yaml with notes for dynamic_inventory.py
3. Add dcnm_vrf/dcnm_hosts.py
Try to avoid sanity error (unexpected shebang) by moving dynamic_inventory.py out of playbooks/roles.
According to the following link, '#!/usr/bin/env python' should be OK. https://docs.ansible.com/ansible/latest/dev_guide/testing/sanity/shebang.html Let's try...
Fix pep8 E265: block comment should start with '# '
1. Standardize integration test var names:
   - fabric_1
   - switch_1, switch_2, switch_3
   - interface_1, interface_2, interface_3
2. All tests - SETUP
   - Add a task to print all vars.
   - Standardize task titles to include the ansible state.
3. overridden.yaml
   - Add a workaround for an issue seen with ND 3.2.1e. In step TEST.6, NDFC issues an NX-OS CLI that immediately switches from configure profile mode to configure terminal; vrf context <vrf>. This command results in FAILURE (switch accounting log). Adding a wait_for will not help, since this all happens within step TEST.6. A config-deploy resolves the OUT-OF-SYNC VRF status.
   - Add REQUIRED VARS section.
4. query.yaml - Update REQUIRED VARS section.
5. merged.yaml
   - Add missing wait_for after VRF deletion.
   - Update REQUIRED VARS section.
   - Renumber tests to group sub-tests.
6. deleted.yaml - Update REQUIRED VARS section.
7. dynamic_inventory.py - Add conditional blocks to set vars based on role.
Found in IT (replaced.yaml)
1. All tests
   - Added wait_for to the CLEANUP section of all tests, in case they are run immediately after each other.
   - Renamed "result" to include the test number; i.e., there is no longer a global result that is overwritten in each test. Rather, each test has its own result. This is a bit more maintenance, perhaps, but it reduces the probability that one test asserts on an earlier test's result.
2. replaced.yaml
   - Added REQUIRED VARS section.
   - Use standardized var names.
   - SETUP: print vars.
   - Add standardized task names to all tasks.
   - Renumbered tests to group them.
   - Print all results just prior to their associated asserts.
3. query.yaml - Update some task titles and fix some test numbers.
4. merged.yaml
   - Print all results just prior to their associated asserts.
   - Fix a few test numbering issues.
5. deleted.yaml - Use standardized task titles.
playbooks/files/dynamic_inventory.py (comment on an outdated diff):

```
"""
# Summary

Dynamic inventory for DCNM Collection integration tests.
```
Is the idea to use this dynamic_inventory file (eventually) for all of the integration tests? I suspect we will need to use a common naming convention across all of our tests.
Never mind the question about how to run it... I see the instructions in the diffs below.
playbooks/files/dynamic_inventory.py (comment on an outdated diff):

```
# Usage

See README.md in the top-level of this repository and define the environment
```
Are you adding more information to our README.md? I don't see any updates to it
Oops. This is a holdover from when this was a separate repository. I'll modify to remove this reference.
EDIT: Added detailed usage to the Usage section of dynamic_inventory.py
```
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import, division, print_function
```
Thanks for adding this!
```yaml
gather_facts: no
connection: ansible.netcommon.httpapi
# Uncomment and modify if not using dynamic_inventory.py
# vars:
```
Is this the full set of vars that are required by the vrf integration tests?
Good catch! I've updated it to include all the vars.
1. Include all vars used in the dcnm_vrf integration tests.
2. Update the path to dynamic_inventory.py.
Address mwiebe comments by including more detailed usage and examples. Add fabric_2, fabric_3 in case any tests require more than one fabric.
1. The current interface var names did not incorporate a way to encode switch ownership. Modified the var naming to allow specifying multiple interfaces per switch, in such a way that the switch ownership of an interface is evident. This is documented in playbooks/files/dynamic_inventory.py.
2. Modified all dcnm_vrf test cases to align with this convention:
   - Updated test case header comments with the new usage.
   - Updated all test case interface vars.
   - Ran the following tests: deleted.yaml, overridden.yaml, replaced.yaml, query.yaml, sanity.yaml.
3. dynamic_inventory.py - In addition to the changes above:
   - Fixed the documentation for environment variable ND_ROLE (previously it was misnamed NDFC_ROLE in the documentation, but was correct -- ND_ROLE -- in the actual usage).
   - Fix Markdown heading levels.
1. Use standardized task titles.
2. Print results prior to each assert.
1. dcnm_vrf: use switch_1, switch_2, switch_3 directly.
2. Add the scale role to the 'if nd_role' conditional.
1. Fix a case where a previous commit in this PR broke undeploy.
2. Fix for issue #356.
3. Update unit tests to align with changes in this commit.
4. Some simplifications, including:
   - Add a method send_to_controller() to aggregate POST, PUT, and DELETE verb handling. This method calls dcnm_send() and then calls the response handler, etc. This removes duplicated code throughout the module.
   - Refactor vrf_lite handling out of update_attach_params() and into a new method, update_attach_params_extension_values().
   - Never noticed this before, but it appears we don't have to use inspect() with the new logging system, except in cases where fail_json() is called. Removed inspect() from all methods that do not call fail_json().
   - New method is_border_switch() to remove this code from push_diff_attach() and for future consolidation into a shared library.
   - Move the dcnm_vrf_paths dictionary out of the class. These endpoints will later be moved to common/api/ep/.
   - In __init__(), add self.sn_ip, built from self.ip_sn. There were several cases where the module wanted a serial_number given an ip_address, or vice versa. Added two methods that leverage self.sn_ip and self.ip_sn (see the sketch after this list):
     - self.serial_number_to_ip()
     - self.ip_to_serial_number()
     Replaced all instances where duplicated code was performing these functions.
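A minimal sketch of the lookup plumbing, assuming self.ip_sn (ip_address -> serial_number) already exists on the class:

```python
# Hypothetical sketch. In __init__(): invert ip_sn once, rather than
# duplicating ad-hoc reverse lookups throughout the module.
self.sn_ip = {serial_number: ip for ip, serial_number in self.ip_sn.items()}

def serial_number_to_ip(self, serial_number):
    """Return the ip_address for serial_number, or None."""
    return self.sn_ip.get(serial_number)

def ip_to_serial_number(self, ip_address):
    """Return the serial_number for ip_address, or None."""
    return self.ip_sn.get(ip_address)
```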
1. Potential fix for issue #357: if any interface in the playbook task's vrf_lite configuration does not match an interface on the switch that has extensionValues, call fail_json() (a sketch follows this list).
   - Refactor vrf_lite processing out of push_diff_attach() and into update_vrf_attach_vrf_lite_extensions().
   - In update_vrf_attach_vrf_lite_extensions(), verify that all interfaces in the playbook's vrf_lite section match an interface on the switch that has extensionValues. If this check fails, call fail_json().
2. Rename get_extension_values_from_lite_object() to get_extension_values_from_lite_objects() and be explicit that the method takes a list of objects and returns a list of objects, or an empty list.
3. Add some debug statements.
4. Rename vrf to vrf_name in push_to_remote().
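A minimal sketch of the validation, with assumed data shapes (the real method operates on the switch's lite objects and the playbook's vrf_lite list):

```python
# Hypothetical fragment of update_vrf_attach_vrf_lite_extensions().
# switch_if_names: names of switch interfaces carrying extensionValues.
switch_if_names = {obj["IF_NAME"] for obj in lite_objects}  # shape assumed
for item in vrf_lite:
    if item["interface"] not in switch_if_names:
        # Fail fast rather than sending an incomplete payload, which
        # is what previously triggered the controller's 500 error.
        self.module.fail_json(
            msg=f"vrf_lite interface {item['interface']} does not match "
            "any switch interface with extensionValues."
        )
```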
1. Update task titles to group tests.
2. Print results before each assert stanza.
3. Increase the pause after VRF deletion from 40 to 60 seconds.
Update the comment for test 3b to indicate that the workaround is needed only when Overlay Mode is set to "config-profile" (which is the default for new fabrics). The issue does not happen when Overlay Mode is set to "cli".
Summary
This PR includes:
Status
Fix for issue #351

This involves a change to `wait_for_vrf_del_ready()`, where we call fail_json() if both `lanAttachedState` == "DEPLOYED" and `isLanAttached` is True.

Fix for issue #356
Added two methods:

- `release_orphaned_resources()`
- `release_resources_by_id()`

`release_orphaned_resources()` creates a list of resource IDs which are "orphaned". In this context, "orphaned" means that the resource's `allocatedFlag` is False and the `resourcePool.vrfName` field is null. It then checks that the resource's `entityName` matches the current VRF name, and that the `resourcePool.fabricName` matches the current fabric name. If all of this is True, it adds the resource's `id` to a list of IDs to be deleted and passes that list to `release_resources_by_id()`, which releases the resource IDs in the list.

In `push_to_remote()`, a call to `release_orphaned_resources()` is then made for every VRF that was previously deleted in `push_diff_delete()`.
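A minimal sketch of the orphaned-resource logic described above (helper names and the exact response shape are assumptions):

```python
def release_orphaned_resources(self, vrf_name):
    # Hypothetical sketch: collect the IDs of orphaned resources for
    # this VRF/fabric and release them in one call.
    ids_to_release = []
    for resource in self.get_vrf_resources():  # assumed helper
        if resource["allocatedFlag"] is not False:
            continue  # still allocated; not orphaned
        if resource["resourcePool"]["vrfName"] is not None:
            continue  # still bound to a VRF; not orphaned
        if resource["entityName"] != vrf_name:
            continue
        if resource["resourcePool"]["fabricName"] != self.fabric:
            continue
        ids_to_release.append(resource["id"])
    self.release_resources_by_id(ids_to_release)
```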
Fix for issue #357

Previously, fail_json() was called only if the switch did not contain any interfaces with `extensionValues`. If an interface in the playbook contains a vrf_lite config, and the interface in this vrf_lite config does not match an interface on the switch with `extensionValues`, but at least one interface on the switch DOES have `extensionValues`, the generated payload for the interface in the vrf_lite config will be incomplete, which generates the 500 error.

The fix is to call fail_json() if any interface in the playbook's vrf_lite config does not match an interface with `extensionValues` on the switch. This new check caused several unit tests to fail, since they met all the conditions for fail_json() to be called with the new check in place. These unit tests have been modified so that the interface in the vrf_lite config matches one of the interfaces in the test case fixture that contains `extensionValues`.

Improve code readability
Refactored a few bits of code that were duplicated in multiple places into the following methods:

- `compare_properties()`
- `find_dict_in_list_by_key_value()`
- `to_bool()`
- `dict_values_differ()`
- `get_vrf_objects()`
- `get_vrf_lite_objects()`
- `get_items_to_detach()`

Reversed the logic of several conditionals to reduce indentation. Examples can be found in:

- `get_have()`
- `push_to_remote()`

Populate dicts before appending them to lists
Previously, dicts were appended to a list (within another dict) and then populated. This resulted in code that was harder to read, due to the multi-level accesses made when setting each key/value in the embedded dict. This was changed in a couple of places so that we first populate the dict, and then append it to the list afterwards.

Examples in:

- `get_have()`
- `update_attach_params()`

Use `in` to simplify `or` conditionals
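An illustrative before/after (the original snippet did not carry over to this view; the variable and values here are hypothetical):

```python
# Before
if state == "merged" or state == "replaced" or state == "overridden":
    ...

# After
if state in ("merged", "replaced", "overridden"):
    ...
```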
Rename vars whose names were hard to understand

In some cases, the name was hard to understand and didn't add any value. In these cases, a generic name like `item` was used instead; e.g. `a_l` was renamed to `item`.

In other cases, the name was ambiguous: `dep_vrf` seemed to me to mean "dependent vrf", but it actually means "deploy vrf", so I renamed it to `deploy_vrf`. There are several other cases along these lines.

Harden against KeyErrors and misinterpreted booleans
Removed all bool() casts

In multiple places, `bool(thing)` was used to try to cast `thing` to a boolean. This can potentially result in misinterpretation of `thing`; for example, bool("false") returns True.

Removed all `bool()` casts, since debugging showed that, in every case, the thing being cast was already a boolean. Hence, in these cases, there was no misinterpretation, but casting to bool should be considered bad practice in most circumstances.

Harden against `KeyError`

Several places were accessing dict keys directly without using `dict.get()`. Changed those that I noticed.

Add logging
Standard Python logging was added (similarly to e.g. dcnm_fabric, etc.). `inspect` is used to add the method name to all logged messages and to messages passed to fail_json().

One issue that adding logging raised is that unit tests failed until the logging setup in main() was changed.
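Based on the commit message earlier in this thread, the change was approximately the following (the "From" form is an assumption):

From:

```python
def main():
    log = Log()
    log.commit()
```

To:

```python
def main():
    try:
        log = Log()
        log.commit()
    except (TypeError, ValueError) as error:
        pass
```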
The issue seems to be related to how the dcnm_vrf unit tests are invoked. Somehow (which I don't understand), this causes an error in logging.config.DictConfigurator (part of the standard Python logging system), but ONLY when the unit tests for log_v2.py are run in the same session as the unit tests for dcnm_vrf.py. That is, the following results in the error:

```
pytest -k "test_dcnm_vrf or test_log"
```

Whereas the following does not:

```
pytest -k test_log
pytest -k test_dcnm_vrf
```
The error surfaces in common/log_v2.py, in `enable_logging()`. So, maybe it has something to do with the use of the temporary path /private/var/..., or permissions on this path when both tests are run, or the path's length?

Will investigate further, but the "fix" above (ugly as it is) works for both the unit tests and for normal usage.