Replace telnetlib with scrapli #279

Closed
wants to merge 34 commits

Conversation


@kaelemc kaelemc commented Nov 7, 2024

This PR replaces the usage of telnetlib in vrnetlab with scrapli.

Changes

  • telnetlib has been removed; the Scrapli base Driver class is now used for the base telnet functionality.

    • Previously telnetlib was used to connect to the qemu monitor; I didn't see this used anywhere, so that functionality has been removed as well.
    • All nodes have had old telnetlib functions swapped with new scrapli-based ones.
    • All Dockerfiles except OpenWRT (and maybe one or two more I'm forgetting) should now be on debian:bookworm-slim from the Amazon ECR.
    • Dockerfiles have had git and pip added, as well as a command to install Scrapli from its git repo.
  • Cisco and Juniper nodes are now in their own vendor-specific subdirectories.

    • Defined a quick-and-dirty VENDOR_SUBDIR variable in each node's Makefile, plus logic in makefile.include: if VENDOR_SUBDIR is 1, files are copied one subdirectory level deeper to account for the vendor subdirectory.
  • The logger for the VR class (inherited by all nodes) now uses a named logger under the vrnetlab instance, because Scrapli uses the root logging instance and outputs all channel output at the 'debug' log level (see the sketch after this list).

    • The root logger is now set to only write 'info' level logs.
    • self.logger will still write 'debug' logs to stdout (docker logs).
  • Coloured the logging levels so they are easier to distinguish at a glance; it looks nicer visually too.
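A minimal sketch of the logging split described above (handler and format details are assumptions, not taken from this PR):

import logging

# Root logger capped at INFO: scrapli emits all channel output at DEBUG
# via the root hierarchy, so those records are filtered out of docker logs.
logging.basicConfig(level=logging.INFO)

# The named vrnetlab logger still emits DEBUG; a named logger's own level
# takes precedence, and its records propagate to the root handler
# regardless of the root logger's level.
logger = logging.getLogger("vrnetlab")
logger.setLevel(logging.DEBUG)

logger.debug("visible in docker logs")                 # emitted
logging.getLogger("scrapli").debug("channel output")   # suppressed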

Functions

read_until()

telnetlib had a read_until() function which was used by wait_write(). This has been re-implemented in the VR class.

Args:

  • match_str: a string of the text to wait for
  • timeout: a float of how long to wait before exiting the function even if no match is found. Defaults to None.

Returns: All console output that was read up until the match, or function timeout.
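A minimal sketch of how read_until() can be implemented on top of the scrapli base driver's channel (self.scrapli_tn is an assumed attribute name for the open scrapli Driver, not necessarily what this PR uses):

import time

def read_until(self, match_str: str, timeout: float = None) -> str:
    """Read console output until match_str appears or the timeout expires."""
    start = time.time()
    buf = b""
    match = match_str.encode()
    while match not in buf:
        if timeout is not None and time.time() - start > timeout:
            break  # give up and return whatever was read so far
        buf += self.scrapli_tn.channel.read()  # raw bytes; blocks until data arrives
    return buf.decode(errors="replace")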

wait_write()

Adapted to scrapli; functionality should be analogous to the telnetlib version.

  • Added argument timeout (float): passed through to read_until(). Defaults to None.
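Correspondingly, a sketch of the adapted wait_write() under the same self.scrapli_tn assumption (the cmd/wait argument names mirror the telnetlib-era signature):

def wait_write(self, cmd: str, wait: str = "#", timeout: float = None) -> None:
    """Wait for the `wait` string on the console, then send cmd + CR."""
    if wait is not None:
        self.read_until(wait, timeout=timeout)   # block until the prompt appears
    self.scrapli_tn.channel.write(f"{cmd}\r")    # scrapli Channel.write() takes a str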

expect()

telnetlib had the expect() function which was used in the launch.py entrypoint. This new adapted version in the VR class should function the same as the telnetlib one.

Args:

  • regex_list: a list of byte-strings which are used to match via regex on the console output.
  • timeout: a float of how long before the function should just timeout and stop waiting. Defaults to None.

Returns: a tuple of:

  • List index of the matched byte-string; -1 if there was no match. (int)
  • The regex match object. (re.match)
  • The console output up until that point (byte-string)
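A sketch matching these semantics (same self.scrapli_tn assumption as above):

import re
import time

def expect(self, regex_list: list, timeout: float = None) -> tuple:
    """Match console output against a list of byte-string regexes."""
    start = time.time()
    buf = b""
    while True:
        buf += self.scrapli_tn.channel.read()
        for i, pattern in enumerate(regex_list):
            m = re.search(pattern, buf)
            if m:
                return i, m, buf        # index, match object, output so far
        if timeout is not None and time.time() - start > timeout:
            return -1, None, buf        # timed out with no match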

print()

The VR class now has a print() function which simply writes to stdout and flushes immediately. It's used so that the telnet console output renders nicely in docker logs.

If the console output were printed via the logger, the formatting would make it difficult to interpret.

Args:

  • bytes: byte-string to write to stdout.
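The described behaviour amounts to roughly the following sketch (sys.stdout.buffer is used because the input is bytes):

import sys

def print(self, bytes_: bytes) -> None:
    """Write raw console bytes to stdout and flush, so docker logs stream live."""
    sys.stdout.buffer.write(bytes_)
    sys.stdout.flush()

Typical usage pairs it with expect(), e.g. (ridx, match, res) = self.expect([b"login:"], 10) followed by self.print(res).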

Usage

Relevant changes to make a node work:

  • Ensure git, pip and scrapli are installed in the Dockerfile (see the snippet after this list)

  • Replace self.tn.expect() with self.expect()

  • Use self.print() to print any console output; do not use logging for this. There is no need to decode()/convert the byte-string returned by expect(), as self.print() doesn't accept strings; passing a decoded string would result in malformed output.

  • Logging levels for any log output should be changed to be more appropriate.

  • INFO is Green

  • WARNING is Yellow

  • ERROR/DEBUG is Red
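For illustration, the Dockerfile additions amount to something like the following (the exact package list may vary per node; the pip command matches the one quoted later in this thread):

RUN apt-get update -qy \
    && apt-get install -y --no-install-recommends git python3 python3-pip \
    && pip install git+https://github.com/carlmontanari/scrapli --break-system-packages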

Other

  • Added an xrv_qcow folder to build XRv images from qcow2 images; the existing XRv builds from vmdks.
  • XRv9k, CSR, cat8kv and cat9kv nodes use their relevant Scrapli driver.
  • XRv didn't play nicely with the XR driver, I assume due to poor performance and the old code version (XRv is deprecated and only runs 6.x code; XRv9k runs 7.x).
  • ASAv was not working correctly; I had to make fixes for that.

I don't expect the other nodes to 'plug and play' and work perfectly from the get-go; I'm sure I have made errors or overlooked something, so some work is required on the other nodes to ensure functionality.

  • For the other nodes, I think the Scrapli drivers can probably increase reliability and make it easier to send configs.

Collaboration from others is required so we can confirm all nodes work reliably (as well as to fix some other possible issues in the way I've implemented the changes). I'm open to feedback 😊.

Confirmed working platforms

Subdirectory names are in brackets; the list follows the alphabetical/subfolder order of the directory tree.

  • Aruba AOS-CX
  • Cisco ASAv
  • Cisco Cat8kv (c8000v)
  • Cisco Cat9kv
  • Cisco CSR1kv (csr)
  • Cisco FTDv
  • Cisco Nexus 9k (n9kv)
  • Cisco NX-OS
  • Cisco IOSv (vios)
  • Cisco XRv (.vmdk)
  • Cisco XRv (.qcow2) (xrv_qcow)
  • Cisco XRv9k
  • cmglinux
  • Dell SONiC
  • Fortigate
  • FreeBSD
  • Dell OS10 (ftosv)
  • Huawei VRP
  • Juniper vJunosEvolved
  • Juniper vJunosRouter
  • Juniper vJunosSwitch
  • Juniper vMX
  • Juniper vQFX
  • Juniper vSRX
  • IPInfusion OcNOS
  • OpenBSD
  • OpenWRT
  • Palo Alto PA-VM (pan)
  • MikroTik ROS (routeros)
  • SONiC
  • Nokia SROS
  • Ubuntu
  • Arista vEOS
  • Huawei VRP (again?) (vrp)
  • HP vsr1000

- Add python3-pip, git and scrapli install via pip to all node Dockerfiles.

- Replace all nodes' telnetlib functions with new adapted scrapli functions (wait_write, expect, read_until).

- Move Cisco and Juniper nodes to their own vendor subdirectories, fix makefiles to ensure functionality works.

- Other things, which should be outlined in the PR.

kaelemc commented Nov 8, 2024

It might be worth making another scrapli branch and then I can change this PR to merge into that branch (instead of master).

This way other contributors can submit PRs for any changes to other nodes and once all/enough nodes work, that branch can be merged into master?


hellt commented Nov 8, 2024

sound idea @kaelemc
I have created the scrapli-dev branch

@kaelemc kaelemc changed the base branch from master to scrapli-dev November 8, 2024 09:49
…nfigurations

- Startup and bootstrap configurations use scrapli with IOSXEDriver
- Changed variable name from 'csr' to 'cat8kv' in cat8kv install function.
- Reverted change in bootstrap_spin so that console output is evaluated against empty byte-string instead of regular string.

kaelemc commented Nov 9, 2024

For the IOS-XE nodes (CSR1000V, Cat8kv and Cat9kv), the Scrapli XE driver is now being used. I had some issues with it on IOS-XE 16.x, but it was an error on my end; got the configs (both bootstrap and startup) to work nicely.

- existing XRv node uses vmdk image only. This one will use the qcow2 image.

kaelemc commented Nov 9, 2024

Maybe this belongs as its own PR, but I've added an xrv_qcow directory. It is the same as XRv but with some modifications to make XRv work when the user provides a qcow2 image.

The existing xrv directory requires vmdk images, while most users who import images from CML or other places on the web will most likely get a qcow2; this makes things easier for users, as they don't have to mess around with qemu-img. (Even the vmdk I have was converted from a qcow2 via qemu-img.)
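For reference, the conversion users previously had to do by hand looks like this (filenames are placeholders):

qemu-img convert -f qcow2 -O vmdk xrv-demo.qcow2 xrv-demo.vmdk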

I tried one more time to use the scrapli IOSXRDriver with XRv, but it still seems unreliable, and I get some weird behaviour from the XRv (I don't think this is scrapli's fault).

- Alter the log levels for some logs from debug->error
- Move the VM startup log message and add information about qemu smp and startup RAM.
@kaelemc kaelemc marked this pull request as ready for review November 9, 2024 10:39
@hellt hellt mentioned this pull request Nov 9, 2024

jbemmel commented Nov 9, 2024

Not a trivial change, but I would suggest creating one base Dockerfile for all of vrnetlab, and then deriving all platform images from that.

It doesn't make sense to repeat "install scrapli" 100 times across all those separate files; that becomes unmanageable.


hellt commented Nov 9, 2024

yeah, we can definitely create the base image hosted on ghcr.io with the base packages that every vr system relies on


kaelemc commented Nov 9, 2024

@jbemmel Excellent idea, certainly is/was unmanageable for me


kaelemc commented Nov 10, 2024

This is a difficult one to balance: in labs of different sizes, nodes take varying times to boot (depending on system specs etc.). I'm wondering if we should bump all the scrapli timeouts to something very large to make connection timeouts a non-issue?

Currently I have XE devices on 10 minutes; I have a feeling people with lower-spec systems might hit timeouts, meaning they could never boot the device. Should we increase it to something very large just to be safe?

The base Scrapli Driver (the underlying telnet connection for the wait_write, expect and read_until functions) uses a timeout of 1 hour.


hellt commented Nov 10, 2024

I agree, let's have a long enough timeout so that you don't have to guess the time down to minutes.

We should leave the timeout configurable via an env var setting that can then be put in the topology file if needed to tune it.


kaelemc commented Nov 10, 2024

Ok 👍, having timeouts as an env var is a good idea

- New env var SCRAPLI_TIMEOUT added, defaults to 3600 seconds (1 hour).
- It's used to let the user modify the operation, transport and socket timeouts for the Scrapli driver used to apply the config to the CSR.
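A sketch of how such an env-var-driven timeout can be passed to the scrapli driver (host/port values are placeholders for the qemu serial console):

import os
from scrapli.driver.core import IOSXEDriver

# SCRAPLI_TIMEOUT in seconds, with the 1-hour default described above
scrapli_timeout = int(os.getenv("SCRAPLI_TIMEOUT", 3600))

conn = IOSXEDriver(
    host="localhost",
    port=5000,                # placeholder: telnet port of the qemu serial console
    transport="telnet",
    auth_bypass=True,         # the console is already past authentication
    timeout_ops=scrapli_timeout,
    timeout_transport=scrapli_timeout,
    timeout_socket=scrapli_timeout,
)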

kaelemc commented Nov 13, 2024

I had to remove the NETCONF configuration on the CSR, as different versions of IOS-XE (16.x vs 17.x) behave differently when this command is entered in the config.

In 16.x, a prompt appears asking for confirmation about the NETCONF configuration. I figured it's best to just remove it rather than add extra complexity to handle the prompt for that single command.

Thoughts?


hellt commented Nov 20, 2024

@kaelemc added genisoimage and reuploaded 0.0.1 image


kaelemc commented Nov 21, 2024

@hellt Thanks. I guess one more thing: could you change Scrapli to install from the latest changes via:

pip install git+https://github.com/carlmontanari/scrapli --break-system-packages

Otherwise everything is broken and won't work (there seems to be a random EOF sent at some point from either qemu or the node itself, and Scrapli then thinks the connection hasn't been opened correctly).

If/when a new release comes out with the fixes then we can pin the version to that release :)

@carlmontanari

@kaelemc 2024.7.30.post1 just pushed to pypi 🫡


hellt commented Nov 21, 2024

thanks Carl
@kaelemc I have reuploaded the base image with the new version


kaelemc commented Nov 21, 2024

@carlmontanari @hellt Thanks!

- Use format string in log colour formatting
- Print newlines before log messages in wait_write() to improve visual clarity of log messages.
- Hopefully this is more intuitive to users: they see they have a .qcow2 file and can use the xrv_qcow dir.

kaelemc commented Nov 22, 2024

@hellt The new image (& scrapli release) are working great; I've moved the Cisco nodes over so far and tested all of them. Install time is much better without having to re-install all the packages (even better when everything is cached locally, of course 😄).

I have the Dockerfile changes ready for all the other nodes but I want to test as much as I can first to see if any key packages are missing, any incompatibilities etc.

In terms of Dockerfiles, since the MAINTAINER instruction is now deprecated, should I migrate those over to labels or remove them entirely? Some of these seem to have been copied over from the original vrnetlab, so those users may not be active or contributing to hellt/vrnetlab. I'm also seeing inconsistencies in how the maintainer labels are handled (LABEL maintainer vs LABEL org.opencontainers.image.authors), and only a few Dockerfiles have these labels anyway.

And in the recent commits I've just pushed, you'll see that on XRv9k I've switched it over to pull vcpu and ram from the env vars directly instead of via the flags/argparser. I figured that if we want vrnetlab to be the 'source of truth' for those things (discussed in #2285), there's no need for the clab end to contain logic to pass vcpu/ram via the flags. Do you agree with this approach, or am I clouding this PR with too much other stuff? What are your thoughts?

I've also implemented it this way for the cat8kv (for some reason there were no vcpu/ram knobs implemented).


hellt commented Nov 22, 2024

you can remove the maintainer label/field altogether. It is not up to date anyways.

Yes, correct, the defaults for the cpu/mem should stay within the vrnetlab node definition, and containerlab can override this via QEMU


kaelemc commented Nov 22, 2024

> you can remove the maintainer label/field altogether. It is not up to date anyways.

👍

> Yes, correct, the defaults for the cpu/mem should stay within the vrnetlab node definition, and containerlab can override this via QEMU

Alright. Currently I have it set to use env vars called VCPU and RAM, as that's what I've generally always used (it's what containerlab is using). It's also in the xrv9k docs in the 'resource requirements' admonition.

I didn't know about QEMU_SMP/QEMU_MEMORY, but I can easily change over if you think that's the preferred naming?


hellt commented Nov 22, 2024

it is not only the preferred naming, but is also generically loaded in the vrnetlab common

The reason there are VCPU/MEM vars dangling in the code base is pure legacy. We should converge on the global QEMU_* env vars, as they are read once by every node, without duplication.


kaelemc commented Nov 22, 2024

Oh perfect, got it. Will make the changes 👍

@kaelemc kaelemc marked this pull request as draft November 23, 2024 04:20

kaelemc commented Nov 23, 2024

Will convert as many of the nodes to use the Scrapli community drivers as possible.

I've added a checklist in the original PR comment of nodes I think are 'working' for sure.

  • Unlike XRv9k, regular XRv isn't using the IOS-XR driver because it's very much legacy and doesn't want to play nice.
  • All XE nodes are using IOSXEDriver and work perfectly
  • In terms of Cisco, I will try to move FTDv, ASAv, the Nexus devices and vIOS to their respective drivers (vIOS runs classic IOS but should work with the XE driver)

Every node is using the new base vrnetlab image; I will have to iron out any missing packages or incompatibilities there.

All Cisco nodes have default SMP/memory, and I have tested that they are in fact overridden correctly using the respective env vars, as pointed out. I had to make a minor change for XRv9k, as without an explicit single-socket config it did nothing but kernel panic.

This means that, like other nodes, XRv9k users can just set QEMU_SMP to the number of vCPUs they want instead of QEMU_SMP: cores=x,threads=1,sockets=1. But if they want to supply it like this, they still can.
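A sketch of that accept-both behaviour (variable names assumed):

import os

# QEMU_SMP may be a bare vCPU count ("4") or a full -smp spec
# ("cores=4,threads=1,sockets=1"); XRv9k kernel panics on multi-socket
# topologies, so a bare count is expanded to a single-socket spec.
smp = os.getenv("QEMU_SMP", "4")
if smp.isdigit():
    smp = f"cores={smp},threads=1,sockets=1"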

Comment on lines 247 to 251
self.tn.write("yes\r".encode())
self.logger.trace("Read: %s" % res.decode())
self.print(res)
self.logger.debug("writing to serial console: %s" % cmd)
self.tn.write("{}\r".format(cmd).encode())


2024-11-15 02:18:02,877: launch     DEBUG writing to serial console: root
Traceback (most recent call last):
  File "/launch.py", line 483, in <module>
    vr.start()
  File "/vrnetlab.py", line 734, in start
    vm.work()
  File "/vrnetlab.py", line 524, in work
    self.bootstrap_spin()
  File "/launch.py", line 123, in bootstrap_spin
    self.wait_write("root", wait=None)
  File "/launch.py", line 250, in wait_write
    self.tn.write("{}\r".format(cmd).encode())
    ^^^^^^^^^^^^^
AttributeError: 'Driver' object has no attribute 'write'

kaelemc (Author) replied:

Oops, sorry about that. Still have to migrate to the scrapli JunOS driver :)

Comment on lines -175 to -176
self.wait_write("restconf")
self.wait_write("netconf-yang")


Which version of CSR did you find had a problem with these? I only have csr1000v-universalk9.17.03.04a.qcow2, where it seems to work. You could use the same version condition that was in place for the ip domain name command if the NETCONF command is problematic on older versions.

kaelemc (Author) replied:

Was testing with both XE17 and 16.09.x versions (which were the problematic ones).

Comment on lines -156 to -159
if int(self.version.split('.')[0]) >= 16:
self.wait_write("ip domain name example.com")
else:
self.wait_write("ip domain-name example.com")


This should be part of the new config string too, if older CSRs are still supported.
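A sketch of that suggestion, reusing the existing version check (the >= 17 cutoff is a guess based on the 16.x confirmation prompt described above):

# only send the NETCONF/RESTCONF commands on versions known not to prompt
if int(self.version.split(".")[0]) >= 17:
    self.wait_write("restconf")
    self.wait_write("netconf-yang")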

kaelemc (Author) replied:

👍 trying to get an older version to test this out with.

Comment on lines -138 to -151
# check if we are prompted to overwrite current keys
(ridx, match, res) = self.tn.expect(
[
b"How many bits in the modulus",
b"Do you really want to replace them",
b"^[^ ]+#",
],
10,
)
if match: # got a match!
if ridx == 0:
self.wait_write("2048", None)
elif ridx == 1: # press return to get started, so we press return!
self.wait_write("no", None)


If you restart a container it comes up with keys already configured:

RP/0/0/CPU0:vr-xrv#2024-11-15 13:31:22,158: vrnetlab   INFO writing to console: 'crypto key generate rsa'
2024-11-15 13:31:22,258: vrnetlab   INFO writing to console: '2048'
2024-11-15 13:31:22,258: vrnetlab   INFO waiting for '#' on console.
crypto key generate rsa
Fri Nov 15 13:31:21.519 UTC
The name for the keys will be: the_default
% You already have keys defined for the_default
Do you really want to replace them? [yes/no]: 2048
% Please answer 'yes' or 'no'.

kaelemc (Author) replied:

Thanks, could you explain a little further? I don't think I can see this behaviour.

Thanks for your review, appreciate it a lot.

I'll have to go over each platform and migrate them all to their Scrapli (and Scrapli Community) drivers. If that's OK, I'll request another review when it's all good to go 👍


kaelemc commented Dec 20, 2024

Replaced by #297

@kaelemc kaelemc closed this Dec 20, 2024
Linked issue: vendor telnetlib