-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[change] NetJSON DeviceMonitoring compliance #2 #62
base: master
Are you sure you want to change the base?
Conversation
@pandafy @nemesisdesign In the last commit (d46b568), I've made the
Todo: We need to define new methods to collect as much monitoring info as possible. |
You can go ahead with this, since it will make it clear for us what methods are to be removed. We can always revert the commit if something went wrong.
instead of being exhaustive, let's start with minimal data. Check what data is being collected in openwrt-openwisp-monitoring and add support for it here. Otherwise we may spend time collecting every possible metric many of which might not be used in OpenWISP. We can always iterate to make it better. |
Thanks @pandafy , I have updated the OpenWRT backend for now (will do AirOS later). Below you can see the comparison between netengine and openwrt-openwisp-monitoring. I've added a few more metrics but it may not be possible to add all. Now I need to fix some tests. output from openwrt-openwisp-monitoring
{
"type": "DeviceMonitoring",
"general": {
"local_time": 1623319682,
"uptime": 178,
"hostname": "OpenWrt"
},
"dhcp_leases": [
{
"mac": "04:0e:3c:ca:55:5f",
"client_id": "01:04:0e:3c:ca:55:5f",
"client_name": "purhan",
"ip": "192.168.2.140",
"expiry": 1623362752
}
],
"interfaces": [
{
"type": "vlan",
"up": false,
"name": "eth0.2"
},
{
"mac": "1c:3b:f3:10:0a:42",
"type": "bridge",
"stp": false,
"up": true,
"txqueuelen": 1000,
"name": "br-lan",
"multicast": true,
"addresses": [
{
"proto": "static",
"address": "192.168.2.1",
"family": "ipv4",
"mask": 24,
"gateway": "192.168.0.1"
},
{
"family": "ipv6",
"mask": 64,
"proto": "static",
"address": "fe80::1e3b:f3ff:fe10:a42"
}
],
"bridge_members": [
"eth0.1"
],
"mtu": 1500
},
{
"mac": "1c:3b:f3:10:0a:42",
"type": "wireless",
"up": true,
"txqueuelen": 1000,
"name": "wlan0",
"wireless": {
"country": "00",
"noise": 0,
"ssid": "TP-Link",
"channel": 2,
"tx_power": 20,
"mode": "station",
"signal": -46,
"frequency": 2417
},
"multicast": true,
"addresses": [
{
"proto": "dhcp",
"address": "192.168.0.100",
"family": "ipv4",
"mask": 24,
"gateway": "192.168.0.1"
},
{
"family": "ipv6",
"mask": 64,
"proto": "static",
"address": "fe80::1e3b:f3ff:fe10:a42"
}
],
"mtu": 1500
},
{
"mac": "1c:3b:f3:10:0a:42",
"type": "other",
"up": true,
"txqueuelen": 1000,
"name": "eth0",
"multicast": true,
"addresses": [
{
"family": "ipv6",
"mask": 64,
"proto": "static",
"address": "fe80::1e3b:f3ff:fe10:a42"
}
],
"mtu": 1500
},
{
"mac": "1c:3b:f3:10:0a:42",
"type": "vlan",
"up": true,
"txqueuelen": 1000,
"name": "eth0.1",
"multicast": true,
"mtu": 1500
}
],
"resources": {
"memory": {
"total": 61452288,
"shared": 278528,
"free": 33243136,
"cached": 7983104,
"available": 26767360,
"buffered": 2465792
},
"cpus": 1,
"disk": [
{
"filesystem": "/dev/root",
"available_bytes": 0,
"mount_point": "/rom",
"used_percent": 100,
"size_bytes": 2621440,
"used_bytes": 2621440
},
{
"filesystem": "/dev/mtdblock5",
"available_bytes": 192512,
"mount_point": "/overlay",
"used_percent": 95,
"size_bytes": 3735552,
"used_bytes": 3543040
}
],
"load": [
0.33,
0.27,
0.11
],
"swap": {
"free": 0,
"total": 0
}
},
"dns_servers": [
"8.8.8.8",
"8.8.4.4"
],
"neighbors": [
{
"mac": "04:0e:3c:ca:55:5f",
"state": "REACHABLE",
"interface": "br-lan",
"ip": "192.168.2.140"
},
{
"mac": "84:d8:1b:62:a3:55",
"state": "REACHABLE",
"interface": "wlan0",
"ip": "192.168.0.1"
},
{
"mac": "04:0e:3c:ca:55:5f",
"state": "STALE",
"interface": "br-lan",
"ip": "fe80::2ed4:5d74:941f:ce1e"
}
]
} output from netengine
{
"type": "DeviceMonitoring",
"general": {
"uptime": 30,
"local_time": 1623319570
},
"resources": {
"cpus": 0,
"memory": {
"total": 60012,
"shared": 88,
"free": 27340,
"cached": 7160
},
"swap": {
"total": 0,
"free": 0
}
},
"interfaces": [
{
"name": "lo",
"statistics": {
"mac": "",
"type": "softwareLoopback",
"up": true,
"rx_bytes": 2437,
"tx_bytes": 2437,
"mtu": 65536
}
},
{
"name": "eth0",
"statistics": {
"mac": "1c:3b:c3:b3:10:0a",
"type": "ethernetCsmacd",
"up": true,
"rx_bytes": 160673,
"tx_bytes": 615795,
"mtu": 1500
}
},
{
"name": "Device 14c3:7662",
"statistics": {
"mac": "",
"type": "ethernetCsmacd",
"up": false,
"rx_bytes": 0,
"tx_bytes": 0,
"mtu": 1500
}
},
{
"name": "br-lan",
"statistics": {
"mac": "1c:3b:c3:b3:10:0a",
"type": "ethernetCsmacd",
"up": true,
"rx_bytes": 147924,
"tx_bytes": 601205,
"mtu": 1500
}
},
{
"name": "eth0.1",
"statistics": {
"mac": "1c:3b:c3:b3:10:0a",
"type": "ethernetCsmacd",
"up": true,
"rx_bytes": 147924,
"tx_bytes": 614999,
"mtu": 1500
}
},
{
"name": "wlan0",
"statistics": {
"mac": "1c:3b:c3:b3:10:0a",
"type": "ethernetCsmacd",
"up": true,
"rx_bytes": 604709,
"tx_bytes": 158937,
"mtu": 1500
}
}
],
"neighbors": [
{
"mac": "04:0e:3c:c3:8a:55",
"state": "REACHABLE",
"interface": "br-lan"
},
{
"mac": "04:0e:3c:c3:8a:55",
"state": "REACHABLE",
"interface": "br-lan"
},
{
"mac": "c2:84:c3:98:1b:62",
"state": "STALE",
"interface": "wlan0"
}
]
} |
701f1df
to
b8c427f
Compare
b8c427f
to
cfa6650
Compare
- Fix failing tests - Pass more realistic values in mocks - Remove redundant code
Pull Request Test Coverage Report for Build 938832440Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
01de297
to
8cd7089
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got following errors while running tests again OpenWrt device. Is there anything that needs to be configured on the device? Is there any extra snmpd cofnigguration needed?
(venv-netengine) pandafy@FRIDAY:~/openwisp/netengine$ DISABLE_MOCKS=1 TEST_SETTINGS_FILE=./test-settings.json nose2 tests.test_snmp.test_openwrt
........EE....E.......E..
======================================================================
ERROR: test_interfaces_to_dict (tests.test_snmp.test_openwrt.TestSNMPOpenWRT)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/pandafy/openwisp/netengine/tests/test_snmp/test_openwrt.py", line 80, in test_interfaces_to_dict
self.assertIsInstance(self.device.interfaces_to_dict, list)
File "/home/pandafy/openwisp/netengine/netengine/backends/snmp/openwrt.py", line 335, in interfaces_to_dict
if_type = self.interfaces_type[i]['type']
File "/home/pandafy/openwisp/netengine/netengine/backends/snmp/openwrt.py", line 281, in interfaces_type
'type': types[self.get_value(types_oid + str(i))],
KeyError: '1'
======================================================================
ERROR: test_interfaces_type (tests.test_snmp.test_openwrt.TestSNMPOpenWRT)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/pandafy/openwisp/netengine/tests/test_snmp/test_openwrt.py", line 71, in test_interfaces_type
self.assertIsInstance(self.device.interfaces_type, list)
File "/home/pandafy/openwisp/netengine/netengine/backends/snmp/openwrt.py", line 281, in interfaces_type
'type': types[self.get_value(types_oid + str(i))],
KeyError: '1'
======================================================================
ERROR: test_netjson_compliance (tests.test_snmp.test_openwrt.TestSNMPOpenWRT)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/pandafy/openwisp/netengine/tests/test_snmp/test_openwrt.py", line 120, in test_netjson_compliance
device_dict = self.device.to_dict()
File "/home/pandafy/openwisp/netengine/netengine/backends/snmp/openwrt.py", line 532, in to_dict
'interfaces': self.interfaces_to_dict,
File "/home/pandafy/openwisp/netengine/netengine/backends/snmp/openwrt.py", line 335, in interfaces_to_dict
if_type = self.interfaces_type[i]['type']
File "/home/pandafy/openwisp/netengine/netengine/backends/snmp/openwrt.py", line 281, in interfaces_type
'type': types[self.get_value(types_oid + str(i))],
KeyError: '1'
======================================================================
ERROR: test_to_dict (tests.test_snmp.test_openwrt.TestSNMPOpenWRT)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/pandafy/openwisp/netengine/tests/test_snmp/test_openwrt.py", line 113, in test_to_dict
device_dict = self.device.to_dict()
File "/home/pandafy/openwisp/netengine/netengine/backends/snmp/openwrt.py", line 532, in to_dict
'interfaces': self.interfaces_to_dict,
File "/home/pandafy/openwisp/netengine/netengine/backends/snmp/openwrt.py", line 335, in interfaces_to_dict
if_type = self.interfaces_type[i]['type']
File "/home/pandafy/openwisp/netengine/netengine/backends/snmp/openwrt.py", line 281, in interfaces_type
'type': types[self.get_value(types_oid + str(i))],
KeyError: '1'
----------------------------------------------------------------------
Ran 25 tests in 23.801s
FAILED (errors=4)
- Swap time with datetime in local_time - Remove unused warning - Use a single call for collecting load information
0b12396
to
da39965
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work Purhan! Please see my comments below.
b5940b2
to
cda0295
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While testing it was found that to_dict
methods are taking too long to fetch the data since it is making multiple queries. To make this method quicker, we can do a single query to device for a complete SNMP Walk and then process the data as needed locally by netengine.
I have been thinking about how we can bring this change and here is one of my idea.
Instead of changing complete implementation we can update the signature of current methods to accept another optional parameter snmpdump
which will be the output from snmpwalk. The methods will locally fetch the required OIDs from snmpdump and then the existing code for further processing can be used.
Here's a pseudocode for example:
def local_time(self, snmpdump=None):
if snmpdump:
epoch = get_local_time_from_snmp_dump
else:
epoch = str(self.get('1.3.6.1.4.1.41112.1.4.8.1.0')[3][0][1])
timestamp = int(datetime.strptime(epoch, '%Y-%m-%d %H:%M:%S').timestamp())
return timestamp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @purhan, with that fix I can now extract the info without errors.
For sure we'll have find a way to get all the info at once to resolve the performance issue when generating the JSON output, in the current form it would not be usable.
In the output of type I see ethernet Csmacd
, but to be compliant with the NetJSON spec it should be just ethernet
.
One thing I noticed is that wifi interfaces are also detected as ethernet and I don't see any wireless info is that the max we can do?
Looking at the snmpd package for OpenWRT, I do see a patch that seems to add support for WiFi information: https://github.com/openwrt/packages/blob/38f01ad2c9738c6c5a5dbda85019bb9cf5bc10f8/net/net-snmp/patches/750-ieee802dot11.patch.
I just asked on the OpenWRT forum if the know anything about it.
c5d932d
to
e6f350d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from snmpwalk:
iso.3.6.1.2.1.1.1.0 = STRING: "Linux Capoano-Test-01 4.14.229 #0 Fri Apr 9 13:43:38 2021 mips"
From this informatio we can find out the device name, which would be nice to have in the output. The device name is what comes after Linux, but before the kernel version.
Then onto the IP addresses of each interface, I see they're available in the snmpwalk output but I can't get this info from the JSON dict:
iso.3.6.1.2.1.4.20.1.1.10.10.0.48 = IpAddress: 10.10.0.48
iso.3.6.1.2.1.4.20.1.1.127.0.0.1 = IpAddress: 127.0.0.1
iso.3.6.1.2.1.4.20.1.1.192.168.1.59 = IpAddress: 192.168.1.59
iso.3.6.1.2.1.4.20.1.1.192.168.254.1 = IpAddress: 192.168.254.1
Can you make sure this info is returned? It's very useful to display this info in the status tab.
e6f350d
to
995b703
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress @purhan 👍
Now we need to start digging deeper into OpenWRT/SNMPd in order to add more information which is there but we haven't been looking at and clean up things a bit.
Here's a few issues I see:
- ip address format is not valid NetJSON, please double check the spec and the output returned by the netjson-monitoring scripts
- total memory got via netjson-monitoring scripts:
127467520
, via netengine I get124480
, looks like the value we get from SNMPd is not in bytes but KB? Make sure the size used is the same, multiply if necessary. - given the issue above, please double check the interface traffic counters, size must be consistent, we can't use bytes somewhere and kilobytes elsewhere
- load average is available with
1.3.6.1.4.1.2021.10
, please add this info to the output because it's useful - memory: with oid
1.3.6.1.4.1.2021.4
I can get values that are more similar to those found in /proc/meminfo:
"memory": {
"total": 127467520,
"free": 89944064,
"shared": 135168
"buffered": 4501504,
"cached": 13062144,
"available": 74657792,
},
iso.3.6.1.4.1.2021.4.5.0 = INTEGER: 124480 (memTotalReal, 124480 * 1024 = 127467520, same value retrieved with netjson_monitoring in "total")
iso.3.6.1.4.1.2021.4.6.0 = INTEGER: 87956 (memAvailReal, 87956 * 1024 = 90066944, this is closer to "free" rather than "available")
iso.3.6.1.4.1.2021.4.11.0 = INTEGER: 87956 (memTotalFree, same as before)
iso.3.6.1.4.1.2021.4.12.0 = INTEGER: 16000 (memMinimumSwap)
iso.3.6.1.4.1.2021.4.13.0 = INTEGER: 132 (memShared, 132 * 1024 = 135168, same value in "shared" from netjson_monitoring)
iso.3.6.1.4.1.2021.4.14.0 = INTEGER: 4396 (memBuffer, 439 * 1024 = 4501504, same as "buffered" above)
iso.3.6.1.4.1.2021.4.15.0 = INTEGER: 12756 (memCached, 12756 * 1024 = 13062144, same value as "cached above")
Therefore for memory I think we should use this MIB.
- please use the MIB with oid
1.2.840.10036
to determine which interfaces are of type WiFi - please double check the same also on AirOS
I left many more pointers in our chat, please read those.
CC: @pandafy.
07f3e59
to
5dc3cc6
Compare
15c779d
to
d746f2d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok @purhan it's time to improve the docs a bit more, let's document the main methods like validate
, to_dict
, to_json
, then we should have a page for OpenWrt and a page for AirOS, in each of the two we should have a list which explains the data returned with the to_json method (eg: interfaces, system, neighbours, etc), we should mention the format is inspired by NeJSON DeviceMonitoring.
I think we can point out that the internal methods that are not documented may be changed in future versions, we could also flag all the other methods as private if you think it's a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @purhan, I see good progress here but there's room for improvement, we should put effort in improving this further.
I highly advise to make it explicit that we recommend using only the to_netjson
method to retrieve information and that all the other methods right now are not stable, I ask you to please flag all the other methods as private by prepending an underscore, eg: method_name()
becomes _method_name
. This way we are signaling to others who may want to use this code that we reserve the freedom to change the internal API freely if we need.
We should do this because the library is still in early stages and it's generally recommended to be conservative in which methods are declared public and can be used by other developers, so that only the minimum needed has to be maintained with backward compatibility in mind (which requires more work) and the rest can be freely changed.
I am pretty sure that over the next iterations we'll be changing a lot of internal code so it's better to prepare this right now.
Please dedicate one page to OpenWrt and one page to AirOS.
Please list OpenWrt before AirOS.
Please refer to the netjsonconfig documentation http://netjsonconfig.openwisp.org/en/latest/ for insipiration on how to write documentation which is informative and consistent with the rest of OpenWISP, this documentation is built with ReStructuredText and Python Sphinx just like this one, you can find the source here: https://github.com/openwisp/netjsonconfig/tree/master/docs/source.
I'll be referring to more specific examples below.
the tree gains more depth. | ||
Obviously, by getting the smallest MIB which is "1" or simply " . " one can get all the tree. | ||
|
||
The base SNMP backend contains the following methods (some internal methods are not documented and are subject to change in the future): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can avoid explaining what SNMP is, we assume peopel using this library will know.
We can however include information about how all the data is retrieved and give a few hints on how to use the base backend to create new SNMP backends for other systems (eg: indicate the python path for the base backend and explain anything important regarding it).
|
||
The SNMP backend provides support for 2 firmwares: | ||
* AirOS | ||
* OpenWRT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please list OpenWrt first. Please also link their respective pages
| **agent** | Agent string for the SNMP connection | | ||
+---------------+---------------------------------------------------------------------+ | ||
| **port** | Port for the SNMP connection. Default value is `161` | | ||
+---------------+---------------------------------------------------------------------+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these parameters all mandatory?
Is there any default value for any parameter?
Let's make this explicit.
There's no need to write this table, sphinx has automations we can use to autodocument these things.
Please use it!
Refer to the netjsonconfig documentation for examples:
http://netjsonconfig.openwisp.org/en/latest/backends/openwrt.html#initialization
I mean this:
See the source for more info: https://raw.githubusercontent.com/openwisp/netjsonconfig/master/docs/source/backends/openwrt.rst
The doc text is embedded in a docstring in the method itself.
This is a good practice, the docstring is also shown in some rich shells (bpython should show it, not sure about ipython).
| **to_json** | Calls the `to_dict` method and returns a JSON string of the dict | | ||
+--------------+------------------------------------------------------------------------------------------------------------------------------------------+ | ||
| **validate** | Checks if connection with the device is working and raises `NetengineError` in case something is wrong | | ||
+--------------+------------------------------------------------------------------------------------------------------------------------------------------+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to do the same here and drop the table syntax which is painful to maintain
***** | ||
AirOS | ||
***** | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as in netjsonconfig, I suggest to make this a dedicated page, which should point to the base snmp page for the general concepts.
Some information regarding methods can be duplicated (no problem with this if we use autodoc).
With AirOS, Netengine is able to collect the following information which is returned in the | ||
`NetJSON Devicemonitoring <https://netjson.org/rfc.html#name-devicemonitoring>`_ format: | ||
|
||
+------------+------------------------------------------------------------------------------------------------------------+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please let's switch from table sytnax which is painful to maintain to simple headings and list after headings.
With OpenWRT, Netengine is able to collect the following information which is returned in the | ||
`NetJSON Devicemonitoring <https://netjson.org/rfc.html#name-devicemonitoring>`_ format: | ||
|
||
+------------+----------------------------------------------------------+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please let's switch from table sytnax which is painful to maintain to simple headings and list after headings.
device.next("1.3.6") | ||
|
||
Otherwise, if you want simply a value of the tree just type:: | ||
device.get_value("oid_you_want_to_ask_for") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these methods are not documented above so why are we showing them in the example?
I guess we can document (and keep public) next
and get_value
, but I would avoid documenting uptime_tuple
because it would not be consistent with advising to use to_netjson
and make all other internal methods private.
OpenWRT example | ||
=============== | ||
|
||
The same instructions typed above can be applied to OpenWRT itself, just remember to import the correct firmware by typing:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not a firmware, it's a backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing things we found out in AirOS during our last call:
- device name
- device model (Ubiquiti LiteBeam ...)
- operating system (Linux version (withou date) + firmware version)
- please check what the mac addresses are and if we can display them (if these mac addresses are coming from the interfaces, we should include them)
@nemesisdesign Should this information be returned in the |
Yes. You can add this with 'system_info' key in the returned value. Moreover, it won't take much time if we want to move it around in the returned data structure if needed. |
Closes #2, closes #64, closes #65
Resolves part of #60