got exception with query [SELECT * FROM "memory.buffered.memory" WHERE fqdn = 'mds' ORDER BY time DESC LIMIT 1;] #25

Open
GodGirlwsy opened this issue Oct 14, 2020 · 4 comments

Comments

@GodGirlwsy

[2020/10/14-14:58:38] [INFO] [filelock.py:247] Lock 140164392906576 acquired on /etc/esmon_install.conf.lock
[2020/10/14-14:58:39] [WARNING] [ssh_host.py:232] lsb_release is needed on host [mds] for accurate distro identification
[2020/10/14-14:58:40] [INFO] [esmon_install_nodeps.py:1514] can't deterimine Lustre version according to RPM names on host [mds], possible versions are [es2 es3 es4 2.7 2.12], using default [es3]
[2020/10/14-14:58:40] [INFO] [esmon_install_nodeps.py:2470] ESMON server won't be reinstalled according to the config
[2020/10/14-14:58:40] [INFO] [esmon_install_nodeps.py:2484] support for metrics of [memory, CPU, df(/), load, sensors, uptime, users, Lustre MDS] will be enabled on ESMON client [mds] according to the config
[2020/10/14-14:58:51] [INFO] [connectionpool.py:203] Starting new HTTP connection (1): cloudos112
[2020/10/14-14:58:51] [ERROR] [esmon_influxdb.py:60] got exception with query [SELECT * FROM "memory.buffered.memory" WHERE fqdn = 'mds' ORDER BY time DESC LIMIT 1;]: Traceback (most recent call last):
File "pyesmon/esmon_influxdb.py", line 57, in ic_query
File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 464, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 415, in send
raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', error(111, 'Connection refused'))

@LiXi-storage
Collaborator

It seems that http://${INFLUX_SERVER}:8086 cannot be reached. I'd suggest checking the connection manually. firewalld might also be blocking the requests.
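
For anyone who wants to script that manual check, here is a minimal sketch in Python 3. It only tests whether a TCP connection to the InfluxDB port can be opened; it does not issue a query. The function name `can_connect` and the 3-second timeout are illustrative choices, not part of ESMON.

```python
import socket


def can_connect(host, port=8086, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
    sock.close()
    return True
```

Calling `can_connect("cloudos112")` from the host that runs `esmon_install` would probe the server named in the log above. A `False` result is consistent with either a stopped InfluxDB service or a firewall rejecting the port, so it narrows the problem down but does not tell the two apart.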

@LiXi-storage LiXi-storage mentioned this issue Oct 14, 2020
@GodGirlwsy
Author

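It seems that http://${INFLUX_SERVER}:8086 cannot be reached. I'd suggest checking the connection manually. firewalld might also be blocking the requests.
Thank you for your reply. The problem above has been resolved: the cause was that server:reinstall in esmon_config was set to false. After changing it to true, the installation succeeded.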

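After the installation succeeded, I logged in to the Grafana interface and looked at the Lustre statistics, and found that data collection was failing for some metrics. I then checked the MDS node: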
[root@mds1 ~]# systemctl status collectd.service
● collectd.service - Collectd statistics daemon
Loaded: loaded (/usr/lib/systemd/system/collectd.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2020-10-14 23:50:57 EDT; 19s ago
Docs: man:collectd(1)
man:collectd.conf(5)
Main PID: 6809 (collectd)
CGroup: /system.slice/collectd.service
└─6809 /usr/sbin/collectd

Oct 14 23:50:57 mds1 systemd[1]: Starting Collectd statistics daemon...
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "aggregation" successfully loaded.
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "match_regex" successfully loaded.
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "filedata" successfully loaded.
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "syslog" successfully loaded.
Oct 14 23:50:57 mds1 systemd[1]: Started Collectd statistics daemon.
Oct 14 23:50:57 mds1 collectd[6809]: failed to stat /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filestotal
Oct 14 23:50:57 mds1 collectd[6809]: unable to read file /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filestotal
Oct 14 23:50:57 mds1 collectd[6809]: failed to stat /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filesfree
Oct 14 23:50:57 mds1 collectd[6809]: unable to read file /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filesfree

I don't know which part of the configuration is wrong. Please advise. I look forward to your reply, thank you.
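
To confirm whether the files named in the collectd messages above are really absent on the MDS, a small Python 3 sketch like the following could be run on that node. The helper name `missing_files` is hypothetical; the two paths are copied from the log, and nothing here is taken from the ESMON or collectd sources.

```python
import os


def missing_files(paths):
    """Return the subset of paths that do not exist on this host."""
    return [p for p in paths if not os.path.exists(p)]


# Paths copied from the collectd "failed to stat" messages.
reported = [
    "/proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filestotal",
    "/proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filesfree",
]

# Prints the reported paths that are not present on the node running this script.
print(missing_files(reported))
```

If both paths come back as missing, that suggests the collectd definitions expect a /proc layout that this Lustre installation does not provide, rather than a permissions problem.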

@LiXi-storage
Collaborator

I noticed the following errors:

[2020/10/14-14:58:40] [INFO] [esmon_install_nodeps.py:1514] can't deterimine Lustre version according to RPM names on host [mds], possible versions are [es2 es3 es4 2.7 2.12], using default [es3]

Please check that your Lustre version is compatible with es3 (2.10). Otherwise, the data won't be collected properly. If 2.10 is not the closest match to your Lustre version, please change the Lustre version in /etc/esmon_install.conf.
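
Since this fallback is logged at INFO level and is easy to miss, here is a rough Python 3 sketch that pulls the host, the candidate versions, and the chosen default out of such a line. The regular expression was written against the single message quoted above and is an assumption about its format, not something taken from the ESMON source; `parse_version_fallback` is a hypothetical helper name.

```python
import re

LOG_LINE = (
    "[2020/10/14-14:58:40] [INFO] [esmon_install_nodeps.py:1514] can't deterimine "
    "Lustre version according to RPM names on host [mds], possible versions are "
    "[es2 es3 es4 2.7 2.12], using default [es3]"
)

# Pattern written against this one message only.
PATTERN = re.compile(
    r"on host \[(?P<host>[^\]]+)\], possible versions are "
    r"\[(?P<versions>[^\]]+)\], using default \[(?P<default>[^\]]+)\]"
)


def parse_version_fallback(line):
    """Return (host, candidate versions, default), or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group("host"), m.group("versions").split(), m.group("default")
```

Running it over each line of an install log and printing the non-`None` results would show every host where the installer guessed the version instead of detecting it.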

@GodGirlwsy
Author

I noticed the following errors:

[2020/10/14-14:58:40] [INFO] [esmon_install_nodeps.py:1514] can't deterimine Lustre version according to RPM names on host [mds], possible versions are [es2 es3 es4 2.7 2.12], using default [es3]

Please check that your Lustre version is compatible with es3 (2.10). Otherwise, the data won't be collected properly. If 2.10 is not the closest match to your Lustre version, please change the Lustre version in /etc/esmon_install.conf.

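The Lustre version I installed is 2.12, and ESMON is esmon-1.3.ge627284.x86_64.iso.

Attempt 1: after changing the Lustre version in the /etc/esmon_install.conf configuration file to es4, the data collection failure messages still appear: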
[root@mds1 ~]# systemctl status collectd.service
● collectd.service - Collectd statistics daemon
Loaded: loaded (/usr/lib/systemd/system/collectd.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2020-10-14 23:50:57 EDT; 19s ago
Docs: man:collectd(1)
man:collectd.conf(5)
Main PID: 6809 (collectd)
CGroup: /system.slice/collectd.service
└─6809 /usr/sbin/collectd

Oct 14 23:50:57 mds1 systemd[1]: Starting Collectd statistics daemon...
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "aggregation" successfully loaded.
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "match_regex" successfully loaded.
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "filedata" successfully loaded.
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "syslog" successfully loaded.
Oct 14 23:50:57 mds1 systemd[1]: Started Collectd statistics daemon.
Oct 14 23:50:57 mds1 collectd[6809]: failed to stat /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filestotal
Oct 14 23:50:57 mds1 collectd[6809]: unable to read file /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filestotal
Oct 14 23:50:57 mds1 collectd[6809]: failed to stat /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filesfree
Oct 14 23:50:57 mds1 collectd[6809]: unable to read file /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filesfree

Attempt 2: changing the Lustre version in the /etc/esmon_install.conf configuration file to 2.12:
[root@cloudos111 etc]# esmon_install
Started installing Exascaler monitoring system using config [/etc/esmon_install.conf], please check [/var/log/esmon_install/2020-10-15-13_59_26] for more log
[2020/10/15-13:59:26] [INFO] [filelock.py:247] Lock 139624011587344 acquired on /etc/esmon_install.conf.lock
[2020/10/15-13:59:26] [ERROR] [esmon_install_nodeps.py:2252] unsupported Lustre version [2.12], please correct file [/etc/esmon_install.conf]
[2020/10/15-13:59:26] [ERROR] [esmon_install_nodeps.py:2428] failed to parse config [/etc/esmon_install.conf]
[2020/10/15-13:59:26] [INFO] [filelock.py:310] Lock 139624011587344 released on /etc/esmon_install.conf.lock
[2020/10/15-13:59:26] [ERROR] [esmon_install_nodeps.py:2674] installation failed, please check [/var/log/esmon_install/2020-10-15-13_59_26] for more log
