Hi Andrei,
Could you also post the relevant part of engine.log? I don't expect to
find the answer there, but I want to be sure.
vdsm.log does not show any trace of an error from the vdsm point of view.
For example, it looks like vdsm started correctly and subscribed to receive
commands from the engine (that does not mean the engine is connected to it,
though; vdsm is only in listening mode).
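Since a listening vdsm does not prove the engine can reach it, a quick reachability check run from the engine host may help narrow this down. Below is a minimal sketch, assuming vdsm listens on its default management port 54321 and using the hostname from your logs; the function name can_connect is just an illustration. It only tests whether a plain TCP connection can be opened and says nothing about the TLS handshake or the messaging layer on top.

```python
import socket


def can_connect(host: str, port: int = 54321, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    54321 is assumed here as the default vdsm management port; adjust it
    if your setup differs. Only TCP reachability is checked.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example usage from the engine host (hostname taken from the logs):
# print(can_connect("node14.mydomain.lv"))
```

If this returns False from the engine host while the other nodes return True, the problem is more likely on the network path (for example the replaced switch) than in vdsm itself.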
Can you confirm that 'SSH restart' from the UI works? By 'works' I mean that
the host is actually restarted after a few minutes and there are no
SSH-related (public key, etc.) errors in engine.log.
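If engine.log is large, something like the following could pull out candidate lines. This is only a rough sketch: the keyword list is my guess at what SSH-related errors might contain, not the exact wording the engine uses, and the path in the usage comment is the usual default location of the log on the engine machine.

```python
from typing import Iterable, List

# Assumed search terms; the engine's actual messages may be worded differently.
KEYWORDS = ("ssh", "public key", "publickey")


def ssh_related_errors(lines: Iterable[str]) -> List[str]:
    """Return lines that mention 'error' and at least one SSH keyword."""
    hits = []
    for line in lines:
        lower = line.lower()
        if "error" in lower and any(k in lower for k in KEYWORDS):
            hits.append(line.rstrip("\n"))
    return hits


# Example usage on the engine machine:
# with open("/var/log/ovirt-engine/engine.log", errors="replace") as f:
#     for hit in ssh_related_errors(f):
#         print(hit)
```

An empty result would not prove that SSH is fine, so please still post the log excerpt around the time of the restart.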
Artur
On Mon, Aug 9, 2021 at 9:55 AM Andrei Verovski <andreil1(a)starlett.lv> wrote:
Hi,
I have oVirt 4.4.7.6-1.el8 and one problematic node (HP ProLiant with
CentOS 8 stream).
After replacing the server rack router switch and restarting, I got this
error, which I can’t recover from:
VDSM node14 command Get Host Capabilities failed: Message timeout which
can be caused by communication issues
vdsm-network is running fine, but vdsmd can’t start on node14 for whatever
reason. All other nodes are running fine.
Aug 09 10:24:12 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running dummybr
Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running tune_system
Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running test_space
Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running test_lo
Aug 09 10:24:13 node14.mydomain.lv systemd[1]: Started Virtual Desktop Server Manager.
Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_systemd(sudo:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'
Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session): session opened for user root by (uid=0)
Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session): session closed for user root
Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available. Error: [Errno 2] No such file or directory
Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available, KSM stats will be missing. Error:
In the web GUI -> Management I can’t do anything with the host except
restart it. Stop aborts with an error, and all other commands are grayed out.
The status is “Unassigned”. The host responds to pings as usual.
vdsm.log (from node14) attached.
Thanks in advance for any help.
--
Artur Socha
Senior Software Engineer, RHV
Red Hat