Hi,
Due to networking and DNS issues, our engine went offline (it is
currently a physical machine; I will convert it to a VM in the
future when time allows). When service was restored, I noticed that
all the VMs were listed as being in an unknown state on one host. The
VMs were fine, but the engine could not ascertain their status as the
host itself was in an unknown state. vdsm was reporting errors and was
not running on the engine (or at least was in status 'failed' in
systemd). I tried starting vdsmd on the engine but it would not start.
I decided to try to restart vdsmd on the host and that did allow the
state of the VMs to be discovered, and the engine listed the host as
up again. However, there are still errors with vdsmd on both the host
and the engine, and the engine cannot start vdsmd. I guess the engine
is still able to monitor the hosts in a limited way, as it says they
are both up.
There are communication errors between one of the hosts and the
engine; the host appears to be refusing connections.
From the engine log:
2017-02-20 18:41:51,226Z ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
(DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01,
VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
hostId='e050c27f-8709-404c-b03e-59c0167a824b',
vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})'
execution failed: java.net.ConnectException: Connection refused
2017-02-20 18:41:51,226Z ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
Failure to refresh host 'kvm-ldn-01' runtime info:
java.net.ConnectException: Connection refused
2017-02-20 18:41:52,772Z ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
(DefaultQuartzScheduler6) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
Command 'GetAllVmStatsVDSCommand(HostName = kvm-ldn-01,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='e050c27f-8709-404c-b03e-59c0167a824b'})' execution failed:
VDSGenericException: VDSNetworkException: Connection reset by peer
2017-02-20 18:41:54,256Z ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
(DefaultQuartzScheduler7) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01,
VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
hostId='e050c27f-8709-404c-b03e-59c0167a824b',
vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})'
execution failed: java.net.ConnectException: Connection refused
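The "Connection refused" errors above suggest that nothing was accepting connections on the host's vdsm port at those moments. One way to check this independently of the engine would be a plain TCP connect from the engine machine. This is only a minimal sketch: it assumes vdsm is listening on its default port 54321 (adjust if the setup differs), and check_port is a hypothetical helper name, not part of oVirt or vdsm.

```python
import socket


def check_port(host, port=54321, timeout=5.0):
    """Try a plain TCP connect and report what happened.

    Returns "open" if the connection is accepted, "refused" if the
    peer actively rejects it, or "unreachable: <reason>" for anything
    else (timeout, DNS failure, no route, ...).
    The default port 54321 is an assumption (the usual vdsm port).
    """
    try:
        # create_connection resolves the name and connects in one step
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Same condition the engine logs as java.net.ConnectException
        return "refused"
    except OSError as exc:
        # Covers socket.timeout and socket.gaierror as well
        return "unreachable: %s" % exc


# Example usage from the engine machine (hostname taken from the logs):
#   print(check_port("kvm-ldn-01"))
```

A result of "refused" would point at vdsmd not listening on the host (or a firewall rejecting the connection), while "open" would mean the TCP layer is fine and the problem is more likely at the SSL or JSON-RPC level.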
From the vdsm.log on the host:
Feb 20 18:44:20 kvm-ldn-01 vdsm[42308]: vdsm vds.dispatcher ERROR SSL
error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected
('::ffff:172.16.75.16', 38350, 0, 0) at 0x33b9bd8>: unexpected eof
Feb 20 18:44:24 kvm-ldn-01 vdsm[42308]: vdsm jsonrpc.JsonRpcServer
ERROR Internal server error
Traceback (most recent call last):
File
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 547, in
_handle_request...
Any ideas what might be going on here?
Thanks,
Cam