[ovirt-users] vdsm issues between engine and host

cmc iucounu at gmail.com
Tue Feb 21 10:18:20 UTC 2017


Hi Piotr,

Thanks for the reply. It all looks healthy now. Regarding DNS, we did
have some issues with it at the time. However, I think the main problem
was NetworkManager shutting the interface down, seemingly at random. I
thought I had disabled it when I set the machine up about 5 months ago
(and it had worked fine up until this incident). I can't explain that,
or why VDSM was enabled on the engine. The only change I had made was
an attempt to set up a hosted engine, which I did incorrectly: instead
of putting the host into maintenance and doing it there, I tried to set
it up as a VM on the running cluster. I can't see why that would have
caused the changes above. Anyway, I've now read the documentation more
closely rather than hurrying through it.
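
In case it helps anyone who finds this thread later: the engine log
showed two different failures, "Connection refused" and
"UnknownHostException", and they point at different causes (nothing
listening on the port versus a name that does not resolve). Below is a
rough sketch of how one could tell them apart from the engine machine.
It is only an illustration, not part of oVirt. The function name
classify_reachability is made up, and the port 54321 in the usage part
is my assumption for where vdsm listens; adjust it for your setup.

```python
import socket


def classify_reachability(host, port, timeout=3.0):
    """Return 'dns-failure', 'refused', 'timeout', 'unreachable' or 'ok'."""
    try:
        # Name resolution happens here; a failure corresponds to the
        # UnknownHostException seen in the engine log.
        candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"

    result = "unreachable"
    for family, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except ConnectionRefusedError:
            # Host is up but nothing accepts connections on that port.
            result = "refused"
        except socket.timeout:
            result = "timeout"
        except OSError:
            result = "unreachable"
    return result


if __name__ == "__main__":
    # Hostname taken from the thread; the port is an assumption.
    print(classify_reachability("kvm-ldn-01", 54321))
```

A result of "refused" suggests looking at the service on the host,
while "dns-failure" suggests looking at the resolver on the engine.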

Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.1858] device (enp5s0f0): state change: disconnected ->
prepare (reason 'none') [30 40 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.1860] manager: NetworkManager state is now CONNECTING
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.1867] device (enp5s0f0): state change: prepare -> config
(reason 'none') [40 50 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.2071] device (enp5s0f0): state change: config -> ip-config
(reason 'none') [50 70 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.2120] device (enp5s0f0): state change: ip-config ->
ip-check (reason 'none') [70 80 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.2160] device (enp5s0f0): state change: ip-check ->
secondaries (reason 'none') [80 90 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.2164] device (enp5s0f0): state change: secondaries ->
activated (reason 'none') [90 100 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.2166] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.2889] manager: NetworkManager state is now
CONNECTED_GLOBAL
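
For reference, a minimal sketch of pulling the device state changes out
of log text like the above, so that repeated drops of an interface are
easier to spot. It assumes the message format shown in these lines, and
the names STATE_CHANGE and device_transitions are my own, not anything
from NetworkManager. Whitespace is collapsed first because mail clients
wrap the long lines.

```python
import re

# Matches e.g. "device (enp5s0f0): state change: prepare -> config (reason 'none')"
STATE_CHANGE = re.compile(
    r"device \((?P<dev>[^)]+)\): state change: "
    r"(?P<old>[\w-]+) -> (?P<new>[\w-]+) "
    r"\(reason '(?P<reason>[^']*)'\)"
)


def device_transitions(log_text):
    """Return a list of (device, old_state, new_state, reason) tuples."""
    # Undo line wrapping by collapsing every run of whitespace to one space.
    flat = " ".join(log_text.split())
    return [
        (m["dev"], m["old"], m["new"], m["reason"])
        for m in STATE_CHANGE.finditer(flat)
    ]


def left_disconnected(transitions):
    """Keep only the transitions where a device starts coming back up."""
    return [t for t in transitions if t[1] == "disconnected"]
```

Feeding it the journal output for the NetworkManager unit and counting
the entries returned by left_disconnected gives a rough idea of how
often the interface had dropped. Words split in the middle by wrapping
are not repaired by this sketch.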

Thanks again and sorry to have wasted your time with this,

Cam

On Tue, Feb 21, 2017 at 8:59 AM, Piotr Kliczewski
<piotr.kliczewski at gmail.com> wrote:
> On Mon, Feb 20, 2017 at 9:47 PM, cmc <iucounu at gmail.com> wrote:
>> Hi,
>>
>> Due to networking and DNS issues, our engine went offline (it is
>> currently a physical machine; we will convert it to a VM in the
>> future when time allows). When service was restored, I noticed that
>> all the VMs were listed as being in an unknown state on one host. The
>> VMs were fine, but the engine could not ascertain their status as the
>> host itself was in an unknown state. vdsm was reporting errors and was
>> not running on the engine (or at least was in status 'failed' in
>> systemd). I tried starting vdsmd on the engine but it would not start.
>> I decided to try to restart vdsmd on the host and that did allow the
>> state of the VMs to be discovered, and the engine listed the host as
>> up again. However, there are still errors with vdsmd on both the host
>> and the engine, and the engine cannot start vdsmd. I guess it is able
>> to monitor the hosts in a limited way as it says they are both up.
>> There are communication errors between one of the hosts and the
>> engine: the host is refusing connections, by the look of it.
>>
>> from the engine log:
>>
>> 2017-02-20 18:41:51,226Z ERROR
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
>> (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
>> Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01,
>> VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
>> hostId='e050c27f-8709-404c-b03e-59c0167a824b',
>> vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})'
>> execution failed: java.net.ConnectException: Connection refused
>> 2017-02-20 18:41:51,226Z ERROR
>> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
>> (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
>> Failure to refresh host 'kvm-ldn-01' runtime info:
>> java.net.ConnectException: Connection refused
>> 2017-02-20 18:41:52,772Z ERROR
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
>> (DefaultQuartzScheduler6) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
>> Command 'GetAllVmStatsVDSCommand(HostName = kvm-ldn-01,
>> VdsIdVDSCommandParametersBase:{runAsync='true',
>> hostId='e050c27f-8709-404c-b03e-59c0167a824b'})' execution failed:
>> VDSGenericException: VDSNetworkException: Connection reset by peer
>> 2017-02-20 18:41:54,256Z ERROR
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
>> (DefaultQuartzScheduler7) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
>> Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01,
>> VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
>> hostId='e050c27f-8709-404c-b03e-59c0167a824b',
>> vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})'
>> execution failed: java.net.ConnectException: Connection refused
>>
>
> I checked your engine logs and saw DNS issues much later than the errors above:
>
> 2017-02-20 19:47:56,516Z ERROR
> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
> (DefaultQuartzScheduler6) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
> Failure to refresh host 'kvm-ldn-01' runtime info:
> java.net.UnknownHostException: kvm-ldn-01
>
>> from the vdsm.log on the host:
>>
>>
>> Feb 20 18:44:20 kvm-ldn-01 vdsm[42308]: vdsm vds.dispatcher ERROR SSL
>> error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected
>> ('::ffff:172.16.75.16', 38350, 0, 0) at 0x33b9bd8>: unexpected eof
>> Feb 20 18:44:24 kvm-ldn-01 vdsm[42308]: vdsm jsonrpc.JsonRpcServer
>> ERROR Internal server error
>>                                         Traceback (most recent call last):
>>                                           File
>> "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 547, in
>> _handle_request...
>>
>> Any ideas what might be going on here?
>
> I see that ~13 VMs were moved to the Up state.
>
> Can you please say which host is causing issues and provide the logs?
>
>>
>> Thanks,
>>
>> Cam
>>

