Hi Piotr,
Thanks for the reply. It all looks healthy now. Regarding DNS, we did
have some issues with it at the time. However, I think the main problem
was NetworkManager shutting the interface down, seemingly at random. I
had thought NetworkManager was disabled when I set the machine up about
5 months ago (and it had worked fine up until then). That, together with
VDSM being enabled on the engine, I can't explain. The only change I had
made was an attempt to set up a hosted engine, which I did incorrectly:
instead of putting the host into maintenance and doing it there, I tried
to set it up as a VM on the running cluster. I can't see why this would
have caused the changes above, but I may be missing something.
Anyway, I've now read the documentation more closely rather than
hurrying through it.
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.1858] device (enp5s0f0): state change: disconnected ->
prepare (reason 'none') [30 40 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.1860] manager: NetworkManager state is now CONNECTING
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.1867] device (enp5s0f0): state change: prepare -> config
(reason 'none') [40 50 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.2071] device (enp5s0f0): state change: config -> ip-config
(reason 'none') [50 70 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.2120] device (enp5s0f0): state change: ip-config ->
ip-check (reason 'none') [70 80 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.2160] device (enp5s0f0): state change: ip-check ->
secondaries (reason 'none') [80 90 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.2164] device (enp5s0f0): state change: secondaries ->
activated (reason 'none') [90 100 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.2166] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info>
[1487620073.2889] manager: NetworkManager state is now
CONNECTED_GLOBAL
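A minimal sketch of how the device state transitions in log lines like the ones above could be extracted, to spot when an interface leaves the 'activated' state. It assumes each journal entry has been unwrapped onto a single line (the excerpt above is wrapped by the mail client). The names `state_changes` and `went_down` are hypothetical helpers, not part of NetworkManager or oVirt.

```python
import re

# Matches device state-change entries of the form seen in the excerpt, e.g.
# "device (enp5s0f0): state change: prepare -> config (reason 'none')"
STATE_CHANGE = re.compile(
    r"device \((?P<dev>[^)]+)\): state change: "
    r"(?P<old>[\w-]+) -> (?P<new>[\w-]+) \(reason '(?P<reason>[^']*)'\)"
)


def state_changes(lines):
    """Yield (device, old_state, new_state, reason) for each matching line."""
    for line in lines:
        m = STATE_CHANGE.search(line)
        if m:
            yield m.group("dev"), m.group("old"), m.group("new"), m.group("reason")


def went_down(lines, device):
    """Return the transitions in which `device` left the 'activated' state."""
    return [c for c in state_changes(lines) if c[0] == device and c[1] == "activated"]


if __name__ == "__main__":
    # Two entries from the excerpt above, unwrapped to one line each.
    sample = [
        "Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info> "
        "[1487620073.1858] device (enp5s0f0): state change: disconnected -> "
        "prepare (reason 'none') [30 40 0]",
        "Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: <info> "
        "[1487620073.2164] device (enp5s0f0): state change: secondaries -> "
        "activated (reason 'none') [90 100 0]",
    ]
    for change in state_changes(sample):
        print(change)
    print("left 'activated':", went_down(sample, "enp5s0f0"))
```

Run over a longer stretch of the journal, any entries returned by `went_down` would show when the interface was taken down and the reason NetworkManager recorded for it.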
Thanks again and sorry to have wasted your time with this,
Cam
On Tue, Feb 21, 2017 at 8:59 AM, Piotr Kliczewski
<piotr.kliczewski(a)gmail.com> wrote:
On Mon, Feb 20, 2017 at 9:47 PM, cmc <iucounu(a)gmail.com>
wrote:
> Hi,
>
> Due to networking and DNS issues, our engine went offline (it is
> currently a physical machine; we will be converting it to a VM in the
> future when time allows). When service was restored, I noticed that
> all the VMs were listed as being in an unknown state on one host. The
> VMs were fine, but the engine could not ascertain their status as the
> host itself was in an unknown state. vdsm was reporting errors and was
> not running on the engine (or at least was in status 'failed' in
> systemd). I tried starting vdsmd on the engine but it would not start.
> I decided to try to restart vdsmd on the host and that did allow the
> state of the VMs to be discovered, and the engine listed the host as
> up again. However, there are still errors with vdsmd on both the host
> and the engine, and the engine cannot start vdsmd. I guess it is able
> to monitor the hosts in a limited way as it says they are both up.
> There are communication errors between one of the hosts and the
> engine: the host is refusing connections, by the look of it.
>
> from the engine log:
>
> 2017-02-20 18:41:51,226Z ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
> (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
> Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01,
> VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
> hostId='e050c27f-8709-404c-b03e-59c0167a824b',
> vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})'
> execution failed: java.net.ConnectException: Connection refused
> 2017-02-20 18:41:51,226Z ERROR
> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
> (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
> Failure to refresh host 'kvm-ldn-01' runtime info:
> java.net.ConnectException: Connection refused
> 2017-02-20 18:41:52,772Z ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
> (DefaultQuartzScheduler6) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
> Command 'GetAllVmStatsVDSCommand(HostName = kvm-ldn-01,
> VdsIdVDSCommandParametersBase:{runAsync='true',
> hostId='e050c27f-8709-404c-b03e-59c0167a824b'})' execution failed:
> VDSGenericException: VDSNetworkException: Connection reset by peer
> 2017-02-20 18:41:54,256Z ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
> (DefaultQuartzScheduler7) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
> Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01,
> VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
> hostId='e050c27f-8709-404c-b03e-59c0167a824b',
> vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})'
> execution failed: java.net.ConnectException: Connection refused
>
I checked your engine logs and saw DNS issues much later than the errors above:
2017-02-20 19:47:56,516Z ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(DefaultQuartzScheduler6) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
Failure to refresh host 'kvm-ldn-01' runtime info:
java.net.UnknownHostException: kvm-ldn-01
> from the vdsm.log on the host:
>
>
> Feb 20 18:44:20 kvm-ldn-01 vdsm[42308]: vdsm vds.dispatcher ERROR SSL
> error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected
> ('::ffff:172.16.75.16', 38350, 0, 0) at 0x33b9bd8>: unexpected eof
> Feb 20 18:44:24 kvm-ldn-01 vdsm[42308]: vdsm jsonrpc.JsonRpcServer
> ERROR Internal server error
> Traceback (most recent call last):
> File
> "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 547, in
> _handle_request...
>
> Any ideas what might be going on here?
I see that ~13 VMs were moved to the Up state.
Can you please say which host is causing issues and provide its logs?
>
> Thanks,
>
> Cam