[ovirt-users] Hosts temporarily in "Non Operational" state after upgrade
Will Dennis
wdennis at nec-labs.com
Fri Apr 29 13:04:11 UTC 2016
(so noted) ...or anyone else who knows the answer ;)
-----Original Message-----
From: Michal Skrivanek [mailto:michal.skrivanek at redhat.com]
Sent: Friday, April 29, 2016 9:02 AM
To: Will Dennis
Cc: users at ovirt.org
Subject: Re: [ovirt-users] Hosts temporarily in "Non Operational" state after upgrade
> On 29 Apr 2016, at 14:46, Will Dennis <wdennis at nec-labs.com> wrote:
>
> Bump - can any RHAT folks comment on this?
note oVirt is a community project;-)
>
> -----Original Message-----
> From: Will Dennis
> Sent: Wednesday, April 27, 2016 11:00 PM
> To: users at ovirt.org
> Subject: Hosts temporarily in "Non Operational" state after upgrade
>
> Hi all,
>
> Had run updates tonight on my three oVirt hosts (3.6 hyperconverged), and on two of them, they went into “Non Operational” state for a few minutes each before springing back to life… The synopsis was this:
>
> - Ran updates through the web Admin UI... then I got the following series of messages via the “Events” tab in the UI:
what exactly did you do in the UI?
> - Updates successfully ran
> - VDSM “command failed: Heartbeat exceeded” message
> - host is not responding message
> - "Failed to connect to hosted_storage" message
> - “The error message for connection localhost:/engine returned by VDSM was: Problem while trying to mount target”
> - "Host <name> reports about one of the Active Storage Domains as Problematic”
> - “Host <name> cannot access the Storage Domain(s) hosted_storage attached to the data center Default. Setting host state to Non-Operational.”
> - "Detected change in status of brick {…} of volume {…} from DOWN to UP.” (once for every brick on the host for every Gluster volume.)
> - "Host <name> was autorecovered.”
> - "Status of host <name> was set to Up.”
so.. it was not in Maintenance when you ran the update?
You should avoid doing that, as an update to any package may interfere with running guests. E.g. a qemu rpm update can (and likely will) simply kill all your VMs. I suppose it's similar for Gluster: before updating anything, the volumes should be in some kind of maintenance mode as well.
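If it helps, a rough sketch of the usual sequence on a hosted-engine node (untested here; the engine URL, credentials and host id are placeholders you'd need to adapt):

  # move the host to Maintenance via the REST API (same effect as the UI button)
  curl -k -u 'admin@internal:PASSWORD' -X POST -H 'Content-Type: application/xml' \
       -d '<action/>' 'https://ENGINE_FQDN/ovirt-engine/api/hosts/HOST_ID/deactivate'

  # on a hosted-engine node, also put the HA agent into local maintenance
  hosted-engine --set-maintenance --mode=local

  # update the host packages, then bring everything back
  yum update
  hosted-engine --set-maintenance --mode=none
  curl -k -u 'admin@internal:PASSWORD' -X POST -H 'Content-Type: application/xml' \
       -d '<action/>' 'https://ENGINE_FQDN/ovirt-engine/api/hosts/HOST_ID/activate'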
>
> (BTW, it would be awesome if the UI’s Events log could be copied and pasted… Doesn’t work for me at least…)
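As a workaround, the same events should be retrievable outside the UI, either from the REST API or from engine.log on the engine machine; a quick sketch (default log path, placeholders for the engine FQDN and password):

  # pull the event list from the REST API
  curl -k -u 'admin@internal:PASSWORD' 'https://ENGINE_FQDN/ovirt-engine/api/events'

  # or grep the engine log directly on the engine machine
  grep -i 'non-operational\|hosted_storage' /var/log/ovirt-engine/engine.log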
>
> Duration of outage was ~3 mins per affected host. Didn’t happen on the first host I upgraded, but did on the last two.
>
> I know I’m a little over the bleeding edge running hyperconverged on 3.6 :) but should this behavior be expected?
>
> Also, if I go onto the hosts directly and run a ‘yum update’ after this upgrade process (not that I went thru with it, just wanted to see what was available to be upgraded) I see a bunch of ovirt-* packages that can be upgraded, which didn’t get updated thru the web UI’s upgrade process —
> ovirt-engine-sdk-python noarch 3.6.5.0-1.el7.centos ovirt-3.6 480 k
> ovirt-hosted-engine-ha noarch 1.3.5.3-1.1.el7 centos-ovirt36 295 k
> ovirt-hosted-engine-setup noarch 1.3.5.0-1.1.el7 centos-ovirt36 270 k
> ovirt-release36 noarch 007-1 ovirt-3.6 9.5 k
>
> Are these packages not related to the “Upgrade” process available thru the web UI?
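I can't say offhand exactly which packages the UI check covers, but you can list what is still pending on the host itself without applying anything; a rough sketch (run the actual update only with the host in Maintenance):

  # list pending oVirt/vdsm-related updates without installing them
  yum check-update 'ovirt-*' 'vdsm*'

  # apply them later, once the host is in Maintenance
  yum update 'ovirt-*' 'vdsm*'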
>
> FYI, here’s what did get updated thru the web UI “Upgrade” process —
> Apr 27 21:36:28 Updated: libvirt-client-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-driver-network-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: vdsm-infra-4.17.26-1.el7.noarch
> Apr 27 21:36:28 Updated: vdsm-python-4.17.26-1.el7.noarch
> Apr 27 21:36:28 Updated: vdsm-xmlrpc-4.17.26-1.el7.noarch
> Apr 27 21:36:28 Updated: libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: mom-0.5.3-1.1.el7.noarch
> Apr 27 21:36:29 Updated: libvirt-lock-sanlock-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-driver-secret-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-driver-interface-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-driver-storage-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: 1:libguestfs-1.28.1-1.55.el7.centos.2.x86_64
> Apr 27 21:36:29 Updated: 1:libguestfs-tools-c-1.28.1-1.55.el7.centos.2.x86_64
> Apr 27 21:36:29 Installed: libguestfs-winsupport-7.2-1.el7.x86_64
> Apr 27 21:36:29 Updated: vdsm-yajsonrpc-4.17.26-1.el7.noarch
> Apr 27 21:36:29 Updated: vdsm-jsonrpc-4.17.26-1.el7.noarch
> Apr 27 21:36:29 Installed: unzip-6.0-15.el7.x86_64
> Apr 27 21:36:30 Installed: gtk2-2.24.28-8.el7.x86_64
> Apr 27 21:36:31 Installed: 1:virt-v2v-1.28.1-1.55.el7.centos.2.x86_64
> Apr 27 21:36:31 Updated: safelease-1.0-7.el7.x86_64
> Apr 27 21:36:31 Updated: vdsm-hook-vmfex-dev-4.17.26-1.el7.noarch
> Apr 27 21:36:32 Updated: vdsm-4.17.26-1.el7.noarch
> Apr 27 21:36:32 Updated: vdsm-gluster-4.17.26-1.el7.noarch
> Apr 27 21:36:32 Updated: vdsm-cli-4.17.26-1.el7.noarch
Perhaps libvirtd restarted because of those updates, which caused a vdsm restart as well, dropping the host connection temporarily.
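That should be easy to confirm from the host logs around the update window; a quick sketch (default EL7 unit names, timestamps taken from your yum log):

  # look for libvirtd/vdsmd restarts during the upgrade
  journalctl -u libvirtd -u vdsmd --since '2016-04-27 21:30' --until '2016-04-27 21:45'

  # or just check when the services last (re)started
  systemctl status libvirtd vdsmd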
Thanks,
michal
>
> Thanks,
> Will
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users