Hosts temporarily in "Non Operational" state after upgrade

Hi all,

Had run updates tonight on my three oVirt hosts (3.6 hyperconverged) and on two of them, they went into "Non Operational" state for a few minutes each before springing back to life... The synopsis was this:

- Ran updates thru the web Admin UI ...then I got the following series of messages via the "Events" tab in the UI:
- Updates successfully ran
- VDSM "command failed: Heartbeat exceeded" message
- host is not responding message
- "Failed to connect to hosted_storage" message
- "The error message for connection localhost:/engine returned by VDSM was: Problem while trying to mount target"
- "Host <name> reports about one of the Active Storage Domains as Problematic"
- "Host <name> cannot access the Storage Domain(s) hosted_storage attached to the data center Default. Setting host state to Non-Operational."
- "Detected change in status of brick {...} of volume {...} from DOWN to UP." (once for every brick on the host, for every Gluster volume)
- "Host <name> was autorecovered."
- "Status of host <name> was set to Up."

(BTW, it would be awesome if the UI's Events log could be copied and pasted... Doesn't work for me at least...)

Duration of outage was ~3 mins for each affected host. Didn't happen on the first host I upgraded, but did on the last two.

I know I'm a little over the bleeding edge running hyperconverged on 3.6 :) but, should this behavior be expected?

Also, if I go onto the hosts directly and run a 'yum update' after this upgrade process (not that I went thru with it, just wanted to see what was available to be upgraded), I see a bunch of ovirt-* packages that can be upgraded which didn't get updated thru the web UI's upgrade process:

ovirt-engine-sdk-python    noarch  3.6.5.0-1.el7.centos  ovirt-3.6       480 k
ovirt-hosted-engine-ha     noarch  1.3.5.3-1.1.el7       centos-ovirt36  295 k
ovirt-hosted-engine-setup  noarch  1.3.5.0-1.1.el7       centos-ovirt36  270 k
ovirt-release36            noarch  007-1                 ovirt-3.6       9.5 k

Are these packages not related to the "Upgrade" process available thru the web UI?
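As an aside, a non-destructive way to list just the pending oVirt-side packages on a host, without actually starting an update (a minimal sketch, assuming the same repos the web UI upgrade uses are enabled on the host):

  yum check-update 'ovirt-*' 'vdsm*'   # list available updates without installing anything
  yum --assumeno update                # resolve and print the full transaction, then answer "no" and exit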
FYI, here's what did get updated thru the web UI "Upgrade" process:

Apr 27 21:36:28 Updated: libvirt-client-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-driver-network-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: vdsm-infra-4.17.26-1.el7.noarch
Apr 27 21:36:28 Updated: vdsm-python-4.17.26-1.el7.noarch
Apr 27 21:36:28 Updated: vdsm-xmlrpc-4.17.26-1.el7.noarch
Apr 27 21:36:28 Updated: libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: mom-0.5.3-1.1.el7.noarch
Apr 27 21:36:29 Updated: libvirt-lock-sanlock-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-driver-secret-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-driver-interface-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-driver-storage-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: 1:libguestfs-1.28.1-1.55.el7.centos.2.x86_64
Apr 27 21:36:29 Updated: 1:libguestfs-tools-c-1.28.1-1.55.el7.centos.2.x86_64
Apr 27 21:36:29 Installed: libguestfs-winsupport-7.2-1.el7.x86_64
Apr 27 21:36:29 Updated: vdsm-yajsonrpc-4.17.26-1.el7.noarch
Apr 27 21:36:29 Updated: vdsm-jsonrpc-4.17.26-1.el7.noarch
Apr 27 21:36:29 Installed: unzip-6.0-15.el7.x86_64
Apr 27 21:36:30 Installed: gtk2-2.24.28-8.el7.x86_64
Apr 27 21:36:31 Installed: 1:virt-v2v-1.28.1-1.55.el7.centos.2.x86_64
Apr 27 21:36:31 Updated: safelease-1.0-7.el7.x86_64
Apr 27 21:36:31 Updated: vdsm-hook-vmfex-dev-4.17.26-1.el7.noarch
Apr 27 21:36:32 Updated: vdsm-4.17.26-1.el7.noarch
Apr 27 21:36:32 Updated: vdsm-gluster-4.17.26-1.el7.noarch
Apr 27 21:36:32 Updated: vdsm-cli-4.17.26-1.el7.noarch

Thanks,
Will

Bump - can any RHAT folks comment on this?

-----Original Message-----
From: Will Dennis
Sent: Wednesday, April 27, 2016 11:00 PM
To: users@ovirt.org
Subject: Hosts temporarily in "Non Operational" state after upgrade

On 29 Apr 2016, at 14:46, Will Dennis <wdennis@nec-labs.com> wrote:
Bump - can any RHAT folks comment on this?
note oVirt is a community project;-)
-----Original Message-----
From: Will Dennis
Sent: Wednesday, April 27, 2016 11:00 PM
To: users@ovirt.org
Subject: Hosts temporarily in "Non Operational" state after upgrade
Hi all,
Had run updates tonight on my three oVirt hosts (3.6 hyperconverged) and on two of them, they went into "Non Operational" state for a few minutes each before springing back to life... The synopsis was this:
- Ran updates thru the web Admin UI ...then I got the following series of messages via the "Events" tab in the UI:
what exactly did you do in the UI?
- Updates successfully ran
- VDSM "command failed: Heartbeat exceeded" message
- host is not responding message
- "Failed to connect to hosted_storage" message
- "The error message for connection localhost:/engine returned by VDSM was: Problem while trying to mount target"
- "Host <name> reports about one of the Active Storage Domains as Problematic"
- "Host <name> cannot access the Storage Domain(s) hosted_storage attached to the data center Default. Setting host state to Non-Operational."
- "Detected change in status of brick {...} of volume {...} from DOWN to UP." (once for every brick on the host, for every Gluster volume)
- "Host <name> was autorecovered."
- "Status of host <name> was set to Up."
so... it was not in Maintenance when you ran the update? You should avoid doing that, as an update to any package may interfere with running guests. E.g. a qemu rpm update can (and likely will) simply kill all your VMs. I suppose it's similar for Gluster: before updating anything, the volumes should be in some kind of maintenance mode as well.
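For the hosted-engine side specifically, a host can also be put into local maintenance from the shell around an update; a minimal sketch, assuming the ovirt-hosted-engine-ha tools are installed on the host (this only covers the HA agent, not engine-side host maintenance or the Gluster volumes):

  hosted-engine --set-maintenance --mode=local   # keep the HA agent from acting on this host while it is being updated
  hosted-engine --set-maintenance --mode=none    # return the host to normal HA operation afterwards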
(BTW, it would be awesome if the UI’s Events log could be copied and pasted… Doesn’t work for me at least…)
Duration of outage was ~3 mins for each affected host. Didn't happen on the first host I upgraded, but did on the last two.
I know I’m a little over the bleeding edge running hyperconverged on 3.6 :) but, should this behavior be expected?
Also, if I go onto the hosts directly and run a 'yum update' after this upgrade process (not that I went thru with it, just wanted to see what was available to be upgraded), I see a bunch of ovirt-* packages that can be upgraded which didn't get updated thru the web UI's upgrade process:

ovirt-engine-sdk-python    noarch  3.6.5.0-1.el7.centos  ovirt-3.6       480 k
ovirt-hosted-engine-ha     noarch  1.3.5.3-1.1.el7       centos-ovirt36  295 k
ovirt-hosted-engine-setup  noarch  1.3.5.0-1.1.el7       centos-ovirt36  270 k
ovirt-release36            noarch  007-1                 ovirt-3.6       9.5 k
Are these packages not related to the “Upgrade” process available thru the web UI?
FYI, here's what did get updated thru the web UI "Upgrade" process:

Apr 27 21:36:28 Updated: libvirt-client-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-driver-network-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: vdsm-infra-4.17.26-1.el7.noarch
Apr 27 21:36:28 Updated: vdsm-python-4.17.26-1.el7.noarch
Apr 27 21:36:28 Updated: vdsm-xmlrpc-4.17.26-1.el7.noarch
Apr 27 21:36:28 Updated: libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: mom-0.5.3-1.1.el7.noarch
Apr 27 21:36:29 Updated: libvirt-lock-sanlock-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-driver-secret-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-driver-interface-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-driver-storage-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: 1:libguestfs-1.28.1-1.55.el7.centos.2.x86_64
Apr 27 21:36:29 Updated: 1:libguestfs-tools-c-1.28.1-1.55.el7.centos.2.x86_64
Apr 27 21:36:29 Installed: libguestfs-winsupport-7.2-1.el7.x86_64
Apr 27 21:36:29 Updated: vdsm-yajsonrpc-4.17.26-1.el7.noarch
Apr 27 21:36:29 Updated: vdsm-jsonrpc-4.17.26-1.el7.noarch
Apr 27 21:36:29 Installed: unzip-6.0-15.el7.x86_64
Apr 27 21:36:30 Installed: gtk2-2.24.28-8.el7.x86_64
Apr 27 21:36:31 Installed: 1:virt-v2v-1.28.1-1.55.el7.centos.2.x86_64
Apr 27 21:36:31 Updated: safelease-1.0-7.el7.x86_64
Apr 27 21:36:31 Updated: vdsm-hook-vmfex-dev-4.17.26-1.el7.noarch
Apr 27 21:36:32 Updated: vdsm-4.17.26-1.el7.noarch
Apr 27 21:36:32 Updated: vdsm-gluster-4.17.26-1.el7.noarch
Apr 27 21:36:32 Updated: vdsm-cli-4.17.26-1.el7.noarch
Perhaps libvirtd restarted because of those updates, which would cause a vdsm restart as well, dropping the host connection temporarily.

Thanks,
michal
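One way to check whether that is what happened is to look at the service restart times on an affected host around the upgrade window; a minimal sketch, assuming systemd-based EL7 hosts with the libvirtd and vdsmd units (the ~21:36 window is taken from the log above):

  journalctl -u libvirtd -u vdsmd --since "2016-04-27 21:30" --until "2016-04-27 21:45"
  systemctl show libvirtd vdsmd -p ActiveEnterTimestamp   # when each service last became active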
Thanks,
Will

(so noted) ...or anyone else who knows the answer ;)

-----Original Message-----
From: Michal Skrivanek [mailto:michal.skrivanek@redhat.com]
Sent: Friday, April 29, 2016 9:02 AM
To: Will Dennis
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Hosts temporarily in "Non Operational" state after upgrade

Answers inline below...
From: Michal Skrivanek [mailto:michal.skrivanek@redhat.com]
what exactly did you do in the UI?

Clicked on the node, and in the bottom pane, clicked on the "Upgrade" link showing there (the nodes also had an icon indicating that updates were available).
so... it was not in Maintenance when you ran the update? You should avoid doing that, as an update to any package may interfere with running guests. E.g. a qemu rpm update can (and likely will) simply kill all your VMs. I suppose it's similar for Gluster: before updating anything, the volumes should be in some kind of maintenance mode as well.
No, the "Upgrade" link once clicked migrates any running VM off the target node onto another node, then sets the target node into Maintenance mode, and then performs the updates. Once the updates are completed successfully, it re-activates the node and makes it available again. On the second and third nodes this coming out of Maintenance process experienced a problem with mounting the Gluster storage so it seems, and had the problems I'd indicated.

On 29 Apr 2016, at 18:49, Will Dennis <wdennis@nec-labs.com> wrote:
Answers inline below...
From: Michal Skrivanek [mailto:michal.skrivanek@redhat.com]
what exactly did you do in the UI?

Clicked on the node, and in the bottom pane, clicked on the "Upgrade" link showing there (the nodes also had an icon indicating that updates were available).
so... it was not in Maintenance when you ran the update? You should avoid doing that, as an update to any package may interfere with running guests. E.g. a qemu rpm update can (and likely will) simply kill all your VMs. I suppose it's similar for Gluster: before updating anything, the volumes should be in some kind of maintenance mode as well.
No, the "Upgrade" link once clicked migrates any running VM off the target node onto another node, then sets the target node into Maintenance mode, and then performs the updates.
ok, thanks for the clarification, you got me scared :)
Once the updates are completed successfully, it re-activates the node and makes it available again. On the second and third nodes, this coming-out-of-Maintenance process seems to have hit a problem mounting the Gluster storage, and showed the problems I'd indicated.
It might be a question for the Gluster guys; it might be that the maintenance process is a bit different there. Sahina, can you check/comment on that?

Thanks,
michal

On 05/04/2016 01:21 PM, Michal Skrivanek wrote:
On 29 Apr 2016, at 18:49, Will Dennis <wdennis@nec-labs.com> wrote:
Answers inline below...
From: Michal Skrivanek [mailto:michal.skrivanek@redhat.com]

what exactly did you do in the UI?

Clicked on the node, and in the bottom pane, clicked on the "Upgrade" link showing there (the nodes also had an icon indicating that updates were available).
so... it was not in Maintenance when you ran the update? You should avoid doing that, as an update to any package may interfere with running guests. E.g. a qemu rpm update can (and likely will) simply kill all your VMs. I suppose it's similar for Gluster: before updating anything, the volumes should be in some kind of maintenance mode as well.

No, the "Upgrade" link once clicked migrates any running VM off the target node onto another node, then sets the target node into Maintenance mode, and then performs the updates.

ok, thanks for the clarification, you got me scared :)
Once the updates are completed successfully, it re-activates the node and makes it available again. On the second and third nodes, this coming-out-of-Maintenance process seems to have hit a problem mounting the Gluster storage, and showed the problems I'd indicated.

It might be a question for the Gluster guys; it might be that the maintenance process is a bit different there. Sahina, can you check/comment on that?
Activation of hosts after maintenance, w.r.t. Gluster, checks that glusterd is connected and that peer status is returned as connected. The error seems to be in activating the Gluster storage domain, hosted_storage. Did you see anything suspicious w.r.t. this in the vdsm logs?

Regarding the host being unresponsive with "Heartbeat exceeded": we have a bug logged on this which is being investigated - https://bugzilla.redhat.com/show_bug.cgi?id=1331006. In that bug too, the host regains connectivity after ~2 mins.
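A quick way to look at both of those on an affected host; a minimal sketch, assuming the default oVirt log locations on the host:

  gluster peer status                                            # every peer should show "Peer in Cluster (Connected)"
  grep -i "hosted_storage" /var/log/vdsm/vdsm.log                # storage domain activation errors around the upgrade time
  grep -i "Problem while trying to mount" /var/log/vdsm/vdsm.log # the mount error reported in the Events tab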
participants (3)
- Michal Skrivanek
- Sahina Bose
- Will Dennis