oVirt upgrade 4.1 -> 4.2: host bricks down

Hi all,

I have an oVirt 2-node cluster for testing, with a self-hosted engine on top of Gluster. The cluster was running on 4.1. After the upgrade to 4.2, which generally went smoothly, I am seeing that the bricks of one of the hosts (v1) are detected as down, while Gluster is OK when checked from the command line and all volumes are mounted.

Below is the error that the engine logs:

2018-06-17 00:21:26,309+03 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler2) [98d7e79] Error while refreshing brick statuses for volume 'vms' of cluster 'test': null
2018-06-17 00:21:26,318+03 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler2) [98d7e79] Command 'GetGlusterLocalLogicalVolumeListVDSCommand(HostName = v0.test-group.com, VdsIdVDSCommandParametersBase:{hostId='d5a96118-ca49-411f-86cb-280c7f9c421f'})' execution failed: null
2018-06-17 00:21:26,323+03 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler2) [98d7e79] Command 'GetGlusterLocalLogicalVolumeListVDSCommand(HostName = v1.test-group.com, VdsIdVDSCommandParametersBase:{hostId='12dfea4a-8142-484e-b912-0cbd5f281aba'})' execution failed: null
2018-06-17 00:21:27,015+03 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler9) [426e7c3d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[00000002-0002-0002-0002-00000000017a=GLUSTER]', sharedLocks=''}'
2018-06-17 00:21:27,926+03 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler2) [98d7e79] Error while refreshing brick statuses for volume 'engine' of cluster 'test': null

Apart from this everything else is operating normally and VMs are running on both hosts.

Any idea to isolate this issue?

Thanx, Alex
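For reference, the command-line checks referred to above are along these lines (a sketch using the standard Gluster CLI on one of the hosts; the volume names 'engine' and 'vms' are the ones that appear in the engine log):

# Peer connectivity and brick status as Gluster itself sees them:
gluster peer status
gluster volume status
# Per-volume details and pending heals for the volumes named in the log:
gluster volume info engine
gluster volume heal engine info
gluster volume heal vms info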

On Sat, Jun 16, 2018 at 5:34 PM, Alex K <rightkicktech@gmail.com> wrote:
Hi all,
I have an oVirt 2-node cluster for testing, with a self-hosted engine on top of Gluster.
The cluster was running on 4.1. After the upgrade to 4.2, which generally went smoothly, I am seeing that the bricks of one of the hosts (v1) are detected as down, while Gluster is OK when checked from the command line and all volumes are mounted.
Below is the error that the engine logs:
2018-06-17 00:21:26,309+03 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler2) [98d7e79] Error while refreshing brick statuses for volume 'vms' of cluster 'test': null
2018-06-17 00:21:26,318+03 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler2) [98d7e79] Command 'GetGlusterLocalLogicalVolumeListVDSCommand(HostName = v0.test-group.com, VdsIdVDSCommandParametersBase:{hostId='d5a96118-ca49-411f-86cb-280c7f9c421f'})' execution failed: null
2018-06-17 00:21:26,323+03 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler2) [98d7e79] Command 'GetGlusterLocalLogicalVolumeListVDSCommand(HostName = v1.test-group.com, VdsIdVDSCommandParametersBase:{hostId='12dfea4a-8142-484e-b912-0cbd5f281aba'})' execution failed: null
2018-06-17 00:21:27,015+03 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler9) [426e7c3d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[00000002-0002-0002-0002-00000000017a=GLUSTER]', sharedLocks=''}'
2018-06-17 00:21:27,926+03 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler2) [98d7e79] Error while refreshing brick statuses for volume 'engine' of cluster 'test': null
Apart from this everything else is operating normally and VMs are running on both hosts.
Which version of 4.2? This issue is fixed in 4.2.4.
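(For anyone following along: a quick way to read off the exact running engine build, assuming shell access to the engine VM, is sketched below; the same information is shown in the Admin Portal's About dialog.)

rpm -q ovirt-engine    # prints the full package version, e.g. something like ovirt-engine-4.2.x.y-1.el7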
Any idea to isolate this issue?
Thanx, Alex

When I right-click an empty host, select Upgrade, and confirm that I want to upgrade, it simply comes back with "Install Failed" after a minute or two. I have no idea why it is failing; there also does not appear to be anything in /var/log/yum.log, so I am not sure where else to look to figure out why it cannot upgrade. Also, to be clear, the wording in oVirt uses the term "upgrade"; however, I am under the impression it simply means update, not actually an upgrade to 4.2. Thank you all.

I just did not know where to look for the errors; I now see that it is telling me it is failing on the "collectd" package. When I go to my host and run yum list collectd, I see that collectd is available to install from the EPEL repos. (Note: I did not set up this cluster, so I am not sure whether EPEL is normal here.) So it looks like my problem here is that the EPEL package is available and newer? (A couple of quick checks are sketched below the quoted message.) Thank you all.

On 06/19/2018 09:40 AM, Jacob Green wrote:
When I right-click an empty host, select Upgrade, and confirm that I want to upgrade, it simply comes back with "Install Failed" after a minute or two. I have no idea why it is failing; there also does not appear to be anything in /var/log/yum.log, so I am not sure where else to look to figure out why it cannot upgrade. Also, to be clear, the wording in oVirt uses the term "upgrade"; however, I am under the impression it simply means update, not actually an upgrade to 4.2.
Thank you all.
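The quick checks mentioned above, as a sketch (plain yum, nothing oVirt-specific): confirm which repositories are enabled on the host and which repo the candidate collectd package would come from.

# Repos yum will actually consult:
yum repolist enabled
# Candidate package details; the "Repo" field shows where it would be installed from:
yum info collectd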
--
Jacob Green
Systems Admin
American Alloy Steel
713-300-5690

On 19-6-2018 17:26, Jacob Green wrote:
I just did not know where to look for the errors; I now see that it is telling me it is failing on the "collectd" package.
So when I go to my host and run yum list collectd, I see that collectd is available to install from the EPEL repos. (Note: I did not set up this cluster, so I am not sure whether EPEL is normal here.)
So it looks like my problem here is that the EPEL package is available and newer?
There is a warning on the oVirt site about enabling EPEL :-)
Disable the EPEL repo and just use yum install whatever --enablerepo=epel in case you need it.
Regards,
Joop
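A sketch of that approach, assuming yum-utils is installed so that yum-config-manager is available (setting enabled=0 in /etc/yum.repos.d/epel.repo by hand achieves the same thing; <package> below is a placeholder):

# Turn EPEL off by default so host updates/upgrades do not pull from it:
yum-config-manager --disable epel
# When you do need something from EPEL, enable it for that one transaction only:
yum install <package> --enablerepo=epel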

On 19/06/18 19:25, Joop wrote:
On 19-6-2018 17:26, Jacob Green wrote:
I just did not know where to look for the errors; I now see that it is telling me it is failing on the "collectd" package.
So when I go to my host and run yum list collectd, I see that collectd is available to install from the EPEL repos. (Note: I did not set up this cluster, so I am not sure whether EPEL is normal here.)
So it looks like my problem here is that the EPEL package is available and newer?
There is a warning on the oVirt site about enabling EPEL :-)
Disable the EPEL repo and just use yum install whatever --enablerepo=epel in case you need it.
In my opinion this is bad advice, as keeping the repo disabled (but still obtaining packages from it occasionally) will mean you never automatically receive updates to packages you've installed from it.

Instead, I recommend that you edit /etc/yum.repos.d/epel.repo and add the line "exclude=collectd*" under the "[epel]" heading. I've only ever seen issues with the collectd packages from EPEL and no others.

If you want to be a bit stricter you can instead only "include=<packages>" the packages that you are specifically interested in. In my case that's too many packages to be practical.

Cheers,
Chris

--
Chris Boot
bootc@boo.tc
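A minimal sketch of that edit, assuming the stock /etc/yum.repos.d/epel.repo layout on CentOS 7 (back up the file first; the sed expression simply inserts the exclude line right after the [epel] section header):

cp /etc/yum.repos.d/epel.repo /etc/yum.repos.d/epel.repo.bak
sed -i '/^\[epel\]$/a exclude=collectd*' /etc/yum.repos.d/epel.repo
# The [epel] section now contains the line "exclude=collectd*",
# so yum should no longer offer collectd packages from EPEL:
yum list available collectd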

Disable the EPEL repo and repeat the update.

I am running 4.2.3.8-1.el7. I will upgrade and check.
Thanx, Alex

On Tue, Jun 19, 2018 at 11:59 AM, Sahina Bose <sabose@redhat.com> wrote:
On Sat, Jun 16, 2018 at 5:34 PM, Alex K <rightkicktech@gmail.com> wrote:
Hi all,
I have an oVirt 2-node cluster for testing, with a self-hosted engine on top of Gluster.
The cluster was running on 4.1. After the upgrade to 4.2, which generally went smoothly, I am seeing that the bricks of one of the hosts (v1) are detected as down, while Gluster is OK when checked from the command line and all volumes are mounted.
Below is the error that the engine logs:
2018-06-17 00:21:26,309+03 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler2) [98d7e79] Error while refreshing brick statuses for volume 'vms' of cluster 'test': null
2018-06-17 00:21:26,318+03 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler2) [98d7e79] Command 'GetGlusterLocalLogicalVolumeListVDSCommand(HostName = v0.test-group.com, VdsIdVDSCommandParametersBase:{hostId='d5a96118-ca49-411f-86cb-280c7f9c421f'})' execution failed: null
2018-06-17 00:21:26,323+03 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler2) [98d7e79] Command 'GetGlusterLocalLogicalVolumeListVDSCommand(HostName = v1.test-group.com, VdsIdVDSCommandParametersBase:{hostId='12dfea4a-8142-484e-b912-0cbd5f281aba'})' execution failed: null
2018-06-17 00:21:27,015+03 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler9) [426e7c3d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[00000002-0002-0002-0002-00000000017a=GLUSTER]', sharedLocks=''}'
2018-06-17 00:21:27,926+03 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler2) [98d7e79] Error while refreshing brick statuses for volume 'engine' of cluster 'test': null
Apart from this everything else is operating normally and VMs are running on both hosts.
Which version of 4.2? This issue is fixed in 4.2.4.
Any idea to isolate this issue?
Thanx, Alex

Updated hosts and engine and ran engine-setup, but the version is still showing 4.2.3.8-1.el7. The issue has been resolved though: the bricks are now shown as up. (For reference, the usual minor-update sequence is sketched after the quoted thread below.)
Thanx, Alex

On Wed, Jun 20, 2018 at 10:48 AM, Alex K <rightkicktech@gmail.com> wrote:
I am running 4.2.3.8-1.el7. I will upgrade and check.
Thanx, Alex
On Tue, Jun 19, 2018 at 11:59 AM, Sahina Bose <sabose@redhat.com> wrote:
On Sat, Jun 16, 2018 at 5:34 PM, Alex K <rightkicktech@gmail.com> wrote:
Hi all,
I have an oVirt 2-node cluster for testing, with a self-hosted engine on top of Gluster.
The cluster was running on 4.1. After the upgrade to 4.2, which generally went smoothly, I am seeing that the bricks of one of the hosts (v1) are detected as down, while Gluster is OK when checked from the command line and all volumes are mounted.
Below is the error that the engine logs:
2018-06-17 00:21:26,309+03 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler2) [98d7e79] Error while refreshing brick statuses for volume 'vms' of cluster 'test': null
2018-06-17 00:21:26,318+03 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler2) [98d7e79] Command 'GetGlusterLocalLogicalVolumeListVDSCommand(HostName = v0.test-group.com, VdsIdVDSCommandParametersBase:{hostId='d5a96118-ca49-411f-86cb-280c7f9c421f'})' execution failed: null
2018-06-17 00:21:26,323+03 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler2) [98d7e79] Command 'GetGlusterLocalLogicalVolumeListVDSCommand(HostName = v1.test-group.com, VdsIdVDSCommandParametersBase:{hostId='12dfea4a-8142-484e-b912-0cbd5f281aba'})' execution failed: null
2018-06-17 00:21:27,015+03 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler9) [426e7c3d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[00000002-0002-0002-0002-00000000017a=GLUSTER]', sharedLocks=''}'
2018-06-17 00:21:27,926+03 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler2) [98d7e79] Error while refreshing brick statuses for volume 'engine' of cluster 'test': null
Apart from this everything else is operating normally and VMs are running on both hosts.
Which version of 4.2? This issue is fixed in 4.2.4.
Any idea to isolate this issue?
Thanx, Alex
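For reference, a sketch of the usual minor-update sequence being described above. Exact package globs and whether a newer ovirt-release package is needed first depend on the target release, so treat this as an outline rather than the authoritative procedure:

# On the engine VM:
engine-upgrade-check        # reports whether engine updates are available
yum update ovirt\*setup\*   # refresh the setup packages first
engine-setup                # re-run setup to apply the new version
yum update                  # then update the remaining engine packages
# On each host, one at a time: move it to maintenance, then update
# (either via the UI "Upgrade" action mentioned earlier in the thread, or manually):
yum update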
participants (7)
- Alex K
- Chris Boot
- Chris Boot
- Jacob Green
- Joop
- Sahina Bose
- Spickiy Nikita