All hosts non-operational after upgrading from 4.2 to 4.3

I am in a severe pinch here. A while back I upgraded from 4.2.8 to 4.3.3 and had only one step remaining: setting the cluster compatibility level to 4.3 (from 4.2). When I tried this, it gave the usual warning that each VM would have to be rebooted to complete the change, but then came the first oddity: it told me this could not be completed until each host was in maintenance mode. Quirky, I thought, but I stopped all VMs and put both hosts into maintenance mode, then set the cluster to 4.3. Things didn't want to become active again, and I eventually noticed I was being told the DC needed to be at 4.3 as well. I don't remember that from before, but that was easy enough.

However, the DC and SD remain down and the hosts are non-operational. I've powered everything off and started fresh but still wind up in the same state. Hosts will look like they're active for a bit (green triangle) but then go non-operational after about a minute. My iSCSI sessions appear to be active/logged in. The one glaring thing I see in vdsm.log is this:

2019-04-05 12:03:30,225-0400 ERROR (monitor/07bb1bf) [storage.Monitor] Setting up monitor for 07bb1bf8-3b3e-4dc0-bc43-375b09e06683 failed (monitor:329)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 326, in _setupLoop
    self._setupMonitor()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 348, in _setupMonitor
    self._produceDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 158, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 366, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
    domain.getRealDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 176, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'07bb1bf8-3b3e-4dc0-bc43-375b09e06683',)

How do I proceed to get back operational?
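[Editorial note, not from the original thread: for anyone chasing the same traceback, on block (iSCSI/FC) storage domains the LVM volume group on the host is named after the storage domain UUID, so a quick host-side sanity check is to confirm the sessions, the multipath LUNs, and the VG. A rough sketch, assuming an iSCSI domain; the UUID comes from the log above:]

    # Sketch: verify from a host that the block storage domain backing
    # 07bb1bf8-3b3e-4dc0-bc43-375b09e06683 is actually reachable.

    # 1. Are the iSCSI sessions really logged in?
    iscsiadm -m session

    # 2. Are the LUNs present through multipath?
    multipath -ll

    # 3. Can LVM see a VG named after the domain UUID?
    vgs | grep 07bb1bf8-3b3e-4dc0-bc43-375b09e06683

    # If the VG is missing, rescan the sessions and LVM metadata and retry.
    # (Note: VDSM manages block storage with its own LVM filter, so an empty
    # result here can also mean a local filter issue rather than missing storage.)
    iscsiadm -m session --rescan
    vgscan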

Also, I see in the notification drawer a message that says:

Storage domains with IDs [ed4d83f8-41a2-41bd-a0cd-6525d9649edb] could not be synchronized. To synchronize them, please move them to maintenance and then activate.

However, when I navigate to Compute > Data Centers > Default, the Maintenance option is greyed out. Activate in the button bar is also greyed out, though it appears enabled in the right-click context menu; selecting it shows "Error while executing action: Cannot activate Storage. There is no active Host in the Data Center." I'm just stuck in an endless circle here.
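[Editorial note: the "endless circle" is a dependency chain — a storage domain can only be activated by an active host, so the place to start is whatever keeps knocking the hosts non-operational. A sketch of where to look, using the standard log locations; the grep patterns are approximations, not exact engine strings:]

    # On the engine: why did it mark the hosts non-operational?
    grep -iE 'non.?operational' /var/log/ovirt-engine/engine.log | tail -20

    # On each host: storage monitor and connectivity errors around the same time.
    grep ERROR /var/log/vdsm/vdsm.log | tail -40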
What kind of storage are you using? Local?
Also, I see in the notification drawer a message that says:
Storage domains with IDs [ed4d83f8-41a2-41bd-a0cd-6525d9649edb] could not be synchronized. To synchronize them, please move them to maintenance and then activate.
However, when I navigate to Compute > Data Centers > Default, the Maintenance option is greyed out. Activate in the button bar is also greyed out, but it looks like an option in the r-click context menu although selecting that shows "Error while executing action: Cannot activate Storage. There is no active Host in the Data Center.".
I'm just stuck in an endless circle here.
On Fri, Apr 5, 2019 at 12:04 PM John Florian <jflorian@doubledog.org> wrote:
I am in a severe pinch here. A while back I upgraded from 4.2.8 to 4.3.3 and only had one step remaining and that was to set the cluster compat level to 4.3 (from 4.2). When I tried this it gave the usual warning that each VM would have to be rebooted to complete, but then I got my first unusual piece when it then told me next that this could not be completed until each host was in maintenance mode. Quirky I thought, but I stopped all VMs and put both hosts into maintenance mode. I then set the cluster to 4.3. Things didn't want to become active again and I eventually noticed that I was being told the DC needed to be 4.3 as well. Don't remember that from before, but oh well that was easy.
However, the DC and SD remains down. The hosts are non-op. I've powered everything off and started fresh but still wind up in the same state. Hosts will look like their active for a bit (green triangle) but then go non-op after about a minute. It appears that my iSCSI sessions are active/logged in. The one glaring thing I see in the logs is this in vdsm.log:
2019-04-05 12:03:30,225-0400 ERROR (monitor/07bb1bf) [storage.Monitor] Setting up monitor for 07bb1bf8-3b3e-4dc0-bc43-375b09e06683 failed (monitor:329) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 326, in _setupLoop self._setupMonitor() File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 348, in _setupMonitor self._produceDomain() File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 158, in wrapper value = meth(self, *a, **kw) File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 366, in _produceDomain self.domain = sdCache.produce(self.sdUUID) File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce domain.getRealDomain() File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain return self._cache._realProduce(self._sdUUID) File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce domain = self._findDomain(sdUUID) File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain return findMethod(sdUUID) File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 176, in _findUnfetchedDomain raise se.StorageDomainDoesNotExist(sdUUID) StorageDomainDoesNotExist: Storage domain does not exist: (u'07bb1bf8-3b3e-4dc0-bc43-375b09e06683',)
How do I proceed to get back operational?
_______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/A7XPDM3EFUJPXO...

Doh! I am such an idiot!!! First of all, I meant to say I upgraded to 4.3.2, not 4.3.3. I only installed ovirt-release43.rpm on the engine. I've gotten so used to the upgrade-host feature in the GUI that I completely failed to think of doing this on each of the hosts. Worse, I've got a vague sense of déjà vu, like I've been in this same spot before, maybe with 4.1 -> 4.2. These bigger upgrades are just infrequent enough that I forget important steps.

It seems like this could be handled more gracefully, though. Shouldn't this be caught and reported as a user-friendly alert in the GUI? Also, I think it would be better if the release notes changed "If you're upgrading from oVirt Engine 4.2.8 you just need to execute:" to "If you're upgrading from oVirt Engine 4.2.8 you just need to execute on your Engine and each Host:".
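[Editorial note: for the record, the missed host-side half of the upgrade is roughly the following, per the release notes referenced above. This is a sketch for EL-based hosts only; oVirt Node hosts are instead upgraded as a whole image, e.g. via the GUI's host-upgrade flow:]

    # On EACH host, not just the engine:
    yum install https://resources.ovirt.org/pub/yum-repo/ovirt-release43.rpm
    yum update
    # ...then reboot (or at least restart vdsmd) so the new VDSM is picked up.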
participants (3)
- Alex McWhirter
- John Florian
- John Florian