I performed an oVirt 4.2 upgrade on a three-host cluster with NFS shared storage. The NFS share is served from one of the hosts.
I upgraded the hosted engine first: I downloaded the 4.2 rpm, ran yum update, and then ran engine-setup, which appeared to complete successfully. At the end it powered down the hosted engine VM, but the VM never came back up and I have been unable to start it.
I then upgraded the three hosts (oVirt 4.2 rpm plus a full yum update) and rebooted each of them.
After some time the hosts came back, and almost all of the VMs are running again and appear to be working, with two exceptions:
1. The hosted engine VM still will not start. I've tried everything I can think of.
2. A VM that I know existed is not running and no longer appears to exist. I have no idea where it is or how to start it.
1. Hosted engine
On one of the hosts, trying to start it gives this odd error:
# hosted-engine --vm-start
Command VM.getStats with args {'vmID': '4013c829-c9d7-4b72-90d5-6fe58137504c'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': u'4013c829-c9d7-4b72-90d5-6fe58137504c'})
On the other two hosts I don't get that error. Sometimes the VM appears to start, but hosted-engine --vm-status shows errors such as:
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
I'm also seeing these errors in syslog:
Jan 11 01:06:30 host0 libvirtd: 2018-01-11 05:06:30.473+0000: 1910: error : qemuOpenFileAs:3183 : Failed to open file '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/c2dde892-f978-4dfc-a421-c8e04cf387f9/23aa0a66-fa6c-4967-a1e5-fbe47c0cd705': No such file or directory
Jan 11 01:06:30 host0 libvirtd: 2018-01-11 05:06:30.473+0000: 1910: error : qemuDomainStorageOpenStat:11492 : cannot stat file '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/c2dde892-f978-4dfc-a421-c8e04cf387f9/23aa0a66-fa6c-4967-a1e5-fbe47c0cd705': Bad file descriptor
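To narrow down the libvirtd errors above, a minimal Python sketch like the following could be run on the affected host. It walks the reported image path one component at a time and reports the first component that does not resolve, distinguishing an entry that is simply missing from a dangling symlink. It assumes (this is not confirmed by my logs) that the entries under /var/run/vdsm/storage are symlinks into the NFS mount, so a dangling link would suggest the storage is not mounted or not reachable on that host. The function name first_broken_component is purely illustrative, not part of vdsm or oVirt.

```python
import os
import sys


def first_broken_component(path):
    """Return (prefix, reason) for the first path component that does not
    resolve, or None if the whole path resolves. POSIX paths assumed."""
    parts = os.path.abspath(path).split(os.sep)
    prefix = os.sep
    for part in parts[1:]:
        prefix = os.path.join(prefix, part)
        if os.path.exists(prefix):  # follows symlinks: True only if the target is reachable
            continue
        if os.path.islink(prefix):  # the link entry exists but its target does not
            return prefix, "dangling symlink -> " + os.readlink(prefix)
        return prefix, "missing"    # nothing at all at this component
    return None


if __name__ == "__main__":
    # Default: the path from the libvirtd error above; pass another path as argv[1].
    default = ("/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/"
               "c2dde892-f978-4dfc-a421-c8e04cf387f9/"
               "23aa0a66-fa6c-4967-a1e5-fbe47c0cd705")
    target = sys.argv[1] if len(sys.argv) > 1 else default
    result = first_broken_component(target)
    print("path resolves" if result is None else "%s: %s" % result)
```

Run it on each host: if one host reports a dangling symlink while another reports nothing at all, that difference may be worth comparing against what is mounted on each host.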
2. Missing VM
Running virsh -r list on each host does not show the VM at all. I know it existed, and it is important. The log on one of the hosts even shows that it was started recently and then stopped about seven and a half minutes later:
Jan 10 18:47:17 host3 systemd-machined: New machine qemu-9-Berna.
Jan 10 18:47:17 host3 systemd: Started Virtual Machine qemu-9-Berna.
Jan 10 18:47:17 host3 systemd: Starting Virtual Machine qemu-9-Berna.
Jan 10 18:54:45 host3 systemd-machined: Machine qemu-9-Berna terminated.
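To put an exact number on how long Berna ran, here is a minimal Python sketch that pairs systemd-machined's "New machine" and "Machine ... terminated" lines. It assumes the classic syslog line format shown above and supplies the year 2018 (taken from the libvirtd timestamps), since these lines carry none. The name machine_runs is illustrative, not an oVirt or systemd tool.

```python
import re
import sys
from datetime import datetime, timedelta

# Excerpt copied from host3's syslog above.
LOG = """\
Jan 10 18:47:17 host3 systemd-machined: New machine qemu-9-Berna.
Jan 10 18:47:17 host3 systemd: Started Virtual Machine qemu-9-Berna.
Jan 10 18:47:17 host3 systemd: Starting Virtual Machine qemu-9-Berna.
Jan 10 18:54:45 host3 systemd-machined: Machine qemu-9-Berna terminated.
"""

START = re.compile(r"New machine (qemu-\d+-[^\s.]+)")
STOP = re.compile(r"Machine (qemu-\d+-[^\s.]+) terminated")


def machine_runs(lines, year=2018):
    """Return [(machine name, timedelta it ran)] for each start/stop pair."""
    started = {}  # machine name -> start time
    runs = []
    for line in lines:
        start, stop = START.search(line), STOP.search(line)
        if not (start or stop):
            continue
        # Syslog timestamps ("Jan 10 18:47:17") have no year, so one is supplied.
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if start:
            started[start.group(1)] = stamp
        elif stop.group(1) in started:
            runs.append((stop.group(1), stamp - started.pop(stop.group(1))))
    return runs


if __name__ == "__main__":
    # With a path argument (e.g. a copy of the host's syslog) read that file;
    # otherwise use the excerpt above.
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            lines = fh.read().splitlines()
    else:
        lines = LOG.splitlines()
    for name, ran_for in machine_runs(lines):
        print(f"{name} ran for {ran_for}")
```

For the excerpt above it prints "qemu-9-Berna ran for 0:07:28".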
How can I find out the status of the "Berna" VM and get it running again?
Thanks so much!