Hi list,
I'm troubleshooting an issue with live migration in an upgrade context.
I have a 4.3 cluster that I am trying to upgrade to 4.4. The engine is OK on
4.4 and I have already upgraded a single host, which has kindly
rejoined the cluster.
The host is OK. I can start vms on it and migrate vms from it to other hosts.
But if I try to migrate a VM from a 4.3 host to the 4.4 host, it fails with:
* on the host : qemu-kvm: terminating on signal 15 from pid XXXX
(<unknown process>)
* on the engine : VM 'VM_UUID' was unexpectedly detected as 'Down' on
VDS 'HOST_UUID'
Can anyone here help me find out what's going on? I would be grateful.
Regards,
Joris
Live migration across major releases sounds like the sort of feature everybody would just
love to have, but oVirt supports it about as little as it supports operating clusters with
mixed-release nodes.
AFAIK HCI upgrades from 4.3 to 4.4 were never even described and definitely didn't
involve live VMs.
I exported all my VMs to an NFS based export domain, redid the HCI from scratch and then
imported the VMs from the export domain.
And I kept the 4.3 disks around so I could go back if things failed.
The described (non-HCI) upgrade procedures had you up the creek without a paddle if things
failed half-way...
oVirt was never really enterprise grade.
oVirt definitely involves a lot of paddling. I see now that you are right about HCI
(Hyperconverged Infrastructure): it was never specifically documented how to upgrade it. I have
stepped away from HCI a long time ago, after testing and working with it in oVirt 3.x.
There were just too many dependencies and things that could go wrong. Too much
functionality depending on the same hardware and software. I have also stepped away from
Gluster since and in production we are not using self-hosted engines either. A standalone
engine on a different hypervisor cluster gives me much more peace of mind. However, I
cannot see from the original post if this is Hyperconverged or even if it is self-hosted.
Maybe I'm just missing it :-)
The procedures for self-hosted and standalone engine are described and I have been able to
upgrade clusters in the past using them. Make sure you follow all the steps one by one in
the correct order. But I agree, if something goes wrong on the way, you are a bit on your
own in dangerous territory paddling up the creek :-P It sounds like the upgrade went fine
though... and migrating VMs from a 4.3 host to a 4.4 host should be a standard procedure
when upgrading the nodes one by one in a live environment. I cannot remember if we did
the upgrade from 4.3 to 4.4 live, but 4.4 to 4.5 worked for us without taking the VMs
down. This kind of upgrade is best done in a service window though, and you should have
backups of everything before you start. I'm 100% with Thomas on this.
Looking at "12.6. Migrating hosts and virtual machines from oVirt 4.3 to 4.4"
from the upgrade guide, I see lots of caveats. It really depends on whether the oVirt
Node appliance is used or an Enterprise Linux server (which needs to be upgraded first,
by the looks of it), and on whether or not there are VMs with CPU passthrough.
I have also looked around for bugs in 4.4, as that is the version you are upgrading to...
I find this, for example: Bug 1774064. It seems this error occurs in various versions of
4.4, even in cases not related to upgrades.
I'm wondering if you should not just push through with updating all the oVirt nodes,
which then allows you to change the cluster compatibility level to 4.4. This would mean
not migrating VMs live, but shutting them down and possibly starting them on an upgraded
node, until all nodes are upgraded. If you upgrade one more node, you would be able to
check if migration works between upgraded 4.4 nodes, which would then confirm that pushing
through and upgrading all nodes to 4.4 would be the way forward, even if it means that you
have to shut down all the VMs at some point during the migration.
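To make the order of those checks concrete, here is a rough sketch in Python. It only
models the decision logic described above. The Host class, the host names and the helper
functions are all made up for illustration, and nothing here talks to the engine or to
any oVirt API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str      # hypothetical host name, for illustration only
    version: str   # oVirt release running on the host, e.g. "4.3" or "4.4"


def version_key(version):
    """Turn "4.4" into (4, 4) so releases compare numerically."""
    return tuple(int(part) for part in version.split("."))


def hosts_pending_upgrade(hosts, target):
    """Hosts that are still below the target release."""
    return [h for h in hosts if version_key(h.version) < version_key(target)]


def can_test_same_version_migration(hosts, target):
    """True once at least two hosts run the target release, so a VM can be
    migrated between two upgraded hosts as a sanity check."""
    upgraded = [h for h in hosts if version_key(h.version) >= version_key(target)]
    return len(upgraded) >= 2


def can_raise_compatibility(hosts, target):
    """True only when every host in the cluster runs the target release."""
    return not hosts_pending_upgrade(hosts, target)


# Situation from the thread: one host upgraded, the others still on 4.3.
cluster = [Host("node1", "4.4"), Host("node2", "4.3"), Host("node3", "4.3")]
print(can_test_same_version_migration(cluster, "4.4"))  # False
print(can_raise_compatibility(cluster, "4.4"))          # False

# After upgrading one more host, the 4.4-to-4.4 migration test becomes possible.
cluster = [Host("node1", "4.4"), Host("node2", "4.4"), Host("node3", "4.3")]
print(can_test_same_version_migration(cluster, "4.4"))  # True
print([h.name for h in hosts_pending_upgrade(cluster, "4.4")])  # ['node3']
```

If the 4.4-to-4.4 test migration works, the remaining hosts are the ones still listed as
pending, and the compatibility level only comes into play once that list is empty.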
I hope this helps. Good luck with it.
PS: In a few cases I have resorted to simply reinstalling a node from scratch on the new
version and then adding it to the (upgraded) engine to make it work again. oVirt is
a lot of paddling, but in my experience it works fine when you just let it run and
don't do anything too fancy to it (like upgrading, which has to be done from time to
time).