On Thu, 2018-04-12 at 20:20 +0200, Michal Skrivanek wrote:


On 12 Apr 2018, at 18:26, Stefano Stagnaro <stefanos@prismatelecomtesting.com> wrote:
Hi,

I recently upgraded an oVirt deployment from 3.6 to 4.0 and then to 4.1.9 (my current release). Since then, when migrating many hosts simultaneously, I always see a few migrations fail, roughly 1 in 10 VMs. The failure can occur on any host; moreover, after a couple of failures the destination host goes into Error status and I have to re-activate it manually or wait 30 min.

The typical error found in the vdsm log (from the source host) is:
2018-04-12 17:01:32,097+0200 ERROR (migsrc/3192dfe7) [virt.vm] (vmId='3192dfe7-eeac-4626-8c86-e49facc9006f') migration destination error: Fatal error during migration (migration:287)
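
To see which VMs are hit and how often, lines like the one above can be pulled out of the vdsm log with a small script. This is only a minimal sketch: it assumes every failure line has exactly the layout of the line quoted above, and the names find_migration_failures and count_by_vm are made up for illustration (they are not part of vdsm).

```python
import re
from collections import Counter

# Pattern modelled on the single ERROR line quoted above; the exact layout of
# other vdsm log lines is an assumption and may differ between versions.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4}) "
    r"ERROR \([^)]*\) \[virt\.vm\] "
    r"\(vmId='(?P<vm_id>[0-9a-f-]+)'\) "
    r"(?P<msg>migration destination error: .*)$"
)


def find_migration_failures(lines):
    """Return (timestamp, vm_id, message) for each matching failure line."""
    failures = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            failures.append(
                (match.group("ts"), match.group("vm_id"), match.group("msg"))
            )
    return failures


def count_by_vm(failures):
    """Count how many failure lines were seen per VM id."""
    return Counter(vm_id for _, vm_id, _ in failures)


# Demonstration on the line from this message plus one unrelated, made-up line.
sample = [
    "2018-04-12 17:01:32,097+0200 ERROR (migsrc/3192dfe7) [virt.vm] "
    "(vmId='3192dfe7-eeac-4626-8c86-e49facc9006f') migration destination "
    "error: Fatal error during migration (migration:287)",
    "2018-04-12 17:01:33,000+0200 INFO (hypothetical) unrelated line",
]
failures = find_migration_failures(sample)
for ts, vm_id, msg in failures:
    print(ts, vm_id, msg)
```

On a real host the same function can be fed an open log file, e.g. `find_migration_failures(open(path))`, where path would typically be the vdsm log on the source host (I am assuming the default location /var/log/vdsm/vdsm.log).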

Please find the logs of source host (v15.ovirt), destination host (v14.ovirt) and engine here: https://www.dropbox.com/sh/xhf8ry4ih40poxd/AABxiFCIxDe14HSx2DqLE61ya?dl=0

Some of the VMs affected by the migration failures are:
svn 3192dfe7-eeac-4626-8c86-e49facc9006f
wood a8e83ff0-dfed-4074-b6b6-e947b8ebb952
qnx66 5697c4a4-9e40-4dd6-aba2-c8ab9904a584

Can you also include the qemu log from /var/log/libvirt/qemu/<vmname>?

Hi Michal, I've added the libvirt logs for the relevant VMs to the previous Dropbox share.


Btw, you seem to be using the legacy migration policy, which throttles the speed significantly. Please read up on the migration enhancements in 4.0:
https://www.ovirt.org/develop/release-management/features/virt/migration-enhancements/

I've already moved to Minimal Downtime and then to Post-copy, with the same results. VM migrations continue to fail randomly.


Thanks,
michal

Thanks,
Stefano.




Thank you very much for your help.

--
Stefano Stagnaro

Prisma Telecom Testing S.r.l.
Via Petrocchi, 4
20127 Milano – Italy

Tel. 02 26113507 int 339
e-mail: stefanos@prismatelecomtesting.com
skype: stefano.stagnaro