
Hi,

I recently upgraded an oVirt deployment from 3.6 to 4.0 and then to 4.1.9 (my current release). Since then, when migrating many hosts simultaneously, a few migrations always fail, roughly 1 in 10 VMs. The failure can occur on any host; moreover, after a couple of failures the destination host falls into Error status and I have to re-activate it manually or wait 30 minutes.

A typical error found in the vdsm log (from the source host) is:

2018-04-12 17:01:32,097+0200 ERROR (migsrc/3192dfe7) [virt.vm] (vmId='3192dfe7-eeac-4626-8c86-e49facc9006f') migration destination error: Fatal error during migration (migration:287)

Please find the logs of the source host (v15.ovirt), the destination host (v14.ovirt) and the engine here: https://www.dropbox.com/sh/xhf8ry4ih40poxd/AABxiFCIxDe14HSx2DqLE61ya?dl=0

Some of the VMs affected by the migration failures are:

svn    3192dfe7-eeac-4626-8c86-e49facc9006f
wood   a8e83ff0-dfed-4074-b6b6-e947b8ebb952
qnx66  5697c4a4-9e40-4dd6-aba2-c8ab9904a584

Thank you very much for your help.

--
Stefano Stagnaro

Prisma Telecom Testing S.r.l.
Via Petrocchi, 4
20127 Milano – Italy

Tel. 02 26113507 int 339
e-mail: stefanos@prismatelecomtesting.com
skype: stefano.stagnaro

On 12 Apr 2018, at 18:26, Stefano Stagnaro <stefanos@prismatelecomtesting.com> wrote:
> Please find the logs of source host (v15.ovirt), destination host (v14.ovirt) and engine here: https://www.dropbox.com/sh/xhf8ry4ih40poxd/AABxiFCIxDe14HSx2DqLE61ya?dl=0

Can you also include the qemu log from /var/log/libvirt/qemu/<vmname>?

Btw, you seem to be using the legacy migration policy, which throttles the speed significantly. Please read up on the migration enhancements in 4.0: https://www.ovirt.org/develop/release-management/features/virt/migration-enhancements/

Thanks,
michal
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
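To gather the qemu logs requested above for the three VMs named in the original report, a minimal sketch follows. It assumes libvirt's usual per-VM file naming, `<vmname>.log`, which the thread does not spell out; `qemu_log_paths` and `missing_logs` are hypothetical helper names, not part of oVirt or libvirt.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Directory named in the thread; the ".log" suffix below is an assumption.
QEMU_LOG_DIR = Path("/var/log/libvirt/qemu")


def qemu_log_paths(vm_names: Iterable[str],
                   log_dir: Path = QEMU_LOG_DIR) -> Dict[str, Path]:
    """Map each VM name to the qemu log file expected for it."""
    return {name: log_dir / f"{name}.log" for name in vm_names}


def missing_logs(paths: Dict[str, Path]) -> List[str]:
    """Return the VM names whose expected log file does not exist."""
    return [name for name, path in paths.items() if not path.is_file()]


# VM names taken from the original report.
AFFECTED_VMS = ["svn", "wood", "qnx66"]
```

Running `missing_logs(qemu_log_paths(AFFECTED_VMS))` on the source and destination hosts shows which of the expected files are absent before anything is uploaded.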

On Thu, 2018-04-12 at 20:20 +0200, Michal Skrivanek wrote:
> can you also include qemu log from /var/log/libvirt/qemu/<vmname>?

Hi Michal, I've added the libvirt logs for the relevant VMs to the previous Dropbox share.

> btw you seem to be using the legacy migration policy throttling the speed significantly. Please read into the migration enhancements in 4.0 https://www.ovirt.org/develop/release-management/features/virt/migration-enhancements/

I've already moved to Minimal Downtime and then to Post-copy, with the same results. VM migrations continue to fail randomly.

Thanks,
Stefano

On 17 Apr 2018, at 11:28, Stefano Stagnaro <stefanos@prismatelecomtesting.com> wrote:
>>> I recently upgraded an oVirt deployment from 3.6 to 4.0 and then 4.1.9 (my actual release).

Are they all in a 3.6 cluster?

>> can you also include qemu log from /var/log/libvirt/qemu/<vmname>?
>
> Hi Michal, I've added libvirt logs for relevant VMs on the previous Dropbox share.

I do not see anything wrong. It's a bit too much data to go through; can you pinpoint the time and VM name when you see a failure?

Thanks,
michal
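One way to pinpoint the time and VM of each failure, as requested above, is to scan the source host's vdsm log for migration errors. The sketch below is modelled only on the single log line quoted in this thread, so the pattern is an assumption and may miss lines that other vdsm versions format differently; the function and class names are illustrative.

```python
import re
from typing import Iterable, List, NamedTuple

# Pattern derived from the one vdsm.log line quoted in this thread (assumption).
MIGRATION_ERROR = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4}) "
    r"ERROR \(migsrc/[0-9a-f]+\) \[virt\.vm\] "
    r"\(vmId='(?P<vm_id>[0-9a-f-]{36})'\) (?P<message>.*)$"
)


class MigrationFailure(NamedTuple):
    timestamp: str
    vm_id: str
    message: str


def find_migration_failures(lines: Iterable[str]) -> List[MigrationFailure]:
    """Return one record per source-side migration ERROR line."""
    failures = []
    for line in lines:
        match = MIGRATION_ERROR.match(line.strip())
        if match:
            failures.append(MigrationFailure(**match.groupdict()))
    return failures


def print_failures(log_path: str) -> None:
    """Print timestamp, VM id and message for each failure in a log file."""
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for failure in find_migration_failures(log):
            print(failure.timestamp, failure.vm_id, failure.message)
```

Calling `print_failures` on the source host's vdsm log (usually `/var/log/vdsm/vdsm.log`) gives a short list of timestamps and VM ids that can be matched against the VM names in the engine.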
participants (2)
- Michal Skrivanek
- Stefano Stagnaro