
--Apple-Mail=_F57D6FCC-B987-4E15-887D-7030B6C7C45B Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8
> On 20 Sep 2017, at 21:08, Yaniv Kaul <ykaul@redhat.com> wrote:
>
> On Sep 20, 2017 9:50 PM, <support@jac-properties.com> wrote:
>> This roughly matches what we were thinking, thank you!
>>
>> To answer your questions:
>>
>> We do not have power management configured due to it causing a cascading failure early in our deployment. The host was not fenced and "confirm host rebooted" was never used. The VMs were powered on via virsh (this shouldn't have happened).
>>
>> Our thought is that the way they were powered on is most likely why they were corrupted.

yes.
That’s why we put a basic password protection on plain virsh access. Easy to circumvent, but then you’re on your own…

Hm, how exactly were they powered on by virsh? Normally this is not possible for oVirt VMs at all due to the initial setup of host-specific things (disk paths); we also use transient libvirt domains, so stopped VMs are no longer defined in libvirt once they stop. So I wonder how exactly this was done.
Unless they were in Paused state, where you can indeed simply continue the execution.

> We'd be happy if you could share both engine and host logs, including vdsm.log, engine.log and /var/log/messages from both.
> Y.
>
>> Logan
>>
>>> On September 20, 2017 at 12:03 PM Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
>>>
>>>> On 20 Sep 2017, at 18:06, Logan Kuhn <support@jac-properties.com> wrote:
>>>>
>>>> We had an incident where a VM host's disk filled up, the VMs all went unknown in the web console, but were fully functional if you were to log in or use the services of one.
>>>
>>> Hi,
>>> yes, that can happen since the VM’s storage is on NAS whereas the server itself is non-functional as the management and all other local processes are using local resources
>>>
>>>> We couldn't migrate them so we powered them down on that host and powered them up and let ovirt choose the host for it, same as always.
>>>
>>> that’s a mistake. The host should be fenced in that case, you likely do not have power management configured, do you? Even when you do not have a fencing device available it should have been resolved manually by rebooting it (after fixing the disk problem), or in case of permanent damage (e.g. server needs to be replaced, that takes a week, you need to run those VMs in the meantime elsewhere) it should have been powered off and VM states should be reset by the “confirm host has been rebooted” manual action.
>>>
>>> Normally you should now be able to run those VMs while the status of the host is still Not Responding - was it not the case? How exactly did you get to the situation that you were able to power up the VMs?

sorry, I meant “Normally you should not be able to run those VMs”

Thanks,
michal

>>>> However the disk images on a few of them were corrupted because once we fixed the host with the full disk, it still thought it should be running the VM. Which promptly corrupted the disk, the error seems to be this in the logs:
>>>
>>> this can only happen for VMs flagged as HA, is that the case?
>>>
>>> Thanks,
>>> michal
>>>
>>>> 2017-09-19 21:59:11,058 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler3) [36c806f6] VM '70cf75c7-0fc2-4bbe-958e-7d0095f70960'(testhub) is running in db and not running on VDS 'ef6dc2a3-af6e-4e00-aa40-493b31263417'(vm-int7)
>>>>
>>>> We upgraded to 4.1.6 from 4.0.6 earlier in the day, I don't really think it's anything more than coincidence, but it's worrying enough to send to the community.
>>>>
>>>> Regards,
>>>> Logan
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users@ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/users
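
For anyone triaging a similar incident, the engine.log message quoted above can be searched for mechanically. The following is only a minimal sketch: the regular expression is modelled on the single log line in this thread, so the exact wording in other oVirt versions is an assumption, and the names MISMATCH and find_mismatches are hypothetical helpers, not part of oVirt.

```python
import re

# Pattern modelled on the one engine.log line quoted in this thread.
# Assumption: other oVirt releases may phrase this message differently.
MISMATCH = re.compile(
    r"VM '(?P<vm_id>[0-9a-f-]+)'\s*\((?P<vm_name>[^)]+)\) is running in db "
    r"and not running on VDS '(?P<host_id>[0-9a-f-]+)'\s*\((?P<host_name>[^)]+)\)"
)


def find_mismatches(lines):
    """Return (vm_name, host_name) pairs for every line that matches."""
    hits = []
    for line in lines:
        match = MISMATCH.search(line)
        if match:
            hits.append((match.group("vm_name"), match.group("host_name")))
    return hits


# The log line from the thread, with the wrapped host UUID rejoined.
sample = (
    "2017-09-19 21:59:11,058 INFO "
    "[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] "
    "(DefaultQuartzScheduler3) [36c806f6] VM "
    "'70cf75c7-0fc2-4bbe-958e-7d0095f70960'(testhub) is running in db and not "
    "running on VDS 'ef6dc2a3-af6e-4e00-aa40-493b31263417'(vm-int7)"
)

print(find_mismatches([sample]))
```

To scan a real log, pass an open file object instead of the one-element list; the path to engine.log depends on the installation and is not given in this thread.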
--Apple-Mail=_F57D6FCC-B987-4E15-887D-7030B6C7C45B--