
--Apple-Mail=_BF44E8B2-8075-4840-B5A9-81A1D7170AB7
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=utf-8

On 20 Sep 2017, at 18:06, Logan Kuhn <support@jac-properties.com> wrote:

> We had an incident where a VM host's disk filled up. The VMs all went
> unknown in the web console, but were fully functional if you were to
> log in or use the services of one.

Hi,
yes, that can happen, since the VM’s storage is on NAS whereas the
server itself is non-functional, as the management and all other local
processes are using local resources.

> We couldn't migrate them, so we powered them down on that host,
> powered them up, and let oVirt choose the host for them, same as
> always.

That’s a mistake. The host should be fenced in that case. You likely do
not have power management configured, do you? Even when you do not have
a fencing device available, it should have been resolved by rebooting
the host manually (after fixing the disk problem) or, in case of
permanent damage (e.g. the server needs to be replaced, that takes a
week, and you need to run those VMs elsewhere in the meantime), it
should have been powered off and the VM states reset by the “confirm
host has been rebooted” manual action.

Normally you should not be able to run those VMs while the status of
the host is still Not Responding. Was that not the case? How exactly
did you get to the situation where you were able to power up the VMs?

> However, the disk image on a few of them was corrupted, because once
> we fixed the host with the full disk, it still thought it should be
> running the VM, which promptly corrupted the disk. The error seems to
> be this in the logs:

This can only happen for VMs flagged as HA. Is that the case?

Thanks,
michal

> 2017-09-19 21:59:11,058 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler3) [36c806f6] VM '70cf75c7-0fc2-4bbe-958e-7d0095f70960'(testhub) is running in db and not running on VDS 'ef6dc2a3-af6e-4e00-aa40-493b31263417'(vm-int7)
>
> We upgraded to 4.1.6 from 4.0.6 earlier in the day. I don't really
> think it's anything more than coincidence, but it's worrying enough
> to send to the community.
>
> Regards,
> Logan
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
--Apple-Mail=_BF44E8B2-8075-4840-B5A9-81A1D7170AB7--