Re: [ovirt-users] VM running on multiple hosts

20 Sep 2017

...
On 20 Sep 2017, at 18:06, Logan Kuhn <support@jac-properties.com> wrote:

> We had an incident where a VM host's disk filled up, the VMs all went
> unknown in the web console, but were fully functional if you were to
> log in or use the services of one.

Hi,
yes, that can happen: the VMs' storage is on NAS, so the guests keep
working, whereas the host itself is non-functional because the management
and all other local processes use local resources.

> We couldn't migrate them so we powered them down on that host and
> powered them up and let ovirt choose the host for it, same as always.

That's a mistake. The host should be fenced in that case; you likely do
not have power management configured, do you? Even when you do not have a
fencing device available, it should have been resolved manually by
rebooting the host (after fixing the disk problem). In case of permanent
damage (e.g. the server needs to be replaced, that takes a week, and you
need to run those VMs elsewhere in the meantime), the host should have
been powered off and the VM states reset by the "confirm host has been
rebooted" manual action.

Normally you should not be able to run those VMs while the status of the
host is still Not Responding - was that not the case? How exactly did you
get into the situation where you were able to power up the VMs?

> However the disk image on a few of them were corrupted because once we
> fixed the host with the full disk, it still thought it should be running
> the VM. Which promptly corrupted the disk, the error seems to be this
> in the logs:

This can only happen for VMs flagged as HA, is that the case?

Thanks,
michal

> 2017-09-19 21:59:11,058 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler3) [36c806f6] VM '70cf75c7-0fc2-4bbe-958e-7d0095f70960'(testhub) is running in db and not running on VDS 'ef6dc2a3-af6e-4e00-aa40-493b31263417'(vm-int7)
>
> We upgraded to 4.1.6 from 4.0.6 earlier in the day, I don't really
> think it's anything more than coincidence, but it's worrying enough to
> send to the community.
>
> Regards,
> Logan
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
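
As an aside on the VmAnalyzer message quoted above: to see which other VMs the engine reported in the same state, the log can be scanned for that message. The following is only a minimal sketch in Python, assuming the message is worded exactly like the single line quoted in this thread. The regular expression and the helper name `find_db_only_vms` are illustrative and not part of oVirt, and other engine versions may phrase the message differently.

```python
import re

# Pattern modelled on the one log line quoted in this thread.
# Assumption: the wording is stable; this is not an official oVirt format spec.
DB_ONLY_RE = re.compile(
    r"VM '(?P<vm_id>[0-9a-f-]+)'\((?P<vm_name>[^)]*)\) is running in db "
    r"and not running on VDS '(?P<vds_id>[0-9a-f-]+)'\((?P<vds_name>[^)]*)\)"
)


def find_db_only_vms(lines):
    """Return one dict per matching log line (hypothetical helper)."""
    hits = []
    for line in lines:
        match = DB_ONLY_RE.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


# The log line from this thread, with the wrapped host UUID rejoined.
SAMPLE = (
    "2017-09-19 21:59:11,058 INFO  "
    "[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] "
    "(DefaultQuartzScheduler3) [36c806f6] VM "
    "'70cf75c7-0fc2-4bbe-958e-7d0095f70960'(testhub) is running in db and not "
    "running on VDS 'ef6dc2a3-af6e-4e00-aa40-493b31263417'(vm-int7)"
)

if __name__ == "__main__":
    for hit in find_db_only_vms([SAMPLE]):
        print(hit["vm_name"], "is running in db but not on", hit["vds_name"])
```

In practice the list passed to `find_db_only_vms` would be the lines of the engine log rather than the single sample line used here.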

Michal Skrivanek