------=_Part_386347_617418787.1505930303404
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
This about matches what we were thinking, thank you!

To answer your questions:

We do not have power management configured because it caused a cascading
failure early in our deployment. The host was not fenced and "confirm host
rebooted" was never used. The VMs were powered on via virsh (this
shouldn't have happened).

Our thinking is that the way they were powered on is most likely why the
disks were corrupted.

Logan

On September 20, 2017 at 12:03 PM Michal Skrivanek
<michal.skrivanek@redhat.com> wrote:

> On 20 Sep 2017, at 18:06, Logan Kuhn <support@jac-properties.com> wrote:
>
> We had an incident where a VM host's disk filled up. The VMs all went
> unknown in the web console, but were fully functional if you were to
> log in or use the services of one.

Hi,
yes, that can happen since the VM's storage is on NAS, whereas the server
itself is non-functional because the management and all other local
processes are using local resources.

> We couldn't migrate them, so we powered them down on that host, powered
> them up, and let oVirt choose the host for them, same as always.

That's a mistake. The host should be fenced in that case; you likely do
not have power management configured, do you? Even when you do not have
a fencing device available, it should have been resolved manually, either
by rebooting the host (after fixing the disk problem) or, in case of
permanent damage (e.g. the server needs to be replaced, that takes a week,
and you need to run those VMs elsewhere in the meantime), by powering it
off and resetting the VM states with the "confirm host has been rebooted"
manual action.

Normally you should not be able to run those VMs while the status of the
host is still Not Responding - was that not the case? How exactly did you
get to the situation where you were able to power up the VMs?

> However, the disk images on a few of them were corrupted because once we
> fixed the host with the full disk, it still thought it should be running
> the VMs, which promptly corrupted the disks. The error seems to be this
> in the logs:

This can only happen for VMs flagged as HA, is that the case?

Thanks,
michal

> 2017-09-19 21:59:11,058 INFO
> [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
> (DefaultQuartzScheduler3) [36c806f6] VM
> '70cf75c7-0fc2-4bbe-958e-7d0095f70960'(testhub) is running in db and not
> running on VDS 'ef6dc2a3-af6e-4e00-aa40-493b31263417'(vm-int7)
>
> We upgraded to 4.1.6 from 4.0.6 earlier in the day. I don't really think
> it's anything more than coincidence, but it's worrying enough to send to
> the community.
>
> Regards,
> Logan
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

------=_Part_386347_617418787.1505930303404--