--Apple-Mail=_F57D6FCC-B987-4E15-887D-7030B6C7C45B
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=utf-8
On 20 Sep 2017, at 21:08, Yaniv Kaul <ykaul(a)redhat.com> wrote:
>
> On Sep 20, 2017 9:50 PM, <support(a)jac-properties.com> wrote:
> This roughly matches what we were thinking, thank you!
>
> To answer your questions:
>
> We do not have power management configured because it caused a cascading
> failure early in our deployment. The host was not fenced and "confirm host
> rebooted" was never used. The VMs were powered on via virsh (this shouldn't
> have happened).
>
> We think the way they were powered on is most likely why they were
> corrupted.
>
Yes.
That’s why we put basic password protection on plain virsh access. It is
easy to circumvent, but then you’re on your own…

Hm, how exactly were they powered on by virsh? Normally this is not possible
for oVirt VMs at all, because host-specific things (disk paths) are set up
at start, and we also use transient libvirt domains, so stopped VMs are no
longer defined in libvirt once they stop. So I wonder how exactly this was
done?
Unless they were in the Paused state, where you can indeed simply continue
the execution.
>
> We'd be happy if you could share both engine and host logs, including
> vdsm.log, engine.log and /var/log/messages from both.
> Y.
>
> Logan
>
>> On September 20, 2017 at 12:03 PM Michal Skrivanek <michal.skrivanek(a)redhat.com> wrote:
>
>> On 20 Sep 2017, at 18:06, Logan Kuhn <support(a)jac-properties.com> wrote:
>>
>> We had an incident where a VM host's disk filled up. The VMs all went
>> Unknown in the web console, but were fully functional if you logged in to
>> one or used its services.
>
> Hi,
> yes, that can happen, since the VM’s storage is on NAS, whereas the server
> itself is non-functional because the management and all other local
> processes use local resources.
>
>> We couldn't migrate them, so we powered them down on that host, powered
>> them up again and let oVirt choose the host, same as always.
>
> That’s a mistake. The host should be fenced in that case; you likely do
> not have power management configured, do you? Even when you do not have a
> fencing device available, it should have been resolved manually, by
> rebooting the host (after fixing the disk problem) or, in case of permanent
> damage (e.g. the server needs to be replaced, that takes a week, and you
> need to run those VMs elsewhere in the meantime), by powering it off and
> resetting the VM states with the “confirm host has been rebooted” manual
> action.
>
> Normally you should now be able to run those VMs while the status of the
> host is still Not Responding - was that not the case? How exactly did you
> get into the situation where you were able to power up the VMs?
Sorry, I meant “Normally you should not be able to run those VMs”.

Thanks,
michal
>
>> However, the disk images of a few of them were corrupted, because once we
>> fixed the host with the full disk, it still thought it should be running
>> the VMs, which promptly corrupted the disks. The error seems to be this in
>> the logs:
>
> This can only happen for VMs flagged as HA; is that the case?
>
> Thanks,
> michal
>
>>
>> 2017-09-19 21:59:11,058 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler3) [36c806f6] VM '70cf75c7-0fc2-4bbe-958e-7d0095f70960'(testhub) is running in db and not running on VDS 'ef6dc2a3-af6e-4e00-aa40-493b31263417'(vm-int7)
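To find every VM the engine reported this way, the message above can be matched in engine.log with a short script. This is only a minimal sketch, assuming Python 3 and that other occurrences use exactly the wording of the quoted line (with the wrap inside the host UUID joined back together). The function name `find_stale_vms` and the group names are hypothetical and not part of oVirt.

```python
import re

# Matches the VmAnalyzer message quoted above. The group names are
# illustrative only; they are not oVirt identifiers.
PATTERN = re.compile(
    r"VM '(?P<vm_id>[0-9a-f-]+)'\((?P<vm_name>[^)]+)\) is running in db "
    r"and not running on VDS '(?P<host_id>[0-9a-f-]+)'\((?P<host_name>[^)]+)\)"
)


def find_stale_vms(lines):
    """Return one dict per line reporting a VM running in the DB but not on the host."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


# The log line from this thread, joined into a single string.
sample = (
    "2017-09-19 21:59:11,058 INFO "
    "[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] "
    "(DefaultQuartzScheduler3) [36c806f6] VM "
    "'70cf75c7-0fc2-4bbe-958e-7d0095f70960'(testhub) is running in db and not "
    "running on VDS 'ef6dc2a3-af6e-4e00-aa40-493b31263417'(vm-int7)"
)

for hit in find_stale_vms([sample]):
    print(hit["vm_name"], hit["host_name"])
```

To run it against a real log, pass an open file object instead of the sample list, since iterating a file yields its lines.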
>>
>> We upgraded to 4.1.6 from 4.0.6 earlier in the day. I don't really think
>> it's anything more than coincidence, but it's worrying enough to send to
>> the community.
>>
>> Regards,
>> Logan
>> _______________________________________________
>> Users mailing list
>> Users(a)ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
--Apple-Mail=_F57D6FCC-B987-4E15-887D-7030B6C7C45B--