This is a multi-part message in MIME format.
--------------020302000209030903080908
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Adding Eli.
On 01/27/2014 02:50 PM, Andrew Lau wrote:
Hi,
I think he was asking: what if the power management device reported
that the host was powered off? Shouldn't the VMs then be brought back
up, since being off is essentially the same as a power cycle/reboot?
Another example I'm seeing: what happens if the whole host loses
power and its power management device becomes unavailable too (i.e.
not reachable)? Then you're stuck in a case that requires manual
intervention.
I would be interested to see something like a timeout on those
problematic VMs (e.g. if nothing was read or written after x amount
of time, you could consider the host offline). I guess that adds a
lot of risk, though.
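Andrew's timeout idea can be sketched as below. This is a hypothetical illustration of the proposal, not something the thread says oVirt does: it assumes the engine could see a per-VM timestamp of the last disk read or write, and `VmActivity`, `host_presumed_offline` and the threshold are made-up names. It also shows the risk he mentions: an idle but healthy VM looks exactly like a dead one, so silence alone cannot rule out a host that still holds the disk locks.

```python
from dataclasses import dataclass

@dataclass
class VmActivity:
    name: str
    last_io: float  # timestamp (seconds) of the last observed read or write; hypothetical metric

def host_presumed_offline(vms, now, timeout_s):
    """True only if every VM on the host has been silent for longer than timeout_s.

    With no VMs there is no evidence either way, so nothing is presumed.
    """
    if not vms:
        return False
    return all(now - vm.last_io > timeout_s for vm in vms)

# Usage: two VMs, the most recent write at t=1100.
vms = [VmActivity("web", last_io=1000.0), VmActivity("db", last_io=1100.0)]
print(host_presumed_offline(vms, now=1500.0, timeout_s=300.0))  # True: both silent for more than 300 s
print(host_presumed_offline(vms, now=1350.0, timeout_s=300.0))  # False: db wrote 250 s ago
```

A single idle VM that simply has nothing to do would keep this returning True, which is exactly why such a heuristic cannot replace a confirmed fence.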
On Mon, Jan 27, 2014 at 11:43 PM, Tareq Alayan <talayan(a)redhat.com> wrote:
Hi,
Power management makes use of special *dedicated* hardware to
restart hosts independently of the host OS. The engine connects to
the power management device using a *dedicated* network IP address.
The engine is capable of rebooting hosts that have entered a
non-operational or non-responsive state.
All power management devices provide the same operations: check
status, start, stop and recycle (restart).
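A toy model of the four operations listed above. The class and method names are hypothetical, and implementing recycle as stop, confirm off, start is an assumption made for illustration; real power management devices are driven through their own fence agents. What it shows is that a restart is only useful to the engine if the off state is actually confirmed.

```python
from enum import Enum

class PowerState(Enum):
    ON = "on"
    OFF = "off"

class FakePowerDevice:
    """Toy stand-in for a dedicated power management card (hypothetical API)."""

    def __init__(self, state=PowerState.ON):
        self._state = state
        self.log = []  # records which operations were issued, in order

    def status(self):
        self.log.append("status")
        return self._state

    def start(self):
        self.log.append("start")
        self._state = PowerState.ON

    def stop(self):
        self.log.append("stop")
        self._state = PowerState.OFF

    def recycle(self):
        """Restart: power off, confirm the host is really off, then power on.

        Returns True if the off state was confirmed, which is the fact the engine
        needs before it may treat the host's disk locks as released.
        """
        self.stop()
        confirmed_off = self.status() is PowerState.OFF
        self.start()
        return confirmed_off

# Usage
dev = FakePowerDevice()
print(dev.recycle())  # True
print(dev.log)        # ['stop', 'status', 'start']
```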
In the case of a non-responsive host, all of the VMs currently
running on that host can also become non-responsive. However, the
non-responsive host keeps holding the lock on the hard disk of every
VM it is running. Starting one of those VMs on a different host, and
giving the second host write access to the VM's hard disk image, can
cause data corruption.
Rebooting allows the engine to assume that the lock on a VM hard
disk image has been released.
Through the power management device, the engine can know for sure
that the problematic host has been rebooted, and it can then start
that host's VMs on another host without risking data corruption.
Important note: a virtual machine that has been marked highly
available cannot be safely started on a different host without the
certainty that doing so will not cause data corruption.
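The rule in the note above reduces to one check, sketched here under stated assumptions: the only evidence modelled is whether the engine still reaches the host and whether a power management fence was confirmed, and `HostEvidence` and `safe_to_start_elsewhere` are hypothetical names.

```python
from enum import Enum, auto

class HostEvidence(Enum):
    RESPONSIVE = auto()            # engine still talks to vdsm on the host
    FENCE_CONFIRMED = auto()       # PM device confirmed the host was powered off/rebooted
    UNREACHABLE_UNFENCED = auto()  # no reply from the host and no PM confirmation

def safe_to_start_elsewhere(evidence, vm_is_highly_available):
    """True only when starting the VM on another host cannot leave two writers
    on the same disk image."""
    if not vm_is_highly_available:
        # Only HA VMs are candidates for an automatic restart on another host.
        return False
    # RESPONSIVE: the host may well still be running the VM, so it stays where it is.
    # UNREACHABLE_UNFENCED: the host may still hold the disk lock, so a second writer risks corruption.
    return evidence is HostEvidence.FENCE_CONFIRMED

# Usage
print(safe_to_start_elsewhere(HostEvidence.FENCE_CONFIRMED, vm_is_highly_available=True))       # True
print(safe_to_start_elsewhere(HostEvidence.UNREACHABLE_UNFENCED, vm_is_highly_available=True))  # False
```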
N-joy,
--Tareq
On 01/27/2014 02:05 PM, Dafna Ron wrote:
I am adding Tareq for the Power Management implementation.
Dafna
On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
Powering off the host will never trigger VM migration.
As far as the engine is concerned, it has just lost its connection
to the host, and it has no way of telling whether the host is down
or a router is down.
Can't it at least check with power management whether the Host
status is down first?
I mean, if the network is down there will be no response from either
PM or Host. But if PM is up and can tell you that the Host is down,
that sounds rather clear cut to me...
It seems to me the VMs would be restarted sooner if the flow was
altered to first check with PM whether it's a network or Host issue
and, if it's a Host issue, immediately restart the VMs on another
Host, instead of waiting for a potentially problematic Host to
eventually boot up.
/K
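Karli's proposed ordering, written as a decision function. This models the suggestion in this message, not the engine's behaviour as described in the thread; `PmReply`, `classify_lost_host` and the returned labels are hypothetical. The last branch is the case Andrew raises: when a host loses all power, its power management card goes silent too, which looks the same as a network failure.

```python
from enum import Enum

class PmReply(Enum):
    OFF = "off"            # PM card answered: host has no power
    ON = "on"              # PM card answered: host is powered on
    NO_REPLY = "no_reply"  # PM card did not answer at all

def classify_lost_host(pm_reply):
    """First step of the proposed flow when the engine loses contact with a host."""
    if pm_reply is PmReply.OFF:
        # A host failure: its VMs cannot still be writing, so they could be restarted now.
        return "host_down_restart_vms"
    if pm_reply is PmReply.ON:
        # Powered on but not answering: it may still run VMs, so fence (reboot) it first.
        return "fence_then_restart"
    # Neither PM nor host answers: a network fault and a full power loss look identical.
    return "manual_intervention"

# Usage
for reply in PmReply:
    print(reply.value, "->", classify_lost_host(reply))
```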
Since VMs can continue running on the host even if the engine has
no access to it, starting the VMs on the second host can cause split
brain and data corruption.
The engine knows what's going on by sending health check queries to
vdsm.
Power management will try to reboot a host when the health checks to
vdsm are not answered.
So... if the engine gets no reply and has no way of rebooting the
host, the host status will be changed to Non-Responsive and the VMs
will be in an Unknown state, because the engine has no way of knowing
what's happening with them.
Since rebooting the host kills the VMs running on it, this will never
cause any VM migration. But together with the High-Availability VM
feature, you will be able to have some of the VMs (the ones marked
highly available) restarted on the second host after the host reboot,
and only if power management was confirmed as successful.
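The flow described above, once vdsm stops answering health checks, sketched as a function. The status strings are illustrative labels rather than the engine's exact internal values, `handle_unanswered_health_checks` is a hypothetical name, and reducing the whole fence attempt to a single `fence_confirmed` flag is a simplification.

```python
def handle_unanswered_health_checks(vms, fence_confirmed):
    """Sketch of the engine's reaction once vdsm stops answering health checks.

    vms: dict mapping VM name -> True if the VM is marked highly available.
    fence_confirmed: True only if the power management reboot was confirmed.
    Returns (host_status, {vm_name: vm_status}); the labels are illustrative.
    """
    if not fence_confirmed:
        # No PM confirmation: the VMs may still be running (and writing) on the host,
        # so nothing is restarted and their state is simply unknown.
        return "NonResponsive", {name: "Unknown" for name in vms}
    # A confirmed reboot killed every VM on the host; only HA VMs are restarted elsewhere.
    return "Rebooting", {
        name: ("RestartingOnOtherHost" if ha else "Down") for name, ha in vms.items()
    }

# Usage: one HA VM and one ordinary VM on the lost host.
vms = {"web": True, "scratch": False}
print(handle_unanswered_health_checks(vms, fence_confirmed=False))
print(handle_unanswered_health_checks(vms, fence_confirmed=True))
```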
VM migration is only triggered when:
1. The cluster configuration states that the VM should be migrated
in case of failure.
2. The engine has access to the host, so the failure is on the
storage side and not the host side.
3. The VMs are not actively writing (although there might be a new
RFE for it).
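The three conditions above can be read as a single predicate, sketched below with hypothetical field and function names. It also shows why powering off a host, as in Jaison's test, never triggers migration: the engine no longer reaches the host, so condition 2 fails.

```python
from dataclasses import dataclass

@dataclass
class FailureContext:
    cluster_migrates_on_failure: bool  # condition 1: cluster policy says migrate on failure
    engine_reaches_host: bool          # condition 2: the engine can still talk to the host...
    failure_on_storage_side: bool      # ...and the fault is on the storage side
    vm_actively_writing: bool          # condition 3: the VM must not be actively writing

def should_migrate(ctx):
    """True only when all three conditions from the list above hold."""
    return (
        ctx.cluster_migrates_on_failure
        and ctx.engine_reaches_host
        and ctx.failure_on_storage_side
        and not ctx.vm_actively_writing
    )

# Usage: a powered-off host is unreachable, so condition 2 fails and nothing migrates.
powered_off = FailureContext(True, engine_reaches_host=False,
                             failure_on_storage_side=False, vm_actively_writing=False)
print(should_migrate(powered_off))  # False
```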
hope this clears things up
Dafna
On 01/27/2014 10:11 AM, Andrew Lau wrote:
Hi,
Have you got power management enabled?
That's the fencing feature the engine requires to make sure the
host is actually offline. Without it, the engine won't resume the
VMs on another host, to prevent potential VM corruption (e.g. a VM
running on multiple hosts).
Andrew.
On Jan 27, 2014 5:12 PM, "Jaison peter" <urotrip2(a)gmail.com> wrote:
Hi all,
I was setting up a two-node oVirt cluster with the oVirt engine on a
separate node. I completed the configuration and tested VM live
migrations without any issues. Then, to check cluster HA, I powered
down one host and expected the VMs running on that host to be
migrated to the other one. But nothing happened: the engine detected
the host as unreachable and marked it as non-operational, and the VM
that ran on that host went to the 'unknown' state. Is it not possible
to set up a fully HA oVirt cluster with two nodes, or is this a
configuration problem on my side? Please advise.
Thanks & Regards
Alex
_______________________________________________
Users mailing list
Users(a)ovirt.org <mailto:Users@ovirt.org>
http://lists.ovirt.org/mailman/listinfo/users
--
Dafna Ron
--------------020302000209030903080908--