This is a multi-part message in MIME format.
--------------050907030908030304090009
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Hi,
here are the rest of the logs:
www.uschovna.cz/en/zasilka/HYGXR57CNHM3TP39-L3W
<http://www.uschovna.cz/en/zasilka/HYGXR57CNHM3TP39-L3W>
The TEST is the last big event in the logs.
TEST TIME: about 14:00-14:30 CET
Regards,
Pavel
On 31.3.2016 14:30, Yaniv Kaul wrote:
Hi Pavel,
Thanks for the report. Can you begin with a more accurate description
of your environment?
Begin with host, oVirt and Gluster versions. Then continue with the
exact setup (what are 'A', 'B', 'C' - domains? Volumes? What is the
mapping between domains and volumes?).
Are there any logs you can share with us?
I'm sure with more information, we'd be happy to look at the issue.
Y.
On Thu, Mar 31, 2016 at 3:09 PM, paf1(a)email.cz <mailto:paf1@email.cz> wrote:
Hello,
we tried the following test - with unwanted results
input:
5-node Gluster cluster
A = replica 3 with arbiter 1 ( node1+node2+arbiter on node 5 )
B = replica 3 with arbiter 1 ( node3+node4+arbiter on node 5 )
C = distributed replica 3 arbiter 1 ( node1+node2, node3+node4,
each arbiter on node 5)
node 5 holds only arbiter bricks ( 4x )
TEST:
1) directly reboot one node - OK ( it does not matter which one:
data node or arbiter node )
2) directly reboot two nodes - OK ( if the nodes are not from the
same replica )
3) directly reboot three nodes - this is the main problem, and it
raises questions ....
- rebooted all three nodes from replica "B" ( not very likely,
but who knows ... )
- all VMs with data on this replica were paused ( no data
access ) - OK
- all VMs running on the replica "B" nodes were lost ( restarted
manually later; their data is on other replicas ) - acceptable
BUT
- !!! all oVirt domains went down !!! - the master domain is on
replica "A", which lost only one member out of three !!!
so we did not expect all domains to go down,
especially the master domain with 2 live members.
Results:
- the whole cluster was unreachable until all domains were up,
which depended on all nodes being up !!!
- all paused VMs resumed - OK
- the rest of the VMs rebooted and are running - OK
Questions:
1) why did all domains go down if the master domain ( on replica "A" )
still has two running members ( 2 of 3 ) ??
2) how can we recover from that collapse without waiting for all
nodes to come up ? ( in the worst case, e.g. if a node has a HW error ) ??
3) which oVirt cluster policy can prevent that situation ??
( if any )
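
To make question 1 concrete, here is a minimal Python sketch of the brick layout from the "input" section and of test 3. It is only an illustration under stated assumptions: the brick placement comes from the description above, while the function names, the "at least 2 of 3 bricks" rule per replica set, and the pool-wide majority check are assumptions and not something read from the actual Gluster or oVirt configuration.

```python
# Hypothetical model of the layout described in this message.
# Brick placement is taken from the "input" section; the quorum rules
# below are simplifying assumptions, not the real Gluster logic.

# Each replica set is (data brick node, data brick node, arbiter node).
VOLUMES = {
    "A": [("node1", "node2", "node5")],
    "B": [("node3", "node4", "node5")],
    "C": [("node1", "node2", "node5"), ("node3", "node4", "node5")],
}

# Every node that hosts at least one brick.
ALL_NODES = {node for sets in VOLUMES.values() for rs in sets for node in rs}


def replica_set_has_quorum(replica_set, down):
    """Assumed rule: a replica set is usable if at least 2 of its 3 bricks are up."""
    up = [node for node in replica_set if node not in down]
    return len(up) >= 2


def volume_status(down):
    """Map each volume to True only if every one of its replica sets keeps quorum."""
    return {
        name: all(replica_set_has_quorum(rs, down) for rs in sets)
        for name, sets in VOLUMES.items()
    }


def pool_majority_up(down):
    """Assumed rule: more than half of all nodes in the pool are still up."""
    return len(ALL_NODES - set(down)) * 2 > len(ALL_NODES)


if __name__ == "__main__":
    # Test 3: reboot every node of replica "B" (node3, node4 and arbiter node5).
    down = {"node3", "node4", "node5"}
    for name, ok in volume_status(down).items():
        print(name, "keeps quorum" if ok else "loses quorum in at least one replica set")
    print("majority of pool nodes up:", pool_majority_up(down))
```

Under these assumptions replica "A" still has 2 of its 3 bricks, which is why the outage of the master domain is surprising. At the same time only 2 of the 5 nodes are up, so any mechanism that counts nodes across the whole pool rather than bricks per replica set (for example Gluster server-side quorum, if it is enabled here) would see a lost majority. Whether that applies to this setup is not known from this message.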
regs.
Pavel
_______________________________________________
Users mailing list
Users(a)ovirt.org <mailto:Users@ovirt.org>
http://lists.ovirt.org/mailman/listinfo/users
--------------050907030908030304090009--