This is a multi-part message in MIME format.
--------------010603000305050305060402
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
On 03/31/2016 06:41 PM, paf1(a)email.cz wrote:
Thank you, Pavel, for the interesting test report and for sharing the logs.
You are right - the master domain should not go down if 2 of 3 bricks
are available from volume A (1HP12-R3A1P1).
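
As a rough sanity check of that expectation, here is a minimal Python sketch. It assumes a plain majority rule per replica set, which is a simplification of Gluster's client quorum and not its real implementation. The node and volume names (node1..node5, A, B, C) come from the test description; `VOLUMES`, `replica_set_has_quorum` and `volume_status` are hypothetical helpers introduced only for illustration.

```python
# Simplified model (an assumption, not Gluster's actual quorum code):
# a replica set stays available while a strict majority of its bricks,
# data or arbiter, sits on nodes that are still up.

# Brick placement as described in the test report; node5 hosts the arbiters.
VOLUMES = {
    "A": [("node1", "node2", "node5")],
    "B": [("node3", "node4", "node5")],
    "C": [("node1", "node2", "node5"), ("node3", "node4", "node5")],
}


def replica_set_has_quorum(bricks, down_nodes):
    """True if more than half of the bricks are on nodes that are up."""
    alive = sum(1 for node in bricks if node not in down_nodes)
    return alive * 2 > len(bricks)


def volume_status(down_nodes):
    """Map each volume name to True if every replica set keeps quorum."""
    return {
        name: all(replica_set_has_quorum(rs, down_nodes) for rs in replica_sets)
        for name, replica_sets in VOLUMES.items()
    }


if __name__ == "__main__":
    # Test 3 from the report: all three nodes of replica "B" rebooted.
    print(volume_status({"node3", "node4", "node5"}))
    # prints {'A': True, 'B': False, 'C': False}
```

Under this simplified rule volume A keeps 2 of 3 bricks and stays available, while B and the second replica set of C lose all their bricks, which is why only the domains on B and C would be expected to go down.
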
I notice that host kvmarbiter was not responsive at 2016-03-31 13:27:19,
but the ConnectStorageServerVDSCommand executed on the kvmarbiter node
returned success at 2016-03-31 13:27:26.
Could you also share the vdsm logs from the 1hp1, 1hp2 and kvmarbiter nodes
for this period?
Ravi, Krutika - could you take a look at the gluster logs?
regs.
Pavel
On 31.3.2016 14:30, Yaniv Kaul wrote:
> Hi Pavel,
>
> Thanks for the report. Can you begin with a more accurate description
> of your environment?
> Begin with host, oVirt and Gluster versions. Then continue with the
> exact setup (what are 'A', 'B', 'C' - domains? Volumes? What is the
> mapping between domains and volumes?).
>
> Are there any logs you can share with us?
>
> I'm sure with more information, we'd be happy to look at the issue.
> Y.
>
>
> On Thu, Mar 31, 2016 at 3:09 PM, paf1(a)email.cz wrote:
>
> Hello,
> we tried the following test, with unwanted results.
>
> Input:
> 5-node gluster
> A = replica 3 with arbiter 1 ( node1+node2+arbiter on node 5 )
> B = replica 3 with arbiter 1 ( node3+node4+arbiter on node 5 )
> C = distributed replica 3 arbiter 1 ( node1+node2, node3+node4,
> each arbiter on node 5 )
> node 5 holds only arbiter replicas ( 4x )
>
> TEST:
> 1) directly reboot one node - OK ( no matter which one: data
> node or arbiter node )
> 2) directly reboot two nodes - OK ( if the nodes are not from the
> same replica )
> 3) directly reboot three nodes - this is the main problem
> and the source of our questions:
> - rebooted all three nodes of replica "B" ( not very
> likely, but who knows ... )
> - all VMs with data on this replica were paused ( no data
> access ) - OK
> - all VMs running on replica "B" nodes were lost ( started
> manually later; their data is on other replicas ) - acceptable
> BUT
> - !!! all oVirt domains went down !!! - the master domain is on
> replica "A", which lost only one member of three !!!
> so we did not expect all domains to go down,
> especially the master with 2 live members.
>
> Results:
> - the whole cluster was unreachable until all domains were up,
> which depended on all nodes being up !!!
> - all paused VMs resumed - OK
> - the rest of the VMs rebooted and are running - OK
>
> Questions:
> 1) why did all domains go down if the master domain ( on replica "A" )
> has two running members ( 2 of 3 ) ??
> 2) how can we recover from that collapse without waiting for all nodes
> to come up ? ( in the worst case, e.g. if a node has a HW error ) ??
> 3) which oVirt cluster policy can prevent that situation
> ?? ( if any )
>
> regs.
> Pavel
>
>
>
> _______________________________________________
> Users mailing list
> Users(a)ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
--------------010603000305050305060402
Content-Type: text/html; charset=windows-1252
Content-Transfer-Encoding: 8bit
<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<br>
<div class="moz-cite-prefix">On 03/31/2016 06:41 PM, <a
class="moz-txt-link-abbreviated"
href="mailto:paf1@email.cz">paf1@email.cz</a>
wrote:<br>
</div>
<blockquote cite="mid:56FD221F.30707@email.cz" type="cite">
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
Hi, <br>
rest of logs:<br>
<a moz-do-not-send="true"
href="http://www.uschovna.cz/en/zasilka/HYGXR57CNHM3TP39-L3W"
style="text-decoration:none;color:#ff9c00;">www.uschovna.cz/en/zasilka/HYGXR57CNHM3TP39-L3W</a><br>
<br>
The TEST is the last big event in logs ....<br>
TEST TIME : about 14:00-14:30 CET<br>
</blockquote>
</body>
</html>
--------------010603000305050305060402--