[ovirt-users] ovirt with glusterfs - big test - unwanted results
paf1 at email.cz
Thu Mar 31 13:50:56 UTC 2016
Hello Yaniv,
we tried another small test: we rebooted two nodes of the replica 3
arbiter 1 volume ( 1HP12-R3A1P1 ) that holds the master domain.
All domains went down, including the master, but the master domain did
not move to another available domain ( e.g. 2HP12-R3A1P1 ).
It looks like the "master domain" management isn't correct ( a bug? ).
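
To make the expectation concrete, here is a rough sketch of the layout from
the original report quoted below, using a simplified majority-quorum model.
The node and volume names are illustrative, and the rule "a replica set is
usable while more than half of its bricks are up" is an assumption for
illustration only; it is not how Gluster or oVirt actually decide
availability.

```python
# Hypothetical model of volumes A, B and C from the quoted report.
# Each replica set is two data bricks plus one arbiter brick on node5.
VOLUMES = {
    "A": [{"node1", "node2", "node5"}],
    "B": [{"node3", "node4", "node5"}],
    "C": [{"node1", "node2", "node5"}, {"node3", "node4", "node5"}],
}


def replica_set_has_quorum(bricks, down):
    """Assumed rule: a strict majority of the set's bricks must be up."""
    up = len(bricks - down)
    return up * 2 > len(bricks)


def volume_usable(name, down):
    """A volume counts as usable only if every replica set keeps quorum."""
    return all(replica_set_has_quorum(s, down) for s in VOLUMES[name])


def status(down):
    """Map each volume name to True (usable) or False for a set of down nodes."""
    down = set(down)
    return {name: volume_usable(name, down) for name in VOLUMES}


if __name__ == "__main__":
    # Test 3 from the original report: all nodes of replica "B" rebooted.
    print(status({"node3", "node4", "node5"}))
    # {'A': True, 'B': False, 'C': False}
```

Under this model, replica "A" keeps 2 of 3 bricks in test 3, which is why we
expected the master domain to stay up. In today's test, with two nodes of the
master's replica down, the volume itself loses quorum, so there the question
is only why the master role did not move.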
Regards,
Pavel
On 31.3.2016 14:30, Yaniv Kaul wrote:
> Hi Pavel,
>
> Thanks for the report. Can you begin with a more accurate description
> of your environment?
> Begin with host, oVirt and Gluster versions. Then continue with the
> exact setup (what are 'A', 'B', 'C' - domains? Volumes? What is the
> mapping between domains and volumes?).
>
> Are there any logs you can share with us?
>
> I'm sure with more information, we'd be happy to look at the issue.
> Y.
>
>
> On Thu, Mar 31, 2016 at 3:09 PM, paf1 at email.cz wrote:
>
> Hello,
> we tried the following test - with unwanted results
>
> input:
> 5-node Gluster cluster
> A = replica 3 with arbiter 1 ( node1+node2+arbiter on node 5 )
> B = replica 3 with arbiter 1 ( node3+node4+arbiter on node 5 )
> C = distributed replica 3 arbiter 1 ( node1+node2, node3+node4,
> each arbiter on node 5 )
> node 5 holds only arbiter bricks ( 4x )
>
> TEST:
> 1) directly reboot one node - OK ( it does not matter which one,
> data node or arbiter node )
> 2) directly reboot two nodes - OK ( if the nodes are not from the
> same replica )
> 3) directly reboot three nodes - this is the main problem and the
> source of our questions:
> - rebooted all three nodes of replica "B" ( not very likely,
> but who knows ... )
> - all VMs with data on this replica were paused ( no data
> access ) - OK
> - all VMs running on replica "B" nodes were lost ( started
> manually later; their data is on other replicas ) - acceptable
> BUT
> - !!! all oVirt domains went down !!! - the master domain is on
> replica "A", which lost only one member out of three !!!
> So we did not expect all domains to go down, especially the
> master with 2 live members.
>
> Results:
> - the whole cluster was unreachable until all domains were up,
> which depends on all nodes being up !!!
> - all paused VMs resumed - OK
> - the rest of the VMs rebooted and are running - OK
>
> Questions:
> 1) why did all domains go down if the master domain ( on replica
> "A" ) has two running members ( 2 of 3 )?
> 2) how can we recover from that collapse without waiting for all
> nodes to come up ( in the worst case, e.g. if a node has a HW
> error )?
> 3) which oVirt cluster policy, if any, can prevent that situation?
>
> Regards,
> Pavel