[ovirt-users] ovirt with glusterfs - big test - unwanted results

paf1 at email.cz
Tue Apr 5 13:32:36 UTC 2016


Hello Sahina,
please find attached the logs you requested.

Regards,
Pavel

On 5.4.2016 14:07, Sahina Bose wrote:
>
>
> On 03/31/2016 06:41 PM, paf1 at email.cz wrote:
>> Hi,
>> rest of logs:
>> http://www.uschovna.cz/en/zasilka/HYGXR57CNHM3TP39-L3W
>>
>> The TEST is the last big event in the logs.
>> TEST TIME: about 14:00-14:30 CET
>
> Thank you Pavel for the interesting test report and sharing the logs.
>
> You are right - the master domain should not go down if 2 of 3 bricks 
> are available from volume A (1HP12-R3A1P1).
>
> I notice that the host kvmarbiter was not responsive at 2016-03-31
> 13:27:19, but the ConnectStorageServerVDSCommand executed on the
> kvmarbiter node returned success at 2016-03-31 13:27:26.
>
> Could you also share the vdsm logs from the 1hp1, 1hp2 and kvmarbiter
> nodes for this time window?
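>
> In case it helps, assuming the standard log locations on the engine and
> the hosts (the paths and grep patterns below are the usual defaults,
> not taken from this setup), the relevant window can be pulled out
> roughly like this:
>
>     # on the engine: storage-domain and host-monitoring events around the test
>     grep -E 'ConnectStorageServerVDS|kvmarbiter' /var/log/ovirt-engine/engine.log \
>         | grep '2016-03-31 13:2'
>     # on each host (1hp1, 1hp2, kvmarbiter): vdsm activity in the same window
>     grep '2016-03-31 13:2' /var/log/vdsm/vdsm.log
>     # bundle the host logs for the list
>     tar cf vdsm.logs.tar /var/log/vdsm/vdsm.log*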
>
> Ravi, Krutika - could you take a look at the gluster logs?
>
>>
>> Regards,
>> Pavel
>>
>> On 31.3.2016 14:30, Yaniv Kaul wrote:
>>> Hi Pavel,
>>>
>>> Thanks for the report. Can you begin with a more accurate 
>>> description of your environment?
>>> Begin with host, oVirt and Gluster versions. Then continue with the 
>>> exact setup (what are 'A', 'B', 'C' - domains? Volumes? What is the 
>>> mapping between domains and volumes?).
>>>
>>> Are there any logs you can share with us?
>>>
>>> I'm sure with more information, we'd be happy to look at the issue.
>>> Y.
>>>
>>>
>>> On Thu, Mar 31, 2016 at 3:09 PM, paf1 at email.cz wrote:
>>>
>>>     Hello,
>>>     we tried the following test, with unwanted results.
>>>
>>>     Input:
>>>     5-node gluster
>>>     A = replica 3 with arbiter 1 (data on node1 + node2, arbiter on node 5)
>>>     B = replica 3 with arbiter 1 (data on node3 + node4, arbiter on node 5)
>>>     C = distributed replica 3 with arbiter 1 (node1 + node2 and
>>>     node3 + node4, each replica set with its arbiter on node 5)
>>>     node 5 holds only arbiter bricks (4 of them)
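>>>
>>>     For reference, a layout like this is normally created with the
>>>     gluster CLI roughly as follows (the volume names and brick paths
>>>     below are illustrative placeholders, not the real names from
>>>     this setup):
>>>
>>>     # volume A: replica 3, data on node1/node2, arbiter brick on node5
>>>     gluster volume create volA replica 3 arbiter 1 \
>>>         node1:/bricks/volA node2:/bricks/volA node5:/bricks/volA-arb
>>>     # volume B: same pattern with node3/node4, arbiter again on node5
>>>     gluster volume create volB replica 3 arbiter 1 \
>>>         node3:/bricks/volB node4:/bricks/volB node5:/bricks/volB-arb
>>>     # volume C: distributed-replicate, two replica sets, both arbiters on node5
>>>     gluster volume create volC replica 3 arbiter 1 \
>>>         node1:/bricks/volC node2:/bricks/volC node5:/bricks/volC-arb1 \
>>>         node3:/bricks/volC node4:/bricks/volC node5:/bricks/volC-arb2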
>>>
>>>     TEST:
>>>     1)  direct reboot of one node - OK (it does not matter which
>>>     one, data node or arbiter node)
>>>     2)  direct reboot of two nodes - OK (as long as the nodes are
>>>     not from the same replica set)
>>>     3)  direct reboot of three nodes - yes, this is the main problem
>>>     and the question ....
>>>         - we rebooted all three nodes of replica "B" (not very
>>>     likely, but who knows ...)
>>>         - all VMs with data on this replica were paused (no data
>>>     access) - OK
>>>         - all VMs running on the replica "B" nodes were lost
>>>     (restarted manually later; their data is on other replicas) -
>>>     acceptable
>>>     BUT
>>>         - !!! all oVirt domains went down !!! - the master domain is
>>>     on replica "A", which lost only one member out of three !!!
>>>         So we did not expect all domains to go down, especially the
>>>     master with 2 live members.
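>>>
>>>     (When this happens, the state of the surviving replica can be
>>>     checked from any node that is still up; "volA" below stands for
>>>     the master domain's volume, whatever it is actually called:)
>>>
>>>     gluster peer status            # which nodes the pool still sees
>>>     gluster volume status volA     # which bricks of the volume are up
>>>     gluster volume heal volA info  # entries pending heal on the live bricks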
>>>
>>>     Results:
>>>         - the whole cluster was unreachable until all domains were
>>>     up again, which depended on all nodes being up !!!
>>>         - all paused VMs resumed - OK
>>>         - the rest of the VMs were restarted and are running - OK
>>>
>>>     Questions:
>>>         1) why did all domains go down when the master domain (on
>>>     replica "A") still had two running members (2 of 3)?
>>>         2) how can we recover from that collapse without waiting for
>>>     all nodes to come up? (worst case, e.g. a node with a HW failure)
>>>         3) which oVirt cluster policy, if any, can prevent that
>>>     situation?
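>>>
>>>     (Context for question 3: write availability of a replica 3
>>>     arbiter volume is normally governed by gluster's quorum options;
>>>     the commands below are generic, "volA" is a placeholder:)
>>>
>>>     gluster volume get volA cluster.quorum-type
>>>     gluster volume get volA cluster.server-quorum-type
>>>     # client-side quorum: writes are allowed while a majority of each
>>>     # replica set (2 of 3, the arbiter counts) is reachable
>>>     gluster volume set volA cluster.quorum-type auto
>>>     # server-side quorum: bricks are stopped when glusterd loses
>>>     # quorum across the trusted pool
>>>     gluster volume set volA cluster.server-quorum-type server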
>>>
>>>     Regards,
>>>     Pavel
>>>
>>>
>>>
>>>     _______________________________________________
>>>     Users mailing list
>>>     Users at ovirt.org
>>>     http://lists.ovirt.org/mailman/listinfo/users
>>>
>>>
>>
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: vdsm.logs.tar
Type: application/x-tar
Size: 7065600 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20160405/e3174d11/attachment-0001.tar>

