Hello Sahina,
please see the attached logs which you requested.
Regards,
Pavel
On 5.4.2016 14:07, Sahina Bose wrote:
On 03/31/2016 06:41 PM, paf1(a)email.cz wrote:
> Hi,
> rest of logs:
>
> www.uschovna.cz/en/zasilka/HYGXR57CNHM3TP39-L3W
> <http://www.uschovna.cz/en/zasilka/HYGXR57CNHM3TP39-L3W>
>
> The TEST is the last big event in logs ....
> TEST TIME : about 14:00-14:30 CET
Thank you Pavel for the interesting test report and sharing the logs.
You are right - the master domain should not go down if 2 of 3 bricks
are available from volume A (1HP12-R3A1P1).
I notice that host kvmarbiter was not responsive at 2016-03-31
13:27:19, but the ConnectStorageServerVDSCommand executed on the
kvmarbiter node returned success at 2016-03-31 13:27:26.
Could you also share the vdsm logs from the 1hp1, 1hp2 and kvmarbiter
nodes during this time?
Ravi, Krutika - could you take a look at the gluster logs?
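For reference, on the hosts the vdsm log is normally /var/log/vdsm/vdsm.log
(plus its rotated copies) and the gluster logs live under /var/log/glusterfs/.
A rough sketch of collecting them on each of the three nodes - the archive
name and exact file list here are only an assumption:

    # run on 1hp1, 1hp2 and kvmarbiter, then attach the resulting archives
    tar czf /tmp/logs-$(hostname).tar.gz \
        /var/log/vdsm/vdsm.log* \
        /var/log/glusterfs/*.log \
        /var/log/glusterfs/bricks/*.log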
>
> regs. Pavel
>
> On 31.3.2016 14:30, Yaniv Kaul wrote:
>> Hi Pavel,
>>
>> Thanks for the report. Can you begin with a more accurate
>> description of your environment?
>> Begin with host, oVirt and Gluster versions. Then continue with the
>> exact setup (what are 'A', 'B', 'C' - domains? Volumes? What is the
>> mapping between domains and volumes?).
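A quick way to gather those versions on one of the hosts might look like
this (package names assume a RHEL/CentOS based install):

    rpm -q vdsm glusterfs-server   # oVirt host agent and Gluster server packages
    gluster --version | head -1    # Gluster version as reported by the CLI
    cat /etc/redhat-release        # base OS release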
>>
>> Are there any logs you can share with us?
>>
>> I'm sure with more information, we'd be happy to look at the issue.
>> Y.
>>
>>
>> On Thu, Mar 31, 2016 at 3:09 PM, paf1(a)email.cz <paf1(a)email.cz
>> <mailto:paf1@email.cz>> wrote:
>>
>> Hello,
>> we tried the following test - with unwanted results
>>
>> input:
>> 5-node gluster cluster
>> A = replica 3 with arbiter 1 ( node1+node2+arbiter on node 5 )
>> B = replica 3 with arbiter 1 ( node3+node4+arbiter on node 5 )
>> C = distributed replica 3 arbiter 1 ( node1+node2, node3+node4,
>> each arbiter on node 5 )
>> node 5 holds only arbiter bricks ( 4x )
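For context, a layout like the one above would typically be created with
commands along these lines; the volume names and brick paths below are only
placeholders, not the ones actually used in this setup:

    gluster volume create volA replica 3 arbiter 1 \
        node1:/bricks/a node2:/bricks/a node5:/bricks/arb-a
    gluster volume create volB replica 3 arbiter 1 \
        node3:/bricks/b node4:/bricks/b node5:/bricks/arb-b
    # distributed-replicated: two replica sets, each with its arbiter brick on node5
    gluster volume create volC replica 3 arbiter 1 \
        node1:/bricks/c node2:/bricks/c node5:/bricks/arb-c1 \
        node3:/bricks/c node4:/bricks/c node5:/bricks/arb-c2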
>>
>> TEST:
>> 1) directly reboot one node - OK ( it does not matter which -
>> data node or arbiter node )
>> 2) directly reboot two nodes - OK ( if the nodes are not from the
>> same replica )
>> 3) directly reboot three nodes - yes, this is the main problem
>> and the main question ....
>> - rebooted all three nodes from replica "B" ( not very
>> likely, but who knows ... )
>> - all VMs with data on this replica were paused ( no data
>> access ) - OK
>> - all VMs running on replica "B" nodes were lost ( started
>> manually later ) ( data on other replicas ) - acceptable
>> BUT
>> - !!! all oVirt domains went down !!! - the master domain is on
>> replica "A", which lost only one member of three !!!
>> So we did not expect all domains to go down,
>> especially the master with 2 live members.
>>
>> Results:
>> - the whole cluster was unreachable until all domains came back
>> up - which depended on all nodes being up !!!
>> - all paused VMs started back - OK
>> - the rest of the VMs were rebooted and are running - OK
>>
>> Questions:
>> 1) why did all domains go down if the master domain ( on replica "A" )
>> has two running members ( 2 of 3 ) ??
>> 2) how to recover from that collapse without waiting for all nodes
>> to come up ? ( in the worst case, e.g. if a node has a HW error ) ??
>> 3) which oVirt cluster policy can prevent that situation
>> ?? ( if any )
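Regarding questions 1) and 3): the quorum settings on the volumes are
usually the first thing to check when bricks go offline even though a
majority of a replica set survives. A sketch, with volA standing in for
the actual master-domain volume name:

    gluster volume get volA cluster.quorum-type         # client-side quorum within a replica set
    gluster volume get volA cluster.server-quorum-type  # server-side quorum across the trusted pool

If server-side quorum is enabled, losing 3 of the 5 peers can also take
bricks down on the surviving nodes; whether that is what happened here is
only a guess and would need to be confirmed from the gluster logs.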
>>
>> regs.
>> Pavel
>>
>>
>>