From paf1 at email.cz Thu Mar 31 08:09:11 2016
Content-Type: multipart/mixed; boundary="===============9176115236222937382=="
MIME-Version: 1.0
From: paf1 at email.cz
To: users at ovirt.org
Subject: [ovirt-users] ovirt with glusterfs - big test - unwanted results
Date: Thu, 31 Mar 2016 14:09:05 +0200
Message-ID: <56FD1361.3010805@email.cz>
--===============9176115236222937382==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
This is a multi-part message in MIME format.
--------------070802090208020205070907
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Hello,
we ran the following test and got unwanted results
input:
5-node gluster
A = replica 3 with arbiter 1 ( node1 + node2, arbiter on node5 )
B = replica 3 with arbiter 1 ( node3 + node4, arbiter on node5 )
C = distributed replica 3 arbiter 1 ( node1 + node2 and node3 + node4, each arbiter on node5 )
node5 holds only arbiter replicas ( 4x )
TEST:
1) direct reboot of one node - OK ( it does not matter which one, data node or arbiter node )
2) direct reboot of two nodes - OK ( as long as the nodes are not from the same replica )
3) direct reboot of three nodes - this is the main problem, and the source of our questions:
    - rebooted all three nodes of replica "B" ( node3, node4 and node5; not very likely, but who knows ... )
    - all VMs with data on this replica were paused ( no data access ) - OK
    - all VMs running on the replica "B" nodes were lost ( started manually later; their data is on other replicas ) - acceptable
BUT
    - all oVirt domains went down, even though the master domain is on replica "A", which lost only one member out of three ( the arbiter on node5 )
    so we did not expect all domains to go down, especially not the master domain with 2 live members.
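
The member counts in test 3 can be checked with a small sketch. This is only an illustration in Python: the VOLUMES table, the function names and the simple majority rule are assumptions made here to mirror the layout described above, not output from or code of gluster or oVirt.

```python
# Hypothetical model of the layout described above; all names are illustrative.
# Each replica set is listed as (data node, data node, arbiter node).
VOLUMES = {
    "A": [("node1", "node2", "node5")],
    "B": [("node3", "node4", "node5")],
    "C": [("node1", "node2", "node5"), ("node3", "node4", "node5")],
}


def surviving_members(down):
    """Return, per volume, the number of live members in each replica set."""
    return {
        name: [sum(node not in down for node in replica_set) for replica_set in sets]
        for name, sets in VOLUMES.items()
    }


def has_majority(live, size=3):
    """Simple majority rule (an assumption, not gluster's actual quorum logic)."""
    return live > size // 2


# Test 3: the three nodes of replica "B" are rebooted at the same time.
down = {"node3", "node4", "node5"}
report = surviving_members(down)
for name, counts in report.items():
    print(name, counts, [has_majority(c) for c in counts])
```

Under this simple count, replica "A" keeps 2 of 3 members and replica "B" keeps none, which is why we expected only the domain on "B" to go down.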
Results:
- the whole cluster was unreachable until all domains were up again, which depended on all nodes being up !!!
- all paused VMs resumed - OK
- all remaining VMs were rebooted and are running - OK
Questions:
1) Why did all domains go down when the master domain ( on replica "A" ) still had two running members ( 2 of 3 )?
2) How can we recover from that collapse without waiting for all nodes to come up ( in the worst case, e.g. when a node has a HW error )?
3) Which oVirt cluster policy, if any, can prevent this situation?
Regards,
Pavel
--------------070802090208020205070907--
--===============9176115236222937382==--