[ovirt-users] oVirt 4 master storage down - unable to activate after power loss

Robin Vanderveken rvanderveken at dlttechnologies.com
Mon Jul 25 15:33:31 UTC 2016


I resolved the issue.

First I put the nodes into Maintenance mode (without stopping the Gluster
service).

Then, because the files in split-brain were not important, I reset the
trusted.afr changelog attribute to all zeros on both nodes. I could probably
have used something like
"getfattr -d -m . -e hex /gluster-bricks/vol1/vol1/9d1a204c-1b1a-450f-986d-a8c84babb4c1/dom_md/ids"
to inspect the current values and work out the correct ones; the following
article explains it nicely:
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/sect-Managing_Split-brain.html

[root@oVirt-Node1 ~]# setfattr -n trusted.afr.vol1-client-1 -v 0x000000000000000000000000 /gluster-bricks/vol1/vol1/9d1a204c-1b1a-450f-986d-a8c84babb4c1/dom_md/ids
[root@oVirt-Node1 ~]# setfattr -n trusted.afr.vol1-client-1 -v 0x000000000000000000000000 /gluster-bricks/vol1/vol1/__DIRECT_IO_TEST__
[root@oVirt-Node1 ~]# setfattr -n trusted.afr.vol1-client-1 -v 0x000000000000000000000000 /gluster-bricks/vol1/vol1/9d1a204c-1b1a-450f-986d-a8c84babb4c1/dom_md/inbox
[root@oVirt-Node1 ~]# setfattr -n trusted.afr.vol1-client-1 -v 0x000000000000000000000000 /gluster-bricks/vol1/vol1/9d1a204c-1b1a-450f-986d-a8c84babb4c1/dom_md/leases

[root@ovirt-node2 ~]# setfattr -n trusted.afr.vol1-client-0 -v 0x000000000000000000000000 /gluster-bricks/vol1/vol1/9d1a204c-1b1a-450f-986d-a8c84babb4c1/dom_md/ids
[root@ovirt-node2 ~]# setfattr -n trusted.afr.vol1-client-0 -v 0x000000000000000000000000 /gluster-bricks/vol1/vol1/__DIRECT_IO_TEST__
[root@ovirt-node2 ~]# setfattr -n trusted.afr.vol1-client-0 -v 0x000000000000000000000000 /gluster-bricks/vol1/vol1/9d1a204c-1b1a-450f-986d-a8c84babb4c1/dom_md/inbox
[root@ovirt-node2 ~]# setfattr -n trusted.afr.vol1-client-0 -v 0x000000000000000000000000 /gluster-bricks/vol1/vol1/9d1a204c-1b1a-450f-986d-a8c84babb4c1/dom_md/leases
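
For reference, when inspecting these attributes before zeroing them: as the
Red Hat article linked above describes, a trusted.afr.<volume>-client-<N>
value is 12 bytes, read as three 32-bit counters of pending data, metadata
and entry operations. The following is a minimal Python sketch for decoding
the hex string printed by "getfattr -e hex". The helper name decode_afr is
hypothetical (it is not part of Gluster), and the second sample value is made
up for illustration.

```python
import struct


def decode_afr(hex_value):
    """Split a trusted.afr.* value (as printed by `getfattr -e hex`) into
    its three pending-operation counters."""
    # Drop the optional "0x" prefix that getfattr prints.
    if hex_value.lower().startswith("0x"):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Three unsigned 32-bit integers in network (big-endian) byte order.
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# The value written by the setfattr commands above: nothing pending.
print(decode_afr("0x000000000000000000000000"))
# A made-up non-zero value, for illustration only.
print(decode_afr("0x000000020000000100000000"))
```

A non-zero counter on one brick means that brick believes the other brick
still has operations pending; when both bricks blame each other for the same
file, the file is in split-brain.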

The heal command no longer reports any split-brain entries.

[root@oVirt-Node1 ~]# gluster volume heal vol1 info
Brick 10.0.0.194:/gluster-bricks/vol1/vol1
/__DIRECT_IO_TEST__
/__DIRECT_IO_TEST__
Status: Connected
Number of entries: 2

Brick 10.0.0.199:/gluster-bricks/vol1/vol1
/__DIRECT_IO_TEST__
/__DIRECT_IO_TEST__
Status: Connected
Number of entries: 2
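
To repeat this check after each heal attempt, the output of
"gluster volume heal <volname> info" can be scanned for split-brain entries.
The following is a minimal Python sketch, assuming the output format shown in
this thread. The helper name split_brain_entries is hypothetical, and the
sample text is an abbreviated mix of lines taken from the outputs quoted in
this thread.

```python
def split_brain_entries(heal_info_output):
    """Return {brick: [entries flagged 'Is in split-brain']} parsed from the
    text printed by `gluster volume heal <volname> info`."""
    marker = "- Is in split-brain"
    result = {}
    brick = None
    for line in heal_info_output.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            # Start of a new per-brick section.
            brick = line[len("Brick "):]
            result[brick] = []
        elif brick is not None and line.endswith(marker):
            # Keep only the path or gfid, without the marker.
            result[brick].append(line[: -len(marker)].strip())
    return result


# Abbreviated sample: the first brick still has split-brain entries,
# the second only lists an entry without the split-brain marker.
sample = """\
Brick 10.0.0.194:/gluster-bricks/vol1/vol1
/9d1a204c-1b1a-450f-986d-a8c84babb4c1/dom_md/ids - Is in split-brain
/__DIRECT_IO_TEST__ - Is in split-brain
Status: Connected

Brick 10.0.0.199:/gluster-bricks/vol1/vol1
/__DIRECT_IO_TEST__
Status: Connected
"""

for brick, entries in split_brain_entries(sample).items():
    print(brick, len(entries))
```

An empty list for every brick corresponds to the state shown above, where
entries may still be listed but none is marked as being in split-brain.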

I then activated the nodes again in oVirt and activated the data center
(Storage > master volume > Data Center > Activate). It now goes to Locked and
then to Up, and an SPM is selected automatically among the nodes.

Thank you very much!!!

Kind regards
Robin Vanderveken

On 25 July 2016 at 16:10, Robin Vanderveken <
rvanderveken at dlttechnologies.com> wrote:

> Your command was very helpful! It seems to be a split-brain error! I
> should be able to fix this, I'll let you know when it's fixed.
>
> We are aware that a replica 2 Gluster volume is not recommended; we didn't
> know that a minimum of 3 replicas is needed to survive a power loss.
>
> The output of "gluster volume heal <volume_name> info" is:
>
> [root@ovirt-node2 ~]# gluster volume heal vol1 info
> Brick 10.0.0.194:/gluster-bricks/vol1/vol1
> /9d1a204c-1b1a-450f-986d-a8c84babb4c1/dom_md/ids - Is in split-brain
>
> /__DIRECT_IO_TEST__ - Is in split-brain
>
> /9d1a204c-1b1a-450f-986d-a8c84babb4c1/dom_md/inbox - Is in split-brain
>
> /9d1a204c-1b1a-450f-986d-a8c84babb4c1/dom_md/leases - Is in split-brain
>
> Status: Connected
> Number of entries: 4
>
> Brick 10.0.0.199:/gluster-bricks/vol1/vol1
> <gfid:b9d48873-846d-4964-93c7-0a662de04d22> - Is in split-brain
>
> /__DIRECT_IO_TEST__ - Is in split-brain
>
> <gfid:cbc7b097-e600-43d3-8f20-9626f6d0239b> - Is in split-brain
>
> <gfid:cfa177c0-91fa-4649-b99b-73e03ddbc756> - Is in split-brain
>
> Status: Connected
> Number of entries: 4
>
> I also attached the GlusterFS logs; I was unsure which ones you needed, so
> I added as many as possible.
>
> Kind regards
> Robin Vanderveken
>
>
> On 25 July 2016 at 15:49, Sahina Bose <sabose at redhat.com> wrote:
>
>> First off, a replica 2 gluster volume is not recommended if you want to
>> survive power loss - you need to have a replica 3 gluster volume.
>>
>> From engine logs, I see "Could not connect host to Data Center(Storage
>> issue)" but there are no errors in the attached vdsm.log. Can you provide
>> the relevant vdsm log and also the gluster mount logs?
>>
>> What's the output of "gluster volume heal <volname> info" ?
>>
>> ----- Original Message -----
>> > From: "Robin Vanderveken" <rvanderveken at dlttechnologies.com>
>> > To: users at ovirt.org
>> > Sent: Monday, July 25, 2016 2:01:24 PM
>> > Subject: [ovirt-users] oVirt 4 master storage down - unable to activate after power loss
>> >
>> > Dear oVirt users
>> >
>> > I've been having several problems with my oVirt nodes utilising
>> > GlusterFS after simulating a power loss.
>> >
>> > Our testing setup for oVirt consists of 3 nodes, 2 with GlusterFS
>> > storage and 1 just for computing. Everything seems to be set up
>> > correctly and was working correctly. Then, after simulating a power
>> > loss, the master storage goes down, and therefore the VMs and data
>> > center go down as well.
>> >
>> > I checked the GlusterFS configuration and it seems to be correct (see
>> > attachment). I checked the oVirt configuration and it seems to be
>> > correct (see attachment). I tried putting several nodes in Maintenance
>> > several times, even putting them in maintenance and reinstalling them.
>> > Only putting the nodes in maintenance (and choosing to stop the
>> > GlusterFS service) triggers Contending on the other nodes. Only then is
>> > there a chance that a node goes from Contending to SPM, which does not
>> > always happen. After trying this several times I got a main node to
>> > become SPM, but the master storage remains down. When I select the
>> > master storage, go to Data Center, select the data center and click
>> > Activate, both the master storage and the data center go into the
>> > state Locked for a few seconds and then Inactive again.
>> >
>> > Then I upgraded to oVirt 4 (oVirt Engine Version: 4.0.0.6-1.el7.centos)
>> > and tried everything again, with the same result.
>> >
>> > I searched online and found this mailing list thread, which is very
>> > similar:
>> > http://lists.ovirt.org/pipermail/users/2014-August/026905.html .
>> > Unfortunately the solution was not posted. I also found this:
>> > https://www.mail-archive.com/users@ovirt.org/msg08105.html suggesting
>> > the kernel version is not correct, but I am unsure how to check this.
>> >
>> > In the attachment I added the relevant logs:
>> > - GlusterFS service logs from both main nodes
>> > - /var/log/vdsm/vdsm.log, /var/log/vdsm/supervdsm.log, /var/log/messages,
>> >   /var/log/sanlock.log, /var/log/ovirt-engine/engine.log-20160725.gz and
>> >   /var/log/ovirt-engine/engine.log of the oVirt engine
>> > (I clicked Activate on the data center at 9:45:30 am GMT.)
>> > I may need to send different engine logs; please tell me if so. Any help
>> > would be highly appreciated.
>> >
>> > Kind regards
>> > Robin Vanderveken
>> >
>> > --
>> > Robin Vanderveken
>> > DLT Technologies
>> > Tel. +352 691 412 922
>>
>
>
>
> --
> Robin Vanderveken
> DLT Technologies
> Tel. +352 691 412 922
>



-- 
Robin Vanderveken
DLT Technologies
Tel. +352 691 412 922

-- 
*Disclaimer*

*The information contained in this message is intended for the addressee 
only and may contain confidential and/or privileged information and/or 
information protected by intellectual property rights or other legal rules. 
If you are not the intended recipient, please delete this message and any 
attachment to it and notify the sender; you may not copy or use this 
message or its attachments in any way nor disclose its contents to anyone. 
Emails cannot be guaranteed to be secure or to be error or virus free. No 
liability is accepted by the sender for any loss damage arising in any way 
from this message or its use. *