I've been having problems with my gluster Engine volume recently as well,
after updating to the latest stable 4.3.3. For the past few days a random
brick in the Engine volume has been going down, and I have to force-start it
to get it working again. Right now I'm seeing unsynced entries, and one node
reports "transport endpoint not connected" even though peer status is fine
and all other volumes are working normally.
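For reference, the check-and-recover sequence looks roughly like this (a
sketch only; hostnames illustrative, and the volume name "engine" taken from
the heal output further down):

[root@node01 ~]# gluster peer status                # every peer should show Connected
[root@node01 ~]# gluster volume status engine       # look for bricks with Online "N"
[root@node01 ~]# gluster volume start engine force  # restarts only the downed brick processes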
On Mon, May 13, 2019 at 12:14 PM Darrell Budic <budic(a)onholyground.com>
wrote:
I see this sometimes after rebooting a server, and it usually stops
happening within a few hours; I’ve never tracked it down further. I don’t
know for sure, but I assume it’s related to healing and goes away once
everything syncs up.
Occasionally it turns out to be a communications problem between servers
(usually an update to something breaks my firewall), so whenever I see it I
check my peer status and make sure all servers are talking to each other.
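A quick connectivity check, assuming firewalld as shipped on oVirt nodes
(the "glusterfs" firewalld service is stock; adjust if your setup differs):

[root@node01 ~]# gluster peer status                               # every peer: State: Peer in Cluster (Connected)
[root@node01 ~]# firewall-cmd --list-services                      # should include glusterfs
[root@node01 ~]# firewall-cmd --permanent --add-service=glusterfs  # re-add it if an update dropped it
[root@node01 ~]# firewall-cmd --reload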
> On May 13, 2019, at 4:13 AM, Andreas Elvers <andreas.elvers+ovirtforum(a)solutions.work> wrote:
>
> I restored my engine to a gluster volume named :/engine on a three-node
> hyperconverged oVirt 4.3.3.1 cluster. Before restoring I checked the status
> of the volumes: they were clean, no heal entries, all peers connected, and
> "gluster volume status" looked good. The restore itself went well and the
> engine is up, but the engine gluster volume now shows heal entries on
> node02 and node03. The engine was installed to node01. I still have to
> deploy the engine to the other two hosts to reach full HA, but I expect
> maintenance is not possible until the volume is healed.
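>
> The pre-restore checks were roughly (a minimal sketch of the standard CLI):
>
> [root@node01 ~]# gluster peer status              # all peers connected
> [root@node01 ~]# gluster volume status engine     # all bricks online
> [root@node01 ~]# gluster volume heal engine info  # Number of entries: 0 on every brick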
>
> I tried "gluster volume heal engine" also with added "full". The
heal
entries will disappear for a few seconds and then /dom_md/ids will pop up
again. The __DIRECT_IO_TEST__ will join later. The split-brain info has no
entries. Is this some kind of hidden split brain? Maybe there is data on
node01 brick which got not synced to the other two nodes? I can only
speculate. Gluster docs say: this should heal. But it doesn't. I have two
other volumes. Those are fine. One of them containing 3 VMs that are
running. I also tried to shut down the engine, so no-one was using the
volume. Then heal. Same effect. Those two files will always show up. But
none other. Heal can always be started successfully from any of the
participating nodes.
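>
> For reference, the commands tried (all stock gluster CLI):
>
> [root@node01 ~]# gluster volume heal engine                   # index heal
> [root@node01 ~]# gluster volume heal engine full              # full crawl
> [root@node01 ~]# gluster volume heal engine info split-brain  # shows no entries here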
>
> Reset the volume bricks one by one and cross fingers?
>
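> If it comes to that, the reset-brick cycle would be something like this (a
> sketch only, shown for node02's brick from the heal output below; let the
> heal finish before touching the next brick):
>
> [root@node02 ~]# gluster volume reset-brick engine \
>     node02.infra.solutions.work:/gluster_bricks/engine/engine start
> (optionally clear the brick directory so it resyncs in full)
> [root@node02 ~]# gluster volume reset-brick engine \
>     node02.infra.solutions.work:/gluster_bricks/engine/engine \
>     node02.infra.solutions.work:/gluster_bricks/engine/engine commit force
> [root@node02 ~]# gluster volume heal engine info  # wait for 0 entries everywhere
>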
> [root@node03 ~]# gluster volume heal engine info
> Brick node01.infra.solutions.work:/gluster_bricks/engine/engine
> Status: Connected
> Number of entries: 0
>
> Brick node02.infra.solutions.work:/gluster_bricks/engine/engine
> /9f4d5ae9-e01d-4b73-8b6d-e349279e9782/dom_md/ids
> /__DIRECT_IO_TEST__
> Status: Connected
> Number of entries: 2
>
> Brick node03.infra.solutions.work:/gluster_bricks/engine/engine
> /9f4d5ae9-e01d-4b73-8b6d-e349279e9782/dom_md/ids
> /__DIRECT_IO_TEST__
> Status: Connected
> Number of entries: 2
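>
> To check whether the node01 brick really holds unsynced data for those two
> files, the AFR xattrs can be compared on each node's own brick (path taken
> from the output above):
>
> [root@node01 ~]# getfattr -d -m . -e hex \
>     /gluster_bricks/engine/engine/9f4d5ae9-e01d-4b73-8b6d-e349279e9782/dom_md/ids
>
> Non-zero trusted.afr.engine-client-* values point at pending heals against
> the corresponding brick.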