I was just helping Tristam on #ovirt with a similar problem. We found that his two
upgraded nodes were running multiple glusterfsd processes per brick (but not for all bricks).
His volume and brick files in /var/lib/glusterd looked normal, but starting glusterd
would often spawn extra glusterfsd processes per brick, seemingly at random. Gluster bug?
Maybe related to https://bugzilla.redhat.com/show_bug.cgi?id=1651246, but I'm helping
debug this one second hand… Possibly related to the brick crashes?

We wound up stopping glusterd, killing off all the glusterfsd processes, restarting
glusterd, and repeating until it spawned only one glusterfsd per brick. We did that on
each updated server, then restarted glusterd on the not-yet-updated server to get it
talking to the right bricks. That seemed to get us to a mostly stable gluster
environment, but he's still seeing 1-2 files listed as needing healing on the upgraded
bricks (but not on the 3.12 brick), mainly the DIRECT_IO_TEST file and one of the dom/ids
files; he can probably give an update on that. We did manage to get his engine going
again, and are now waiting to see if it stays stable.
Anyway, I figured it was worth posting about so that anyone who hits this stability
issue can check for multiple brick processes (glusterfsd), and maybe we can find some
common ground.
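For anyone who wants to automate that check, here is a minimal, read-only sketch (Linux only). It assumes each brick process shows up as glusterfsd with a `--brick-name <path>` argument on its command line, which is how glusterd normally launches bricks, but verify against `ps -ef | grep glusterfsd` on your own hosts. The function names are hypothetical helpers, not part of any Gluster tool, and the script only reports duplicates; it does not kill anything.

```python
import os
from collections import defaultdict


def brick_name(argv):
    """Return the brick path given via --brick-name in a glusterfsd argv, or None."""
    for i, arg in enumerate(argv):
        if arg == "--brick-name" and i + 1 < len(argv):
            return argv[i + 1]
        if arg.startswith("--brick-name="):
            return arg.split("=", 1)[1]
    return None


def group_by_brick(procs):
    """Group (pid, argv) pairs of glusterfsd processes into {brick_path: [pids]}."""
    groups = defaultdict(list)
    for pid, argv in procs:
        # Skip kernel threads (empty cmdline) and anything that isn't glusterfsd.
        if not argv or os.path.basename(argv[0]) != "glusterfsd":
            continue
        brick = brick_name(argv)
        if brick is not None:
            groups[brick].append(pid)
    return dict(groups)


def duplicates(groups):
    """Keep only bricks served by more than one glusterfsd process."""
    return {brick: sorted(pids) for brick, pids in groups.items() if len(pids) > 1}


def read_proc():
    """Yield (pid, argv) for every readable process under /proc."""
    if not os.path.isdir("/proc"):
        return
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            # The process exited (or is unreadable) between listdir and open.
            continue
        argv = [part.decode(errors="replace") for part in raw.split(b"\0") if part]
        yield int(entry), argv


if __name__ == "__main__":
    dupes = duplicates(group_by_brick(read_proc()))
    if not dupes:
        print("no brick has more than one glusterfsd process")
    for brick, pids in dupes.items():
        print(f"{brick}: {len(pids)} glusterfsd processes, pids {pids}")
```

The PIDs it reports can be compared with the single PID per brick that `gluster volume status` shows, to see which process glusterd actually considers live.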
Note: we also encountered https://bugzilla.redhat.com/show_bug.cgi?id=1348434 while
trying to get his engine back up; restarting libvirtd let us get it going again. That
may have been unnecessary if he'd been able to complete the upgrade of his third node,
but he got stuck before then, so...
-Darrell
On Feb 14, 2019, at 1:12 AM, Sahina Bose <sabose(a)redhat.com> wrote:
On Thu, Feb 14, 2019 at 2:39 AM Ron Jerome <ronjero(a)gmail.com> wrote:
>
>
>>
>> Can you be more specific? What things did you see, and did you report bugs?
>
> I've got this one: https://bugzilla.redhat.com/show_bug.cgi?id=1649054
> and this one: https://bugzilla.redhat.com/show_bug.cgi?id=1651246
> and I've got bricks randomly going offline and getting out of sync with the
> others at which point I've had to manually stop and start the volume to get
> things back in sync.
Thanks for reporting these. I'll follow up on the bugs to ensure
they're addressed.
Regarding bricks going offline: are the brick processes crashing? Can
you provide logs of glusterd and the bricks? Or is this to do with
ovirt-engine and the brick status not being in sync?
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/3RVMLCRK4BW...