On Thu, Feb 21, 2019 at 12:42 PM Jason P. Thomas <jthomasp(a)gmualumni.org>
wrote:
On 2/20/19 5:33 PM, Darrell Budic wrote:
I was just helping Tristam on #ovirt with a similar problem. We found that
his two upgraded nodes were running multiple glusterfsd processes per brick
(but not for all bricks). His volume and brick files in /var/lib/glusterd
looked normal, but starting glusterd would often spawn extra glusterfsd
processes per brick, seemingly at random. Gluster bug? Maybe related to
https://bugzilla.redhat.com/show_bug.cgi?id=1651246,
but I’m helping debug this one second hand. Possibly related to the brick
crashes? We wound up stopping glusterd, killing off all the glusterfsd
processes, restarting glusterd, and repeating until it spawned only one
glusterfsd per brick. We did that on each updated server, then restarted
glusterd on the not-yet-updated server to get it talking to the right bricks.
That seemed to get him to a mostly stable gluster environment, but he’s still
seeing 1-2 files listed as needing healing on the upgraded bricks (but not on
the 3.12 brick), mainly DIRECT_IO_TEST and one of the dom/ids files, but he
can probably update that. We did manage to get his engine going again, and
we're waiting to see if he’s stable now.
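For anyone who wants to watch the heal backlog described above, here is a minimal Python sketch that pulls the per-brick entry counts out of `gluster volume heal <volume> info`. It assumes the usual text layout of that command (a `Brick host:/path` line followed later by `Number of entries: N`, with `-` shown when a brick is not connected). The function names are made up for illustration and are not part of any gluster tooling.

```python
import subprocess


def parse_heal_info(text):
    """Return {brick: entry_count} parsed from `gluster volume heal <vol> info` text.

    The count is None when the brick reports "-" (assumed to mean not connected).
    """
    counts = {}
    brick = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        elif line.startswith("Number of entries:") and brick is not None:
            value = line.split(":", 1)[1].strip()
            counts[brick] = int(value) if value.isdigit() else None
            brick = None
    return counts


def heal_info(volume):
    """Run the gluster CLI for one volume and parse its output (needs root on a gluster node)."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        check=True, capture_output=True, text=True,
    )
    return parse_heal_info(result.stdout)
```

Calling `heal_info("engine")` every few minutes and logging the result should show whether the counts are actually trending down or just bouncing around.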
Anyway, figured it was worth posting about so people could check for
multiple brick processes (glusterfsd) if they hit this stability issue as
well, maybe find common ground.
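A rough way to run that check, sketched in Python: list the running glusterfsd processes and group them by brick. This assumes each glusterfsd command line carries a `--brick-name <path>` argument and that brick paths contain no spaces. The function names are hypothetical.

```python
import subprocess
from collections import defaultdict


def bricks_with_extra_processes(ps_lines):
    """Return {brick_path: [pids]} for bricks served by more than one glusterfsd.

    Each input line is expected to look like "<pid> <command line>".
    """
    pids_by_brick = defaultdict(list)
    for line in ps_lines:
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        pid, args = int(fields[0]), fields[1:]
        if "--brick-name" in args:
            idx = args.index("--brick-name")
            if idx + 1 < len(args):
                pids_by_brick[args[idx + 1]].append(pid)
    return {brick: pids for brick, pids in pids_by_brick.items() if len(pids) > 1}


def running_glusterfsd():
    """Return "<pid> <args>" lines for every glusterfsd process on this host."""
    # ps exits non-zero when nothing matches, so the return code is not checked.
    result = subprocess.run(
        ["ps", "-ww", "-C", "glusterfsd", "-o", "pid=", "-o", "args="],
        capture_output=True, text=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for brick, pids in bricks_with_extra_processes(running_glusterfsd()).items():
        print(f"{brick}: {len(pids)} glusterfsd processes, pids {pids}")
```

An empty result on every node means one glusterfsd per brick. Anything printed is a brick worth comparing against the pid shown in vol status.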
Note: we also encountered
https://bugzilla.redhat.com/show_bug.cgi?id=1348434
while trying to get his engine back up; restarting libvirtd let us get it
going again. That may have been unnecessary if he’d been able to complete the
upgrade of his third node, but he got stuck before then, so...
-Darrell
Stable is a relative term. The unsynced entry count for each of my 4
volumes changes drastically (except for the engine volume, which pretty
much bounces between 1 and 4). The cluster has been "healing" for
18 hours or so, and only the unupgraded HC node has healed bricks. I did
have the problem that some files/directories were owned by root:root.
Those VMs did not boot until I changed ownership to 36:36. Even after 18
hours, there are anywhere from 20 to 386 entries in vol heal info for my 3
non-engine bricks. Overnight, one brick on one volume went down on one HC
node. When I bounced glusterd, it brought up a new glusterfsd process for
that brick. I killed the old one, and now vol status reports the right pid
on each of the nodes. This is quite the debacle. If I can provide any info
that might help get this moving in the right direction, let me know.
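In case others hit the same ownership problem, here is a small Python sketch that lists everything under a directory not owned by 36:36 (conventionally vdsm:kvm on oVirt hosts). The function name is hypothetical, and the directory to scan (for example, wherever the storage domain is mounted) is left to the caller. It only reports and changes nothing.

```python
import os

VDSM_UID = 36  # ownership the VM images needed, per the report above
KVM_GID = 36


def wrongly_owned(root, uid=VDSM_UID, gid=KVM_GID):
    """Return sorted (path, uid, gid) tuples for entries under root with other ownership.

    The root directory itself is not checked; symlinks are not followed.
    """
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_uid != uid or st.st_gid != gid:
                bad.append((path, st.st_uid, st.st_gid))
    return sorted(bad)
```

Entries that show up as `(path, 0, 0)` are the root:root ones. Fixing them is then a matter of a chown to 36:36 on those paths.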
Jason aka Tristam
On Feb 14, 2019, at 1:12 AM, Sahina Bose <sabose(a)redhat.com> wrote:
On Thu, Feb 14, 2019 at 2:39 AM Ron Jerome <ronjero(a)gmail.com> wrote:
Can you be more specific? What things did you see, and did you report bugs?
I've got this one:
https://bugzilla.redhat.com/show_bug.cgi?id=1649054
and this one:
https://bugzilla.redhat.com/show_bug.cgi?id=1651246
and I've got bricks randomly going offline and getting out of sync with
the others, at which point I've had to manually stop and start the volume to
get things back in sync.
Thanks for reporting these. Will follow up on the bugs to ensure
they're addressed.
Regarding bricks going offline: are the brick processes crashing? Can
you provide the glusterd and brick logs? Or is this to do with
ovirt-engine and brick status not being in sync?
_______________________________________________
Users mailing list -- users(a)ovirt.org
To unsubscribe send an email to users-leave(a)ovirt.org
Privacy Statement:
https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3RVMLCRK4BW...
--
GREG SHEREMETA
SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX
Red Hat NA