Could you share the gluster mount and brick logs? You'll find them under /var/log/glusterfs.
Also, which version of gluster are you using?
And could you share the output of `gluster volume info <ENGINE_VOLNAME>`?

-Krutika

On Thu, Jun 21, 2018 at 9:50 AM, Sahina Bose <sabose@redhat.com> wrote:


On Wed, Jun 20, 2018 at 11:33 PM, Hanson Turner <hanson@andrewswireless.net> wrote:

Hi Benny,

Who should I be reaching out to for help with a gluster-based hosted engine corruption?


Krutika, could you help?



--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirtnode1.abcxyzdomains.net
Host ID                            : 1
Engine status                      : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 92254a68
local_conf_timestamp               : 115910
Host timestamp                     : 115910
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=115910 (Mon Jun 18 09:43:20 2018)
    host-id=1
    score=3400
    vm_conf_refresh_time=115910 (Mon Jun 18 09:43:20 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False
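
For readers skimming the status above: the "Engine status" field is a small JSON object, and the combination of "vm": "up" with "health": "bad" is what indicates a VM that is running but whose engine is not answering. The following is only a minimal sketch of how that field could be interpreted. The function name `summarize_engine_status` is hypothetical, and treating "good" as the healthy value is an assumption, not something stated in this thread.

```python
import json

# Value of the "Engine status" field, copied from the Host 1 status output above.
engine_status_raw = (
    '{"reason": "failed liveliness check", "health": "bad", '
    '"vm": "up", "detail": "Up"}'
)


def summarize_engine_status(raw):
    """Return a one-line summary of an 'Engine status' JSON string.

    Hypothetical helper for illustration only.
    """
    status = json.loads(raw)
    vm_running = status.get("vm") == "up"
    # Assumption: a healthy engine reports "health": "good".
    healthy = status.get("health") == "good"
    reason = status.get("reason", "unknown reason")

    if vm_running and not healthy:
        return "VM is running but the engine is not responding: " + reason
    if vm_running:
        return "VM is running and the engine is healthy"
    return "VM is not running: " + reason


summary = summarize_engine_status(engine_status_raw)
print(summary)
```

Here the VM process itself is up, so the problem lies inside the guest (it never gets past the boot screen), which matches the VNC observation below.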


When I VNC into my HE, all I get is:
Probing EDD (edd=off to disable)... ok


So, that's why it's failing the liveliness check... I cannot get the screen on the HE to change short of Ctrl-Alt-Del, which will reboot the HE.
I do have backups for the HE that are/were run on a nightly basis.

If the cluster were left alone, the HE VM would bounce from machine to machine trying to boot. This is why the cluster is in maintenance mode.
One of the nodes was down for a period of time and was then brought back. Sometime during the night, which is when the automated backup kicks in, the HE started bouncing around. I got nearly 1000 emails.

This seems to be the same error (but may not be the same cause) as listed here:
https://bugzilla.redhat.com/show_bug.cgi?id=1569827

Thanks,

Hanson


_______________________________________________
Users mailing list -- users@ovirt.org
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/3NLA2URX3KN44FGFUVV4N5EJBPICABHH/