On Fri, Nov 9, 2018 at 3:42 AM Dev Ops <sipandbite(a)hotmail.com> wrote:
The switches above our environment had some VPC issues and the port channels went
offline. The ports that had issues belonged to 2 of the gfs nodes in our environment. We
have 3 storage nodes total with the 3rd being the arbiter. I wound up rebooting the first
2 nodes and everything came back happy. After a few hours I noticed that the storage was
up but complaining about being out of sync and needing healing. Within the hour I noticed
a VM had paused itself due to storage issues. This is a small environment, for now, with
only 30 VMs. I am new to oVirt so this is uncharted territory for me. I am tailing
some logs and things look sort of normal, and Google is sending me down a wormhole.
When I run "gluster volume heal cps-vms-gfs info", the number of entries it reports
seems to change pretty regularly. Logs are showing lots of entries like this:
[2018-11-08 21:55:05.996675] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-cps-vms-gfs-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2018-11-08 21:55:05.997693] I [MSGID: 108002] [afr-common.c:5312:afr_notify] 0-cps-vms-gfs-replicate-0: Client-quorum is met
[2018-11-08 21:55:05.997717] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-cps-vms-gfs-client-1: Server lk version = 1
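All three entries quoted above carry the level letter "I" (informational). When tailing a busy log it can help to filter down to warnings and errors only. The following is a minimal sketch, not an official tool: the regular expression is inferred from the three entries shown here, it assumes each entry sits on a single line (unlike the wrapped copy in this mail), and the names LOG_RE, LogEntry, parse_line and above_info are made up for illustration.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Layout inferred from the quoted entries (assumption):
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<source>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+"
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    msgid: int
    source: str
    domain: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None if the line does not fit the assumed layout."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["ts"], m["level"], int(m["msgid"]),
                    m["source"], m["domain"], m["message"])


def above_info(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield only entries logged at W (warning), E (error) or C (critical)."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level in {"W", "E", "C"}:
            yield entry
```

Feeding an open log file to above_info() yields only the entries worth a closer look. Lines that do not match the assumed layout are skipped silently, so this is a triage aid rather than a complete parser.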
I guess I am curious what else I should be looking for. Is this just taking forever to
heal? Is there something else I can run, or should do, to verify things are actually
getting better? I ran an actual heal command and it cleared everything for a few seconds,
and then the entries started to populate again when I ran the info command.
[root@cps-vms-gfs01 glusterfs]# gluster volume status
Status of volume: cps-vms-gfs
Gluster process                                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.8.255.1:/gluster/cps-vms-gfs01/brick   49152     0          Y       4054
Brick 10.8.255.2:/gluster/cps-vms-gfs02/brick   49152     0          Y       4144
Brick 10.8.255.3:/gluster/cps-vms-gfs03/brick   49152     0          Y       4294
Self-heal Daemon on localhost                   N/A       N/A        Y       4279
Self-heal Daemon on cps-vms-gfs02.cisco.com     N/A       N/A        Y       5185
Self-heal Daemon on 10.196.152.145              N/A       N/A        Y       50948

Task Status of Volume cps-vms-gfs
------------------------------------------------------------------------------
There are no active volume tasks
I am running oVirt 4.2.5 and Gluster 3.12.11.
Can you provide the output of "gluster volume heal cps-vms-gfs info", along with
the logs from /var/log/glusterfs/glfsheal-cps-vms-gfs.log and the
brick logs from /var/log/glusterfs/bricks for this volume?
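To judge whether the heal backlog is really shrinking, it may help to record the per-brick entry count from the heal info output at intervals and compare the samples. The following is a minimal sketch under stated assumptions: it assumes the output lists each brick on a "Brick <host:path>" line followed later by a "Number of entries: <n>" line, and that a non-numeric value means the count is unavailable. The function names pending_entries and collect are made up for illustration.

```python
import re
import subprocess
from typing import Dict, Optional

# Assumed layout of "gluster volume heal <volume> info" output:
#   Brick <host:path>
#   <zero or more pending paths or gfids>
#   Status: ...
#   Number of entries: <n>
BRICK_RE = re.compile(r"^Brick\s+(?P<brick>\S+)\s*$")
COUNT_RE = re.compile(r"^Number of entries:\s*(?P<count>\S+)\s*$")


def pending_entries(heal_info_output: str) -> Dict[str, Optional[int]]:
    """Map each brick to its pending-heal count (None if not numeric)."""
    counts: Dict[str, Optional[int]] = {}
    current: Optional[str] = None
    for raw in heal_info_output.splitlines():
        line = raw.strip()
        m = BRICK_RE.match(line)
        if m:
            current = m["brick"]
            continue
        m = COUNT_RE.match(line)
        if m and current is not None:
            value = m["count"]
            counts[current] = int(value) if value.isdigit() else None
            current = None
    return counts


def collect(volume: str) -> Dict[str, Optional[int]]:
    """Run the heal info command from this thread and summarize its output."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        capture_output=True, text=True, check=True,
    )
    return pending_entries(result.stdout)
```

Calling collect("cps-vms-gfs") every few minutes and logging the result gives a trend per brick. Counts that fluctuate around small values on a volume with running VMs are a different picture from counts that stay high or keep growing.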
>
> Thanks!