Brick                : Brick bdtpltfmovt01-strg:/gluster_bricks/pltfm_data01/pltfm_data01
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 24372
File System          : xfs
Device               : /dev/mapper/gluster_vg_sdb-gluster_lv_pltfm_data01
Mount Options        : rw,seclabel,noatime,nodiratime,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 552.0GB
Total Disk Space     : 1.5TB
Inode Count          : 157286400

Brick                : Brick bdtpltfmovt02-strg:/gluster_bricks/pltfm_data01/pltfm_data01
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 24485
File System          : xfs
Device               : /dev/mapper/gluster_vg_sdb-gluster_lv_pltfm_data01
Mount Options        : rw,seclabel,noatime,nodiratime,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 552.0GB
Total Disk Space     : 1.5TB
Inode Count          : 157286400

Brick                : Brick bdtpltfmovt03-strg:/gluster_bricks/pltfm_data01/pltfm_data01
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 24988
File System          : xfs
Device               : /dev/mapper/gluster_vg_sdb-gluster_lv_pltfm_data01
Mount Options        : rw,seclabel,noatime,nodiratime,attr2,inode64,sunit=512,swidth=512,noquota
Inode Size           : 512
Disk Space Free      : 552.0GB
Total Disk Space     : 1.5TB
Inode Count          : 157286400
Free Inodes          : 157245890
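For reference, per-brick detail like the above comes from the volume status command; a minimal sketch, assuming the volume is named pltfm_data01 (taken from the brick path):

# volume name assumed from the brick path above
gluster volume status pltfm_data01 detail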
On 11 Oct 2020, at 19:18, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi Simon,
Usually it is the network, but you need real-world data. I would open screen sessions and run ping continuously. Something like this:
while true; do echo -n "$(date) "; timeout -s 9 1 ping -c 1 ovirt2 | grep icmp_seq; sleep 1; done | tee -a /tmp/icmp_log
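To keep it running unattended, a minimal sketch of launching the same loop in a detached screen session (the session name ping_ovirt2 is just a placeholder):

# -dmS starts a detached screen session named ping_ovirt2 running the ping loop
screen -dmS ping_ovirt2 bash -c 'while true; do echo -n "$(date) "; timeout -s 9 1 ping -c 1 ovirt2 | grep icmp_seq; sleep 1; done | tee -a /tmp/icmp_log'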
Are all systems in the same network?
What about DNS resolution? Do you have entries in /etc/hosts?
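A quick way to verify that every node resolves its peers consistently (the hostname ovirt2 is only an example):

# getent consults /etc/hosts first (per nsswitch.conf); run this on each host and compare
getent hosts ovirt2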
Best Regards,
Strahil Nikolov
On Sunday, 11 October 2020 at 11:54:47 GMT+3, Simon Scott <simon@justconnect.ie> wrote:
Thanks Strahil.
I have found between 1 and 4 Gluster peer 'rpc-clnt-ping timer expired' messages in the rhev-data-center-mnt-glusterSD-hostname-strg:_pltfm_data01.log, all on the storage network IP. Of the 6 hosts, only 1 does not show these timeouts.
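For reference, I counted them with something like this (the log lives under /var/log/glusterfs/, globbed here since the exact file name embeds the host and volume):

# count ping-timer expiries in the fuse mount log on each host
grep -c 'rpc-clnt-ping timer expired' /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*_pltfm_data01.log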
Fencing has been disabled, but can you identify which logs are key to identifying the cause, please?
It's a bonded (bond1) 10GbE interface carrying the ovirt-mgmt logical network and the Prod VM VLAN, plus a separate bonded (bond2) 10GbE Gluster storage network.
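Bond health can be eyeballed directly from the kernel (bond1 shown; the same applies to bond2):

# shows bonding mode, active slaves, link state and link-failure counts
cat /proc/net/bonding/bond1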
Dropped packets are seen incrementing in vdsm.log, but neither ethtool -S nor the kernel logs show dropped packets. I am wondering if they are being dropped because the ring buffers are too small.
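If it is the rings, a minimal sketch of checking and enlarging them (the interface name em1 is a placeholder; the usable maximums are NIC-dependent):

# show current vs. hardware-maximum ring sizes
ethtool -g em1
# grow the rx/tx rings towards the reported maximums (values here are only an example)
ethtool -G em1 rx 4096 tx 4096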
Kind Regards
Shimme
________________________________
From: Strahil Nikolov <hunter86_bg@yahoo.com>
Sent: Thursday 8 October 2020 20:40
To: users@ovirt.org <users@ovirt.org>; Simon Scott <simon@justconnect.ie>
Subject: Re: [ovirt-users] Gluster volume not responding
> Every Monday and Wednesday morning there are gluster connectivity timeouts, but all checks of the network and network configs are ok.
Based on this, I draw the following conclusions:
1. The issue is recurring
2. You most probably have a network issue
Have you checked the following:
- Are there any ping timeouts between the FUSE clients and the Gluster nodes?
- Have you tried disabling fencing and checking the logs after the issue recurs?
- Are you sharing the backup and Prod networks? Is it possible that a backup or some other production load in your environment "blacks out" your oVirt?
- Have you checked the Gluster cluster's logs for anything meaningful? (See the sketch after this list.)
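As a starting point, a rough sketch of scanning the Gluster logs for the usual suspects (default log location assumed):

# glusterd, brick and fuse-client logs all live under /var/log/glusterfs by default
grep -riE 'disconnect|timed out|ping timer expired' /var/log/glusterfs/ | tail -n 50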
Best Regards,
Strahil Nikolov
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/U527TGUQR6RV7Z426NWMO3K4OXQJABCM/