Hi Simon,
I doubt the system needs tuning from a network perspective.
I guess you can run a few 'screen' sessions, each pinging another system and logging
everything to a file.
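
As a rough sketch of the ping-and-log idea, something like the following could be left running inside a 'screen' session on each host. It assumes the Linux iputils `ping` (the `-c` and `-W` options); the function names and the one-second interval are only illustrative.

```python
import datetime
import subprocess
import time


def ping_once(host, timeout_s=2):
    """Send one ICMP echo via the system `ping` and return (ok, output)."""
    # -c 1: send one packet; -W: reply timeout in seconds (Linux iputils ping).
    proc = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
    )
    return proc.returncode == 0, proc.stdout.strip()


def format_entry(when, host, ok, detail):
    """Build one log line; it keeps only the last line of ping's output."""
    status = "OK" if ok else "FAIL"
    last_line = detail.splitlines()[-1] if detail else ""
    return f"{when.isoformat(timespec='seconds')} {host} {status} {last_line}"


def monitor(host, logfile, interval_s=1.0):
    """Ping `host` until interrupted, appending one line per probe to `logfile`."""
    with open(logfile, "a") as log:
        while True:
            ok, detail = ping_once(host)
            log.write(format_entry(datetime.datetime.now(), host, ok, detail) + "\n")
            log.flush()  # keep the file current even if the host goes down
            time.sleep(interval_s)
```

Calling `monitor()` with a peer's storage-network address and a log path, one session per peer, gives timestamped FAIL lines that can be lined up against the Gluster timeouts.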
Best Regards,
Strahil Nikolov
On Friday, 9 October 2020, 01:05:22 GMT+3, Simon Scott <simon(a)justconnect.ie>
wrote:
Thanks Strahil.
I have found between 1 and 4 Gluster peer 'rpc-clnt-ping timer expired' messages in the
rhev-data-center-mnt-glusterSD-hostname-strg:_pltfm_data01.log on the storage network IP.
Of the 6 Hosts only 1 does not have these timeouts.
Fencing has been disabled, but can you identify which logs are key to identifying the cause,
please?
It's a bonded (bond1) 10GB ovirt-mgmt logical network and Prod VM VLAN interface AND a
bonded (bond2) 10GB Gluster storage network.
Dropped packets are seen incrementing in the vdsm.log, but neither ethtool -S nor the kernel
logs show dropped packets. I am wondering if they are being dropped because the ring
buffers are too small.
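
One way to cross-check the counters, sketched here under assumptions: /proc/net/dev exposes per-interface rx drop, rx fifo and tx drop counters, so two snapshots taken a while apart show which interface (bond1, bond2 or a slave NIC) is actually incrementing. The function names are illustrative, and a rising rx fifo count is only a hint at ring buffer overruns, not proof. The current and maximum ring sizes themselves can be read with `ethtool -g <interface>`.

```python
def parse_net_dev(text):
    """Return {iface: {"rx_drop": n, "rx_fifo": n, "tx_drop": n}} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:  # 8 receive columns followed by 8 transmit columns
            continue
        counters[name.strip()] = {
            "rx_drop": int(fields[3]),
            "rx_fifo": int(fields[4]),
            "tx_drop": int(fields[11]),
        }
    return counters


def drop_deltas(before, after):
    """Per-interface counter increases between two snapshots (non-zero only)."""
    deltas = {}
    for iface, now in after.items():
        prev = before.get(iface, {})
        changed = {k: v - prev.get(k, 0) for k, v in now.items() if v - prev.get(k, 0) > 0}
        if changed:
            deltas[iface] = changed
    return deltas


def read_net_dev(path="/proc/net/dev"):
    """Take one snapshot of the live counters."""
    with open(path) as fh:
        return parse_net_dev(fh.read())
```

Taking `read_net_dev()` before and after one of the Monday or Wednesday windows and passing both results to `drop_deltas()` would show whether the drops land on the storage bond at all.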
Kind Regards
Shimme
________________________________
From: Strahil Nikolov <hunter86_bg(a)yahoo.com>
Sent: Thursday 8 October 2020 20:40
To: users(a)ovirt.org <users(a)ovirt.org>; Simon Scott <simon(a)justconnect.ie>
Subject: Re: [ovirt-users] Gluster volume not responding
> Every Monday and Wednesday morning there are gluster connectivity
> timeouts but all checks of the network and network configs are ok.
Based on this, I draw the following conclusions:
1. The issue is recurring
2. You most probably have a network issue
Have you checked the following:
- Are there any ping timeouts between the FUSE clients and the Gluster nodes?
- Have you tried disabling fencing and checking the logs after the issue recurs?
- Are you sharing the Backup and Prod networks? Is it possible that some backup or other production
load in your environment is "blacking out" your oVirt?
- Have you checked the Gluster cluster's logs for anything meaningful?
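
For the log checks above, a small sketch that buckets ping-timeout messages by weekday and hour may help confirm the Monday/Wednesday pattern. It assumes each Gluster log line starts with a bracketed timestamp such as "[2020-10-05 06:12:34.123456]" and that the timeout lines contain the string "rpc-clnt-ping"; both are assumptions to verify against the actual files, and the timestamps are used as written, without timezone conversion.

```python
import collections
import datetime
import re

# Assumed line prefix: "[YYYY-MM-DD HH:MM:SS" followed by optional fractions.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def timeouts_by_weekday_hour(lines, needle="rpc-clnt-ping"):
    """Count lines containing `needle`, keyed by (weekday name, hour)."""
    counts = collections.Counter()
    for line in lines:
        if needle not in line:
            continue
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without the assumed timestamp prefix
        when = datetime.datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        counts[(WEEKDAYS[when.weekday()], when.hour)] += 1
    return counts
```

Passing an open log file (for example the FUSE mount log on each host) to `timeouts_by_weekday_hour()` and printing `counts.most_common()` shows when the timeouts cluster, which can then be compared with backup or other scheduled jobs.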
Best Regards,
Strahil Nikolov