Re: Gluster volume not responding

9 Oct 2020

Hi Simon,

I doubt the system needs tuning from a network perspective.

I guess you can run a few 'screen' sessions, each pinging another system and logging everything to a file.
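A rough sketch of such a ping logger, in case it helps. It assumes the Linux iputils `ping` (options `-c` and `-W`); the function names are just placeholders I made up, and the host and log path you pass in are your own.

```python
import datetime
import subprocess
import time


def ping_once(host, timeout_s=1, runner=subprocess.run):
    """Send one ICMP echo with the system `ping`; return True if it was answered."""
    # -c 1: send a single packet; -W: seconds to wait for the reply (iputils ping).
    result = runner(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def log_ping(host, logfile, runner=subprocess.run):
    """Append one timestamped OK/FAIL line for `host` to `logfile`."""
    status = "OK" if ping_once(host, runner=runner) else "FAIL"
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    line = f"{stamp} {host} {status}\n"
    with open(logfile, "a") as fh:
        fh.write(line)
    return line


def monitor(host, logfile, interval_s=1.0, iterations=None, runner=subprocess.run):
    """Ping `host` repeatedly; iterations=None means run until interrupted."""
    done = 0
    while iterations is None or done < iterations:
        log_ping(host, logfile, runner=runner)
        done += 1
        time.sleep(interval_s)
```

Inside each screen session you would call something like monitor("<storage IP of a peer>", "/some/path/ping-peer.log") and later look for FAIL lines around the time of the next incident.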

Best Regards,
Strahil Nikolov

On Friday, 9 October 2020, 01:05:22 GMT+3, Simon Scott <simon@justconnect.ie> wrote:

Thanks Strahil.

I have found between 1 and 4 Gluster peer 'rpc-clnt-ping timer expired' messages in the rhev-data-center-mnt-glusterSD-hostname-strg:_pltfm_data01.log on the storage network IP. Of the 6 hosts, only 1 does not have these timeouts.
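To see whether these timeouts line up with particular days and hours, something like the sketch below could tally them per host. It assumes each log line starts with a bracketed timestamp of the form [YYYY-MM-DD HH:MM:SS...] and that the relevant lines contain the string rpc_clnt_ping_timer_expired; both are assumptions about the log format, so adjust the pattern and `needle` to what the file actually shows.

```python
import collections
import datetime
import re

# Assumed prefix of each log line: "[YYYY-MM-DD HH:MM:SS" (fractional seconds ignored).
TIMESTAMP_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def count_by_weekday_hour(lines, needle="rpc_clnt_ping_timer_expired"):
    """Count lines containing `needle`, grouped by (weekday, hour) of their timestamp."""
    counts = collections.Counter()
    for line in lines:
        if needle not in line:
            continue
        match = TIMESTAMP_RE.match(line)
        if match is None:
            continue  # skip lines that do not start with the assumed timestamp
        ts = datetime.datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        counts[(DAYS[ts.weekday()], ts.hour)] += 1
    return counts


def count_in_file(path, needle="rpc_clnt_ping_timer_expired"):
    """Convenience wrapper: run the tally over one log file."""
    with open(path, errors="replace") as fh:
        return count_by_weekday_hour(fh, needle)
```

Running count_in_file() on the mount log of each host and comparing the results should show quickly whether the timeouts cluster in the same hour on every host.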

Fencing has been disabled, but can you tell me which logs are key to identifying the cause, please?

It's a bonded (bond1) 10GB ovirt-mgmt logical network and Prod VM VLAN interface AND a bonded (bond2) 10GB Gluster storage network. 

Dropped packets are seen incrementing in the vdsm.log, but neither ethtool -S nor the kernel logs show dropped packets. I am wondering if they are being dropped because the ring buffers are too small.
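One way to cross-check the vdsm.log numbers is to read the kernel's per-interface counters under /sys/class/net/<iface>/statistics directly and compare two snapshots. The sketch below does that; the function names are made up for illustration, and which counter a given driver increments when a ring overflows varies, so treat any interpretation as a hint only. The current and maximum ring sizes themselves can be shown with ethtool -g <iface>.

```python
import pathlib

# Standard per-interface counters exposed by the Linux kernel in sysfs.
COUNTERS = ("rx_dropped", "rx_missed_errors", "rx_fifo_errors", "tx_dropped")


def read_counters(iface, sysfs_root="/sys/class/net"):
    """Return {counter_name: value} for `iface`, skipping counters that are absent."""
    stats_dir = pathlib.Path(sysfs_root) / iface / "statistics"
    values = {}
    for name in COUNTERS:
        path = stats_dir / name
        if path.exists():
            values[name] = int(path.read_text().strip())
    return values


def diff_counters(before, after):
    """Return only the counters that changed between two snapshots, with their delta."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }
```

Taking a snapshot with read_counters("bond2") (and the same for each slave interface of the bond), waiting a while, then passing both snapshots to diff_counters() shows which counters are actually moving.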

Kind Regards

Shimme

________________________________ 
From: Strahil Nikolov <hunter86_bg@yahoo.com>
Sent: Thursday 8 October 2020 20:40
To: users@ovirt.org <users@ovirt.org>; Simon Scott <simon@justconnect.ie>
Subject: Re: [ovirt-users] Gluster volume not responding 

...
>Every Monday and Wednesday morning there are gluster connectivity timeouts but all checks of the network and network configs are ok.
Based on this, I draw the following conclusions:
1. The issue is recurring
2. You most probably have a network issue

Have you checked the following:
- Are there any ping timeouts between the fuse clients and the gluster nodes?
- Have you tried disabling fencing and checking the logs after the issue recurs?
- Are you sharing the Backup and Prod networks? Is it possible that some backup or other production load in your environment "blacks out" your oVirt?
- Have you checked the gluster cluster's logs for anything meaningful?

Best Regards,
Strahil Nikolov
