Re: [ovirt-users] Ovirt host activation and lvm looping with high CPU load trying to mount iSCSI storage

23 Jan 2017

Hi,

So, on Thursday the worst-case scenario occurred. All hosts in the 4-node cluster we've had these issues with went non-responsive and started looping through various states. Spread across these hosts are the 45 guests, each with its own storage domain (45 domains in total). As we have a responsibility to the end users, we had to make the decision to stop trying to bring this cluster online and scrap it, based on the information you've provided. We've now split the cluster in half and created two clusters with the guests spread between them (around 20 on each). I've also started presenting a few 2 TB storage domains and am migrating the guest disks from their individual storage domains onto these grouped, shared domains.

This immediately halves the number of storage domains on each cluster, and the number will drop further as we consolidate the storage. We obviously still have the same number of guest disks, so we will still have a large number of logical volumes; we are just reducing the number of physical volumes presented to each host (and the number of storage domains within oVirt). We'll have to see whether that improves things.

Thanks for your assistance and focus on the problem, and I'm glad we helped squash at least one bug. I would have liked to actually get to the bottom of the problem with that specific cluster, but events took a turn for the worse and forced our hand.

At the moment both clusters are behaving, but it's early days yet. We haven't changed any of the iSCSI settings on the new clusters, but we have kept the modified monitor.py.

Regards,
Mark

Mark Greenall