Quick question about oVirt 3.6 and the vdsm log in DEBUG mode (apparently by default??)

We're running oVirt 3.6 on 9 Dell Blades, but with just two fairly fat fabrics: one for LAN stuff (ovirtmgmt) and one for iSCSI to the storage domains.

15 VM Storage Domains
iSCSI has 4 paths going through a 40Gbit i/o blade to switch
115 VMs or thereabouts
9 VLANs, sharing an i/o blade with ovirtmgmt, 40Gbit to switch
500+ virtual disks

What we are seeing more and more is that if we do an operation like expose a new LUN and configure a new storage domain, all of the hypervisors go "red triangle" and "Connecting..." and it takes a very long time (all day) to straighten out.

My guess is that there's too much to look at vdsm-wise, so the engine waits a shorter period of time for a completed response than what vdsm is going to give us, and it just cycles over and over until it happens to work.

I'm thinking that vdsm having DEBUG enabled isn't helping the latency, but as far as I know it came this way by default. Can we safely disable DEBUG on the hypervisor hosts for vdsm? Can we do this while things are roughly in a steady state? Remember, just doing the moves could throw everything into vdsm la-la-land (actually, that might not be true; it might take a new storage operation to do that).

Just thinking out loud... can we safely turn off DEBUG logging on the vdsms? Can we do this "live" through a bounce of vdsm if everything is in a "steady state"? Do you think this might help the problems we're having with storage operations? (I can see all the blades logging in iSCSI-wise, but the ovirt engine does the whole red-triangle "Connecting..." thing for many, many, many hours.)

Thanks,
Christopher
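For reference, the vdsm log level on each hypervisor is set in /etc/vdsm/logger.conf (standard Python fileConfig format), so turning DEBUG down to INFO is an edit there plus a vdsmd restart. A minimal sketch, assuming a stock 3.6-era layout; the exact logger sections and handler names can differ between vdsm versions, so check the file on your hosts first:

    # /etc/vdsm/logger.conf (excerpt) -- lower the noisy loggers from DEBUG to INFO
    [logger_root]
    level=INFO          # was: DEBUG
    handlers=syslog,logfile

    [logger_vds]
    level=INFO          # was: DEBUG
    handlers=syslog,logfile
    qualname=vds
    propagate=0

    # apply it on that host; a vdsmd restart does not stop running VMs, though the
    # engine may briefly show the host as not responding while vdsm comes back
    systemctl restart vdsmd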

On Mar 13, 2018 11:48 PM, "Christopher Cox" <ccox@endlessnow.com> wrote:

...snip...

What we are seeing more and more is that if we do an operation like expose a new LUN and configure a new storage domain, all of the hypervisors go "red triangle" and "Connecting..." and it takes a very long time (all day) to straighten out.

My guess is that there's too much to look at vdsm-wise, so the engine waits a shorter period of time for a completed response than what vdsm is going to give us, and it just cycles over and over until it happens to work.

Please upgrade. We have solved issues and improved performance and scale substantially since 3.6. You may also wish to apply lvm filters.
Y.

...snip...

On 03/14/2018 01:34 AM, Yaniv Kaul wrote:
On Mar 13, 2018 11:48 PM, "Christopher Cox" <ccox@endlessnow.com <mailto:ccox@endlessnow.com>> wrote:
...snip...
What we are seeing more and more is that if we do an operation like expose a new LUN and configure a new storage domain, all of the hypervisors go "red triangle" and "Connecting..." and it takes a very long time (all day) to straighten out.
My guess is that there's too much to look at vdsm-wise, so the engine waits a shorter period of time for a completed response than what vdsm is going to give us, and it just cycles over and over until it happens to work.
Please upgrade. We have solved issues and improved performance and scale substantially since 3.6. You may also wish to apply lvm filters. Y.
Oh, we know and are looking at what we'll have to do to upgrade. With that said, is there more information on what you mentioned as "lvm filters" posted somewhere?
Also, would VM reduction, and IMHO, virtual disk reduction help this problem?
Are there any engine config parameters that might help as well?
Thanks for any help on this.
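On the lvm filters point, a sketch of the usual approach (not something spelled out in this thread): the idea is to restrict which block devices the hypervisor's own LVM scans, so it stops scanning and activating the guest LVs that live inside the iSCSI storage-domain LUNs. In /etc/lvm/lvm.conf it looks roughly like the following, assuming the host's only local PV is /dev/sda2 (a placeholder; use your hosts' real devices, and note that newer oVirt releases can generate this filter for you with vdsm-tool config-lvm-filter):

    # /etc/lvm/lvm.conf, devices section -- accept only the host's own PV(s),
    # reject everything else, including the storage-domain LUNs / multipath devices
    filter = [ "a|^/dev/sda2$|", "r|.*|" ]

    # rebuild the initramfs afterwards so the same filter also applies at boot
    dracut -f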

On 03/14/2018 09:10 AM, Christopher Cox wrote:
On 03/14/2018 01:34 AM, Yaniv Kaul wrote:
On Mar 13, 2018 11:48 PM, "Christopher Cox" <ccox@endlessnow.com <mailto:ccox@endlessnow.com>> wrote:
...snip...
What we are seeing more and more is that if we do an operation like expose a new LUN and configure a new storage domain, all of the hypervisors go "red triangle" and "Connecting..." and it takes a very long time (all day) to straighten out.
My guess is that there's too much to look at vdsm-wise, so the engine waits a shorter period of time for a completed response than what vdsm is going to give us, and it just cycles over and over until it happens to work.
Please upgrade. We have solved issues and improved performance and scale substantially since 3.6. You may also wish to apply lvm filters. Y.
Oh, we know and are looking at what we'll have to do to upgrade. With that said, is there more information on what you mentioned as "lvm filters" posted somewhere?
Also, would VM reduction, and IMHO, virtual disk reduction help this problem?
Are there any engine config parameters that might help as well?
Thanks for any help on this.
Based on a different, older thread about having lots of virtual networks, which sounded somewhat similar, I have increased our vdsTimeout value. Any opinions on whether or not that might help? Right now I'm forced to tell my management that we'll have to "roll the dice" to find out, but I'm kind of hoping to hear someone say it should help. Anyone? Just looking for something more substantial...
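For reference, vdsTimeout is an engine-config key set on the engine host, so the change described above would normally look like the sketch below (300 is only an illustrative value, not a recommendation, and the new value only takes effect after the engine is restarted):

    # on the engine host: check the current value, raise it, restart the engine
    engine-config -g vdsTimeout
    engine-config -s vdsTimeout=300
    systemctl restart ovirt-engine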
participants (2): Christopher Cox, Yaniv Kaul