[ovirt-users] oVirt Instability with Dell Compellent via iSCSI/Multipath
Chris Jones - BookIt.com Systems Administrator
chris.jones at bookit.com
Wed May 20 20:29:32 UTC 2015
Sorry for the delay on this. I am in the process of reproducing the
error to get the logs.
On 05/19/2015 07:31 PM, Douglas Schilling Landgraf wrote:
> Hello Chris,
>
> On 05/19/2015 06:19 PM, Chris Jones - BookIt.com Systems Administrator
> wrote:
>> Engine: oVirt Engine Version: 3.5.2-1.el7.centos
>> Nodes: oVirt Node - 3.5 - 0.999.201504280931.el7.centos
>> Remote storage: Dell Compellent SC8000
>> Storage setup: 2 NICs connected to the Compellent (multipath sketch
>> below). Several storage domains backed by LUNs. Several VM disks using
>> direct LUN.
>> Networking: Dell 10 Gb/s switches
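For context, the multipath device stanza I've been working toward looks
roughly like the sketch below. The vendor/product strings and option values
are from Dell's general Compellent guidance as I recall it, not copied from
our nodes, so treat them as assumptions to verify:

    devices {
        device {
            vendor                "COMPELNT"
            product               "Compellent Vol"
            path_grouping_policy  multibus
            path_checker          tur
            no_path_retry         24
            failback              immediate
        }
    }

One caveat: on oVirt Node, VDSM manages /etc/multipath.conf, so local edits
can be overwritten unless the file is marked private (an "# RHEV PRIVATE"
line near the top, if I remember the 3.5-era marker correctly).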
>>
>> I've been struggling with oVirt completely falling apart due to
>> storage-related issues. By falling apart I mean most or all of the nodes
>> suddenly losing contact with the storage domains. This results in an
>> endless loop of the VMs on the failed nodes being migrated and
>> remigrated as the nodes flap between responsive and unresponsive. During
>> these times, engine.log looks like this.
>>
>> 2015-05-19 03:09:42,443 WARN
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> (org.ovirt.thread.pool-8-thread-50) domain
>> c46adffc-614a-4fa2-9d2d-954f174f6a39:db_binlog_1 in problem. vds:
>> blade6c1.ism.ld
>> 2015-05-19 03:09:42,560 WARN
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> (org.ovirt.thread.pool-8-thread-38) domain
>> 0b1d36e4-7992-43c7-8ac0-740f7c2cadb7:ovirttest1 in problem. vds:
>> blade2c1.ism.ld
>> 2015-05-19 03:09:45,497 WARN
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> (org.ovirt.thread.pool-8-thread-24) domain
>> 05c8fa9c-fcbf-4a17-a3c6-011696a1b9a2:ovirttest2 in problem. vds:
>> blade3c2.ism.ld
>> 2015-05-19 03:09:51,713 WARN
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> (org.ovirt.thread.pool-8-thread-46) domain
>> b050c455-5ab1-4107-b055-bfcc811195fc:os_data_1 in problem. vds:
>> blade4c2.ism.ld
>> 2015-05-19 03:09:57,647 INFO
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> (org.ovirt.thread.pool-8-thread-13) Domain
>> c46adffc-614a-4fa2-9d2d-954f174f6a39:db_binlog_1 recovered from problem.
>> vds: blade6c1.ism.ld
>> 2015-05-19 03:09:57,782 WARN
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> (org.ovirt.thread.pool-8-thread-6) domain
>> 26929b89-d1ca-4718-90d6-b3a6da585451:generic_data_1 in problem. vds:
>> blade2c1.ism.ld
>> 2015-05-19 03:09:57,783 INFO
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> (org.ovirt.thread.pool-8-thread-6) Domain
>> 0b1d36e4-7992-43c7-8ac0-740f7c2cadb7:ovirttest1 recovered from problem.
>> vds: blade2c1.ism.ld
>> 2015-05-19 03:10:00,639 INFO
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> (org.ovirt.thread.pool-8-thread-31) Domain
>> c46adffc-614a-4fa2-9d2d-954f174f6a39:db_binlog_1 recovered from problem.
>> vds: blade4c1.ism.ld
>> 2015-05-19 03:10:00,703 WARN
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> (org.ovirt.thread.pool-8-thread-17) domain
>> 64101f40-0f10-471d-9f5f-44591f9e087d:logging_1 in problem. vds:
>> blade1c1.ism.ld
>> 2015-05-19 03:10:00,712 INFO
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> (org.ovirt.thread.pool-8-thread-4) Domain
>> 05c8fa9c-fcbf-4a17-a3c6-011696a1b9a2:ovirttest2 recovered from problem.
>> vds: blade3c2.ism.ld
>> 2015-05-19 03:10:06,931 INFO
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> (org.ovirt.thread.pool-8-thread-48) Domain
>> 05c8fa9c-fcbf-4a17-a3c6-011696a1b9a2:ovirttest2 recovered from problem.
>> vds: blade4c2.ism.ld
>> 2015-05-19 03:10:06,931 INFO
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> (org.ovirt.thread.pool-8-thread-48) Domain
>> 05c8fa9c-fcbf-4a17-a3c6-011696a1b9a2:ovirttest2 has recovered from
>> problem. No active host in the DC is reporting it as problematic, so
>> clearing the domain recovery timer.
>> 2015-05-19 03:10:06,932 INFO
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> (org.ovirt.thread.pool-8-thread-48) Domain
>> b050c455-5ab1-4107-b055-bfcc811195fc:os_data_1 recovered from problem.
>> vds: blade4c2.ism.ld
>> 2015-05-19 03:10:06,933 INFO
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> (org.ovirt.thread.pool-8-thread-48) Domain
>> b050c455-5ab1-4107-b055-bfcc811195fc:os_data_1 has recovered from
>> problem. No active host in the DC is reporting it as problematic, so
>> clearing the domain recovery timer.
>> 2015-05-19 03:10:09,929 WARN
>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> (org.ovirt.thread.pool-8-thread-16) domain
>> b050c455-5ab1-4107-b055-bfcc811195fc:os_data_1 in problem. vds:
>> blade3c1.ism.ld
>>
>>
>> My troubleshooting steps so far:
>>
>> 1. Tailing engine.log for "in problem" and "recovered from problem"
>> (one-liner below).
>> 2. Shutting down all the VMs.
>> 3. Shutting down all but one node.
>> 4. Bringing up one node at a time to see what the log reports.
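For step 1 above, the one-liner I'm using is roughly this (the path is the
default engine log location):

    tail -F /var/log/ovirt-engine/engine.log | grep -E 'in problem|recovered from problem'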
>
> vdsm.log on the node side will help here too.
>
>> When only one node is active, everything is fine. When a second node
>> comes up, I begin to see the log output as shown above. I've been
>> struggling with this for over a month. I'm sure others have used oVirt
>> with a Compellent and encountered (and worked around) similar problems.
>> I'm looking for some help in figuring out if it's oVirt or something
>> that I'm doing wrong.
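While reproducing this I also plan to watch path and session state on the
nodes with something like the following (standard device-mapper-multipath
and iscsi-initiator-utils commands, nothing oVirt-specific):

    multipath -ll                 # path group and path state per LUN
    iscsiadm -m session -P 3      # iSCSI session details, one per portal/NIC
    dmesg | tail -n 50            # recent kernel messages (path faults, SCSI errors)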
>>
>> We're close to giving up on oVirt completely because of this.
>>
>> P.S.
>>
>> I've tested the Compellent with bare metal and with Proxmox. Not at the
>> same scale, but it seems to work fine there.
>
> Do you mind sharing your vdsm.log from the oVirt Node machine?
>
> To get to a console on oVirt Node, press F2 in the TUI.
> Files: /var/log/vdsm/vdsm*.log
>
> # rpm -qa | grep -i vdsm
> might help too.
>
> Thanks!
>
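Once I've reproduced it, I'll grab the logs from each node roughly like this
(paths are the oVirt Node defaults mentioned above; the output file names
are just placeholders):

    rpm -qa | grep -i vdsm > /tmp/$(hostname)-vdsm-rpms.txt
    tar czf /tmp/$(hostname)-vdsm-logs.tar.gz /var/log/vdsm/vdsm*.log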