Re: [ovirt-users] One RHEV Virtual Machine does not Automatically Resume following Compellent SAN Controller Failover

26 May 2016

What DR solution are you using?

Yaniv Dary
Technical Product Manager
Red Hat Israel Ltd.
34 Jerusalem Road
Building A, 4th floor
Ra'anana, Israel 4350109

Tel : +972 (9) 7692306
        8272306
Email: ydary@redhat.com
IRC : ydary

On Wed, Nov 25, 2015 at 1:15 PM, Simone Tiraboschi <stirabos@redhat.com>
wrote:
...
Adding Nir who knows it far better than me.
On Mon, Nov 23, 2015 at 8:37 PM, Duckworth, Douglas C <duckd@tulane.edu>
wrote:
...
Hello --
Not sure if y'all can help with this issue we've been seeing with RHEV...
On 11/13/2015, during a code upgrade of the Compellent SAN at our Disaster
Recovery site, we failed over to the secondary SAN controller.  Most virtual
machines in our DR cluster resumed automatically after pausing, except VM
"BADVM" on host "BADHOST".
In Engine.log you can see that BADVM was sent into "VM_PAUSED_EIO" state
at 10:47:57:
"VM BADVM has paused due to storage I/O problem."
On this Red Hat Enterprise Virtualization Hypervisor 6.6
(20150512.0.el6ev) Host, two other VMs paused but then automatically
resumed without System Administrator intervention...
In our DR Cluster, 22 VMs also resumed automatically...
None of these Guest VMs are engaged in high I/O as these are DR site VMs
not currently doing anything.
We sent this information to Dell.  Their response:
"The root cause may reside within your virtualization solution, not the
parent OS (RHEV-Hypervisor disc) or Storage (Dell Compellent.)"
We are doing this Failover again on Sunday November 29th so we would
like to know how to mitigate this issue, given we have to manually
resume paused VMs that don't resume automatically.
Before we initiated SAN Controller Failover, all iSCSI paths to Targets
were present on Host tulhv2p03.
The VM log on the host, /var/log/libvirt/qemu/badhost.log, shows that
storage errors were reported:
block I/O error in device 'drive-virtio-disk0': Input/output error (5)
block I/O error in device 'drive-virtio-disk0': Input/output error (5)
block I/O error in device 'drive-virtio-disk0': Input/output error (5)
block I/O error in device 'drive-virtio-disk0': Input/output error (5)
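A minimal sketch, assuming the qemu log lines have exactly the format shown
above, for tallying these errors per device and errno. The function name
count_block_errors is hypothetical and not part of any RHEV or libvirt tool:

```python
import re
from collections import Counter

# Matches qemu log lines such as:
#   block I/O error in device 'drive-virtio-disk0': Input/output error (5)
ERR_RE = re.compile(
    r"block I/O error in device '(?P<dev>[^']+)': .+? \((?P<errno>\d+)\)"
)


def count_block_errors(lines):
    """Return a Counter keyed by (device, errno) for block I/O error lines."""
    counts = Counter()
    for line in lines:
        m = ERR_RE.search(line)
        if m:
            counts[(m["dev"], int(m["errno"]))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_errors.py < /var/log/libvirt/qemu/<vm>.log
    for (dev, errno), n in count_block_errors(sys.stdin).items():
        print(f"{dev}: errno {errno} seen {n} time(s)")
```

Running it against each VM's log on the host would show whether the VMs that
resumed on their own logged the same errors as the one that stayed paused.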
All disks used by this Guest VM are provided by a single Storage Domain,
COM_3TB4_DR, with serial "270".  In syslog we do see that all paths for
that Storage Domain failed:
Nov 13 16:47:40 multipathd: 36000d310005caf000000000000000270: remaining
active paths: 0
Though these recovered later:
Nov 13 16:59:17 multipathd: 36000d310005caf000000000000000270: sdbg -
tur checker reports path is up
Nov 13 16:59:17 multipathd: 36000d310005caf000000000000000270: remaining
active paths: 8
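A minimal sketch, assuming syslog lines in the multipathd format quoted
above, for measuring how long each device had zero active paths. The
function name outage_windows is hypothetical, and the year is passed in
because syslog timestamps do not carry one:

```python
import re
from datetime import datetime

# Matches lines such as:
#   Nov 13 16:47:40 multipathd: 36000d31...0270: remaining active paths: 0
# A hostname and a [pid] suffix are optional, since syslog formats vary.
PATHS_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?:\S+\s+)?multipathd(?:\[\d+\])?:\s+"
    r"(?P<wwid>\w+):\s+remaining\s+active\s+paths:\s+(?P<n>\d+)"
)


def outage_windows(lines, year):
    """Return (wwid, start, end) for each period with zero active paths."""
    down_since = {}  # wwid -> time the path count first reached 0
    windows = []
    for line in lines:
        m = PATHS_RE.match(line)
        if not m:
            continue
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        wwid, active = m["wwid"], int(m["n"])
        if active == 0:
            down_since.setdefault(wwid, when)
        elif wwid in down_since:
            # First report of a non-zero path count closes the window.
            windows.append((wwid, down_since.pop(wwid), when))
    return windows


if __name__ == "__main__":
    import sys

    # Usage: python outage.py < /var/log/messages
    for wwid, start, end in outage_windows(sys.stdin, year=2015):
        print(f"{wwid}: no active paths from {start} to {end} ({end - start})")
```

The quoted lines are wrapped by the mail client, so each entry has to be
on a single line before parsing. Applied to the two "remaining active
paths" entries above, this gives a window of 11 minutes 37 seconds with no
active paths, assuming no intermediate entries were left out of the excerpt.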
Does anyone have an idea of why the VM would fail to automatically
resume if the iSCSI paths used by its Storage Domain recovered?
Thanks
Doug
--
Thanks
Douglas Charles Duckworth
Unix Administrator
Tulane University
Technology Services
1555 Poydras Ave
NOLA -- 70112
E: duckd@tulane.edu
O: 504-988-9341
F: 504-988-8505
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users