[ovirt-users] One RHEV Virtual Machine does not Automatically Resume following Compellent SAN Controller Failover

Yaniv Dary ydary at redhat.com
Thu May 26 02:14:35 EDT 2016


What DR solution are you using?

Yaniv Dary
Technical Product Manager
Red Hat Israel Ltd.
34 Jerusalem Road
Building A, 4th floor
Ra'anana, Israel 4350109

Tel : +972 (9) 7692306
        8272306
Email: ydary at redhat.com
IRC : ydary


On Wed, Nov 25, 2015 at 1:15 PM, Simone Tiraboschi <stirabos at redhat.com>
wrote:

> Adding Nir who knows it far better than me.
>
>
> On Mon, Nov 23, 2015 at 8:37 PM, Duckworth, Douglas C <duckd at tulane.edu>
> wrote:
>
>> Hello --
>>
>> Not sure if y'all can help with this issue we've been seeing with RHEV...
>>
>> On 11/13/2015, during a code upgrade of the Compellent SAN at our
>> Disaster Recovery site, we failed over to the secondary SAN controller.
>> Most virtual machines in our DR cluster resumed automatically after
>> pausing, except VM "BADVM" on host "BADHOST."
>>
>> In engine.log you can see that BADVM entered the "VM_PAUSED_EIO" state
>> at 10:47:57:
>>
>> "VM BADVM has paused due to storage I/O problem."
>>
>> On this Red Hat Enterprise Virtualization Hypervisor 6.6
>> (20150512.0.el6ev) host, two other VMs paused but then resumed
>> automatically without system administrator intervention...
>>
>> In our DR Cluster, 22 VMs also resumed automatically...
>>
>> None of these guest VMs are engaged in heavy I/O, as they are DR-site
>> VMs that are not currently doing anything.
>>
>> We sent this information to Dell.  Their response:
>>
>> "The root cause may reside within your virtualization solution, not the
>> parent OS (RHEV-Hypervisor disc) or Storage (Dell Compellent.)"
>>
>> We are doing this failover again on Sunday, November 29th, so we would
>> like to know how to mitigate this issue, given that we have to manually
>> resume any paused VMs that don't resume automatically.
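>>
>> (The manual resume itself is normally done with the Run action on the
>> paused VM in the Admin Portal.  As a hedged illustration of doing the
>> same from the host, virsh can resume the paused domain, though on RHEV-H
>> virsh requires SASL authentication and "BADVM" below is only a
>> placeholder domain name.)
>>
>> # virsh resume BADVM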
>>
>> Before we initiated the SAN controller failover, all iSCSI paths to the
>> targets were present on host tulhv2p03.
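>>
>> (For reference, path state can be checked on the host with the standard
>> tools; a hedged, generic example, using the WWID that appears in the
>> multipathd messages further down:)
>>
>> # iscsiadm -m session
>> # multipath -ll 36000d310005caf000000000000000270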
>>
>> The VM log on the host, /var/log/libvirt/qemu/badhost.log, shows that a
>> storage error was reported:
>>
>> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
>> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
>> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
>> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
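>>
>> (For reference, libvirt also records why a domain is paused; a hedged
>> illustration from the host, with "BADVM" standing in for the actual
>> domain name: domstate --reason should report the pause reason, and
>> domblkerror lists the disks that hit I/O errors.)
>>
>> # virsh domstate BADVM --reason
>> # virsh domblkerror BADVM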
>>
>> All disks used by this guest VM are provided by a single storage domain,
>> COM_3TB4_DR, with serial "270."  In syslog we do see that all paths for
>> that storage domain failed:
>>
>> Nov 13 16:47:40 multipathd: 36000d310005caf000000000000000270: remaining
>> active paths: 0
>>
>> Though these recovered later:
>>
>> Nov 13 16:59:17 multipathd: 36000d310005caf000000000000000270: sdbg -
>> tur checker reports path is up
>> Nov 13 16:59:17 multipathd: 36000d310005caf000000000000000270: remaining
>> active paths: 8
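>>
>> (For context, a relevant knob here is multipath's no_path_retry: with
>> "fail," I/O errors are returned to the guest as soon as the last path
>> drops, while a numeric value queues I/O for that many retries and can
>> ride out a short controller switchover.  Below is a hedged sketch of a
>> device section for /etc/multipath.conf; the vendor/product strings and
>> retry count are assumptions to verify against multipath -ll output, and
>> on RHEV hosts vdsm manages this file, so it usually needs a
>> "# RHEV PRIVATE" tag to keep vdsm from overwriting it.)
>>
>> devices {
>>     device {
>>         vendor        "COMPELNT"
>>         product       "Compellent Vol"
>>         # number of retries before queueing is disabled and I/O fails
>>         no_path_retry 24
>>     }
>> }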
>>
>> Does anyone have an idea of why the VM would fail to resume
>> automatically when the iSCSI paths used by its storage domain recovered?
>>
>> Thanks
>> Doug
>>
>> --
>> Thanks
>>
>> Douglas Charles Duckworth
>> Unix Administrator
>> Tulane University
>> Technology Services
>> 1555 Poydras Ave
>> NOLA -- 70112
>>
>> E: duckd at tulane.edu
>> O: 504-988-9341
>> F: 504-988-8505