Adding Nir who knows it far better than me.

On Mon, Nov 23, 2015 at 8:37 PM, Duckworth, Douglas C
<duckd@tulane.edu> wrote:

Hello --

Not sure if y'all can help with this issue we've been seeing with
RHEV...

On 11/13/2015, during a code upgrade of the Compellent SAN at our
disaster recovery site, we failed over to the secondary SAN controller.
Most virtual machines in our DR cluster resumed automatically after
pausing, except VM "BADVM" on host "BADHOST."

In engine.log you can see that BADVM was sent into the "VM_PAUSED_EIO"
state at 10:47:57:

"VM BADVM has paused due to storage I/O problem."

On this Red Hat Enterprise Virtualization Hypervisor 6.6
(20150512.0.el6ev) host, two other VMs paused but then resumed
automatically without system administrator intervention.

In our DR cluster, 22 VMs also resumed automatically.

None of these guest VMs are engaged in heavy I/O, as they are DR-site
VMs not currently doing anything.

We sent this information to Dell. Their response:

"The root cause may reside within your virtualization solution, not the
parent OS (RHEV-Hypervisor disc) or Storage (Dell Compellent.)"

We are performing this failover again on Sunday, November 29th, so we
would like to know how to mitigate this issue, given that we otherwise
have to manually resume any paused VMs that don't resume automatically.
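
One idea we are weighing: make dm-multipath queue I/O during the
controller swap instead of failing it up to qemu, since that failure is
what becomes the guest's EIO pause. A rough sketch of what we'd add to
/etc/multipath.conf -- the numbers are our guesses, not Dell-verified
settings:

# Sketch only, values are assumptions: queue I/O while all paths
# are down rather than returning errors to the guest.
devices {
    device {
        vendor "COMPELNT"
        product "Compellent Vol"
        no_path_retry 60      # retry ~60 polling intervals before failing I/O
        path_checker tur
        failback immediate
    }
}

Our understanding is that VDSM manages this file and will overwrite
local edits unless it is tagged with "# RHEV PRIVATE", and that
aggressive queueing can stall VDSM and sanlock, so we'd want to test
that tradeoff before Sunday.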

Before we initiated the SAN controller failover, all iSCSI paths to the
targets were present on host tulhv2p03.
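
(For reference, we checked that roughly like this on the host --
standard iscsiadm and multipath usage, nothing RHEV-specific:)

iscsiadm -m session -P 1   # list iSCSI sessions and their state
multipath -ll              # show each LUN's path groups and path status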

The VM log on the host, /var/log/libvirt/qemu/badhost.log, shows that a
storage error was reported:
block I/O error in device 'drive-virtio-disk0': Input/output error (5)
block I/O error in device 'drive-virtio-disk0': Input/output error (5)
block I/O error in device 'drive-virtio-disk0': Input/output error (5)
block I/O error in device 'drive-virtio-disk0': Input/output error (5)
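
(virsh confirms the pause reason as well; this query is read-only, so
it's safe to run alongside VDSM -- we'd expect output along the lines
of "paused (I/O error)":)

virsh -r domstate BADVM --reason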

All disks used by this guest VM are provided by a single storage
domain, COM_3TB4_DR, with serial "270." In syslog we do see that all
paths for that storage domain failed:

Nov 13 16:47:40 multipathd: 36000d310005caf000000000000000270: remaining active paths: 0

Though these recovered later:

Nov 13 16:59:17 multipathd: 36000d310005caf000000000000000270: sdbg - tur checker reports path is up
Nov 13 16:59:17 multipathd: 36000d310005caf000000000000000270: remaining active paths: 8
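
(During Sunday's failover we plan to watch path state live with
something like the following -- multipathd's interactive "show paths"
command run non-interactively:)

watch -n 5 'multipathd -k"show paths"'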

Does anyone have an idea why one VM would fail to resume automatically
when the iSCSI paths used by its storage domain recovered?

Thanks
Doug

--
Thanks

Douglas Charles Duckworth
Unix Administrator
Tulane University
Technology Services
1555 Poydras Ave
NOLA -- 70112

E: duckd@tulane.edu
O: 504-988-9341
F: 504-988-8505