Adding Nir who knows it far better than me.

On Mon, Nov 23, 2015 at 8:37 PM, Duckworth, Douglas C
<duckd@tulane.edu> wrote:

Hello --

Not sure if y'all can help with this issue we've been seeing with
RHEV...

On 11/13/2015, during a code upgrade of the Compellent SAN at our
disaster recovery site, we failed over to the secondary SAN controller.
Most virtual machines in our DR cluster resumed automatically after
pausing, except VM "BADVM" on host "BADHOST."

In engine.log you can see that BADVM was sent into the "VM_PAUSED_EIO"
state at 10:47:57:

"VM BADVM has paused due to storage I/O problem."

On this Red Hat Enterprise Virtualization Hypervisor 6.6
(20150512.0.el6ev) host, two other VMs paused but then resumed
automatically without system administrator intervention.

In our DR cluster, 22 VMs also resumed automatically.

None of these guest VMs are engaged in heavy I/O, as they are DR-site
VMs not currently doing anything.

We sent this information to Dell. Their response:

"The root cause may reside within your virtualization solution, not the
parent OS (RHEV-Hypervisor disc) or Storage (Dell Compellent.)"

We are performing this failover again on Sunday, November 29th, so we
would like to know how to mitigate this issue, given that we otherwise
have to manually resume any paused VMs that don't resume automatically.
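
One idea we are weighing: make dm-multipath queue I/O during the
controller swap instead of failing it up to qemu, since that failure is
what becomes the guest's EIO pause. A rough sketch of what we'd add to
/etc/multipath.conf -- the numbers are our guesses, not Dell-verified
settings:

# Sketch only, values are assumptions: queue I/O while all paths
# are down rather than returning errors to the guest.
devices {
    device {
        vendor "COMPELNT"
        product "Compellent Vol"
        no_path_retry 60      # retry ~60 polling intervals before failing I/O
        path_checker tur
        failback immediate
    }
}

Our understanding is that VDSM manages this file and will overwrite
local edits unless it is tagged with "# RHEV PRIVATE", and that
aggressive queueing can stall VDSM and sanlock, so we'd want to test
that tradeoff before Sunday.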

Before we initiated the SAN controller failover, all iSCSI paths to the
targets were present on host tulhv2p03.
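
(For reference, we checked that roughly like this on the host --
standard iscsiadm and multipath usage, nothing RHEV-specific:)

iscsiadm -m session -P 1   # list iSCSI sessions and their state
multipath -ll              # show each LUN's path groups and path status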

The VM log on the host, /var/log/libvirt/qemu/badhost.log, shows that a
storage error was reported:
block I/O error in device 'drive-virtio-disk0': Input/output error (5)
block I/O error in device 'drive-virtio-disk0': Input/output error (5)
block I/O error in device 'drive-virtio-disk0': Input/output error (5)
block I/O error in device 'drive-virtio-disk0': Input/output error (5)
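
(virsh confirms the pause reason as well; this query is read-only, so
it's safe to run alongside VDSM -- we'd expect output along the lines
of "paused (I/O error)":)

virsh -r domstate BADVM --reason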

All disks used by this guest VM are provided by a single storage
domain, COM_3TB4_DR, with serial "270." In syslog we do see that all
paths for that storage domain failed:

Nov 13 16:47:40 multipathd: 36000d310005caf000000000000000270: remaining active paths: 0

Though these recovered later:

Nov 13 16:59:17 multipathd: 36000d310005caf000000000000000270: sdbg - tur checker reports path is up
Nov 13 16:59:17 multipathd: 36000d310005caf000000000000000270: remaining active paths: 8
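
(During Sunday's failover we plan to watch path state live with
something like the following -- multipathd's interactive "show paths"
command run non-interactively:)

watch -n 5 'multipathd -k"show paths"'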

Does anyone have an idea why one VM would fail to resume automatically
when the iSCSI paths used by its storage domain recovered?

Thanks
Doug

--
Thanks

Douglas Charles Duckworth
Unix Administrator
Tulane University
Technology Services
1555 Poydras Ave
NOLA -- 70112

E: duckd@tulane.edu
O: 504-988-9341
F: 504-988-8505