<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0cm;
        margin-right:0cm;
        margin-bottom:0cm;
        margin-left:36.0pt;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        mso-fareast-language:EN-US;}
p.msonormal0, li.msonormal0, div.msonormal0
        {mso-style-name:msonormal;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:0cm;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
span.EmailStyle19
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
span.EmailStyle20
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
span.EmailStyle21
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
span.EmailStyle22
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
span.EmailStyle23
        {mso-style-type:personal-reply;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal">Hi,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Host server: Dell PowerEdge R815 (40 cores and 768GB memory)<o:p></o:p></p>
<p class="MsoNormal">Stoage: Dell Equallogic (Firmware V8.1.4)<o:p></o:p></p>
<p class="MsoNormal">OS: Centos 7.3 (although the same thing happens on 7.2)<o:p></o:p></p>
<p class="MsoNormal">Ovirt: 4.0.6.3-1<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We have several Ovirt clusters. Two of the hosts (in separate clusters) are showing as up in Hosted Engine but the guests running on them are showing as Not Responding. I can connect to the guests via ssh, etc but can’t interact with them
from the Ovirt GUI. It was fine on Saturday (28<sup>th</sup> Jan) morning but looks like something happened Sunday morning around 07:14 as we suddenly see the following in engine.log on one host:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">2017-01-29 07:14:26,952 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM 'd0aa990f-e6aa-4e79-93ce-011fe1372fb0'(lnd-ion-lindev-01) moved from 'Up' --> 'NotResponding'<o:p></o:p></p>
<p class="MsoNormal">2017-01-29 07:14:27,069 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-ion-lindev-01 is not
responding.<o:p></o:p></p>
<p class="MsoNormal">2017-01-29 07:14:27,070 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM '788bfc0e-1712-469e-9a0a-395b8bb3f369'(lnd-ion-windev-02) moved from 'Up' --> 'NotResponding'<o:p></o:p></p>
<p class="MsoNormal">2017-01-29 07:14:27,088 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-ion-windev-02 is not
responding.<o:p></o:p></p>
<p class="MsoNormal">2017-01-29 07:14:27,089 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM 'd7eaa4ec-d65e-45c0-bc4f-505100658121'(lnd-ion-windev-04) moved from 'Up' --> 'NotResponding'<o:p></o:p></p>
<p class="MsoNormal">2017-01-29 07:14:27,103 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-ion-windev-04 is not
responding.<o:p></o:p></p>
<p class="MsoNormal">2017-01-29 07:14:27,104 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM '5af875ad-70f9-4f49-9640-ee2b9927348b'(lnd-anv9-sup1) moved from 'Up' --> 'NotResponding'<o:p></o:p></p>
<p class="MsoNormal">2017-01-29 07:14:27,121 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-anv9-sup1 is not responding.<o:p></o:p></p>
<p class="MsoNormal">2017-01-29 07:14:27,121 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM 'b3b7c5f3-0b5b-4d8f-9cc8-b758cc1ce3b9'(lnd-db-dev-03) moved from 'Up' --> 'NotResponding'<o:p></o:p></p>
<p class="MsoNormal">2017-01-29 07:14:27,136 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-db-dev-03 is not responding.<o:p></o:p></p>
<p class="MsoNormal">2017-01-29 07:14:27,137 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM '6c0a6e17-47c3-4464-939b-e83984dbeaa6'(lnd-db-dev-04) moved from 'Up' --> 'NotResponding'<o:p></o:p></p>
<p class="MsoNormal">2017-01-29 07:14:27,167 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-db-dev-04 is not responding.<o:p></o:p></p>
<p class="MsoNormal">2017-01-29 07:14:27,168 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM 'ab15bb08-1244-4dc1-a4f1-f6e94246aa23'(lnd-ion-lindev-05) moved from 'Up' --> 'NotResponding'<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Checking the vdsm logs this morning on the hosts I see a lot of the following messages:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">jsonrpc.Executor/0::WARNING::2017-01-30 09:34:15,989::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) vmId=`ab15bb08-1244-4dc1-a4f1-f6e94246aa23`::monitor became unresponsive (command timeout, age=94854.48)<o:p></o:p></p>
<p class="MsoNormal">jsonrpc.Executor/0::WARNING::2017-01-30 09:34:15,990::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) vmId=`20a51347-ef08-47a9-9982-32b2047991e1`::monitor became unresponsive (command timeout, age=94854.48)<o:p></o:p></p>
<p class="MsoNormal">jsonrpc.Executor/0::WARNING::2017-01-30 09:34:15,991::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) vmId=`2cd8698d-a0f9-43b7-9a89-92a93e920eb7`::monitor became unresponsive (command timeout, age=94854.49)<o:p></o:p></p>
<p class="MsoNormal">jsonrpc.Executor/0::WARNING::2017-01-30 09:34:15,992::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) vmId=`5af875ad-70f9-4f49-9640-ee2b9927348b`::monitor became unresponsive (command timeout, age=94854.49)<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">and<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">vdsm.Scheduler::WARNING::2017-01-30 09:36:36,444::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, executor queue full<o:p></o:p></p>
<p class="MsoNormal">vdsm.Scheduler::WARNING::2017-01-30 09:36:38,446::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, executor queue full<o:p></o:p></p>
<p class="MsoNormal">vdsm.Scheduler::WARNING::2017-01-30 09:36:38,627::periodic::212::virt.periodic.Operation::(_dispatch) could not run <vdsm.virt.sampling.HostMonitor object at 0x295bdd0>, executor queue full<o:p></o:p></p>
<p class="MsoNormal">vdsm.Scheduler::WARNING::2017-01-30 09:36:38,707::periodic::212::virt.periodic.Operation::(_dispatch) could not run <vdsm.virt.sampling.VMBulkSampler object at 0x295ba90>, executor queue full<o:p></o:p></p>
<p class="MsoNormal">vdsm.Scheduler::WARNING::2017-01-30 09:36:38,929::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.BlockjobMonitor'> at 0x295ba10>, executor queue full<o:p></o:p></p>
<p class="MsoNormal">vdsm.Scheduler::WARNING::2017-01-30 09:36:40,450::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, executor queue full<o:p></o:p></p>
<p class="MsoNormal">vdsm.Scheduler::WARNING::2017-01-30 09:36:42,451::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, executor queue full<o:p></o:p></p>
<p class="MsoNormal">vdsm.Scheduler::WARNING::2017-01-30 09:36:44,452::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, executor queue full<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I’ve also attached logs from time period for one of the hosts in question. This host is in a single node DC and cluster with iSCSI shared storage. I’ve had to make the time window on the logs quite small to fit within the mail size limit.
Let me know if you need anything more specific.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Many Thanks,<o:p></o:p></p>
<p class="MsoNormal">Mark<o:p></o:p></p>
</div>
</body>
</html>