<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0cm;
        margin-right:0cm;
        margin-bottom:0cm;
        margin-left:36.0pt;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
p.msonormal0, li.msonormal0, div.msonormal0
        {mso-style-name:msonormal;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:0cm;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
span.EmailStyle19
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
span.EmailStyle20
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
span.EmailStyle21
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
span.EmailStyle22
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
span.EmailStyle23
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
span.EmailStyle24
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
span.EmailStyle25
        {mso-style-type:personal-reply;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body bgcolor="white" lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Hi Pavel,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Thanks for responding. I bounced the VDSMD service, the guests recovered and the monitor and queue full messages also cleared. However, we did keep getting intermittent “Guest x Not Responding “
messages being communicated by the Hosted Engine, in most cases the guests would actually almost immediately recover though. The odd occasion would result in guests staying “Not Responding” and me bouncing the VDSMD service again. The Host had a memory load
of around 85% (out of 768GB) and a CPU load of around 65% (48 cores). I have since added another host to that cluster and spread the guests between the two hosts. This seems to have totally cleared the messages (at least for the last 5 days anyway).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">I suspect the problem is load related. At what capacity would Ovirt regard a host as being ‘full’?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Thanks,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Mark<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US">From:</span></b><span lang="EN-US"> Pavel Gashev [mailto:Pax@acronis.com]
<br>
<b>Sent:</b> 31 January 2017 15:19<br>
<b>To:</b> Mark Greenall <m.greenall@iontrading.com>; users@ovirt.org<br>
<b>Subject:</b> Re: [ovirt-users] Ovirt 4.0.6 guests 'Not Responding'<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span lang="EN-US">Mark,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Could you please file a bug report? <o:p>
</o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Restart of vdsmd service would help to resolve the “executor queue full” state.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="color:black">From: </span></b><span lang="EN-US" style="color:black"><<a href="mailto:users-bounces@ovirt.org">users-bounces@ovirt.org</a>> on behalf of Mark Greenall <<a href="mailto:m.greenall@iontrading.com">m.greenall@iontrading.com</a>><br>
<b>Date: </b>Monday 30 January 2017 at 15:26<br>
<b>To: </b>"<a href="mailto:users@ovirt.org">users@ovirt.org</a>" <<a href="mailto:users@ovirt.org">users@ovirt.org</a>><br>
<b>Subject: </b>[ovirt-users] Ovirt 4.0.6 guests 'Not Responding'</span><span lang="EN-US" style="font-size:12.0pt;color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US" style="font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
</div>
<p class="MsoNormal"><span lang="EN-US">Hi,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Host server: Dell PowerEdge R815 (40 cores and 768GB memory)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Stoage: Dell Equallogic (Firmware V8.1.4)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">OS: Centos 7.3 (although the same thing happens on 7.2)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Ovirt: 4.0.6.3-1<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">We have several Ovirt clusters. Two of the hosts (in separate clusters) are showing as up in Hosted Engine but the guests running on them are showing as Not Responding. I can connect to the guests via ssh, etc but can’t
interact with them from the Ovirt GUI. It was fine on Saturday (28<sup>th</sup> Jan) morning but looks like something happened Sunday morning around 07:14 as we suddenly see the following in engine.log on one host:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">2017-01-29 07:14:26,952 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM 'd0aa990f-e6aa-4e79-93ce-011fe1372fb0'(lnd-ion-lindev-01) moved from 'Up' --> 'NotResponding'<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">2017-01-29 07:14:27,069 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-ion-lindev-01
is not responding.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">2017-01-29 07:14:27,070 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM '788bfc0e-1712-469e-9a0a-395b8bb3f369'(lnd-ion-windev-02) moved from 'Up' --> 'NotResponding'<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">2017-01-29 07:14:27,088 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-ion-windev-02
is not responding.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">2017-01-29 07:14:27,089 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM 'd7eaa4ec-d65e-45c0-bc4f-505100658121'(lnd-ion-windev-04) moved from 'Up' --> 'NotResponding'<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">2017-01-29 07:14:27,103 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-ion-windev-04
is not responding.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">2017-01-29 07:14:27,104 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM '5af875ad-70f9-4f49-9640-ee2b9927348b'(lnd-anv9-sup1) moved from 'Up' --> 'NotResponding'<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">2017-01-29 07:14:27,121 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-anv9-sup1
is not responding.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">2017-01-29 07:14:27,121 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM 'b3b7c5f3-0b5b-4d8f-9cc8-b758cc1ce3b9'(lnd-db-dev-03) moved from 'Up' --> 'NotResponding'<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">2017-01-29 07:14:27,136 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-db-dev-03
is not responding.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">2017-01-29 07:14:27,137 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM '6c0a6e17-47c3-4464-939b-e83984dbeaa6'(lnd-db-dev-04) moved from 'Up' --> 'NotResponding'<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">2017-01-29 07:14:27,167 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-db-dev-04
is not responding.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">2017-01-29 07:14:27,168 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM 'ab15bb08-1244-4dc1-a4f1-f6e94246aa23'(lnd-ion-lindev-05) moved from 'Up' --> 'NotResponding'<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Checking the vdsm logs this morning on the hosts I see a lot of the following messages:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">jsonrpc.Executor/0::WARNING::2017-01-30 09:34:15,989::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) vmId=`ab15bb08-1244-4dc1-a4f1-f6e94246aa23`::monitor became unresponsive (command timeout, age=94854.48)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">jsonrpc.Executor/0::WARNING::2017-01-30 09:34:15,990::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) vmId=`20a51347-ef08-47a9-9982-32b2047991e1`::monitor became unresponsive (command timeout, age=94854.48)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">jsonrpc.Executor/0::WARNING::2017-01-30 09:34:15,991::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) vmId=`2cd8698d-a0f9-43b7-9a89-92a93e920eb7`::monitor became unresponsive (command timeout, age=94854.49)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">jsonrpc.Executor/0::WARNING::2017-01-30 09:34:15,992::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) vmId=`5af875ad-70f9-4f49-9640-ee2b9927348b`::monitor became unresponsive (command timeout, age=94854.49)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">and<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">vdsm.Scheduler::WARNING::2017-01-30 09:36:36,444::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, executor queue
full<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">vdsm.Scheduler::WARNING::2017-01-30 09:36:38,446::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, executor queue
full<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">vdsm.Scheduler::WARNING::2017-01-30 09:36:38,627::periodic::212::virt.periodic.Operation::(_dispatch) could not run <vdsm.virt.sampling.HostMonitor object at 0x295bdd0>, executor queue full<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">vdsm.Scheduler::WARNING::2017-01-30 09:36:38,707::periodic::212::virt.periodic.Operation::(_dispatch) could not run <vdsm.virt.sampling.VMBulkSampler object at 0x295ba90>, executor queue full<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">vdsm.Scheduler::WARNING::2017-01-30 09:36:38,929::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.BlockjobMonitor'> at 0x295ba10>, executor queue full<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">vdsm.Scheduler::WARNING::2017-01-30 09:36:40,450::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, executor queue
full<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">vdsm.Scheduler::WARNING::2017-01-30 09:36:42,451::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, executor queue
full<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">vdsm.Scheduler::WARNING::2017-01-30 09:36:44,452::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, executor queue
full<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">I’ve also attached logs from time period for one of the hosts in question. This host is in a single node DC and cluster with iSCSI shared storage. I’ve had to make the time window on the logs quite small to fit within
the mail size limit. Let me know if you need anything more specific.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Many Thanks,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Mark<o:p></o:p></span></p>
</div>
</body>
</html>