On 6 Feb 2017, at 16:20, Mark Greenall <m.greenall(a)iontrading.com> wrote:
>
> Hi Pavel,
>
> Thanks for responding. I bounced the VDSMD service and the guests
> recovered; the monitor and queue full messages also cleared. However, we
> did keep getting intermittent “Guest x Not Responding” messages from the
> Hosted Engine, although in most cases the guests recovered almost
> immediately. On the odd occasion the guests stayed “Not Responding” and I
> had to bounce the VDSMD service again. The host had a memory load of
> around 85% (out of 768GB) and a CPU load of around 65% (48 cores). I have
> since added another host to that cluster and spread the guests between
> the two hosts. This seems to have totally cleared the messages (at least
> for the last 5 days anyway).
>
> I suspect the problem is load related. At what capacity would Ovirt
> regard a host as being ‘full’?

The above sounds OK, but one of the best indicators is the Unix system
load. What is the number of VMs (and guest CPUs) you’re running on that
48-core host? Also check that the CPU usage of the vdsm and libvirt
processes is not exceptionally high.

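As a rough illustration of the two checks suggested above, the sketch below normalises the host load average by the number of logical CPUs and computes a vCPU-to-CPU overcommit ratio. It is a minimal sketch and not an oVirt tool. `load_per_cpu` and `vcpu_overcommit` are hypothetical helper names, and the per-guest vCPU counts must be supplied by hand (for example, read from the Admin Portal). On Linux the load average also counts tasks in uninterruptible sleep, which usually means waiting on I/O. Storage stalls on iSCSI can therefore raise it even when CPU utilisation looks moderate.

```python
import os
from typing import Sequence


def load_per_cpu(load_avg: float, logical_cpus: int) -> float:
    """Normalise a load average by the number of logical CPUs.

    On Linux the load average counts runnable tasks plus tasks in
    uninterruptible sleep (usually blocked on I/O), so a sustained value
    above 1.0 means more tasks are runnable or blocked than there are CPUs.
    """
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return load_avg / logical_cpus


def vcpu_overcommit(guest_vcpus: Sequence[int], logical_cpus: int) -> float:
    """Ratio of the vCPUs assigned to all guests to the host's logical CPUs."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return sum(guest_vcpus) / logical_cpus


if __name__ == "__main__":
    # os.getloadavg() is Unix-only and returns the 1-, 5- and 15-minute averages.
    _, _, load15 = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"15-minute load per logical CPU: {load_per_cpu(load15, cpus):.2f}")
    # Hypothetical per-guest vCPU counts; replace with the guests on this host.
    print(f"vCPU overcommit ratio: {vcpu_overcommit([4, 4, 8, 2], cpus):.2f}")
```

Here is a hypothetical example. On a 48-CPU host, a 15-minute load average of 24 gives 0.5 per CPU, and guests totalling 96 vCPUs give an overcommit ratio of 2.0.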
> Thanks,
> Mark
>
> From: Pavel Gashev [mailto:Pax@acronis.com]
> Sent: 31 January 2017 15:19
> To: Mark Greenall <m.greenall(a)iontrading.com>; users(a)ovirt.org
> Subject: Re: [ovirt-users] Ovirt 4.0.6 guests 'Not Responding'
>
> Mark,
>
> Could you please file a bug report?
>
> Restarting the vdsmd service would help to resolve the “executor queue
> full” state.
>
> From: <users-bounces(a)ovirt.org> on behalf of Mark Greenall <m.greenall(a)iontrading.com>
> Date: Monday 30 January 2017 at 15:26
> To: "users(a)ovirt.org" <users(a)ovirt.org>
> Subject: [ovirt-users] Ovirt 4.0.6 guests 'Not Responding'
>
> Hi,
>
> Host server: Dell PowerEdge R815 (40 cores and 768GB memory)
> Storage: Dell Equallogic (Firmware V8.1.4)
> OS: Centos 7.3 (although the same thing happens on 7.2)
> Ovirt: 4.0.6.3-1
>
> We have several Ovirt clusters. Two of the hosts (in separate clusters)
> are showing as up in Hosted Engine, but the guests running on them are
> showing as Not Responding. I can connect to the guests via SSH, etc., but
> can’t interact with them from the Ovirt GUI. It was fine on Saturday
> (28th Jan) morning, but it looks like something happened on Sunday
> morning around 07:14, as we suddenly see the following in engine.log for
> one host:
>
> 2017-01-29 07:14:26,952 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM 'd0aa990f-e6aa-4e79-93ce-011fe1372fb0'(lnd-ion-lindev-01) moved from 'Up' --> 'NotResponding'
> 2017-01-29 07:14:27,069 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-ion-lindev-01 is not responding.
> 2017-01-29 07:14:27,070 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM '788bfc0e-1712-469e-9a0a-395b8bb3f369'(lnd-ion-windev-02) moved from 'Up' --> 'NotResponding'
> 2017-01-29 07:14:27,088 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-ion-windev-02 is not responding.
> 2017-01-29 07:14:27,089 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM 'd7eaa4ec-d65e-45c0-bc4f-505100658121'(lnd-ion-windev-04) moved from 'Up' --> 'NotResponding'
> 2017-01-29 07:14:27,103 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-ion-windev-04 is not responding.
> 2017-01-29 07:14:27,104 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM '5af875ad-70f9-4f49-9640-ee2b9927348b'(lnd-anv9-sup1) moved from 'Up' --> 'NotResponding'
> 2017-01-29 07:14:27,121 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-anv9-sup1 is not responding.
> 2017-01-29 07:14:27,121 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM 'b3b7c5f3-0b5b-4d8f-9cc8-b758cc1ce3b9'(lnd-db-dev-03) moved from 'Up' --> 'NotResponding'
> 2017-01-29 07:14:27,136 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-db-dev-03 is not responding.
> 2017-01-29 07:14:27,137 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM '6c0a6e17-47c3-4464-939b-e83984dbeaa6'(lnd-db-dev-04) moved from 'Up' --> 'NotResponding'
> 2017-01-29 07:14:27,167 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM lnd-db-dev-04 is not responding.
> 2017-01-29 07:14:27,168 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler1) [53ca8dc5] VM 'ab15bb08-1244-4dc1-a4f1-f6e94246aa23'(lnd-ion-lindev-05) moved from 'Up' --> 'NotResponding'
>
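The `VmAnalyzer` lines quoted above can be reduced to a short list of which guests changed state and when. The sketch below is an illustration only. It assumes the engine.log line layout shown in this thread: timestamp, level, logger, thread and correlation ID, followed by `VM '<uuid>'(<name>) moved from '<old>' --> '<new>'`. `parse_transitions` and `Transition` are hypothetical names, not part of oVirt.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches VmAnalyzer state-change lines in the layout quoted in this thread.
TRANSITION = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"\[org\.ovirt\.engine\.core\.vdsbroker\.monitoring\.VmAnalyzer\].*?"
    r"VM '(?P<vm_id>[0-9a-f-]+)'\((?P<name>[^)]+)\) "
    r"moved from '(?P<old>\w+)' --> '(?P<new>\w+)'"
)


class Transition(NamedTuple):
    timestamp: str
    vm_id: str
    name: str
    old: str
    new: str


def parse_transitions(lines: Iterable[str]) -> List[Transition]:
    """Return every VM state change found in the given engine.log lines."""
    found = []
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            found.append(Transition(match["ts"], match["vm_id"], match["name"],
                                    match["old"], match["new"]))
    return found


if __name__ == "__main__":
    sample = [
        "2017-01-29 07:14:26,952 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] "
        "(DefaultQuartzScheduler1) [53ca8dc5] VM 'd0aa990f-e6aa-4e79-93ce-011fe1372fb0'"
        "(lnd-ion-lindev-01) moved from 'Up' --> 'NotResponding'",
        "2017-01-29 07:14:27,069 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling."
        "AuditLogDirector] (DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, "
        "Call Stack: null, Custom Event ID: -1, Message: VM lnd-ion-lindev-01 is not responding.",
    ]
    # Only the VmAnalyzer line is a state change; the AuditLogDirector line is skipped.
    for t in parse_transitions(sample):
        print(f"{t.timestamp}  {t.name:<20} {t.old} -> {t.new}")
```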
> Checking the vdsm logs on the hosts this morning, I see a lot of the
> following messages:
>
> jsonrpc.Executor/0::WARNING::2017-01-30 09:34:15,989::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) vmId=`ab15bb08-1244-4dc1-a4f1-f6e94246aa23`::monitor became unresponsive (command timeout, age=94854.48)
> jsonrpc.Executor/0::WARNING::2017-01-30 09:34:15,990::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) vmId=`20a51347-ef08-47a9-9982-32b2047991e1`::monitor became unresponsive (command timeout, age=94854.48)
> jsonrpc.Executor/0::WARNING::2017-01-30 09:34:15,991::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) vmId=`2cd8698d-a0f9-43b7-9a89-92a93e920eb7`::monitor became unresponsive (command timeout, age=94854.49)
> jsonrpc.Executor/0::WARNING::2017-01-30 09:34:15,992::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) vmId=`5af875ad-70f9-4f49-9640-ee2b9927348b`::monitor became unresponsive (command timeout, age=94854.49)
>
> and
>
> vdsm.Scheduler::WARNING::2017-01-30 09:36:36,444::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, executor queue full
> vdsm.Scheduler::WARNING::2017-01-30 09:36:38,446::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, executor queue full
> vdsm.Scheduler::WARNING::2017-01-30 09:36:38,627::periodic::212::virt.periodic.Operation::(_dispatch) could not run <vdsm.virt.sampling.HostMonitor object at 0x295bdd0>, executor queue full
> vdsm.Scheduler::WARNING::2017-01-30 09:36:38,707::periodic::212::virt.periodic.Operation::(_dispatch) could not run <vdsm.virt.sampling.VMBulkSampler object at 0x295ba90>, executor queue full
> vdsm.Scheduler::WARNING::2017-01-30 09:36:38,929::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.BlockjobMonitor'> at 0x295ba10>, executor queue full
> vdsm.Scheduler::WARNING::2017-01-30 09:36:40,450::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, executor queue full
> vdsm.Scheduler::WARNING::2017-01-30 09:36:42,451::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, executor queue full
> vdsm.Scheduler::WARNING::2017-01-30 09:36:44,452::periodic::212::virt.periodic.Operation::(_dispatch) could not run <VmDispatcher operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, executor queue full
>
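The two kinds of vdsm warnings above can be summarised programmatically. The sketch below assumes the log layout shown in this thread and does two things:

1. It counts the “executor queue full” warnings per periodic operation.
2. It turns the `age=` value of an “unresponsive” warning into an estimated time of the last sample.

The second part assumes that `age` is the number of seconds since vdsm last obtained statistics for that VM. That reading is inferred from the warning text and is not stated in the thread. Under that assumption, `age=94854.48` logged at 09:34:15,989 on 30 January points back to about 07:13:21 on 29 January. That is roughly a minute before engine.log first reports the guests as NotResponding. `count_queue_full` and `last_sample_time` are hypothetical helper names.

```python
import re
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable

# "could not run <VmDispatcher operation=<class '...'> at 0x...>" or
# "could not run <module.Class object at 0x...>", then "executor queue full".
QUEUE_FULL = re.compile(
    r"could not run <(?:VmDispatcher operation=<class '(?P<dispatched>[\w.]+)'>"
    r"|(?P<direct>[\w.]+) object) at 0x[0-9a-f]+>, executor queue full"
)
# "WARNING::<timestamp>::... monitor became unresponsive (command timeout, age=N)"
UNRESPONSIVE = re.compile(
    r"WARNING::(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::.*?"
    r"vmId=`(?P<vm_id>[0-9a-f-]+)`::monitor became unresponsive "
    r"\(command timeout, age=(?P<age>[\d.]+)\)"
)


def count_queue_full(lines: Iterable[str]) -> Counter:
    """Count 'executor queue full' warnings per periodic operation class."""
    counts: Counter = Counter()
    for line in lines:
        match = QUEUE_FULL.search(line)
        if match:
            counts[match["dispatched"] or match["direct"]] += 1
    return counts


def last_sample_time(line: str) -> datetime:
    """Estimate when stats were last collected, ASSUMING age is in seconds."""
    match = UNRESPONSIVE.search(line)
    if match is None:
        raise ValueError("not an 'unresponsive' warning line")
    logged_at = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return logged_at - timedelta(seconds=float(match["age"]))


if __name__ == "__main__":
    queue_lines = [
        "vdsm.Scheduler::WARNING::2017-01-30 09:36:36,444::periodic::212::"
        "virt.periodic.Operation::(_dispatch) could not run <VmDispatcher "
        "operation=<class 'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, "
        "executor queue full",
        "vdsm.Scheduler::WARNING::2017-01-30 09:36:38,627::periodic::212::"
        "virt.periodic.Operation::(_dispatch) could not run "
        "<vdsm.virt.sampling.HostMonitor object at 0x295bdd0>, executor queue full",
    ]
    for operation, n in count_queue_full(queue_lines).most_common():
        print(f"{n:4d}  {operation}")

    warning = (
        "jsonrpc.Executor/0::WARNING::2017-01-30 09:34:15,989::vm::4890::virt.vm::"
        "(_setUnresponsiveIfTimeout) vmId=`ab15bb08-1244-4dc1-a4f1-f6e94246aa23`::"
        "monitor became unresponsive (command timeout, age=94854.48)"
    )
    print("estimated last sample:", last_sample_time(warning))
```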
> I’ve also attached logs from the time period for one of the hosts in
> question. This host is in a single-node DC and cluster with iSCSI shared
> storage. I’ve had to make the time window on the logs quite small to fit
> within the mail size limit. Let me know if you need anything more
> specific.
>
> Many Thanks,
> Mark
> _______________________________________________
> Users mailing list
> Users(a)ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
--Apple-Mail=_0F75B640-95EC-4CBD-8AE7-5347CDE34B1F
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html;
charset=utf-8
<html><head><meta http-equiv=3D"Content-Type"
content=3D"text/html =
charset=3Dutf-8"></head><body style=3D"word-wrap: break-word; =
-webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" =
class=3D""><br class=3D""><div><blockquote
type=3D"cite" class=3D""><div =
class=3D"">On 6 Feb 2017, at 16:20, Mark Greenall <<a =
href=3D"mailto:m.greenall@iontrading.com" =
class=3D"">m.greenall(a)iontrading.com</a>&gt;
wrote:</div><br =
class=3D"Apple-interchange-newline"><div class=3D""><div
=
class=3D"WordSection1" style=3D"page: WordSection1; font-family: =
Helvetica; font-size: 12px; font-style: normal; font-variant-caps: =
normal; font-weight: normal; letter-spacing: normal; orphans: auto; =
text-align: start; text-indent: 0px; text-transform: none; white-space: =
normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; =
background-color: rgb(255, 255, 255);"><div style=3D"margin: 0cm 0cm =
0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span class=3D"">Hi Pavel,<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
class=3D""><o:p
class=3D""> </o:p></span></div><div
style=3D"margin: =
0cm 0cm 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span class=3D"">Thanks for responding. I bounced
the VDSMD =
service, the guests recovered and the monitor and queue full messages =
also cleared. However, we did keep getting intermittent =E2=80=9CGuest x =
Not Responding =E2=80=9C messages being communicated by the Hosted =
Engine, in most cases the guests would actually almost immediately =
recover though. The odd occasion would result in guests staying =E2=80=9CN=
ot Responding=E2=80=9D and me bouncing the VDSMD service again. The Host =
had a memory load of around 85% (out of 768GB) and a CPU load of around =
65% (48 cores). I have since added another host to that cluster and =
spread the guests between the two hosts. This seems to have totally =
cleared the messages (at least for the last 5 days anyway).<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
class=3D""><o:p
class=3D""> </o:p></span></div><div
style=3D"margin: =
0cm 0cm 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span class=3D"">I suspect the problem is load
related. At =
what capacity would Ovirt regard a host as being =
=E2=80=98full=E2=80=99?</span></div></div></div></blockquote><div><br
=
class=3D""></div><div>The above sounds OK, but one of the best =
indicators is the Unix system load.</div><div>What is the number of VMs =
(and guest CPUs) you=E2=80=99re running on that 48-core =
host? </div></div><div>Also check that the CPU usage of the vdsm and =
libvirt processes is not exceptionally high.</div><div><br
class=3D""><blockquote =
type=3D"cite" class=3D""><div class=3D""><div
class=3D"WordSection1" =
style=3D"page: WordSection1; font-family: Helvetica; font-size: 12px; =
font-style: normal; font-variant-caps: normal; font-weight: normal; =
letter-spacing: normal; orphans: auto; text-align: start; text-indent: =
0px; text-transform: none; white-space: normal; widows: auto; =
word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: =
rgb(255, 255, 255);"><div style=3D"margin: 0cm 0cm 0.0001pt; font-size: =
11pt; font-family: Calibri, sans-serif;" class=3D""><span
class=3D""><o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
class=3D""><o:p
class=3D""> </o:p></span></div><div
style=3D"margin: =
0cm 0cm 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span class=3D"">Thanks,<o:p
class=3D""></o:p></span></div><div=
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span
class=3D"">Mark<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
class=3D""><o:p
class=3D""> </o:p></span></div><div
class=3D""><div =
style=3D"border-style: solid none none; border-top-color: rgb(225, 225, =
225); border-top-width: 1pt; padding: 3pt 0cm 0cm;" class=3D""><div
=
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><b class=3D""><span
lang=3D"EN-US" =
class=3D"">From:</span></b><span lang=3D"EN-US"
class=3D""><span =
class=3D"Apple-converted-space"> </span>Pavel Gashev [<a =
href=3D"mailto:Pax@acronis.com" style=3D"color: purple; text-decoration: =
underline;" class=3D"">mailto:Pax@acronis.com</a>]<span =
class=3D"Apple-converted-space"> </span><br
class=3D""><b =
class=3D"">Sent:</b><span
class=3D"Apple-converted-space"> </span>31 =
January 2017 15:19<br class=3D""><b
class=3D"">To:</b><span =
class=3D"Apple-converted-space"> </span>Mark Greenall
<<a =
href=3D"mailto:m.greenall@iontrading.com" style=3D"color: purple; =
text-decoration: underline;" =
class=3D"">m.greenall(a)iontrading.com</a>&gt;;<span =
class=3D"Apple-converted-space"> </span><a =
href=3D"mailto:users@ovirt.org" style=3D"color: purple; text-decoration: =
underline;" class=3D"">users(a)ovirt.org</a><br
class=3D""><b =
class=3D"">Subject:</b><span =
class=3D"Apple-converted-space"> </span>Re: [ovirt-users]
Ovirt =
4.0.6 guests 'Not Responding'<o:p =
class=3D""></o:p></span></div></div></div><div
style=3D"margin: 0cm 0cm =
0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><o:p
class=3D""> </o:p></div><div style=3D"margin:
0cm =
0cm 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span lang=3D"EN-US"
class=3D"">Mark,<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
lang=3D"EN-US" class=3D""><o:p
class=3D""> </o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US"
class=3D"">Could =
you please file a bug report?<span =
class=3D"Apple-converted-space"> </span><o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
lang=3D"EN-US" class=3D""><o:p
class=3D""> </o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US"
class=3D"">Restart =
of vdsmd service would help to resolve the =E2=80=9Cexecutor queue =
full=E2=80=9D state.<o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US"
class=3D""><o:p =
class=3D""> </o:p></span></div><div
style=3D"margin: 0cm 0cm =
0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span lang=3D"EN-US" class=3D""><o:p
=
class=3D""> </o:p></span></div><div
style=3D"border-style: solid =
none none; border-top-color: rgb(181, 196, 223); border-top-width: 1pt; =
padding: 3pt 0cm 0cm;" class=3D""><div style=3D"margin: 0cm 0cm
=
0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><b class=3D""><span lang=3D"EN-US"
style=3D"" =
class=3D"">From:<span =
class=3D"Apple-converted-space"> </span></span></b><span
=
lang=3D"EN-US" style=3D"" class=3D""><<a =
href=3D"mailto:users-bounces@ovirt.org" style=3D"color: purple; =
text-decoration: underline;"
class=3D"">users-bounces(a)ovirt.org</a>&gt; =
on behalf of Mark Greenall <<a =
href=3D"mailto:m.greenall@iontrading.com" style=3D"color: purple; =
text-decoration: underline;" =
class=3D"">m.greenall(a)iontrading.com</a>&gt;<br
class=3D""><b =
class=3D"">Date:<span =
class=3D"Apple-converted-space"> </span></b>Monday 30
January 2017 =
at 15:26<br class=3D""><b class=3D"">To:<span =
class=3D"Apple-converted-space"> </span></b>"<a
=
href=3D"mailto:users@ovirt.org" style=3D"color: purple; text-decoration: =
underline;" class=3D"">users(a)ovirt.org</a>" <<a =
href=3D"mailto:users@ovirt.org" style=3D"color: purple; text-decoration: =
underline;" class=3D"">users(a)ovirt.org</a>&gt;<br
class=3D""><b =
class=3D"">Subject:<span =
class=3D"Apple-converted-space"> </span></b>[ovirt-users]
Ovirt =
4.0.6 guests 'Not Responding'</span><span lang=3D"EN-US" =
style=3D"font-size: 12pt;" class=3D""><o:p =
class=3D""></o:p></span></div></div><div
class=3D""><div style=3D"margin: =
0cm 0cm 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span lang=3D"EN-US" style=3D"font-family:
'Times New Roman', =
serif;" class=3D""><o:p
class=3D""> </o:p></span></div></div><div
=
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US"
class=3D"">Hi,<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
lang=3D"EN-US" class=3D""> <o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US"
class=3D"">Host =
server: Dell PowerEdge R815 (40 cores and 768GB memory)<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
lang=3D"EN-US" class=3D"">Storage: Dell Equallogic (Firmware
V8.1.4)<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
lang=3D"EN-US" class=3D"">OS: Centos 7.3 (although the same thing
=
happens on 7.2)<o:p
class=3D""></o:p></span></div><div style=3D"margin:
=
0cm 0cm 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span lang=3D"EN-US" class=3D"">Ovirt:
4.0.6.3-1<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
lang=3D"EN-US" class=3D""> <o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US"
class=3D"">We have =
several Ovirt clusters. Two of the hosts (in separate clusters) are =
showing as up in Hosted Engine but the guests running on them are =
showing as Not Responding. I can connect to the guests via ssh, etc but =
can=E2=80=99t interact with them from the Ovirt GUI. It was fine on =
Saturday (28<sup class=3D"">th</sup><span =
class=3D"Apple-converted-space"> </span>Jan) morning but looks
like =
something happened Sunday morning around 07:14 as we suddenly see the =
following in engine.log on one host:<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
lang=3D"EN-US" class=3D""> <o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D"">2017-01-29 07:14:26,952 INFO =
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] =
(DefaultQuartzScheduler1) [53ca8dc5] VM =
'd0aa990f-e6aa-4e79-93ce-011fe1372fb0'(lnd-ion-lindev-01) moved from =
'Up' --> 'NotResponding'<o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D"">2017-01-29 07:14:27,069 WARN =
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] =
(DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: =
null, Custom Event ID: -1, Message: VM lnd-ion-lindev-01 is not =
responding.<o:p class=3D""></o:p></span></div><div
style=3D"margin: 0cm =
0cm 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span lang=3D"EN-US"
class=3D"">2017-01-29 07:14:27,070 =
INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] =
(DefaultQuartzScheduler1) [53ca8dc5] VM =
'788bfc0e-1712-469e-9a0a-395b8bb3f369'(lnd-ion-windev-02) moved from =
'Up' --> 'NotResponding'<o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D"">2017-01-29 07:14:27,088 WARN =
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] =
(DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: =
null, Custom Event ID: -1, Message: VM lnd-ion-windev-02 is not =
responding.<o:p class=3D""></o:p></span></div><div
style=3D"margin: 0cm =
0cm 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span lang=3D"EN-US"
class=3D"">2017-01-29 07:14:27,089 =
INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] =
(DefaultQuartzScheduler1) [53ca8dc5] VM =
'd7eaa4ec-d65e-45c0-bc4f-505100658121'(lnd-ion-windev-04) moved from =
'Up' --> 'NotResponding'<o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D"">2017-01-29 07:14:27,103 WARN =
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] =
(DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: =
null, Custom Event ID: -1, Message: VM lnd-ion-windev-04 is not =
responding.<o:p class=3D""></o:p></span></div><div
style=3D"margin: 0cm =
0cm 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span lang=3D"EN-US"
class=3D"">2017-01-29 07:14:27,104 =
INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] =
(DefaultQuartzScheduler1) [53ca8dc5] VM =
'5af875ad-70f9-4f49-9640-ee2b9927348b'(lnd-anv9-sup1) moved from 'Up' =
--> 'NotResponding'<o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D"">2017-01-29 07:14:27,121 WARN =
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] =
(DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: =
null, Custom Event ID: -1, Message: VM lnd-anv9-sup1 is not =
responding.<o:p class=3D""></o:p></span></div><div
style=3D"margin: 0cm =
0cm 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span lang=3D"EN-US"
class=3D"">2017-01-29 07:14:27,121 =
INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] =
(DefaultQuartzScheduler1) [53ca8dc5] VM =
'b3b7c5f3-0b5b-4d8f-9cc8-b758cc1ce3b9'(lnd-db-dev-03) moved from 'Up' =
--> 'NotResponding'<o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D"">2017-01-29 07:14:27,136 WARN =
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] =
(DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: =
null, Custom Event ID: -1, Message: VM lnd-db-dev-03 is not =
responding.<o:p class=3D""></o:p></span></div><div
style=3D"margin: 0cm =
0cm 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span lang=3D"EN-US"
class=3D"">2017-01-29 07:14:27,137 =
INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] =
(DefaultQuartzScheduler1) [53ca8dc5] VM =
'6c0a6e17-47c3-4464-939b-e83984dbeaa6'(lnd-db-dev-04) moved from 'Up' =
--> 'NotResponding'<o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D"">2017-01-29 07:14:27,167 WARN =
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] =
(DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: =
null, Custom Event ID: -1, Message: VM lnd-db-dev-04 is not =
responding.<o:p class=3D""></o:p></span></div><div
style=3D"margin: 0cm =
0cm 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span lang=3D"EN-US"
class=3D"">2017-01-29 07:14:27,168 =
INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] =
(DefaultQuartzScheduler1) [53ca8dc5] VM =
'ab15bb08-1244-4dc1-a4f1-f6e94246aa23'(lnd-ion-lindev-05) moved from =
'Up' --> 'NotResponding'<o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D""> <o:p
class=3D""></o:p></span></div><div style=3D"margin:
=
0cm 0cm 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span lang=3D"EN-US"
class=3D""> <o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
lang=3D"EN-US" class=3D"">Checking the vdsm logs this morning on
the =
hosts I see a lot of the following messages:<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
lang=3D"EN-US" class=3D""> <o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D"">jsonrpc.Executor/0::WARNING::2017-01-30 =
09:34:15,989::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) =
vmId=3D`ab15bb08-1244-4dc1-a4f1-f6e94246aa23`::monitor became =
unresponsive (command timeout, age=3D94854.48)<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
lang=3D"EN-US" class=3D"">jsonrpc.Executor/0::WARNING::2017-01-30
=
09:34:15,990::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) =
vmId=3D`20a51347-ef08-47a9-9982-32b2047991e1`::monitor became =
unresponsive (command timeout, age=3D94854.48)<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
lang=3D"EN-US" class=3D"">jsonrpc.Executor/0::WARNING::2017-01-30
=
09:34:15,991::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) =
vmId=3D`2cd8698d-a0f9-43b7-9a89-92a93e920eb7`::monitor became =
unresponsive (command timeout, age=3D94854.49)<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
lang=3D"EN-US" class=3D"">jsonrpc.Executor/0::WARNING::2017-01-30
=
09:34:15,992::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) =
vmId=3D`5af875ad-70f9-4f49-9640-ee2b9927348b`::monitor became =
unresponsive (command timeout, age=3D94854.49)<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
lang=3D"EN-US" class=3D""> <o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US"
class=3D"">and<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
lang=3D"EN-US" class=3D""> <o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D"">vdsm.Scheduler::WARNING::2017-01-30 =
09:36:36,444::periodic::212::virt.periodic.Operation::(_dispatch) could =
not run <VmDispatcher operation=3D<class =
'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, =
executor queue full<o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D"">vdsm.Scheduler::WARNING::2017-01-30 =
09:36:38,446::periodic::212::virt.periodic.Operation::(_dispatch) could =
not run <VmDispatcher operation=3D<class =
'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, =
executor queue full<o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D"">vdsm.Scheduler::WARNING::2017-01-30 =
09:36:38,627::periodic::212::virt.periodic.Operation::(_dispatch) could =
not run <vdsm.virt.sampling.HostMonitor object at 0x295bdd0>, =
executor queue full<o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D"">vdsm.Scheduler::WARNING::2017-01-30 =
09:36:38,707::periodic::212::virt.periodic.Operation::(_dispatch) could =
not run <vdsm.virt.sampling.VMBulkSampler object at 0x295ba90>, =
executor queue full<o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D"">vdsm.Scheduler::WARNING::2017-01-30 =
09:36:38,929::periodic::212::virt.periodic.Operation::(_dispatch) could =
not run <VmDispatcher operation=3D<class =
'vdsm.virt.periodic.BlockjobMonitor'> at 0x295ba10>, executor =
queue full<o:p class=3D""></o:p></span></div><div
style=3D"margin: 0cm =
0cm 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span lang=3D"EN-US" =
class=3D"">vdsm.Scheduler::WARNING::2017-01-30 =
09:36:40,450::periodic::212::virt.periodic.Operation::(_dispatch) could =
not run <VmDispatcher operation=3D<class =
'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, =
executor queue full<o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D"">vdsm.Scheduler::WARNING::2017-01-30 =
09:36:42,451::periodic::212::virt.periodic.Operation::(_dispatch) could =
not run <VmDispatcher operation=3D<class =
'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, =
executor queue full<o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D"">vdsm.Scheduler::WARNING::2017-01-30 =
09:36:44,452::periodic::212::virt.periodic.Operation::(_dispatch) could =
not run <VmDispatcher operation=3D<class =
'vdsm.virt.periodic.DriveWatermarkMonitor'> at 0x295bd50>, =
executor queue full<o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US" =
class=3D""> <o:p
class=3D""></o:p></span></div><div style=3D"margin:
=
0cm 0cm 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span lang=3D"EN-US"
class=3D"">I=E2=80=99ve also attached =
logs from that time period for one of the hosts in question. This host is in =
a single-node DC and cluster with iSCSI shared storage. I=E2=80=99ve had =
to make the time window on the logs quite small to fit within the mail =
size limit. Let me know if you need anything more specific.<o:p =
class=3D""></o:p></span></div><div style=3D"margin:
0cm 0cm 0.0001pt; =
font-size: 11pt; font-family: Calibri, sans-serif;" class=3D""><span
=
lang=3D"EN-US" class=3D""> <o:p
class=3D""></o:p></span></div><div =
style=3D"margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: =
Calibri, sans-serif;" class=3D""><span lang=3D"EN-US"
class=3D"">Many =
Thanks,<o:p class=3D""></o:p></span></div><div
style=3D"margin: 0cm 0cm =
0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" =
class=3D""><span lang=3D"EN-US"
class=3D"">Mark<o:p =
class=3D""></o:p></span></div></div><span
style=3D"font-family: =
Helvetica; font-size: 12px; font-style: normal; font-variant-caps: =
normal; font-weight: normal; letter-spacing: normal; orphans: auto; =
text-align: start; text-indent: 0px; text-transform: none; white-space: =
normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; =
background-color: rgb(255, 255, 255); float: none; display: inline =
!important;" =
class=3D"">_______________________________________________</span><br
=
style=3D"font-family: Helvetica; font-size: 12px; font-style: normal; =
font-variant-caps: normal; font-weight: normal; letter-spacing: normal; =
orphans: auto; text-align: start; text-indent: 0px; text-transform: =
none; white-space: normal; widows: auto; word-spacing: 0px; =
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" =
class=3D""><span style=3D"font-family: Helvetica; font-size: 12px; =
font-style: normal; font-variant-caps: normal; font-weight: normal; =
letter-spacing: normal; orphans: auto; text-align: start; text-indent: =
0px; text-transform: none; white-space: normal; widows: auto; =
word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: =
rgb(255, 255, 255); float: none; display: inline !important;" =
class=3D"">Users mailing list</span><br style=3D"font-family:
Helvetica; =
font-size: 12px; font-style: normal; font-variant-caps: normal; =
font-weight: normal; letter-spacing: normal; orphans: auto; text-align: =
start; text-indent: 0px; text-transform: none; white-space: normal; =
widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; =
background-color: rgb(255, 255, 255);" class=3D""><a =
href=3D"mailto:Users@ovirt.org" style=3D"color: purple; text-decoration: =
underline; font-family: Helvetica; font-size: 12px; font-style: normal; =
font-variant-caps: normal; font-weight: normal; letter-spacing: normal; =
orphans: auto; text-align: start; text-indent: 0px; text-transform: =
none; white-space: normal; widows: auto; word-spacing: 0px; =
-webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; =
background-color: rgb(255, 255, 255);"
class=3D"">Users(a)ovirt.org</a><br =
style=3D"font-family: Helvetica; font-size: 12px; font-style: normal; =
font-variant-caps: normal; font-weight: normal; letter-spacing: normal; =
orphans: auto; text-align: start; text-indent: 0px; text-transform: =
none; white-space: normal; widows: auto; word-spacing: 0px; =
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" =
class=3D""><a
href=3D"http://lists.ovirt.org/mailman/listinfo/users" =
style=3D"color: purple; text-decoration: underline; font-family: =
Helvetica; font-size: 12px; font-style: normal; font-variant-caps: =
normal; font-weight: normal; letter-spacing: normal; orphans: auto; =
text-align: start; text-indent: 0px; text-transform: none; white-space: =
normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; =
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" =
class=3D"">http://lists.ovirt.org/mailman/listinfo/users<... =
style=3D"font-family: Helvetica; font-size: 12px; font-style: normal; =
font-variant-caps: normal; font-weight: normal; letter-spacing: normal; =
orphans: auto; text-align: start; text-indent: 0px; text-transform: =
none; white-space: normal; widows: auto; word-spacing: 0px; =
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" =
class=3D""></div></blockquote></div><br
class=3D""></body></html>=
--Apple-Mail=_0F75B640-95EC-4CBD-8AE7-5347CDE34B1F--