[ovirt-users] Ongoing "VM is not responding" and "ETL sampling errors".

J Brian Ismay jismay at cenic.org
Wed Aug 9 17:54:16 UTC 2017


Hello the list,

I have been slowly bringing up a 9-node cluster for the last few months.

All nodes are identical, dual 2-port 10G nics, lots of memory and CPU
Storage is a Netapp Filer accessed via NFS on a dedicated 10Gb 
dual-switch environment.

Generally everything is working fine, but ever since our last rebuild of 
the cluster in preperation for a move into production status we have 
been getting repeated errors showing in the HostedEngine console:

VM foo is not responding.
VM bar is not responding.
VM baz is not responding.

These errors happen on a fairly regular basis, and generally are 
multiple VMs all being hosted by different nodes. When errors occur I 
also lose external connectivity to the VM in question, both via its 
service IP address and via the ovirt console. The actual outages appear 
to generally last 15-20 seconds and then things recover and go back to 
normal.

We are also getting much more frequent errors:

ETL service sampling has encountered an error. Please consult the 
service log for more details.

I have attached snippets from the Engine engine.log from this morning. 
If any other logs are needed for to help diagnosis I can provide them.

--
Brian Ismay
SR. Systems Administrator
jismay at cenic.org

----
engine.log: NOTE, the system clock is in UTC, local time is PDT, so this 
occurred at 07:48AM local time.

2017-08-09 14:48:37,237 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler6) [2cea1ef7] VM 
'69880324-2d2e-4a70-8071-4ae0f0ae342e'(vm1) moved from 'Up' --> 
'NotResponding'
2017-08-09 14:48:37,277 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler6) [2cea1ef7] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM vm1 is not responding.
2017-08-09 14:48:37,277 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler6) [2cea1ef7] VM 
'4471e3ee-9f69-4903-b68f-c1293aea047f'(vm2) moved from 'Up' --> 
'NotResponding'
2017-08-09 14:48:37,282 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler6) [2cea1ef7] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM vm2 is not responding.
2017-08-09 14:48:38,326 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler5) [cf129f7] VM 
'35fd4afa-12a1-4326-9db5-a86939a01fa8'(vm3) moved from 'Up' --> 
'NotResponding'
2017-08-09 14:48:38,360 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler5) [cf129f7] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM vm3 is not responding.
2017-08-09 14:48:38,360 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler5) [cf129f7] VM 
'd83e9633-3597-4046-95ee-2a166682b85e'(vm4) moved from 'Up' --> 
'NotResponding'
2017-08-09 14:48:38,365 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler5) [cf129f7] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM vm4 is not responding.
2017-08-09 14:48:49,075 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler8) [3b1149ff] VM 
'd41984d0-4418-4991-9af0-25593abac976'(vm5) moved from 'Up' --> 
'NotResponding'
2017-08-09 14:48:49,130 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler8) [3b1149ff] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM vm5 is not responding.
2017-08-09 14:48:49,131 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler8) [3b1149ff] VM 
'ed87b37d-5b79-4105-ba89-29a59361eb4e'(vm6) moved from 'Up' --> 
'NotResponding'
2017-08-09 14:48:49,136 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler8) [3b1149ff] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM vm6 is not responding.
2017-08-09 14:48:52,221 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler7) [2973c87] VM 
'506980f4-6764-4cc6-bb20-c1956d8ed201'(vm7) moved from 'Up' --> 
'NotResponding'
2017-08-09 14:48:52,226 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler7) [2973c87] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM vm7 is not responding.
2017-08-09 14:48:52,299 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler6) [2cea1ef7] VM 
'69880324-2d2e-4a70-8071-4ae0f0ae342e'(vm1) moved from 'NotResponding' 
--> 'Up'
2017-08-09 14:48:52,300 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler6) [2cea1ef7] VM 
'4471e3ee-9f69-4903-b68f-c1293aea047f'(vm2) moved from 'NotResponding' 
--> 'Up'
2017-08-09 14:48:53,373 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler5) [cf129f7] VM 
'638b2aab-e4f7-43e0-a2a8-95c75813e669'(vm8) moved from 'Up' --> 
'NotResponding'
2017-08-09 14:48:53,379 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler5) [cf129f7] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM vm8 is not responding.
2017-08-09 14:48:54,380 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler6) [2cea1ef7] VM 
'35fd4afa-12a1-4326-9db5-a86939a01fa8'(vm3) moved from 'NotResponding' 
--> 'Up'
2017-08-09 14:48:54,381 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler6) [2cea1ef7] VM 
'd83e9633-3597-4046-95ee-2a166682b85e'(vm4) moved from 'NotResponding' 
--> 'Up'
2017-08-09 14:49:04,197 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler7) [2973c87] VM 
'd41984d0-4418-4991-9af0-25593abac976'(vm5) moved from 'NotResponding' 
--> 'Up'
2017-08-09 14:49:04,198 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler7) [2973c87] VM 
'ed87b37d-5b79-4105-ba89-29a59361eb4e'(vm6) moved from 'NotResponding' 
--> 'Up'
2017-08-09 14:49:07,293 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler8) [3b1149ff] VM 
'506980f4-6764-4cc6-bb20-c1956d8ed201'(vm7) moved from 'NotResponding' 
--> 'Up'
2017-08-09 14:49:09,388 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler7) [2973c87] VM 
'638b2aab-e4f7-43e0-a2a8-95c75813e669'(vm8) moved from 'NotResponding' 
--> 'Up'


More information about the Users mailing list