[ovirt-users] ovirt 3.6, we had the ovirt manager go down in a bad way and all VMs for one node marked Unknown and Not Responding while up

Christopher Cox ccox at endlessnow.com
Thu Jan 25 22:12:53 UTC 2018


On 01/25/2018 02:25 PM, Douglas Landgraf wrote:
> On Wed, Jan 24, 2018 at 10:18 AM, Christopher Cox <ccox at endlessnow.com> wrote:
>> Would restarting vdsm on the node in question help fix this?  Again, all the
>> VMs are up on the node.  Prior attempts to fix this problem have left the
>> node in a state where I can't issue the "has been rebooted" command to it;
>> it's confused.
>>
>> So... node is up.  All VMs are up.  Can't issue "has been rebooted" to the
>> node, all VMs show Unknown and not responding but they are up.
>>
>> Changing the status in the ovirt db to 0 works for a second and then it goes
>> immediately back to 8 (which is why I'm wondering if I should restart vdsm
>> on the node).
> 
> It's not recommended to change db manually.
> 
>>
>> Oddly enough, we're running all of this in production.  So, watching it all
>> go down isn't the best option for us.
>>
>> Any advice is welcome.
> 
> 
> We would need to see the node/engine logs.  Have you found any errors in
> vdsm.log (from the nodes) or engine.log?  Could you please share the error?


In short, the "error" is that our ovirt manager lost network connectivity 
(our problem) and then crashed hard (a hardware issue on that server).  On 
bring-up we had some network changes in flight (the ones that caused the 
lost-network problem), so our LACP bond was down for a while as we worked 
to bring it back up (noting the ovirt manager was up while we were 
re-establishing the network on the switch side).

In other words, that's the "error", so to speak, that got us to where we are.

Full DEBUG is enabled on the logs... The error messages seem obvious to 
me... it starts like this (noting that the ISO DOMAIN was coming off an NFS 
mount on the ovirt management server... yes... we know... we do have 
plans to move that).

So on the hypervisor node itself, from the vdsm.log (vdsm.log.33.xz):

(hopefully no surprise here)

Thread-2426633::WARNING::2018-01-23 13:50:56,672::fileSD::749::Storage.scanDomains::(collectMetaFiles) Could not collect metadata file for domain path /rhev/data-center/mnt/d0lppc129.skopos.me:_var_lib_exports_iso-20160408002844
Traceback (most recent call last):
   File "/usr/share/vdsm/storage/fileSD.py", line 735, in collectMetaFiles
     sd.DOMAIN_META_DATA))
   File "/usr/share/vdsm/storage/outOfProcess.py", line 121, in glob
     return self._iop.glob(pattern)
   File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 536, in glob
     return self._sendCommand("glob", {"pattern": pattern}, self.timeout)
   File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 421, in _sendCommand
     raise Timeout(os.strerror(errno.ETIMEDOUT))
Timeout: Connection timed out
Thread-27::ERROR::2018-01-23 13:50:56,672::sdc::145::Storage.StorageDomainCache::(_findDomain) domain e5ecae2f-5a06-4743-9a43-e74d83992c35 not found
Traceback (most recent call last):
   File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
     dom = findMethod(sdUUID)
   File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
     return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
   File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
     raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'e5ecae2f-5a06-4743-9a43-e74d83992c35',)
Thread-27::ERROR::2018-01-23 13:50:56,673::monitor::276::Storage.Monitor::(_monitorDomain) Error monitoring domain e5ecae2f-5a06-4743-9a43-e74d83992c35
Traceback (most recent call last):
   File "/usr/share/vdsm/storage/monitor.py", line 272, in _monitorDomain
     self._performDomainSelftest()
   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 769, in wrapper
     value = meth(self, *a, **kw)
   File "/usr/share/vdsm/storage/monitor.py", line 339, in _performDomainSelftest
     self.domain.selftest()
   File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
     return getattr(self.getRealDomain(), attrName)
   File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
     return self._cache._realProduce(self._sdUUID)
   File "/usr/share/vdsm/storage/sdc.py", line 124, in _realProduce
     domain = self._findDomain(sdUUID)
   File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
     dom = findMethod(sdUUID)
   File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
     return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
   File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
     raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'e5ecae2f-5a06-4743-9a43-e74d83992c35',)
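As an aside, with full DEBUG on these are easy to lose in the noise; the 
level sits between the '::' separators, so a plain grep filter works.  Just 
a sketch (the first sample line is copied from the excerpt above; the DEBUG 
line is made up for contrast):

```shell
# Sketch: vdsm log lines carry the level between '::' separators, so a
# simple grep pulls ERROR/WARNING traffic out of a DEBUG-heavy log.
# The first sample line is from the vdsm.log excerpt above; the second is
# a made-up DEBUG line for contrast.
cat > /tmp/vdsm-sample.log <<'EOF'
Thread-27::ERROR::2018-01-23 13:50:56,672::sdc::145::Storage.StorageDomainCache::(_findDomain) domain e5ecae2f-5a06-4743-9a43-e74d83992c35 not found
Thread-27::DEBUG::2018-01-23 13:50:57,000::example::1::Example::(noise) hypothetical debug noise
EOF
grep -E '::(ERROR|WARNING)::' /tmp/vdsm-sample.log
```

Run the same grep against the real /var/log/vdsm/vdsm.log* files on a node, 
of course.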


Again, all the hypervisor nodes will complain about the NFS area for the 
ISO DOMAIN now being gone.  Remember, the ovirt manager node held this 
export; its network went out and then the node crashed (note: the ovirt 
manager node (the actual server box) shouldn't have crashed due to the 
network outage, but it did).
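When a node starts throwing those timeouts, the quickest sanity check is 
whether the ISO-domain mount point answers at all.  An NFS server that has 
gone away tends to make the mount hang rather than fail outright, so bound 
the check with timeout(1).  A sketch, using the mount path from the 
traceback above:

```shell
# Sketch: a dead NFS server leaves the mount hanging, so even 'ls' can
# block; timeout(1) bounds the wait.  Path is from the vdsm traceback above.
ISO_MNT='/rhev/data-center/mnt/d0lppc129.skopos.me:_var_lib_exports_iso-20160408002844'
timeout 5 ls "$ISO_MNT" >/dev/null 2>&1 || echo "ISO domain mount not answering"
```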

So here is the engine collapse as it lost network connectivity (before 
the server actually crashed hard).

2018-01-23 13:45:33,666 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-87) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VDSM d0lppn067 command failed: 
Heartbeat exeeded
2018-01-23 13:45:33,666 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-10) [21574461] Correlation ID: null, Call 
Stack: null, Custom Event ID: -1, Message: VDSM d0lppn072 command 
failed: Heartbeat exeeded
2018-01-23 13:45:33,666 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-37) [4e8ec41d] Correlation ID: null, Call 
Stack: null, Custom Event ID: -1, Message: VDSM d0lppn066 command 
failed: Heartbeat exeeded
2018-01-23 13:45:33,667 ERROR 
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] 
(DefaultQuartzScheduler_Worker-87) [] Command 
'GetStatsVDSCommand(HostName = d0lppn067, 
VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', 
hostId='f99c68c8-b0e8-437b-8cd9-ebaddaaede96', 
vds='Host[d0lppn067,f99c68c8-b0e8-437b-8cd9-ebaddaaede96]'})' execution 
failed: VDSGenericException: VDSNetworkException: Heartbeat exeeded
2018-01-23 13:45:33,667 ERROR 
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] 
(DefaultQuartzScheduler_Worker-10) [21574461] Command 
'GetStatsVDSCommand(HostName = d0lppn072, 
VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', 
hostId='fdc00296-973d-4268-bd79-6dac535974e0', 
vds='Host[d0lppn072,fdc00296-973d-4268-bd79-6dac535974e0]'})' execution 
failed: VDSGenericException: VDSNetworkException: Heartbeat exeeded
2018-01-23 13:45:33,667 ERROR 
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] 
(DefaultQuartzScheduler_Worker-37) [4e8ec41d] Command 
'GetStatsVDSCommand(HostName = d0lppn066, 
VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', 
hostId='14abf559-4b62-4ebd-a345-77fa9e1fa3ae', 
vds='Host[d0lppn066,14abf559-4b62-4ebd-a345-77fa9e1fa3ae]'})' execution 
failed: VDSGenericException: VDSNetworkException: Heartbeat exeeded
2018-01-23 13:45:33,669 ERROR 
[org.ovirt.engine.core.vdsbroker.HostMonitoring] 
(DefaultQuartzScheduler_Worker-87) []  Failed getting vds stats, 
vds='d0lppn067'(f99c68c8-b0e8-437b-8cd9-ebaddaaede96): 
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: 
VDSGenericException: VDSNetworkException: Heartbeat exeeded
2018-01-23 13:45:33,669 ERROR 
[org.ovirt.engine.core.vdsbroker.HostMonitoring] 
(DefaultQuartzScheduler_Worker-10) [21574461]  Failed getting vds stats, 
  vds='d0lppn072'(fdc00296-973d-4268-bd79-6dac535974e0): 
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: 
VDSGenericException: VDSNetworkException: Heartbeat exeeded
2018-01-23 13:45:33,669 ERROR 
[org.ovirt.engine.core.vdsbroker.HostMonitoring] 
(DefaultQuartzScheduler_Worker-37) [4e8ec41d]  Failed getting vds stats, 
  vds='d0lppn066'(14abf559-4b62-4ebd-a345-77fa9e1fa3ae): 
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: 
VDSGenericException: VDSNetworkException: Heartbeat exeeded
2018-01-23 13:45:33,671 ERROR 
[org.ovirt.engine.core.vdsbroker.HostMonitoring] 
(DefaultQuartzScheduler_Worker-10) [21574461] Failure to refresh Vds 
runtime info: VDSGenericException: VDSNetworkException: Heartbeat exeeded
2018-01-23 13:45:33,671 ERROR 
[org.ovirt.engine.core.vdsbroker.HostMonitoring] 
(DefaultQuartzScheduler_Worker-37) [4e8ec41d] Failure to refresh Vds 
runtime info: VDSGenericException: VDSNetworkException: Heartbeat exeeded
2018-01-23 13:45:33,671 ERROR 
[org.ovirt.engine.core.vdsbroker.HostMonitoring] 
(DefaultQuartzScheduler_Worker-87) [] Failure to refresh Vds runtime 
info: VDSGenericException: VDSNetworkException: Heartbeat exeeded
2018-01-23 13:45:33,671 ERROR 
[org.ovirt.engine.core.vdsbroker.HostMonitoring] 
(DefaultQuartzScheduler_Worker-37) [4e8ec41d] Exception: 
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: 
VDSGenericException: VDSNetworkException: Heartbeat exeeded
         at 
org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand.executeVdsBrokerCommand(GetStatsVDSCommand.java:21) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) 
[dal.jar:]
         at 
org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsStats(HostMonitoring.java:472) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:114) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:84) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:227) 
[vdsbroker.jar:]
         at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source) 
[:1.8.0_102]
         at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
[rt.jar:1.8.0_102]
         at java.lang.reflect.Method.invoke(Method.java:498) 
[rt.jar:1.8.0_102]
         at 
org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81) 
[scheduler.jar:]
         at 
org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52) 
[scheduler.jar:]
         at org.quartz.core.JobRunShell.run(JobRunShell.java:213) 
[quartz.jar:]
         at 
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) 
[quartz.jar:]

2018-01-23 13:45:33,671 ERROR 
[org.ovirt.engine.core.vdsbroker.HostMonitoring] 
(DefaultQuartzScheduler_Worker-10) [21574461] Exception: 
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: 
VDSGenericException: VDSNetworkException: Heartbeat exeeded
         at 
org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand.executeVdsBrokerCommand(GetStatsVDSCommand.java:21) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) 
[dal.jar:]
         at 
org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsStats(HostMonitoring.java:472) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:114) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:84) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:227) 
[vdsbroker.jar:]
         at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source) 
[:1.8.0_102]
         at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
[rt.jar:1.8.0_102]
         at java.lang.reflect.Method.invoke(Method.java:498) 
[rt.jar:1.8.0_102]
         at 
org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81) 
[scheduler.jar:]
         at 
org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52) 
[scheduler.jar:]
         at org.quartz.core.JobRunShell.run(JobRunShell.java:213) 
[quartz.jar:]
         at 
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) 
[quartz.jar:]

2018-01-23 13:45:33,671 ERROR 
[org.ovirt.engine.core.vdsbroker.HostMonitoring] 
(DefaultQuartzScheduler_Worker-87) [] Exception: 
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: 
VDSGenericException: VDSNetworkException: Heartbeat exeeded
         at 
org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand.executeVdsBrokerCommand(GetStatsVDSCommand.java:21) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) 
[dal.jar:]
         at 
org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467) 
[vdsbroker.jar:]
         at 
org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsStats(HostMonitoring.java:472) 
[vdsbroker.jar:]




Here the engine logs show the problem with node d0lppn065; the VMs first 
go to "Unknown", then to "Unknown" plus "not responding":

2018-01-23 14:48:00,712 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(org.ovirt.thread.pool-8-thread-28) [] Correlation ID: null, Call Stack: 
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: 
org.ovirt.vdsm.jsonrpc.client.ClientConnection
Exception: Connection failed
         at 
org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createNetworkException(VdsBrokerCommand.java:157)
         at 
org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:120)
         at 
org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65)
         at 
org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33)
         at 
org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467)
         at 
org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher.fetch(VmsStatisticsFetcher.java:27)
         at 
org.ovirt.engine.core.vdsbroker.PollVmStatsRefresher.poll(PollVmStatsRefresher.java:35)
         at sun.reflect.GeneratedMethodAccessor80.invoke(Unknown Source)
         at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:498)
         at 
org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81)
         at 
org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52)
         at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
         at 
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: 
Connection failed
         at 
org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.connect(ReactorClient.java:155)
         at 
org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.getClient(JsonRpcClient.java:134)
         at 
org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.call(JsonRpcClient.java:81)
         at 
org.ovirt.engine.core.vdsbroker.jsonrpc.FutureMap.<init>(FutureMap.java:70)
         at 
org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcVdsServer.getAllVmStats(JsonRpcVdsServer.java:331)
         at 
org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand.executeVdsBrokerCommand(GetAllVmStatsVDSCommand.java:20)
         at 
org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110)
         ... 12 more
, Custom Event ID: -1, Message: Host d0lppn065 is non responsive.
2018-01-23 14:48:00,713 INFO 
[org.ovirt.engine.core.bll.VdsEventListener] 
(org.ovirt.thread.pool-8-thread-1) [] ResourceManager::vdsNotResponding 
entered for Host '2797cae7-6886-4898-a5e4-23361ce03a90', '10.32.0.65'
2018-01-23 14:48:00,713 WARN 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(org.ovirt.thread.pool-8-thread-36) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM vtop3 was set to the Unknown status.

...etc... (sorry about the wraps below)

2018-01-23 14:59:07,817 INFO 
[org.ovirt.engine.core.vdsbroker.VmAnalyzer] 
(DefaultQuartzScheduler_Worker-75) [] VM 
'30f7af86-c2b9-41c3-b2c5-49f5bbdd0e27'(d0lpvd070) moved from 'Up' --> 
'NotResponding'
2018-01-23 14:59:07,819 INFO 
[org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] 
(DefaultQuartzScheduler_Worker-74) [] Fetched 15 VMs from VDS 
'8cb119c5-b7f0-48a3-970a-205d96b2e940'
2018-01-23 14:59:07,936 WARN 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM d0lpvd070 is not responding.
2018-01-23 14:59:07,939 INFO 
[org.ovirt.engine.core.vdsbroker.VmAnalyzer] 
(DefaultQuartzScheduler_Worker-75) [] VM 
'ebc5bb82-b985-451b-8313-827b5f40eaf3'(d0lpvd039) moved from 'Up' --> 
'NotResponding'
2018-01-23 14:59:08,032 WARN 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM d0lpvd039 is not responding.
2018-01-23 14:59:08,038 INFO 
[org.ovirt.engine.core.vdsbroker.VmAnalyzer] 
(DefaultQuartzScheduler_Worker-75) [] VM 
'494c4f9e-1616-476a-8f66-a26a96b76e56'(vtop3) moved from 'Up' --> 
'NotResponding'
2018-01-23 14:59:08,134 WARN 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM vtop3 is not responding.
2018-01-23 14:59:08,136 INFO 
[org.ovirt.engine.core.vdsbroker.VmAnalyzer] 
(DefaultQuartzScheduler_Worker-75) [] VM 
'eaeaf73c-d9e2-426e-a2f2-7fcf085137b0'(d0lpvw059) moved from 'Up' --> 
'NotResponding'
2018-01-23 14:59:08,237 WARN 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM d0lpvw059 is not responding.
2018-01-23 14:59:08,239 INFO 
[org.ovirt.engine.core.vdsbroker.VmAnalyzer] 
(DefaultQuartzScheduler_Worker-75) [] VM 
'8308a547-37a1-4163-8170-f89b6dc85ba8'(d0lpvm058) moved from 'Up' --> 
'NotResponding'
2018-01-23 14:59:08,326 WARN 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM d0lpvm058 is not responding.
2018-01-23 14:59:08,328 INFO 
[org.ovirt.engine.core.vdsbroker.VmAnalyzer] 
(DefaultQuartzScheduler_Worker-75) [] VM 
'3d544926-3326-44e1-8b2a-ec632f51112a'(d0lqva056) moved from 'Up' --> 
'NotResponding'
2018-01-23 14:59:08,400 WARN 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM d0lqva056 is not responding.
2018-01-23 14:59:08,402 INFO 
[org.ovirt.engine.core.vdsbroker.VmAnalyzer] 
(DefaultQuartzScheduler_Worker-75) [] VM 
'989e5a17-789d-4eba-8a5e-f74846128842'(d0lpva078) moved from 'Up' --> 
'NotResponding'
2018-01-23 14:59:08,472 WARN 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM d0lpva078 is not responding.
2018-01-23 14:59:08,474 INFO 
[org.ovirt.engine.core.vdsbroker.VmAnalyzer] 
(DefaultQuartzScheduler_Worker-75) [] VM 
'050a71c1-9e65-43c6-bdb2-18eba571e2eb'(d0lpvw077) moved from 'Up' --> 
'NotResponding'
2018-01-23 14:59:08,545 WARN 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM d0lpvw077 is not responding.
2018-01-23 14:59:08,547 INFO 
[org.ovirt.engine.core.vdsbroker.VmAnalyzer] 
(DefaultQuartzScheduler_Worker-75) [] VM 
'c3b497fd-6181-4dd1-9acf-8e32f981f769'(d0lpva079) moved from 'Up' --> 
'NotResponding'
2018-01-23 14:59:08,621 WARN 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM d0lpva079 is not responding.
2018-01-23 14:59:08,623 INFO 
[org.ovirt.engine.core.vdsbroker.VmAnalyzer] 
(DefaultQuartzScheduler_Worker-75) [] VM 
'7cd22b39-feb1-4c6e-8643-ac8fb0578842'(d0lqva034) moved from 'Up' --> 
'NotResponding'
2018-01-23 14:59:08,690 WARN 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM d0lqva034 is not responding.
2018-01-23 14:59:08,692 INFO 
[org.ovirt.engine.core.vdsbroker.VmAnalyzer] 
(DefaultQuartzScheduler_Worker-75) [] VM 
'2ab9b1d8-d1e8-4071-a47c-294e586d2fb6'(d0lpvd038) moved from 'Up' --> 
'NotResponding'
2018-01-23 14:59:08,763 WARN 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM d0lpvd038 is not responding.
2018-01-23 14:59:08,768 INFO 
[org.ovirt.engine.core.vdsbroker.VmAnalyzer] 
(DefaultQuartzScheduler_Worker-75) [] VM 
'ecb4e795-9eeb-4cdc-a356-c1b9b32af5aa'(d0lqva031) moved from 'Up' --> 
'NotResponding'
2018-01-23 14:59:08,836 WARN 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM d0lqva031 is not responding.
2018-01-23 14:59:08,838 INFO 
[org.ovirt.engine.core.vdsbroker.VmAnalyzer] 
(DefaultQuartzScheduler_Worker-75) [] VM 
'1a361727-1607-43d9-bd22-34d45b386d3e'(d0lqva033) moved from 'Up' --> 
'NotResponding'
2018-01-23 14:59:08,911 WARN 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM d0lqva033 is not responding.
2018-01-23 14:59:08,913 INFO 
[org.ovirt.engine.core.vdsbroker.VmAnalyzer] 
(DefaultQuartzScheduler_Worker-75) [] VM 
'0cd65f90-719e-429e-a845-f425612d7b14'(vtop4) moved from 'Up' --> 
'NotResponding'
2018-01-23 14:59:08,984 WARN 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM vtop4 is not responding.
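With dozens of VMs flipping at once, the flood is easier to digest as a 
plain list of names; the engine.log messages have a fixed shape, so a 
one-line sed pulls them out.  Just a sketch (sample lines condensed from 
the excerpt above, with the logger/thread fields shortened to '...'):

```shell
# Sketch: extract VM names from "VM <name> is not responding." events.
# Sample lines condensed from the engine.log excerpt above.
cat > /tmp/engine-sample.log <<'EOF'
2018-01-23 14:59:07,936 WARN ... Message: VM d0lpvd070 is not responding.
2018-01-23 14:59:08,134 WARN ... Message: VM vtop3 is not responding.
2018-01-23 14:59:08,984 WARN ... Message: VM vtop4 is not responding.
EOF
sed -n 's/.*Message: VM \(.*\) is not responding\.$/\1/p' /tmp/engine-sample.log
```

Against the real engine.log this gives one name per "not responding" event, 
which makes it easy to see whether a whole node's VM set flipped at once.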

> 
> Probably it's time to think to upgrade your environment from 3.6.

I know.  But from a production standpoint, mid-2016 wasn't that long ago. 
And 4 was just coming out of beta at the time.

We were upgrading from 3.4 to 3.6, and it took a long time (again, 
because it's all "live").  Trust me, the move to 4.0 was discussed; it 
was just a timing thing.

With that said, I do "hear you"... and certainly it's being discussed. 
We just don't see a "good" migration path; we see a slow path (moving 
nodes out, upgrading, etc.), and, as with all things, nobody can 
guarantee "success", and failure would be a very bad thing.  So going 
from a working 3.6 to a (potentially) totally broken 4.2 isn't going to 
impress anyone here, you know?  If all goes according to our best 
guesses, then great; but if things go bad, and the chance is not 
insignificant, well... I'm just not quite prepared with my résumé, if 
you know what I mean.

Don't get me wrong, our move from 3.4 to 3.6 had some similar risks, but 
we also migrated to whole new infrastructure, a luxury we will not have 
this time.  And somehow 3.4 to 3.6 doesn't sound as risky as 3.6 to 4.2.

Is there a path from oVirt to RHEV?  Every bit of help we get helps us 
in making that decision as well, which I think would be a very good 
thing for both of us.  (I inherited all this oVirt, and I was the "guy" 
who did the 3.4 to 3.6 move onto the all-new infrastructure.)

