[ovirt-users] ovirt 3.6, we had the ovirt manager go down in a bad way and all VMs for one node marked Unknown and Not Responding while up
Douglas Landgraf
dlandgra at redhat.com
Thu Jan 25 22:57:20 UTC 2018
On Thu, Jan 25, 2018 at 5:12 PM, Christopher Cox <ccox at endlessnow.com> wrote:
> On 01/25/2018 02:25 PM, Douglas Landgraf wrote:
>>
>> On Wed, Jan 24, 2018 at 10:18 AM, Christopher Cox <ccox at endlessnow.com>
>> wrote:
>>>
>>> Would restarting vdsm on the node in question help fix this? Again, all
>>> the VMs are up on the node. Prior attempts to fix this problem have left
>>> the node in a state where I can't issue the "has been rebooted" command
>>> to it; it's confused.
>>>
>>> So... the node is up. All VMs are up. I can't issue "has been rebooted"
>>> to the node, and all VMs show Unknown and not responding, but they are up.
>>>
>>> Changing the status in the ovirt db to 0 works for a second, and then it
>>> goes immediately back to 8 (which is why I'm wondering if I should
>>> restart vdsm on the node).
>>
>>
>> It's not recommended to change the db manually.
>>
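For context only (and emphatically not a recommendation), the kind of edit
being described would look roughly like the sketch below; the table and
column names are my recollection of the engine schema, so treat them as
assumptions. It also shows why the value snaps back: the engine's monitoring
loop rewrites vm_dynamic.status from whatever VDSM reports on every polling
cycle (0 is Down and 8 is NotResponding in the status enum, as far as I
recall).

    # Illustration only -- hand-editing the engine DB is unsupported.
    # Assumes the default 'engine' database and the 3.6-era vm_dynamic
    # table/column names; the VM UUID is vtop3's, from the logs below.
    import psycopg2

    conn = psycopg2.connect(dbname="engine", user="engine", host="localhost")
    with conn, conn.cursor() as cur:
        cur.execute(
            "UPDATE vm_dynamic SET status = %s WHERE vm_guid = %s",
            (0, "494c4f9e-1616-476a-8f66-a26a96b76e56"),
        )
    # Within one monitoring cycle the engine polls VDSM, still can't
    # reach the VM, and writes status 8 (NotResponding) right back.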
>>>
>>> Oddly enough, we're running all of this in production. So, watching it
>>> all
>>> go down isn't the best option for us.
>>>
>>> Any advice is welcome.
>>
>>
>>
>> We would need to see the node/engine logs. Have you found any errors in
>> the vdsm.log (from the nodes) or the engine.log? Could you please share
>> the error?
>
>
>
> In short, the "error" is that our ovirt manager lost network (our problem)
> and crashed hard (hardware issue on the server). On bring-up, we had some
> network changes (that caused the lost-network problem), so our LACP bond was
> down for a bit while we were trying to bring it up (noting that the ovirt
> manager was up while we were reestablishing the network on the switch side).
>
> In other words, that's the "error", so to speak, that got us to where we are.
>
> Full DEBUG is enabled on the logs... The error messages seem obvious to
> me... They start like this (noting the ISO DOMAIN was coming off an NFS
> mount on the ovirt management server... yes... we know... we do have plans
> to move that).
>
> So on the hypervisor node itself, from the vdsm.log (vdsm.log.33.xz):
>
> (hopefully no surprise here)
>
> Thread-2426633::WARNING::2018-01-23 13:50:56,672::fileSD::749::Storage.scanDomains::(collectMetaFiles) Could not collect metadata file for domain path /rhev/data-center/mnt/d0lppc129.skopos.me:_var_lib_exports_iso-20160408002844
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/fileSD.py", line 735, in collectMetaFiles
>     sd.DOMAIN_META_DATA))
>   File "/usr/share/vdsm/storage/outOfProcess.py", line 121, in glob
>     return self._iop.glob(pattern)
>   File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 536, in glob
>     return self._sendCommand("glob", {"pattern": pattern}, self.timeout)
>   File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 421, in _sendCommand
>     raise Timeout(os.strerror(errno.ETIMEDOUT))
> Timeout: Connection timed out
> Thread-27::ERROR::2018-01-23 13:50:56,672::sdc::145::Storage.StorageDomainCache::(_findDomain) domain e5ecae2f-5a06-4743-9a43-e74d83992c35 not found
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
>     dom = findMethod(sdUUID)
>   File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
>     return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
>   File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
>     raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist: (u'e5ecae2f-5a06-4743-9a43-e74d83992c35',)
> Thread-27::ERROR::2018-01-23 13:50:56,673::monitor::276::Storage.Monitor::(_monitorDomain) Error monitoring domain e5ecae2f-5a06-4743-9a43-e74d83992c35
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/monitor.py", line 272, in _monitorDomain
>     self._performDomainSelftest()
>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 769, in wrapper
>     value = meth(self, *a, **kw)
>   File "/usr/share/vdsm/storage/monitor.py", line 339, in _performDomainSelftest
>     self.domain.selftest()
>   File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
>     return getattr(self.getRealDomain(), attrName)
>   File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
>     return self._cache._realProduce(self._sdUUID)
>   File "/usr/share/vdsm/storage/sdc.py", line 124, in _realProduce
>     domain = self._findDomain(sdUUID)
>   File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
>     dom = findMethod(sdUUID)
>   File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
>     return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
>   File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
>     raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist: (u'e5ecae2f-5a06-4743-9a43-e74d83992c35',)
>
>
> Again, all the hypervisor nodes will complain about the NFS area for the
> ISO DOMAIN now being gone. Remember, the ovirt manager node held this, and
> its network went out and the node crashed (note: the ovirt node (the actual
> server box) shouldn't have crashed due to the network outage, but it did).
I have added VDSM people to this thread to review it. I am assuming the
storage domain is still available to the nodes after the network changes
(made during the crash).
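If you want a quick check of whether that NFS path is answering at all,
without risking a shell hung on a dead mount, something like this sketch
works (the path is taken from the traceback above; the timeout value is an
arbitrary choice, not vdsm's exact one):

    # Probe the ISO domain path from a child process so a dead NFS
    # mount can't hang the caller -- roughly what vdsm's ioprocess was
    # doing when it raised the Timeout (ETIMEDOUT) above.
    import multiprocessing
    import os

    PATH = ("/rhev/data-center/mnt/"
            "d0lppc129.skopos.me:_var_lib_exports_iso-20160408002844")

    def probe(path):
        os.statvfs(path)  # blocks indefinitely if the NFS server is gone

    if __name__ == "__main__":
        p = multiprocessing.Process(target=probe, args=(PATH,))
        p.start()
        p.join(timeout=10)
        if p.is_alive():
            p.terminate()
            print("mount is not responding (vdsm would raise Timeout)")
        else:
            print("mount answered" if p.exitcode == 0 else "stat failed")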
>
> So here is the engine collapse as it lost network connectivity (before the
> server actually crashed hard).
>
> 2018-01-23 13:45:33,666 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-87) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM d0lppn067 command failed: Heartbeat exeeded
> 2018-01-23 13:45:33,666 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-10) [21574461] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM d0lppn072 command failed: Heartbeat exeeded
> 2018-01-23 13:45:33,666 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-37) [4e8ec41d] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM d0lppn066 command failed: Heartbeat exeeded
> 2018-01-23 13:45:33,667 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] (DefaultQuartzScheduler_Worker-87) [] Command 'GetStatsVDSCommand(HostName = d0lppn067, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='f99c68c8-b0e8-437b-8cd9-ebaddaaede96', vds='Host[d0lppn067,f99c68c8-b0e8-437b-8cd9-ebaddaaede96]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,667 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] (DefaultQuartzScheduler_Worker-10) [21574461] Command 'GetStatsVDSCommand(HostName = d0lppn072, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='fdc00296-973d-4268-bd79-6dac535974e0', vds='Host[d0lppn072,fdc00296-973d-4268-bd79-6dac535974e0]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,667 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] (DefaultQuartzScheduler_Worker-37) [4e8ec41d] Command 'GetStatsVDSCommand(HostName = d0lppn066, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='14abf559-4b62-4ebd-a345-77fa9e1fa3ae', vds='Host[d0lppn066,14abf559-4b62-4ebd-a345-77fa9e1fa3ae]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,669 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-87) [] Failed getting vds stats, vds='d0lppn067'(f99c68c8-b0e8-437b-8cd9-ebaddaaede96): org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,669 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-10) [21574461] Failed getting vds stats, vds='d0lppn072'(fdc00296-973d-4268-bd79-6dac535974e0): org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,669 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-37) [4e8ec41d] Failed getting vds stats, vds='d0lppn066'(14abf559-4b62-4ebd-a345-77fa9e1fa3ae): org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,671 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-10) [21574461] Failure to refresh Vds runtime info: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,671 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-37) [4e8ec41d] Failure to refresh Vds runtime info: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,671 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-87) [] Failure to refresh Vds runtime info: VDSGenericException: VDSNetworkException: Heartbeat exeeded
> 2018-01-23 13:45:33,671 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-37) [4e8ec41d] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand.executeVdsBrokerCommand(GetStatsVDSCommand.java:21) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65) [vdsbroker.jar:]
>     at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
>     at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsStats(HostMonitoring.java:472) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:114) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:84) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:227) [vdsbroker.jar:]
>     at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source) [:1.8.0_102]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_102]
>     at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_102]
>     at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81) [scheduler.jar:]
>     at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52) [scheduler.jar:]
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
>     at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
>
> 2018-01-23 13:45:33,671 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-10) [21574461] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand.executeVdsBrokerCommand(GetStatsVDSCommand.java:21) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65) [vdsbroker.jar:]
>     at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
>     at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsStats(HostMonitoring.java:472) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:114) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:84) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:227) [vdsbroker.jar:]
>     at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source) [:1.8.0_102]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_102]
>     at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_102]
>     at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81) [scheduler.jar:]
>     at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52) [scheduler.jar:]
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
>     at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
>
> 2018-01-23 13:45:33,671 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-87) [] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand.executeVdsBrokerCommand(GetStatsVDSCommand.java:21) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65) [vdsbroker.jar:]
>     at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
>     at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsStats(HostMonitoring.java:472) [vdsbroker.jar:]
>
>
>
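Those "Heartbeat exeeded" lines (the misspelling is the engine's, not yours)
just mean the engine stopped hearing from each host's vdsm within the
heartbeat window, which is exactly what you'd expect with the bond down: the
hosts and their VMs were fine, only the management channel was gone. When
triaging a flood of these, a small script can summarize which hosts were
affected and when; here is a sketch that relies only on the engine.log line
format visible above:

    # Summarize "Heartbeat exeeded" failures per host from engine.log.
    # Assumes the AuditLogDirector line format shown in this thread.
    import re
    import sys
    from collections import defaultdict

    PAT = re.compile(r"^(\S+ \S+) ERROR .* VDSM (\S+) command failed: Heartbeat")

    spans = defaultdict(lambda: [None, None])
    with open(sys.argv[1]) as log:
        for line in log:
            m = PAT.match(line)
            if m:
                ts, host = m.groups()
                spans[host][0] = spans[host][0] or ts  # first occurrence
                spans[host][1] = ts                    # latest occurrence

    for host, (first, last) in sorted(spans.items()):
        print("%s: first %s, last %s" % (host, first, last))

Run it as, e.g., "python heartbeat_summary.py engine.log".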
>
> Here the engine logs show the problem with node d0lppn065; the VMs first go
> to "Unknown", and then to "Unknown" plus "not responding":
>
> 2018-01-23 14:48:00,712 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-28) [] Correlation ID: null, Call Stack: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection failed
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createNetworkException(VdsBrokerCommand.java:157)
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:120)
>     at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65)
>     at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33)
>     at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467)
>     at org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher.fetch(VmsStatisticsFetcher.java:27)
>     at org.ovirt.engine.core.vdsbroker.PollVmStatsRefresher.poll(PollVmStatsRefresher.java:35)
>     at sun.reflect.GeneratedMethodAccessor80.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81)
>     at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
>     at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
> Caused by: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection failed
>     at org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.connect(ReactorClient.java:155)
>     at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.getClient(JsonRpcClient.java:134)
>     at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.call(JsonRpcClient.java:81)
>     at org.ovirt.engine.core.vdsbroker.jsonrpc.FutureMap.<init>(FutureMap.java:70)
>     at org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcVdsServer.getAllVmStats(JsonRpcVdsServer.java:331)
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand.executeVdsBrokerCommand(GetAllVmStatsVDSCommand.java:20)
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110)
>     ... 12 more
> , Custom Event ID: -1, Message: Host d0lppn065 is non responsive.
> 2018-01-23 14:48:00,713 INFO [org.ovirt.engine.core.bll.VdsEventListener] (org.ovirt.thread.pool-8-thread-1) [] ResourceManager::vdsNotResponding entered for Host '2797cae7-6886-4898-a5e4-23361ce03a90', '10.32.0.65'
> 2018-01-23 14:48:00,713 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-36) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vtop3 was set to the Unknown status.
>
> ...etc...
>
> 2018-01-23 14:59:07,817 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '30f7af86-c2b9-41c3-b2c5-49f5bbdd0e27'(d0lpvd070) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:07,819 INFO [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-74) [] Fetched 15 VMs from VDS '8cb119c5-b7f0-48a3-970a-205d96b2e940'
> 2018-01-23 14:59:07,936 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpvd070 is not responding.
> 2018-01-23 14:59:07,939 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM 'ebc5bb82-b985-451b-8313-827b5f40eaf3'(d0lpvd039) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,032 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpvd039 is not responding.
> 2018-01-23 14:59:08,038 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '494c4f9e-1616-476a-8f66-a26a96b76e56'(vtop3) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,134 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vtop3 is not responding.
> 2018-01-23 14:59:08,136 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM 'eaeaf73c-d9e2-426e-a2f2-7fcf085137b0'(d0lpvw059) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,237 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpvw059 is not responding.
> 2018-01-23 14:59:08,239 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '8308a547-37a1-4163-8170-f89b6dc85ba8'(d0lpvm058) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,326 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpvm058 is not responding.
> 2018-01-23 14:59:08,328 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '3d544926-3326-44e1-8b2a-ec632f51112a'(d0lqva056) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,400 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lqva056 is not responding.
> 2018-01-23 14:59:08,402 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '989e5a17-789d-4eba-8a5e-f74846128842'(d0lpva078) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,472 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpva078 is not responding.
> 2018-01-23 14:59:08,474 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '050a71c1-9e65-43c6-bdb2-18eba571e2eb'(d0lpvw077) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,545 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpvw077 is not responding.
> 2018-01-23 14:59:08,547 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM 'c3b497fd-6181-4dd1-9acf-8e32f981f769'(d0lpva079) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,621 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpva079 is not responding.
> 2018-01-23 14:59:08,623 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '7cd22b39-feb1-4c6e-8643-ac8fb0578842'(d0lqva034) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,690 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lqva034 is not responding.
> 2018-01-23 14:59:08,692 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '2ab9b1d8-d1e8-4071-a47c-294e586d2fb6'(d0lpvd038) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,763 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lpvd038 is not responding.
> 2018-01-23 14:59:08,768 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM 'ecb4e795-9eeb-4cdc-a356-c1b9b32af5aa'(d0lqva031) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,836 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lqva031 is not responding.
> 2018-01-23 14:59:08,838 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '1a361727-1607-43d9-bd22-34d45b386d3e'(d0lqva033) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,911 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM d0lqva033 is not responding.
> 2018-01-23 14:59:08,913 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM '0cd65f90-719e-429e-a845-f425612d7b14'(vtop4) moved from 'Up' --> 'NotResponding'
> 2018-01-23 14:59:08,984 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-75) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vtop4 is not responding.
>
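To be clear about what this log shows: once the engine marked the host
non-responsive, VmAnalyzer walked every VM on it from 'Up' to
'NotResponding' purely because stats could no longer be fetched; as you
observed, the guests themselves kept running. One way to watch the statuses
recover from the engine's point of view, without touching the database, is
the 3.6-era Python SDK (ovirt-engine-sdk-python 3.x); a sketch, with the URL
and credentials as placeholders:

    # Poll VM statuses from the engine API (v3 SDK, as shipped for 3.6).
    # URL, username and password below are placeholders.
    from ovirtsdk.api import API

    api = API(url="https://engine.example.com/ovirt-engine/api",
              username="admin@internal",
              password="secret",
              insecure=True)  # use ca_file=... instead in production
    try:
        for vm in api.vms.list():
            print("%-20s %s" % (vm.name, vm.status.state))
    finally:
        api.disconnect()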
>>
>> Probably it's time to think about upgrading your environment from 3.6.
>
>
> I know. But from a production standpoint mid-2016 wasn't that long ago.
> And 4 was just coming out of beta at the time.
>
> We were upgrading from 3.4 to 3.6, and it took a long time (again, because
> it's all "live"). Trust me, the move to 4.0 was discussed; it was just a
> timing thing.
>
> With that said, I do "hear you"... and certainly it's being discussed. We
> just don't see a "good" migration path... we see a slow path (moving nodes
> out, upgrading, etc.), knowing that, as with all things, nobody can
> guarantee "success", and failure would be a very bad thing. So going from a
> working 3.6 to a totally (potentially) broken 4.2 isn't going to impress
> anyone here, you know? If all goes according to our best guesses, then
> great, but when things go bad, and the chance is not insignificant, well...
> I'm just not quite prepared with my résumé, if you know what I mean.
>
> Don't get me wrong, our move from 3.4 to 3.6 had some similar risks, but we
> also migrated to a whole new infrastructure, a luxury we will not have this
> time. And somehow 3.4 to 3.6 doesn't sound as risky as 3.6 to 4.2.
I see your concern. However, keeping your system updated with recent
software is something I would recommend. You could set up a parallel
4.2 env and move the VMs slowly from 3.6.
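The usual mechanics for that are: shut a VM down, export it to an export
storage domain, detach that domain, attach it to the new engine, and import.
A rough sketch of the export step with the same v3 SDK (all names are
placeholders, and the VM name is just one from your logs):

    # Export a VM to an export domain so a parallel engine can import it.
    # v3 SDK (ovirtsdk 3.x); shut the VM down before exporting.
    from ovirtsdk.api import API
    from ovirtsdk.xml import params

    api = API(url="https://old-engine.example.com/ovirt-engine/api",
              username="admin@internal", password="secret", insecure=True)
    try:
        vm = api.vms.get(name="d0lpvd070")
        vm.export(params.Action(
            storage_domain=params.StorageDomain(name="export_domain")))
    finally:
        api.disconnect()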
>
> Is there a path from oVirt to RHEV? Every bit of help we get helps us in
> making that decision as well, which I think would be a very good thing for
> both of us. (I inherited all this oVirt, and I was the "guy" doing the 3.4
> to 3.6 move with the all-new infrastructure.)
Yes, you can import your setup to RHEV.
--
Cheers
Douglas