Any ideas about this issue?
I am still trying to understand what may be causing it.
Many thanks for any assistance.
Alex
On Wed, Mar 28, 2018 at 10:06 AM, Yedidyah Bar David <didi@redhat.com>
wrote:
On Tue, Mar 27, 2018 at 3:38 PM, Sandro Bonazzola
<sbonazzo@redhat.com>
wrote:
>
>
> 2018-03-27 14:34 GMT+02:00 Alex K <rightkicktech@gmail.com>:
>
>> Hi All,
>>
>> Any idea on the below?
>>
>> I am using oVirt Guest Tools 4.2-1.el7.centos for the VM.
>> The Windows 2016 server VM (which is the one with the relatively big
>> disks: 500 GB) is consistently rendered unresponsive when trying to take
>> a snapshot.
>> I may provide any additional logs if needed.
>>
>
> Adding some people to the thread
>
Adding more people for this part.
>
>
>
>>
>> Alex
>>
>> On Sun, Mar 25, 2018 at 7:30 PM, Alex K <rightkicktech@gmail.com> wrote:
>>
>>> Hi folks,
>>>
>>> I am facing frequently the following issue:
>>>
>>> On some large VMs (Windows 2016 with two disk drives, 60 GB and 500 GB),
>>> the VM becomes unresponsive when attempting to create a snapshot.
>>>
>>> The errors that I managed to collect were:
>>>
>>> vdsm error at host hosting the VM:
>>> 2018-03-25 14:40:13,442+0000 WARN (vdsm.Scheduler) [Executor] Worker
>>> blocked: <Worker name=jsonrpc/7 running <Task <JsonRpcTask {'params':
>>> {u'frozen': False, u'vmID': u'a5c761a2-41cd-40c2-b65f-f3819293e8a4',
>>> u'snapDrives': [{u'baseVolumeID': u'2a33e585-ece8-4f4d-b45d-5ecc9239200e',
>>> u'domainID': u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
>>> u'e9a01ebd-83dd-40c3-8c83-5302b0d15e04', u'imageID':
>>> u'c75b8e93-3067-4472-bf24-dafada224e4d'}, {u'baseVolumeID':
>>> u'3fb2278c-1b0d-4677-a529-99084e4b08af', u'domainID':
>>> u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
>>> u'78e6b6b1-2406-4393-8d92-831a6d4f1337', u'imageID':
>>> u'd4223744-bf5d-427b-bec2-f14b9bc2ef81'}]}, 'jsonrpc': '2.0',
>>> 'method': u'VM.snapshot', 'id': u'89555c87-9701-4260-9952-789965261e65'}
>>> at 0x7fca4004cc90> timeout=60, duration=60 at 0x39d8210> task#=155842 at
>>> 0x2240e10> (executor:351)
>>> 2018-03-25 14:40:15,261+0000 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer]
>>> RPC call VM.getStats failed (error 1) in 0.01 seconds (__init__:539)
>>> 2018-03-25 14:40:17,471+0000 WARN (jsonrpc/5) [virt.vm]
>>> (vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4') monitor became
>>> unresponsive (command timeout, age=67.9100000001) (vm:5132)
>>>
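For reference: the trace above shows the VM.snapshot JSON-RPC call blocking
a vdsm worker past its 60-second watchdog, after which the qemu monitor
stops answering. One way to narrow this down might be a disk-only snapshot
driven through the oVirt Python SDK (ovirtsdk4), taking the memory-save step
out of the picture. A minimal sketch, assuming the SDK is installed; the
engine URL, credentials and CA path are placeholders to adjust:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholders: engine URL, user, password and CA file must match your setup.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='ca.pem',
)

vms_service = connection.system_service().vms_service()
# 'Data-Server' is the VM named in the engine log below.
vm = vms_service.list(search='name=Data-Server')[0]
snapshots_service = vms_service.vm_service(vm.id).snapshots_service()

# persist_memorystate=False requests a disk-only snapshot (no RAM dump).
snapshots_service.add(
    types.Snapshot(
        description='disk-only test snapshot',
        persist_memorystate=False,
    )
)

connection.close()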
>>> engine.log:
>>> 2018-03-25 14:40:19,875Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>> (DefaultQuartzScheduler2) [1d737df7] EVENT_ID: VM_NOT_RESPONDING(126),
>>> Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID:
>>> -1, Message: VM Data-Server is not responding.
>>>
>>> 2018-03-25 14:42:13,708Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>> (DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID:
>>> VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack:
>>> null, Custom ID: null, Custom Event ID: -1, Message: VDSM v1.cluster
>>> command SnapshotVDS failed: Message timeout which can be caused by
>>> communication issues
>>> 2018-03-25 14:42:13,708Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand]
>>> (DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c7cd37aa] Command
>>> 'SnapshotVDSCommand(HostName = v1.cluster,
>>> SnapshotVDSCommandParameters:{runAsync='true',
>>> hostId='a713d988-ee03-4ff0-a0cd-dc4cde1507f4',
>>> vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4'})' execution failed:
>>> VDSGenericException: VDSNetworkException: Message timeout which can be
>>> caused by communication issues
>>> 2018-03-25 14:42:13,708Z WARN [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand]
>>> (DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c7cd37aa] Could not
>>> perform live snapshot due to error, VM will still be configured to the new
>>> created snapshot: EngineException:
>>> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
>>> VDSGenericException: VDSNetworkException: Message timeout which can be
>>> caused by communication issues (Failed with error VDS_NETWORK_ERROR and
>>> code 5022)
>>> 2018-03-25 14:42:13,708Z WARN [org.ovirt.engine.core.vdsbroker.VdsManager]
>>> (org.ovirt.thread.pool-6-thread-15) [17789048-009a-454b-b8ad-2c72c7cd37aa]
>>> Host 'v1.cluster' is not responding. It will stay in Connecting state for a
>>> grace period of 61 seconds and after that an attempt to fence the host will
>>> be issued.
>>> 2018-03-25 14:42:13,725Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>> (org.ovirt.thread.pool-6-thread-15) [17789048-009a-454b-b8ad-2c72c7cd37aa]
>>> EVENT_ID: VDS_HOST_NOT_RESPONDING_CONNECTING(9,008), Correlation ID: null,
>>> Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Host
>>> v1.cluster is not responding. It will stay in Connecting state for a grace
>>> period of 61 seconds and after that an attempt to fence the host will be
>>> issued.
>>> 2018-03-25 14:42:13,751Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>> (DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID:
>>> USER_CREATE_LIVE_SNAPSHOT_FINISHED_FAILURE(170), Correlation ID:
>>> 17789048-009a-454b-b8ad-2c72c7cd37aa, Job ID:
>>> 16e48c28-a8c7-4841-bd81-1f2d370f345d, Call Stack:
>>> org.ovirt.engine.core.common.errors.EngineException: EngineException:
>>> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
>>> VDSGenericException: VDSNetworkException: Message timeout which can be
>>> caused by communication issues (Failed with error VDS_NETWORK_ERROR and
>>> code 5022)
>>> 2018-03-25 14:42:14,372Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>> (DefaultQuartzScheduler5) [] EVENT_ID:
>>> USER_CREATE_SNAPSHOT_FINISHED_FAILURE(69), Correlation ID:
>>> 17789048-009a-454b-b8ad-2c72c7cd37aa, Job ID:
>>> 16e48c28-a8c7-4841-bd81-1f2d370f345d, Call Stack:
>>> org.ovirt.engine.core.common.errors.EngineException: EngineException:
>>> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
>>> VDSGenericException: VDSNetworkException: Message timeout which can be
>>> caused by communication issues (Failed with error VDS_NETWORK_ERROR and
>>> code 5022)
>>> 2018-03-25 14:42:14,372Z WARN [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
>>> (DefaultQuartzScheduler5) [] Command 'CreateAllSnapshotsFromVm' id:
>>> 'bad4f5be-5306-413f-a86a-513b3cfd3c66' end method execution failed, as the
>>> command isn't marked for endAction() retries silently ignoring
>>> 2018-03-25 14:42:15,951Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>> (DefaultQuartzScheduler5) [5017c163] EVENT_ID:
>>> VDS_NO_SELINUX_ENFORCEMENT(25), Correlation ID: null, Call Stack: null,
>>> Custom ID: null, Custom Event ID: -1, Message: Host v1.cluster does not
>>> enforce SELinux. Current status: DISABLED
>>> 2018-03-25 14:42:15,951Z WARN [org.ovirt.engine.core.vdsbroker.VdsManager]
>>> (DefaultQuartzScheduler5) [5017c163] Host 'v1.cluster' is running with
>>> SELinux in 'DISABLED' mode
>>>
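On the engine side the same event surfaces as a SnapshotVDS message timeout
after roughly two minutes, followed by the host briefly dropping to
Connecting. To correlate timestamps with the vdsm log, a small polling loop
against the API can record the exact moment the VM flips to not responding.
A sketch, reusing 'connection' from the sketch above:

import time
import ovirtsdk4.types as types

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=Data-Server')[0]
vm_service = vms_service.vm_service(vm.id)

# Poll for up to ~10 minutes while a snapshot is attempted elsewhere;
# NOT_RESPONDING corresponds to the VM_NOT_RESPONDING(126) event above.
for _ in range(120):
    status = vm_service.get().status
    print(time.strftime('%H:%M:%S'), status)
    if status == types.VmStatus.NOT_RESPONDING:
        break
    time.sleep(5)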
>>> As soon as the VM becomes unresponsive, any console that was already open
>>> freezes. I can recover the VM only by powering it off and on.
>>>
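Since only a hard power cycle recovers the guest, the off/on sequence can at
least be scripted while the root cause is being chased. Again a sketch, with
'connection' assumed as above:

import time
import ovirtsdk4.types as types

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=Data-Server')[0]
vm_service = vms_service.vm_service(vm.id)

vm_service.stop()  # hard power-off, matching the manual workaround
while vm_service.get().status != types.VmStatus.DOWN:
    time.sleep(5)
vm_service.start()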
>>> I am using oVirt 4.1.9 with 3 nodes and a self-hosted engine. I am
>>> running mostly Windows 10 and Windows 2016 server VMs. I have installed
>>> the latest guest agents from:
>>>
>>> http://resources.ovirt.org/pub/ovirt-4.2/iso/oVirt-toolsSetup/4.2-1.el7.centos/
>>>
>>> At the screen where one takes a snapshot I get a warning saying "Could
>>> not detect guest agent on the VM. Note that without guest agent the data
>>> on the created snapshot may be inconsistent". See attached. I have
>>> verified that the oVirt guest tools are installed and shown in the
>>> installed apps in the engine GUI. Also, oVirt Guest Agent (32 bit) and
>>> qemu-ga are listed as running in the Windows Task Manager. Shouldn't the
>>> oVirt guest agent be 64-bit on 64-bit Windows?
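
One way to check whether the engine is actually receiving guest-agent data,
rather than just seeing the service running inside the guest, is to read the
agent-reported fields from the API; they stay empty when the agent is not
talking to vdsm. A sketch, with 'connection' assumed as above:

# fqdn and guest_operating_system are populated only from guest-agent
# reports, so empty values mean the engine is not hearing from the agent.
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=Data-Server')[0]

print('FQDN reported by agent:', vm.fqdn)
if vm.guest_operating_system is not None:
    print('Guest OS family:', vm.guest_operating_system.family)
    print('Guest OS architecture:', vm.guest_operating_system.architecture)
else:
    print('No guest OS info reported')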
>>>
>>
No idea, but I do not think it's related to your problem of freezing while
taking a snapshot.
This error was already discussed in the past, see e.g.:
http://lists.ovirt.org/pipermail/users/2017-June/082577.html
Best regards,
--
Didi