I checked the vdsm log. If I understand correctly, is this the bug?
https://paste.fedoraproject.org/paste/tHwLPSnIKI8Px1XBjGw7UA
[root@ovirt3 vdsm]# find /var/log/vdsm/ -name "vdsm*" -mtime -1 -exec xzgrep --color "2018-10-17.*ERROR" {} \; | sort -k1
2018-10-17 02:04:49,163+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 02:05:09,159+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 02:06:19,155+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 02:06:39,157+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 02:14:09,158+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 02:19:19,163+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 02:20:39,156+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 02:21:59,161+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 02:22:19,154+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 02:29:49,163+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 02:30:29,158+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 02:30:49,158+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 02:32:59,159+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 08:36:19,087+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/192.168.3.11:_ovirt__newstorage/9ce3b963-660d-4a8c-987e-17550b7b28c7/dom_md/metadata
(monitor:498)
2018-10-17 08:36:19,138+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/qnap.company.ru:_engine/b10b9091-a66e-4c26-a1bf-a79ce66d4df4/dom_md/metadata
(monitor:498)
2018-10-17 08:36:19,170+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/10.10.10.254:_ovirt__iso/25647e6d-5b55-4d6a-8c49-04b696aa1109/dom_md/metadata
(monitor:498)
2018-10-17 08:37:09,681+0300 ERROR (monitor/25647e6) [storage.Monitor] Error checking
domain 25647e6d-5b55-4d6a-8c49-04b696aa1109 (monitor:424)
2018-10-17 08:37:09,681+0300 ERROR (monitor/b10b909) [storage.Monitor] Error checking
domain b10b9091-a66e-4c26-a1bf-a79ce66d4df4 (monitor:424)
2018-10-17 08:37:11,928+0300 ERROR (monitor/9ce3b96) [storage.Monitor] Error checking
domain 9ce3b963-660d-4a8c-987e-17550b7b28c7 (monitor:424)
2018-10-17 10:10:59,159+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 10:53:28,082+0300 ERROR (periodic/168) [root] failed to retrieve Hosted Engine
HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
(api:196)
2018-10-17 10:56:37,635+0300 ERROR (periodic/1) [root] failed to retrieve Hosted Engine HA
score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
(api:196)
2018-10-17 10:56:37,966+0300 ERROR (jsonrpc/3) [storage.TaskManager.Task]
(Task='519e3d58-c68e-4ec2-b144-a54442411dc1') Unexpected error (task:875)
2018-10-17 10:56:37,968+0300 ERROR (jsonrpc/3) [storage.Dispatcher] FINISH
getStorageDomainInfo error=Storage domain does not exist:
(u'b10b9091-a66e-4c26-a1bf-a79ce66d4df4',) (dispatcher:82)
2018-10-17 11:48:41,457+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 11:49:31,317+0300 ERROR (monitor/b606bde) [storage.Monitor] Error checking
domain b606bde9-21da-4c10-b77b-2d8e0c374ea2 (monitor:424)
2018-10-17 11:49:36,837+0300 ERROR (periodic/2) [Executor] Unhandled exception in <Task
discardable <UpdateVolumes vm=60621bbd-6bfe-454d-9088-be7496b24489 at
0x7f5b44653190> timeout=30.0, duration=60 at 0x7f5b646d0710> (executor:317)
2018-10-17 11:49:36,837+0300 ERROR (periodic/2) [storage.Dispatcher] FINISH getVolumeSize
error=Connection timed out (dispatcher:86)
2018-10-17 11:49:36,837+0300 ERROR (periodic/2) [storage.TaskManager.Task]
(Task='b043e518-308a-45f0-bbfb-6dcb574ffba8') Unexpected error (task:875)
2018-10-17 11:49:36,842+0300 ERROR (periodic/0) [Executor] Unhandled exception in <Task
discardable <UpdateVolumes vm=ddf00567-3842-4982-8035-5cdd8efa2c36 at
0x7f5b444c42d0> timeout=30.0, duration=60 at 0x7f5b444c4f50> (executor:317)
2018-10-17 11:49:36,842+0300 ERROR (periodic/0) [storage.Dispatcher] FINISH getVolumeSize
error=Connection timed out (dispatcher:86)
2018-10-17 11:49:36,842+0300 ERROR (periodic/0) [storage.TaskManager.Task]
(Task='90f4ae89-b402-4b38-9f1b-ed68e7e92589') Unexpected error (task:875)
2018-10-17 11:49:36,843+0300 ERROR (periodic/3) [Executor] Unhandled exception in <Task
discardable <UpdateVolumes vm=b40f7d38-daf7-499b-988c-1c2d19d115a5 at
0x7f5b444c4310> timeout=30.0, duration=60 at 0x7f5b842ae7d0> (executor:317)
2018-10-17 11:49:36,843+0300 ERROR (periodic/3) [storage.Dispatcher] FINISH getVolumeSize
error=Connection timed out (dispatcher:86)
2018-10-17 11:49:36,843+0300 ERROR (periodic/3) [storage.TaskManager.Task]
(Task='87116c65-d594-4f4d-a56b-1eea462f68fc') Unexpected error (task:875)
2018-10-17 11:49:36,845+0300 ERROR (periodic/1) [storage.Dispatcher] FINISH getVolumeSize
error=Connection timed out (dispatcher:86)
2018-10-17 11:49:36,845+0300 ERROR (periodic/1) [storage.TaskManager.Task]
(Task='9cc7d57c-92a5-484d-9f4c-7481091b1ebe') Unexpected error (task:875)
2018-10-17 11:49:36,846+0300 ERROR (periodic/1) [Executor] Unhandled exception in <Task
discardable <UpdateVolumes vm=acadd04a-b762-46eb-81b8-bf276758de64 at
0x7f5b842aef50> timeout=30.0, duration=60 at 0x7f5b444e2590> (executor:317)
2018-10-17 11:50:06,837+0300 ERROR (periodic/4) [Executor] Unhandled exception in <Task
discardable <UpdateVolumes vm=5147652d-8288-4614-a1fd-51372af6a93f at
0x7f5b6420e5d0> timeout=30.0, duration=60 at 0x7f5b6420e590> (executor:317)
2018-10-17 11:50:06,837+0300 ERROR (periodic/4) [storage.Dispatcher] FINISH getVolumeSize
error=Connection timed out (dispatcher:86)
2018-10-17 11:50:06,837+0300 ERROR (periodic/4) [storage.TaskManager.Task]
(Task='c2782f6b-59c8-4e16-9bc0-99ed697861a8') Unexpected error (task:875)
2018-10-17 11:50:06,840+0300 ERROR (periodic/5) [Executor] Unhandled exception in <Task
discardable <UpdateVolumes vm=975f16fb-cf7f-4322-bb50-e3fc8b616378 at
0x7f5b6420ec90> timeout=30.0, duration=60 at 0x7f5b6420e890> (executor:317)
2018-10-17 11:50:06,840+0300 ERROR (periodic/5) [storage.Dispatcher] FINISH getVolumeSize
error=Connection timed out (dispatcher:86)
2018-10-17 11:50:06,840+0300 ERROR (periodic/5) [storage.TaskManager.Task]
(Task='b8f9b887-7558-4d1a-a264-af9f8c47b552') Unexpected error (task:875)
2018-10-17 11:50:06,841+0300 ERROR (periodic/6) [Executor] Unhandled exception in <Task
discardable <UpdateVolumes vm=2d0cf85c-f53c-42fd-bbdb-7f1a6d20e193 at
0x7f5b6420e610> timeout=30.0, duration=60 at 0x7f5b6420edd0> (executor:317)
2018-10-17 11:50:06,841+0300 ERROR (periodic/6) [storage.Dispatcher] FINISH getVolumeSize
error=Connection timed out (dispatcher:86)
2018-10-17 11:50:06,841+0300 ERROR (periodic/6) [storage.TaskManager.Task]
(Task='0359b365-cea7-436b-96c2-98d7efe9470b') Unexpected error (task:875)
2018-10-17 11:50:06,853+0300 ERROR (periodic/7) [storage.TaskManager.Task]
(Task='91d471ef-853e-4351-b0a8-3712961573d9') Unexpected error (task:875)
2018-10-17 11:50:06,854+0300 ERROR (periodic/7) [Executor] Unhandled exception in <Task
discardable <UpdateVolumes vm=efd80eea-5bb4-493c-9e88-7ad9f4accd78 at
0x7f5b6420ebd0> timeout=30.0, duration=60 at 0x7f5b6420e950> (executor:317)
2018-10-17 11:50:06,854+0300 ERROR (periodic/7) [storage.Dispatcher] FINISH getVolumeSize
error=Connection timed out (dispatcher:86)
2018-10-17 12:02:01,456+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 12:04:01,456+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 13:43:31,453+0300 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/10.10.1.3:_gl__data/b606bde9-21da-4c10-b77b-2d8e0c374ea2/dom_md/metadata
(monitor:498)
2018-10-17 13:44:27,300+0300 ERROR (monitor/b606bde) [storage.Monitor] Error checking
domain b606bde9-21da-4c10-b77b-2d8e0c374ea2 (monitor:424)
2018-10-17 13:44:36,992+0300 ERROR (periodic/12) [Executor] Unhandled exception in
<Task discardable <UpdateVolumes vm=18f90e1d-ad1d-4937-8b71-4f5e09e772a2 at
0x7f5b6463b910> timeout=30.0, duration=60 at 0x7f5b26e35450> (executor:317)
2018-10-17 13:44:36,992+0300 ERROR (periodic/12) [storage.Dispatcher] FINISH getVolumeSize
error=Connection timed out (dispatcher:86)
2018-10-17 13:44:36,992+0300 ERROR (periodic/12) [storage.TaskManager.Task]
(Task='12d1da8e-ba6b-48fb-8ae7-6b304891356d') Unexpected error (task:875)
2018-10-17 13:44:36,995+0300 ERROR (periodic/14) [Executor] Unhandled exception in
<Task discardable <UpdateVolumes vm=ddf00567-3842-4982-8035-5cdd8efa2c36 at
0x7f5b26e35610> timeout=30.0, duration=60 at 0x7f5b26cfce90> (executor:317)
2018-10-17 13:44:36,995+0300 ERROR (periodic/14) [storage.Dispatcher] FINISH getVolumeSize
error=Connection timed out (dispatcher:86)
2018-10-17 13:44:36,995+0300 ERROR (periodic/14) [storage.TaskManager.Task]
(Task='b198455f-2847-4c8e-821b-6c269bed2411') Unexpected error (task:875)
2018-10-17 13:44:36,996+0300 ERROR (periodic/16) [Executor] Unhandled exception in
<Task discardable <UpdateVolumes vm=b40f7d38-daf7-499b-988c-1c2d19d115a5 at
0x7f5b26cfca90> timeout=30.0, duration=60 at 0x7f5b26cfc790> (executor:317)
2018-10-17 13:44:36,996+0300 ERROR (periodic/16) [storage.Dispatcher] FINISH getVolumeSize
error=Connection timed out (dispatcher:86)
2018-10-17 13:44:36,996+0300 ERROR (periodic/16) [storage.TaskManager.Task]
(Task='60d13a5b-0b10-4a60-b9ee-05820a4e47a7') Unexpected error (task:875)
2018-10-17 13:44:37,001+0300 ERROR (periodic/17) [storage.TaskManager.Task]
(Task='6c827331-3373-4ed0-9630-0cea0d989a06') Unexpected error (task:875)
2018-10-17 13:44:37,002+0300 ERROR (periodic/17) [Executor] Unhandled exception in
<Task discardable <UpdateVolumes vm=acadd04a-b762-46eb-81b8-bf276758de64 at
0x7f5b26cfc690> timeout=30.0, duration=60 at 0x7f5b26fc9f90> (executor:317)
2018-10-17 13:44:37,002+0300 ERROR (periodic/17) [storage.Dispatcher] FINISH getVolumeSize
error=Connection timed out (dispatcher:86)
2018-10-17 14:56:02,788+0300 ERROR (migmon/f52e8e78) [root] Unhandled exception
(logutils:412)
2018-10-17 14:56:02,884+0300 ERROR (migmon/f52e8e78) [root] FINISH thread
<Thread(migmon/f52e8e78, stopped daemon 140026755127040)> failed (concurrent:201)
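The output above is wrapped by the mail client, so each log entry spans two or three lines. A rough Python sketch for tallying these ERROR entries per logger could look like the following. The names `join_wrapped` and `count_errors_by_logger` are only illustrative and are not part of vdsm, and the regular expressions assume the timestamp and field layout seen in the lines above.

```python
import re
from collections import Counter

# Assumption: an entry starts with a timestamp like "2018-10-17 02:04:49,163+0300".
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4} ")
# Assumption: level, then thread name in parentheses, then logger name in brackets.
ERROR_FIELDS = re.compile(r" ERROR \(([^)]*)\) \[([^\]]*)\]")


def join_wrapped(lines):
    """Merge wrapped continuation lines back into one string per log entry."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            # A new timestamp (or the very first line) starts a new entry.
            entries.append(line)
        else:
            # Anything else is treated as a continuation of the previous entry.
            entries[-1] += " " + line
    return entries


def count_errors_by_logger(lines):
    """Count ERROR entries per logger name, for example 'storage.Monitor'."""
    counts = Counter()
    for entry in join_wrapped(lines):
        match = ERROR_FIELDS.search(entry)
        if match:
            counts[match.group(2)] += 1
    return counts
```

Passing the saved output as an iterable of lines (for example an open text file) to `count_errors_by_logger` returns a `Counter`, and its `most_common()` method lists the noisiest loggers first.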
On 17 Oct 2018, at 14:42, Sahina Bose <sabose@redhat.com> wrote:
On Tue, Oct 16, 2018 at 11:39 PM Spickiy Nikita <n.spickiy@outlook.com> wrote:
Hi, I have an oVirt instance (4.2.1.6-1.el7.centos) with a Gluster-backed cluster. The
hosts periodically become non-responsive and the VMs stop responding. This usually
happens after the message "command GetGlusterVolumeHealInfoVDS failed: Message timeout
which can be caused by communication issues".
Would increasing the timeout for getting the heal status solve the problem? If so, how do
I do that?
Part of the log is attached below:
https://paste.fedoraproject.org/paste/8TTzwjMbYk32d7wd7Ix0Pw/raw
2018-10-15 14:44:22,582+03 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler6) [70cfd553] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM
ovirt3.example.org command GetGlusterVolumeHealInfoVDS
failed: Message timeout which can be caused by communication issues
2018-10-15 14:44:22,584+03 ERROR
[org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeHealInfoVDSCommand]
(DefaultQuartzScheduler6) [70cfd553] Command
'GetGlusterVolumeHealInfoVDSCommand(HostName =
ovirt3.example.org,
GlusterVolumeVDSParameters:{hostId='39215015-2537-4329-921f-c11256f99e04',
volumeName='domain1'})' execution failed: VDSGenericException:
VDSNetworkException: Message timeout which can be caused by communication issues
2018-10-15 14:44:22,584+03 WARN [org.ovirt.engine.core.vdsbroker.VdsManager]
(EE-ManagedThreadFactory-engine-Thread-7) [70cfd553] Host
'ovirt3.example.org' is not responding. It will
stay in Connecting state for a grace period of 77 seconds and after that an attempt to
fence the host will be issued.
2018-10-15 14:44:22,591+03 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-7) [70cfd553] EVENT_ID:
VDS_HOST_NOT_RESPONDING_CONNECTING(9,008), Host
ovirt3.example.org is not responding. It will stay in
Connecting state for a grace period of 77 seconds and after that an attempt to fence the
host will be issued.
2018-10-15 14:44:54,620+03 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-13) [] EVENT_ID: VDS_STORAGE_VDS_STATS_FAILED(189),
Host
ovirt3.example.org reports about one of the Active
Storage Domains as Problematic.
2018-10-15 14:44:54,827+03 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-46) [6d9504d1] EVENT_ID:
VDS_SET_NONOPERATIONAL_DOMAIN(522), Host
ovirt3.example.org cannot access the Storage Domain(s)
DOMAIN1 attached to the Data Center Default. Setting Host state to Non-Operational.
2018-10-15 14:44:54,840+03 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-46) [6d9504d1] EVENT_ID:
CONNECT_STORAGE_POOL_FAILED(995), Failed to connect Host
ovirt3.example.org to Storage Pool Default
2018-10-15 14:45:28,698+03 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-87) [] EVENT_ID: VM_NOT_RESPONDING(126),
VM HostedEngine is not responding.
2018-10-15 14:45:30,296+03 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-72) [] EVENT_ID: VM_NOT_RESPONDING(126),
VM vm2 is not responding.
2018-10-15 14:45:30,362+03 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-72) [] EVENT_ID: VM_NOT_RESPONDING(126),
VM vm3 is not responding.
Can you check the vdsm log to see if you're running into
https://bugzilla.redhat.com/show_bug.cgi?id=1614430