Frequent events seen like VM "vmname" is not responding

I am using oVirt Open Virtualization Manager Software Version:4.4.6.8-1.el8 and we have single node running on oVirt Node 4.4.6. We are facing slowness while using VM and also we face connection reset issue between VM and Target servers ( Nexus ) We are getting below logs in ovirt manager events VM "vmname" is not responding Getting this for all the running VMs . We checked resources and utilization of the node everything seems very normal. Could you please guide on how can we solve/debug further on this ?

On Tuesday, 21 September 2021 10:25:49 CEST manoj.sharma99765@gmail.com wrote:
I am using oVirt Open Virtualization Manager Software Version:4.4.6.8-1.el8 and we have single node running on oVirt Node 4.4.6.
We are facing slowness while using VM and also we face connection reset issue between VM and Target servers ( Nexus )
We are getting below logs in ovirt manager events
VM "vmname" is not responding
Getting this for all the running VMs .
We checked resources and utilization of the node everything seems very normal. Could you please guide on how can we solve/debug further on this ?
I'd recommend to check the host, if vdsmd service is running (systemct status vdsmd) and check vdsm log (/var/log/vdsm/vdsm.log) for any errors and eventually share the error here if it's not straightforward how to solve it. HTH Vojta

Thanks Vojtech Juranek for quick response please find logs below We got vm non responsive and at the same time vdsm log shows "monitor became unresponsive (command timeout, age=61.88649625200196) (vm:5944)" Please refer below logs for more details. [root@host vdsm]# cat vdsm.log | grep WARN 2021-09-21 14:30:19,487+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/492 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f023860d6d8> discarded task#=2334 at 0x7f02709edc88>, <Worker name=periodic/493 waiting task#=0 at 0x7f02381016a0>, <Worker name=periodic/489 waiting task#=2701 at 0x7f02386faef0>, <Worker name=periodic/491 waiting task#=1807 at 0x7f0270903908>, <Worker name=periodic/483 waiting task#=2144 at 0x7f0270b12940>} (executor:213) 2021-09-21 14:30:34,487+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/493 waiting task#=3 at 0x7f02381016a0>, <Worker name=periodic/489 waiting task#=2704 at 0x7f02386faef0>, <Worker name=periodic/491 waiting task#=1809 at 0x7f0270903908>, <Worker name=periodic/483 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02384a7780> discarded task#=2146 at 0x7f0270b12940>, <Worker name=periodic/494 waiting task#=0 at 0x7f02386d2f60>} (executor:213) 2021-09-21 14:30:49,844+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/493 waiting task#=15 at 0x7f02381016a0>, <Worker name=periodic/489 waiting task#=2717 at 0x7f02386faef0>, <Worker name=periodic/491 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02147e0ac8> discarded task#=1897 at 0x7f0270903908>, <Worker name=periodic/495 waiting task#=0 at 0x7f023847d550>, <Worker name=periodic/494 waiting task#=12 at 0x7f02386d2f60>} (executor:213) 2021-09-21 14:31:04,491+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/496 waiting task#=0 at 0x7f0270c70c18>, <Worker name=periodic/493 waiting task#=17 at 0x7f02381016a0>, <Worker name=periodic/489 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f0238494240> discarded task#=2719 at 0x7f02386faef0>, <Worker name=periodic/495 waiting task#=2 at 0x7f023847d550>, <Worker name=periodic/494 waiting task#=15 at 0x7f02386d2f60>} (executor:213) 2021-09-21 14:31:19,491+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/496 waiting task#=2 at 0x7f0270c70c18>, <Worker name=periodic/497 waiting task#=0 at 0x7f0238756828>, <Worker name=periodic/493 waiting task#=20 at 0x7f02381016a0>, <Worker name=periodic/495 waiting task#=6 at 0x7f023847d550>, <Worker name=periodic/494 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02386182e8> discarded task#=17 at 0x7f02386d2f60>} (executor:213) 2021-09-21 14:31:34,492+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/496 waiting task#=4 at 0x7f0270c70c18>, <Worker name=periodic/497 waiting task#=3 at 0x7f0238756828>, <Worker name=periodic/498 waiting task#=0 at 0x7f02384aca58>, <Worker name=periodic/493 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02380652e8> discarded task#=22 at 0x7f02381016a0>, <Worker name=periodic/495 waiting task#=8 at 0x7f023847d550>} (executor:213) 2021-09-21 14:31:42,418+0530 WARN (jsonrpc/5) [virt.vm] (vmId='b994fb6c-4a7d-469a-82fb-1d776e9f7ae9') monitor became unresponsive (command timeout, age=60.07553722700686) (vm:5944) 2021-09-21 14:31:49,492+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/496 waiting task#=93 at 0x7f0270c70c18>, <Worker name=periodic/497 waiting task#=14 at 0x7f0238756828>, <Worker name=periodic/498 waiting task#=12 at 0x7f02384aca58>, <Worker name=periodic/499 waiting task#=0 at 0x7f0270d3d710>, <Worker name=periodic/495 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02384768d0> discarded task#=22 at 0x7f023847d550>} (executor:213) 2021-09-21 14:32:19,492+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/496 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f027098a400> discarded task#=97 at 0x7f0270c70c18>, <Worker name=periodic/497 waiting task#=20 at 0x7f0238756828>, <Worker name=periodic/498 waiting task#=17 at 0x7f02384aca58>, <Worker name=periodic/500 waiting task#=0 at 0x7f0270ff02b0>, <Worker name=periodic/499 waiting task#=5 at 0x7f0270d3d710>} (executor:213) 2021-09-21 14:33:19,494+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/497 waiting task#=115 at 0x7f0238756828>, <Worker name=periodic/498 waiting task#=40 at 0x7f02384aca58>, <Worker name=periodic/500 waiting task#=22 at 0x7f0270ff02b0>, <Worker name=periodic/499 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f0270d25be0> discarded task#=23 at 0x7f0270d3d710>, <Worker name=periodic/501 waiting task#=0 at 0x7f02709ebdd8>} (executor:213) 2021-09-21 14:33:34,494+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/497 waiting task#=117 at 0x7f0238756828>, <Worker name=periodic/498 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f0270c4b198> discarded task#=42 at 0x7f02384aca58>, <Worker name=periodic/500 waiting task#=25 at 0x7f0270ff02b0>, <Worker name=periodic/502 waiting task#=0 at 0x7f0270d17978>, <Worker name=periodic/501 waiting task#=2 at 0x7f02709ebdd8>} (executor:213) 2021-09-21 14:33:40,755+0530 WARN (periodic/502) [virt.periodic.VmDispatcher] could not run <class 'vdsm.virt.periodic.UpdateVolumes'> on ['606a03d5-4a1a-40bc-94dc-d455bc2fe950'] (periodic:290) 2021-09-21 14:33:50,813+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/497 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f0270a87dd8> discarded task#=129 at 0x7f0238756828>, <Worker name=periodic/500 waiting task#=37 at 0x7f0270ff02b0>, <Worker name=periodic/503 waiting task#=0 at 0x7f02709f3ef0>, <Worker name=periodic/502 waiting task#=13 at 0x7f0270d17978>, <Worker name=periodic/501 waiting task#=90 at 0x7f02709ebdd8>} (executor:213) 2021-09-21 14:34:28,880+0530 WARN (jsonrpc/2) [virt.vm] (vmId='968aae88-8bca-4800-be9f-38067b73f271') monitor became unresponsive (command timeout, age=61.88649625200196) (vm:5944) 2021-09-21 14:40:34,503+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/504 waiting task#=0 at 0x7f0270faaa58>, <Worker name=periodic/500 waiting task#=245 at 0x7f0270ff02b0>, <Worker name=periodic/503 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02386ada20> discarded task#=217 at 0x7f02709f3ef0>, <Worker name=periodic/502 waiting task#=359 at 0x7f0270d17978>, <Worker name=periodic/501 waiting task#=301 at 0x7f02709ebdd8>} (executor:213) Not able to understand the cause and required fix please help

On Tuesday, 21 September 2021 11:29:06 CEST manoj.sharma99765@gmail.com wrote:
Thanks Vojtech Juranek for quick response please find logs below We got vm non responsive and at the same time vdsm log shows "monitor became unresponsive (command timeout, age=61.88649625200196) (vm:5944)"
This means that thread which collects statistics about VM didn't get a response in expected time. Does VM response? You can also investigate VM using virsh. Or maybe overloaded host which makes all operations very slow and thus time out?
Please refer below logs for more details.
[root@host vdsm]# cat vdsm.log | grep WARN 2021-09-21 14:30:19,487+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/492 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f023860d6d8> discarded task#=2334 at 0x7f02709edc88>, <Worker name=periodic/493 waiting task#=0 at 0x7f02381016a0>, <Worker name=periodic/489 waiting task#=2701 at 0x7f02386faef0>, <Worker name=periodic/491 waiting task#=1807 at 0x7f0270903908>, <Worker name=periodic/483 waiting task#=2144 at 0x7f0270b12940>} (executor:213) 2021-09-21 14:30:34,487+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/493 waiting task#=3 at 0x7f02381016a0>, <Worker name=periodic/489 waiting task#=2704 at 0x7f02386faef0>, <Worker name=periodic/491 waiting task#=1809 at 0x7f0270903908>, <Worker name=periodic/483 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02384a7780> discarded task#=2146 at 0x7f0270b12940>, <Worker name=periodic/494 waiting task#=0 at 0x7f02386d2f60>} (executor:213) 2021-09-21 14:30:49,844+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/493 waiting task#=15 at 0x7f02381016a0>, <Worker name=periodic/489 waiting task#=2717 at 0x7f02386faef0>, <Worker name=periodic/491 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02147e0ac8> discarded task#=1897 at 0x7f0270903908>, <Worker name=periodic/495 waiting task#=0 at 0x7f023847d550>, <Worker name=periodic/494 waiting task#=12 at 0x7f02386d2f60>} (executor:213) 2021-09-21 14:31:04,491+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/496 waiting task#=0 at 0x7f0270c70c18>, <Worker name=periodic/493 waiting task#=17 at 0x7f02381016a0>, <Worker name=periodic/489 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f0238494240> discarded task#=2719 at 0x7f02386faef0>, <Worker name=periodic/495 waiting task#=2 at 0x7f023847d550>, <Worker name=periodic/494 waiting task#=15 at 0x7f02386d2f60>} (executor:213) 2021-09-21 14:31:19,491+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/496 waiting task#=2 at 0x7f0270c70c18>, <Worker name=periodic/497 waiting task#=0 at 0x7f0238756828>, <Worker name=periodic/493 waiting task#=20 at 0x7f02381016a0>, <Worker name=periodic/495 waiting task#=6 at 0x7f023847d550>, <Worker name=periodic/494 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02386182e8> discarded task#=17 at 0x7f02386d2f60>} (executor:213) 2021-09-21 14:31:34,492+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/496 waiting task#=4 at 0x7f0270c70c18>, <Worker name=periodic/497 waiting task#=3 at 0x7f0238756828>, <Worker name=periodic/498 waiting task#=0 at 0x7f02384aca58>, <Worker name=periodic/493 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02380652e8> discarded task#=22 at 0x7f02381016a0>, <Worker name=periodic/495 waiting task#=8 at 0x7f023847d550>} (executor:213) 2021-09-21 14:31:42,418+0530 WARN (jsonrpc/5) [virt.vm] (vmId='b994fb6c-4a7d-469a-82fb-1d776e9f7ae9') monitor became unresponsive (command timeout, age=60.07553722700686) (vm:5944) 2021-09-21 14:31:49,492+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/496 waiting task#=93 at 0x7f0270c70c18>, <Worker name=periodic/497 waiting task#=14 at 0x7f0238756828>, <Worker name=periodic/498 waiting task#=12 at 0x7f02384aca58>, <Worker name=periodic/499 waiting task#=0 at 0x7f0270d3d710>, <Worker name=periodic/495 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02384768d0> discarded task#=22 at 0x7f023847d550>} (executor:213) 2021-09-21 14:32:19,492+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/496 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f027098a400> discarded task#=97 at 0x7f0270c70c18>, <Worker name=periodic/497 waiting task#=20 at 0x7f0238756828>, <Worker name=periodic/498 waiting task#=17 at 0x7f02384aca58>, <Worker name=periodic/500 waiting task#=0 at 0x7f0270ff02b0>, <Worker name=periodic/499 waiting task#=5 at 0x7f0270d3d710>} (executor:213) 2021-09-21 14:33:19,494+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/497 waiting task#=115 at 0x7f0238756828>, <Worker name=periodic/498 waiting task#=40 at 0x7f02384aca58>, <Worker name=periodic/500 waiting task#=22 at 0x7f0270ff02b0>, <Worker name=periodic/499 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f0270d25be0> discarded task#=23 at 0x7f0270d3d710>, <Worker name=periodic/501 waiting task#=0 at 0x7f02709ebdd8>} (executor:213) 2021-09-21 14:33:34,494+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/497 waiting task#=117 at 0x7f0238756828>, <Worker name=periodic/498 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f0270c4b198> discarded task#=42 at 0x7f02384aca58>, <Worker name=periodic/500 waiting task#=25 at 0x7f0270ff02b0>, <Worker name=periodic/502 waiting task#=0 at 0x7f0270d17978>, <Worker name=periodic/501 waiting task#=2 at 0x7f02709ebdd8>} (executor:213) 2021-09-21 14:33:40,755+0530 WARN (periodic/502) [virt.periodic.VmDispatcher] could not run <class 'vdsm.virt.periodic.UpdateVolumes'> on ['606a03d5-4a1a-40bc-94dc-d455bc2fe950'] (periodic:290) 2021-09-21 14:33:50,813+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/497 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f0270a87dd8> discarded task#=129 at 0x7f0238756828>, <Worker name=periodic/500 waiting task#=37 at 0x7f0270ff02b0>, <Worker name=periodic/503 waiting task#=0 at 0x7f02709f3ef0>, <Worker name=periodic/502 waiting task#=13 at 0x7f0270d17978>, <Worker name=periodic/501 waiting task#=90 at 0x7f02709ebdd8>} (executor:213) 2021-09-21 14:34:28,880+0530 WARN (jsonrpc/2) [virt.vm] (vmId='968aae88-8bca-4800-be9f-38067b73f271') monitor became unresponsive (command timeout, age=61.88649625200196) (vm:5944) 2021-09-21 14:40:34,503+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers={<Worker name=periodic/504 waiting task#=0 at 0x7f0270faaa58>, <Worker name=periodic/500 waiting task#=245 at 0x7f0270ff02b0>, <Worker name=periodic/503 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02386ada20> discarded task#=217 at 0x7f02709f3ef0>, <Worker name=periodic/502 waiting task#=359 at 0x7f0270d17978>, <Worker name=periodic/501 waiting task#=301 at 0x7f02709ebdd8>} (executor:213)
Not able to understand the cause and required fix please help _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/AJWT7KOCVJP6C M42MG7Y7W5ZPOSSZRDZ/

Thanks Vojtech Juranek for quick response. Could you please guide few steps to decide if host is really overloaded. Can you share any documentation link which can help us to solve / troubleshoot further.
participants (2)
-
manoj.sharma99765@gmail.com
-
Vojtech Juranek