Thanks, Vojtech Juranek, for the quick response. Please find the logs below.
The VM became non-responsive, and at the same time the vdsm log shows "monitor
became unresponsive (command timeout, age=61.88649625200196) (vm:5944)".
Please refer to the logs below for more details.
[root@host vdsm]# cat vdsm.log | grep WARN
2021-09-21 14:30:19,487+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5
workers={<Worker name=periodic/492 running <Task discardable <Operation
action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at
0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f023860d6d8> discarded task#=2334
at 0x7f02709edc88>, <Worker name=periodic/493 waiting task#=0 at 0x7f02381016a0>,
<Worker name=periodic/489 waiting task#=2701 at 0x7f02386faef0>, <Worker
name=periodic/491 waiting task#=1807 at 0x7f0270903908>, <Worker name=periodic/483
waiting task#=2144 at 0x7f0270b12940>} (executor:213)
2021-09-21 14:30:34,487+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5
workers={<Worker name=periodic/493 waiting task#=3 at 0x7f02381016a0>, <Worker
name=periodic/489 waiting task#=2704 at 0x7f02386faef0>, <Worker name=periodic/491
waiting task#=1809 at 0x7f0270903908>, <Worker name=periodic/483 running <Task
discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at
0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02384a7780>
discarded task#=2146 at 0x7f0270b12940>, <Worker name=periodic/494 waiting task#=0
at 0x7f02386d2f60>} (executor:213)
2021-09-21 14:30:49,844+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5
workers={<Worker name=periodic/493 waiting task#=15 at 0x7f02381016a0>, <Worker
name=periodic/489 waiting task#=2717 at 0x7f02386faef0>, <Worker name=periodic/491
running <Task discardable <Operation
action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at
0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02147e0ac8> discarded task#=1897
at 0x7f0270903908>, <Worker name=periodic/495 waiting task#=0 at 0x7f023847d550>,
<Worker name=periodic/494 waiting task#=12 at 0x7f02386d2f60>} (executor:213)
2021-09-21 14:31:04,491+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5
workers={<Worker name=periodic/496 waiting task#=0 at 0x7f0270c70c18>, <Worker
name=periodic/493 waiting task#=17 at 0x7f02381016a0>, <Worker name=periodic/489
running <Task discardable <Operation
action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at
0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f0238494240> discarded task#=2719
at 0x7f02386faef0>, <Worker name=periodic/495 waiting task#=2 at 0x7f023847d550>,
<Worker name=periodic/494 waiting task#=15 at 0x7f02386d2f60>} (executor:213)
2021-09-21 14:31:19,491+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5
workers={<Worker name=periodic/496 waiting task#=2 at 0x7f0270c70c18>, <Worker
name=periodic/497 waiting task#=0 at 0x7f0238756828>, <Worker name=periodic/493
waiting task#=20 at 0x7f02381016a0>, <Worker name=periodic/495 waiting task#=6 at
0x7f023847d550>, <Worker name=periodic/494 running <Task discardable
<Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at
0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02386182e8>
discarded task#=17 at 0x7f02386d2f60>} (executor:213)
2021-09-21 14:31:34,492+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5
workers={<Worker name=periodic/496 waiting task#=4 at 0x7f0270c70c18>, <Worker
name=periodic/497 waiting task#=3 at 0x7f0238756828>, <Worker name=periodic/498
waiting task#=0 at 0x7f02384aca58>, <Worker name=periodic/493 running <Task
discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at
0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02380652e8>
discarded task#=22 at 0x7f02381016a0>, <Worker name=periodic/495 waiting task#=8 at
0x7f023847d550>} (executor:213)
2021-09-21 14:31:42,418+0530 WARN (jsonrpc/5) [virt.vm]
(vmId='b994fb6c-4a7d-469a-82fb-1d776e9f7ae9') monitor became unresponsive (command
timeout, age=60.07553722700686) (vm:5944)
2021-09-21 14:31:49,492+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5
workers={<Worker name=periodic/496 waiting task#=93 at 0x7f0270c70c18>, <Worker
name=periodic/497 waiting task#=14 at 0x7f0238756828>, <Worker name=periodic/498
waiting task#=12 at 0x7f02384aca58>, <Worker name=periodic/499 waiting task#=0 at
0x7f0270d3d710>, <Worker name=periodic/495 running <Task discardable
<Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at
0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02384768d0>
discarded task#=22 at 0x7f023847d550>} (executor:213)
2021-09-21 14:32:19,492+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5
workers={<Worker name=periodic/496 running <Task discardable <Operation
action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at
0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f027098a400> discarded task#=97 at
0x7f0270c70c18>, <Worker name=periodic/497 waiting task#=20 at 0x7f0238756828>,
<Worker name=periodic/498 waiting task#=17 at 0x7f02384aca58>, <Worker
name=periodic/500 waiting task#=0 at 0x7f0270ff02b0>, <Worker name=periodic/499
waiting task#=5 at 0x7f0270d3d710>} (executor:213)
2021-09-21 14:33:19,494+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5
workers={<Worker name=periodic/497 waiting task#=115 at 0x7f0238756828>, <Worker
name=periodic/498 waiting task#=40 at 0x7f02384aca58>, <Worker name=periodic/500
waiting task#=22 at 0x7f0270ff02b0>, <Worker name=periodic/499 running <Task
discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at
0x7f025051dc18> at 0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f0270d25be0>
discarded task#=23 at 0x7f0270d3d710>, <Worker name=periodic/501 waiting task#=0 at
0x7f02709ebdd8>} (executor:213)
2021-09-21 14:33:34,494+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5
workers={<Worker name=periodic/497 waiting task#=117 at 0x7f0238756828>, <Worker
name=periodic/498 running <Task discardable <Operation
action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at
0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f0270c4b198> discarded task#=42 at
0x7f02384aca58>, <Worker name=periodic/500 waiting task#=25 at 0x7f0270ff02b0>,
<Worker name=periodic/502 waiting task#=0 at 0x7f0270d17978>, <Worker
name=periodic/501 waiting task#=2 at 0x7f02709ebdd8>} (executor:213)
2021-09-21 14:33:40,755+0530 WARN (periodic/502) [virt.periodic.VmDispatcher] could not
run <class 'vdsm.virt.periodic.UpdateVolumes'> on
['606a03d5-4a1a-40bc-94dc-d455bc2fe950'] (periodic:290)
2021-09-21 14:33:50,813+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5
workers={<Worker name=periodic/497 running <Task discardable <Operation
action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at
0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f0270a87dd8> discarded task#=129 at
0x7f0238756828>, <Worker name=periodic/500 waiting task#=37 at 0x7f0270ff02b0>,
<Worker name=periodic/503 waiting task#=0 at 0x7f02709f3ef0>, <Worker
name=periodic/502 waiting task#=13 at 0x7f0270d17978>, <Worker name=periodic/501
waiting task#=90 at 0x7f02709ebdd8>} (executor:213)
2021-09-21 14:34:28,880+0530 WARN (jsonrpc/2) [virt.vm]
(vmId='968aae88-8bca-4800-be9f-38067b73f271') monitor became unresponsive (command
timeout, age=61.88649625200196) (vm:5944)
2021-09-21 14:40:34,503+0530 WARN (vdsm.Scheduler) [Executor] executor state: count=5
workers={<Worker name=periodic/504 waiting task#=0 at 0x7f0270faaa58>, <Worker
name=periodic/500 waiting task#=245 at 0x7f0270ff02b0>, <Worker name=periodic/503
running <Task discardable <Operation
action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f025051dc18> at
0x7f02502ff5f8> timeout=7.5, duration=7.50 at 0x7f02386ada20> discarded task#=217 at
0x7f02709f3ef0>, <Worker name=periodic/502 waiting task#=359 at 0x7f0270d17978>,
<Worker name=periodic/501 waiting task#=301 at 0x7f02709ebdd8>} (executor:213)
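In case it helps with reading the output above, here is a minimal sketch for pulling only the "monitor became unresponsive" entries out of the log. It assumes every record starts with a timestamp like the ones shown and that wrapped lines belong to the record before them. The function names and regular expressions are my own, not part of vdsm.

```python
import re

# Assumption: a new log record always begins with a timestamp such as
# "2021-09-21 14:31:42,418+0530"; any other line is a wrapped continuation.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4} ")

# Matches the "monitor became unresponsive" warning once the record is on one line.
UNRESPONSIVE = re.compile(
    r"^(?P<ts>\S+ \S+) WARN .*?vmId='(?P<vm>[0-9a-f-]+)'.*?"
    r"monitor became unresponsive \(command timeout, age=(?P<age>[0-9.]+)\)"
)


def join_records(lines):
    """Merge wrapped continuation lines back into single log records."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line.strip()
    return records


def unresponsive_events(lines):
    """Return (timestamp, vm_id, age_in_seconds) for each unresponsive-monitor warning."""
    events = []
    for record in join_records(lines):
        match = UNRESPONSIVE.search(record)
        if match:
            events.append(
                (match.group("ts"), match.group("vm"), float(match.group("age")))
            )
    return events
```

Passing the lines of vdsm.log (for example from `open("vdsm.log")`) to `unresponsive_events` gives one tuple per affected VM, which makes it easier to compare the warning times with the discarded VMBulkstatsMonitor tasks.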
I am not able to understand the cause or the required fix. Please help.