Restarted vdsmd on host:

mom.log:
2020-08-06 07:21:19,053 - mom.GuestManager - INFO - Guest Manager ending
2020-08-06 07:21:20,483 - mom.HostMonitor - INFO - Host Monitor ending
2020-08-06 07:21:24,795 - mom - INFO - MOM starting
2020-08-06 07:21:24,833 - mom - INFO - hypervisor interface vdsmjsonrpcclient
2020-08-06 07:21:24,833 - mom.HostMonitor - INFO - Host Monitor starting
2020-08-06 07:21:24,880 - mom - ERROR - Failed to initialize MOM threads
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 29, in run
    hypervisor_iface = self.get_hypervisor_interface()
  File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 217, in get_hypervisor_interface
    return module.instance(self.config)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcclientInterface.py", line 96, in instance
    return JsonRpcVdsmClientInterface()
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcclientInterface.py", line 31, in __init__
    self._vdsm_api = client.connect(host="localhost")
  File "/usr/lib/python2.7/site-packages/vdsm/client.py", line 157, in connect
    raise ConnectionError(host, port, use_tls, timeout, e)
ConnectionError: Connection to localhost:54321 with use_tls=True, timeout=60 failed: [Errno 111] Connection refused
2020-08-06 07:21:30,085 - mom - INFO - MOM starting
2020-08-06 07:21:30,122 - mom.HostMonitor - INFO - Host Monitor starting
2020-08-06 07:21:30,123 - mom - INFO - hypervisor interface vdsmjsonrpcclient
2020-08-06 07:21:30,217 - mom.HostMonitor - INFO - HostMonitor is ready
2020-08-06 07:21:30,221 - mom.GuestManager - INFO - Guest Manager starting: multi-thread
2020-08-06 07:21:30,226 - mom.Policy - INFO - Loaded policy '00-defines'
2020-08-06 07:21:30,228 - mom.Policy - INFO - Loaded policy '01-parameters'
2020-08-06 07:21:30,241 - mom.Policy - INFO - Loaded policy '02-balloon'
2020-08-06 07:21:30,263 - mom.Policy - INFO - Loaded policy '03-ksm'
2020-08-06 07:21:30,290 - mom.Policy - INFO - Loaded policy '04-cputune'
2020-08-06 07:21:30,321 - mom.Policy - INFO - Loaded policy '05-iotune'
2020-08-06 07:21:30,321 - mom.PolicyEngine - INFO - Policy Engine starting
2020-08-06 07:21:30,322 - mom.RPCServer - INFO - Using unix socket /var/run/vdsm/mom-vdsm.sock
2020-08-06 07:21:30,323 - mom.RPCServer - INFO - RPC Server starting
2020-08-06 07:21:40,692 - mom.RPCServer - INFO - ping()
2020-08-06 07:21:40,692 - mom.RPCServer - INFO - getStatistics()
2020-08-06 07:21:45,356 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:0 merge_across_nodes:1 run:0 sleep_millisecs:0
2020-08-06 07:21:55,838 - mom.RPCServer - INFO - ping()
2020-08-06 07:21:55,839 - mom.RPCServer - INFO - getStatistics()

supervdsm.log:
MainProcess|jsonrpc/3::DEBUG::2020-08-05 20:11:14,139::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call ksmTune with ({u'run': 0, u'merge_across_nodes': 1},) {}
MainProcess|jsonrpc/3::DEBUG::2020-08-05 20:11:14,139::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return ksmTune with None
MainProcess::DEBUG::2020-08-06 07:21:25,279::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call multipath_status with (<vdsm.supervdsm_server._SuperVdsm object at 0x7fcbf8dd0450>,) {}
MainProcess::DEBUG::2020-08-06 07:21:25,279::logutils::319::root::(_report_stats) ThreadedHandler is ok in the last 40234 seconds (max pending: 3)
MainProcess::DEBUG::2020-08-06 07:21:25,279::commands::198::storage.Misc.excCmd::(execCmd) /usr/bin/taskset --cpu-list 0-95 /usr/sbin/dmsetup status --target multipath (cwd None)
MainProcess::DEBUG::2020-08-06 07:21:25,283::commands::219::storage.Misc.excCmd::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess::DEBUG::2020-08-06 07:21:25,289::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return multipath_status with {u'T1_58886_2121': [PathStatus(name=u'sdd', status=u'A'), PathStatus(name=u'sdm', status=u'A')], u'T0_someserver_boot_58886_20c2': [PathStatus(name=u'sdi', status=u'A'), PathStatus(name=u'sdr', status=u'A')], u'T0_R4_UCS_MOB1P_DIGIT_58886_20c8': [PathStatus(name=u'sdg', status=u'A'), PathStatus(name=u'sdp', status=u'A')], u'T0_58886_215d': [PathStatus(name=u'sdb', status=u'A'), PathStatus(name=u'sdk', status=u'A')], u'T0_R4_UCS_MOB1P_DIGIT_58886_20c7': [PathStatus(name=u'sdf', status=u'A'), PathStatus(name=u'sdo', status=u'A')], u'T0_58886_20b8': [PathStatus(name=u'sde', status=u'A'), PathStatus(name=u'sdn', status=u'A')], u'T0_58886_208a': [PathStatus(name=u'sdc', status=u'A'), PathStatus(name=u'sdl', status=u'A')], u'T0_58886_2124': [PathStatus(name=u'sdh', status=u'A'), PathStatus(name=u'sdq', status=u'A')], u'T0_58886_215c': [PathStatus(name=u'sda', status=u'A'), PathStatus(name=u'sdj', status=u'A')]}
MainProcess|hsm/init::DEBUG::2020-08-06 07:21:25,383::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call hbaRescan with (<vdsm.supervdsm_server._SuperVdsm object at 0x7fcbf8dd0450>,) {}
MainProcess|hsm/init::DEBUG::2020-08-06 07:21:25,384::commands::198::storage.HBA::(execCmd) /usr/bin/taskset --cpu-list 0-95 /usr/libexec/vdsm/fc-scan (cwd None)
MainProcess|hsm/init::DEBUG::2020-08-06 07:21:25,450::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return hbaRescan with None
MainProcess|jsonrpc/3::DEBUG::2020-08-06 07:21:45,359::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call ksmTune with ({u'run': 0, u'merge_across_nodes': 1},) {}
MainProcess|jsonrpc/3::DEBUG::2020-08-06 07:21:45,359::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return ksmTune with None

vdsm.log
2020-08-06 07:21:22,529+0200 INFO  (MainThread) [vds] Received signal 15, shutting down (vdsmd:71)                                                                                                      [130/49579]
2020-08-06 07:21:22,672+0200 INFO  (MainThread) [jsonrpc.JsonRpcServer] Stopping JsonRPC Server (__init__:442)                                                                  
2020-08-06 07:21:22,674+0200 INFO  (MainThread) [vds] Stopping http server (http:80)                                                                                                                              
2020-08-06 07:21:22,674+0200 INFO  (http) [vds] Server stopped (http:70)                                                                                                                                          
2020-08-06 07:21:22,674+0200 INFO  (MainThread) [root] Unregistering all secrets (secret:93)                                                                                                                      
2020-08-06 07:21:22,674+0200 INFO  (MainThread) [vds] Stopping QEMU-GA poller (qemuguestagent:158)                                                                                                                
2020-08-06 07:21:22,675+0200 INFO  (MainThread) [vdsm.api] START prepareForShutdown(options=None) from=internal, task_id=251a8da8-ef3c-4740-842a-42bca0541551 (api:48)                                            
2020-08-06 07:21:22,732+0200 INFO  (MainThread) [storage.Monitor] Shutting down domain monitors (monitor:223)                                                                                                      
2020-08-06 07:21:22,732+0200 INFO  (MainThread) [storage.check] Stopping check service (check:104)                                                                                                                
2020-08-06 07:21:22,732+0200 INFO  (check/loop) [storage.asyncevent] Stopping <EventLoop running=False closed=False at 0x140458524164304> (asyncevent:220)                                                        
2020-08-06 07:21:22,732+0200 INFO  (MainThread) [storage.udev] Stopping multipath event listener (udev:149)                                                                                    
2020-08-06 07:21:22,733+0200 INFO  (MainThread) [vdsm.api] FINISH prepareForShutdown return=None from=internal, task_id=251a8da8-ef3c-4740-842a-42bca0541551 (api:54)                      
2020-08-06 07:21:22,733+0200 INFO  (MainThread) [vds] Stopping threads (vdsmd:163)                                                                                                                                
2020-08-06 07:21:22,733+0200 INFO  (MainThread) [vds] Exiting (vdsmd:174)                                                                                                                                          
2020-08-06 07:21:25,134+0200 INFO  (MainThread) [vds] (PID: 217998) I am the actual vdsm 4.30.43.1 someserver (3.10.0-1127.el7.x86_64) (vdsmd:152)                                                    
2020-08-06 07:21:25,135+0200 INFO  (MainThread) [vds] VDSM will run with cpu affinity: frozenset([1]) (vdsmd:264)                                                              
2020-08-06 07:21:25,138+0200 INFO  (MainThread) [storage.HSM] START HSM init (hsm:367)                                                                                                                            
2020-08-06 07:21:25,140+0200 INFO  (MainThread) [storage.HSM] Creating data-center mount directory '/rhev/data-center/mnt' (hsm:372)                                                                              
2020-08-06 07:21:25,140+0200 INFO  (MainThread) [storage.fileUtils] Creating directory: /rhev/data-center/mnt mode: None (fileUtils:199)                                                                          
2020-08-06 07:21:25,269+0200 INFO  (MainThread) [storage.udev] Registering multipath event monitor <vdsm.storage.mpathhealth.Monitor object at 0x7fa060286d50> (udev:182)                  
2020-08-06 07:21:25,269+0200 INFO  (MainThread) [storage.udev] Starting multipath event listener (udev:116)                                                                                                        
2020-08-06 07:21:25,289+0200 INFO  (MainThread) [storage.check] Starting check service (check:91)                                                                                                                  
2020-08-06 07:21:25,293+0200 INFO  (MainThread) [storage.Dispatcher] Starting StorageDispatcher... (dispatcher:47)                                                                                                
2020-08-06 07:21:25,294+0200 INFO  (check/loop) [storage.asyncevent] Starting <EventLoop running=True closed=False at 0x140326784999632> (asyncevent:125)                                                          
2020-08-06 07:21:25,316+0200 INFO  (MainThread) [vdsm.api] START registerDomainStateChangeCallback(callbackFunc=<functools.partial object at 0x7fa06026a628>) from=internal, task_id=c31419c5-486f-4bdd-b206-207e61
a64825 (api:48)                                                                                                                                                                                                    
2020-08-06 07:21:25,316+0200 INFO  (MainThread) [vdsm.api] FINISH registerDomainStateChangeCallback return=None from=internal, task_id=c31419c5-486f-4bdd-b206-207e61a64825 (api:54)          
2020-08-06 07:21:25,316+0200 INFO  (MainThread) [MOM] Preparing MOM interface (momIF:45)                                                                                                                          
2020-08-06 07:21:25,316+0200 INFO  (MainThread) [MOM] MOM: Using named unix socket: /var/run/vdsm/mom-vdsm.sock (momIF:55)                                                                                        
2020-08-06 07:21:25,317+0200 INFO  (MainThread) [root] Unregistering all secrets (secret:93)                                                                                                                      
2020-08-06 07:21:25,330+0200 INFO  (MainThread) [vds] Setting channels' timeout to 30 seconds. (vmchannels:224)                                                                                
2020-08-06 07:21:25,331+0200 INFO  (MainThread) [vds] Starting QEMU-GA poller (qemuguestagent:151)                                                                                                                
2020-08-06 07:21:25,333+0200 INFO  (MainThread) [vds.MultiProtocolAcceptor] Listening at :::54321 (protocoldetector:183)                                                                  
2020-08-06 07:21:25,402+0200 INFO  (http) [vds] Server running (http:59)                                                                                                                                          
2020-08-06 07:21:25,471+0200 INFO  (vmrecovery) [vds] VM recovered: 4c044f9e-afdc-4ee7-bfa4-4b2b94115818 (recovery:97)
2020-08-06 07:21:25,472+0200 INFO  (vmrecovery) [vds] recovery [1:1/3]: recovered domain 4c044f9e-afdc-4ee7-bfa4-4b2b94115818 (recovery:123)
2020-08-06 07:21:25,480+0200 INFO  (hsm/init) [storage.LVM] Overriding read_only mode current=True override=False (lvm:402)
2020-08-06 07:21:25,507+0200 INFO  (vm/4c044f9e) [virt.vm] (vmId='4c044f9e-afdc-4ee7-bfa4-4b2b94115818') VM wrapper has started (vm:2786)
2020-08-06 07:21:25,510+0200 INFO  (vm/4c044f9e) [vdsm.api] START getVolumeSize(sdUUID='531b2674-0bfa-4d71-bc58-85e280e867b9', spUUID='a5f4c7eb-3640-49f9-b644-57f10e9f5279', imgUUID='c7bab10b-9721-4713-84d4-5718
0f8ce2f4', volUUID='1ed27351-77f9-4b69-acbe-a9bc35ed147a', options=None) from=internal, task_id=22cb50a2-fba6-4430-a47e-b1f6da73e7ed (api:48)
2020-08-06 07:21:25,516+0200 INFO  (vmrecovery) [vds] VM recovered: 543765d5-daee-41da-a9d4-34f36c99be82 (recovery:97)
2020-08-06 07:21:25,517+0200 INFO  (vmrecovery) [vds] recovery [1:2/3]: recovered domain 543765d5-daee-41da-a9d4-34f36c99be82 (recovery:123)
2020-08-06 07:21:25,530+0200 INFO  (vm/543765d5) [virt.vm] (vmId='543765d5-daee-41da-a9d4-34f36c99be82') VM wrapper has started (vm:2786)
2020-08-06 07:21:25,532+0200 INFO  (vm/543765d5) [vdsm.api] START getVolumeSize(sdUUID='531b2674-0bfa-4d71-bc58-85e280e867b9', spUUID='a5f4c7eb-3640-49f9-b644-57f10e9f5279', imgUUID='e4fcb0fc-3900-4422-a441-2147
bf8571f3', volUUID='12b3489b-90f5-4b94-aec1-1725296cd66e', options=None) from=internal, task_id=a1c0460b-4195-4b1c-af8a-d34474c5fa2e (api:48)
2020-08-06 07:21:25,545+0200 INFO  (vmrecovery) [vds] VM recovered: 4b472d29-5f40-438f-8bca-9b2affbd3d37 (recovery:97)
2020-08-06 07:21:25,546+0200 INFO  (vmrecovery) [vds] recovery [1:3/3]: recovered domain 4b472d29-5f40-438f-8bca-9b2affbd3d37 (recovery:123)
2020-08-06 07:21:25,546+0200 INFO  (vmrecovery) [vds] recovery: waiting for 3 domains to go up (clientIF:717)
2020-08-06 07:21:25,546+0200 INFO  (vm/4b472d29) [virt.vm] (vmId='4b472d29-5f40-438f-8bca-9b2affbd3d37') VM wrapper has started (vm:2786)
2020-08-06 07:21:25,546+0200 INFO  (periodic/1) [vdsm.api] START repoStats(domains=()) from=internal, task_id=d8e1926b-d289-4dc1-a81e-ec1592083751 (api:48)
2020-08-06 07:21:25,547+0200 INFO  (periodic/1) [vdsm.api] FINISH repoStats return={} from=internal, task_id=d8e1926b-d289-4dc1-a81e-ec1592083751 (api:54)
2020-08-06 07:22:41,636+0200 INFO  (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723)
2020-08-06 07:22:45,317+0200 INFO  (jsonrpc/7) [api.host] START getAllVmStats() from=::ffff:127.0.0.1,41540 (api:48)
2020-08-06 07:22:45,317+0200 INFO  (jsonrpc/7) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::ffff:127.0.0.1,41540 (api:54)
2020-08-06 07:22:45,318+0200 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)
2020-08-06 07:22:46,642+0200 INFO  (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=ad72b73f-d970-4367-a037-94bef589da57 (api:48)
2020-08-06 07:22:46,642+0200 INFO  (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=ad72b73f-d970-4367-a037-94bef589da57 (api:54)
2020-08-06 07:22:46,642+0200 INFO  (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723)
2020-08-06 07:22:51,648+0200 INFO  (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=f14156c0-a819-4fec-aef7-f5f5b1dd0144 (api:48)
2020-08-06 07:22:51,648+0200 INFO  (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=f14156c0-a819-4fec-aef7-f5f5b1dd0144 (api:54)
2020-08-06 07:22:51,649+0200 INFO  (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723)
2020-08-06 07:22:56,409+0200 INFO  (periodic/1) [vdsm.api] START repoStats(domains=()) from=internal, task_id=75381d37-a06d-4174-b956-f5e7f108ea39 (api:48)
2020-08-06 07:22:56,409+0200 INFO  (periodic/1) [vdsm.api] FINISH repoStats return={} from=internal, task_id=75381d37-a06d-4174-b956-f5e7f108ea39 (api:54)
2020-08-06 07:22:56,654+0200 INFO  (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=8f98ae7e-a681-4af3-a4d8-12f0501c8f68 (api:48)
2020-08-06 07:22:56,654+0200 INFO  (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=8f98ae7e-a681-4af3-a4d8-12f0501c8f68 (api:54)
2020-08-06 07:22:56,655+0200 INFO  (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723)
2020-08-06 07:23:00,337+0200 INFO  (jsonrpc/0) [api.host] START getAllVmStats() from=::ffff:127.0.0.1,41540 (api:48)
2020-08-06 07:23:00,337+0200 INFO  (jsonrpc/0) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::ffff:127.0.0.1,41540 (api:54)
2020-08-06 07:23:00,337+0200 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)
2020-08-06 07:23:01,660+0200 INFO  (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=f984accc-8e20-48b6-ba24-e00418d28043 (api:48)
2020-08-06 07:23:01,660+0200 INFO  (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=f984accc-8e20-48b6-ba24-e00418d28043 (api:54)
2020-08-06 07:23:01,660+0200 INFO  (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723)

I see "[vds] recovery: waiting for storage pool to go up (clientIF:723)" alot.

Regards

Nardus

On Thu, 6 Aug 2020 at 07:09, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Can you fheck for errors on the affected host. Most probably you need the vdsm logs.

Best Regards,
Strahil Nikolov

На 6 август 2020 г. 7:40:23 GMT+03:00, Nardus Geldenhuys <nardusg@gmail.com> написа:
>Hi Strahil
>
>Hope you are well. I get the following error when I tried to confirm
>reboot:
>
>Error while executing action: Cannot confirm 'Host has been rebooted'
>Host.
>Valid Host statuses are "Non operational", "Maintenance" or
>"Connecting".
>
>And I can't put it in maintenance, only option is "restart" or "stop".
>
>Regards
>
>Nar
>
>On Thu, 6 Aug 2020 at 06:16, Strahil Nikolov <hunter86_bg@yahoo.com>
>wrote:
>
>> After rebooting the node, have you "marked" it that it was rebooted ?
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> На 5 август 2020 г. 21:29:04 GMT+03:00, Nardus Geldenhuys <
>> nardusg@gmail.com> написа:
>> >Hi oVirt land
>> >
>> >Hope you are well. Got a bit of an issue, actually a big issue. We
>had
>> >some
>> >sort of dip of some sort. All the VM's is still running, but some of
>> >the
>> >hosts is show "Unassigned" or "NonResponsive". So all the hosts was
>> >showing
>> >UP and was fine before our dip. So I did increase
>vdsHeartbeatInSecond
>> >to
>> >240, no luck.
>> >
>> >I still get a timeout on the engine lock even thou I can connect to
>> >that
>> >host from the engine using nc to test to port 54321. I also did
>restart
>> >vdsmd and also rebooted the host with no luck.
>> >
>> > nc -v someserver 54321
>> >Ncat: Version 7.50 ( https://nmap.org/ncat )
>> >Ncat: Connected to 172.40.2.172:54321.
>> >
>> >2020-08-05 20:20:34,256+02 ERROR
>>
>>[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> >(EE-ManagedThreadFactory-engineScheduled-Thread-70) [] EVENT_ID:
>> >VDS_BROKER_COMMAND_FAILURE(10,802), VDSM someserver command Get Host
>> >Capabilities failed: Message timeout which can be caused by
>> >communication
>> >issues
>> >
>> >Any troubleshoot ideas will be gladly appreciated.
>> >
>> >Regards
>> >
>> >Nar
>>