
Restarted vdsmd on host: mom.log: 2020-08-06 07:21:19,053 - mom.GuestManager - INFO - Guest Manager ending 2020-08-06 07:21:20,483 - mom.HostMonitor - INFO - Host Monitor ending 2020-08-06 07:21:24,795 - mom - INFO - MOM starting 2020-08-06 07:21:24,833 - mom - INFO - hypervisor interface vdsmjsonrpcclient 2020-08-06 07:21:24,833 - mom.HostMonitor - INFO - Host Monitor starting 2020-08-06 07:21:24,880 - mom - ERROR - Failed to initialize MOM threads Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 29, in run hypervisor_iface = self.get_hypervisor_interface() File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 217, in get_hypervisor_interface return module.instance(self.config) File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcclientInterface.py", line 96, in instance return JsonRpcVdsmClientInterface() File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcclientInterface.py", line 31, in __init__ self._vdsm_api = client.connect(host="localhost") File "/usr/lib/python2.7/site-packages/vdsm/client.py", line 157, in connect raise ConnectionError(host, port, use_tls, timeout, e) ConnectionError: Connection to localhost:54321 with use_tls=True, timeout=60 failed: [Errno 111] Connection refused 2020-08-06 07:21:30,085 - mom - INFO - MOM starting 2020-08-06 07:21:30,122 - mom.HostMonitor - INFO - Host Monitor starting 2020-08-06 07:21:30,123 - mom - INFO - hypervisor interface vdsmjsonrpcclient 2020-08-06 07:21:30,217 - mom.HostMonitor - INFO - HostMonitor is ready 2020-08-06 07:21:30,221 - mom.GuestManager - INFO - Guest Manager starting: multi-thread 2020-08-06 07:21:30,226 - mom.Policy - INFO - Loaded policy '00-defines' 2020-08-06 07:21:30,228 - mom.Policy - INFO - Loaded policy '01-parameters' 2020-08-06 07:21:30,241 - mom.Policy - INFO - Loaded policy '02-balloon' 2020-08-06 07:21:30,263 - mom.Policy - INFO - Loaded policy '03-ksm' 2020-08-06 07:21:30,290 - mom.Policy - INFO - Loaded policy '04-cputune' 2020-08-06 07:21:30,321 - mom.Policy - INFO - Loaded policy '05-iotune' 2020-08-06 07:21:30,321 - mom.PolicyEngine - INFO - Policy Engine starting 2020-08-06 07:21:30,322 - mom.RPCServer - INFO - Using unix socket /var/run/vdsm/mom-vdsm.sock 2020-08-06 07:21:30,323 - mom.RPCServer - INFO - RPC Server starting 2020-08-06 07:21:40,692 - mom.RPCServer - INFO - ping() 2020-08-06 07:21:40,692 - mom.RPCServer - INFO - getStatistics() 2020-08-06 07:21:45,356 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:0 merge_across_nodes:1 run:0 sleep_millisecs:0 2020-08-06 07:21:55,838 - mom.RPCServer - INFO - ping() 2020-08-06 07:21:55,839 - mom.RPCServer - INFO - getStatistics() supervdsm.log: MainProcess|jsonrpc/3::DEBUG::2020-08-05 20:11:14,139::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call ksmTune with ({u'run': 0, u'merge_across_nodes': 1},) {} MainProcess|jsonrpc/3::DEBUG::2020-08-05 20:11:14,139::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return ksmTune with None MainProcess::DEBUG::2020-08-06 07:21:25,279::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call multipath_status with (<vdsm.supervdsm_server._SuperVdsm object at 0x7fcbf8dd0450>,) {} MainProcess::DEBUG::2020-08-06 07:21:25,279::logutils::319::root::(_report_stats) ThreadedHandler is ok in the last 40234 seconds (max pending: 3) MainProcess::DEBUG::2020-08-06 07:21:25,279::commands::198::storage.Misc.excCmd::(execCmd) /usr/bin/taskset --cpu-list 0-95 /usr/sbin/dmsetup status --target multipath (cwd None) MainProcess::DEBUG::2020-08-06 07:21:25,283::commands::219::storage.Misc.excCmd::(execCmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess::DEBUG::2020-08-06 07:21:25,289::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return multipath_status with {u'T1_58886_2121': [PathStatus(name=u'sdd', status=u'A'), PathStatus(name=u'sdm', status=u'A')], u'T0_someserver_boot_58886_20c2': [PathStatus(name=u'sdi', status=u'A'), PathStatus(name=u'sdr', status=u'A')], u'T0_R4_UCS_MOB1P_DIGIT_58886_20c8': [PathStatus(name=u'sdg', status=u'A'), PathStatus(name=u'sdp', status=u'A')], u'T0_58886_215d': [PathStatus(name=u'sdb', status=u'A'), PathStatus(name=u'sdk', status=u'A')], u'T0_R4_UCS_MOB1P_DIGIT_58886_20c7': [PathStatus(name=u'sdf', status=u'A'), PathStatus(name=u'sdo', status=u'A')], u'T0_58886_20b8': [PathStatus(name=u'sde', status=u'A'), PathStatus(name=u'sdn', status=u'A')], u'T0_58886_208a': [PathStatus(name=u'sdc', status=u'A'), PathStatus(name=u'sdl', status=u'A')], u'T0_58886_2124': [PathStatus(name=u'sdh', status=u'A'), PathStatus(name=u'sdq', status=u'A')], u'T0_58886_215c': [PathStatus(name=u'sda', status=u'A'), PathStatus(name=u'sdj', status=u'A')]} MainProcess|hsm/init::DEBUG::2020-08-06 07:21:25,383::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call hbaRescan with (<vdsm.supervdsm_server._SuperVdsm object at 0x7fcbf8dd0450>,) {} MainProcess|hsm/init::DEBUG::2020-08-06 07:21:25,384::commands::198::storage.HBA::(execCmd) /usr/bin/taskset --cpu-list 0-95 /usr/libexec/vdsm/fc-scan (cwd None) MainProcess|hsm/init::DEBUG::2020-08-06 07:21:25,450::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return hbaRescan with None MainProcess|jsonrpc/3::DEBUG::2020-08-06 07:21:45,359::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call ksmTune with ({u'run': 0, u'merge_across_nodes': 1},) {} MainProcess|jsonrpc/3::DEBUG::2020-08-06 07:21:45,359::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return ksmTune with None vdsm.log 2020-08-06 07:21:22,529+0200 INFO (MainThread) [vds] Received signal 15, shutting down (vdsmd:71) [130/49579] 2020-08-06 07:21:22,672+0200 INFO (MainThread) [jsonrpc.JsonRpcServer] Stopping JsonRPC Server (__init__:442) 2020-08-06 07:21:22,674+0200 INFO (MainThread) [vds] Stopping http server (http:80) 2020-08-06 07:21:22,674+0200 INFO (http) [vds] Server stopped (http:70) 2020-08-06 07:21:22,674+0200 INFO (MainThread) [root] Unregistering all secrets (secret:93) 2020-08-06 07:21:22,674+0200 INFO (MainThread) [vds] Stopping QEMU-GA poller (qemuguestagent:158) 2020-08-06 07:21:22,675+0200 INFO (MainThread) [vdsm.api] START prepareForShutdown(options=None) from=internal, task_id=251a8da8-ef3c-4740-842a-42bca0541551 (api:48) 2020-08-06 07:21:22,732+0200 INFO (MainThread) [storage.Monitor] Shutting down domain monitors (monitor:223) 2020-08-06 07:21:22,732+0200 INFO (MainThread) [storage.check] Stopping check service (check:104) 2020-08-06 07:21:22,732+0200 INFO (check/loop) [storage.asyncevent] Stopping <EventLoop running=False closed=False at 0x140458524164304> (asyncevent:220) 2020-08-06 07:21:22,732+0200 INFO (MainThread) [storage.udev] Stopping multipath event listener (udev:149) 2020-08-06 07:21:22,733+0200 INFO (MainThread) [vdsm.api] FINISH prepareForShutdown return=None from=internal, task_id=251a8da8-ef3c-4740-842a-42bca0541551 (api:54) 2020-08-06 07:21:22,733+0200 INFO (MainThread) [vds] Stopping threads (vdsmd:163) 2020-08-06 07:21:22,733+0200 INFO (MainThread) [vds] Exiting (vdsmd:174) 2020-08-06 07:21:25,134+0200 INFO (MainThread) [vds] (PID: 217998) I am the actual vdsm 4.30.43.1 someserver (3.10.0-1127.el7.x86_64) (vdsmd:152) 2020-08-06 07:21:25,135+0200 INFO (MainThread) [vds] VDSM will run with cpu affinity: frozenset([1]) (vdsmd:264) 2020-08-06 07:21:25,138+0200 INFO (MainThread) [storage.HSM] START HSM init (hsm:367) 2020-08-06 07:21:25,140+0200 INFO (MainThread) [storage.HSM] Creating data-center mount directory '/rhev/data-center/mnt' (hsm:372) 2020-08-06 07:21:25,140+0200 INFO (MainThread) [storage.fileUtils] Creating directory: /rhev/data-center/mnt mode: None (fileUtils:199) 2020-08-06 07:21:25,269+0200 INFO (MainThread) [storage.udev] Registering multipath event monitor <vdsm.storage.mpathhealth.Monitor object at 0x7fa060286d50> (udev:182) 2020-08-06 07:21:25,269+0200 INFO (MainThread) [storage.udev] Starting multipath event listener (udev:116) 2020-08-06 07:21:25,289+0200 INFO (MainThread) [storage.check] Starting check service (check:91) 2020-08-06 07:21:25,293+0200 INFO (MainThread) [storage.Dispatcher] Starting StorageDispatcher... (dispatcher:47) 2020-08-06 07:21:25,294+0200 INFO (check/loop) [storage.asyncevent] Starting <EventLoop running=True closed=False at 0x140326784999632> (asyncevent:125) 2020-08-06 07:21:25,316+0200 INFO (MainThread) [vdsm.api] START registerDomainStateChangeCallback(callbackFunc=<functools.partial object at 0x7fa06026a628>) from=internal, task_id=c31419c5-486f-4bdd-b206-207e61 a64825 (api:48) 2020-08-06 07:21:25,316+0200 INFO (MainThread) [vdsm.api] FINISH registerDomainStateChangeCallback return=None from=internal, task_id=c31419c5-486f-4bdd-b206-207e61a64825 (api:54) 2020-08-06 07:21:25,316+0200 INFO (MainThread) [MOM] Preparing MOM interface (momIF:45) 2020-08-06 07:21:25,316+0200 INFO (MainThread) [MOM] MOM: Using named unix socket: /var/run/vdsm/mom-vdsm.sock (momIF:55) 2020-08-06 07:21:25,317+0200 INFO (MainThread) [root] Unregistering all secrets (secret:93) 2020-08-06 07:21:25,330+0200 INFO (MainThread) [vds] Setting channels' timeout to 30 seconds. (vmchannels:224) 2020-08-06 07:21:25,331+0200 INFO (MainThread) [vds] Starting QEMU-GA poller (qemuguestagent:151) 2020-08-06 07:21:25,333+0200 INFO (MainThread) [vds.MultiProtocolAcceptor] Listening at :::54321 (protocoldetector:183) 2020-08-06 07:21:25,402+0200 INFO (http) [vds] Server running (http:59) 2020-08-06 07:21:25,471+0200 INFO (vmrecovery) [vds] VM recovered: 4c044f9e-afdc-4ee7-bfa4-4b2b94115818 (recovery:97) 2020-08-06 07:21:25,472+0200 INFO (vmrecovery) [vds] recovery [1:1/3]: recovered domain 4c044f9e-afdc-4ee7-bfa4-4b2b94115818 (recovery:123) 2020-08-06 07:21:25,480+0200 INFO (hsm/init) [storage.LVM] Overriding read_only mode current=True override=False (lvm:402) 2020-08-06 07:21:25,507+0200 INFO (vm/4c044f9e) [virt.vm] (vmId='4c044f9e-afdc-4ee7-bfa4-4b2b94115818') VM wrapper has started (vm:2786) 2020-08-06 07:21:25,510+0200 INFO (vm/4c044f9e) [vdsm.api] START getVolumeSize(sdUUID='531b2674-0bfa-4d71-bc58-85e280e867b9', spUUID='a5f4c7eb-3640-49f9-b644-57f10e9f5279', imgUUID='c7bab10b-9721-4713-84d4-5718 0f8ce2f4', volUUID='1ed27351-77f9-4b69-acbe-a9bc35ed147a', options=None) from=internal, task_id=22cb50a2-fba6-4430-a47e-b1f6da73e7ed (api:48) 2020-08-06 07:21:25,516+0200 INFO (vmrecovery) [vds] VM recovered: 543765d5-daee-41da-a9d4-34f36c99be82 (recovery:97) 2020-08-06 07:21:25,517+0200 INFO (vmrecovery) [vds] recovery [1:2/3]: recovered domain 543765d5-daee-41da-a9d4-34f36c99be82 (recovery:123) 2020-08-06 07:21:25,530+0200 INFO (vm/543765d5) [virt.vm] (vmId='543765d5-daee-41da-a9d4-34f36c99be82') VM wrapper has started (vm:2786) 2020-08-06 07:21:25,532+0200 INFO (vm/543765d5) [vdsm.api] START getVolumeSize(sdUUID='531b2674-0bfa-4d71-bc58-85e280e867b9', spUUID='a5f4c7eb-3640-49f9-b644-57f10e9f5279', imgUUID='e4fcb0fc-3900-4422-a441-2147 bf8571f3', volUUID='12b3489b-90f5-4b94-aec1-1725296cd66e', options=None) from=internal, task_id=a1c0460b-4195-4b1c-af8a-d34474c5fa2e (api:48) 2020-08-06 07:21:25,545+0200 INFO (vmrecovery) [vds] VM recovered: 4b472d29-5f40-438f-8bca-9b2affbd3d37 (recovery:97) 2020-08-06 07:21:25,546+0200 INFO (vmrecovery) [vds] recovery [1:3/3]: recovered domain 4b472d29-5f40-438f-8bca-9b2affbd3d37 (recovery:123) 2020-08-06 07:21:25,546+0200 INFO (vmrecovery) [vds] recovery: waiting for 3 domains to go up (clientIF:717) 2020-08-06 07:21:25,546+0200 INFO (vm/4b472d29) [virt.vm] (vmId='4b472d29-5f40-438f-8bca-9b2affbd3d37') VM wrapper has started (vm:2786) 2020-08-06 07:21:25,546+0200 INFO (periodic/1) [vdsm.api] START repoStats(domains=()) from=internal, task_id=d8e1926b-d289-4dc1-a81e-ec1592083751 (api:48) 2020-08-06 07:21:25,547+0200 INFO (periodic/1) [vdsm.api] FINISH repoStats return={} from=internal, task_id=d8e1926b-d289-4dc1-a81e-ec1592083751 (api:54) 2020-08-06 07:22:41,636+0200 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723) 2020-08-06 07:22:45,317+0200 INFO (jsonrpc/7) [api.host] START getAllVmStats() from=::ffff:127.0.0.1,41540 (api:48) 2020-08-06 07:22:45,317+0200 INFO (jsonrpc/7) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::ffff:127.0.0.1,41540 (api:54) 2020-08-06 07:22:45,318+0200 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312) 2020-08-06 07:22:46,642+0200 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=ad72b73f-d970-4367-a037-94bef589da57 (api:48) 2020-08-06 07:22:46,642+0200 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=ad72b73f-d970-4367-a037-94bef589da57 (api:54) 2020-08-06 07:22:46,642+0200 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723) 2020-08-06 07:22:51,648+0200 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=f14156c0-a819-4fec-aef7-f5f5b1dd0144 (api:48) 2020-08-06 07:22:51,648+0200 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=f14156c0-a819-4fec-aef7-f5f5b1dd0144 (api:54) 2020-08-06 07:22:51,649+0200 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723) 2020-08-06 07:22:56,409+0200 INFO (periodic/1) [vdsm.api] START repoStats(domains=()) from=internal, task_id=75381d37-a06d-4174-b956-f5e7f108ea39 (api:48) 2020-08-06 07:22:56,409+0200 INFO (periodic/1) [vdsm.api] FINISH repoStats return={} from=internal, task_id=75381d37-a06d-4174-b956-f5e7f108ea39 (api:54) 2020-08-06 07:22:56,654+0200 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=8f98ae7e-a681-4af3-a4d8-12f0501c8f68 (api:48) 2020-08-06 07:22:56,654+0200 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=8f98ae7e-a681-4af3-a4d8-12f0501c8f68 (api:54) 2020-08-06 07:22:56,655+0200 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723) 2020-08-06 07:23:00,337+0200 INFO (jsonrpc/0) [api.host] START getAllVmStats() from=::ffff:127.0.0.1,41540 (api:48) 2020-08-06 07:23:00,337+0200 INFO (jsonrpc/0) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::ffff:127.0.0.1,41540 (api:54) 2020-08-06 07:23:00,337+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312) 2020-08-06 07:23:01,660+0200 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=f984accc-8e20-48b6-ba24-e00418d28043 (api:48) 2020-08-06 07:23:01,660+0200 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=f984accc-8e20-48b6-ba24-e00418d28043 (api:54) 2020-08-06 07:23:01,660+0200 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723) I see "[vds] recovery: waiting for storage pool to go up (clientIF:723)" alot. Regards Nardus On Thu, 6 Aug 2020 at 07:09, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Can you fheck for errors on the affected host. Most probably you need the vdsm logs.
Best Regards, Strahil Nikolov
На 6 август 2020 г. 7:40:23 GMT+03:00, Nardus Geldenhuys < nardusg@gmail.com> написа:
Hi Strahil
Hope you are well. I get the following error when I tried to confirm reboot:
Error while executing action: Cannot confirm 'Host has been rebooted' Host. Valid Host statuses are "Non operational", "Maintenance" or "Connecting".
And I can't put it in maintenance, only option is "restart" or "stop".
Regards
Nar
On Thu, 6 Aug 2020 at 06:16, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
After rebooting the node, have you "marked" it that it was rebooted ?
Best Regards, Strahil Nikolov
На 5 август 2020 г. 21:29:04 GMT+03:00, Nardus Geldenhuys < nardusg@gmail.com> написа:
Hi oVirt land
Hope you are well. Got a bit of an issue, actually a big issue. We had some sort of dip of some sort. All the VM's is still running, but some of the hosts is show "Unassigned" or "NonResponsive". So all the hosts was showing UP and was fine before our dip. So I did increase vdsHeartbeatInSecond to 240, no luck.
I still get a timeout on the engine lock even thou I can connect to that host from the engine using nc to test to port 54321. I also did restart vdsmd and also rebooted the host with no luck.
nc -v someserver 54321 Ncat: Version 7.50 ( https://nmap.org/ncat ) Ncat: Connected to 172.40.2.172:54321.
2020-08-05 20:20:34,256+02 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-70) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM someserver command Get Host Capabilities failed: Message timeout which can be caused by communication issues
Any troubleshoot ideas will be gladly appreciated.
Regards
Nar