
On Mon, Aug 26, 2019 at 6:13 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Aug 26, 2019 at 12:44 PM Ales Musil <amusil@redhat.com> wrote:
On Mon, Aug 26, 2019 at 12:30 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Aug 26, 2019 at 11:58 AM Ales Musil <amusil@redhat.com> wrote:
I can see that MOM is failing to start because one of the MOM dependencies is not starting. Can you please post the output of 'systemctl status momd'?
● momd.service - Memory Overcommitment Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor preset: disabled)
   Active: inactive (dead)
Perhaps any other daemon's status to check? Or any momd-related log file that should have been generated?
BTW: on a running oVirt 4.3.5 node in another environment I see that the status of momd is the same: inactive (dead).
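A sketch of what I could check further, assuming MOM on these hosts actually runs as the VDSM-driven mom-vdsm.service instance (which would explain why momd itself is inactive even on a healthy node; the unit name is an assumption based on other oVirt hosts I've seen):

# check the MOM instance that VDSM drives, plus recent journal entries for both units
systemctl status mom-vdsm.service -l
journalctl -u mom-vdsm -u momd --since "today" --no-pager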
What happens if you try to start momd?
[root@ovirt01 ~]# systemctl status momd
● momd.service - Memory Overcommitment Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor preset: disabled)
   Active: inactive (dead)
[root@ovirt01 ~]# systemctl start momd
[root@ovirt01 ~]#
[root@ovirt01 ~]# systemctl status momd -l
● momd.service - Memory Overcommitment Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor preset: disabled)
   Active: inactive (dead) since Mon 2019-08-26 18:10:20 CEST; 6s ago
  Process: 18417 ExecStart=/usr/sbin/momd -c /etc/momd.conf -d --pid-file /var/run/momd.pid (code=exited, status=0/SUCCESS)
 Main PID: 18419 (code=exited, status=0/SUCCESS)
Aug 26 18:10:20 ovirt01.mydomain systemd[1]: Starting Memory Overcommitment Manager Daemon...
Aug 26 18:10:20 ovirt01.mydomain systemd[1]: momd.service: Supervising process 18419 which is not our child. We'll most likely not notice when it exits.
Aug 26 18:10:20 ovirt01.mydomain systemd[1]: Started Memory Overcommitment Manager Daemon.
Aug 26 18:10:20 ovirt01.mydomain python[18419]: No worthy mechs found
[root@ovirt01 ~]#
[root@ovirt01 ~]# ps -fp 18419
UID        PID  PPID  C STIME TTY          TIME CMD
[root@ovirt01 ~]#
[root@ovirt01 vdsm]# ps -fp 18417
UID        PID  PPID  C STIME TTY          TIME CMD
[root@ovirt01 vdsm]#
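The "No worthy mechs found" line is usually cyrus-sasl complaining that it could not find any usable authentication mechanism plugin. A minimal check I could run (the plugin directory below is the usual one on EL7 x86_64 and may differ elsewhere):

# list installed cyrus-sasl mechanism packages and the plugins actually on disk
rpm -qa | grep -i cyrus-sasl
ls /usr/lib64/sasl2/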
No log file updates under /var/log/vdsm:
[root@ovirt01 vdsm]# ls -lt | head -5
total 118972
-rw-r--r--. 1 root root 3406465 Aug 23 00:25 supervdsm.log
-rw-r--r--. 1 root root   73621 Aug 23 00:25 upgrade.log
-rw-r--r--. 1 vdsm kvm        0 Aug 23 00:01 vdsm.log
-rw-r--r--. 1 vdsm kvm   538480 Aug 22 23:46 vdsm.log.1.xz
[root@ovirt01 vdsm]#
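Since the files under /var/log/vdsm stopped updating on Aug 23, the journal is probably the only place where recent vdsmd/supervdsmd/momd messages end up. A sketch of how I could pull them (unit names assumed as above):

# recent journal entries for the whole vdsm/mom stack since the logs went quiet
journalctl -u vdsmd -u supervdsmd -u momd --since "2019-08-23" --no-pager | tail -n 200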
Gianluca
It seems that the steps below solved the problem (I don't know what it actually was, though...). Based on the same "No worthy mechs found" error, I found inspiration here: https://lists.ovirt.org/pipermail/users/2017-January/079009.html

[root@ovirt01 ~]# vdsm-tool configure

Checking configuration status...

abrt is not configured for vdsm
Managed volume database is already configured
lvm is configured for vdsm
libvirt is already configured for vdsm
SUCCESS: ssl configured to true. No conflicts
Manual override for multipath.conf detected - preserving current configuration
This manual override for multipath.conf was based on downrevved template. You are strongly advised to contact your support representatives

Running configure...
Reconfiguration of abrt is done.

Done configuring modules to VDSM.
[root@ovirt01 ~]#

[root@ovirt01 ~]# systemctl restart vdsmd
[root@ovirt01 ~]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/etc/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-08-26 18:23:29 CEST; 19s ago
  Process: 27326 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 27329 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 27401 (vdsmd)
    Tasks: 75
   CGroup: /system.slice/vdsmd.service
           ├─27401 /usr/bin/python2 /usr/share/vdsm/vdsmd
           ├─27524 /usr/libexec/ioprocess --read-pipe-fd 49 --write-pipe-fd 47 --max-threads 10 --max-queued-requests 10
           ├─27531 /usr/libexec/ioprocess --read-pipe-fd 55 --write-pipe-fd 54 --max-threads 10 --max-queued-requests 10
           ├─27544 /usr/libexec/ioprocess --read-pipe-fd 60 --write-pipe-fd 59 --max-threads 10 --max-queued-requests 10
           ├─27553 /usr/libexec/ioprocess --read-pipe-fd 67 --write-pipe-fd 66 --max-threads 10 --max-queued-requests 10
           ├─27559 /usr/libexec/ioprocess --read-pipe-fd 72 --write-pipe-fd 71 --max-threads 10 --max-queued-requests 10
           └─27566 /usr/libexec/ioprocess --read-pipe-fd 78 --write-pipe-fd 77 --max-threads 10 --max-queued-requests 10

Aug 26 18:23:29 ovirt01.mydomain vdsmd_init_common.sh[27329]: vdsm: Running dummybr
Aug 26 18:23:29 ovirt01.mydomain vdsmd_init_common.sh[27329]: vdsm: Running tune_system
Aug 26 18:23:29 ovirt01.mydomain vdsmd_init_common.sh[27329]: vdsm: Running test_space
Aug 26 18:23:29 ovirt01.mydomain vdsmd_init_common.sh[27329]: vdsm: Running test_lo
Aug 26 18:23:29 ovirt01.mydomain systemd[1]: Started Virtual Desktop Server Manager.
Aug 26 18:23:30 ovirt01.mydomain vdsm[27401]: WARN unhandled write event
Aug 26 18:23:30 ovirt01.mydomain vdsm[27401]: WARN MOM not available.
Aug 26 18:23:30 ovirt01.mydomain vdsm[27401]: WARN MOM not available, KSM stats will be missing.
Aug 26 18:23:31 ovirt01.mydomain vdsm[27401]: WARN Not ready yet, ignoring event '|virt|VM_status|4dae6016-ff01-4a...r shu
Aug 26 18:23:45 ovirt01.mydomain vdsm[27401]: WARN Worker blocked: <Worker name=periodic/1 running <Task <Operatio...back: File: "/usr/lib64/python2.7/threading.py", line 785, i
Hint: Some lines were ellipsized, use -l to show in full.
[root@ovirt01 ~]#

Previously I had restarted vdsmd many times without effect... After a while (about 2 minutes):

[root@ovirt01 ~]# hosted-engine --vm-status

!! Cluster is in GLOBAL MAINTENANCE mode !!
--== Host ovirt01.mydomain (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt01.mydomain
Host ID                            : 1
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "Down"}
Score                              : 3000
stopped                            : False
Local maintenance                  : False
crc32                              : a68d97bb
local_conf_timestamp               : 324335
Host timestamp                     : 324335
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=324335 (Mon Aug 26 18:30:39 2019)
        host-id=1
        score=3000
        vm_conf_refresh_time=324335 (Mon Aug 26 18:30:39 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False

!! Cluster is in GLOBAL MAINTENANCE mode !!
[root@ovirt01 ~]#

That was the state when I updated the only present node. Then I exited global maintenance:

[root@ovirt01 ~]# hosted-engine --set-maintenance --mode=none
[root@ovirt01 ~]#
[root@ovirt01 ~]# hosted-engine --vm-status

--== Host ovirt01.mydomain (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt01.mydomain
Host ID                            : 1
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "Down"}
Score                              : 3000
stopped                            : False
Local maintenance                  : False
crc32                              : 7b58fabd
local_conf_timestamp               : 324386
Host timestamp                     : 324386
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=324386 (Mon Aug 26 18:31:29 2019)
        host-id=1
        score=3000
        vm_conf_refresh_time=324386 (Mon Aug 26 18:31:30 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineStarting
        stopped=False
[root@ovirt01 ~]#

[root@ovirt01 ~]# hosted-engine --vm-status

--== Host ovirt01.mydomain (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt01.mydomain
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "Up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 5e824330
local_conf_timestamp               : 324468
Host timestamp                     : 324468
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=324468 (Mon Aug 26 18:32:51 2019)
        host-id=1
        score=3400
        vm_conf_refresh_time=324468 (Mon Aug 26 18:32:51 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False
[root@ovirt01 ~]#

And I'm able to connect to my engine web admin GUI. After a couple more minutes, 5 or so, the data domain comes up and I'm able to power on the other VMs.

[root@ovirt01 vdsm]# ls -lt | head -5
total 123972
-rw-r--r--. 1 vdsm kvm   201533 Aug 26 18:38 mom.log
-rw-r--r--. 1 vdsm kvm  2075421 Aug 26 18:38 vdsm.log
-rw-r--r--. 1 root root 3923102 Aug 26 18:38 supervdsm.log
-rw-r--r--. 1 root root   73621 Aug 23 00:25 upgrade.log
[root@ovirt01 vdsm]#

Let me know if you want any file to read and think about the reason... Thanks for the moment.

Gianluca
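For the archives, a short recap of the sequence that recovered this host. All commands are the ones shown above; the waiting times are just what I observed in this single-node environment:

# recap of the recovery sequence (commands as run above)
vdsm-tool configure                            # reconfigured abrt; everything else was already configured
systemctl restart vdsmd                        # this time MOM came up along with vdsm
hosted-engine --vm-status                      # wait a couple of minutes until the agent reports a score
hosted-engine --set-maintenance --mode=none    # leave global maintenance
hosted-engine --vm-status                      # state goes GlobalMaintenance -> EngineStarting -> EngineUp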