Problem updating single-node environment from 4.3.3 to 4.3.5

Hello, after updating the hosted engine from 4.3.3 to 4.3.5, and then the only host composing the environment (plain CentOS 7.6), it seems the host is not able to start the vdsm daemons.
The kernel installed with the update is kernel-3.10.0-957.27.2.el7.x86_64. The same problem occurs also when using the previously running kernel, 3.10.0-957.12.2.el7.x86_64.
[root@ovirt01 vdsm]# uptime
 00:50:08 up 25 min,  3 users,  load average: 0.60, 0.67, 0.60
[root@ovirt01 vdsm]#
[root@ovirt01 vdsm]# systemctl status vdsmd -l
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/etc/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: failed (Result: start-limit) since Fri 2019-08-23 00:37:27 CEST; 7s ago
  Process: 25810 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=1/FAILURE)

Aug 23 00:37:27 ovirt01.mydomain systemd[1]: Failed to start Virtual Desktop Server Manager.
Aug 23 00:37:27 ovirt01.mydomain systemd[1]: Unit vdsmd.service entered failed state.
Aug 23 00:37:27 ovirt01.mydomain systemd[1]: vdsmd.service failed.
Aug 23 00:37:27 ovirt01.mydomain systemd[1]: vdsmd.service holdoff time over, scheduling restart.
Aug 23 00:37:27 ovirt01.mydomain systemd[1]: Stopped Virtual Desktop Server Manager.
Aug 23 00:37:27 ovirt01.mydomain systemd[1]: start request repeated too quickly for vdsmd.service
Aug 23 00:37:27 ovirt01.mydomain systemd[1]: Failed to start Virtual Desktop Server Manager.
Aug 23 00:37:27 ovirt01.mydomain systemd[1]: Unit vdsmd.service entered failed state.
Aug 23 00:37:27 ovirt01.mydomain systemd[1]: vdsmd.service failed.
[root@ovirt01 vdsm]#
[root@ovirt01 vdsm]# pwd
/var/log/vdsm
[root@ovirt01 vdsm]# ll -t | head
total 118972
-rw-r--r--. 1 root root 3406465 Aug 23 00:25 supervdsm.log
-rw-r--r--. 1 root root   73621 Aug 23 00:25 upgrade.log
-rw-r--r--. 1 vdsm kvm        0 Aug 23 00:01 vdsm.log
-rw-r--r--. 1 vdsm kvm   538480 Aug 22 23:46 vdsm.log.1.xz
-rw-r--r--. 1 vdsm kvm   187486 Aug 22 23:46 mom.log
-rw-r--r--. 1 vdsm kvm   621320 Aug 22 22:01 vdsm.log.2.xz
-rw-r--r--. 1 root root  374464 Aug 22 22:00 supervdsm.log.1.xz
-rw-r--r--. 1 vdsm kvm  2097122 Aug 22 21:53 mom.log.1
-rw-r--r--. 1 vdsm kvm   636212 Aug 22 20:01 vdsm.log.3.xz
[root@ovirt01 vdsm]#
link to upgrade.log contents here:
https://drive.google.com/file/d/17jtX36oH1hlbNUAiVhdBkVDbd28QegXG/view?usp=s...
link to supervdsm.log (in gzip format) here:
https://drive.google.com/file/d/1l61ePU-eFS_xVHEAHnJthzTTnTyzu0MP/view?usp=s...
It seems that since the update I get these kinds of lines inside it:

restore-net::DEBUG::2019-08-22 23:56:38,591::cmdutils::133::root::(exec_cmd) /sbin/tc filter del dev eth0 pref 5000 (cwd None)
restore-net::DEBUG::2019-08-22 23:56:38,595::cmdutils::141::root::(exec_cmd) FAILED: <err> = 'RTNETLINK answers: Invalid argument\nWe have an error talking to the kernel\n'; <rc> = 2
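For reference, the tc state that the failing delete refers to could be inspected with something like the following (just a generic diagnostic sketch using standard iproute2 commands, output not shown here):

  tc qdisc show dev eth0    # queueing disciplines currently attached to eth0
  tc filter show dev eth0   # filters, if any, matching the 'tc filter del ... pref 5000' above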
[root@ovirt01 vdsm]# systemctl status supervdsmd -l
● supervdsmd.service - Auxiliary vdsm service for running helper functions as root
   Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static; vendor preset: enabled)
   Active: active (running) since Fri 2019-08-23 00:25:17 CEST; 23min ago
 Main PID: 4540 (supervdsmd)
    Tasks: 3
   CGroup: /system.slice/supervdsmd.service
           └─4540 /usr/bin/python2 /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock

Aug 23 00:25:17 ovirt01.mydomain systemd[1]: Started Auxiliary vdsm service for running helper functions as root.
[root@ovirt01 vdsm]#
[root@ovirt01 vdsm]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP group default qlen 1000
    link/ether b8:ae:ed:7f:17:11 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:c2:c6:a4:18:c5 brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 36:21:c1:5e:70:aa brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 46:d8:db:81:41:4e brd ff:ff:ff:ff:ff:ff
22: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b8:ae:ed:7f:17:11 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.211/24 brd 192.168.1.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
[root@ovirt01 vdsm]#
[root@ovirt01 vdsm]# ip route show
default via 192.168.1.1 dev ovirtmgmt
192.168.1.0/24 dev ovirtmgmt proto kernel scope link src 192.168.1.211
[root@ovirt01 vdsm]#
[root@ovirt01 vdsm]# brctl show
bridge name     bridge id               STP enabled     interfaces
ovirtmgmt       8000.b8aeed7f1711       no              eth0
[root@ovirt01 vdsm]#
[root@ovirt01 vdsm]# systemctl status openvswitch
● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; enabled; vendor preset: disabled)
   Active: active (exited) since Fri 2019-08-23 00:25:09 CEST; 26min ago
  Process: 3894 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 3894 (code=exited, status=0/SUCCESS)
    Tasks: 0
   CGroup: /system.slice/openvswitch.service

Aug 23 00:25:09 ovirt01.mydomain systemd[1]: Starting Open vSwitch...
Aug 23 00:25:09 ovirt01.mydomain systemd[1]: Started Open vSwitch.
[root@ovirt01 vdsm]# ovs-vsctl show
02539902-1788-4796-9cdf-cf11ce8436bb
    Bridge br-int
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.11.0"
[root@ovirt01 vdsm]#
Any hints? Thanks,
Gianluca

On Fri, Aug 23, 2019 at 2:25 PM Sandro Bonazzola <sbonazzo@redhat.com> wrote:
Relevant error in the logs seems to be:
MainThread::DEBUG::2016-04-30 19:45:56,428::unified_persistence::46::root::(run) upgrade-unified-persistence upgrade persisting networks {} and bondings {}
MainThread::INFO::2016-04-30 19:45:56,428::netconfpersistence::187::root::(_clearDisk) Clearing /var/run/vdsm/netconf/nets/ and /var/run/vdsm/netconf/bonds/
MainThread::DEBUG::2016-04-30 19:45:56,428::netconfpersistence::195::root::(_clearDisk) No existent config to clear.
MainThread::INFO::2016-04-30 19:45:56,428::netconfpersistence::187::root::(_clearDisk) Clearing /var/run/vdsm/netconf/nets/ and /var/run/vdsm/netconf/bonds/
MainThread::DEBUG::2016-04-30 19:45:56,428::netconfpersistence::195::root::(_clearDisk) No existent config to clear.
MainThread::INFO::2016-04-30 19:45:56,428::netconfpersistence::131::root::(save) Saved new config RunningConfig({}, {}) to /var/run/vdsm/netconf/nets/ and /var/run/vdsm/netconf/bonds/
MainThread::DEBUG::2016-04-30 19:45:56,428::utils::671::root::(execCmd) /usr/bin/taskset --cpu-list 0-3 /usr/share/vdsm/vdsm-store-net-config unified (cwd None)
MainThread::DEBUG::2016-04-30 19:45:56,440::utils::689::root::(execCmd) SUCCESS: <err> = 'cp: cannot stat \xe2\x80\x98/var/run/vdsm/netconf\xe2\x80\x99: No such file or directory\n'; <rc> = 0
MainThread::DEBUG::2016-04-30 19:45:56,441::upgrade::51::upgrade::(_upgrade_seal) Upgrade upgrade-unified-persistence successfully performed
MainThread::DEBUG::2017-12-31 16:44:52,918::libvirtconnection::163::root::(get) trying to connect libvirt
MainThread::INFO::2017-12-31 16:44:53,033::netconfpersistence::194::root::(_clearDisk) Clearing /var/lib/vdsm/persistence/netconf/nets/ and /var/lib/vdsm/persistence/netconf/bonds/
MainThread::WARNING::2017-12-31 16:44:53,034::fileutils::96::root::(rm_tree) Directory: /var/lib/vdsm/persistence/netconf/bonds/ already removed
MainThread::INFO::2017-12-31 16:44:53,034::netconfpersistence::139::root::(save) Saved new config PersistentConfig({'ovirtmgmt': {'ipv6autoconf': False, 'nameservers': ['192.168.1.1', '8.8.8.8'], u'nic': u'eth0', 'dhcpv6': False, u'ipaddr': u'192.168.1.211', 'switch': 'legacy', 'mtu': 1500, u'netmask': u'255.255.255.0', u'bootproto': u'static', 'stp': False, 'bridged': True, u'gateway': u'192.168.1.1', u'defaultRoute': True}}, {}) to /var/lib/vdsm/persistence/netconf/nets/ and /var/lib/vdsm/persistence/netconf/bonds/
MainThread::DEBUG::2017-12-31 16:44:53,035::cmdutils::150::root::(exec_cmd) /usr/share/openvswitch/scripts/ovs-ctl status (cwd None)
MainThread::DEBUG::2017-12-31 16:44:53,069::cmdutils::158::root::(exec_cmd) FAILED: <err> = ''; <rc> = 1
MainThread::DEBUG::2018-02-16 23:59:17,968::libvirtconnection::167::root::(get) trying to connect libvirt
MainThread::INFO::2018-02-16 23:59:18,500::netconfpersistence::198::root::(_clearDisk) Clearing netconf: /var/lib/vdsm/staging/netconf
MainThread::ERROR::2018-02-16 23:59:18,501::fileutils::53::root::(rm_file) Removing file: /var/lib/vdsm/staging/netconf failed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/fileutils.py", line 48, in rm_file
    os.unlink(file_to_remove)
OSError: [Errno 21] Is a directory: '/var/lib/vdsm/staging/netconf'
+Dominik Holler <dholler@redhat.com> can you please have a look?
Just for reference, today I also updated to 4.3.5, but from 4.3.4 (not 4.3.3), an environment composed of three plain CentOS 7.6 servers (using iSCSI storage domains, not NFS as in the single-host one), and I had no problems. In their logs I don't see these strange attempts to kind of "destroy" and "regenerate" the network config... The single server is actually a NUC, while the 3 servers are Dell M610 blades. Both the single server and the three servers are without NetworkManager. The three-server environment has an external engine, though, while the single-server one is hosted-engine based; I don't know if this can make any difference.
Gianluca

Gianluca, can you please share the output of 'rpm -qa' of the affected host?

On Fri, Aug 23, 2019 at 5:06 PM Dominik Holler <dholler@redhat.com> wrote:
Gianluca, can you please share the output of 'rpm -qa' of the affected host?
Here is the output of "rpm -qa | sort":
https://drive.google.com/file/d/1JG8XfomPSgqp4Y40KOwTGsixnkqkMfml/view?usp=s...
and here are the contents of yum.log-20190823, which contains logs since March of this year, if it can help:
https://drive.google.com/file/d/1zKXbY2ySLPM4TSyzzZ1_AvUrA8sCMYOm/view?usp=s...
Thanks,
Gianluca

On Fri, Aug 23, 2019 at 18:00 Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Aug 23, 2019 at 5:06 PM Dominik Holler <dholler@redhat.com> wrote:
Gianluca, can you please share the output of 'rpm -qa' of the affected host?
here it is output of "rpm -qa | sort"
https://drive.google.com/file/d/1JG8XfomPSgqp4Y40KOwTGsixnkqkMfml/view?usp=s...
Anything useful from list of pages for me to try? Thanks Gianluca

On Sun, Aug 25, 2019 at 4:33 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Aug 23, 2019 at 18:00 Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Aug 23, 2019 at 5:06 PM Dominik Holler <dholler@redhat.com> wrote:
Gianluca, can you please share the output of 'rpm -qa' of the affected host?
here it is output of "rpm -qa | sort"
https://drive.google.com/file/d/1JG8XfomPSgqp4Y40KOwTGsixnkqkMfml/view?usp=s...
Anything useful from list of pages for me to try?
I was not able to understand why the services did not start as expected. Can you please share the relevant information from the journal?
Thanks Gianluca

On Mon, Aug 26, 2019 at 9:57 AM Dominik Holler <dholler@redhat.com> wrote:
On Sun, Aug 25, 2019 at 4:33 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Aug 23, 2019 at 18:00 Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Aug 23, 2019 at 5:06 PM Dominik Holler <dholler@redhat.com> wrote:
Gianluca, can you please share the output of 'rpm -qa' of the affected host?
here it is output of "rpm -qa | sort"
https://drive.google.com/file/d/1JG8XfomPSgqp4Y40KOwTGsixnkqkMfml/view?usp=s...
Anything useful from list of pages for me to try?
I was not able to understand why the services did not start as expected. Can you please share the relevant information from the journal?
Are you interested only in the journal entries from the last boot, correct? Because I presume I have not set the journal as persistent, and it seems oVirt doesn't set it either. Any special switch to give to the journalctl command? Thanks, Gianluca
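A minimal sketch of what that could look like, assuming the standard CentOS 7 journald behaviour (the persistent-storage directory and the unit filters below are not oVirt-specific):

  mkdir -p /var/log/journal                    # with Storage=auto, journald keeps logs here across reboots once the directory exists
  systemctl restart systemd-journald           # start writing to /var/log/journal from now on
  journalctl -b -u vdsmd -u supervdsmd -u momd --no-pager   # entries for these units from the current boot only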

On Mon, Aug 26, 2019 at 10:23 AM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Aug 26, 2019 at 9:57 AM Dominik Holler <dholler@redhat.com> wrote:
On Sun, Aug 25, 2019 at 4:33 PM Gianluca Cecchi < gianluca.cecchi@gmail.com> wrote:
On Fri, Aug 23, 2019 at 18:00 Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Aug 23, 2019 at 5:06 PM Dominik Holler <dholler@redhat.com> wrote:
Gianluca, can you please share the output of 'rpm -qa' of the affected host?
here it is output of "rpm -qa | sort"
https://drive.google.com/file/d/1JG8XfomPSgqp4Y40KOwTGsixnkqkMfml/view?usp=s...
Anything useful from list of pages for me to try?
I was not able to understand why the services did not start as expected. Can you please share the relevant information from the journal?
Are you interested only to the last boot journal entries, correct? Because I presume I have not set it as persistent and it seems oVirt doesn't set it. Any special switch to give to journalctl command?
journalctl -xe could give us a hint about what is preventing vdsmd from starting.
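Another option, since the journal may not be persistent, is to run the failing pre-start hook by hand and look at the most recent start attempts (a sketch based on the ExecStartPre path shown in the vdsmd status output earlier in the thread):

  /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start ; echo "exit code: $?"
  journalctl -xe -u vdsmd --no-pager | tail -n 50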
Thanks, Gianluca
--
Ales Musil
Associate Software Engineer - RHV Network
Red Hat EMEA
amusil@redhat.com

On Mon, Aug 26, 2019 at 10:41 AM Ales Musil <amusil@redhat.com> wrote:
On Mon, Aug 26, 2019 at 10:23 AM Gianluca Cecchi < gianluca.cecchi@gmail.com> wrote:
On Mon, Aug 26, 2019 at 9:57 AM Dominik Holler <dholler@redhat.com> wrote:
On Sun, Aug 25, 2019 at 4:33 PM Gianluca Cecchi < gianluca.cecchi@gmail.com> wrote:
On Fri, Aug 23, 2019 at 18:00 Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Aug 23, 2019 at 5:06 PM Dominik Holler <dholler@redhat.com> wrote:
Gianluca, can you please share the output of 'rpm -qa' of the affected host?
here it is output of "rpm -qa | sort"
https://drive.google.com/file/d/1JG8XfomPSgqp4Y40KOwTGsixnkqkMfml/view?usp=s...
Anything useful from list of pages for me to try?
I was not able to understand why the services did not start as expected. Can you please share the relevant information from the journal?
Are you interested only to the last boot journal entries, correct? Because I presume I have not set it as persistent and it seems oVirt doesn't set it. Any special switch to give to journalctl command?
journalctl -xe could give us hint what is preventing vdsmd from starting.
here it is: https://drive.google.com/file/d/1AyBUPTVqpiSAIYBqe8B6OU8Gn9Qg9cS5/view?usp=s... thanks, Gianluca

On Mon, Aug 26, 2019 at 11:49 AM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Aug 26, 2019 at 10:41 AM Ales Musil <amusil@redhat.com> wrote:
On Mon, Aug 26, 2019 at 10:23 AM Gianluca Cecchi < gianluca.cecchi@gmail.com> wrote:
On Mon, Aug 26, 2019 at 9:57 AM Dominik Holler <dholler@redhat.com> wrote:
On Sun, Aug 25, 2019 at 4:33 PM Gianluca Cecchi < gianluca.cecchi@gmail.com> wrote:
On Fri, Aug 23, 2019 at 18:00 Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Aug 23, 2019 at 5:06 PM Dominik Holler <dholler@redhat.com> wrote:
Gianluca, can you please share the output of 'rpm -qa' of the affected host?
here it is output of "rpm -qa | sort"
https://drive.google.com/file/d/1JG8XfomPSgqp4Y40KOwTGsixnkqkMfml/view?usp=s...
Anything useful from list of pages for me to try?
I was not able to understand why the services did not start as expected. Can you please share the relevant information from the journal?
Are you interested only to the last boot journal entries, correct? Because I presume I have not set it as persistent and it seems oVirt doesn't set it. Any special switch to give to journalctl command?
journalctl -xe could give us hint what is preventing vdsmd from starting.
here it is:
https://drive.google.com/file/d/1AyBUPTVqpiSAIYBqe8B6OU8Gn9Qg9cS5/view?usp=s...
thanks, Gianluca
I can see that MOM is failing to start because one of the MOM dependencies is not starting. Can you please post the output from 'systemctl status momd'?
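As a side note, a quick way to see which dependency is the culprit could be something like the sketch below; the unit names are the usual ones on an oVirt 4.3 host (the MOM instance used by vdsm normally runs as mom-vdsm.service), so treat them as assumptions:

  systemctl list-dependencies vdsmd                  # units vdsmd pulls in, with their current state
  systemctl status -l --no-pager mom-vdsm libvirtd   # the vdsm-side MOM instance and libvirt itself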

On Mon, Aug 26, 2019 at 11:58 AM Ales Musil <amusil@redhat.com> wrote:
I can see that MOM is failing to start because some of the MOM dependencies is not starting. Can you please post output from 'systemctl status momd'?
● momd.service - Memory Overcommitment Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor preset: disabled)
   Active: inactive (dead)

Perhaps any other daemon status I should check? Or any momd-related log file generated?
BTW: I see that on a running oVirt 4.3.5 node from another environment the status of momd is the same, inactive (dead).

On Mon, Aug 26, 2019 at 12:30 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Aug 26, 2019 at 11:58 AM Ales Musil <amusil@redhat.com> wrote:
I can see that MOM is failing to start because some of the MOM dependencies is not starting. Can you please post output from 'systemctl status momd'?
● momd.service - Memory Overcommitment Manager Daemon Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor preset: disabled) Active: inactive (dead)
perhaps any other daemon status? Or any momd related log file generated?
BTW: I see on a running oVirt 4.3.5 node from another environment that the status of momd is the same inactive (dead)
What happens if you try to start momd?

On Mon, Aug 26, 2019 at 12:44 PM Ales Musil <amusil@redhat.com> wrote:
On Mon, Aug 26, 2019 at 12:30 PM Gianluca Cecchi < gianluca.cecchi@gmail.com> wrote:
On Mon, Aug 26, 2019 at 11:58 AM Ales Musil <amusil@redhat.com> wrote:
I can see that MOM is failing to start because some of the MOM dependencies is not starting. Can you please post output from 'systemctl status momd'?
● momd.service - Memory Overcommitment Manager Daemon Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor preset: disabled) Active: inactive (dead)
perhaps any other daemon status? Or any momd related log file generated?
BTW: I see on a running oVirt 4.3.5 node from another environment that the status of momd is the same inactive (dead)
What happens if you try to start the momd?
[root@ovirt01 ~]# systemctl status momd
● momd.service - Memory Overcommitment Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor preset: disabled)
   Active: inactive (dead)
[root@ovirt01 ~]# systemctl start momd
[root@ovirt01 ~]#
[root@ovirt01 ~]# systemctl status momd -l
● momd.service - Memory Overcommitment Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor preset: disabled)
   Active: inactive (dead) since Mon 2019-08-26 18:10:20 CEST; 6s ago
  Process: 18417 ExecStart=/usr/sbin/momd -c /etc/momd.conf -d --pid-file /var/run/momd.pid (code=exited, status=0/SUCCESS)
 Main PID: 18419 (code=exited, status=0/SUCCESS)

Aug 26 18:10:20 ovirt01.mydomain systemd[1]: Starting Memory Overcommitment Manager Daemon...
Aug 26 18:10:20 ovirt01.mydomain systemd[1]: momd.service: Supervising process 18419 which is not our child. We'll most likely not notice when it exits.
Aug 26 18:10:20 ovirt01.mydomain systemd[1]: Started Memory Overcommitment Manager Daemon.
Aug 26 18:10:20 ovirt01.mydomain python[18419]: No worthy mechs found
[root@ovirt01 ~]#
[root@ovirt01 ~]# ps -fp 18419
UID         PID   PPID  C STIME TTY          TIME CMD
[root@ovirt01 ~]#
[root@ovirt01 vdsm]# ps -fp 18417
UID         PID   PPID  C STIME TTY          TIME CMD
[root@ovirt01 vdsm]#

No log file update under /var/log/vdsm:

[root@ovirt01 vdsm]# ls -lt | head -5
total 118972
-rw-r--r--. 1 root root 3406465 Aug 23 00:25 supervdsm.log
-rw-r--r--. 1 root root   73621 Aug 23 00:25 upgrade.log
-rw-r--r--. 1 vdsm kvm        0 Aug 23 00:01 vdsm.log
-rw-r--r--. 1 vdsm kvm   538480 Aug 22 23:46 vdsm.log.1.xz
[root@ovirt01 vdsm]#

Gianluca
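The "No worthy mechs found" message usually comes from the Cyrus SASL library when no usable authentication mechanism plugin can be loaded for the libvirt connection. A hedged diagnostic sketch (the package names and config path are the usual EL7 ones, so treat them as assumptions):

  rpm -q cyrus-sasl cyrus-sasl-gssapi cyrus-sasl-scram   # which SASL mechanism plugins are installed
  grep -v '^#' /etc/sasl2/libvirt.conf                   # mech_list libvirtd is configured to accept
  systemctl status -l --no-pager libvirtd                # libvirtd itself has to be up before momd/vdsmd can talk to it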

On Mon, Aug 26, 2019 at 6:13 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Aug 26, 2019 at 12:44 PM Ales Musil <amusil@redhat.com> wrote:
On Mon, Aug 26, 2019 at 12:30 PM Gianluca Cecchi < gianluca.cecchi@gmail.com> wrote:
On Mon, Aug 26, 2019 at 11:58 AM Ales Musil <amusil@redhat.com> wrote:
I can see that MOM is failing to start because some of the MOM dependencies is not starting. Can you please post output from 'systemctl status momd'?
● momd.service - Memory Overcommitment Manager Daemon Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor preset: disabled) Active: inactive (dead)
perhaps any other daemon status? Or any momd related log file generated?
BTW: I see on a running oVirt 4.3.5 node from another environment that the status of momd is the same inactive (dead)
What happens if you try to start the momd?
[root@ovirt01 ~]# systemctl status momd ● momd.service - Memory Overcommitment Manager Daemon Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor preset: disabled) Active: inactive (dead) [root@ovirt01 ~]# systemctl start momd [root@ovirt01 ~]#
[root@ovirt01 ~]# systemctl status momd -l ● momd.service - Memory Overcommitment Manager Daemon Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor preset: disabled) Active: inactive (dead) since Mon 2019-08-26 18:10:20 CEST; 6s ago Process: 18417 ExecStart=/usr/sbin/momd -c /etc/momd.conf -d --pid-file /var/run/momd.pid (code=exited, status=0/SUCCESS) Main PID: 18419 (code=exited, status=0/SUCCESS)
Aug 26 18:10:20 ovirt01.mydomain systemd[1]: Starting Memory Overcommitment Manager Daemon... Aug 26 18:10:20 ovirt01.mydomain systemd[1]: momd.service: Supervising process 18419 which is not our child. We'll most likely not notice when it exits. Aug 26 18:10:20 ovirt01.mydomain systemd[1]: Started Memory Overcommitment Manager Daemon. Aug 26 18:10:20 ovirt01.mydomain python[18419]: No worthy mechs found [root@ovirt01 ~]#
[root@ovirt01 ~]# ps -fp 18419 UID PID PPID C STIME TTY TIME CMD [root@ovirt01 ~]#
[root@ovirt01 vdsm]# ps -fp 18417 UID PID PPID C STIME TTY TIME CMD [root@ovirt01 vdsm]#
No log file update under /var/log/vdsm
[root@ovirt01 vdsm]# ls -lt | head -5 total 118972 -rw-r--r--. 1 root root 3406465 Aug 23 00:25 supervdsm.log -rw-r--r--. 1 root root 73621 Aug 23 00:25 upgrade.log -rw-r--r--. 1 vdsm kvm 0 Aug 23 00:01 vdsm.log -rw-r--r--. 1 vdsm kvm 538480 Aug 22 23:46 vdsm.log.1.xz [root@ovirt01 vdsm]#
Gianluca
It seems that these steps below solved the problem (dunno what it was, though...). Based on this similar "No worthy mechs found" issue I found inspiration here:

https://lists.ovirt.org/pipermail/users/2017-January/079009.html

[root@ovirt01 ~]# vdsm-tool configure

Checking configuration status...

abrt is not configured for vdsm
Managed volume database is already configured
lvm is configured for vdsm
libvirt is already configured for vdsm
SUCCESS: ssl configured to true. No conflicts
Manual override for multipath.conf detected - preserving current configuration
This manual override for multipath.conf was based on downrevved template. You are strongly advised to contact your support representatives

Running configure...
Reconfiguration of abrt is done.

Done configuring modules to VDSM.
[root@ovirt01 ~]#
[root@ovirt01 ~]# systemctl restart vdsmd
[root@ovirt01 ~]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/etc/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-08-26 18:23:29 CEST; 19s ago
  Process: 27326 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 27329 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 27401 (vdsmd)
    Tasks: 75
   CGroup: /system.slice/vdsmd.service
           ├─27401 /usr/bin/python2 /usr/share/vdsm/vdsmd
           ├─27524 /usr/libexec/ioprocess --read-pipe-fd 49 --write-pipe-fd 47 --max-threads 10 --max-queued-requests 10
           ├─27531 /usr/libexec/ioprocess --read-pipe-fd 55 --write-pipe-fd 54 --max-threads 10 --max-queued-requests 10
           ├─27544 /usr/libexec/ioprocess --read-pipe-fd 60 --write-pipe-fd 59 --max-threads 10 --max-queued-requests 10
           ├─27553 /usr/libexec/ioprocess --read-pipe-fd 67 --write-pipe-fd 66 --max-threads 10 --max-queued-requests 10
           ├─27559 /usr/libexec/ioprocess --read-pipe-fd 72 --write-pipe-fd 71 --max-threads 10 --max-queued-requests 10
           └─27566 /usr/libexec/ioprocess --read-pipe-fd 78 --write-pipe-fd 77 --max-threads 10 --max-queued-requests 10

Aug 26 18:23:29 ovirt01.mydomain vdsmd_init_common.sh[27329]: vdsm: Running dummybr
Aug 26 18:23:29 ovirt01.mydomain vdsmd_init_common.sh[27329]: vdsm: Running tune_system
Aug 26 18:23:29 ovirt01.mydomain vdsmd_init_common.sh[27329]: vdsm: Running test_space
Aug 26 18:23:29 ovirt01.mydomain vdsmd_init_common.sh[27329]: vdsm: Running test_lo
Aug 26 18:23:29 ovirt01.mydomain systemd[1]: Started Virtual Desktop Server Manager.
Aug 26 18:23:30 ovirt01.mydomain vdsm[27401]: WARN unhandled write event
Aug 26 18:23:30 ovirt01.mydomain vdsm[27401]: WARN MOM not available.
Aug 26 18:23:30 ovirt01.mydomain vdsm[27401]: WARN MOM not available, KSM stats will be missing.
Aug 26 18:23:31 ovirt01.mydomain vdsm[27401]: WARN Not ready yet, ignoring event '|virt|VM_status|4dae6016-ff01-4a...r shu
Aug 26 18:23:45 ovirt01.mydomain vdsm[27401]: WARN Worker blocked: <Worker name=periodic/1 running <Task <Operatio...back: File: "/usr/lib64/python2.7/threading.py", line 785, i
Hint: Some lines were ellipsized, use -l to show in full.
[root@ovirt01 ~]#

Previously I had restarted vdsmd many times without effect... After a while (2 minutes):

[root@ovirt01 ~]# hosted-engine --vm-status

!! Cluster is in GLOBAL MAINTENANCE mode !!
--== Host ovirt01.mydomain (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt01.mydomain
Host ID                            : 1
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "Down"}
Score                              : 3000
stopped                            : False
Local maintenance                  : False
crc32                              : a68d97bb
local_conf_timestamp               : 324335
Host timestamp                     : 324335
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=324335 (Mon Aug 26 18:30:39 2019)
        host-id=1
        score=3000
        vm_conf_refresh_time=324335 (Mon Aug 26 18:30:39 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False

!! Cluster is in GLOBAL MAINTENANCE mode !!
[root@ovirt01 ~]#

That was the state when I updated the only present node. Exiting from global maintenance:

[root@ovirt01 ~]# hosted-engine --set-maintenance --mode=none
[root@ovirt01 ~]#
[root@ovirt01 ~]# hosted-engine --vm-status

--== Host ovirt01.mydomain (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt01.mydomain
Host ID                            : 1
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "Down"}
Score                              : 3000
stopped                            : False
Local maintenance                  : False
crc32                              : 7b58fabd
local_conf_timestamp               : 324386
Host timestamp                     : 324386
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=324386 (Mon Aug 26 18:31:29 2019)
        host-id=1
        score=3000
        vm_conf_refresh_time=324386 (Mon Aug 26 18:31:30 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineStarting
        stopped=False
[root@ovirt01 ~]#
[root@ovirt01 ~]# hosted-engine --vm-status

--== Host ovirt01.mydomain (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt01.mydomain
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "Up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 5e824330
local_conf_timestamp               : 324468
Host timestamp                     : 324468
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=324468 (Mon Aug 26 18:32:51 2019)
        host-id=1
        score=3400
        vm_conf_refresh_time=324468 (Mon Aug 26 18:32:51 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False
[root@ovirt01 ~]#

And I'm able to connect to my engine web admin GUI. After a couple more minutes, 5 or so, the data domain comes up and I'm able to power on the other VMs.

[root@ovirt01 vdsm]# ls -lt | head -5
total 123972
-rw-r--r--. 1 vdsm kvm   201533 Aug 26 18:38 mom.log
-rw-r--r--. 1 vdsm kvm  2075421 Aug 26 18:38 vdsm.log
-rw-r--r--. 1 root root 3923102 Aug 26 18:38 supervdsm.log
-rw-r--r--. 1 root root   73621 Aug 23 00:25 upgrade.log
[root@ovirt01 vdsm]#

Let me know if you want any file to read and think about the reason... Thanks for the moment.
Gianluca
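For anyone following the same path, the overall single-host flow used here boils down to something like the sketch below; the maintenance and status commands are the ones shown above, while the ordering is just a summary of what was done in this thread, not an official procedure:

  hosted-engine --set-maintenance --mode=global   # before updating engine and host
  # ... update the engine VM, then the host packages, reboot the host ...
  vdsm-tool configure                             # if vdsmd/momd refuse to start after the update
  systemctl restart vdsmd
  hosted-engine --set-maintenance --mode=none     # leave global maintenance
  hosted-engine --vm-status                       # wait for state=EngineUp and engine health "good"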

On Mon, Aug 26, 2019 at 6:40 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
It seems that these steps below solved the problem (donna what it was though..). Based on this similar (No worthy mechs found) I found inspiration from:
https://lists.ovirt.org/pipermail/users/2017-January/079009.html
[root@ovirt01 ~]# vdsm-tool configure
Checking configuration status...
abrt is not configured for vdsm Managed volume database is already configured lvm is configured for vdsm libvirt is already configured for vdsm SUCCESS: ssl configured to true. No conflicts Manual override for multipath.conf detected - preserving current configuration This manual override for multipath.conf was based on downrevved template. You are strongly advised to contact your support representatives
Running configure... Reconfiguration of abrt is done.
Done configuring modules to VDSM. [root@ovirt01 ~]#
[root@ovirt01 ~]# systemctl restart vdsmd
I also have to say, as it is in some way involved in the thread messages, that on my node, due to "historical reasons", I have:

[root@ovirt01 vdsm]# getenforce
Permissive
[root@ovirt01 vdsm]#

I don't know if it can in any way be part of the cause.
Gianluca
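For completeness, a small sketch of how to compare the runtime SELinux mode with the one configured for boot (standard CentOS 7 commands, nothing oVirt-specific):

  getenforce                             # current runtime mode
  sestatus                               # runtime mode plus the mode from the config file
  grep '^SELINUX=' /etc/selinux/config   # mode applied at next boot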

On Mon, Aug 26, 2019 at 6:42 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Aug 26, 2019 at 6:40 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
It seems that these steps below solved the problem (donna what it was though..). Based on this similar (No worthy mechs found) I found inspiration from:
https://lists.ovirt.org/pipermail/users/2017-January/079009.html
[root@ovirt01 ~]# vdsm-tool configure
Checking configuration status...
abrt is not configured for vdsm Managed volume database is already configured lvm is configured for vdsm libvirt is already configured for vdsm SUCCESS: ssl configured to true. No conflicts Manual override for multipath.conf detected - preserving current configuration This manual override for multipath.conf was based on downrevved template. You are strongly advised to contact your support representatives
Running configure... Reconfiguration of abrt is done.
Done configuring modules to VDSM. [root@ovirt01 ~]#
[root@ovirt01 ~]# systemctl restart vdsmd
BTW: what are the effects of the command "vdsm-tool configure"? Is it a command meant to be used normally from the command line? Just to know for the future. Thanks, Gianluca
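From what I can tell (treat this as a sketch of typical usage rather than an authoritative answer), vdsm-tool is the host-side admin helper shipped with vdsm, and its configure verb (re)writes the configuration of the modules it manages (libvirt, sanlock, multipath, abrt, ...). A few related invocations that normally exist on an oVirt 4.3 host:

  vdsm-tool --help                      # lists the available verbs
  vdsm-tool is-configured               # the same check vdsmd runs at start-up
  vdsm-tool configure --force           # reconfigure the modules even if they already look configured
  vdsm-tool configure --module libvirt  # limit the action to a single module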
Participants (4):
- Ales Musil
- Dominik Holler
- Gianluca Cecchi
- Sandro Bonazzola