Can't bring a host upgraded to 4.3 back into the cluster

Hello,

May I ask you for some advice? I'm running a small oVirt cluster. A couple of months ago I decided to upgrade from oVirt 4.2.8 to 4.3 and have been having issues ever since. I can only guess what I did wrong - possibly that I never switched the cluster from iptables to firewalld, but that is just a guess.

The problem is that I upgraded the engine and one host, but after upgrading the second host I can't bring it back to the active state. It looks like VDSM can't detect the network and fails to start. I even tried reinstalling the host from the UI (I could see the packages being installed), but again VDSM doesn't start up at the end and the reinstallation fails.

Looking at the host's process list, I see the script *wait_for_ipv4s* hanging forever:

vdsm  8603     1  6 16:26 ?  00:00:00 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
root  8630     1  0 16:26 ?  00:00:00 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
root  8645  8630  6 16:26 ?  00:00:00 /usr/bin/python2 /usr/libexec/vdsm/wait_for_ipv4s
root  8688     1 30 16:27 ?  00:00:00 /usr/bin/python2 /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock
vdsm  8715     1  0 16:27 ?  00:00:00 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker

All hosts in the cluster are reachable from each other... could that be the issue?

Thank you in advance!

--
Regards,
Artem
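wait_for_ipv4s appears to simply poll until every NIC configured for IPv4 actually holds an address, so comparing what the host currently has against what its ifcfg files expect should show which interface it is stuck on. Assuming the default oVirt management bridge name ovirtmgmt (adjust to your setup), something like:

  # interfaces that currently hold an IPv4 address
  ip -4 -o addr show
  # the management bridge specifically
  ip -4 addr show ovirtmgmt
  # what the ifcfg files say should come up with IPv4
  grep -H 'BOOTPROTO\|IPADDR' /etc/sysconfig/network-scripts/ifcfg-*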

Hi Artem,

According to the oVirt documentation [1], hosts in the same cluster should be reachable from one another.

Can you please share your vdsm log? I assume you can still ssh into that inactive host (correct me if I'm wrong). While getting the vdsm log, maybe try to restart the network and vdsmd services on the host.

Another thing you can try from the UI is putting the host into maintenance and then activating it.

[1] https://www.ovirt.org/documentation/admin-guide/chap-Clusters.html#introduct...

Regards,
Shani Leviim
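For the restart and the log, something along these lines on the host should cover it (paths are the EL7/oVirt defaults; adjust if your layout differs):

  # restart networking and VDSM
  systemctl restart network
  systemctl restart vdsmd
  # check their state and grab the log to attach
  systemctl status vdsmd
  tail -n 200 /var/log/vdsm/vdsm.log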

Hi Shani,

Yes, you are right - I can ssh from any host to any other host in the cluster. vdsm.log is attached.

I have tried restarting vdsm manually and have even rebooted the host several times, with no success. Host activation fails every time.

Thank you in advance for your help!

Regards,
Artem

+Dan Kenigsberg <danken@redhat.com>

Hi Artem,

Thanks for the log. It seems that this error message appears quite a lot:

2019-06-11 12:10:35,283+0300 ERROR (MainThread) [root] Panic: Connect to supervdsm service failed: [Errno 2] No such file or directory (panic:29)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 86, in _connect
    self._manager.connect, Exception, timeout=60, tries=3)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 58, in retry
    return func()
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 500, in connect
    conn = Client(self._address, authkey=self._authkey)
  File "/usr/lib64/python2.7/multiprocessing/connection.py", line 173, in Client
    c = SocketClient(address)
  File "/usr/lib64/python2.7/multiprocessing/connection.py", line 308, in SocketClient
    s.connect(address)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory

Can you please verify that the supervdsmd service is running?

Regards,
Shani Leviim
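For instance, something like this on the failing host should show whether supervdsmd is up and whether the socket vdsm is failing to connect to (per the --sockfile argument shown earlier, /var/run/vdsm/svdsm.sock) actually exists:

  systemctl status supervdsmd
  journalctl -u supervdsmd --since today
  ls -l /var/run/vdsm/svdsm.sock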

Shani,

supervdsmd is failing too:

[root@ovirt1 vdsm]# systemctl status supervdsmd
● supervdsmd.service - Auxiliary vdsm service for running helper functions as root
   Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static; vendor preset: enabled)
   Active: failed (Result: start-limit) since Tue 2019-06-11 16:18:16 MSK; 5s ago
  Process: 176025 ExecStart=/usr/share/vdsm/daemonAdapter /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock (code=exited, status=1/FAILURE)
 Main PID: 176025 (code=exited, status=1/FAILURE)

Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Unit supervdsmd.service entered failed state.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: supervdsmd.service failed.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: supervdsmd.service holdoff time over, scheduling restart.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Stopped Auxiliary vdsm service for running helper functions as root.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: start request repeated too quickly for supervdsmd.service
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Failed to start Auxiliary vdsm service for running helper functions as root.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Unit supervdsmd.service entered failed state.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: supervdsmd.service failed.

supervdsm.log is full of messages like:

logfile::DEBUG::2019-06-11 16:18:46,379::concurrent::193::root::(run) START thread <Thread(logfile, started daemon 139896277387008)> (func=<bound method ThreadedHandler._run of <vdsm.common.logutils.ThreadedHandler object at 0x7f3c243aa3d0>>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:04,401::concurrent::193::root::(run) START thread <Thread(logfile, started daemon 140683302770432)> (func=<bound method ThreadedHandler._run of <vdsm.common.logutils.ThreadedHandler object at 0x7ff36298f3d0>>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:06,289::concurrent::193::root::(run) START thread <Thread(logfile, started daemon 140242149250816)> (func=<bound method ThreadedHandler._run of <vdsm.common.logutils.ThreadedHandler object at 0x7f8cabcc73d0>>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:17,535::concurrent::193::root::(run) START thread <Thread(logfile, started daemon 139830764660480)> (func=<bound method ThreadedHandler._run of <vdsm.common.logutils.ThreadedHandler object at 0x7f2ce35dc3d0>>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:21,528::concurrent::193::root::(run) START thread <Thread(logfile, started daemon 140355511899904)> (func=<bound method ThreadedHandler._run of <vdsm.common.logutils.ThreadedHandler object at 0x7fa710bd33d0>>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:24,541::concurrent::193::root::(run) START thread <Thread(logfile, started daemon 139879186089728)> (func=<bound method ThreadedHandler._run of <vdsm.common.logutils.ThreadedHandler object at 0x7f38298223d0>>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:42,543::concurrent::193::root::(run) START thread <Thread(logfile, started daemon 140400833873664)> (func=<bound method ThreadedHandler._run of <vdsm.common.logutils.ThreadedHandler object at 0x7fb19e23a3d0>>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:57,442::concurrent::193::root::(run) START thread <Thread(logfile, started daemon 140366930253568)> (func=<bound method ThreadedHandler._run of <vdsm.common.logutils.ThreadedHandler object at 0x7fa9b95373d0>>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:20:18,539::concurrent::193::root::(run) START thread <Thread(logfile, started daemon 139647603230464)> (func=<bound method ThreadedHandler._run of <vdsm.common.logutils.ThreadedHandler object at 0x7f023e1823d0>>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:20:32,041::concurrent::193::root::(run) START thread <Thread(logfile, started daemon 140076440143616)> (func=<bound method ThreadedHandler._run of <vdsm.common.logutils.ThreadedHandler object at 0x7f6616c423d0>>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:20:41,051::concurrent::193::root::(run) START thread <Thread(logfile, started daemon 140054782310144)> (func=<bound method ThreadedHandler._run of <vdsm.common.logutils.ThreadedHandler object at 0x7f610bdbd3d0>>, args=(), kwargs={})

Regards,
Artem
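The start-limit state only means systemd gave up after several rapid respawns; the underlying error is usually a bit earlier in the journal or in supervdsm.log. Something like this should surface it and let the service try again (paths are the defaults):

  # clear the rate-limit counter and retry
  systemctl reset-failed supervdsmd
  systemctl start supervdsmd
  # look for the real failure reason
  journalctl -u supervdsmd -n 100 --no-pager
  tail -n 100 /var/log/vdsm/supervdsm.log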

> probably one of the problems that I haven't switched the cluster from iptables to firewalld. But this is just my guess.

When switching from 4.2.8 to 4.3.3 I also did not change one host from iptables to firewalld. I was still able to change it later, even though the documentation said somewhere that iptables support was to be removed in 4.3.
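To confirm which one a given host is actually running, something like:

  systemctl is-active firewalld
  systemctl is-active iptables

The cluster-wide setting itself (Firewall Type) should be editable in the engine UI under Edit Cluster.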