Re: ost host addition failure

Any updates? The tests have been failing since Sunday because vdsmd won't start... The master repos haven't been refreshed for a few days due to this.
From the host deploy log [1]: basic-suite-master-engine/_var_log_ovirt-engine/host-deploy/ovirt-host-deploy-20161227012930-192.168.201.4-14af2bf0.log
The job link is [2].
[1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/lastComplet...
2016-12-27 01:29:29 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/bin/systemctl', 'start', 'vdsmd.service') stdout:
2016-12-27 01:29:29 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/bin/systemctl', 'start', 'vdsmd.service') stderr: A dependency job for vdsmd.service failed. See 'journalctl -xe' for details.
2016-12-27 01:29:29 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-QZ1ucxWFfm/pythonlib/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/tmp/ovirt-QZ1ucxWFfm/otopi-plugins/ovirt-host-deploy/vdsm/packages.py", line 209, in _start
    self.services.state('vdsmd', True)
  File "/tmp/ovirt-QZ1ucxWFfm/otopi-plugins/otopi/services/systemd.py", line 141, in state
    service=name,
RuntimeError: Failed to start service 'vdsmd'
2016-12-27 01:29:29 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Closing up': Failed to start service 'vdsmd'
2016-12-27 01:29:29 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT DUMP - BEGIN
2016-12-27 01:29:29 DEBUG otopi.context context.dumpEnvironment:770 ENV BASE/error=bool:'True'
2016-12-27 01:29:29 DEBUG otopi.context context.dumpEnvironment:770 ENV BASE/excep
[2] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/lastComplet...
On Sun, Dec 25, 2016 at 11:31 AM, Eyal Edri <eedri@redhat.com> wrote:
We should see it fixed here hopefully [1]
[1] http://jenkins.ovirt.org/view/All%20Running%20jobs/job/test-repo_ovirt_experimental_master/4412/console
On Sun, Dec 25, 2016 at 11:19 AM, Dan Kenigsberg <danken@redhat.com> wrote:
On Sun, Dec 25, 2016 at 10:28 AM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Sun, Dec 25, 2016 at 9:47 AM, Dan Kenigsberg <danken@redhat.com> wrote:
Correct. https://gerrit.ovirt.org/#/c/69052/
Can you try adding lago shell "$vm_name" -c "mkdir -p /var/log/ovirt-imageio-daemon/ && chown vdsm:kvm /var/log/ovirt-imageio-daemon/"
How will it know what is the vdsm user before installing vdsm?
You're right. A hack would have to `chmod a+rwx /var/log/ovirt-imageio-daemon/` instead.
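For anyone scripting the hack above outside of `lago shell`, the same two steps (create the log directory if it is missing, then open its permissions to everyone) can be sketched in Python. This is only a minimal sketch under assumptions, not what OST or lago actually runs: the function name `ensure_world_writable_dir` is hypothetical, and on a real host the call would target /var/log/ovirt-imageio-daemon/ and need root.

```python
import os
import stat


def ensure_world_writable_dir(path):
    """Create path like `mkdir -p`, then add rwx for all like `chmod a+rwx`.

    Returns the resulting permission bits. Hypothetical helper for
    illustration; it does not change ownership, because the vdsm user
    may not exist before vdsm is installed.
    """
    os.makedirs(path, exist_ok=True)
    current = stat.S_IMODE(os.stat(path).st_mode)
    # a+rwx only adds bits, so OR the existing mode with rwx for u, g and o.
    os.chmod(path, current | stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)
    return stat.S_IMODE(os.stat(path).st_mode)


# Example call on the host (requires root):
# ensure_world_writable_dir("/var/log/ovirt-imageio-daemon/")
```

Opening the directory to everyone sidesteps the ownership question, at the cost of looser permissions than `chown vdsm:kvm` would give.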
Why not either: 1. Fix it
Yes, that's why we've opened https://bugzilla.redhat.com/show_bug.cgi?id=1400003; a fix is now getting merged. I don't know when it is going to be ready in lago's repos.
-or- 2. Revert the offending patch?
I'm not aware of such a patch. It's a race that has been there forever, and I don't know why it suddenly pops up so often.
-- Eyal Edri Associate Manager RHV DevOps EMEA ENG Virtualization R&D Red Hat Israel
phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)

On Tue, Dec 27, 2016 at 9:56 AM, Eyal Edri <eedri@redhat.com> wrote:
Any updates? The tests have been failing since Sunday because vdsmd won't start... The master repos haven't been refreshed for a few days due to this.
From the host deploy log [1]: basic-suite-master-engine/_var_log_ovirt-engine/host-deploy/ovirt-host-deploy-20161227012930-192.168.201.4-14af2bf0.log
The job link is [2].
[1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/lastCompletedBuild/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/test_logs/basic-suite-master/post-002_bootstrap.py/lago-
Now with the full link: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/lastComplet...
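Since the host-deploy log is mostly DEBUG output, the one ERROR line is easy to miss. A minimal sketch of a filter for it follows. The line layout (date, time, level, logger, message) is inferred only from the excerpt quoted in this thread, and the names `LINE_RE` and `find_errors` are hypothetical, not part of otopi.

```python
import re

# Line prefix as seen in the quoted excerpt: date, time, level, logger name.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+) (?P<message>.*)$"
)


def find_errors(lines):
    """Return (timestamp, message) for every ERROR-level log line."""
    errors = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") == "ERROR":
            timestamp = match.group("date") + " " + match.group("time")
            errors.append((timestamp, match.group("message")))
    return errors


# Two lines copied from the log excerpt in this thread.
SAMPLE = [
    "2016-12-27 01:29:29 DEBUG otopi.context context._executeMethod:142 method exception",
    "2016-12-27 01:29:29 ERROR otopi.context context._executeMethod:151 "
    "Failed to execute stage 'Closing up': Failed to start service 'vdsmd'",
]

for timestamp, message in find_errors(SAMPLE):
    print(timestamp, message)
```

Traceback lines do not start with a timestamp, so they fail the pattern and are skipped rather than misread as log records.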

On Tue, Dec 27, 2016 at 9:59 AM, Eyal Edri <eedri@redhat.com> wrote:
On Tue, Dec 27, 2016 at 9:56 AM, Eyal Edri <eedri@redhat.com> wrote:
Any updates? The tests have been failing since Sunday because vdsmd won't start... The master repos haven't been refreshed for a few days due to this.
In the log I see
Processing package vdsm-4.20.0-7.gitf851d1b.el7.centos.x86_64
which is from Dec 22 (last Thursday). This is because we are missing a master-branch tag: v4.20.0 was wrongly tagged on the same commit as v4.19.1, then removed, and never placed properly.
I've re-pushed v4.20.0 properly, and have now merged a patch to trigger build-artifacts in master. http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-x86_64/1544/
When this is done, could you use it to take the artifacts and try again?
Regards, Dan.
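The package string quoted from the log can be split into its name, version, release and architecture, which makes the stale build easier to spot in a script. The sketch below does that. It also compares versions to illustrate one plausible reading that the thread does not spell out: builds made while the v4.20.0 tag was missing would carry a lower version and so lose to the old 4.20.0 package. The helper names and the second package string are hypothetical, and the tuple comparison is a simplification, not the full RPM version-comparison algorithm.

```python
import re

# name-version-release.arch, as in the package string from the log.
NVRA_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)$"
)


def parse_nvra(package):
    """Split a package string into name, version, release, arch, git_hash."""
    match = NVRA_RE.match(package)
    if match is None:
        raise ValueError("not a name-version-release.arch string: %r" % package)
    fields = match.groupdict()
    git = re.search(r"\.git([0-9a-f]+)", fields["release"])
    fields["git_hash"] = git.group(1) if git else None
    return fields


def version_key(version):
    """Turn '4.20.0' into (4, 20, 0) for a simplified numeric comparison."""
    return tuple(int(part) for part in version.split("."))


# Package seen in the host-deploy log.
seen = parse_nvra("vdsm-4.20.0-7.gitf851d1b.el7.centos.x86_64")

# Hypothetical package name, invented only to illustrate the comparison.
hypothetical = parse_nvra("vdsm-4.19.1-99.gitabcdef0.el7.centos.x86_64")

print(seen["version"], seen["git_hash"])
print(version_key(seen["version"]) > version_key(hypothetical["version"]))
```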

After the following fix had been merged, we still had an issue with vm_run but it had been fixed as well. Master experimental is now working properly.
Thanks Dan
Gil
On Tue, Dec 27, 2016 at 10:24 AM, Dan Kenigsberg <danken@redhat.com> wrote:
On Tue, Dec 27, 2016 at 9:59 AM, Eyal Edri <eedri@redhat.com> wrote:
On Tue, Dec 27, 2016 at 9:56 AM, Eyal Edri <eedri@redhat.com> wrote:
Now with the full link: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/lastCompletedBuild/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log_ovirt-engine/host-deploy/ovirt-host-deploy-20161227012930-192.168.201.4-14af2bf0.log
[2] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/lastCompletedBuild/testReport/
In the log I see
Processing package vdsm-4.20.0-7.gitf851d1b.el7.centos.x86_64
which is from Dec 22 (last Thursday). This is because we are missing a master-branch tag: v4.20.0 was wrongly tagged on the same commit as v4.19.1, then removed, and never placed properly.
I've re-pushed v4.20.0 properly, and now merged a patch to trigger build-artifacts in master. http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-x86_64/1544/
When this is done, could you use it to take the artifacts and try again?
Regards, Dan.

On Dec 27, 2016 3:01 PM, "Gil Shinar" <gshinar@redhat.com> wrote:
After the following fix had been merged, we still had an issue with vm_run but it had been fixed as well. Master experimental is now working properly.
Excellent news! 1. Can we publish it? 2. 4.1 branch?
Y.
Thanks Dan Gil
On Tue, Dec 27, 2016 at 10:24 AM, Dan Kenigsberg <danken@redhat.com> wrote:

On Tue, Dec 27, 2016 at 3:51 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Dec 27, 2016 3:01 PM, "Gil Shinar" <gshinar@redhat.com> wrote:
After the following fix had been merged, we still had an issue with vm_run but it had been fixed as well. Master experimental is now working properly.
Excellent news! 1. Can we publish it? 2. 4.1 branch?
We merged [1], so you no longer have to wait for publishing; it will run on the latest.test repos.
[1] https://gerrit.ovirt.org/#/c/68896/
Y.
Thanks Dan Gil
-- Eyal Edri Associate Manager RHV DevOps EMEA ENG Virtualization R&D Red Hat Israel phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)
participants (4)
- Dan Kenigsberg
- Eyal Edri
- Gil Shinar
- Yaniv Kaul