[ OST Failure Report ] [ oVirt master ] [ 14-11-2017 ] [ 002_bootstrap.add_hosts ]

This is a multi-part message in MIME format. --------------5185435FD38913B821CED967 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Hi, We had a failure in upgrade suite for 002_bootstrap.add_hosts. I am not seeing any error that can suggest on an issue in engine. I can see in the host messages host that we have stopped writing to the log for 15 minutes and it may suggest that there is something that is keeping the host from starting which causes us to fail the test on timeout. However, i can use some help in determining the cause for this failure and weather its connected to the bootstrap_add_host test in upgrade. ** *Link to suspected patches: As I said,**I do not think its related, but this is the patch that was reported. * * https://gerrit.ovirt.org/#/c/83854/ Link to Job: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/* * * *Link to all logs:* *http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/artifact/... * * *(Relevant) error snippet from the log: * * <error> ** _*Test error: *_** ** * Error Message False != True after 900 seconds Stacktrace Traceback (most recent call last): File "/usr/lib64/python2.7/unittest/case.py", line 369, in run testMethod() File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 129, in wrapped_test test() File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 59, in wrapper return func(get_test_prefix(), *args, **kwargs) File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/upgrade-from-release-suite-master/test-scenarios-after-upgrade/002_bootstrap.py", line 187, in add_hosts testlib.assert_true_within(_host_is_up_4, timeout=15*60) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 263, in assert_true_within assert_equals_within(func, True, timeout, allowed_exceptions) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 237, in assert_equals_within '%s != %s after %s seconds' % (res, value, timeout) AssertionError: False != True after 900 seconds ** ** ** **_lago log: _* * *2017-11-13 15:31:23,212::log_utils.py::__enter__::600::lago.prefix::INFO::ESC[0mESC[0m 2017-11-13 15:31:23,213::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:0468ed2f-b174-4d94-bc66-2b6e08087a86:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:23,213::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:91372413-f7fd-4b72-85b9-9f5216ca7ae9:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:23,213::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:23,213::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:26,220::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:26,221::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:27,222::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:91372413-f7fd-4b72-85b9-9f5216ca7ae9:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:27,222::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:0468ed2f-b174-4d94-bc66-2b6e08087a86:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:27,222::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 4 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:27,222::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 4 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:28,224::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:fc88f641-b012-4636-a471-9ccaaf361a53:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:28,224::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:afb01a46-6338-407f-b7c9-9d5c6b91404d:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:28,224::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:28,225::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:28,226::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:29,228::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:afb01a46-6338-407f-b7c9-9d5c6b91404d:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:29,228::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:29,229::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:30,229::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:fc88f641-b012-4636-a471-9ccaaf361a53:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:30,230::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:9df5ddca-bbc4-485a-9f3a-b0b9a5bb5990:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:30,230::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 2 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:30,230::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:30,231::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:31,231::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:53fe67da-a632-49fe-b697-e8a2c4cb7d23:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:31,232::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:31,232::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:31,232::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:9df5ddca-bbc4-485a-9f3a-b0b9a5bb5990:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:31,233::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:32,234::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:53fe67da-a632-49fe-b697-e8a2c4cb7d23:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:32,234::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:32,234::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:1969bf5b-4e66-43d7-91c6-a28949e98fe8:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:32,234::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:32,235::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:33,235::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:5fd0e3c3-d83c-46f9-9c88-bc739aa9c430:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:33,235::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:33,236::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:1969bf5b-4e66-43d7-91c6-a28949e98fe8:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:33,236::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:33,237::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,238::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:5fd0e3c3-d83c-46f9-9c88-bc739aa9c430:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:34,238::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:34,238::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:22550d6e-c397-43dc-9691-c8d9debfe416:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,239::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,407::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:22550d6e-c397-43dc-9691-c8d9debfe416:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,639::ssh.py::ssh::58::lago.ssh::DEBUG::Running bb49adfa on lago-upgrade-from-release-suite-master-host0: true 2017-11-13 15:31:34,665::ssh.py::ssh::81::lago.ssh::DEBUG::Command bb49adfa on lago-upgrade-from-release-suite-master-host0 returned with 0 2017-11-13 15:31:34,665::ssh.py::wait_for_ssh::153::lago.ssh::DEBUG::Wait succeeded for ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,665::log_utils.py::__exit__::611::lago.prefix::INFO::ESC[32mSuccessESC[0m (in 0:00:11) 2017-11-13 15:31:34,666::log_utils.py::__enter__::600::lago.prefix::INFO::ESC[0mESC[0m 2017-11-13 15:31:34,666::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:4e2a7421-28f7-4679-955f-7261fe377939:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,667::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 100 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,805::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:4e2a7421-28f7-4679-955f-7261fe377939:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,906::ssh.py::ssh::58::lago.ssh::DEBUG::Running bb724bf2 on lago-upgrade-from-release-suite-master-host0: bash -s < "set -xe DIST=$(uname -r | sed -r 's/^.*\.([^\.]+)\.[^\.]+$/\1/') * * * **_from host: _* * ions. Proceeding anyway. Nov 13 10:31:34 lago-upgrade-from-release-suite-master-host0 systemd: Configuration file /usr/lib/systemd/system/ebtables.service is marked executable. Please remove executable permission bits. Proceeding anyway. Nov 13 10:31:34 lago-upgrade-from-release-suite-master-host0 systemd: Configuration file /usr/lib/systemd/system/wpa_supplicant.service is marked executable. Please remove executable permission bits. Proceeding anyway. Nov 13 10:31:55 lago-upgrade-from-release-suite-master-host0 yum[1294]: Updated: iptables-1.4.21-18.2.el7_4.x86_64 Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 3. Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice. Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Created slice user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Started Session 4 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting Session 4 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: New session 4 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 4. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Created slice user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Started Session 5 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting Session 5 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: New session 5 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 5. Nov 13 10:44:01 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice. Nov 13 10:44:01 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice. Nov 13 10:46:27 lago-upgrade-from-release-suite-master-host0 systemd: Starting Cleanup of Temporary Directories... Nov 13 10:46:27 lago-upgrade-from-release-suite-master-host0 systemd: Started Cleanup of Temporary Directories. Nov 13 10:54:36 lago-upgrade-from-release-suite-master-host0 dhclient[906]: DHCPREQUEST on eth0 to 192.168.200.1 port 67 (xid=0x564629cb) Nov 13 10:54:36 lago-upgrade-from-release-suite-master-host0 dhclient[906]: DHCPACK from 192.168.200.1 (xid=0x564629cb)** * * *</error>* ** --------------5185435FD38913B821CED967 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: 8bit <html> <head> <meta http-equiv="content-type" content="text/html; charset=utf-8"> </head> <body text="#000000" bgcolor="#FFFFFF"> <p>Hi, <br> </p> <p>We had a failure in upgrade suite for 002_bootstrap.add_hosts. I am not seeing any error that can suggest on an issue in engine. <br> </p> <p>I can see in the host messages host that we have stopped writing to the log for 15 minutes and it may suggest that there is something that is keeping the host from starting which causes us to fail the test on timeout. <br> </p> <p>However, i can use some help in determining the cause for this failure and weather its connected to the bootstrap_add_host test in upgrade.<br> </p> <p><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-ba60-843c-cfdd-7366571bcb57"> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">Link to suspected patches: As I said,<b> </b>I do not think its related, but this is the patch that was reported. </span></p> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"> </span></p> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"><a class="moz-txt-link-freetext" href="https://gerrit.ovirt.org/#/c/83854/">https://gerrit.ovirt.org/#/c/83854/</a></span></p> <br> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">Link to Job:</span></p> <a class="moz-txt-link-freetext" href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/</a></b></p> <p><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-ba60-843c-cfdd-7366571bcb57"><br> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">Link to all logs:</span></p> <a class="moz-txt-link-freetext" href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/artifact/">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/artifact/</a></b></p> <p><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-ba60-843c-cfdd-7366571bcb57"><br> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">(Relevant) error snippet from the log: </span></p> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"><error></span></p> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"> </span></p> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"> </span></p> </b><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-ba60-843c-cfdd-7366571bcb57"> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"><u><font size="+1"><b>Test error: </b></font></u><b> </b></span></p> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"><b> </b></span></p> </b></p> <h3 style="box-sizing: border-box; color: rgb(51, 51, 51); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">Error Message</h3> <pre style="box-sizing: border-box; white-space: pre-wrap; word-wrap: break-word; margin: 0px; color: rgb(51, 51, 51); font-size: 14px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">False != True after 900 seconds</pre> <h3 style="box-sizing: border-box; color: rgb(51, 51, 51); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">Stacktrace</h3> <pre style="box-sizing: border-box; white-space: pre-wrap; word-wrap: break-word; margin: 0px; color: rgb(51, 51, 51); font-size: 14px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">Traceback (most recent call last): File "/usr/lib64/python2.7/unittest/case.py", line 369, in run testMethod() File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 129, in wrapped_test test() File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 59, in wrapper return func(get_test_prefix(), *args, **kwargs) File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/upgrade-from-release-suite-master/test-scenarios-after-upgrade/002_bootstrap.py", line 187, in add_hosts testlib.assert_true_within(_host_is_up_4, timeout=15*60) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 263, in assert_true_within assert_equals_within(func, True, timeout, allowed_exceptions) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 237, in assert_equals_within '%s != %s after %s seconds' % (res, value, timeout) AssertionError: False != True after 900 seconds</pre> <p><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-ba60-843c-cfdd-7366571bcb57"> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"></span></p> </b></p> <p><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-ba60-843c-cfdd-7366571bcb57"><b><u><font size="+1">lago log: </font></u></b><br> </b></p> <p><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-ba60-843c-cfdd-7366571bcb57">2017-11-13 15:31:23,212::log_utils.py::__enter__::600::lago.prefix::<a class="moz-txt-link-freetext" href="INFO::ESC">INFO::ESC</a>[0mESC[0m<br> 2017-11-13 15:31:23,213::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:0468ed2f-b174-4d94-bc66-2b6e08087a86:Get ssh client for lago-upgrade-from-release-suite-master-host0:<br> 2017-11-13 15:31:23,213::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:91372413-f7fd-4b72-85b9-9f5216ca7ae9:Get ssh client for lago-upgrade-from-release-suite-master-engine:<br> 2017-11-13 15:31:23,213::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0<br> 2017-11-13 15:31:23,213::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine<br> 2017-11-13 15:31:26,220::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3<br> 2017-11-13 15:31:26,221::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2<br> 2017-11-13 15:31:27,222::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:91372413-f7fd-4b72-85b9-9f5216ca7ae9:Get ssh client for lago-upgrade-from-release-suite-master-engine:<br> 2017-11-13 15:31:27,222::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:0468ed2f-b174-4d94-bc66-2b6e08087a86:Get ssh client for lago-upgrade-from-release-suite-master-host0:<br> 2017-11-13 15:31:27,222::ssh.py::wait_for_<a class="moz-txt-link-freetext" href="ssh::129::lago.ssh::DEBUG::Got">ssh::129::lago.ssh::DEBUG::Got</a> exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 4 s) trying to ssh to lago-upgrade-from-release-suite-master-engine<br> 2017-11-13 15:31:27,222::ssh.py::wait_for_<a class="moz-txt-link-freetext" href="ssh::129::lago.ssh::DEBUG::Got">ssh::129::lago.ssh::DEBUG::Got</a> exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 4 s) trying to ssh to lago-upgrade-from-release-suite-master-host0<br> 2017-11-13 15:31:28,224::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:fc88f641-b012-4636-a471-9ccaaf361a53:Get ssh client for lago-upgrade-from-release-suite-master-engine:<br> 2017-11-13 15:31:28,224::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:afb01a46-6338-407f-b7c9-9d5c6b91404d:Get ssh client for lago-upgrade-from-release-suite-master-host0:<br> 2017-11-13 15:31:28,224::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine<br> 2017-11-13 15:31:28,225::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0<br> 2017-11-13 15:31:28,226::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2<br> 2017-11-13 15:31:29,228::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:afb01a46-6338-407f-b7c9-9d5c6b91404d:Get ssh client for lago-upgrade-from-release-suite-master-host0:<br> 2017-11-13 15:31:29,228::ssh.py::wait_for_<a class="moz-txt-link-freetext" href="ssh::129::lago.ssh::DEBUG::Got">ssh::129::lago.ssh::DEBUG::Got</a> exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0<br> 2017-11-13 15:31:29,229::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3<br> 2017-11-13 15:31:30,229::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:fc88f641-b012-4636-a471-9ccaaf361a53:Get ssh client for lago-upgrade-from-release-suite-master-engine:<br> 2017-11-13 15:31:30,230::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:9df5ddca-bbc4-485a-9f3a-b0b9a5bb5990:Get ssh client for lago-upgrade-from-release-suite-master-host0:<br> 2017-11-13 15:31:30,230::ssh.py::wait_for_<a class="moz-txt-link-freetext" href="ssh::129::lago.ssh::DEBUG::Got">ssh::129::lago.ssh::DEBUG::Got</a> exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 2 s) trying to ssh to lago-upgrade-from-release-suite-master-engine<br> 2017-11-13 15:31:30,230::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0<br> 2017-11-13 15:31:30,231::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2<br> 2017-11-13 15:31:31,231::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:53fe67da-a632-49fe-b697-e8a2c4cb7d23:Get ssh client for lago-upgrade-from-release-suite-master-engine:<br> 2017-11-13 15:31:31,232::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine<br> 2017-11-13 15:31:31,232::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3<br> 2017-11-13 15:31:31,232::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:9df5ddca-bbc4-485a-9f3a-b0b9a5bb5990:Get ssh client for lago-upgrade-from-release-suite-master-host0:<br> 2017-11-13 15:31:31,233::ssh.py::wait_for_<a class="moz-txt-link-freetext" href="ssh::129::lago.ssh::DEBUG::Got">ssh::129::lago.ssh::DEBUG::Got</a> exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0<br> 2017-11-13 15:31:32,234::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:53fe67da-a632-49fe-b697-e8a2c4cb7d23:Get ssh client for lago-upgrade-from-release-suite-master-engine:<br> 2017-11-13 15:31:32,234::ssh.py::wait_for_<a class="moz-txt-link-freetext" href="ssh::129::lago.ssh::DEBUG::Got">ssh::129::lago.ssh::DEBUG::Got</a> exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-engine<br> 2017-11-13 15:31:32,234::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:1969bf5b-4e66-43d7-91c6-a28949e98fe8:Get ssh client for lago-upgrade-from-release-suite-master-host0:<br> 2017-11-13 15:31:32,234::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0<br> 2017-11-13 15:31:32,235::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2<br> 2017-11-13 15:31:33,235::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:5fd0e3c3-d83c-46f9-9c88-bc739aa9c430:Get ssh client for lago-upgrade-from-release-suite-master-engine:<br> 2017-11-13 15:31:33,235::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine<br> 2017-11-13 15:31:33,236::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:1969bf5b-4e66-43d7-91c6-a28949e98fe8:Get ssh client for lago-upgrade-from-release-suite-master-host0:<br> 2017-11-13 15:31:33,236::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3<br> 2017-11-13 15:31:33,237::ssh.py::wait_for_<a class="moz-txt-link-freetext" href="ssh::129::lago.ssh::DEBUG::Got">ssh::129::lago.ssh::DEBUG::Got</a> exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0<br> 2017-11-13 15:31:34,238::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:5fd0e3c3-d83c-46f9-9c88-bc739aa9c430:Get ssh client for lago-upgrade-from-release-suite-master-engine:<br> 2017-11-13 15:31:34,238::ssh.py::wait_for_<a class="moz-txt-link-freetext" href="ssh::129::lago.ssh::DEBUG::Got">ssh::129::lago.ssh::DEBUG::Got</a> exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-engine<br> 2017-11-13 15:31:34,238::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:22550d6e-c397-43dc-9691-c8d9debfe416:Get ssh client for lago-upgrade-from-release-suite-master-host0:<br> 2017-11-13 15:31:34,239::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0<br> 2017-11-13 15:31:34,407::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:22550d6e-c397-43dc-9691-c8d9debfe416:Get ssh client for lago-upgrade-from-release-suite-master-host0:<br> 2017-11-13 15:31:34,639::ssh.py::<a class="moz-txt-link-freetext" href="ssh::58::lago.ssh::DEBUG::Running">ssh::58::lago.ssh::DEBUG::Running</a> bb49adfa on lago-upgrade-from-release-suite-master-host0: true<br> 2017-11-13 15:31:34,665::ssh.py::<a class="moz-txt-link-freetext" href="ssh::81::lago.ssh::DEBUG::Command">ssh::81::lago.ssh::DEBUG::Command</a> bb49adfa on lago-upgrade-from-release-suite-master-host0 returned with 0<br> 2017-11-13 15:31:34,665::ssh.py::wait_for_<a class="moz-txt-link-freetext" href="ssh::153::lago.ssh::DEBUG::Wait">ssh::153::lago.ssh::DEBUG::Wait</a> succeeded for ssh to lago-upgrade-from-release-suite-master-host0<br> 2017-11-13 15:31:34,665::log_utils.py::__exit__::611::lago.prefix::<a class="moz-txt-link-freetext" href="INFO::ESC">INFO::ESC</a>[32mSuccessESC[0m (in 0:00:11)<br> 2017-11-13 15:31:34,666::log_utils.py::__enter__::600::lago.prefix::<a class="moz-txt-link-freetext" href="INFO::ESC">INFO::ESC</a>[0mESC[0m<br> 2017-11-13 15:31:34,666::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:4e2a7421-28f7-4679-955f-7261fe377939:Get ssh client for lago-upgrade-from-release-suite-master-host0:<br> 2017-11-13 15:31:34,667::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 100 tries for lago-upgrade-from-release-suite-master-host0<br> 2017-11-13 15:31:34,805::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:4e2a7421-28f7-4679-955f-7261fe377939:Get ssh client for lago-upgrade-from-release-suite-master-host0:<br> 2017-11-13 15:31:34,906::ssh.py::<a class="moz-txt-link-freetext" href="ssh::58::lago.ssh::DEBUG::Running">ssh::58::lago.ssh::DEBUG::Running</a> bb724bf2 on lago-upgrade-from-release-suite-master-host0: bash -s < "set -xe<br> DIST=$(uname -r | sed -r 's/^.*\.([^\.]+)\.[^\.]+$/\1/')<br> <br> </b></p> <p><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-ba60-843c-cfdd-7366571bcb57"><br> </b></p> <p><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-ba60-843c-cfdd-7366571bcb57"><b><u><font size="+1">from host: </font></u></b><br> </b></p> <p><span style="font-weight:normal;">ions. Proceeding anyway.<br> Nov 13 10:31:34 lago-upgrade-from-release-suite-master-host0 systemd: Configuration file /usr/lib/systemd/system/ebtables.service is marked executable. Please remove executable permission bits. Proceeding anyway.<br> Nov 13 10:31:34 lago-upgrade-from-release-suite-master-host0 systemd: Configuration file /usr/lib/systemd/system/wpa_supplicant.service is marked executable. Please remove executable permission bits. Proceeding anyway.<br> Nov 13 10:31:55 lago-upgrade-from-release-suite-master-host0 yum[1294]: Updated: iptables-1.4.21-18.2.el7_4.x86_64<br> Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 3.<br> Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice.<br> Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice.<br> Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Created slice user-0.slice.<br> Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting user-0.slice.<br> Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Started Session 4 of user root.<br> Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting Session 4 of user root.<br> Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: New session 4 of user root.<br> Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 4.<br> Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice.<br> Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice.<br> Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Created slice user-0.slice.<br> Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting user-0.slice.<br> Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Started Session 5 of user root.<br> Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting Session 5 of user root.<br> Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: New session 5 of user root.<br> Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 5.<br> Nov 13 10:44:01 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice.<br> Nov 13 10:44:01 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice.<br> Nov 13 10:46:27 lago-upgrade-from-release-suite-master-host0 systemd: Starting Cleanup of Temporary Directories...<br> Nov 13 10:46:27 lago-upgrade-from-release-suite-master-host0 systemd: Started Cleanup of Temporary Directories.<br> Nov 13 10:54:36 lago-upgrade-from-release-suite-master-host0 dhclient[906]: DHCPREQUEST on eth0 to 192.168.200.1 port 67 (xid=0x564629cb)<br> Nov 13 10:54:36 lago-upgrade-from-release-suite-master-host0 dhclient[906]: DHCPACK from 192.168.200.1 (xid=0x564629cb)</span><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-ba60-843c-cfdd-7366571bcb57"><b><br> </b><br> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"></error></span></p> </b><br class="Apple-interchange-newline"> </p> </body> </html> --------------5185435FD38913B821CED967--

On Tue, Nov 14, 2017 at 5:48 PM, Dafna Ron <dron@redhat.com> wrote:
Hi,
We had a failure in upgrade suite for 002_bootstrap.add_hosts. I am not seeing any error that can suggest on an issue in engine.
I can see in the host messages host that we have stopped writing to the log for 15 minutes and it may suggest that there is something that is keeping the host from starting which causes us to fail the test on timeout.
However, i can use some help in determining the cause for this failure and weather its connected to the bootstrap_add_host test in upgrade.
Link to suspected patches: As I said, I do not think its related, but this is the patch that was reported.
https://gerrit.ovirt.org/#/c/83854/
Link to Job:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/
Link to all logs:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/artifact/
(Relevant) error snippet from the log:
<error>
Test error:
Error Message
False != True after 900 seconds
Stacktrace
Traceback (most recent call last): File "/usr/lib64/python2.7/unittest/case.py", line 369, in run testMethod() File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 129, in wrapped_test test() File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 59, in wrapper return func(get_test_prefix(), *args, **kwargs) File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/upgrade-from-release-suite-master/test-scenarios-after-upgrade/002_bootstrap.py", line 187, in add_hosts testlib.assert_true_within(_host_is_up_4, timeout=15*60) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 263, in assert_true_within assert_equals_within(func, True, timeout, allowed_exceptions) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 237, in assert_equals_within '%s != %s after %s seconds' % (res, value, timeout) AssertionError: False != True after 900 seconds
Hi, It took me way too long to find the log file that includes the above stack trace. There is a known bug that pressing '(all files in zip)' does not get all of them, only some. If we can't fix it, please add something that will create such a zip/tar/whatever as part of the job, so that we do not rely on jenkins. Something causes things to be very slow, no idea what in particular, but could not find a real problem. Since we timed out and killed the engine in the middle of host-deploy, we do not have full logs of it. These are copied to the engine from the host only after it finishes. Before that, the log is written on the host in /tmp. We might want to make ost collect that as well, to help debug similar cases. What we can see in engine log is that we start installing the host [1]: 2017-11-13 11:00:12,331-05 INFO [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [45082420] Before Installation host 83090668-908c-4f49-8690-5348ec12f931, lago-upgrade-from-release-suite-master-host0 2017-11-13 11:00:12,354-05 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [45082420] START, SetVdsStatusVDSCommand(HostName = lago-upgrade-from-release-suite-master-host0, SetVdsStatusVDSCommandParameters:{hostId='83090668-908c-4f49-8690-5348ec12f931', status='Installing', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 22a831f7 First line from host-deploy is: 2017-11-13 11:00:13,335-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Stage: Initializing. which is one second later, ok. ... 2017-11-13 11:00:13,459-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Stage: Environment packages setup. It goes install (immediately) packages that it needs for itself, in this case it updates the package 'dmidecode', from 2.12-9 to 3.0-5. I guess our image is a bit old. ... 2017-11-13 11:00:19,516-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Verify: 2/2: dmidecode.x86_64 1:2.12-9.el7 - ud. So it took (the equivalent of 'yum update dmidecode') 6 seconds. Reasonable. ... Later on, it starts installing/updating the actual packages it should install/update. Starts with: 2017-11-13 11:00:24,982-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Status: Downloading Packages. Then there are many lines about each package download. First is: 2017-11-13 11:00:25,049-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Download/Verify: GeoIP-1.5.0-11.el7.x86_64. Last is: 2017-11-13 11:00:43,790-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Download/Verify: zlib-1.2.7-17.el7.x86_64. Later we see it updates 570 packages. So downloading 570 packages took ~ 20 seconds, good too. Then it installs them. First is: 2017-11-13 11:00:46,114-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum update: 1/570: libgcc-4.8.5-16.el7.x86_64. And then many similar ones. I only skimmed through them, trying to find large gaps - didn't write a script - and a large one I find is: 2017-11-13 11:10:04,522-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum updated: 506/570: audit. 2017-11-13 11:13:06,483-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum updated: 507/570: openssl-libs. In between these two lines, there are a few unrelated ones, from some other engine thread. So something caused it to need 3 minutes to install openssl-libs. A wild guess: Perhaps we have low entropy, and openssl does something that needs entropy? Checking engine-setup log there [2], I see that the following command took almost 1.5 minutes to run: 2017-11-13 10:58:19,459-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.executeRaw:813 execute: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass'), executable='None', cwd='None', env={'pass': '**FILTERED**', 'LESSOPEN': '||/usr/bin/lesspipe.sh %s', 'SSH_CLIENT': '192.168.200.1 52372 22', 'SELINUX_USE_CURRENT_RANGE': '', 'LOGNAME': 'root', 'USER': 'root', 'OVIRT_ENGINE_JAVA_HOME': u'/usr/lib/jvm/jre', 'PATH': '/opt/rh/rh-postgresql95/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin', 'HOME': '/root', 'OVIRT_JBOSS_HOME': '/usr/share/ovirt-engine-wildfly', 'LD_LIBRARY_PATH': '/opt/rh/rh-postgresql95/root/usr/lib64', 'LANG': 'en_US.UTF-8', 'SHELL': '/bin/bash', 'LIBRARY_PATH': '/opt/rh/rh-postgresql95/root/usr/lib64', 'SHLVL': '4', 'POSTGRESQLENV': 'COMMAND/pg_dump=str:/opt/rh/rh-postgresql95/root/usr/bin/pg_dump COMMAND/psql=str:/opt/rh/rh-postgresql95/root/usr/bin/psql COMMAND/pg_restore=str:/opt/rh/rh-postgresql95/root/usr/bin/pg_restore COMMAND/postgresql-setup=str:/opt/rh/rh-postgresql95/root/usr/bin/postgresql-setup OVESETUP_PROVISIONING/postgresService=str:rh-postgresql95-postgresql OVESETUP_PROVISIONING/postgresConf=str:/var/opt/rh/rh-postgresql95/lib/pgsql/data/postgresql.conf OVESETUP_PROVISIONING/postgresPgHba=str:/var/opt/rh/rh-postgresql95/lib/pgsql/data/pg_hba.conf OVESETUP_PROVISIONING/postgresPgVersion=str:/var/opt/rh/rh-postgresql95/lib/pgsql/data/PG_VERSION', 'MANPATH': '/opt/rh/rh-postgresql95/root/usr/share/man:', 'X_SCLS': 'rh-postgresql95 ', 'XDG_RUNTIME_DIR': '/run/user/0', 'OVIRT_ENGINE_JAVA_HOME_FORCE': '1', 'PYTHONPATH': '/usr/share/ovirt-engine/setup/bin/..::', 'SELINUX_ROLE_REQUESTED': '', 'MAIL': '/var/mail/root', 'PKG_CONFIG_PATH': '/opt/rh/rh-postgresql95/root/usr/lib64/pkgconfig', 'XDG_SESSION_ID': '14', 'sclenv': 'rh-postgresql95', 'XDG_CONFIG_DIRS': '/etc/opt/rh/rh-postgresql95/xdg:/etc/xdg', 'JAVACONFDIRS': '/etc/opt/rh/rh-postgresql95/java:/etc/java', 'SELINUX_LEVEL_REQUESTED': '', 'XDG_DATA_DIRS': '/opt/rh/rh-postgresql95/root/usr/share', 'PWD': '/root', 'CPATH': '/opt/rh/rh-postgresql95/root/usr/include', 'OTOPI_LOGFILE': '/var/log/ovirt-engine/setup/ovirt-engine-setup-20171113105548-qrc7zo.log', 'SSH_CONNECTION': '192.168.200.1 52372 192.168.200.3 22', 'OTOPI_EXECDIR': '/root'} 2017-11-13 10:59:43,545-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.executeRaw:863 execute-result: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass'), rc=0 2017-11-13 10:59:43,546-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.execute:921 execute-output: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass') stdout: eyJhcnRpZmFjdCI6IkVudmVsb3BlUEJFIiwic2FsdCI6ImNjSmhFbHRnUEJxeUlNTUJSaU1OdFRYL3M0RGRJT1hOSWJjV2F1NFZGT0U9Iiwic2VjcmV0IjoiK2ZlTzVyZm9kNGlsVmZLRENaRjdseVVQZHZnWnBTWUF0cnBYUWVpQnJaTT0iLCJ2ZXJzaW9uIjoiMSIsIml0ZXJhdGlvbnMiOiI0MDAwIiwiYWxnb3JpdGhtIjoiUEJLREYyV2l0aEhtYWNTSEExIn0= 2017-11-13 10:59:43,546-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.execute:926 execute-output: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass') stderr: This is almost always due to not enough entropy. So please check if everything - from physical machines to lago/ost/libvirt/etc., makes sure to supply all VMs enough entropy. We usually do this using virtio-rng. [1] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/artifact/... [2] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/artifact/...
lago log:
2017-11-13 15:31:23,212::log_utils.py::__enter__::600::lago.prefix::INFO::ESC[0mESC[0m 2017-11-13 15:31:23,213::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:0468ed2f-b174-4d94-bc66-2b6e08087a86:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:23,213::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:91372413-f7fd-4b72-85b9-9f5216ca7ae9:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:23,213::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:23,213::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:26,220::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:26,221::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:27,222::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:91372413-f7fd-4b72-85b9-9f5216ca7ae9:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:27,222::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:0468ed2f-b174-4d94-bc66-2b6e08087a86:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:27,222::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 4 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:27,222::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 4 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:28,224::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:fc88f641-b012-4636-a471-9ccaaf361a53:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:28,224::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:afb01a46-6338-407f-b7c9-9d5c6b91404d:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:28,224::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:28,225::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:28,226::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:29,228::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:afb01a46-6338-407f-b7c9-9d5c6b91404d:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:29,228::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:29,229::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:30,229::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:fc88f641-b012-4636-a471-9ccaaf361a53:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:30,230::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:9df5ddca-bbc4-485a-9f3a-b0b9a5bb5990:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:30,230::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 2 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:30,230::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:30,231::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:31,231::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:53fe67da-a632-49fe-b697-e8a2c4cb7d23:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:31,232::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:31,232::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:31,232::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:9df5ddca-bbc4-485a-9f3a-b0b9a5bb5990:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:31,233::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:32,234::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:53fe67da-a632-49fe-b697-e8a2c4cb7d23:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:32,234::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:32,234::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:1969bf5b-4e66-43d7-91c6-a28949e98fe8:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:32,234::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:32,235::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:33,235::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:5fd0e3c3-d83c-46f9-9c88-bc739aa9c430:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:33,235::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:33,236::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:1969bf5b-4e66-43d7-91c6-a28949e98fe8:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:33,236::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:33,237::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,238::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:5fd0e3c3-d83c-46f9-9c88-bc739aa9c430:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:34,238::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:34,238::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:22550d6e-c397-43dc-9691-c8d9debfe416:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,239::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,407::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:22550d6e-c397-43dc-9691-c8d9debfe416:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,639::ssh.py::ssh::58::lago.ssh::DEBUG::Running bb49adfa on lago-upgrade-from-release-suite-master-host0: true 2017-11-13 15:31:34,665::ssh.py::ssh::81::lago.ssh::DEBUG::Command bb49adfa on lago-upgrade-from-release-suite-master-host0 returned with 0 2017-11-13 15:31:34,665::ssh.py::wait_for_ssh::153::lago.ssh::DEBUG::Wait succeeded for ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,665::log_utils.py::__exit__::611::lago.prefix::INFO::ESC[32mSuccessESC[0m (in 0:00:11) 2017-11-13 15:31:34,666::log_utils.py::__enter__::600::lago.prefix::INFO::ESC[0mESC[0m 2017-11-13 15:31:34,666::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:4e2a7421-28f7-4679-955f-7261fe377939:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,667::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 100 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,805::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:4e2a7421-28f7-4679-955f-7261fe377939:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,906::ssh.py::ssh::58::lago.ssh::DEBUG::Running bb724bf2 on lago-upgrade-from-release-suite-master-host0: bash -s < "set -xe DIST=$(uname -r | sed -r 's/^.*\.([^\.]+)\.[^\.]+$/\1/')
from host:
ions. Proceeding anyway. Nov 13 10:31:34 lago-upgrade-from-release-suite-master-host0 systemd: Configuration file /usr/lib/systemd/system/ebtables.service is marked executable. Please remove executable permission bits. Proceeding anyway. Nov 13 10:31:34 lago-upgrade-from-release-suite-master-host0 systemd: Configuration file /usr/lib/systemd/system/wpa_supplicant.service is marked executable. Please remove executable permission bits. Proceeding anyway. Nov 13 10:31:55 lago-upgrade-from-release-suite-master-host0 yum[1294]: Updated: iptables-1.4.21-18.2.el7_4.x86_64 Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 3. Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice. Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Created slice user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Started Session 4 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting Session 4 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: New session 4 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 4. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Created slice user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Started Session 5 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting Session 5 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: New session 5 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 5. Nov 13 10:44:01 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice. Nov 13 10:44:01 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice. Nov 13 10:46:27 lago-upgrade-from-release-suite-master-host0 systemd: Starting Cleanup of Temporary Directories... Nov 13 10:46:27 lago-upgrade-from-release-suite-master-host0 systemd: Started Cleanup of Temporary Directories. Nov 13 10:54:36 lago-upgrade-from-release-suite-master-host0 dhclient[906]: DHCPREQUEST on eth0 to 192.168.200.1 port 67 (xid=0x564629cb) Nov 13 10:54:36 lago-upgrade-from-release-suite-master-host0 dhclient[906]: DHCPACK from 192.168.200.1 (xid=0x564629cb)
</error>
-- Didi

Didi, Thank you for your detailed explanation and for taking the time to debug this issue. I opened the following Jira's: 1. for increasing entropy in the hosts: https://ovirt-jira.atlassian.net/browse/OVIRT-1763 2. I added a comment to the not all logs are downloaded Jira regarded a workaround which would save the logs on a different location: https://ovirt-jira.atlassian.net/browse/OVIRT-1583 3. adding /tmp logs to the job logs: https://ovirt-jira.atlassian.net/browse/OVIRT-1764 Again, thank you for your help Didi. Dafna On 11/15/2017 09:01 AM, Yedidyah Bar David wrote:
On Tue, Nov 14, 2017 at 5:48 PM, Dafna Ron <dron@redhat.com> wrote:
Hi,
We had a failure in upgrade suite for 002_bootstrap.add_hosts. I am not seeing any error that can suggest on an issue in engine.
I can see in the host messages host that we have stopped writing to the log for 15 minutes and it may suggest that there is something that is keeping the host from starting which causes us to fail the test on timeout.
However, i can use some help in determining the cause for this failure and weather its connected to the bootstrap_add_host test in upgrade.
Link to suspected patches: As I said, I do not think its related, but this is the patch that was reported.
https://gerrit.ovirt.org/#/c/83854/
Link to Job:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/
Link to all logs:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/artifact/
(Relevant) error snippet from the log:
<error>
Test error:
Error Message
False != True after 900 seconds
Stacktrace
Traceback (most recent call last): File "/usr/lib64/python2.7/unittest/case.py", line 369, in run testMethod() File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 129, in wrapped_test test() File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 59, in wrapper return func(get_test_prefix(), *args, **kwargs) File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/upgrade-from-release-suite-master/test-scenarios-after-upgrade/002_bootstrap.py", line 187, in add_hosts testlib.assert_true_within(_host_is_up_4, timeout=15*60) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 263, in assert_true_within assert_equals_within(func, True, timeout, allowed_exceptions) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 237, in assert_equals_within '%s != %s after %s seconds' % (res, value, timeout) AssertionError: False != True after 900 seconds
Hi,
It took me way too long to find the log file that includes the above stack trace. There is a known bug that pressing '(all files in zip)' does not get all of them, only some. If we can't fix it, please add something that will create such a zip/tar/whatever as part of the job, so that we do not rely on jenkins.
Something causes things to be very slow, no idea what in particular, but could not find a real problem.
Since we timed out and killed the engine in the middle of host-deploy, we do not have full logs of it. These are copied to the engine from the host only after it finishes. Before that, the log is written on the host in /tmp. We might want to make ost collect that as well, to help debug similar cases.
What we can see in engine log is that we start installing the host [1]:
2017-11-13 11:00:12,331-05 INFO [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [45082420] Before Installation host 83090668-908c-4f49-8690-5348ec12f931, lago-upgrade-from-release-suite-master-host0 2017-11-13 11:00:12,354-05 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [45082420] START, SetVdsStatusVDSCommand(HostName = lago-upgrade-from-release-suite-master-host0, SetVdsStatusVDSCommandParameters:{hostId='83090668-908c-4f49-8690-5348ec12f931', status='Installing', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 22a831f7
First line from host-deploy is:
2017-11-13 11:00:13,335-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Stage: Initializing.
which is one second later, ok.
...
2017-11-13 11:00:13,459-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Stage: Environment packages setup.
It goes install (immediately) packages that it needs for itself, in this case it updates the package 'dmidecode', from 2.12-9 to 3.0-5. I guess our image is a bit old.
...
2017-11-13 11:00:19,516-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Verify: 2/2: dmidecode.x86_64 1:2.12-9.el7 - ud.
So it took (the equivalent of 'yum update dmidecode') 6 seconds. Reasonable.
...
Later on, it starts installing/updating the actual packages it should install/update. Starts with:
2017-11-13 11:00:24,982-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Status: Downloading Packages.
Then there are many lines about each package download. First is:
2017-11-13 11:00:25,049-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Download/Verify: GeoIP-1.5.0-11.el7.x86_64.
Last is:
2017-11-13 11:00:43,790-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Download/Verify: zlib-1.2.7-17.el7.x86_64.
Later we see it updates 570 packages. So downloading 570 packages took ~ 20 seconds, good too. Then it installs them. First is:
2017-11-13 11:00:46,114-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum update: 1/570: libgcc-4.8.5-16.el7.x86_64.
And then many similar ones. I only skimmed through them, trying to find large gaps - didn't write a script - and a large one I find is:
2017-11-13 11:10:04,522-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum updated: 506/570: audit.
2017-11-13 11:13:06,483-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum updated: 507/570: openssl-libs.
In between these two lines, there are a few unrelated ones, from some other engine thread. So something caused it to need 3 minutes to install openssl-libs.
A wild guess: Perhaps we have low entropy, and openssl does something that needs entropy?
Checking engine-setup log there [2], I see that the following command took almost 1.5 minutes to run:
2017-11-13 10:58:19,459-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.executeRaw:813 execute: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass'), executable='None', cwd='None', env={'pass': '**FILTERED**', 'LESSOPEN': '||/usr/bin/lesspipe.sh %s', 'SSH_CLIENT': '192.168.200.1 52372 22', 'SELINUX_USE_CURRENT_RANGE': '', 'LOGNAME': 'root', 'USER': 'root', 'OVIRT_ENGINE_JAVA_HOME': u'/usr/lib/jvm/jre', 'PATH': '/opt/rh/rh-postgresql95/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin', 'HOME': '/root', 'OVIRT_JBOSS_HOME': '/usr/share/ovirt-engine-wildfly', 'LD_LIBRARY_PATH': '/opt/rh/rh-postgresql95/root/usr/lib64', 'LANG': 'en_US.UTF-8', 'SHELL': '/bin/bash', 'LIBRARY_PATH': '/opt/rh/rh-postgresql95/root/usr/lib64', 'SHLVL': '4', 'POSTGRESQLENV': 'COMMAND/pg_dump=str:/opt/rh/rh-postgresql95/root/usr/bin/pg_dump COMMAND/psql=str:/opt/rh/rh-postgresql95/root/usr/bin/psql COMMAND/pg_restore=str:/opt/rh/rh-postgresql95/root/usr/bin/pg_restore COMMAND/postgresql-setup=str:/opt/rh/rh-postgresql95/root/usr/bin/postgresql-setup OVESETUP_PROVISIONING/postgresService=str:rh-postgresql95-postgresql OVESETUP_PROVISIONING/postgresConf=str:/var/opt/rh/rh-postgresql95/lib/pgsql/data/postgresql.conf OVESETUP_PROVISIONING/postgresPgHba=str:/var/opt/rh/rh-postgresql95/lib/pgsql/data/pg_hba.conf OVESETUP_PROVISIONING/postgresPgVersion=str:/var/opt/rh/rh-postgresql95/lib/pgsql/data/PG_VERSION', 'MANPATH': '/opt/rh/rh-postgresql95/root/usr/share/man:', 'X_SCLS': 'rh-postgresql95 ', 'XDG_RUNTIME_DIR': '/run/user/0', 'OVIRT_ENGINE_JAVA_HOME_FORCE': '1', 'PYTHONPATH': '/usr/share/ovirt-engine/setup/bin/..::', 'SELINUX_ROLE_REQUESTED': '', 'MAIL': '/var/mail/root', 'PKG_CONFIG_PATH': '/opt/rh/rh-postgresql95/root/usr/lib64/pkgconfig', 'XDG_SESSION_ID': '14', 'sclenv': 'rh-postgresql95', 'XDG_CONFIG_DIRS': '/etc/opt/rh/rh-postgresql95/xdg:/etc/xdg', 'JAVACONFDIRS': '/etc/opt/rh/rh-postgresql95/java:/etc/java', 'SELINUX_LEVEL_REQUESTED': '', 'XDG_DATA_DIRS': '/opt/rh/rh-postgresql95/root/usr/share', 'PWD': '/root', 'CPATH': '/opt/rh/rh-postgresql95/root/usr/include', 'OTOPI_LOGFILE': '/var/log/ovirt-engine/setup/ovirt-engine-setup-20171113105548-qrc7zo.log', 'SSH_CONNECTION': '192.168.200.1 52372 192.168.200.3 22', 'OTOPI_EXECDIR': '/root'} 2017-11-13 10:59:43,545-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.executeRaw:863 execute-result: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass'), rc=0 2017-11-13 10:59:43,546-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.execute:921 execute-output: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass') stdout: eyJhcnRpZmFjdCI6IkVudmVsb3BlUEJFIiwic2FsdCI6ImNjSmhFbHRnUEJxeUlNTUJSaU1OdFRYL3M0RGRJT1hOSWJjV2F1NFZGT0U9Iiwic2VjcmV0IjoiK2ZlTzVyZm9kNGlsVmZLRENaRjdseVVQZHZnWnBTWUF0cnBYUWVpQnJaTT0iLCJ2ZXJzaW9uIjoiMSIsIml0ZXJhdGlvbnMiOiI0MDAwIiwiYWxnb3JpdGhtIjoiUEJLREYyV2l0aEhtYWNTSEExIn0=
2017-11-13 10:59:43,546-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.execute:926 execute-output: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass') stderr:
This is almost always due to not enough entropy.
So please check if everything - from physical machines to lago/ost/libvirt/etc., makes sure to supply all VMs enough entropy. We usually do this using virtio-rng.
[1] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/artifact/... [2] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/artifact/...
lago log:
2017-11-13 15:31:23,212::log_utils.py::__enter__::600::lago.prefix::INFO::ESC[0mESC[0m 2017-11-13 15:31:23,213::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:0468ed2f-b174-4d94-bc66-2b6e08087a86:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:23,213::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:91372413-f7fd-4b72-85b9-9f5216ca7ae9:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:23,213::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:23,213::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:26,220::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:26,221::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:27,222::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:91372413-f7fd-4b72-85b9-9f5216ca7ae9:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:27,222::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:0468ed2f-b174-4d94-bc66-2b6e08087a86:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:27,222::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 4 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:27,222::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 4 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:28,224::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:fc88f641-b012-4636-a471-9ccaaf361a53:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:28,224::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:afb01a46-6338-407f-b7c9-9d5c6b91404d:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:28,224::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:28,225::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:28,226::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:29,228::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:afb01a46-6338-407f-b7c9-9d5c6b91404d:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:29,228::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:29,229::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:30,229::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:fc88f641-b012-4636-a471-9ccaaf361a53:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:30,230::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:9df5ddca-bbc4-485a-9f3a-b0b9a5bb5990:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:30,230::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 2 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:30,230::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:30,231::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:31,231::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:53fe67da-a632-49fe-b697-e8a2c4cb7d23:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:31,232::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:31,232::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:31,232::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:9df5ddca-bbc4-485a-9f3a-b0b9a5bb5990:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:31,233::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:32,234::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:53fe67da-a632-49fe-b697-e8a2c4cb7d23:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:32,234::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:32,234::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:1969bf5b-4e66-43d7-91c6-a28949e98fe8:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:32,234::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:32,235::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:33,235::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:5fd0e3c3-d83c-46f9-9c88-bc739aa9c430:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:33,235::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:33,236::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:1969bf5b-4e66-43d7-91c6-a28949e98fe8:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:33,236::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:33,237::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,238::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:5fd0e3c3-d83c-46f9-9c88-bc739aa9c430:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:34,238::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:34,238::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:22550d6e-c397-43dc-9691-c8d9debfe416:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,239::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,407::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:22550d6e-c397-43dc-9691-c8d9debfe416:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,639::ssh.py::ssh::58::lago.ssh::DEBUG::Running bb49adfa on lago-upgrade-from-release-suite-master-host0: true 2017-11-13 15:31:34,665::ssh.py::ssh::81::lago.ssh::DEBUG::Command bb49adfa on lago-upgrade-from-release-suite-master-host0 returned with 0 2017-11-13 15:31:34,665::ssh.py::wait_for_ssh::153::lago.ssh::DEBUG::Wait succeeded for ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,665::log_utils.py::__exit__::611::lago.prefix::INFO::ESC[32mSuccessESC[0m (in 0:00:11) 2017-11-13 15:31:34,666::log_utils.py::__enter__::600::lago.prefix::INFO::ESC[0mESC[0m 2017-11-13 15:31:34,666::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:4e2a7421-28f7-4679-955f-7261fe377939:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,667::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 100 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,805::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:4e2a7421-28f7-4679-955f-7261fe377939:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,906::ssh.py::ssh::58::lago.ssh::DEBUG::Running bb724bf2 on lago-upgrade-from-release-suite-master-host0: bash -s < "set -xe DIST=$(uname -r | sed -r 's/^.*\.([^\.]+)\.[^\.]+$/\1/')
from host:
ions. Proceeding anyway. Nov 13 10:31:34 lago-upgrade-from-release-suite-master-host0 systemd: Configuration file /usr/lib/systemd/system/ebtables.service is marked executable. Please remove executable permission bits. Proceeding anyway. Nov 13 10:31:34 lago-upgrade-from-release-suite-master-host0 systemd: Configuration file /usr/lib/systemd/system/wpa_supplicant.service is marked executable. Please remove executable permission bits. Proceeding anyway. Nov 13 10:31:55 lago-upgrade-from-release-suite-master-host0 yum[1294]: Updated: iptables-1.4.21-18.2.el7_4.x86_64 Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 3. Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice. Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Created slice user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Started Session 4 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting Session 4 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: New session 4 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 4. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Created slice user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Started Session 5 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting Session 5 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: New session 5 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 5. Nov 13 10:44:01 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice. Nov 13 10:44:01 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice. Nov 13 10:46:27 lago-upgrade-from-release-suite-master-host0 systemd: Starting Cleanup of Temporary Directories... Nov 13 10:46:27 lago-upgrade-from-release-suite-master-host0 systemd: Started Cleanup of Temporary Directories. Nov 13 10:54:36 lago-upgrade-from-release-suite-master-host0 dhclient[906]: DHCPREQUEST on eth0 to 192.168.200.1 port 67 (xid=0x564629cb) Nov 13 10:54:36 lago-upgrade-from-release-suite-master-host0 dhclient[906]: DHCPACK from 192.168.200.1 (xid=0x564629cb)
</error>

On Wed, Nov 15, 2017 at 1:35 PM, Dafna Ron <dron@redhat.com> wrote:
Didi,
Thank you for your detailed explanation and for taking the time to debug this issue.
I opened the following Jira's:
1. for increasing entropy in the hosts: https://ovirt-jira.atlassian.net/browse/OVIRT-1763
I do not think anymore this is the main reason for slowness, although it might still be useful to verify/improve. yuvalt pointed out in a private discussion that openssl lib does nothing related to random numbers in its pre/post install scripts. It does call ldconfig, as do several other packages, which can take quite a lot of time. Some took less. 3 minutes is definitely not reasonable. Also see this, from engine log: 2017-11-13 11:07:17,026-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum update: 398/570: 1:NetworkManager-team-1.8.0-11.el7_4.x86_64. 2017-11-13 11:07:30,573-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum obsoleting: 399/570: 1:NetworkManager-ppp-1.8.0-11.el7_4.x86_64. 2017-11-13 11:07:45,137-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum update: 400/570: 1:NetworkManager-tui-1.8.0-11.el7_4.x86_64. 2017-11-13 11:07:57,842-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum update: 401/570: audit-2.7.6-3.el7.x86_64. That's ~ 15 seconds per package, and the first 3 have no scripts at all. I'd say there is some serious storage issue there - bad disk, loaded storage network/hardware, something like this.
2. I added a comment to the not all logs are downloaded Jira regarded a workaround which would save the logs on a different location: https://ovirt-jira.atlassian.net/browse/OVIRT-1583
3. adding /tmp logs to the job logs: https://ovirt-jira.atlassian.net/browse/OVIRT-1764
Again, thank you for your help Didi.
Dafna
On 11/15/2017 09:01 AM, Yedidyah Bar David wrote:
On Tue, Nov 14, 2017 at 5:48 PM, Dafna Ron <dron@redhat.com> wrote:
Hi,
We had a failure in upgrade suite for 002_bootstrap.add_hosts. I am not seeing any error that can suggest on an issue in engine.
I can see in the host messages host that we have stopped writing to the log for 15 minutes and it may suggest that there is something that is keeping the host from starting which causes us to fail the test on timeout.
However, i can use some help in determining the cause for this failure and weather its connected to the bootstrap_add_host test in upgrade.
Link to suspected patches: As I said, I do not think its related, but this is the patch that was reported.
https://gerrit.ovirt.org/#/c/83854/
Link to Job:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/
Link to all logs:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/artifact/
(Relevant) error snippet from the log:
<error>
Test error:
Error Message
False != True after 900 seconds
Stacktrace
Traceback (most recent call last): File "/usr/lib64/python2.7/unittest/case.py", line 369, in run testMethod() File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 129, in wrapped_test test() File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 59, in wrapper return func(get_test_prefix(), *args, **kwargs) File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/upgrade-from-release-suite-master/test-scenarios-after-upgrade/002_bootstrap.py", line 187, in add_hosts testlib.assert_true_within(_host_is_up_4, timeout=15*60) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 263, in assert_true_within assert_equals_within(func, True, timeout, allowed_exceptions) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 237, in assert_equals_within '%s != %s after %s seconds' % (res, value, timeout) AssertionError: False != True after 900 seconds
Hi,
It took me way too long to find the log file that includes the above stack trace. There is a known bug that pressing '(all files in zip)' does not get all of them, only some. If we can't fix it, please add something that will create such a zip/tar/whatever as part of the job, so that we do not rely on jenkins.
Something causes things to be very slow, no idea what in particular, but could not find a real problem.
Since we timed out and killed the engine in the middle of host-deploy, we do not have full logs of it. These are copied to the engine from the host only after it finishes. Before that, the log is written on the host in /tmp. We might want to make ost collect that as well, to help debug similar cases.
What we can see in engine log is that we start installing the host [1]:
2017-11-13 11:00:12,331-05 INFO [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [45082420] Before Installation host 83090668-908c-4f49-8690-5348ec12f931, lago-upgrade-from-release-suite-master-host0 2017-11-13 11:00:12,354-05 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [45082420] START, SetVdsStatusVDSCommand(HostName = lago-upgrade-from-release-suite-master-host0, SetVdsStatusVDSCommandParameters:{hostId='83090668-908c-4f49-8690-5348ec12f931', status='Installing', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 22a831f7
First line from host-deploy is:
2017-11-13 11:00:13,335-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Stage: Initializing.
which is one second later, ok.
...
2017-11-13 11:00:13,459-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Stage: Environment packages setup.
It goes install (immediately) packages that it needs for itself, in this case it updates the package 'dmidecode', from 2.12-9 to 3.0-5. I guess our image is a bit old.
...
2017-11-13 11:00:19,516-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Verify: 2/2: dmidecode.x86_64 1:2.12-9.el7 - ud.
So it took (the equivalent of 'yum update dmidecode') 6 seconds. Reasonable.
...
Later on, it starts installing/updating the actual packages it should install/update. Starts with:
2017-11-13 11:00:24,982-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Status: Downloading Packages.
Then there are many lines about each package download. First is:
2017-11-13 11:00:25,049-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Download/Verify: GeoIP-1.5.0-11.el7.x86_64.
Last is:
2017-11-13 11:00:43,790-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Download/Verify: zlib-1.2.7-17.el7.x86_64.
Later we see it updates 570 packages. So downloading 570 packages took ~ 20 seconds, good too. Then it installs them. First is:
2017-11-13 11:00:46,114-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum update: 1/570: libgcc-4.8.5-16.el7.x86_64.
And then many similar ones. I only skimmed through them, trying to find large gaps - didn't write a script - and a large one I find is:
2017-11-13 11:10:04,522-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum updated: 506/570: audit.
2017-11-13 11:13:06,483-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum updated: 507/570: openssl-libs.
In between these two lines, there are a few unrelated ones, from some other engine thread. So something caused it to need 3 minutes to install openssl-libs.
A wild guess: Perhaps we have low entropy, and openssl does something that needs entropy?
Checking engine-setup log there [2], I see that the following command took almost 1.5 minutes to run:
2017-11-13 10:58:19,459-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.executeRaw:813 execute: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass'), executable='None', cwd='None', env={'pass': '**FILTERED**', 'LESSOPEN': '||/usr/bin/lesspipe.sh %s', 'SSH_CLIENT': '192.168.200.1 52372 22', 'SELINUX_USE_CURRENT_RANGE': '', 'LOGNAME': 'root', 'USER': 'root', 'OVIRT_ENGINE_JAVA_HOME': u'/usr/lib/jvm/jre', 'PATH': '/opt/rh/rh-postgresql95/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin', 'HOME': '/root', 'OVIRT_JBOSS_HOME': '/usr/share/ovirt-engine-wildfly', 'LD_LIBRARY_PATH': '/opt/rh/rh-postgresql95/root/usr/lib64', 'LANG': 'en_US.UTF-8', 'SHELL': '/bin/bash', 'LIBRARY_PATH': '/opt/rh/rh-postgresql95/root/usr/lib64', 'SHLVL': '4', 'POSTGRESQLENV': 'COMMAND/pg_dump=str:/opt/rh/rh-postgresql95/root/usr/bin/pg_dump COMMAND/psql=str:/opt/rh/rh-postgresql95/root/usr/bin/psql COMMAND/pg_restore=str:/opt/rh/rh-postgresql95/root/usr/bin/pg_restore COMMAND/postgresql-setup=str:/opt/rh/rh-postgresql95/root/usr/bin/postgresql-setup OVESETUP_PROVISIONING/postgresService=str:rh-postgresql95-postgresql OVESETUP_PROVISIONING/postgresConf=str:/var/opt/rh/rh-postgresql95/lib/pgsql/data/postgresql.conf OVESETUP_PROVISIONING/postgresPgHba=str:/var/opt/rh/rh-postgresql95/lib/pgsql/data/pg_hba.conf OVESETUP_PROVISIONING/postgresPgVersion=str:/var/opt/rh/rh-postgresql95/lib/pgsql/data/PG_VERSION', 'MANPATH': '/opt/rh/rh-postgresql95/root/usr/share/man:', 'X_SCLS': 'rh-postgresql95 ', 'XDG_RUNTIME_DIR': '/run/user/0', 'OVIRT_ENGINE_JAVA_HOME_FORCE': '1', 'PYTHONPATH': '/usr/share/ovirt-engine/setup/bin/..::', 'SELINUX_ROLE_REQUESTED': '', 'MAIL': '/var/mail/root', 'PKG_CONFIG_PATH': '/opt/rh/rh-postgresql95/root/usr/lib64/pkgconfig', 'XDG_SESSION_ID': '14', 'sclenv': 'rh-postgresql95', 'XDG_CONFIG_DIRS': '/etc/opt/rh/rh-postgresql95/xdg:/etc/xdg', 'JAVACONFDIRS': '/etc/opt/rh/rh-postgresql95/java:/etc/java', 'SELINUX_LEVEL_REQUESTED': '', 'XDG_DATA_DIRS': '/opt/rh/rh-postgresql95/root/usr/share', 'PWD': '/root', 'CPATH': '/opt/rh/rh-postgresql95/root/usr/include', 'OTOPI_LOGFILE': '/var/log/ovirt-engine/setup/ovirt-engine-setup-20171113105548-qrc7zo.log', 'SSH_CONNECTION': '192.168.200.1 52372 192.168.200.3 22', 'OTOPI_EXECDIR': '/root'} 2017-11-13 10:59:43,545-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.executeRaw:863 execute-result: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass'), rc=0 2017-11-13 10:59:43,546-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.execute:921 execute-output: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass') stdout: eyJhcnRpZmFjdCI6IkVudmVsb3BlUEJFIiwic2FsdCI6ImNjSmhFbHRnUEJxeUlNTUJSaU1OdFRYL3M0RGRJT1hOSWJjV2F1NFZGT0U9Iiwic2VjcmV0IjoiK2ZlTzVyZm9kNGlsVmZLRENaRjdseVVQZHZnWnBTWUF0cnBYUWVpQnJaTT0iLCJ2ZXJzaW9uIjoiMSIsIml0ZXJhdGlvbnMiOiI0MDAwIiwiYWxnb3JpdGhtIjoiUEJLREYyV2l0aEhtYWNTSEExIn0=
2017-11-13 10:59:43,546-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.execute:926 execute-output: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass') stderr:
This is almost always due to not enough entropy.
So please check if everything - from physical machines to lago/ost/libvirt/etc., makes sure to supply all VMs enough entropy. We usually do this using virtio-rng.
[1] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/artifact/... [2] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/artifact/...
lago log:
2017-11-13 15:31:23,212::log_utils.py::__enter__::600::lago.prefix::INFO::ESC[0mESC[0m 2017-11-13 15:31:23,213::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:0468ed2f-b174-4d94-bc66-2b6e08087a86:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:23,213::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:91372413-f7fd-4b72-85b9-9f5216ca7ae9:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:23,213::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:23,213::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:26,220::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:26,221::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:27,222::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:91372413-f7fd-4b72-85b9-9f5216ca7ae9:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:27,222::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:0468ed2f-b174-4d94-bc66-2b6e08087a86:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:27,222::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 4 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:27,222::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 4 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:28,224::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:fc88f641-b012-4636-a471-9ccaaf361a53:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:28,224::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:afb01a46-6338-407f-b7c9-9d5c6b91404d:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:28,224::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:28,225::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:28,226::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:29,228::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:afb01a46-6338-407f-b7c9-9d5c6b91404d:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:29,228::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:29,229::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:30,229::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:fc88f641-b012-4636-a471-9ccaaf361a53:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:30,230::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:9df5ddca-bbc4-485a-9f3a-b0b9a5bb5990:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:30,230::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 2 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:30,230::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:30,231::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:31,231::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:53fe67da-a632-49fe-b697-e8a2c4cb7d23:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:31,232::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:31,232::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:31,232::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:9df5ddca-bbc4-485a-9f3a-b0b9a5bb5990:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:31,233::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:32,234::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:53fe67da-a632-49fe-b697-e8a2c4cb7d23:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:32,234::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:32,234::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:1969bf5b-4e66-43d7-91c6-a28949e98fe8:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:32,234::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:32,235::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:33,235::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:5fd0e3c3-d83c-46f9-9c88-bc739aa9c430:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:33,235::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:33,236::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:1969bf5b-4e66-43d7-91c6-a28949e98fe8:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:33,236::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:33,237::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,238::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:5fd0e3c3-d83c-46f9-9c88-bc739aa9c430:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:34,238::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG::Got exception while sshing to lago-upgrade-from-release-suite-master-engine: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:34,238::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:22550d6e-c397-43dc-9691-c8d9debfe416:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,239::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,407::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:22550d6e-c397-43dc-9691-c8d9debfe416:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,639::ssh.py::ssh::58::lago.ssh::DEBUG::Running bb49adfa on lago-upgrade-from-release-suite-master-host0: true 2017-11-13 15:31:34,665::ssh.py::ssh::81::lago.ssh::DEBUG::Command bb49adfa on lago-upgrade-from-release-suite-master-host0 returned with 0 2017-11-13 15:31:34,665::ssh.py::wait_for_ssh::153::lago.ssh::DEBUG::Wait succeeded for ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,665::log_utils.py::__exit__::611::lago.prefix::INFO::ESC[32mSuccessESC[0m (in 0:00:11) 2017-11-13 15:31:34,666::log_utils.py::__enter__::600::lago.prefix::INFO::ESC[0mESC[0m 2017-11-13 15:31:34,666::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:4e2a7421-28f7-4679-955f-7261fe377939:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,667::ssh.py::get_ssh_client::339::lago.ssh::DEBUG::Still got 100 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,805::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:4e2a7421-28f7-4679-955f-7261fe377939:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,906::ssh.py::ssh::58::lago.ssh::DEBUG::Running bb724bf2 on lago-upgrade-from-release-suite-master-host0: bash -s < "set -xe DIST=$(uname -r | sed -r 's/^.*\.([^\.]+)\.[^\.]+$/\1/')
from host:
ions. Proceeding anyway. Nov 13 10:31:34 lago-upgrade-from-release-suite-master-host0 systemd: Configuration file /usr/lib/systemd/system/ebtables.service is marked executable. Please remove executable permission bits. Proceeding anyway. Nov 13 10:31:34 lago-upgrade-from-release-suite-master-host0 systemd: Configuration file /usr/lib/systemd/system/wpa_supplicant.service is marked executable. Please remove executable permission bits. Proceeding anyway. Nov 13 10:31:55 lago-upgrade-from-release-suite-master-host0 yum[1294]: Updated: iptables-1.4.21-18.2.el7_4.x86_64 Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 3. Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice. Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Created slice user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Started Session 4 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting Session 4 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: New session 4 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 4. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Created slice user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Started Session 5 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting Session 5 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: New session 5 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 5. Nov 13 10:44:01 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice. Nov 13 10:44:01 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice. Nov 13 10:46:27 lago-upgrade-from-release-suite-master-host0 systemd: Starting Cleanup of Temporary Directories... Nov 13 10:46:27 lago-upgrade-from-release-suite-master-host0 systemd: Started Cleanup of Temporary Directories. Nov 13 10:54:36 lago-upgrade-from-release-suite-master-host0 dhclient[906]: DHCPREQUEST on eth0 to 192.168.200.1 port 67 (xid=0x564629cb) Nov 13 10:54:36 lago-upgrade-from-release-suite-master-host0 dhclient[906]: DHCPACK from 192.168.200.1 (xid=0x564629cb)
</error>
-- Didi

On Wed, Nov 15, 2017 at 4:00 PM, Yedidyah Bar David <didi@redhat.com> wrote:
On Wed, Nov 15, 2017 at 1:35 PM, Dafna Ron <dron@redhat.com> wrote:
Didi,
Thank you for your detailed explanation and for taking the time to debug this issue.
I opened the following Jira's:
1. for increasing entropy in the hosts: https://ovirt-jira.atlassian.net/browse/OVIRT-1763
I do not think anymore this is the main reason for slowness, although it might still be useful to verify/improve.
yuvalt pointed out in a private discussion that openssl lib does nothing related to random numbers in its pre/post install scripts.
It does call ldconfig, as do several other packages, which can take quite a lot of time. Some took less. 3 minutes is definitely not reasonable.
Also see this, from engine log:
2017-11-13 11:07:17,026-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum update: 398/570: 1:NetworkManager-team-1.8.0-11.el7_4.x86_64. 2017-11-13 11:07:30,573-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum obsoleting: 399/570: 1:NetworkManager-ppp-1.8.0-11.el7_4.x86_64. 2017-11-13 11:07:45,137-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum update: 400/570: 1:NetworkManager-tui-1.8.0-11.el7_4.x86_64. 2017-11-13 11:07:57,842-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum update: 401/570: audit-2.7.6-3.el7.x86_64.
That's ~ 15 seconds per package, and the first 3 have no scripts at all.
I'd say there is some serious storage issue there - bad disk, loaded storage network/hardware, something like this.
That was my guess as well. Sometimes selinux relabeling takes time (I believe ovirt console does that). Y.
2. I added a comment to the not all logs are downloaded Jira regarded a workaround which would save the logs on a different location: https://ovirt-jira.atlassian.net/browse/OVIRT-1583
3. adding /tmp logs to the job logs: https://ovirt-jira.atlassian.net/browse/OVIRT-1764
Again, thank you for your help Didi.
Dafna
On 11/15/2017 09:01 AM, Yedidyah Bar David wrote:
On Tue, Nov 14, 2017 at 5:48 PM, Dafna Ron <dron@redhat.com> wrote:
Hi,
We had a failure in upgrade suite for 002_bootstrap.add_hosts. I am not seeing any error that can suggest on an issue in engine.
I can see in the host messages host that we have stopped writing to
for 15 minutes and it may suggest that there is something that is keeping the host from starting which causes us to fail the test on timeout.
However, i can use some help in determining the cause for this failure and weather its connected to the bootstrap_add_host test in upgrade.
Link to suspected patches: As I said, I do not think its related, but
is the patch that was reported.
https://gerrit.ovirt.org/#/c/83854/
Link to Job:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3795/
Link to all logs:
http://jenkins.ovirt.org/job/ovirt-master_change-queue- tester/3795/artifact/
(Relevant) error snippet from the log:
<error>
Test error:
Error Message
False != True after 900 seconds
Stacktrace
Traceback (most recent call last): File "/usr/lib64/python2.7/unittest/case.py", line 369, in run testMethod() File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 129, in wrapped_test test() File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 59, in wrapper return func(get_test_prefix(), *args, **kwargs) File "/home/jenkins/workspace/ovirt-master_change-queue- tester/ovirt-system-tests/upgrade-from-release-suite- master/test-scenarios-after-upgrade/002_bootstrap.py", line 187, in add_hosts testlib.assert_true_within(_host_is_up_4, timeout=15*60) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 263, in assert_true_within assert_equals_within(func, True, timeout, allowed_exceptions) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 237, in assert_equals_within '%s != %s after %s seconds' % (res, value, timeout) AssertionError: False != True after 900 seconds
Hi,
It took me way too long to find the log file that includes the above stack trace. There is a known bug that pressing '(all files in zip)' does not get all of them, only some. If we can't fix it, please add something that will create such a zip/tar/whatever as part of the job, so that we do not rely on jenkins.
Something causes things to be very slow, no idea what in particular, but could not find a real problem.
Since we timed out and killed the engine in the middle of host-deploy, we do not have full logs of it. These are copied to the engine from the host only after it finishes. Before that, the log is written on the host in /tmp. We might want to make ost collect that as well, to help debug similar cases.
What we can see in engine log is that we start installing the host [1]:
2017-11-13 11:00:12,331-05 INFO [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [45082420] Before Installation host 83090668-908c-4f49-8690-5348ec12f931, lago-upgrade-from-release-suite-master-host0 2017-11-13 11:00:12,354-05 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [45082420] START, SetVdsStatusVDSCommand(HostName = lago-upgrade-from-release-suite-master-host0, SetVdsStatusVDSCommandParameters:{hostId='83090668-908c- 4f49-8690-5348ec12f931', status='Installing', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 22a831f7
First line from host-deploy is:
2017-11-13 11:00:13,335-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Stage: Initializing.
which is one second later, ok.
...
2017-11-13 11:00:13,459-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Stage: Environment packages setup.
It goes install (immediately) packages that it needs for itself, in this case it updates the package 'dmidecode', from 2.12-9 to 3.0-5. I guess our image is a bit old.
...
2017-11-13 11:00:19,516-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Verify: 2/2: dmidecode.x86_64 1:2.12-9.el7 - ud.
So it took (the equivalent of 'yum update dmidecode') 6 seconds. Reasonable.
...
Later on, it starts installing/updating the actual packages it should install/update. Starts with:
2017-11-13 11:00:24,982-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Status: Downloading Packages.
Then there are many lines about each package download. First is:
2017-11-13 11:00:25,049-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Download/Verify: GeoIP-1.5.0-11.el7.x86_64.
Last is:
2017-11-13 11:00:43,790-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum Download/Verify: zlib-1.2.7-17.el7.x86_64.
Later we see it updates 570 packages. So downloading 570 packages took ~ 20 seconds, good too. Then it installs them. First is:
2017-11-13 11:00:46,114-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum update: 1/570: libgcc-4.8.5-16.el7.x86_64.
And then many similar ones. I only skimmed through them, trying to find large gaps - didn't write a script - and a large one I find is:
2017-11-13 11:10:04,522-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum updated: 506/570: audit.
2017-11-13 11:13:06,483-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [45082420] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host0. Yum updated: 507/570: openssl-libs.
In between these two lines, there are a few unrelated ones, from some other engine thread. So something caused it to need 3 minutes to install openssl-libs.
A wild guess: Perhaps we have low entropy, and openssl does something that needs entropy?
Checking engine-setup log there [2], I see that the following command took almost 1.5 minutes to run:
2017-11-13 10:58:19,459-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.executeRaw:813 execute: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass'), executable='None', cwd='None', env={'pass': '**FILTERED**', 'LESSOPEN': '||/usr/bin/lesspipe.sh %s', 'SSH_CLIENT': '192.168.200.1 52372 22', 'SELINUX_USE_CURRENT_RANGE': '', 'LOGNAME': 'root', 'USER': 'root', 'OVIRT_ENGINE_JAVA_HOME': u'/usr/lib/jvm/jre', 'PATH': '/opt/rh/rh-postgresql95/root/usr/bin:/usr/local/sbin:/usr/ local/bin:/usr/sbin:/usr/bin', 'HOME': '/root', 'OVIRT_JBOSS_HOME': '/usr/share/ovirt-engine-wildfly', 'LD_LIBRARY_PATH': '/opt/rh/rh-postgresql95/root/usr/lib64', 'LANG': 'en_US.UTF-8', 'SHELL': '/bin/bash', 'LIBRARY_PATH': '/opt/rh/rh-postgresql95/root/usr/lib64', 'SHLVL': '4', 'POSTGRESQLENV': 'COMMAND/pg_dump=str:/opt/rh/rh-postgresql95/root/usr/bin/pg_dump COMMAND/psql=str:/opt/rh/rh-postgresql95/root/usr/bin/psql COMMAND/pg_restore=str:/opt/rh/rh-postgresql95/root/usr/bin/pg_restore COMMAND/postgresql-setup=str:/opt/rh/rh-postgresql95/root/ usr/bin/postgresql-setup OVESETUP_PROVISIONING/postgresService=str:rh-
the log this postgresql95-postgresql
OVESETUP_PROVISIONING/postgresConf=str:/var/opt/rh/
rh-postgresql95/lib/pgsql/data/postgresql.conf
OVESETUP_PROVISIONING/postgresPgHba=str:/var/opt/rh/
rh-postgresql95/lib/pgsql/data/pg_hba.conf
OVESETUP_PROVISIONING/postgresPgVersion=str:/var/
opt/rh/rh-postgresql95/lib/pgsql/data/PG_VERSION',
'MANPATH': '/opt/rh/rh-postgresql95/root/usr/share/man:', 'X_SCLS': 'rh-postgresql95 ', 'XDG_RUNTIME_DIR': '/run/user/0', 'OVIRT_ENGINE_JAVA_HOME_FORCE': '1', 'PYTHONPATH': '/usr/share/ovirt-engine/setup/bin/..::', 'SELINUX_ROLE_REQUESTED': '', 'MAIL': '/var/mail/root', 'PKG_CONFIG_PATH': '/opt/rh/rh-postgresql95/root/usr/lib64/pkgconfig', 'XDG_SESSION_ID': '14', 'sclenv': 'rh-postgresql95', 'XDG_CONFIG_DIRS': '/etc/opt/rh/rh-postgresql95/xdg:/etc/xdg', 'JAVACONFDIRS': '/etc/opt/rh/rh-postgresql95/java:/etc/java', 'SELINUX_LEVEL_REQUESTED': '', 'XDG_DATA_DIRS': '/opt/rh/rh-postgresql95/root/usr/share', 'PWD': '/root', 'CPATH': '/opt/rh/rh-postgresql95/root/usr/include', 'OTOPI_LOGFILE': '/var/log/ovirt-engine/setup/ovirt-engine-setup- 20171113105548-qrc7zo.log', 'SSH_CONNECTION': '192.168.200.1 52372 192.168.200.3 22', 'OTOPI_EXECDIR': '/root'} 2017-11-13 10:59:43,545-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.executeRaw:863 execute-result: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass'), rc=0 2017-11-13 10:59:43,546-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.execute:921 execute-output: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass') stdout: eyJhcnRpZmFjdCI6IkVudmVsb3BlUEJFIiwic2FsdCI6ImNjSmhFbHRnUEJx eUlNTUJSaU1OdFRYL3M0RGRJT1hOSWJjV2F1NFZGT0U9Iiwic2VjcmV0Ijoi K2ZlTzVyZm9kNGlsVmZLRENaRjdseVVQZHZnWnBTWUF0cnBYUWVpQnJaTT0i LCJ2ZXJzaW9uIjoiMSIsIml0ZXJhdGlvbnMiOiI0MDAwIiwiYWxnb3JpdGht IjoiUEJLREYyV2l0aEhtYWNTSEExIn0=
2017-11-13 10:59:43,546-0500 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.sso plugin.execute:926 execute-output: ('/usr/share/ovirt-engine/bin/ovirt-engine-crypto-tool.sh', 'pbe-encode', '--password=env:pass') stderr:
This is almost always due to not enough entropy.
So please check if everything - from physical machines to lago/ost/libvirt/etc., makes sure to supply all VMs enough entropy. We usually do this using virtio-rng.
[1] http://jenkins.ovirt.org/job/ovirt-master_change-queue- tester/3795/artifact/exported-artifacts/upgrade-from- release-suit-master-el7/test_logs/upgrade-from-release- suite-master/post-002_bootstrap.py/lago-upgrade-from-release-suite-master- engine/_var_log/ovirt-engine/engine.log [2] http://jenkins.ovirt.org/job/ovirt-master_change-queue- tester/3795/artifact/exported-artifacts/upgrade-from- release-suit-master-el7/test_logs/upgrade-from-release- suite-master/post-002_bootstrap.py/lago-upgrade-from-release-suite-master- engine/_var_log/ovirt-engine/setup/ovirt-engine-setup- 20171113105548-qrc7zo.log
lago log:
2017-11-13 15:31:23,212::log_utils.py::__enter__::600::lago.prefix:: INFO::ESC[0mESC[0m 2017-11-13 15:31:23,213::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:0468ed2f-b174-4d94-bc66-2b6e08087a86:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:23,213::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:91372413-f7fd-4b72-85b9-9f5216ca7ae9:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:23,213::ssh.py::get_ssh_ client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:23,213::ssh.py::get_ssh_ client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:26,220::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:26,221::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:27,222::log_utils.py::__ exit__::611::lago.ssh::DEBUG::end task:91372413-f7fd-4b72-85b9-9f5216ca7ae9:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:27,222::log_utils.py::__ exit__::611::lago.ssh::DEBUG::end task:0468ed2f-b174-4d94-bc66-2b6e08087a86:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:27,222::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG: :Got exception while sshing to lago-upgrade-from-release- suite-master-engine: Timed out (in 4 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:27,222::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG: :Got exception while sshing to lago-upgrade-from-release- suite-master-host0: Timed out (in 4 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:28,224::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:fc88f641-b012-4636-a471-9ccaaf361a53:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:28,224::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:afb01a46-6338-407f-b7c9-9d5c6b91404d:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:28,224::ssh.py::get_ssh_ client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:28,225::ssh.py::get_ssh_ client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:28,226::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:29,228::log_utils.py::__ exit__::611::lago.ssh::DEBUG::end task:afb01a46-6338-407f-b7c9-9d5c6b91404d:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:29,228::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG: :Got exception while sshing to lago-upgrade-from-release- suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:29,229::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:30,229::log_utils.py::__ exit__::611::lago.ssh::DEBUG::end task:fc88f641-b012-4636-a471-9ccaaf361a53:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:30,230::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:9df5ddca-bbc4-485a-9f3a-b0b9a5bb5990:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:30,230::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG: :Got exception while sshing to lago-upgrade-from-release- suite-master-engine: Timed out (in 2 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:30,230::ssh.py::get_ssh_ client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:30,231::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:31,231::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:53fe67da-a632-49fe-b697-e8a2c4cb7d23:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:31,232::ssh.py::get_ssh_ client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:31,232::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:31,232::log_utils.py::__ exit__::611::lago.ssh::DEBUG::end task:9df5ddca-bbc4-485a-9f3a-b0b9a5bb5990:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:31,233::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG: :Got exception while sshing to lago-upgrade-from-release- suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:32,234::log_utils.py::__ exit__::611::lago.ssh::DEBUG::end task:53fe67da-a632-49fe-b697-e8a2c4cb7d23:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:32,234::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG: :Got exception while sshing to lago-upgrade-from-release- suite-master-engine: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:32,234::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:1969bf5b-4e66-43d7-91c6-a28949e98fe8:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:32,234::ssh.py::get_ssh_ client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:32,235::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-host0: [Errno None] Unable to connect to port 22 on 192.168.200.2 2017-11-13 15:31:33,235::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:5fd0e3c3-d83c-46f9-9c88-bc739aa9c430:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:33,235::ssh.py::get_ssh_ client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:33,236::log_utils.py::__ exit__::611::lago.ssh::DEBUG::end task:1969bf5b-4e66-43d7-91c6-a28949e98fe8:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:33,236::ssh.py::get_ssh_client::354::lago.ssh::DEBUG::Socket error connecting to lago-upgrade-from-release-suite-master-engine: [Errno None] Unable to connect to port 22 on 192.168.200.3 2017-11-13 15:31:33,237::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG: :Got exception while sshing to lago-upgrade-from-release- suite-master-host0: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,238::log_utils.py::__ exit__::611::lago.ssh::DEBUG::end task:5fd0e3c3-d83c-46f9-9c88-bc739aa9c430:Get ssh client for lago-upgrade-from-release-suite-master-engine: 2017-11-13 15:31:34,238::ssh.py::wait_for_ssh::129::lago.ssh::DEBUG: :Got exception while sshing to lago-upgrade-from-release- suite-master-engine: Timed out (in 1 s) trying to ssh to lago-upgrade-from-release-suite-master-engine 2017-11-13 15:31:34,238::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:22550d6e-c397-43dc-9691-c8d9debfe416:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,239::ssh.py::get_ssh_ client::339::lago.ssh::DEBUG::Still got 1 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,407::log_utils.py::__ exit__::611::lago.ssh::DEBUG::end task:22550d6e-c397-43dc-9691-c8d9debfe416:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,639::ssh.py::ssh::58::lago.ssh::DEBUG::Running bb49adfa on lago-upgrade-from-release-suite-master-host0: true 2017-11-13 15:31:34,665::ssh.py::ssh::81::lago.ssh::DEBUG::Command bb49adfa on lago-upgrade-from-release-suite-master-host0 returned with 0 2017-11-13 15:31:34,665::ssh.py::wait_for_ssh::153::lago.ssh::DEBUG: :Wait succeeded for ssh to lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,665::log_utils.py::__exit__::611::lago.prefix:: INFO::ESC[32mSuccessESC[0m (in 0:00:11) 2017-11-13 15:31:34,666::log_utils.py::__enter__::600::lago.prefix:: INFO::ESC[0mESC[0m 2017-11-13 15:31:34,666::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:4e2a7421-28f7-4679-955f-7261fe377939:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,667::ssh.py::get_ssh_ client::339::lago.ssh::DEBUG::Still got 100 tries for lago-upgrade-from-release-suite-master-host0 2017-11-13 15:31:34,805::log_utils.py::__ exit__::611::lago.ssh::DEBUG::end task:4e2a7421-28f7-4679-955f-7261fe377939:Get ssh client for lago-upgrade-from-release-suite-master-host0: 2017-11-13 15:31:34,906::ssh.py::ssh::58::lago.ssh::DEBUG::Running bb724bf2 on lago-upgrade-from-release-suite-master-host0: bash -s < "set -xe DIST=$(uname -r | sed -r 's/^.*\.([^\.]+)\.[^\.]+$/\1/')
from host:
ions. Proceeding anyway. Nov 13 10:31:34 lago-upgrade-from-release-suite-master-host0 systemd: Configuration file /usr/lib/systemd/system/ebtables.service is marked executable. Please remove executable permission bits. Proceeding anyway. Nov 13 10:31:34 lago-upgrade-from-release-suite-master-host0 systemd: Configuration file /usr/lib/systemd/system/wpa_supplicant.service is marked executable. Please remove executable permission bits. Proceeding anyway. Nov 13 10:31:55 lago-upgrade-from-release-suite-master-host0 yum[1294]: Updated: iptables-1.4.21-18.2.el7_4.x86_64 Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 3. Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice. Nov 13 10:31:56 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Created slice user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Started Session 4 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting Session 4 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: New session 4 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 4. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Created slice user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting user-0.slice. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Started Session 5 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd: Starting Session 5 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: New session 5 of user root. Nov 13 10:44:00 lago-upgrade-from-release-suite-master-host0 systemd-logind: Removed session 5. Nov 13 10:44:01 lago-upgrade-from-release-suite-master-host0 systemd: Removed slice user-0.slice. Nov 13 10:44:01 lago-upgrade-from-release-suite-master-host0 systemd: Stopping user-0.slice. Nov 13 10:46:27 lago-upgrade-from-release-suite-master-host0 systemd: Starting Cleanup of Temporary Directories... Nov 13 10:46:27 lago-upgrade-from-release-suite-master-host0 systemd: Started Cleanup of Temporary Directories. Nov 13 10:54:36 lago-upgrade-from-release-suite-master-host0 dhclient[906]: DHCPREQUEST on eth0 to 192.168.200.1 port 67 (xid=0x564629cb) Nov 13 10:54:36 lago-upgrade-from-release-suite-master-host0 dhclient[906]: DHCPACK from 192.168.200.1 (xid=0x564629cb)
</error>
-- Didi
participants (3)
-
Dafna Ron
-
Yaniv Kaul
-
Yedidyah Bar David