Lago + ovirt-system-tests run fail on collecting logs

After running:

    ./run_suite.sh --cleanup basic_suite_3.6
    ./run_suite.sh basic_suite_3.6

the tests failed at [1].

Does it mean the cleanup stage failed? Why would log collection fail on a duplicate file? I think it would be beneficial to support multiple logs and rotate them, like we do in oVirt for example, since a developer might run a few times without cleaning up the env.

Do we have an open issue for it? I didn't find anything similar.

[1]
    @ Collect artifacts: ERROR (in 0:00:00)
    Error occured, aborting
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 248, in do_run
        self.cli_plugins[args.ovirtverb].do_run(args)
      File "/usr/lib/python2.7/site-packages/lago/plugins/cli.py", line 180, in do_run
        self._do_run(**vars(args))
      File "/usr/lib/python2.7/site-packages/lago/utils.py", line 501, in wrapper
        return func(*args, **kwargs)
      File "/usr/lib/python2.7/site-packages/lago/utils.py", line 512, in wrapper
        return func(*args, prefix=prefix, **kwargs)
      File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 217, in do_ovirt_collect
        prefix.collect_artifacts(output)
      File "/usr/lib/python2.7/site-packages/lago/log_utils.py", line 598, in wrapper
        return func(*args, **kwargs)
      File "/usr/lib/python2.7/site-packages/lago/prefix.py", line 928, in collect_artifacts
        os.makedirs(output_dir)
      File "/usr/lib64/python2.7/os.py", line 157, in makedirs
        mkdir(name, mode)
    OSError: [Errno 17] File exists: '/home/eedri/lago/ovirt-system-tests/test_logs/basic_suite_3.6/post-001_initialize_engine.py'

--
Eyal Edri
Associate Manager
RHEV DevOps
EMEA ENG Virtualization R&D
Red Hat Israel

phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)
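For readers following the traceback: the last frame is a plain os.makedirs() call, which raises OSError with errno 17 (EEXIST) whenever the target already exists. Below is a minimal sketch of a tolerant variant. The helper name ensure_dir is hypothetical and this is not lago's code; it only illustrates one way such a call could be guarded.

```python
import errno
import os
import tempfile


def ensure_dir(path):
    """Create path (and parents) unless it already exists as a directory.

    Hypothetical helper for illustration; not part of lago.
    """
    try:
        os.makedirs(path)
    except OSError as err:
        # Errno 17 (EEXIST) is what the traceback reports. Only swallow it
        # when the existing entry really is a directory; re-raise otherwise.
        if err.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

Note that tolerating the existing directory means logs from consecutive runs end up mixed in one place, which may or may not be wanted. On Python 3.2 and later, os.makedirs(path, exist_ok=True) gives similar behavior, but that parameter is not available on the Python 2.7 shown in the traceback.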

On Sun, Apr 10, 2016 at 6:06 PM, Eyal Edri <eedri@redhat.com> wrote:
test_logs/
This is an annoying change of behavior. In the past, I believe the logs were under the deployment dir. Now, they are here. It requires cleaning them manually every time. It's part of issues we'll have to fix if we want (and I believe we do) support multiple execution. I consider it as a regression in a way, since it's a changed behavior - and I'm not sure for the better. Y.

I agree. I saw [1], but it's attached to an abandoned patch in gerrit with no pull request to replace it.

[1] https://github.com/lago-project/lago/issues/11

-- Eyal Edri

On 04/10 20:52, Yaniv Kaul wrote:
On Sun, Apr 10, 2016 at 6:06 PM, Eyal Edri <eedri@redhat.com> wrote:
test_logs/
This is an annoying change of behavior. In the past, I believe the logs were under the deployment dir. Now, they are here. It requires cleaning them manually every time.
Before it also required manual cleanup every time, it just turned out, that while doing the manual cleanup of the prefix (with ./run_suite -c) the logs were removed too (that was also an issue on jenkins, as you had to extract the logs before the cleanup)
It's part of issues we'll have to fix if we want (and I believe we do) support multiple execution.
It supports multiple execution as long as you are not running the same suite, same as before. The issue here is that you are using a very specific flow that is not used anywhere else, and thus facing issues and use cases that no one else has. I really recommend one of:
* Moving to the same flow jenkins uses
* Moving jenkins to the same flow you use
I consider it as a regression in a way, since it's a changed behavior - and I'm not sure for the better.
It changed behavior, yes, and it improved the log collection and cleanup procedures on jenkins. To alleviate that issue, I sent a patch for that in the beginning of the lago project that was never merged. Feel free to open a task for that too; it should be relatively easy to implement some kind of log rotation if the destination directory already exists.
Y.
_______________________________________________ lago-devel mailing list lago-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/lago-devel
--
David Caro

Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D

Tel.: +420 532 294 605
Email: dcaro@redhat.com
IRC: dcaro|dcaroest@{freenode|oftc|redhat}
Web: www.redhat.com
RHT Global #: 82-62605
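The "log rotation if the destination directory already exists" idea mentioned in this message could look roughly like the sketch below. It is only an illustration under assumptions: the function names rotate_dir and fresh_output_dir, the numeric ".N" suffix scheme, and the keep limit are all hypothetical, and this is not the unmerged patch referred to above.

```python
import os
import shutil
import tempfile


def rotate_dir(path, keep=5):
    """Rename an existing path to path.1, shifting older copies up by one.

    Hypothetical helper: the ".N" naming and the keep limit are assumptions.
    """
    if not os.path.exists(path):
        return
    # Drop the oldest copy so the number of kept rotations stays bounded.
    oldest = "%s.%d" % (path, keep)
    if os.path.exists(oldest):
        shutil.rmtree(oldest)
    # Shift path.N to path.(N+1), starting from the highest index so that
    # no rename overwrites a copy that has not been moved yet.
    for idx in range(keep - 1, 0, -1):
        src = "%s.%d" % (path, idx)
        if os.path.exists(src):
            os.rename(src, "%s.%d" % (path, idx + 1))
    os.rename(path, path + ".1")


def fresh_output_dir(path, keep=5):
    """Rotate any previous output away, then create an empty directory."""
    rotate_dir(path, keep)
    os.makedirs(path)
    return path
```

With something like this in front of the os.makedirs() call, a second run of the same suite would find its previous logs under test_logs/basic_suite_3.6.1 instead of failing with "File exists".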

On Wed, Apr 13, 2016 at 1:05 PM, David Caro <dcaro@redhat.com> wrote:
On 04/10 20:52, Yaniv Kaul wrote:
On Sun, Apr 10, 2016 at 6:06 PM, Eyal Edri <eedri@redhat.com> wrote:
test_logs/
This is an annoying change of behavior. In the past, I believe the logs were under the deployment dir. Now, they are here. It requires cleaning them manually every time.
Before it also required manual cleanup every time, it just turned out, that while doing the manual cleanup of the prefix (with ./run_suite -c) the logs were removed too (that was also an issue on jenkins, as you had to extract the logs before the cleanup)
1. Makes sense to me that you'll extract the logs before cleanup.
2. It did not cause a re-run to fail, which now, unless you cleanup AND rm the files, it will.
It's part of issues we'll have to fix if we want (and I believe we do) support multiple execution.
Yep.
It supports multiple execution as long as you are not running the same suite, same as before. The issue here is that you are using a very specific flow that is not used anywhere else, and thus facing issues and use cases that no one else has. I really recommend one of:
* Moving to the same flow jenkins uses
* Moving jenkins to the same flow you use
I'm running

    ./run_suite.sh -o /home/zram/3.6 basic_suite_3.6

and cleanup:

    lagocli --prefix-path /home/zram/3.6/current cleanup

over and over and over and over...

What should I be running?
Y.

On 04/13 13:35, Yaniv Kaul wrote:
On Wed, Apr 13, 2016 at 1:05 PM, David Caro <dcaro@redhat.com> wrote:
Before it also required manual cleanup every time, it just turned out, that while doing the manual cleanup of the prefix (with ./run_suite -c) the logs were removed too (that was also an issue on jenkins, as you had to extract the logs before the cleanup)
1. Makes sense to me that you'll extract the logs before cleanup.
The collect command is the one extracting the logs. What we did before was extract them into the same prefix (that's what I think is wrong: the prefix is one thing, and the artifacts you collect are another).
2. It did not cause a re-run to fail, which now, unless you cleanup AND rm the files, it will.
It's part of issues we'll have to fix if we want (and I believe we do) support multiple execution.
Yep.
It supports multiple execution as long as you are not running the same suite, same as before, the issue here is that you are using a very specific flow that is not used anywhere else, and thus, facing issues and use cases that no one else has. I really recommend:
* Moving to the same flow jenkins uses
* Moving jenkins to the same flow you use
I'm running

    ./run_suite.sh -o /home/zram/3.6 basic_suite_3.6

and cleanup:

    lagocli --prefix-path /home/zram/3.6/current cleanup

over and over and over and over...

What should I be running?

    rm -rf test_logs
    ./run_suite.sh basic_suite_3.6
    ./run_suite.sh --cleanup basic_suite_3.6

That's the jenkins flow, more or less. You could even run something closer by using:

    mock_runner.sh --execute-script automation/basic_suite_3.6.sh fc22

That is much closer to what jenkins runs, and it already takes care of the test_logs directory, moving it to exported-artifacts (where jenkins archives it) and removing it if it exists. Though you will not have any caching and zram execution (just as the jenkins slaves don't have it).

I consider it as a regression in a way, since it's a changed behavior - and I'm not sure for the better.
It changed behavior yes, and it improved the log collection and cleanup procedures on jenkins. To alleviate that issue, I sent a patch for that in the beginning of the lago project that was never merged, feel free to open a task for that too, should be relatively easy to implement some kind of log rotation if the destination directory already exists.
Y.
-- David Caro
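The handling of test_logs described in this message (move it to exported-artifacts, removing what is already there) can be sketched as below. This is not mock_runner.sh's actual implementation, which is a shell script not shown in the thread. The function name export_logs is hypothetical, and reading "removing it if it exists" as "remove a stale copy at the destination" is an assumption.

```python
import os
import shutil
import tempfile


def export_logs(logs_dir, artifacts_dir):
    """Move logs_dir under artifacts_dir, replacing any previous copy.

    Hypothetical helper; returns the new location, or None if there was
    nothing to move.
    """
    if not os.path.isdir(logs_dir):
        return None
    if not os.path.isdir(artifacts_dir):
        os.makedirs(artifacts_dir)
    # normpath strips a trailing slash so basename is never empty.
    name = os.path.basename(os.path.normpath(logs_dir))
    dest = os.path.join(artifacts_dir, name)
    # Assumption: a leftover copy from an earlier run is simply discarded.
    if os.path.exists(dest):
        shutil.rmtree(dest)
    shutil.move(logs_dir, dest)
    return dest
```

Because the source directory is moved rather than copied, the next run starts without a test_logs directory, which is the state the "rm -rf test_logs" step in the flow above also produces.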

On Wed, Apr 13, 2016 at 2:02 PM, David Caro <dcaro@redhat.com> wrote:
Though you will not have any caching and zram execution (just as the jenkins slaves don't have it).
That should change. The sooner the better. If they have enough ram, they can actually run in /dev/shm/something. That's what I'm doing with the VDSM functional tests. Y.
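The "if they have enough RAM, run in /dev/shm" suggestion amounts to a capacity check before picking a working directory. A minimal sketch, assuming a POSIX host: the function name pick_workdir, the fallback path, and the idea of passing a required size are all hypothetical, and this does not describe how the Jenkins slaves or the VDSM functional tests are actually set up.

```python
import os
import tempfile


def pick_workdir(required_bytes, ram_dir="/dev/shm", fallback="/var/tmp"):
    """Return ram_dir if it has at least required_bytes free, else fallback.

    Hypothetical helper; uses os.statvfs, so it only works on POSIX hosts.
    """
    try:
        stats = os.statvfs(ram_dir)
    except OSError:
        # ram_dir is missing or not accessible on this host.
        return fallback
    # Blocks available to unprivileged users times the fragment size.
    free_bytes = stats.f_bavail * stats.f_frsize
    return ram_dir if free_bytes >= required_bytes else fallback
```

A caller could pass the result to run_suite.sh via its -o option, as in the /home/zram/3.6 invocation quoted earlier in the thread. How much space a suite actually needs is not stated here and would have to be measured.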

On 04/13 15:01, Yaniv Kaul wrote:
On Wed, Apr 13, 2016 at 2:02 PM, David Caro <dcaro@redhat.com> wrote:
Though you will not have any caching and zram execution (just as the jenkins slaves don't have it).
That should change. The sooner the better.
So as I said, there's a mismatch between what you do and what everyone else does. It should be synced, either one side or the other. In any case, please open a task and prioritize it properly: if it only affects the test scripts (like running on RAM), then in bugzilla; if it affects lago, then on github.
If they have enough ram, they can actually run in /dev/shm/something. That's what I'm doing with the VDSM functional tests. Y.
-- David Caro
participants (3)
- David Caro
- Eyal Edri
- Yaniv Kaul