Adding +Martin Perina <mperina(a)redhat.com>
On Mon, Mar 15, 2021 at 9:59 AM Yedidyah Bar David <didi(a)redhat.com> wrote:
On Mon, Mar 15, 2021 at 7:55 AM Yedidyah Bar David
<didi(a)redhat.com>
wrote:
>
> Hi all,
>
> This started a few days ago [1] and randomly happens since then:
>
> E DEBUG: Configuration:
> E DEBUG: command: collect
> E DEBUG: Traceback (most recent call last):
> E DEBUG: File
> "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py",
> line 2067, in <module>
> E DEBUG: '%s directory is not empty.' %
(conf["local_tmp_dir"])
> E DEBUG: Exception: /dev/shm/log directory is not
> empty.ERROR: /dev/shm/log directory is not empty.non-zero return code
>
> Michal tried to fix this by using a random directory but it still fails
[2]:
>
> DEBUG: command: collect
> DEBUG: Traceback (most recent call last):
> DEBUG: File
"/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py",
> line 2067, in <module>
> DEBUG: '%s directory is not empty.' % (conf["local_tmp_dir"])
> DEBUG: Exception: /dev/shm/kaN7uY directory is not empty.ERROR:
> /dev/shm/kaN7uY directory is not empty.non-zero return code
>
> Since I suppose that the randomness of mktemp is good enough, it must
> be something else. Also, the last successful run before [1] used the
> same OST git commit (same code), so I do not think it's something in
> OST's code.
>
> Any idea?
>
> I think I'll push a patch to create and use the directory right before
> calling ovirt-log-collector, which is probably better in other ways.
My patch [1] still fails, with a somewhat different error message, but
this made me check further, and while I still do not understand, I have
this to add:
In the failing runs, ovirt-log-collector is called *twice* in parallel.
E.g.
in [2] (the check-patch of [1]):
Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59099]:
ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p
/dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors
--local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf
_uses_shell=True warn=True stdin_add_newline=True
strip_empty_ends=True argv=None chdir=None executable=None
creates=None removes=None stdin=None
Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59124]:
ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p
/dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors
--local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf
_uses_shell=True warn=True stdin_add_newline=True
strip_empty_ends=True argv=None chdir=None executable=None
creates=None removes=None stdin=None
It also generates two logs, which you can check/compare.
It's the same for previous ones, e.g. latest nightly [3][4]:
Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59343]:
ansible-command Invoked with _raw_params=ovirt-log-collector --verbose
--batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf
_uses_shell=True warn=True stdin_add_newline=True
strip_empty_ends=True argv=None chdir=None executable=None
creates=None removes=None stdin=None
Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]:
SELinux is preventing /usr/lib/systemd/systemd from unlink access on
the sock_file ansible-ssh-lago-basic-suite-master-host-1-22-root. For
complete SELinux messages run: sealert -l
d03a8655-9430-4fcf-9892-3b4df1939899
Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]:
SELinux is preventing /usr/lib/systemd/systemd from unlink access on
the sock_file
ansible-ssh-lago-basic-suite-master-host-1-22-root.#012#012*****
Plugin catchall (100. confidence) suggests
**************************#012#012If you believe that systemd should
be allowed unlink access on the
ansible-ssh-lago-basic-suite-master-host-1-22-root sock_file by
default.#012Then you should report this as a bug.#012You can generate
a local policy module to allow this access.#012Do#012allow this access
for now by executing:#012# ausearch -c 'systemd' --raw | audit2allow
-M my-systemd#012# semodule -X 300 -i my-systemd.pp#012
Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59361]:
ansible-command Invoked with _raw_params=ovirt-log-collector --verbose
--batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf
_uses_shell=True warn=True stdin_add_newline=True
strip_empty_ends=True argv=None chdir=None executable=None
creates=None removes=None stdin=None
Any idea what might have caused this to start happening? Perhaps
a bug in ansible, or ansible-runner? It reminds me of [5].
Adding Dana and Martin.
I think [5] is quite a serious bug, btw, should be a 4.4.5 blocker.
Best regards,
[1]
https://gerrit.ovirt.org/c/ovirt-system-tests/+/113875
[2]
https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/159...
[3]
https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_night...
[4]
https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_night...
[5]
https://bugzilla.redhat.com/show_bug.cgi?id=1917707
>
> Best regards,
>
> [1]
https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_night...
>
> [2]
https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_night...
>
>
> --
> Didi
--
Didi