
On Wed, Mar 17, 2021 at 3:24 PM Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
On 17. 3. 2021, at 13:53, Dana Elfassy <delfassy@redhat.com> wrote:
Adding +Marcin Sobczyk <msobczyk@redhat.com>
On Mon, Mar 15, 2021 at 9:59 AM Yedidyah Bar David <didi@redhat.com> wrote:
On Mon, Mar 15, 2021 at 7:55 AM Yedidyah Bar David <didi@redhat.com> wrote:
Hi all,
This started a few days ago [1] and has been happening randomly since then:
E DEBUG: Configuration:
E DEBUG: command: collect
E DEBUG: Traceback (most recent call last):
E DEBUG: File "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py", line 2067, in <module>
E DEBUG: '%s directory is not empty.' % (conf["local_tmp_dir"])
E DEBUG: Exception: /dev/shm/log directory is not empty.
ERROR: /dev/shm/log directory is not empty.
non-zero return code
Michal tried to fix this by using a random directory but it still fails [2]:
DEBUG: command: collect
DEBUG: Traceback (most recent call last):
DEBUG: File "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py", line 2067, in <module>
DEBUG: '%s directory is not empty.' % (conf["local_tmp_dir"])
DEBUG: Exception: /dev/shm/kaN7uY directory is not empty.
ERROR: /dev/shm/kaN7uY directory is not empty.
non-zero return code
Since I suppose that the randomness of mktemp is good enough, it must be something else. Also, the last successful run before [1] used the same OST git commit (same code), so I do not think it's something in OST's code.
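To make the failure mode concrete, here is a minimal sketch of an emptiness check that would produce the message above. It is not the actual code from ovirt_log_collector/__main__.py: ensure_empty_dir and demo are hypothetical names, the use of os.listdir is an assumption, and only the message format is taken from the traceback. It illustrates that a randomly named directory helps only if nothing else writes into it before the check runs.

```python
import os
import tempfile


def ensure_empty_dir(path):
    """Hypothetical stand-in for the collector's check (an assumption)."""
    if os.path.isdir(path) and os.listdir(path):
        # Message format copied from the traceback in this thread.
        raise Exception('%s directory is not empty.' % path)
    os.makedirs(path, exist_ok=True)


def demo():
    # One random directory reused by two invocations, for illustration only.
    shared = tempfile.mkdtemp()
    results = []
    for run in ("first", "second"):
        try:
            ensure_empty_dir(shared)
            # Simulate the run leaving data behind in its working directory.
            with open(os.path.join(shared, run + ".log"), "w") as handle:
                handle.write("collected\n")
            results.append((run, "ok"))
        except Exception as err:
            results.append((run, str(err)))
    return shared, results
```

Calling demo() returns the directory name and one result per run; under these assumptions the second run is the one that reports the "directory is not empty" error.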
Any idea?
I think I'll push a patch to create and use the directory right before calling ovirt-log-collector, which is probably better in other ways.
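A rough Python equivalent of that idea, as a sketch only: create a fresh directory immediately before building the command line, so that each invocation owns its working directory. The function names build_collector_command and run_collector are hypothetical; the command-line flags are the ones visible in the journal excerpts in this thread, and /dev/shm as the parent directory is taken from the error messages.

```python
import subprocess
import tempfile


def build_collector_command(conf_file="/root/ovirt-log-collector.conf",
                            tmp_parent="/dev/shm"):
    """Create a fresh working directory and return (directory, argv)."""
    # mkdtemp creates a uniquely named directory, readable and writable
    # only by the creating user, and returns its path.
    local_tmp = tempfile.mkdtemp(dir=tmp_parent)
    argv = [
        "ovirt-log-collector",
        "--verbose",
        "--batch",
        "--no-hypervisors",
        "--local-tmp=" + local_tmp,
        "--conf-file=" + conf_file,
    ]
    return local_tmp, argv


def run_collector(**kwargs):
    """Build the command and run it; the caller inspects the return code."""
    _local_tmp, argv = build_collector_command(**kwargs)
    return subprocess.run(argv, check=False)
```

Because the directory is created per call, two concurrent callers get two different directories. This does not by itself explain or prevent a command being invoked twice.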
My patch [1] still fails, with a somewhat different error message, but this made me check further, and while I still do not understand, I have this to add:
In the failing runs, ovirt-log-collector is called *twice* in parallel. E.g. in [2] (the check-patch of [1]):
Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59099]: ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p /dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors --local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf _uses_shell=True warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59124]: ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p /dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors --local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf _uses_shell=True warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
It also generates two logs, which you can check/compare.
It's the same for previous ones, e.g. latest nightly [3][4]:
Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59343]: ansible-command Invoked with _raw_params=ovirt-log-collector --verbose --batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf _uses_shell=True warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]: SELinux is preventing /usr/lib/systemd/systemd from unlink access on the sock_file ansible-ssh-lago-basic-suite-master-host-1-22-root. For complete SELinux messages run: sealert -l d03a8655-9430-4fcf-9892-3b4df1939899
Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]: SELinux is preventing /usr/lib/systemd/systemd from unlink access on the sock_file ansible-ssh-lago-basic-suite-master-host-1-22-root.#012#012***** Plugin catchall (100. confidence) suggests **************************#012#012If you believe that systemd should be allowed unlink access on the ansible-ssh-lago-basic-suite-master-host-1-22-root sock_file by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# ausearch -c 'systemd' --raw | audit2allow -M my-systemd#012# semodule -X 300 -i my-systemd.pp#012
Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59361]: ansible-command Invoked with _raw_params=ovirt-log-collector --verbose --batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf _uses_shell=True warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
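For checking other runs, a small sketch that scans journal lines for the same ansible-command invocation logged by more than one PID within the same second. It assumes one journal entry per line in the format shown in the excerpts above; find_duplicate_invocations and LINE_RE are hypothetical names, and grouping by identical timestamp is only a heuristic (two invocations that straddle a second boundary would be missed).

```python
import re
from collections import defaultdict

# Matches entries like:
#   Mar 15 06:23:30 <host> platform-python[59343]: ansible-command Invoked
#   with _raw_params=<command> _uses_shell=True ...
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"platform-python\[(?P<pid>\d+)\]:\s+ansible-command Invoked with\s+"
    r"_raw_params=(?P<cmd>.*?)\s+_uses_shell="
)


def find_duplicate_invocations(lines):
    """Return {(timestamp, command): [pids]} for commands seen from >1 PID."""
    seen = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            key = (match.group("ts"), match.group("cmd"))
            seen[key].append(int(match.group("pid")))
    # Keep only commands logged by at least two distinct processes.
    return {key: pids for key, pids in seen.items() if len(set(pids)) > 1}
```

Feeding it the lines of a saved journal (for example, from open(path) on a log file collected from the engine VM) gives a quick way to tell whether a given run shows the double invocation.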
Any idea what might have caused this to start happening? Perhaps a bug in ansible, or ansible-runner? It reminds me of [5]. Adding Dana and Martin.
I think [5] is quite a serious bug, btw, should be a 4.4.5 blocker.
It's from January and there are no comments there. Dana, any update? It does look serious, but perhaps it is not really hit in real-world scenarios?
It's reproducible outside OST. Dana has been investigating intensively for the last 2 weeks, but so far we haven't been able to find where the events are multiplied. AFAIK this issue is visible only with ansible-runner-service, or more precisely only in the part where the engine fetches events from ARS and creates audit_log records from them. It doesn't mean that playbook tasks are executed multiple times! OST uses ansible-runner directly, so it doesn't seem to me like the same issue. So [5] is not a blocker; it's just really annoying if you need to work with audit_log records.
Best regards,
[1] https://gerrit.ovirt.org/c/ovirt-system-tests/+/113875
[2] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/15980/...
[3] https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/...
[4] https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/...
[5] https://bugzilla.redhat.com/show_bug.cgi?id=1917707
Best regards,
[1] https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/...
[2] https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/...
-- Didi
_______________________________________________ Infra mailing list -- infra@ovirt.org To unsubscribe send an email to infra-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/infra@ovirt.org/message/PSO5IRX73FSH2R...
-- Martin Perina Manager, Software Engineering Red Hat Czech s.r.o.