Adding Marcin, who is investigating the issue.
On Thu, Mar 28, 2019 at 11:56 AM Yedidyah Bar David <didi(a)redhat.com> wrote:
Hi all,
I want to verify [1]. So I ran the manual job, basic suite 4.3 [2].
It failed [3] with $subject.
Right before that, verify_add_hosts did succeed, and took 59 seconds.
I took a brief look at the code of verify_add_hosts, and it checks
that at least one host is UP.
The patch [1] *might* have caused ansible-host-deploy to take longer,
still not sure. In any case, ansible-host-deploy finished at 05:48:45
[5], 14 seconds before verify_add_hosts finished at 09:48:59 [4].
Can it be that a host is considered UP (from the POV of
verify_add_hosts) but is still not ready for creating storage?
Yes, unfortunately. Since the beginning of oVirt, a host changes its
status to Up as soon as the engine is able to communicate with it, and
only afterwards are additional actions, such as connecting to storage,
executed on the host:
https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules...
And if the host fails any of those actions, its status will change to
NonOperational. But I'm not aware of any limitation that would prevent
the 1st host from adding the master storage domain.
Tal/Freddy any thoughts?
AFAIR Marcin told me that he was not able to reproduce it outside
Jenkins (both hosts were always in Up status before adding the master
storage domain), but in the failed Jenkins OST runs only a single host
was ever available (the 2nd one was still installing).
Marcin, do you have any updates?
Unfortunately not yet. I will try to correlate the logs from Didi's run
with mine as soon as my Jenkins download finishes (12KB/s...)
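
For illustration only, here is a rough sketch of the stricter check
discussed above: wait until every expected host reports Up, and fail
early if one drops to NonOperational, before trying to add the master
storage domain. This is not the actual verify_add_hosts code. The
function name, the get_statuses callable and the status strings are
hypothetical placeholders; a real test would have to read the statuses
from the engine.

```python
import time


# Hypothetical status strings, used only for this sketch.
FAILED_STATUSES = {"non_operational", "install_failed"}


def wait_for_all_hosts_up(get_statuses, expected_hosts, timeout=600.0,
                          interval=3.0, sleep=time.sleep,
                          clock=time.monotonic):
    """Poll until every host in expected_hosts reports "up".

    get_statuses is an assumed callable returning a dict that maps
    host name to a status string. Raises RuntimeError if any host
    reaches a failed status, TimeoutError if the deadline passes.
    """
    deadline = clock() + timeout
    while True:
        statuses = get_statuses()
        # A host that went NonOperational will not recover by waiting.
        failed = sorted(name for name, status in statuses.items()
                        if status in FAILED_STATUSES)
        if failed:
            raise RuntimeError("hosts failed: %s" % ", ".join(failed))
        # Require all expected hosts to be up, not just one of them.
        if all(statuses.get(name) == "up" for name in expected_hosts):
            return statuses
        if clock() >= deadline:
            raise TimeoutError("hosts not up in time: %r" % (statuses,))
        sleep(interval)
```

Even with such a check, per the explanation above a host can be Up
while its post-activation actions (such as connecting to storage) are
still running, so this narrows the race but may not close it.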