On 16 Sep 2019, at 10:30, Milan Zamazal <mzamazal(a)redhat.com> wrote:
Dusan Fodor <dfodor(a)redhat.com> writes:
> After further investigation, the root of the issue seems to be vdsm receiving
> SIGTERM on the only host that is in the Up state [1]:
> *[vds] Received signal 15, shutting down (vdsmd:70)*
I see, thank you for looking into it and finding the signal. Can you
see in the logs what could cause this? Are Engine fencing attempts
issued before or after this signal? If it is not caused by Engine
fencing, is there anything in the system logs explaining that SIGTERM?
unrelated
Let's take the upcoming OST gating as an opportunity to fix that host
status flipping problem. It must be fixed before OST gating is enabled.
It seems rather infra-related, tied to the initOnVdsUp() processing. The best approach for now
would be to wait a little and check the Host status again once it’s Up for the first time.
Thanks,
michal
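
A minimal sketch of the "wait a little and check again" idea above, in Python. It is not taken from OST: the function name, its parameters and the `get_status` callable are hypothetical, and the status strings are placeholders. It only treats a host as usable after the status has been seen as Up on several consecutive polls, so a single Up reading followed by a flip does not count.

```python
import time
from typing import Callable


def wait_for_stable_status(
    get_status: Callable[[], str],   # hypothetical: returns the current host status
    wanted: str = "up",              # placeholder status string
    stable_checks: int = 3,          # consecutive matching polls required
    interval: float = 5.0,           # seconds between polls
    timeout: float = 300.0,          # overall time budget in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once get_status() has returned `wanted` on
    `stable_checks` consecutive polls, False if `timeout` expires first."""
    deadline = time.monotonic() + timeout
    consecutive = 0
    while True:
        if get_status() == wanted:
            consecutive += 1
            if consecutive >= stable_checks:
                return True
        else:
            consecutive = 0  # the host flipped; start counting again
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A test would call this before picking a host for the storage step, passing a callable that asks the engine for the host status. The values for `stable_checks` and `interval` are guesses and would need tuning against how long the flip actually lasts.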
> while the other host is still in status Installing (so it cannot be used
> for fencing, hence the fence action failure).
> vdsm then comes back up within a few moments, but the engine, expecting the
> host to be up the whole time, meanwhile fails an operation that requires
> the host to be up.
>
> [1] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/15829/arti...
>
> On Fri, Sep 13, 2019 at 5:18 PM Dusan Fodor <dfodor(a)redhat.com> wrote:
>
>> For brave investigators, a similar issue at a later stage of the same test
>> can be found here [1]. The symptom is the same fence action failure, but
>> this time it causes adding the storage itself to fail:
>> *2019-09-12 09:53:32,571-04 ERROR
>> [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default
>> task-1) [] Operation Failed: [Cannot attach Storage. There is no active
>> Host in the Data Center.]*
>>
>> [1] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/15821
>>
>> On Fri, Sep 13, 2019 at 5:09 PM Dusan Fodor <dfodor(a)redhat.com> wrote:
>>
>>> Hello all,
>>> lately I witnessed multiple failures of the add_master_storage_domain test,
>>> which were related neither to the changes themselves nor to any infra
>>> issue. One example can be found here [1].
>>> After investigation, with huge help from Milan, the issue is that the Host
>>> suddenly falls from the Up state to whatever-but-not-up.
>>>
>>>
>>> 1. add_storage_domain picks a random host that is in the Up state
>>> 2. meanwhile the engine starts a fence action for it, so something has
>>> probably gone wrong with the host; the fence action fails with:
>>> *[org.ovirt.engine.core.bll.pm.FenceProxyLocator]
>>> (EE-ManagedThreadFactory-engineScheduled-Thread-38) [6692895f] Can not run
>>> fence action on host 'lago-basic-suite-master-host-0', no suitable proxy
>>> host was found.*
>>> 3. the test fails because it cannot attach the domain to a non-up
>>> host:
>>> *[org.ovirt.engine.api.restapi.resource.AbstractBackendResource]
>>> (default task-1) [] Operation Failed: [Cannot add storage server
>>> connection when Host status is not up]*
>>>
>>> For better orientation in the failed job's engine log [1]: the fence
>>> action for the host fails at
>>> :46:12,842-04
>>> and the engine learns it cannot connect storage to the host at
>>> :46:16,105-04
>>>
>>> The add_master_storage_domain test itself starts at ~ :46:13,753
>>> (according to the lago log).
>>>
>>> Could you please check this?
>>> Thanks
>>>
>>> [1] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/15829
>>> [2] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/15829/arti...
>>>
>>>
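
The ordering questions in this thread (fence failure vs. SIGTERM vs. the storage error) come down to sorting a few log lines by timestamp. A rough Python sketch of that follows. It assumes each relevant line starts with a timestamp shaped like the `2019-09-12 09:53:32,571-04` quoted above, which may not hold for every log involved. The message fragments are the ones quoted in this thread; the event names and the `timeline` helper are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Message fragments quoted in this thread; the keys are illustrative names.
MARKERS = {
    "sigterm": "Received signal 15, shutting down",
    "fence_failed": "Can not run fence action on host",
    "host_not_up": "Cannot add storage server connection",
}

# Leading "YYYY-MM-DD HH:MM:SS,mmm"; any timezone suffix is ignored.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def timeline(lines):
    """Return (timestamp, event_name) pairs for matching lines, oldest first."""
    events = []
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # continuation lines or lines in another format
        for name, fragment in MARKERS.items():
            if fragment in line:
                stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
                stamp += timedelta(milliseconds=int(m.group(2)))
                events.append((stamp, name))
    return sorted(events)
```

Feeding it the concatenated lines of the engine and vdsm logs from one job (for example `timeline(open(path))` per file, merged and sorted again) would show whether the fence attempt precedes the signal. Since the timezone suffix is dropped, the logs must be in the same timezone for the comparison to be meaningful.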
> _______________________________________________
> Devel mailing list -- devel(a)ovirt.org
> To unsubscribe send an email to devel-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/MMH7DGCH24G...