Hello all,
Lately I have seen multiple failures of the add_master_storage_domain test that were related neither to the changes under test nor to any infra issue. One example can be found here [1].
After investigating with a lot of help from Milan, the issue appears to be that the host suddenly drops from the up state to some non-up state:
- add_storage_domain picks a random host that is in the up state
- in the meantime the engine starts a fence action for that host, so something has probably gone wrong with it; the fence action fails with: [org.ovirt.engine.core.bll.pm.FenceProxyLocator] (EE-ManagedThreadFactory-engineScheduled-Thread-38) [6692895f] Can not run fence action on host 'lago-basic-suite-master-host-0', no suitable proxy host was found.
- the test fails because it cannot attach the domain to a host that is not up: [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-1) [] Operation Failed: [Cannot add storage server connection when Host status is not up]
To help with orientation in the failed job's engine log [1], the fence action for the host fails at
:46:12,842-04
The engine learns it cannot connect storage to the host at
:46:16,105-04
The test itself, add_master_storage_domain, starts at ~ :46:13,753 (according to the lago log).
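In case it helps to see the ordering at a glance, here is a small sketch that sorts the three timestamps above and prints the gaps between them. It assumes the engine log and the lago log use the same clock and timezone (the lago timestamp carries no offset), and it only works with the minute/second part, since that is all that is quoted here. The helper name offset is just for illustration.

```python
from datetime import timedelta


def offset(stamp: str) -> timedelta:
    """Parse a ':MM:SS,mmm' fragment into an offset within the hour.

    The hour is not part of the quoted timestamps, so it is left out here.
    """
    minutes, rest = stamp.lstrip(":").split(":")
    seconds, millis = rest.split(",")
    return timedelta(
        minutes=int(minutes), seconds=int(seconds), milliseconds=int(millis)
    )


# Timestamps as quoted above, with the '-04' suffix dropped.
# Assumption: both logs share the same clock and timezone.
events = {
    "fence action fails (engine log)": offset(":46:12,842"),
    "add_master_storage_domain starts (lago log)": offset(":46:13,753"),
    "storage connection refused (engine log)": offset(":46:16,105"),
}

ordered = sorted(events.items(), key=lambda item: item[1])
first = ordered[0][1]
for name, when in ordered:
    print(f"+{(when - first).total_seconds():.3f}s  {name}")
```

If the clock assumption holds, the fence action had already failed a little under a second before the test started, so the host was most likely already non-up (or on its way there) when it was picked.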
Could you please check this?
Thanks