On Wed, Feb 19, 2020 at 4:51 AM <jenkins@jenkins.phx.ovirt.org> wrote:
>
> Project: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-role-remote-suite-4.3/
> Build: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-role-remote-suite-4.3/283/
> Build Number: 283
> Build Status: Failure
> Triggered By: Started by timer
>
> -------------------------------------
> Changes Since Last Success:
> -------------------------------------
> Changes for Build #283
> [Anton Marchukov] Added "sar" system resources collection on VMs.
>
> [Yedidyah Bar David] Move ovirt-engine-extension-aaa-ldap master to stdci v2
>
> [Gal Ben Haim] Fix the return value of update_upstream_sources
>
>
>
>
> -----------------
> Failed Tests:
> -----------------
> 1 tests failed.
> FAILED: 008_restart_he_vm.restart_he_vm
>
> Error Message:
> 1 != 0
> -------------------- >> begin captured logging << --------------------
> lago.ssh: DEBUG: start task:02e64977-e3bd-4e7c-9084-61ddeaebb791:Get ssh client for lago-he-basic-role-remote-suite-4-3-host-1:
> lago.ssh: DEBUG: end task:02e64977-e3bd-4e7c-9084-61ddeaebb791:Get ssh client for lago-he-basic-role-remote-suite-4-3-host-1:
> lago.ssh: DEBUG: Running 56796ff6 on lago-he-basic-role-remote-suite-4-3-host-1: hosted-engine --vm-status --json
> lago.ssh: DEBUG: Command 56796ff6 on lago-he-basic-role-remote-suite-4-3-host-1 returned with 0
> lago.ssh: DEBUG: Command 56796ff6 on lago-he-basic-role-remote-suite-4-3-host-1 output:
> {"1": {"conf_on_shared_storage": true, "live-data": true, "extra": "metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=4583 (Tue Feb 18 21:48:18 2020)\nhost-id=1\nscore=3400\nvm_conf_refresh_time=4584 (Tue Feb 18 21:48:19 2020)\nconf_on_shared_storage=True\nmaintenance=False\nstate=EngineUp\nstopped=False\n", "hostname": "lago-he-basic-role-remote-suite-4-3-host-0.lago.local", "host-id": 1, "engine-status": {"health": "good", "vm": "up", "detail": "Up"}, "score": 3400, "stopped": false, "maintenance": false, "crc32": "eda7a0ea", "local_conf_timestamp": 4584, "host-ts": 4583}, "2": {"conf_on_shared_storage": true, "live-data": true, "extra": "metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=4604 (Tue Feb 18 21:48:40 2020)\nhost-id=2\nscore=3400\nvm_conf_refresh_time=4605 (Tue Feb 18 21:48:40 2020)\nconf_on_shared_storage=True\nmaintenance=False\nstate=GlobalMaintenance\nstopped=False\n", "hostname": "lago-he-basic-role-remote-suite-4-3-host-1", "host-id": 2, "engine-status": {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}, "score": 3400, "stopped": false, "maintenance": false, "crc32": "23d07ed7", "local_conf_timestamp": 4605, "host-ts": 4604}, "global_maintenance": true}
>
> root: INFO: Engine VM is on host lago-he-basic-role-remote-suite-4-3-host-0, restarting the VM
> root: INFO: Shutting down HE VM on host: lago-he-basic-role-remote-suite-4-3-host-0
> lago.ssh: DEBUG: start task:90671f4b-c54e-4efe-b0c1-cfa47015a9db:Get ssh client for lago-he-basic-role-remote-suite-4-3-host-0:
> lago.ssh: DEBUG: end task:90671f4b-c54e-4efe-b0c1-cfa47015a9db:Get ssh client for lago-he-basic-role-remote-suite-4-3-host-0:
> lago.ssh: DEBUG: Running 57a55eee on lago-he-basic-role-remote-suite-4-3-host-0: hosted-engine --vm-shutdown
> lago.ssh: DEBUG: Command 57a55eee on lago-he-basic-role-remote-suite-4-3-host-0 returned with 0
> root: INFO: Command succeeded
> root: INFO: Waiting for VM to be down...
> lago.ssh: DEBUG: start task:4b1d8dce-3c5b-4d53-b81f-1de8bb71eed7:Get ssh client for lago-he-basic-role-remote-suite-4-3-host-0:
> lago.ssh: DEBUG: end task:4b1d8dce-3c5b-4d53-b81f-1de8bb71eed7:Get ssh client for lago-he-basic-role-remote-suite-4-3-host-0:
> lago.ssh: DEBUG: Running 5901a70c on lago-he-basic-role-remote-suite-4-3-host-0: hosted-engine --vm-status --json
> lago.ssh: DEBUG: Command 5901a70c on lago-he-basic-role-remote-suite-4-3-host-0 returned with 1
> lago.ssh: DEBUG: Command 5901a70c on lago-he-basic-role-remote-suite-4-3-host-0 output:
> The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.
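For reference, the way the test picks the host to restart from the `--vm-status --json` output above can be sketched like this. This is a minimal illustration, not the actual OST test code; the sample JSON is trimmed from the status output quoted earlier, and `find_engine_host` is a hypothetical helper name:

```python
import json

# Trimmed from the `hosted-engine --vm-status --json` output quoted above.
STATUS_JSON = '''{
  "1": {"hostname": "lago-he-basic-role-remote-suite-4-3-host-0.lago.local",
        "engine-status": {"health": "good", "vm": "up", "detail": "Up"},
        "score": 3400},
  "2": {"hostname": "lago-he-basic-role-remote-suite-4-3-host-1",
        "engine-status": {"reason": "vm not running on this host",
                          "health": "bad", "vm": "down", "detail": "unknown"},
        "score": 3400},
  "global_maintenance": true}'''

def find_engine_host(status):
    """Return the hostname of the host whose engine VM is up, or None."""
    for host in status.values():
        # Skip top-level flags such as "global_maintenance" (not host dicts).
        if not isinstance(host, dict):
            continue
        if host.get("engine-status", {}).get("vm") == "up":
            return host["hostname"]
    return None

print(find_engine_host(json.loads(STATUS_JSON)))
# Matches the log line "Engine VM is on host ...host-0" above.
```

The second status call then fails with exit code 1 because ovirt-ha-agent could not read the shared-storage metadata at all, which is where agent.log becomes relevant.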
agent.log has:
StatusStorageThread::ERROR::2020-02-18 21:48:30,263::status_broker::90::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to update state.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 82, in run
    if (self._status_broker._inquire_whiteboard_lock() or
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 195, in _inquire_whiteboard_lock
    self.host_id, self._lease_file)
SanlockException: (104, 'Sanlock lockspace inquire failure', 'Connection reset by peer')
StatusStorageThread::ERROR::2020-02-18 21:48:30,300::status_broker::70::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(trigger_restart) Trying to restart the broker
Another "Connection reset by peer", which looks similar to the one
reported a few days ago (with subject "[oVirt Jenkins]
ovirt-system-tests_he-basic-scsi-suite-4.3 - Build # 350 - Failure!").
Are we OK with this? Stable 4.3 jobs are failing with no clear,
acceptable reason and no further handling.
It looks like a communication issue to me. Is anyone looking into it?
Thanks,
--
Didi
_______________________________________________
Infra mailing list -- infra@ovirt.org
To unsubscribe send an email to infra-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/infra@ovirt.org/message/AU3FSIDH6RSLX2PYXTPXJ4EQXCPENWEN/