
On Wed, Feb 19, 2020 at 4:51 AM <jenkins@jenkins.phx.ovirt.org> wrote:
Project: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-role-remote-suite-...
Build: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-role-remote-suite-...
Build Number: 283
Build Status: Failure
Triggered By: Started by timer

-------------------------------------
Changes Since Last Success:
-------------------------------------
Changes for Build #283
[Anton Marchukov] Added "sar" system resources collection on VMs.
[Yedidyah Bar David] Move ovirt-engine-extension-aaa-ldap master to stdci v2
[Gal Ben Haim] Fix the return value of update_upstream_sources
-----------------
Failed Tests:
-----------------
1 tests failed.
FAILED: 008_restart_he_vm.restart_he_vm
Error Message:
1 != 0
-------------------- >> begin captured logging << --------------------
lago.ssh: DEBUG: start task:02e64977-e3bd-4e7c-9084-61ddeaebb791:Get ssh client for lago-he-basic-role-remote-suite-4-3-host-1:
lago.ssh: DEBUG: end task:02e64977-e3bd-4e7c-9084-61ddeaebb791:Get ssh client for lago-he-basic-role-remote-suite-4-3-host-1:
lago.ssh: DEBUG: Running 56796ff6 on lago-he-basic-role-remote-suite-4-3-host-1: hosted-engine --vm-status --json
lago.ssh: DEBUG: Command 56796ff6 on lago-he-basic-role-remote-suite-4-3-host-1 returned with 0
lago.ssh: DEBUG: Command 56796ff6 on lago-he-basic-role-remote-suite-4-3-host-1 output: {"1": {"conf_on_shared_storage": true, "live-data": true, "extra": "metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=4583 (Tue Feb 18 21:48:18 2020)\nhost-id=1\nscore=3400\nvm_conf_refresh_time=4584 (Tue Feb 18 21:48:19 2020)\nconf_on_shared_storage=True\nmaintenance=False\nstate=EngineUp\nstopped=False\n", "hostname": "lago-he-basic-role-remote-suite-4-3-host-0.lago.local", "host-id": 1, "engine-status": {"health": "good", "vm": "up", "detail": "Up"}, "score": 3400, "stopped": false, "maintenance": false, "crc32": "eda7a0ea", "local_conf_timestamp": 4584, "host-ts": 4583}, "2": {"conf_on_shared_storage": true, "live-data": true, "extra": "metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=4604 (Tue Feb 18 21:48:40 2020)\nhost-id=2\nscore=3400\nvm_conf_refresh_time=4605 (Tue Feb 18 21:48:40 2020)\nconf_on_shared_storage=True\nmaintenance=False\nstate=GlobalMaintenance\nstopped=False\n", "hostname": "lago-he-basic-role-remote-suite-4-3-host-1", "host-id": 2, "engine-status": {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}, "score": 3400, "stopped": false, "maintenance": false, "crc32": "23d07ed7", "local_conf_timestamp": 4605, "host-ts": 4604}, "global_maintenance": true}
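For context, the test apparently decides which host to act on by parsing this JSON: the top-level keys are host ids plus a "global_maintenance" flag, and each host entry carries an "engine-status" dict. A rough illustration of that lookup (the helper name engine_host is mine, not from ovirt-system-tests, and the status is abridged from the log above):

```python
import json

# Abridged copy of the `hosted-engine --vm-status --json` output above.
STATUS = json.loads("""
{"1": {"hostname": "lago-he-basic-role-remote-suite-4-3-host-0.lago.local",
       "engine-status": {"health": "good", "vm": "up", "detail": "Up"},
       "score": 3400},
 "2": {"hostname": "lago-he-basic-role-remote-suite-4-3-host-1",
       "engine-status": {"reason": "vm not running on this host",
                         "health": "bad", "vm": "down", "detail": "unknown"},
       "score": 3400},
 "global_maintenance": true}
""")

def engine_host(status):
    """Return the hostname of the host whose engine VM is reported up."""
    for key, host in status.items():
        if key == "global_maintenance":
            continue  # cluster-wide flag, not a per-host entry
        if host["engine-status"]["vm"] == "up":
            return host["hostname"]
    return None

print(engine_host(STATUS))
# -> lago-he-basic-role-remote-suite-4-3-host-0.lago.local
```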
root: INFO: Engine VM is on host lago-he-basic-role-remote-suite-4-3-host-0, restarting the VM
root: INFO: Shutting down HE VM on host: lago-he-basic-role-remote-suite-4-3-host-0
lago.ssh: DEBUG: start task:90671f4b-c54e-4efe-b0c1-cfa47015a9db:Get ssh client for lago-he-basic-role-remote-suite-4-3-host-0:
lago.ssh: DEBUG: end task:90671f4b-c54e-4efe-b0c1-cfa47015a9db:Get ssh client for lago-he-basic-role-remote-suite-4-3-host-0:
lago.ssh: DEBUG: Running 57a55eee on lago-he-basic-role-remote-suite-4-3-host-0: hosted-engine --vm-shutdown
lago.ssh: DEBUG: Command 57a55eee on lago-he-basic-role-remote-suite-4-3-host-0 returned with 0
root: INFO: Command succeeded
root: INFO: Waiting for VM to be down...
lago.ssh: DEBUG: start task:4b1d8dce-3c5b-4d53-b81f-1de8bb71eed7:Get ssh client for lago-he-basic-role-remote-suite-4-3-host-0:
lago.ssh: DEBUG: end task:4b1d8dce-3c5b-4d53-b81f-1de8bb71eed7:Get ssh client for lago-he-basic-role-remote-suite-4-3-host-0:
lago.ssh: DEBUG: Running 5901a70c on lago-he-basic-role-remote-suite-4-3-host-0: hosted-engine --vm-status --json
lago.ssh: DEBUG: Command 5901a70c on lago-he-basic-role-remote-suite-4-3-host-0 returned with 1
lago.ssh: DEBUG: Command 5901a70c on lago-he-basic-role-remote-suite-4-3-host-0 output: The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.
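The shutdown-and-wait step above, where the test failed, can be sketched as a polling loop (function names are mine, not the actual suite code). The key point is that a non-zero exit from `hosted-engine --vm-status --json`, as in the log above, does not mean the VM is down, only that the status could not be read, so it should fail the wait rather than satisfy it:

```python
import json
import subprocess
import time

def all_engine_vms_down(status):
    # True when no host entry in the parsed vm-status JSON reports
    # the engine VM as up.
    return all(
        host["engine-status"]["vm"] == "down"
        for key, host in status.items()
        if key != "global_maintenance")

def wait_for_vm_down(timeout=600, interval=10):
    # Poll vm-status until every host reports the engine VM down.
    # A non-zero exit (here: the HA agent could not read shared storage
    # after the sanlock "Connection reset by peer") raises instead of
    # being mistaken for a clean shutdown.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        proc = subprocess.run(
            ["hosted-engine", "--vm-status", "--json"],
            capture_output=True, text=True)
        if proc.returncode != 0:
            raise RuntimeError("vm-status failed: %s" % proc.stdout.strip())
        if all_engine_vms_down(json.loads(proc.stdout)):
            return
        time.sleep(interval)
    raise TimeoutError("engine VM still up after %d seconds" % timeout)
```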
agent.log has:

StatusStorageThread::ERROR::2020-02-18 21:48:30,263::status_broker::90::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to update state.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 82, in run
    if (self._status_broker._inquire_whiteboard_lock() or
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 195, in _inquire_whiteboard_lock
    self.host_id, self._lease_file)
SanlockException: (104, 'Sanlock lockspace inquire failure', 'Connection reset by peer')
StatusStorageThread::ERROR::2020-02-18 21:48:30,300::status_broker::70::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(trigger_restart) Trying to restart the broker

This is another "Connection reset by peer", which looks similar to the one reported a few days ago (with subject "[oVirt Jenkins] ovirt-system-tests_he-basic-scsi-suite-4.3 - Build # 350 - Failure!").

Are we ok with this? Stable 4.3 jobs are failing with no clear, acceptable reason and no further handling. It looks like a communication issue to me. Is anyone looking into it?

Thanks,
-- 
Didi