Looping in @Galit Rosenthal <grosenth(a)redhat.com>
Who is looking into it?
On Wed, Feb 19, 2020 at 9:58 AM Yedidyah Bar David <didi(a)redhat.com> wrote:
On Wed, Feb 19, 2020 at 4:51 AM <jenkins(a)jenkins.phx.ovirt.org> wrote:
>
> Project:
https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-role-remote-sui...
> Build:
https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-role-remote-sui...
> Build Number: 283
> Build Status: Failure
> Triggered By: Started by timer
>
> -------------------------------------
> Changes Since Last Success:
> -------------------------------------
> Changes for Build #283
> [Anton Marchukov] Added "sar" system resources collection on VMs.
>
> [Yedidyah Bar David] Move ovirt-engine-extension-aaa-ldap master to
stdci v2
>
> [Gal Ben Haim] Fix the return value of update_upstream_sources
>
>
>
>
> -----------------
> Failed Tests:
> -----------------
> 1 tests failed.
> FAILED: 008_restart_he_vm.restart_he_vm
>
> Error Message:
> 1 != 0
> -------------------- >> begin captured logging << --------------------
> lago.ssh: DEBUG: start task:02e64977-e3bd-4e7c-9084-61ddeaebb791:Get ssh
client for lago-he-basic-role-remote-suite-4-3-host-1:
> lago.ssh: DEBUG: end task:02e64977-e3bd-4e7c-9084-61ddeaebb791:Get ssh
client for lago-he-basic-role-remote-suite-4-3-host-1:
> lago.ssh: DEBUG: Running 56796ff6 on
lago-he-basic-role-remote-suite-4-3-host-1: hosted-engine --vm-status --json
> lago.ssh: DEBUG: Command 56796ff6 on
lago-he-basic-role-remote-suite-4-3-host-1 returned with 0
> lago.ssh: DEBUG: Command 56796ff6 on
lago-he-basic-role-remote-suite-4-3-host-1 output:
> {"1": {"conf_on_shared_storage": true, "live-data":
true, "extra":
"metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=4583 (Tue
Feb 18 21:48:18 2020)\nhost-id=1\nscore=3400\nvm_conf_refresh_time=4584
(Tue Feb 18 21:48:19
2020)\nconf_on_shared_storage=True\nmaintenance=False\nstate=EngineUp\nstopped=False\n",
"hostname": "lago-he-basic-role-remote-suite-4-3-host-0.lago.local",
"host-id": 1, "engine-status": {"health": "good",
"vm": "up", "detail":
"Up"}, "score": 3400, "stopped": false,
"maintenance": false, "crc32":
"eda7a0ea", "local_conf_timestamp": 4584, "host-ts": 4583},
"2":
{"conf_on_shared_storage": true, "live-data": true,
"extra":
"metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=4604 (Tue
Feb 18 21:48:40 2020)\nhost-id=2\nscore=3400\nvm_conf_refresh_time=4605
(Tue Feb 18 21:48:40
2020)\nconf_on_shared_storage=True\nmaintenance=False\nstate=GlobalMaintenance\nstopped=False\n",
"hostname": "lago-he-basic-role-remote-suite-4-3-host-1",
"host-id": 2,
"engine-status": {"reason": "vm not running on this host",
"health": "bad",
"vm": "down", "detail": "unknown"},
"score": 3400, "stopped": false,
"maintenance": false, "crc32": "23d07ed7",
"local_conf_timestamp": 4605,
"host-ts": 4604}, "global_maintenance": true}
>
> root: INFO: Engine VM is on host
lago-he-basic-role-remote-suite-4-3-host-0, restarting the VM
> root: INFO: Shutting down HE VM on host:
lago-he-basic-role-remote-suite-4-3-host-0
> lago.ssh: DEBUG: start task:90671f4b-c54e-4efe-b0c1-cfa47015a9db:Get ssh
client for lago-he-basic-role-remote-suite-4-3-host-0:
> lago.ssh: DEBUG: end task:90671f4b-c54e-4efe-b0c1-cfa47015a9db:Get ssh
client for lago-he-basic-role-remote-suite-4-3-host-0:
> lago.ssh: DEBUG: Running 57a55eee on
lago-he-basic-role-remote-suite-4-3-host-0: hosted-engine --vm-shutdown
> lago.ssh: DEBUG: Command 57a55eee on
lago-he-basic-role-remote-suite-4-3-host-0 returned with 0
> root: INFO: Command succeeded
> root: INFO: Waiting for VM to be down...
> lago.ssh: DEBUG: start task:4b1d8dce-3c5b-4d53-b81f-1de8bb71eed7:Get ssh
client for lago-he-basic-role-remote-suite-4-3-host-0:
> lago.ssh: DEBUG: end task:4b1d8dce-3c5b-4d53-b81f-1de8bb71eed7:Get ssh
client for lago-he-basic-role-remote-suite-4-3-host-0:
> lago.ssh: DEBUG: Running 5901a70c on
lago-he-basic-role-remote-suite-4-3-host-0: hosted-engine --vm-status --json
> lago.ssh: DEBUG: Command 5901a70c on
lago-he-basic-role-remote-suite-4-3-host-0 returned with 1
> lago.ssh: DEBUG: Command 5901a70c on
lago-he-basic-role-remote-suite-4-3-host-0 output:
> The hosted engine configuration has not been retrieved from shared
storage. Please ensure that ovirt-ha-agent is running and the storage
server is reachable.
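For context, the test picks the host running the engine VM from the
"hosted-engine --vm-status --json" output quoted above. A minimal sketch of
that lookup follows. It is not the actual OST code: the function name
find_engine_host and the shortened sample data are made up for illustration,
and only the JSON keys ("hostname", "engine-status", "vm",
"global_maintenance") are taken from the log.

```python
import json


def find_engine_host(status_json):
    """Return the hostname whose engine-status reports the VM as up, or None.

    Hypothetical helper, written only to illustrate the JSON layout seen
    in the log above.
    """
    status = json.loads(status_json)
    for host in status.values():
        # Top-level keys also include non-host entries such as
        # "global_maintenance", whose value is a plain boolean.
        if not isinstance(host, dict):
            continue
        if host.get("engine-status", {}).get("vm") == "up":
            return host.get("hostname")
    return None


# Shortened, made-up sample with the same shape as the real output.
sample = json.dumps({
    "1": {"hostname": "host-0", "engine-status": {"vm": "up", "health": "good"}},
    "2": {"hostname": "host-1", "engine-status": {"vm": "down", "health": "bad"}},
    "global_maintenance": True,
})
print(find_engine_host(sample))
```

Note that this only works when the command exits with 0 and prints JSON. In
the failing run the second status call returned 1 and printed a plain-text
error instead.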
agent.log has:
StatusStorageThread::ERROR::2020-02-18
21:48:30,263::status_broker::90::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run)
Failed to update state.
Traceback (most recent call last):
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py",
line 82, in run
if (self._status_broker._inquire_whiteboard_lock() or
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py",
line 195, in _inquire_whiteboard_lock
self.host_id, self._lease_file)
SanlockException: (104, 'Sanlock lockspace inquire failure',
'Connection reset by peer')
StatusStorageThread::ERROR::2020-02-18
21:48:30,300::status_broker::70::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(trigger_restart)
Trying to restart the broker
Another "Connection reset by peer". To me it looks similar to the one
reported a few days ago (with subject "[oVirt Jenkins]
ovirt-system-tests_he-basic-scsi-suite-4.3 - Build # 350 - Failure!").
Are we OK with this? Stable 4.3 jobs are failing with no clear and
acceptable reason, and with no further handling.
It seems like a communication issue, to me. Is anyone looking at it?
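If we conclude that a short broker restart is expected here, one possible
way for the test to cope is to treat a non-zero exit from the status command
as transient while it waits for the VM to go down. The sketch below is only
an illustration under that assumption and is not what OST does today:
wait_for_vm_down and run_status are hypothetical names, and run_status
stands for whatever runs "hosted-engine --vm-status --json" on the host and
returns (exit code, output).

```python
import json
import time


def wait_for_vm_down(run_status, timeout=300, interval=5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until every host reports the engine VM as down.

    run_status is a hypothetical callable returning (returncode, output).
    A non-zero return code is assumed to be transient (for example while
    the broker restarts) and is retried until the timeout expires.
    Returns True once the VM is down everywhere, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        rc, out = run_status()
        if rc == 0:
            hosts = [h for h in json.loads(out).values()
                     if isinstance(h, dict)]
            if all(h.get("engine-status", {}).get("vm") == "down"
                   for h in hosts):
                return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

This would hide the symptom rather than explain it, so it only makes sense
if the sanlock "Connection reset by peer" itself is understood and accepted.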
Thanks,
--
Didi
--
Liora Milbaum
She - Her - Hers
Senior Principal Software Engineer, CNV DevOps
Red Hat