On Wed, Jul 15, 2020 at 8:14 AM AK via Users <users(a)ovirt.org>
wrote:
>
> Yes sir, I run the clean up script after each failure, clean out the gluster volume,
> and remove any network the deploy scripts create. I just conducted the deployment on
> different hardware (different drives, different CPU, raid controller, SSD's) and it
> produced the same result (failure at OVF_STore_check). The only deployment items that are
> consistent are creating the physical network bonds and gluster volumes which can be
> mounted across the network and have been tested as storage pools for other virtualization
> and storage platforms.

Can you please check the engine-side logs? If you can access the engine VM
(search the hosted-engine logs for local_vm_ip if it's still on the local
network), check /var/log/ovirt-engine/*. Otherwise, on the host, check
/var/log/ovirt-hosted-engine-setup/engine-logs*/*.
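
If it helps, here is a rough sketch of the local_vm_ip search in Python. It assumes the setup logs sit directly under /var/log/ovirt-hosted-engine-setup/ and end in .log; the function name find_lines is made up for this example, and a plain grep does the same job.

```python
from pathlib import Path


def find_lines(log_dir, needle="local_vm_ip"):
    """Return (file name, line number, line) for each log line containing needle.

    Assumption: the logs of interest match *.log directly inside log_dir.
    """
    hits = []
    for path in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" keeps the scan going if a log has odd bytes in it
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits
```

On the host this would be called as find_lines("/var/log/ovirt-hosted-engine-setup"); the last hit is usually the one from the most recent attempt.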
That said, we also have (what seems like) a very similar failure on
CI, for some time now - check e.g. the latest nightly run:
https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1672/
https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/16...
2020-07-14 22:59:42,414-0400 INFO ansible task start {'status': 'OK',
'ansible_type': 'task', 'ansible_playbook':
'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml',
'ansible_task': 'ovirt.hosted_engine_setup : Check OVF_STORE volume
status'}
It retries for some time and eventually fails, as in your case. The engine log has:
https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/16...
2020-07-14 22:57:03,197-04 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(default task-1) [4abbccc] EVENT_ID: USER_VDC_LOGOUT(31), User
admin@internal-authz connected from '192.168.222.1' using session
'W5qdcPNyRLHmMnbMz7i+ZP85De1GjKq7+V1hqbKEeD+QJtpcFGpITEVFIHbUvz+2wF+GTAB6qnCY1gHxBHkGLA=='
logged out.
2020-07-14 22:57:03,242-04 ERROR
[org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-29)
[313eed07] Command 'UploadStreamVDSCommand(HostName =
lago-he-basic-suite-master-host-0.lago.local,
UploadStreamVDSCommandParameters:{hostId='c6d33fd9-5137-49fc-815a-94baf2d58b93'})'
execution failed: javax.net.ssl.SSLPeerUnverifiedException:
Certificate for <lago-he-basic-suite-master-host-0.lago.local> doesn't
match any of the subject alternative names:
[lago-he-basic-suite-master-host-0.lago.local]
This is currently being discussed on the devel list, in the thread:
execution failed: javax.net.ssl.SSLPeerUnverifiedException (was:
[ovirt-devel] vdsm.storage.exception.UnknownTask: Task id unknown
(was: [oVirt Jenkins] ovirt-system-tests_he-basic-suite-master - Build
# 1641 - Still Failing!))
We are still not sure about the exact cause, but I have a feeling that
it's somehow related to naming/name resolution/hostname/etc.
In any case, I didn't manage to reproduce this locally on my own machine.
I suggest checking everything you can think of related to this -
dhcp/dns, output of 'hostname' on the host, etc.
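
As a starting point for those checks, here is a minimal sketch in Python that compares the host's name with its forward and reverse DNS lookups. The function name check_name_consistency and the particular checks are my own choice, not something the deploy scripts run, and a clean result does not rule out the certificate problem above.

```python
import socket


def check_name_consistency(hostname=None,
                           resolve=socket.gethostbyname,
                           reverse=socket.gethostbyaddr):
    """Return a list of naming problems; an empty list means none were found.

    resolve and reverse are parameters only so that the lookups can be
    replaced with fakes when testing.
    """
    name = hostname or socket.getfqdn()
    problems = []
    if "." not in name:
        problems.append(f"hostname {name!r} is not fully qualified")
    try:
        ip = resolve(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        problems.append(f"forward lookup of {name!r} failed: {exc}")
        return problems
    if ip.startswith("127."):
        problems.append(f"{name!r} resolves to loopback address {ip}")
    try:
        reverse_name = reverse(ip)[0]
    except OSError as exc:  # socket.herror is a subclass of OSError
        problems.append(f"reverse lookup of {ip} failed: {exc}")
        return problems
    if reverse_name.rstrip(".").lower() != name.rstrip(".").lower():
        problems.append(
            f"reverse lookup of {ip} gives {reverse_name!r}, not {name!r}")
    return problems
```

Running check_name_consistency() with no arguments on the host uses socket.getfqdn() as the name to test; passing the engine FQDN as the first argument checks that name instead.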
Good luck and best regards,
The javax.net.ssl.SSLPeerUnverifiedException may be caused by a recent
regression in HttpClient:
This only affects domains which are not publicly accessible, e.g. .local
or .test. According to the bug report it was fixed upstream.
Best regards,
Alistair.