Didi,

thanks for the reply; I appreciate the help.  I was (and still am) unable to get to the engine; however, I have collected and attached all the setup logs from the host.  After looking at the thread you suggested, I will change the hostname of the engine, try the install again, and report back.  Thanks again for the help.

Andy

On Wednesday, July 15, 2020, 3:40:31 AM EDT, Yedidyah Bar David <didi@redhat.com> wrote:


On Wed, Jul 15, 2020 at 8:14 AM AK via Users <users@ovirt.org> wrote:

>
> Yes sir, I run the cleanup script after each failure, clean out the gluster volume, and remove any networks the deploy scripts create.  I just attempted the deployment on different hardware (different drives, different CPU, RAID controller, SSDs) and it produced the same result: a failure at the OVF_STORE check.  The only deployment items that are consistent are creating the physical network bonds and gluster volumes, which can be mounted across the network and have been tested as storage pools for other virtualization and storage platforms.


Can you please check engine-side logs? If you can access the engine VM
(search hosted-engine logs for local_vm_ip if it's still on the local
network), check /var/log/ovirt-engine/*, otherwise, on the host,
/var/log/ovirt-hosted-engine-setup/engine-logs*/*.
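
Something like this, from a shell on the host, should locate both (the paths are the ones above; treat this as a sketch and adjust for your deployment):

```shell
# Paths are the defaults mentioned in this thread; adjust if yours differ.
SETUP_LOGS=/var/log/ovirt-hosted-engine-setup

# Find the engine VM's temporary address, if the deploy got that far
# (the setup logs it as local_vm_ip while the VM is on the local network):
grep -rh local_vm_ip "$SETUP_LOGS" 2>/dev/null | tail -n 1

# If you can ssh to that address, look at /var/log/ovirt-engine/* there.
# Otherwise, the setup copies the engine logs back to the host:
ls -d "$SETUP_LOGS"/engine-logs*/ 2>/dev/null || echo "no copied engine logs found"
```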

That said, we have also been seeing (what seems like) a very similar
failure in CI for some time now - see e.g. the latest nightly run:

https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1672/

https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1672/artifact/exported-artifacts/test_logs/he-basic-suite-master/post-he_deploy/lago-he-basic-suite-master-host-0/_var_log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-ansible-create_target_vm-20200714225605-ueg6k8.log

2020-07-14 22:59:42,414-0400 INFO ansible task start {'status': 'OK',
'ansible_type': 'task', 'ansible_playbook':
'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml',
'ansible_task': 'ovirt.hosted_engine_setup : Check OVF_STORE volume
status'}

It retries for some time and eventually fails, as in your case. The engine log has:

https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1672/artifact/exported-artifacts/test_logs/he-basic-suite-master/post-he_deploy/lago-he-basic-suite-master-host-0/_var_log/ovirt-hosted-engine-setup/engine-logs-2020-07-15T03%3A04%3A29Z/ovirt-engine/engine.log

2020-07-14 22:57:03,197-04 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(default task-1) [4abbccc] EVENT_ID: USER_VDC_LOGOUT(31), User
admin@internal-authz connected from '192.168.222.1' using session
'W5qdcPNyRLHmMnbMz7i+ZP85De1GjKq7+V1hqbKEeD+QJtpcFGpITEVFIHbUvz+2wF+GTAB6qnCY1gHxBHkGLA=='
logged out.
2020-07-14 22:57:03,242-04 ERROR
[org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-29)
[313eed07] Command 'UploadStreamVDSCommand(HostName =
lago-he-basic-suite-master-host-0.lago.local,
UploadStreamVDSCommandParameters:{hostId='c6d33fd9-5137-49fc-815a-94baf2d58b93'})'
execution failed: javax.net.ssl.SSLPeerUnverifiedException:
Certificate for <lago-he-basic-suite-master-host-0.lago.local> doesn't
match any of the subject alternative names:
[lago-he-basic-suite-master-host-0.lago.local]
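
To see what the host's certificate actually carries, versus the name the engine connects to, something like this can help (the cert path is the usual VDSM location - an assumption, adjust if yours differs):

```shell
# Assumed VDSM certificate location; adjust for your deployment.
CERT=/etc/pki/vdsm/certs/vdsmcert.pem

# Dump the subject alternative names the certificate actually carries
# (requires OpenSSL 1.1.1+ for the -ext option):
if command -v openssl >/dev/null && [ -r "$CERT" ]; then
    openssl x509 -in "$CERT" -noout -ext subjectAltName
else
    echo "openssl or $CERT not available here"
fi

# Compare against the name the engine uses to reach the host:
hostname 2>/dev/null || uname -n
```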

This is currently discussed on the devel list, in thread:

    execution failed: javax.net.ssl.SSLPeerUnverifiedException (was:
[ovirt-devel] vdsm.storage.exception.UnknownTask: Task id unknown
(was: [oVirt Jenkins] ovirt-system-tests_he-basic-suite-master - Build
# 1641 - Still Failing!))

We are still not sure about the exact cause, but I have a feeling that
it's somehow related to naming/name resolution/hostname/etc.

In any case, I didn't manage to reproduce this locally on my own machine.

I suggest checking everything you can think of related to this -
dhcp/dns, output of 'hostname' on the host, etc.
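
For example, with standard tools only (nothing oVirt-specific; a quick sanity sketch):

```shell
# Does the host know its own name?
hostname 2>/dev/null || uname -n

# Does that name resolve, and to which address?
getent hosts "$(hostname 2>/dev/null || uname -n)" || echo "name does not resolve"

# Does the resolved address actually live on one of this host's interfaces?
command -v ip >/dev/null && ip -brief addr || true
```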

Good luck and best regards,
--
Didi