One cannot mask firewalld; otherwise hosted-engine deploy will fail in the
manner shown below.
It would be nice if a pre-flight check told the user to unmask firewalld
rather than failing this way.
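For anyone searching, a masked firewalld can be detected and undone before
re-running the deploy with something like this (a sketch; run as root):

# prints "masked" when the unit file is linked to /dev/null
systemctl is-enabled firewalld

# remove the mask and start the service so host-deploy can configure it
systemctl unmask firewalld
systemctl enable --now firewalld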
Thanks,
Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit
Weill Cornell Medicine
E: doug(a)med.cornell.edu
O: 212-746-6305
F: 212-746-8690
On Thu, Aug 16, 2018 at 7:57 AM, Douglas Duckworth <dod2014(a)med.cornell.edu>
wrote:
I cannot get past this task in
"/usr/share/ovirt-hosted-engine-setup/ansible/bootstrap_local_vm.yml":
- name: Add host
  ovirt_hosts:
    # TODO: add to the first cluster of the datacenter
    # where we set the vlan id
    name: "{{ HOST_NAME }}"
    state: present
    public_key: true
    address: "{{ HOST_ADDRESS }}"
    auth: "{{ ovirt_auth }}"
  async: 1
  poll: 0
- name: Wait for the host to be up
  ovirt_hosts_facts:
    pattern: name={{ HOST_NAME }}
    auth: "{{ ovirt_auth }}"
  register: host_result_up_check
  until: host_result_up_check is succeeded and host_result_up_check.ansible_facts.ovirt_hosts|length >= 1 and (host_result_up_check.ansible_facts.ovirt_hosts[0].status == 'up' or host_result_up_check.ansible_facts.ovirt_hosts[0].status == 'non_operational')
  retries: 120
  delay: 5
- debug: var=host_result_up_check
- name: Check host status
  fail:
    msg: >
      The host has been set in non_operational status,
      please check engine logs,
      fix accordingly and re-deploy.
  when: host_result_up_check is succeeded and host_result_up_check.ansible_facts.ovirt_hosts|length >= 1 and host_result_up_check.ansible_facts.ovirt_hosts[0].status == 'non_operational'
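(For what it's worth, this wait loop just polls the engine REST API for the
host's status; the same check can be done by hand with curl, as a sketch,
using the admin password chosen during deploy:)

curl -k -u 'admin@internal:PASSWORD' \
  -H 'Accept: application/json' \
  'https://ovirt-engine.pbtech/ovirt-engine/api/hosts?search=name%3Dovirt-hv1.pbtech'

The "status" element in the response is what the task compares against
'up' and 'non_operational'.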
The error:
[ INFO ] TASK [Wait for the host to be up]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts":
[{"address": "ovirt-hv1.pbtech", "affinity_labels": [],
"auto_numa_status": "unknown", "certificate": {"organization": "pbtech",
"subject": "O=pbtech,CN=ovirt-hv1.pbtech"}, "cluster": {"href":
"/ovirt-engine/api/clusters/a4b6cd02-a0ef-11e8-a347-00163e54fb7f", "id":
"a4b6cd02-a0ef-11e8-a347-00163e54fb7f"}, "comment": "", "cpu": {"speed":
0.0, "topology": {}}, "device_passthrough": {"enabled": false},
"devices": [], "external_network_provider_configurations": [],
"external_status": "ok", "hardware_information": {"supported_rng_sources":
[]}, "hooks": [], "href":
"/ovirt-engine/api/hosts/609e7eba-8b85-4830-9a5f-99e561bb503a", "id":
"609e7eba-8b85-4830-9a5f-99e561bb503a", "katello_errata": [],
"kdump_status": "unknown", "ksm": {"enabled": false},
"max_scheduling_memory": 0, "memory": 0, "name": "ovirt-hv1.pbtech",
"network_attachments": [], "nics": [], "numa_nodes": [],
"numa_supported": false, "os": {"custom_kernel_cmdline": ""},
"permissions": [], "port": 54321, "power_management":
{"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true,
"pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm":
{"priority": 5, "status": "none"}, "ssh": {"fingerprint":
"SHA256:X+3GNzNZ09Ct7xt6T3sEgVGecyG3QjG71h+D6RnYZU8", "port": 22},
"statistics": [], "status": "install_failed",
"storage_connection_extensions": [], "summary": {"total": 0}, "tags": [],
"transparent_huge_pages": {"enabled": false}, "type": "rhel",
"unmanaged_networks": [], "update_available": false}]}, "attempts": 120,
"changed": false}
[ INFO ] TASK [Fetch logs from the engine VM]
Though the VM is up:
[root@ovirt-hv1 tmp]# ping ovirt-engine.pbtech
PING ovirt-engine.pbtech (192.168.122.69) 56(84) bytes of data.
64 bytes from ovirt-engine.pbtech (192.168.122.69): icmp_seq=1 ttl=64
time=0.186 ms
64 bytes from ovirt-engine.pbtech (192.168.122.69): icmp_seq=2 ttl=64
time=0.153 ms
[root@ovirt-hv1 tmp]# wget --no-check-certificate
https://ovirt-engine.pbtech/ovirt-engine/api
--2018-08-16 07:44:36--
https://ovirt-engine.pbtech/ovirt-engine/api
Resolving ovirt-engine.pbtech (ovirt-engine.pbtech)... 192.168.122.69
Connecting to ovirt-engine.pbtech (ovirt-engine.pbtech)|192.168.122.69|:443...
connected.
WARNING: cannot verify ovirt-engine.pbtech's certificate, issued by
‘/C=US/O=pbtech/CN=ovirt-engine.pbtech.84693’:
Self-signed certificate encountered.
HTTP request sent, awaiting response... 401 Unauthorized
I am running oVirt 4.2.3-1 and have reinstalled several times. Skipping the
above Ansible task isn't a viable workaround.
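Note that the status in the output above is actually "install_failed", not
"non_operational", so the engine-side host-deploy log should say what broke.
Something like this should reach it (a sketch, assuming root ssh access to
the engine VM):

ssh root@ovirt-engine.pbtech
ls -lt /var/log/ovirt-engine/host-deploy/
less /var/log/ovirt-engine/host-deploy/<newest file from the listing above>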
Here are the networks on the host. Note that em1 carries the ovirtmgmt
bridge, whereas ib0 provides the NFS storage domain.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
ovirtmgmt state UP group default qlen 1000
link/ether 50:9a:4c:89:c6:bd brd ff:ff:ff:ff:ff:ff
3: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
qlen 1000
link/ether 50:9a:4c:89:c6:be brd ff:ff:ff:ff:ff:ff
4: p1p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
default qlen 1000
link/ether b4:96:91:13:ee:68 brd ff:ff:ff:ff:ff:ff
5: p1p2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
default qlen 1000
link/ether b4:96:91:13:ee:6a brd ff:ff:ff:ff:ff:ff
6: idrac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UNKNOWN group default qlen 1000
link/ether 50:9a:4c:89:c6:c0 brd ff:ff:ff:ff:ff:ff
inet 169.254.0.2/16 brd 169.254.255.255 scope global idrac
valid_lft forever preferred_lft forever
7: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP
group default qlen 256
link/infiniband a0:00:02:08:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:1d:19:e1
brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 172.16.0.204/24 brd 172.16.0.255 scope global ib0
valid_lft forever preferred_lft forever
8: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
UP group default qlen 1000
link/ether 52:54:00:78:d1:c5 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever
9: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master
virbr0 state DOWN group default qlen 1000
link/ether 52:54:00:78:d1:c5 brd ff:ff:ff:ff:ff:ff
41: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
state UP group default qlen 1000
link/ether 50:9a:4c:89:c6:bd brd ff:ff:ff:ff:ff:ff
inet 10.0.0.176/16 brd 10.0.255.255 scope global ovirtmgmt
valid_lft forever preferred_lft forever
42: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
group default qlen 1000
link/ether 5e:ac:28:79:c9:0e brd ff:ff:ff:ff:ff:ff
43: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
default qlen 1000
link/ether 62:a8:d5:20:26:88 brd ff:ff:ff:ff:ff:ff
44: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
default qlen 1000
link/ether ea:41:13:ce:b6:4e brd ff:ff:ff:ff:ff:ff
48: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
master virbr0 state UNKNOWN group default qlen 1000
link/ether fe:16:3e:54:fb:7f brd ff:ff:ff:ff:ff:ff
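And the routing table: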
default via 10.0.0.52 dev ovirtmgmt
10.0.0.0/16 dev ovirtmgmt proto kernel scope link src 10.0.0.176
169.254.0.0/16 dev idrac proto kernel scope link src 169.254.0.2
169.254.0.0/16 dev ib0 scope link metric 1007
169.254.0.0/16 dev ovirtmgmt scope link metric 1041
172.16.0.0/24 dev ib0 proto kernel scope link src 172.16.0.204
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
The oVirt engine log has been attached.
Thank you!
Thanks,
Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit
Weill Cornell Medicine
E: doug(a)med.cornell.edu
O: 212-746-6305
F: 212-746-8690
On Wed, Aug 15, 2018 at 4:37 PM, Douglas Duckworth <
dod2014(a)med.cornell.edu> wrote:
> Eventually failed.
>
> I am running CentOS 7.5 on the host. After re-reading the documentation it
> seems my /var partition might not be large enough, as it's only 30GB, but
> there was no warning message indicating that's an issue.
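>
> A quick way to confirm (a sketch; compare against whatever minimum the
> docs state for your version):
>
> df -h /var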
>
> Thanks,
>
> Douglas Duckworth, MSc, LFCS
> HPC System Administrator
> Scientific Computing Unit
> Weill Cornell Medicine
> E: doug(a)med.cornell.edu
> O: 212-746-6305
> F: 212-746-8690
>
> On Wed, Aug 15, 2018 at 2:10 PM, Douglas Duckworth <
> dod2014(a)med.cornell.edu> wrote:
>
>> Ok, the Ansible engine-deploy now seems to be stuck at the same step:
>>
>> [ INFO ] TASK [Force host-deploy in offline mode]
>> [ INFO ] ok: [localhost]
>> [ INFO ] TASK [Add host]
>> [ INFO ] changed: [localhost]
>> [ INFO ] TASK [Wait for the host to be up]
>>
>> On the hypervisor in syslog I see:
>>
>> Aug 15 14:09:26 ovirt-hv1 python: ansible-ovirt_hosts_facts Invoked with
>> pattern=name=ovirt-hv1.pbtech fetch_nested=False nested_attributes=[]
>> auth={'timeout': 0, 'url': 'https://ovirt-engine.pbtech/ovirt-engine/api',
>>
>> Within the VM, which I can access over the virtual machine network, I see:
>>
>> Aug 15 18:08:06 ovirt-engine python: 192.168.122.69 - - [15/Aug/2018
>> 14:08:06] "GET /v2.0/networks HTTP/1.1" 200 -
>> Aug 15 18:08:11 ovirt-engine ovsdb-server: ovs|00008|stream_ssl|WARN|SSL_read:
>> system error (Connection reset by peer)
>> Aug 15 18:08:11 ovirt-engine ovsdb-server: ovs|00009|jsonrpc|WARN|ssl:
>> 127.0.0.1:50356: receive error: Connection reset by peer
>> Aug 15 18:08:11 ovirt-engine ovsdb-server: ovs|00010|reconnect|WARN|ssl:
>> 127.0.0.1:50356: connection dropped (Connection reset by peer)
>>
>> Thanks,
>>
>> Douglas Duckworth, MSc, LFCS
>> HPC System Administrator
>> Scientific Computing Unit
>> Weill Cornell Medicine
>> E: doug(a)med.cornell.edu
>> O: 212-746-6305
>> F: 212-746-8690
>>
>> On Wed, Aug 15, 2018 at 1:21 PM, Douglas Duckworth <
>> dod2014(a)med.cornell.edu> wrote:
>>
>>> Same VDSM error
>>>
>>> This is the state shown by systemctl after the failed-state messages:
>>>
>>> ● vdsmd.service - Virtual Desktop Server Manager
>>> Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled;
>>> vendor preset: enabled)
>>> Active: active (running) since Wed 2018-08-15 13:07:48 EDT; 4min 10s
>>> ago
>>> Main PID: 18378 (vdsmd)
>>> Tasks: 56
>>> CGroup: /system.slice/vdsmd.service
>>> ├─18378 /usr/bin/python2 /usr/share/vdsm/vdsmd
>>> ├─18495 /usr/libexec/ioprocess --read-pipe-fd 45
>>> --write-pipe-fd 44 --max-threads 10 --max-queued-requests 10
>>> ├─18504 /usr/libexec/ioprocess --read-pipe-fd 53
>>> --write-pipe-fd 51 --max-threads 10 --max-queued-requests 10
>>> └─20825 /usr/libexec/ioprocess --read-pipe-fd 60
>>> --write-pipe-fd 59 --max-threads 10 --max-queued-requests 10
>>>
>>> Aug 15 13:07:49 ovirt-hv1.pbtech vdsm[18378]: WARN Not ready yet,
>>> ignoring event '|virt|VM_status|c5463d87-c964-4430-9fdb-0e97d56cf812'
>>> args={'c5463d87-c964-4430-9fdb-0e97d56cf812': {'status': 'Up',
>>> 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc',
>>> 'port': '5900'}], 'hash': '6802750603520244794', 'cpuUser': '0.00',
>>> 'monitorResponse': '0', 'cpuUsage': '0.00', 'elapsedTime': '124',
>>> 'cpuSys': '0.00', 'vcpuPeriod': 100000L, 'timeOffset': '0',
>>> 'clientIp': '', 'pauseCode': 'NOERR', 'vcpuQuota': '-1'}}
>>> Aug 15 13:07:49 ovirt-hv1.pbtech vdsm[18378]: WARN MOM not available.
>>> Aug 15 13:07:49 ovirt-hv1.pbtech vdsm[18378]: WARN MOM not available,
>>> KSM stats will be missing.
>>> Aug 15 13:07:49 ovirt-hv1.pbtech vdsm[18378]: ERROR failed to retrieve
>>> Hosted Engine HA score '[Errno 2] No such file or directory'Is the
>>> Hosted Engine setup finished?
>>> Aug 15 13:07:50 ovirt-hv1.pbtech vdsm[18378]: WARN Not ready yet,
>>> ignoring event '|virt|VM_status|c5463d87-c964-4430-9fdb-0e97d56cf812'
>>> args={'c5463d87-c964-4430-9fdb-0e97d56cf812': {'status': 'Up',
>>> 'username': 'Unknown', 'memUsage': '40', 'guestFQDN': '', 'memoryStats':
>>> {'swap_out': '0', 'majflt': '0', 'mem_cached': '772684', 'mem_free':
>>> '1696572', 'mem_buffers': '9348', 'swap_in': '0', 'pageflt': '3339',
>>> 'mem_total': '3880652', 'mem_unused': '1696572'}, 'session': 'Unknown',
>>> 'netIfaces': [], 'guestCPUCount': -1, 'appsList': (), 'guestIPs': '',
>>> 'disksUsage': []}}
>>> Aug 15 13:08:04 ovirt-hv1.pbtech vdsm[18378]: ERROR failed to retrieve
>>> Hosted Engine HA score '[Errno 2] No such file or directory'Is the
>>> Hosted Engine setup finished?
>>> Aug 15 13:08:16 ovirt-hv1.pbtech vdsm[18378]: WARN File:
>>> /var/lib/libvirt/qemu/channels/c5463d87-c964-4430-9fdb-0e97d56cf812.com.redhat.rhevm.vdsm
>>> already removed
>>> Aug 15 13:08:16 ovirt-hv1.pbtech vdsm[18378]: WARN File:
>>> /var/lib/libvirt/qemu/channels/c5463d87-c964-4430-9fdb-0e97d56cf812.org.qemu.guest_agent.0
>>> already removed
>>> Aug 15 13:08:16 ovirt-hv1.pbtech vdsm[18378]: WARN File:
>>> /var/run/ovirt-vmconsole-console/c5463d87-c964-4430-9fdb-0e97d56cf812.sock
>>> already removed
>>> Aug 15 13:08:19 ovirt-hv1.pbtech vdsm[18378]: ERROR failed to retrieve
>>> Hosted Engine HA score '[Errno 2] No such file or directory'Is the
>>> Hosted Engine setup finished?
>>>
>>> Note 'ipAddress': '0' though I see an IP was leased out via the DHCP
>>> server:
>>>
>>> Aug 15 13:05:55 server dhcpd: DHCPACK on 10.0.0.178 to
>>> 00:16:3e:54:fb:7f via em1
>>>
>>> While I can ping it from my NFS server which provides storage domain:
>>>
>>> 64 bytes from ovirt-hv1.pbtech (10.0.0.176): icmp_seq=1 ttl=64
>>> time=0.253 ms
>>>
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Douglas Duckworth, MSc, LFCS
>>> HPC System Administrator
>>> Scientific Computing Unit
>>> Weill Cornell Medicine
>>> E: doug(a)med.cornell.edu
>>> O: 212-746-6305
>>> F: 212-746-8690
>>>
>>> On Wed, Aug 15, 2018 at 12:50 PM, Douglas Duckworth <
>>> dod2014(a)med.cornell.edu> wrote:
>>>
>>>> Ok
>>>>
>>>> I was now able to get to the step:
>>>>
>>>> Engine replied: DB Up!Welcome to Health Status!
>>>>
>>>> By removing a bad entry from /etc/hosts for ovirt-engine.pbtech, which
>>>> pointed to an IP on the local virtualization network.
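>>>>
>>>> (Worth double-checking resolution after such an edit, for example:)
>>>>
>>>> getent hosts ovirt-engine.pbtech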
>>>>
>>>> Though now when trying to connect to engine during deploy:
>>>>
>>>> [ ERROR ] The VDSM host was found in a failed state. Please check
>>>> engine and bootstrap installation logs.
>>>>
>>>> [ ERROR ] Unable to add ovirt-hv1.pbtech to the manager
>>>>
>>>> Then repeating
>>>>
>>>> [ INFO ] Still waiting for engine to start...
>>>>
>>>> Thanks,
>>>>
>>>> Douglas Duckworth, MSc, LFCS
>>>> HPC System Administrator
>>>> Scientific Computing Unit
>>>> Weill Cornell Medicine
>>>> E: doug(a)med.cornell.edu
>>>> O: 212-746-6305
>>>> F: 212-746-8690
>>>>
>>>> On Wed, Aug 15, 2018 at 10:34 AM, Douglas Duckworth <
>>>> dod2014(a)med.cornell.edu> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I keep getting this error after running
>>>>>
>>>>> sudo hosted-engine --deploy --noansible
>>>>>
>>>>> [ INFO ] Engine is still not reachable, waiting...
>>>>> [ ERROR ] Failed to execute stage 'Closing up': Engine is still not
>>>>> reachable
>>>>>
>>>>> I do see a VM running
>>>>>
>>>>> 10:20 2:51 /usr/libexec/qemu-kvm -name
>>>>> guest=HostedEngine,debug-threads=on
>>>>>
>>>>> Though
>>>>>
>>>>> sudo hosted-engine --vm-status
>>>>> [Errno 2] No such file or directory
>>>>> Cannot connect to the HA daemon, please check the logs
>>>>> An error occured while retrieving vm status, please make sure the HA
>>>>> daemon is ready and reachable.
>>>>> Unable to connect the HA Broker
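>>>>>
>>>>> The HA daemons themselves can be inspected with something like the
>>>>> following (a sketch; these are the systemd units shipped by
>>>>> ovirt-hosted-engine-ha):
>>>>>
>>>>> systemctl status ovirt-ha-agent ovirt-ha-broker
>>>>> journalctl -u ovirt-ha-agent -u ovirt-ha-broker -e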
>>>>>
>>>>> Can someone please help?
>>>>>
>>>>> Each time this failed I ran "/usr/sbin/ovirt-hosted-engine-cleanup"
>>>>> then tried deployment again.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Douglas Duckworth, MSc, LFCS
>>>>> HPC System Administrator
>>>>> Scientific Computing Unit
>>>>> Weill Cornell Medicine
>>>>> E: doug(a)med.cornell.edu
>>>>> O: 212-746-6305
>>>>> F: 212-746-8690
>>>>>
>>>>
>>>>
>>>
>>
>