I cannot get past this task in "/usr/share/ovirt-hosted-engine-setup/ansible/bootstrap_local_vm.yml":
- name: Add host
  ovirt_hosts:
    # TODO: add to the first cluster of the datacenter
    # where we set the vlan id
    name: "{{ HOST_NAME }}"
    state: present
    public_key: true
    address: "{{ HOST_ADDRESS }}"
    auth: "{{ ovirt_auth }}"
  async: 1
  poll: 0
- name: Wait for the host to be up
  ovirt_hosts_facts:
    pattern: name={{ HOST_NAME }}
    auth: "{{ ovirt_auth }}"
  register: host_result_up_check
  until: host_result_up_check is succeeded and
    host_result_up_check.ansible_facts.ovirt_hosts|length >= 1 and
    (host_result_up_check.ansible_facts.ovirt_hosts[0].status == 'up' or
    host_result_up_check.ansible_facts.ovirt_hosts[0].status == 'non_operational')
  retries: 120
  delay: 5
- debug: var=host_result_up_check
- name: Check host status
  fail:
    msg: >
      The host has been set in non_operational status,
      please check engine logs,
      fix accordingly and re-deploy.
  when: host_result_up_check is succeeded and
    host_result_up_check.ansible_facts.ovirt_hosts|length >= 1 and
    host_result_up_check.ansible_facts.ovirt_hosts[0].status == 'non_operational'
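For context, the "Wait for the host to be up" task just polls ovirt_hosts_facts until the host reports 'up' or 'non_operational'. My host ends up in 'install_failed' (see the error below), so the condition never matches and the task exhausts its 120 retries. The same check can be run by hand against the engine REST API; a rough sketch, with the admin password as a placeholder:

# Ask the engine for the host's status, roughly what the wait loop does.
# -k skips verification of the self-signed engine certificate.
curl -k -u 'admin@internal:PASSWORD' \
  'https://ovirt-engine.pbtech/ovirt-engine/api/hosts?search=name%3Dovirt-hv1.pbtech' \
  | grep -o '<status>[^<]*</status>'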
The error:
[ INFO ] TASK [Wait for the host to be up]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "ovirt-hv1.pbtech",
"affinity_labels": [], "auto_numa_status": "unknown", "certificate": {"organization": "pbtech",
"subject": "O=pbtech,CN=ovirt-hv1.pbtech"}, "cluster": {"href":
"/ovirt-engine/api/clusters/a4b6cd02-a0ef-11e8-a347-00163e54fb7f", "id": "a4b6cd02-a0ef-11e8-a347-00163e54fb7f"},
"comment": "", "cpu": {"speed": 0.0, "topology": {}}, "device_passthrough": {"enabled": false},
"devices": [], "external_network_provider_configurations": [], "external_status": "ok",
"hardware_information": {"supported_rng_sources": []}, "hooks": [], "href":
"/ovirt-engine/api/hosts/609e7eba-8b85-4830-9a5f-99e561bb503a", "id": "609e7eba-8b85-4830-9a5f-99e561bb503a",
"katello_errata": [], "kdump_status": "unknown", "ksm": {"enabled": false}, "max_scheduling_memory": 0,
"memory": 0, "name": "ovirt-hv1.pbtech", "network_attachments": [], "nics": [], "numa_nodes": [],
"numa_supported": false, "os": {"custom_kernel_cmdline": ""}, "permissions": [], "port": 54321,
"power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true,
"pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm": {"priority": 5, "status": "none"},
"ssh": {"fingerprint": "SHA256:X+3GNzNZ09Ct7xt6T3sEgVGecyG3QjG71h+D6RnYZU8", "port": 22},
"statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": {"total": 0},
"tags": [], "transparent_huge_pages": {"enabled": false}, "type": "rhel", "unmanaged_networks": [],
"update_available": false}]}, "attempts": 120, "changed": false}
[ INFO ] TASK [Fetch logs from the engine VM]
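Since the host status is "install_failed", the playbook's own advice applies: check the engine-side logs. Where I plan to look on the engine VM, assuming a standard engine install (paths may differ):

# Per-host installation log written by host-deploy; should say why the install failed
ls -lt /var/log/ovirt-engine/host-deploy/
less /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-*.log
# General engine log
less /var/log/ovirt-engine/engine.log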
Though the VM's up:
[root@ovirt-hv1 tmp]# ping ovirt-engine.pbtech
PING ovirt-engine.pbtech (192.168.122.69) 56(84) bytes of data.
64 bytes from ovirt-engine.pbtech (192.168.122.69): icmp_seq=1 ttl=64
time=0.186 ms
64 bytes from ovirt-engine.pbtech (192.168.122.69): icmp_seq=2 ttl=64
time=0.153 ms
[root@ovirt-hv1 tmp]# wget --no-check-certificate https://ovirt-engine.pbtech/ovirt-engine/api
--2018-08-16 07:44:36--  https://ovirt-engine.pbtech/ovirt-engine/api
Resolving ovirt-engine.pbtech (ovirt-engine.pbtech)... 192.168.122.69
Connecting to ovirt-engine.pbtech (ovirt-engine.pbtech)|192.168.122.69|:443...
connected.
WARNING: cannot verify ovirt-engine.pbtech's certificate, issued by
‘/C=US/O=pbtech/CN=ovirt-engine.pbtech.84693’:
Self-signed certificate encountered.
HTTP request sent, awaiting response... 401 Unauthorized
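The 401 is expected since wget sent no credentials, so this at least shows the API endpoint is reachable. If I understand the setup flow correctly, the engine's unauthenticated health servlet (the source of the "DB Up!Welcome to Health Status!" line further down this thread) is another quick test:

curl -k https://ovirt-engine.pbtech/ovirt-engine/services/health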
I'm running oVirt 4.2.3-1 and have reinstalled several times. Skipping the
above Ansible task isn't a viable workaround.
Here are the networks and routes on the host. Note that em1 carries the ovirtmgmt
bridge, whereas ib0 provides the NFS storage domain.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
ovirtmgmt state UP group default qlen 1000
link/ether 50:9a:4c:89:c6:bd brd ff:ff:ff:ff:ff:ff
3: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
qlen 1000
link/ether 50:9a:4c:89:c6:be brd ff:ff:ff:ff:ff:ff
4: p1p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
qlen 1000
link/ether b4:96:91:13:ee:68 brd ff:ff:ff:ff:ff:ff
5: p1p2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
qlen 1000
link/ether b4:96:91:13:ee:6a brd ff:ff:ff:ff:ff:ff
6: idrac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
UNKNOWN group default qlen 1000
link/ether 50:9a:4c:89:c6:c0 brd ff:ff:ff:ff:ff:ff
inet 169.254.0.2/16 brd 169.254.255.255 scope global idrac
valid_lft forever preferred_lft forever
7: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group
default qlen 256
link/infiniband a0:00:02:08:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:1d:19:e1
brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 172.16.0.204/24 brd 172.16.0.255 scope global ib0
valid_lft forever preferred_lft forever
8: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
UP group default qlen 1000
link/ether 52:54:00:78:d1:c5 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever
9: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master
virbr0 state DOWN group default qlen 1000
link/ether 52:54:00:78:d1:c5 brd ff:ff:ff:ff:ff:ff
41: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
state UP group default qlen 1000
link/ether 50:9a:4c:89:c6:bd brd ff:ff:ff:ff:ff:ff
inet 10.0.0.176/16 brd 10.0.255.255 scope global ovirtmgmt
valid_lft forever preferred_lft forever
42: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
default qlen 1000
link/ether 5e:ac:28:79:c9:0e brd ff:ff:ff:ff:ff:ff
43: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
default qlen 1000
link/ether 62:a8:d5:20:26:88 brd ff:ff:ff:ff:ff:ff
44: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
default qlen 1000
link/ether ea:41:13:ce:b6:4e brd ff:ff:ff:ff:ff:ff
48: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
master virbr0 state UNKNOWN group default qlen 1000
link/ether fe:16:3e:54:fb:7f brd ff:ff:ff:ff:ff:ff
default via 10.0.0.52 dev ovirtmgmt
10.0.0.0/16 dev ovirtmgmt proto kernel scope link src 10.0.0.176
169.254.0.0/16 dev idrac proto kernel scope link src 169.254.0.2
169.254.0.0/16 dev ib0 scope link metric 1007
169.254.0.0/16 dev ovirtmgmt scope link metric 1041
172.16.0.0/24 dev ib0 proto kernel scope link src 172.16.0.204
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
The oVirt engine log has been attached.
Thank you!
Thanks,
Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit
Weill Cornell Medicine
E: doug(a)med.cornell.edu
O: 212-746-6305
F: 212-746-8690
On Wed, Aug 15, 2018 at 4:37 PM, Douglas Duckworth <dod2014(a)med.cornell.edu>
wrote:
Eventually it failed.
I am running CentOS 7.5 on the host. After re-reading the documentation, it
seems that my /var partition might not be large enough, as it's only 30 GB,
but there was no warning message indicating that's an issue.
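A quick sanity check of free space on the relevant mounts (just a sketch; I'll watch these during the next deploy attempt):

df -h /var /var/tmp /var/log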
Thanks,
Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit
Weill Cornell Medicine
E: doug(a)med.cornell.edu
O: 212-746-6305
F: 212-746-8690
On Wed, Aug 15, 2018 at 2:10 PM, Douglas Duckworth <
dod2014(a)med.cornell.edu> wrote:
> OK, the Ansible engine deploy now seems to be stuck at the same step:
>
> [ INFO ] TASK [Force host-deploy in offline mode]
> [ INFO ] ok: [localhost]
> [ INFO ] TASK [Add host]
> [ INFO ] changed: [localhost]
> [ INFO ] TASK [Wait for the host to be up]
>
> On the hypervisor in syslog I see:
>
> Aug 15 14:09:26 ovirt-hv1 python: ansible-ovirt_hosts_facts Invoked with
> pattern=name=ovirt-hv1.pbtech fetch_nested=False nested_attributes=[]
> auth={'timeout': 0, 'url': 'https://ovirt-engine.pbtech/ovirt-engine/api',
>
> Within the VM, which I can access over virtual machine network, I see:
>
> Aug 15 18:08:06 ovirt-engine python: 192.168.122.69 - - [15/Aug/2018
> 14:08:06] "GET /v2.0/networks HTTP/1.1" 200 -
> Aug 15 18:08:11 ovirt-engine ovsdb-server: ovs|00008|stream_ssl|WARN|SSL_read: system error (Connection reset by peer)
> Aug 15 18:08:11 ovirt-engine ovsdb-server: ovs|00009|jsonrpc|WARN|ssl:127.0.0.1:50356: receive error: Connection reset by peer
> Aug 15 18:08:11 ovirt-engine ovsdb-server: ovs|00010|reconnect|WARN|ssl:127.0.0.1:50356: connection dropped (Connection reset by peer)
>
> Thanks,
>
> Douglas Duckworth, MSc, LFCS
> HPC System Administrator
> Scientific Computing Unit
> Weill Cornell Medicine
> E: doug(a)med.cornell.edu
> O: 212-746-6305
> F: 212-746-8690
>
> On Wed, Aug 15, 2018 at 1:21 PM, Douglas Duckworth <
> dod2014(a)med.cornell.edu> wrote:
>
>> Same VDSM error
>>
>> This is the state shown by service after the failed state messages:
>>
>> ● vdsmd.service - Virtual Desktop Server Manager
>> Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled;
>> vendor preset: enabled)
>> Active: active (running) since Wed 2018-08-15 13:07:48 EDT; 4min 10s
>> ago
>> Main PID: 18378 (vdsmd)
>> Tasks: 56
>> CGroup: /system.slice/vdsmd.service
>> ├─18378 /usr/bin/python2 /usr/share/vdsm/vdsmd
>> ├─18495 /usr/libexec/ioprocess --read-pipe-fd 45
>> --write-pipe-fd 44 --max-threads 10 --max-queued-requests 10
>> ├─18504 /usr/libexec/ioprocess --read-pipe-fd 53
>> --write-pipe-fd 51 --max-threads 10 --max-queued-requests 10
>> └─20825 /usr/libexec/ioprocess --read-pipe-fd 60
>> --write-pipe-fd 59 --max-threads 10 --max-queued-requests 10
>>
>> Aug 15 13:07:49 ovirt-hv1.pbtech vdsm[18378]: WARN Not ready yet, ignoring event
>> '|virt|VM_status|c5463d87-c964-4430-9fdb-0e97d56cf812' args={'c5463d87-c964-4430-9fdb-0e97d56cf812':
>> {'status': 'Up', 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc', 'port': '5900'}],
>> 'hash': '6802750603520244794', 'cpuUser': '0.00', 'monitorResponse': '0', 'cpuUsage': '0.00',
>> 'elapsedTime': '124', 'cpuSys': '0.00', 'vcpuPeriod': 100000L, 'timeOffset': '0', 'clientIp': '',
>> 'pauseCode': 'NOERR', 'vcpuQuota': '-1'}}
>> Aug 15 13:07:49 ovirt-hv1.pbtech vdsm[18378]: WARN MOM not available.
>> Aug 15 13:07:49 ovirt-hv1.pbtech vdsm[18378]: WARN MOM not available,
>> KSM stats will be missing.
>> Aug 15 13:07:49 ovirt-hv1.pbtech vdsm[18378]: ERROR failed to retrieve Hosted Engine HA score
>> '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
>> Aug 15 13:07:50 ovirt-hv1.pbtech vdsm[18378]: WARN Not ready yet, ignoring event
>> '|virt|VM_status|c5463d87-c964-4430-9fdb-0e97d56cf812' args={'c5463d87-c964-4430-9fdb-0e97d56cf812':
>> {'status': 'Up', 'username': 'Unknown', 'memUsage': '40', 'guestFQDN': '', 'memoryStats':
>> {'swap_out': '0', 'majflt': '0', 'mem_cached': '772684', 'mem_free': '1696572', 'mem_buffers': '9348',
>> 'swap_in': '0', 'pageflt': '3339', 'mem_total': '3880652', 'mem_unused': '1696572'}, 'session': 'Unknown',
>> 'netIfaces': [], 'guestCPUCount': -1, 'appsList': (), 'guestIPs': '', 'disksUsage': []}}
>> Aug 15 13:08:04 ovirt-hv1.pbtech vdsm[18378]: ERROR failed to retrieve Hosted Engine HA score
>> '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
>> Aug 15 13:08:16 ovirt-hv1.pbtech vdsm[18378]: WARN File:
>> /var/lib/libvirt/qemu/channels/c5463d87-c964-4430-9fdb-0e97d56cf812.com.redhat.rhevm.vdsm
>> already removed
>> Aug 15 13:08:16 ovirt-hv1.pbtech vdsm[18378]: WARN File:
>> /var/lib/libvirt/qemu/channels/c5463d87-c964-4430-9fdb-0e97d56cf812.org.qemu.guest_agent.0
>> already removed
>> Aug 15 13:08:16 ovirt-hv1.pbtech vdsm[18378]: WARN File:
>> /var/run/ovirt-vmconsole-console/c5463d87-c964-4430-9fdb-0e97d56cf812.sock
>> already removed
>> Aug 15 13:08:19 ovirt-hv1.pbtech vdsm[18378]: ERROR failed to retrieve Hosted Engine HA score
>> '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
>>
>> Note 'ipAddress': '0', though I see an IP was leased out by the DHCP server:
>>
>> Aug 15 13:05:55 server dhcpd: DHCPACK on 10.0.0.178 to 00:16:3e:54:fb:7f
>> via em1
>>
>> While I can ping it from my NFS server which provides storage domain:
>>
>> 64 bytes from ovirt-hv1.pbtech (10.0.0.176): icmp_seq=1 ttl=64
>> time=0.253 ms
>>
>>
>>
>>
>> Thanks,
>>
>> Douglas Duckworth, MSc, LFCS
>> HPC System Administrator
>> Scientific Computing Unit
>> Weill Cornell Medicine
>> E: doug(a)med.cornell.edu
>> O: 212-746-6305
>> F: 212-746-8690
>>
>> On Wed, Aug 15, 2018 at 12:50 PM, Douglas Duckworth <
>> dod2014(a)med.cornell.edu> wrote:
>>
>>> Ok
>>>
>>> I was now able to get to the step:
>>>
>>> Engine replied: DB Up!Welcome to Health Status!
>>>
>>> By removing a bad entry from /etc/hosts for ovirt-engine.pbtech which
>>> pointed to an IP on the local virtualization network.
>>>
>>> Though now when trying to connect to engine during deploy:
>>>
>>> [ ERROR ] The VDSM host was found in a failed state. Please check
>>> engine and bootstrap installation logs.
>>>
>>> [ ERROR ] Unable to add ovirt-hv1.pbtech to the manager
>>>
>>> Then repeating
>>>
>>> [ INFO ] Still waiting for engine to start...
>>>
>>> Thanks,
>>>
>>> Douglas Duckworth, MSc, LFCS
>>> HPC System Administrator
>>> Scientific Computing Unit
>>> Weill Cornell Medicine
>>> E: doug(a)med.cornell.edu
>>> O: 212-746-6305
>>> F: 212-746-8690
>>>
>>> On Wed, Aug 15, 2018 at 10:34 AM, Douglas Duckworth <
>>> dod2014(a)med.cornell.edu> wrote:
>>>
>>>> Hi
>>>>
>>>> I keep getting this error after running
>>>>
>>>> sudo hosted-engine --deploy --noansible
>>>>
>>>> [ INFO ] Engine is still not reachable, waiting...
>>>> [ ERROR ] Failed to execute stage 'Closing up': Engine is still not reachable
>>>>
>>>> I do see a VM running
>>>>
>>>> 10:20 2:51 /usr/libexec/qemu-kvm -name guest=HostedEngine,debug-threads=on
>>>>
>>>> Though
>>>>
>>>> sudo hosted-engine --vm-status
>>>> [Errno 2] No such file or directory
>>>> Cannot connect to the HA daemon, please check the logs
>>>> An error occured while retrieving vm status, please make sure the HA
>>>> daemon is ready and reachable.
>>>> Unable to connect the HA Broker
>>>>
>>>> Can someone please help?
>>>>
>>>> Each time this failed I ran "/usr/sbin/ovirt-hosted-engine-cleanup"
>>>> and then tried deployment again.
>>>>
>>>> Thanks,
>>>>
>>>> Douglas Duckworth, MSc, LFCS
>>>> HPC System Administrator
>>>> Scientific Computing Unit
>>>> Weill Cornell Medicine
>>>> E: doug(a)med.cornell.edu
>>>> O: 212-746-6305
>>>> F: 212-746-8690
>>>>
>>>
>>>
>>
>