One cannot mask firewalld; otherwise hosted-engine deploy will fail in the
manner shown below.
It would be nice if a pre-flight check told the user to unmask firewalld
rather than failing this way.
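For anyone searching, a masked firewalld can be detected and undone before
re-running the deploy with something like this (a sketch; run as root):

# prints "masked" when the unit file is linked to /dev/null
systemctl is-enabled firewalld

# remove the mask and start the service so host-deploy can configure it
systemctl unmask firewalld
systemctl enable --now firewalld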
Thanks,
Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit
Weill Cornell Medicine
E: doug(a)med.cornell.edu
O: 212-746-6305
F: 212-746-8690
On Thu, Aug 16, 2018 at 7:57 AM, Douglas Duckworth <dod2014(a)med.cornell.edu>
wrote:
I cannot get past this task in
"/usr/share/ovirt-hosted-engine-setup/ansible/bootstrap_local_vm.yml":
- name: Add host
  ovirt_hosts:
    # TODO: add to the first cluster of the datacenter
    # where we set the vlan id
    name: "{{ HOST_NAME }}"
    state: present
    public_key: true
    address: "{{ HOST_ADDRESS }}"
    auth: "{{ ovirt_auth }}"
  async: 1
  poll: 0
- name: Wait for the host to be up
  ovirt_hosts_facts:
    pattern: name={{ HOST_NAME }}
    auth: "{{ ovirt_auth }}"
  register: host_result_up_check
  until: host_result_up_check is succeeded and host_result_up_check.ansible_facts.ovirt_hosts|length >= 1 and (host_result_up_check.ansible_facts.ovirt_hosts[0].status == 'up' or host_result_up_check.ansible_facts.ovirt_hosts[0].status == 'non_operational')
  retries: 120
  delay: 5
- debug: var=host_result_up_check
- name: Check host status
  fail:
    msg: >
      The host has been set in non_operational status,
      please check engine logs,
      fix accordingly and re-deploy.
  when: host_result_up_check is succeeded and host_result_up_check.ansible_facts.ovirt_hosts|length >= 1 and host_result_up_check.ansible_facts.ovirt_hosts[0].status == 'non_operational'
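(For what it's worth, this wait loop just polls the engine REST API for the
host's status; the same check can be done by hand with curl, as a sketch,
using the admin password chosen during deploy:)

curl -k -u 'admin@internal:PASSWORD' \
  -H 'Accept: application/json' \
  'https://ovirt-engine.pbtech/ovirt-engine/api/hosts?search=name%3Dovirt-hv1.pbtech'

The "status" element in the response is what the task compares against
'up' and 'non_operational'.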
The error:
[ INFO ] TASK [Wait for the host to be up]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts":
[{"address": "ovirt-hv1.pbtech", "affinity_labels": [],
"auto_numa_status": "unknown", "certificate": {"organization": "pbtech",
"subject": "O=pbtech,CN=ovirt-hv1.pbtech"}, "cluster": {"href":
"/ovirt-engine/api/clusters/a4b6cd02-a0ef-11e8-a347-00163e54fb7f", "id":
"a4b6cd02-a0ef-11e8-a347-00163e54fb7f"}, "comment": "", "cpu": {"speed":
0.0, "topology": {}}, "device_passthrough": {"enabled": false},
"devices": [], "external_network_provider_configurations": [],
"external_status": "ok", "hardware_information": {"supported_rng_sources":
[]}, "hooks": [], "href":
"/ovirt-engine/api/hosts/609e7eba-8b85-4830-9a5f-99e561bb503a", "id":
"609e7eba-8b85-4830-9a5f-99e561bb503a", "katello_errata": [],
"kdump_status": "unknown", "ksm": {"enabled": false},
"max_scheduling_memory": 0, "memory": 0, "name": "ovirt-hv1.pbtech",
"network_attachments": [], "nics": [], "numa_nodes": [],
"numa_supported": false, "os": {"custom_kernel_cmdline": ""},
"permissions": [], "port": 54321, "power_management":
{"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true,
"pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm":
{"priority": 5, "status": "none"}, "ssh": {"fingerprint":
"SHA256:X+3GNzNZ09Ct7xt6T3sEgVGecyG3QjG71h+D6RnYZU8", "port": 22},
"statistics": [], "status": "install_failed",
"storage_connection_extensions": [], "summary": {"total": 0}, "tags": [],
"transparent_huge_pages": {"enabled": false}, "type": "rhel",
"unmanaged_networks": [], "update_available": false}]}, "attempts": 120,
"changed": false}
[ INFO ] TASK [Fetch logs from the engine VM]
Though the VM is up:
[root@ovirt-hv1 tmp]# ping ovirt-engine.pbtech
PING ovirt-engine.pbtech (192.168.122.69) 56(84) bytes of data.
64 bytes from ovirt-engine.pbtech (192.168.122.69): icmp_seq=1 ttl=64
time=0.186 ms
64 bytes from ovirt-engine.pbtech (192.168.122.69): icmp_seq=2 ttl=64
time=0.153 ms
[root@ovirt-hv1 tmp]# wget --no-check-certificate
https://ovirt-engine.pbtech/ovirt-engine/api
--2018-08-16 07:44:36--
https://ovirt-engine.pbtech/ovirt-engine/api
Resolving ovirt-engine.pbtech (ovirt-engine.pbtech)... 192.168.122.69
Connecting to ovirt-engine.pbtech (ovirt-engine.pbtech)|192.168.122.69|:443...
connected.
WARNING: cannot verify ovirt-engine.pbtech's certificate, issued by
‘/C=US/O=pbtech/CN=ovirt-engine.pbtech.84693’:
Self-signed certificate encountered.
HTTP request sent, awaiting response... 401 Unauthorized
I am running oVirt 4.2.3-1 and have reinstalled several times. Skipping the
above Ansible task isn't a viable workaround.
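Note that the status in the output above is actually "install_failed", not
"non_operational", so the engine-side host-deploy log should say what broke.
Something like this should reach it (a sketch, assuming root ssh access to
the engine VM):

ssh root@ovirt-engine.pbtech
ls -lt /var/log/ovirt-engine/host-deploy/
less /var/log/ovirt-engine/host-deploy/<newest file from the listing above>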
Here are the networks on the host. Note that em1 carries the ovirtmgmt
bridge, whereas ib0 provides the NFS storage domain.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
ovirtmgmt state UP group default qlen 1000
link/ether 50:9a:4c:89:c6:bd brd ff:ff:ff:ff:ff:ff
3: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
qlen 1000
link/ether 50:9a:4c:89:c6:be brd ff:ff:ff:ff:ff:ff
4: p1p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
default qlen 1000
link/ether b4:96:91:13:ee:68 brd ff:ff:ff:ff:ff:ff
5: p1p2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
default qlen 1000
link/ether b4:96:91:13:ee:6a brd ff:ff:ff:ff:ff:ff
6: idrac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UNKNOWN group default qlen 1000
link/ether 50:9a:4c:89:c6:c0 brd ff:ff:ff:ff:ff:ff
inet 169.254.0.2/16 brd 169.254.255.255 scope global idrac
valid_lft forever preferred_lft forever
7: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP
group default qlen 256
link/infiniband a0:00:02:08:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:1d:19:e1
brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 172.16.0.204/24 brd 172.16.0.255 scope global ib0
valid_lft forever preferred_lft forever
8: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
UP group default qlen 1000
link/ether 52:54:00:78:d1:c5 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever
9: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master
virbr0 state DOWN group default qlen 1000
link/ether 52:54:00:78:d1:c5 brd ff:ff:ff:ff:ff:ff
41: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
state UP group default qlen 1000
link/ether 50:9a:4c:89:c6:bd brd ff:ff:ff:ff:ff:ff
inet 10.0.0.176/16 brd 10.0.255.255 scope global ovirtmgmt
valid_lft forever preferred_lft forever
42: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
group default qlen 1000
link/ether 5e:ac:28:79:c9:0e brd ff:ff:ff:ff:ff:ff
43: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
default qlen 1000
link/ether 62:a8:d5:20:26:88 brd ff:ff:ff:ff:ff:ff
44: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
default qlen 1000
link/ether ea:41:13:ce:b6:4e brd ff:ff:ff:ff:ff:ff
48: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
master virbr0 state UNKNOWN group default qlen 1000
link/ether fe:16:3e:54:fb:7f brd ff:ff:ff:ff:ff:ff
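And the routing table: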
default via 10.0.0.52 dev ovirtmgmt
10.0.0.0/16 dev ovirtmgmt proto kernel scope link src 10.0.0.176
169.254.0.0/16 dev idrac proto kernel scope link src 169.254.0.2
169.254.0.0/16 dev ib0 scope link metric 1007
169.254.0.0/16 dev ovirtmgmt scope link metric 1041
172.16.0.0/24 dev ib0 proto kernel scope link src 172.16.0.204
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
The oVirt engine log has been attached.
Thank you!
Thanks,
Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit
Weill Cornell Medicine
E: doug(a)med.cornell.edu
O: 212-746-6305
F: 212-746-8690
On Wed, Aug 15, 2018 at 4:37 PM, Douglas Duckworth <
dod2014(a)med.cornell.edu> wrote:
> Eventually failed.
>
> I am running CentOS 7.5 on the host. After re-reading the documentation it
> seems my /var partition might not be large enough, as it's only 30GB, but
> there was no warning message indicating that's an issue.
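>
> A quick way to confirm (a sketch; compare against whatever minimum the
> docs state for your version):
>
> df -h /var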
>
> Thanks,
>
> Douglas Duckworth, MSc, LFCS
> HPC System Administrator
> Scientific Computing Unit
> Weill Cornell Medicine
> E: doug(a)med.cornell.edu
> O: 212-746-6305
> F: 212-746-8690
>
> On Wed, Aug 15, 2018 at 2:10 PM, Douglas Duckworth <
> dod2014(a)med.cornell.edu> wrote:
>
>> Ok, the Ansible engine-deploy now seems to be stuck at the same step:
>>
>> [ INFO ] TASK [Force host-deploy in offline mode]
>> [ INFO ] ok: [localhost]
>> [ INFO ] TASK [Add host]
>> [ INFO ] changed: [localhost]
>> [ INFO ] TASK [Wait for the host to be up]
>>
>> On the hypervisor in syslog I see:
>>
>> Aug 15 14:09:26 ovirt-hv1 python: ansible-ovirt_hosts_facts Invoked with
>> pattern=name=ovirt-hv1.pbtech fetch_nested=False nested_attributes=[]
>> auth={'timeout': 0, 'url': 'https://ovirt-engine.pbtech/ovirt-engine/api',
>>
>> Within the VM, which I can access over the virtual machine network, I see:
>>
>> Aug 15 18:08:06 ovirt-engine python: 192.168.122.69 - - [15/Aug/2018
>> 14:08:06] "GET /v2.0/networks HTTP/1.1" 200 -
>> Aug 15 18:08:11 ovirt-engine ovsdb-server: ovs|00008|stream_ssl|WARN|SSL_read:
>> system error (Connection reset by peer)
>> Aug 15 18:08:11 ovirt-engine ovsdb-server: ovs|00009|jsonrpc|WARN|ssl:
>> 127.0.0.1:50356: receive error: Connection reset by peer
>> Aug 15 18:08:11 ovirt-engine ovsdb-server: ovs|00010|reconnect|WARN|ssl:
>> 127.0.0.1:50356: connection dropped (Connection reset by peer)
>>
>> Thanks,
>>
>> Douglas Duckworth, MSc, LFCS
>> HPC System Administrator
>> Scientific Computing Unit
>> Weill Cornell Medicine
>> E: doug(a)med.cornell.edu
>> O: 212-746-6305
>> F: 212-746-8690
>>
>> On Wed, Aug 15, 2018 at 1:21 PM, Douglas Duckworth <
>> dod2014(a)med.cornell.edu> wrote:
>>
>>> Same VDSM error
>>>
>>> This is the state shown by systemctl after the failed-state messages:
>>>
>>> ● vdsmd.service - Virtual Desktop Server Manager
>>> Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled;
>>> vendor preset: enabled)
>>> Active: active (running) since Wed 2018-08-15 13:07:48 EDT; 4min 10s
>>> ago
>>> Main PID: 18378 (vdsmd)
>>> Tasks: 56
>>> CGroup: /system.slice/vdsmd.service
>>> ├─18378 /usr/bin/python2 /usr/share/vdsm/vdsmd
>>> ├─18495 /usr/libexec/ioprocess --read-pipe-fd 45
>>> --write-pipe-fd 44 --max-threads 10 --max-queued-requests 10
>>> ├─18504 /usr/libexec/ioprocess --read-pipe-fd 53
>>> --write-pipe-fd 51 --max-threads 10 --max-queued-requests 10
>>> └─20825 /usr/libexec/ioprocess --read-pipe-fd 60
>>> --write-pipe-fd 59 --max-threads 10 --max-queued-requests 10
>>>
>>> Aug 15 13:07:49 ovirt-hv1.pbtech vdsm[18378]: WARN Not ready yet,
>>> ignoring event '|virt|VM_status|c5463d87-c964-4430-9fdb-0e97d56cf812'
>>> args={'c5463d87-c964-4430-9fdb-0e97d56cf812': {'status': 'Up',
>>> 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc',
>>> 'port': '5900'}], 'hash': '6802750603520244794', 'cpuUser': '0.00',
>>> 'monitorResponse': '0', 'cpuUsage': '0.00', 'elapsedTime': '124',
>>> 'cpuSys': '0.00', 'vcpuPeriod': 100000L, 'timeOffset': '0',
>>> 'clientIp': '', 'pauseCode': 'NOERR', 'vcpuQuota': '-1'}}
>>> Aug 15 13:07:49 ovirt-hv1.pbtech vdsm[18378]: WARN MOM not available.
>>> Aug 15 13:07:49 ovirt-hv1.pbtech vdsm[18378]: WARN MOM not available,
>>> KSM stats will be missing.
>>> Aug 15 13:07:49 ovirt-hv1.pbtech vdsm[18378]: ERROR failed to retrieve
>>> Hosted Engine HA score '[Errno 2] No such file or directory'Is the
>>> Hosted Engine setup finished?
>>> Aug 15 13:07:50 ovirt-hv1.pbtech vdsm[18378]: WARN Not ready yet,
>>> ignoring event '|virt|VM_status|c5463d87-c964-4430-9fdb-0e97d56cf812'
>>> args={'c5463d87-c964-4430-9fdb-0e97d56cf812': {'status': 'Up',
>>> 'username': 'Unknown', 'memUsage': '40', 'guestFQDN': '', 'memoryStats':
>>> {'swap_out': '0', 'majflt': '0', 'mem_cached': '772684', 'mem_free':
>>> '1696572', 'mem_buffers': '9348', 'swap_in': '0', 'pageflt': '3339',
>>> 'mem_total': '3880652', 'mem_unused': '1696572'}, 'session': 'Unknown',
>>> 'netIfaces': [], 'guestCPUCount': -1, 'appsList': (), 'guestIPs': '',
>>> 'disksUsage': []}}
>>> Aug 15 13:08:04 ovirt-hv1.pbtech vdsm[18378]: ERROR failed to retrieve
>>> Hosted Engine HA score '[Errno 2] No such file or directory'Is the
>>> Hosted Engine setup finished?
>>> Aug 15 13:08:16 ovirt-hv1.pbtech vdsm[18378]: WARN File:
>>> /var/lib/libvirt/qemu/channels/c5463d87-c964-4430-9fdb-0e97d56cf812.com.redhat.rhevm.vdsm
>>> already removed
>>> Aug 15 13:08:16 ovirt-hv1.pbtech vdsm[18378]: WARN File:
>>> /var/lib/libvirt/qemu/channels/c5463d87-c964-4430-9fdb-0e97d56cf812.org.qemu.guest_agent.0
>>> already removed
>>> Aug 15 13:08:16 ovirt-hv1.pbtech vdsm[18378]: WARN File:
>>> /var/run/ovirt-vmconsole-console/c5463d87-c964-4430-9fdb-0e97d56cf812.sock
>>> already removed
>>> Aug 15 13:08:19 ovirt-hv1.pbtech vdsm[18378]: ERROR failed to retrieve
>>> Hosted Engine HA score '[Errno 2] No such file or directory'Is the
>>> Hosted Engine setup finished?
>>>
>>> Note 'ipAddress': '0' though I see an IP was leased out via the DHCP
>>> server:
>>>
>>> Aug 15 13:05:55 server dhcpd: DHCPACK on 10.0.0.178 to
>>> 00:16:3e:54:fb:7f via em1
>>>
>>> While I can ping it from my NFS server which provides storage domain:
>>>
>>> 64 bytes from ovirt-hv1.pbtech (10.0.0.176): icmp_seq=1 ttl=64
>>> time=0.253 ms
>>>
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Douglas Duckworth, MSc, LFCS
>>> HPC System Administrator
>>> Scientific Computing Unit
>>> Weill Cornell Medicine
>>> E: doug(a)med.cornell.edu
>>> O: 212-746-6305
>>> F: 212-746-8690
>>>
>>> On Wed, Aug 15, 2018 at 12:50 PM, Douglas Duckworth <
>>> dod2014(a)med.cornell.edu> wrote:
>>>
>>>> Ok
>>>>
>>>> I was now able to get to the step:
>>>>
>>>> Engine replied: DB Up!Welcome to Health Status!
>>>>
>>>> By removing a bad entry from /etc/hosts for ovirt-engine.pbtech, which
>>>> pointed to an IP on the local virtualization network.
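>>>>
>>>> (Worth double-checking resolution after such an edit, for example:)
>>>>
>>>> getent hosts ovirt-engine.pbtech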
>>>>
>>>> Though now when trying to connect to engine during deploy:
>>>>
>>>> [ ERROR ] The VDSM host was found in a failed state. Please check
>>>> engine and bootstrap installation logs.
>>>>
>>>> [ ERROR ] Unable to add ovirt-hv1.pbtech to the manager
>>>>
>>>> Then repeating
>>>>
>>>> [ INFO ] Still waiting for engine to start...
>>>>
>>>> Thanks,
>>>>
>>>> Douglas Duckworth, MSc, LFCS
>>>> HPC System Administrator
>>>> Scientific Computing Unit
>>>> Weill Cornell Medicine
>>>> E: doug(a)med.cornell.edu
>>>> O: 212-746-6305
>>>> F: 212-746-8690
>>>>
>>>> On Wed, Aug 15, 2018 at 10:34 AM, Douglas Duckworth <
>>>> dod2014(a)med.cornell.edu> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I keep getting this error after running
>>>>>
>>>>> sudo hosted-engine --deploy --noansible
>>>>>
>>>>> [ INFO ] Engine is still not reachable, waiting...
>>>>> [ ERROR ] Failed to execute stage 'Closing up': Engine is still not
>>>>> reachable
>>>>>
>>>>> I do see a VM running
>>>>>
>>>>> 10:20 2:51 /usr/libexec/qemu-kvm -name
>>>>> guest=HostedEngine,debug-threads=on
>>>>>
>>>>> Though
>>>>>
>>>>> sudo hosted-engine --vm-status
>>>>> [Errno 2] No such file or directory
>>>>> Cannot connect to the HA daemon, please check the logs
>>>>> An error occured while retrieving vm status, please make sure the HA
>>>>> daemon is ready and reachable.
>>>>> Unable to connect the HA Broker
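>>>>>
>>>>> The HA daemons themselves can be inspected with something like the
>>>>> following (a sketch; these are the systemd units shipped by
>>>>> ovirt-hosted-engine-ha):
>>>>>
>>>>> systemctl status ovirt-ha-agent ovirt-ha-broker
>>>>> journalctl -u ovirt-ha-agent -u ovirt-ha-broker -e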
>>>>>
>>>>> Can someone please help?
>>>>>
>>>>> Each time this failed I ran "/usr/sbin/ovirt-hosted-engine-cleanup"
>>>>> then tried deployment again.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Douglas Duckworth, MSc, LFCS
>>>>> HPC System Administrator
>>>>> Scientific Computing Unit
>>>>> Weill Cornell Medicine
>>>>> E: doug(a)med.cornell.edu
>>>>> O: 212-746-6305
>>>>> F: 212-746-8690
>>>>>
>>>>
>>>>
>>>
>>
>