oVirt 4.2.1 pre hosted-engine deploy failure

Hello, at the end of the command hosted-engine --deploy I get:

[ INFO ] TASK [Detect ovirt-hosted-engine-ha version]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Set ha_version]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Create configuration templates]
[ INFO ] TASK [Create configuration archive]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Create ovirt-hosted-engine-ha run directory]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Copy configuration files to the right location on host]
[ INFO ] TASK [Copy configuration archive to storage]
[ ERROR ] [WARNING]: Failure using method (v2_runner_on_failed) in callback plugin
[ ERROR ] (<ansible.plugins.callback.1_otopi_json.CallbackModule object at 0x2dd7d90>):
[ ERROR ] 'ascii' codec can't encode character u'\u2018' in position 496: ordinal not in
[ ERROR ] range(128)
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO ] Stage: Clean up
[ INFO ] Cleaning temporary resources
[ INFO ] TASK [Gathering Facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Remove local vm dir]
[ INFO ] changed: [localhost]
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180129164431.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue,fix and redeploy
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180129160956-a7itm9.log
[root@ov42 ~]#

Is there any known bug for this?

In log file I have:

2018-01-29 16:44:28,159+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:173 [WARNING]: Failure using method (v2_runner_on_failed) in callback plugin
2018-01-29 16:44:28,160+0100 DEBUG otopi.plugins.otopi.dialog.human human.format:69 newline sent to logger
2018-01-29 16:44:28,160+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:173 (<ansible.plugins.callback.1_otopi_json.CallbackModule object at 0x2dd7d90>):
2018-01-29 16:44:28,160+0100 DEBUG otopi.plugins.otopi.dialog.human human.format:69 newline sent to logger
2018-01-29 16:44:28,160+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:173 'ascii' codec can't encode character u'\u2018' in position 496: ordinal not in
2018-01-29 16:44:28,161+0100 DEBUG otopi.plugins.otopi.dialog.human human.format:69 newline sent to logger
2018-01-29 16:44:28,161+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:173 range(128)
2018-01-29 16:44:28,161+0100 DEBUG otopi.plugins.otopi.dialog.human human.format:69 newline sent to logger
2018-01-29 16:44:28,161+0100 DEBUG otopi.context context._executeMethod:143 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 133, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-ansiblesetup/core/target_vm.py", line 193, in _closeup
    r = ah.run()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/ansible_utils.py", line 175, in run
    raise RuntimeError(_('Failed executing ansible-playbook'))
RuntimeError: Failed executing ansible-playbook
2018-01-29 16:44:28,162+0100 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Closing up': Failed executing ansible-playbook

I'm testing deploy of nested self hosted engine with HE on NFS.

Thanks,
Gianluca

On Mon, Jan 29, 2018 at 4:53 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Hello, at the end of the command hosted-engine --deploy
I get:

[ ERROR ] [WARNING]: Failure using method (v2_runner_on_failed) in callback plugin
[ ERROR ] (<ansible.plugins.callback.1_otopi_json.CallbackModule object at 0x2dd7d90>):
[ ERROR ] 'ascii' codec can't encode character u'\u2018' in position 496: ordinal not in range(128)
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
Is there any known bug for this?
Ciao Gianluca, we have an issue logging messages with special unicode chars coming from ansible; it's tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1533500 but this is just hiding your real issue. I'm almost sure that you are facing an issue writing on NFS, and then dd returns us an error message containing \u2018 and \u2019. Can you please check your NFS permissions?
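For reference, a minimal sketch of such a check, assuming the export is the /nfs/SHE_DOMAIN one discussed later in this thread and that vdsm runs as uid/gid 36:36; the server name and mount point below are placeholders:

# on the NFS server: show the options the export is actually using
exportfs -v | grep SHE_DOMAIN

# on the oVirt host: mount the export and try a write as the vdsm user (uid 36)
mkdir -p /mnt/hecheck
mount -t nfs nfsserver:/nfs/SHE_DOMAIN /mnt/hecheck
sudo -u vdsm touch /mnt/hecheck/write_probe
sudo -u vdsm dd if=/dev/zero of=/mnt/hecheck/write_probe bs=1M count=10
ls -ln /mnt/hecheck        # new files should end up owned by 36:36
rm -f /mnt/hecheck/write_probe
umount /mnt/hecheck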

On Wed, Jan 31, 2018 at 11:48 AM, Simone Tiraboschi <stirabos@redhat.com> wrote:
Ciao Gianluca, we have an issue logging messages with special unicode chars coming from ansible; it's tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1533500 but this is just hiding your real issue.
I'm almost sure that you are facing an issue writing on NFS, and then dd returns us an error message containing \u2018 and \u2019. Can you please check your NFS permissions?
Ciao Simone, thanks for answering. I think you were right. Previously I had this:

/nfs/SHE_DOMAIN *(rw)

Now I have changed it to:

/nfs/SHE_DOMAIN *(rw,anonuid=36,anongid=36,all_squash)

I restarted the deploy with the answer file:

# hosted-engine --deploy --config-append=/var/lib/ovirt-hosted-engine-setup/answers/answers-20180129164431.conf

and it went ahead, and I have contents inside the directory:

# ll /nfs/SHE_DOMAIN/a0351a82-734d-4d9a-a75e-3313d2ffe23a/
total 12
drwxr-xr-x. 2 vdsm kvm 4096 Jan 29 16:40 dom_md
drwxr-xr-x. 6 vdsm kvm 4096 Jan 29 16:43 images
drwxr-xr-x. 4 vdsm kvm 4096 Jan 29 16:40 master

But it ended with a problem regarding the engine VM:

[ INFO ] TASK [Wait for engine to start]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Set engine pub key as authorized key without validating the TLS/SSL certificates]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Force host-deploy in offline mode]
[ INFO ] changed: [localhost]
[ INFO ] TASK [include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Obtain SSO token using username/password credentials]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Add host]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Wait for the host to become non operational]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Get virbr0 routing configuration]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Get ovirtmgmt route table id]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Check network configuration]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Clean network configuration]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Restore network configuration]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Wait for the host to be up]
[ ERROR ] Error: Failed to read response.
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": false, "msg": "Failed to read response."}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO ] Stage: Clean up
[ INFO ] Cleaning temporary resources
[ INFO ] TASK [Gathering Facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Remove local vm dir]
[ INFO ] ok: [localhost]
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180201104600.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue,fix and redeploy
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180201102603-1of5a1.log

Under /var/log/libvirt/qemu on the host from which I'm running the hosted-engine deploy I see this:

2018-02-01 09:29:05.515+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.7 (CentOS BuildSystem <http://bugs.centos.org>, 2018-01-04-19:31:34, c1bm.rdu2.centos.org), qemu version: 2.9.0(qemu-kvm-ev-2.9.0-16.el7_4.13.1), hostname: ov42.mydomain
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=HostedEngineLocal,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-HostedEngineLocal/master-key.aes -machine pc-i440fx-rhel7.4.0,accel=kvm,usb=off,dump-guest-core=off -cpu Westmere,+kvmclock -m 6184 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 8c8f8163-5b69-4ff5-b67c-07b1a9b8f100 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-HostedEngineLocal/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot menu=off,strict=on -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/var/tmp/localvm1ClXud/images/918bbfc1-d599-4170-9a92-1ac417bf7658/bb8b3078-fddb-4ce3-8da0-0a191768a357,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/tmp/localvm1ClXud/seed.iso,format=raw,if=none,id=drive-ide0-0-0,readonly=on -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:15:7b:27,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-1-HostedEngineLocal/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc 127.0.0.1:0 -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 -msg timestamp=on
2018-02-01T09:29:05.771459Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/3 (label charserial0)
2018-02-01T09:34:19.445774Z qemu-kvm: terminating on signal 15 from pid 6052 (/usr/sbin/libvirtd)
2018-02-01 09:34:19.668+0000: shutting down, reason=shutdown

In /var/log/messages:

Feb 1 10:29:05 ov42 systemd-machined: New machine qemu-1-HostedEngineLocal.
Feb 1 10:29:05 ov42 systemd: Started Virtual Machine qemu-1-HostedEngineLocal.
Feb 1 10:29:05 ov42 systemd: Starting Virtual Machine qemu-1-HostedEngineLocal.
Feb 1 10:29:05 ov42 kvm: 1 guest now active
Feb 1 10:29:06 ov42 python: ansible-command Invoked with warn=True executable=None _uses_shell=True _raw_params=virsh -r net-dhcp-leases default | grep -i 00:16:3e:15:7b:27 | awk '{ print $5 }' | cut -f1 -d'/' removes=None creates=None chdir=None stdin=None
Feb 1 10:29:07 ov42 kernel: virbr0: port 2(vnet0) entered learning state
Feb 1 10:29:09 ov42 kernel: virbr0: port 2(vnet0) entered forwarding state
Feb 1 10:29:09 ov42 kernel: virbr0: topology change detected, propagating
Feb 1 10:29:09 ov42 NetworkManager[749]: <info> [1517477349.5180] device (virbr0): link connected
Feb 1 10:29:16 ov42 python: ansible-command Invoked with warn=True executable=None _uses_shell=True _raw_params=virsh -r net-dhcp-leases default | grep -i 00:16:3e:15:7b:27 | awk '{ print $5 }' | cut -f1 -d'/' removes=None creates=None chdir=None stdin=None
Feb 1 10:29:27 ov42 python: ansible-command Invoked with warn=True executable=None _uses_shell=True _raw_params=virsh -r net-dhcp-leases default | grep -i 00:16:3e:15:7b:27 | awk '{ print $5 }' | cut -f1 -d'/' removes=None creates=None chdir=None stdin=None
Feb 1 10:29:30 ov42 dnsmasq-dhcp[6322]: DHCPDISCOVER(virbr0) 00:16:3e:15:7b:27
Feb 1 10:29:30 ov42 dnsmasq-dhcp[6322]: DHCPOFFER(virbr0) 192.168.122.200 00:16:3e:15:7b:27
Feb 1 10:29:30 ov42 dnsmasq-dhcp[6322]: DHCPREQUEST(virbr0) 192.168.122.200 00:16:3e:15:7b:27
Feb 1 10:29:30 ov42 dnsmasq-dhcp[6322]: DHCPACK(virbr0) 192.168.122.200 00:16:3e:15:7b:27
.
.
.
Feb 1 10:34:00 ov42 systemd: Starting Virtualization daemon...
Feb 1 10:34:00 ov42 python: ansible-ovirt_hosts_facts Invoked with pattern=name=ov42.mydomain status=up fetch_nested=False nested_attributes=[] auth={'ca_file': None, 'url': 'https://ov42she.mydomain/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'GOK2wLFZ0PIs1GbXVQjNW-yBlUtZoGRa2I92NkCkm6lwdlQV-dUdP5EjInyGGN_zEVEHFKgR6nuZ-eIlfaM_lw', 'timeout': 0}
Feb 1 10:34:03 ov42 systemd: Started Virtualization daemon.
Feb 1 10:34:03 ov42 systemd: Reloading.
Feb 1 10:34:03 ov42 systemd: [/usr/lib/systemd/system/ip6tables.service:3] Failed to add dependency on syslog.target,iptables.service, ignoring: Invalid argument
Feb 1 10:34:03 ov42 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Feb 1 10:34:03 ov42 systemd: Starting Cockpit Web Service...
Feb 1 10:34:03 ov42 dnsmasq[6322]: read /etc/hosts - 4 addresses
Feb 1 10:34:03 ov42 dnsmasq[6322]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Feb 1 10:34:03 ov42 dnsmasq-dhcp[6322]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Feb 1 10:34:03 ov42 systemd: Started Cockpit Web Service.
Feb 1 10:34:03 ov42 cockpit-ws: Using certificate: /etc/cockpit/ws-certs.d/0-self-signed.cert
Feb 1 10:34:03 ov42 libvirtd: 2018-02-01 09:34:03.840+0000: 6076: info : libvirt version: 3.2.0, package: 14.el7_4.7 (CentOS BuildSystem <http://bugs.centos.org>, 2018-01-04-19:31:34, c1bm.rdu2.centos.org)
Feb 1 10:34:03 ov42 libvirtd: 2018-02-01 09:34:03.840+0000: 6076: info : hostname: ov42.mydomain
Feb 1 10:34:03 ov42 libvirtd: 2018-02-01 09:34:03.840+0000: 6076: error : virDirOpenInternal:2829 : cannot open directory '/var/tmp/localvm7I0SSJ/images/918bbfc1-d599-4170-9a92-1ac417bf7658': No such file or directory
Feb 1 10:34:03 ov42 libvirtd: 2018-02-01 09:34:03.841+0000: 6076: error : storageDriverAutostart:204 : internal error: Failed to autostart storage pool '918bbfc1-d599-4170-9a92-1ac417bf7658': cannot open directory '/var/tmp/localvm7I0SSJ/images/918bbfc1-d599-4170-9a92-1ac417bf7658': No such file or directory
Feb 1 10:34:03 ov42 libvirtd: 2018-02-01 09:34:03.841+0000: 6076: error : virDirOpenInternal:2829 : cannot open directory '/var/tmp/localvm7I0SSJ': No such file or directory
Feb 1 10:34:03 ov42 libvirtd: 2018-02-01 09:34:03.841+0000: 6076: error : storageDriverAutostart:204 : internal error: Failed to autostart storage pool 'localvm7I0SSJ': cannot open directory '/var/tmp/localvm7I0SSJ': No such file or directory
Feb 1 10:34:03 ov42 systemd: Stopping Suspend/Resume Running libvirt Guests...
Feb 1 10:34:04 ov42 libvirt-guests.sh: Running guests on qemu+tls://ov42.mydomain/system URI: HostedEngineLocal
Feb 1 10:34:04 ov42 libvirt-guests.sh: Shutting down guests on qemu+tls://ov42.mydomain/system URI...
Feb 1 10:34:04 ov42 libvirt-guests.sh: Starting shutdown on guest: HostedEngineLocal

If I understood correctly, it seems that libvirtd took charge of the IP assignment, using the default 192.168.122.x network, while my host and my engine should be on 10.4.4.x...?

Currently on the host, after the failed deploy, I have:

# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;     8000.000000000000       no
ovirtmgmt       8000.001a4a17015d       no              eth0
virbr0          8000.52540084b832       yes             virbr0-nic

BTW: on the host I have the network managed by NetworkManager. It is supported now in the upcoming 4.2.1, isn't it?

Gianluca
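Side note on the storageDriverAutostart errors in the /var/log/messages excerpt above: a failed run can leave libvirt storage pool definitions behind that point at an already-removed /var/tmp/localvm* directory. A sketch of how to spot and drop them (the pool names are just the ones from the log above; check with pool-dumpxml before undefining anything):

# list all libvirt storage pools, active and inactive
virsh pool-list --all

# inspect a suspicious pool before touching it
virsh pool-dumpxml localvm7I0SSJ

# deactivate it if it is still active, then drop the stale definition
virsh pool-destroy localvm7I0SSJ
virsh pool-undefine localvm7I0SSJ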

On Thu, Feb 1, 2018 at 11:31 AM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
[...]
Feb 1 10:34:03 ov42 systemd: Stopping Suspend/Resume Running libvirt Guests...
Feb 1 10:34:04 ov42 libvirt-guests.sh: Running guests on qemu+tls://ov42.mydomain/system URI: HostedEngineLocal
Feb 1 10:34:04 ov42 libvirt-guests.sh: Shutting down guests on qemu+tls://ov42.mydomain/system URI...
Feb 1 10:34:04 ov42 libvirt-guests.sh: Starting shutdown on guest: HostedEngineLocal
You definitely hit this one: https://bugzilla.redhat.com/show_bug.cgi?id=1539040 (host-deploy stops libvirt-guests, triggering a shutdown of all the running VMs, including the HE one). We rebuilt host-deploy with a fix for that today. It affects only hosts where libvirt-guests has already been configured by a 4.2 host-deploy in the past. As a workaround, you have to manually stop libvirt-guests and deconfigure it in /etc/sysconfig/libvirt-guests.conf before running hosted-engine-setup again.
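For anyone hitting this before the fixed host-deploy lands, a minimal sketch of that workaround (service name as in Simone's message; adjust to your setup):

# stop libvirt-guests so a host-deploy restart cannot shut down the local HE VM
systemctl stop libvirt-guests

# optionally keep it from starting again during the deployment
systemctl disable libvirt-guests

# then deconfigure it in the sysconfig file mentioned above and re-run:
hosted-engine --deploy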
If I understood correctly, it seems that libvirtd took charge of the IP assignment, using the default 192.168.122.x network, while my host and my engine should be on 10.4.4.x...?
This is absolutely fine. Let me explain: with the new ansible-based flow we completely reversed the hosted-engine deployment flow. In the past, hosted-engine-setup directly prepared the host, the storage, the network and a VM in advance via vdsm, and the user then had to wait for the engine to auto-import everything at the end, with a lot of possible issues in the middle. Now hosted-engine-setup, doing everything via ansible, bootstraps a local VM on local storage over the default NATted libvirt network (that's why you temporarily see that address) and deploys ovirt-engine there. Then hosted-engine-setup uses the engine running on that bootstrap local VM to set up everything else (storage, network, VM...) through the well-known and tested engine APIs. Only at the end does it migrate the disk of the local VM onto the disk created by the engine on the shared storage, and ovirt-ha-agent boots the engine VM from there as usual. More than that, at this point we don't need auto-import code on the engine side, since all the involved entities are already known by the engine because it created them.
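For reference, a quick way to watch the bootstrap VM pick up its temporary address on libvirt's default network while the deployment runs (essentially what the ansible-command entries in /var/log/messages above are doing; the VM name HostedEngineLocal comes from the logs in this thread):

# list libvirt networks and the DHCP leases handed out on the default one
virsh -r net-list --all
virsh -r net-dhcp-leases default

# or query the local bootstrap VM directly while it is running
virsh -r domifaddr HostedEngineLocal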
Currently on host, after the failed deploy, I have:
# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;     8000.000000000000       no
ovirtmgmt       8000.001a4a17015d       no              eth0
virbr0          8000.52540084b832       yes             virbr0-nic
BTW: on host I have network managed by NetworkManager. It is supported now in upcoming 4.2.1, isn't it?
Yes, it is.

On Thu, Feb 1, 2018 at 7:19 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
You definitely hit this one: https://bugzilla.redhat.com/show_bug.cgi?id=1539040 (host-deploy stops libvirt-guests, triggering a shutdown of all the running VMs, including the HE one)
We rebuilt host-deploy with a fix for that today. It affects only the host where libvirt-guests has already been configured by a 4.2 host-deploy in the past. As a workaround you have to manually stop libvirt-guests before and deconfigure it on /etc/sysconfig/libvirt-guests.conf before running hosted-engine-setup again.
Ok. This is a test env that I want to give to power users to get a feel for the new 4.2 GUI, so I decided to start from scratch, redeploying the OS of the host, and with the correct NFS permissions in place from the start everything went well at the first attempt. Now I have a reachability problem of the engine VM from outside, but that is a different problem and I'm going to open a new thread for it if I don't solve it.
If I understood correctly, it seems that libvirtd took charge of the IP assignment, using the default 192.168.122.x network, while my host and my engine should be on 10.4.4.x...?
This is absolutely fine. Let me explain: with the new ansible-based flow we completely reversed the hosted-engine deployment flow.
Thanks for the explanation of the new workflow. Indeed, during my change-and-try attempts I also tried to destroy and undefine the "default" libvirt network, and the deploy complained about it.
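For anyone who did remove it, a sketch of restoring libvirt's default network from the template the libvirt package normally ships on CentOS 7 (verify the XML path on your host first):

# re-define libvirt's default NATted network from the shipped template
virsh net-define /usr/share/libvirt/networks/default.xml
virsh net-autostart default
virsh net-start default

# confirm virbr0 and the 192.168.122.0/24 subnet are back
virsh net-list --all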
BTW: on host I have network managed by NetworkManager. It is supported now in upcoming 4.2.1, isn't it?
Yes, it is.
Ok. I confirm that in my new deploy I left NetworkManager enabled in the host configuration and everything went well.
participants (2)
- Gianluca Cecchi
- Simone Tiraboschi