[ovirt-users] ovirt 4.2.1 pre hosted engine deploy failure

Simone Tiraboschi stirabos at redhat.com
Thu Feb 1 18:19:01 UTC 2018


On Thu, Feb 1, 2018 at 11:31 AM, Gianluca Cecchi <gianluca.cecchi at gmail.com>
wrote:

> On Wed, Jan 31, 2018 at 11:48 AM, Simone Tiraboschi <stirabos at redhat.com>
> wrote:
>
>>
>>
>> Ciao Gianluca,
>> we have an issue logging messages with special unicode chars from
>> ansible; it's tracked here:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1533500
>> but this is just hiding your real issue.
>>
>> I'm almost sure that you are facing an issue writing on NFS and then dd
>> returns us an error message with \u2018 and \u2019.
>> Can you please check your NFS permissions?
>>
>>
>
> Ciao Simone, thanks for answering.
> I think you were right.
> Previously I had this:
>
> /nfs/SHE_DOMAIN *(rw)
>
> Now I have changed to:
>
> /nfs/SHE_DOMAIN *(rw,anonuid=36,anongid=36,all_squash)
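>
> (For completeness, a rough sketch of how the change can be applied and
> verified on the NFS server; the chown is only needed if the directory
> was not already owned by vdsm:kvm, i.e. 36:36:
>
> # chown 36:36 /nfs/SHE_DOMAIN
> # exportfs -ra
> # exportfs -v
>
> exportfs -ra re-reads /etc/exports, and exportfs -v shows the active
> export options, so one can confirm that all_squash/anonuid took effect.)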
>
> I restarted the deploy with the answer file
>
> # hosted-engine --deploy --config-append=/var/lib/ovirt-hosted-engine-setup/answers/answers-20180129164431.conf
>
> and it went ahead... and I have contents inside the directory:
>
> # ll /nfs/SHE_DOMAIN/a0351a82-734d-4d9a-a75e-3313d2ffe23a/
> total 12
> drwxr-xr-x. 2 vdsm kvm 4096 Jan 29 16:40 dom_md
> drwxr-xr-x. 6 vdsm kvm 4096 Jan 29 16:43 images
> drwxr-xr-x. 4 vdsm kvm 4096 Jan 29 16:40 master
>
> But it ended with a problem regarding engine vm:
>
> [ INFO  ] TASK [Wait for engine to start]
> [ INFO  ] ok: [localhost]
> [ INFO  ] TASK [Set engine pub key as authorized key without validating
> the TLS/SSL certificates]
> [ INFO  ] changed: [localhost]
> [ INFO  ] TASK [Force host-deploy in offline mode]
> [ INFO  ] changed: [localhost]
> [ INFO  ] TASK [include_tasks]
> [ INFO  ] ok: [localhost]
> [ INFO  ] TASK [Obtain SSO token using username/password credentials]
> [ INFO  ] ok: [localhost]
> [ INFO  ] TASK [Add host]
> [ INFO  ] changed: [localhost]
> [ INFO  ] TASK [Wait for the host to become non operational]
> [ INFO  ] ok: [localhost]
> [ INFO  ] TASK [Get virbr0 routing configuration]
> [ INFO  ] changed: [localhost]
> [ INFO  ] TASK [Get ovirtmgmt route table id]
> [ INFO  ] changed: [localhost]
> [ INFO  ] TASK [Check network configuration]
> [ INFO  ] changed: [localhost]
> [ INFO  ] TASK [Clean network configuration]
> [ INFO  ] changed: [localhost]
> [ INFO  ] TASK [Restore network configuration]
> [ INFO  ] changed: [localhost]
> [ INFO  ] TASK [Wait for the host to be up]
> [ ERROR ] Error: Failed to read response.
> [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed":
> false, "msg": "Failed to read response."}
> [ ERROR ] Failed to execute stage 'Closing up': Failed executing
> ansible-playbook
> [ INFO  ] Stage: Clean up
> [ INFO  ] Cleaning temporary resources
> [ INFO  ] TASK [Gathering Facts]
> [ INFO  ] ok: [localhost]
> [ INFO  ] TASK [Remove local vm dir]
> [ INFO  ] ok: [localhost]
> [ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-
> setup/answers/answers-20180201104600.conf'
> [ INFO  ] Stage: Pre-termination
> [ INFO  ] Stage: Termination
> [ ERROR ] Hosted Engine deployment failed: this system is not reliable,
> please check the issue,fix and redeploy
>           Log file is located at /var/log/ovirt-hosted-engine-
> setup/ovirt-hosted-engine-setup-20180201102603-1of5a1.log
>
> Under /var/log/libvirt/qemu of the host from where I'm running the
> hosted-engine deploy I see this:
>
>
> 2018-02-01 09:29:05.515+0000: starting up libvirt version: 3.2.0, package:
> 14.el7_4.7 (CentOS BuildSystem <http://bugs.centos.org>,
> 2018-01-04-19:31:34, c1bm.rdu2.centos.org), qemu version:
> 2.9.0(qemu-kvm-ev-2.9.0-16.el7_4.13.1), hostname: ov42.mydomain
> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
> QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name
> guest=HostedEngineLocal,debug-threads=on -S -object
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-
> HostedEngineLocal/master-key.aes -machine pc-i440fx-rhel7.4.0,accel=kvm,usb=off,dump-guest-core=off
> -cpu Westmere,+kvmclock -m 6184 -realtime mlock=off -smp
> 1,sockets=1,cores=1,threads=1 -uuid 8c8f8163-5b69-4ff5-b67c-07b1a9b8f100
> -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/
> var/lib/libvirt/qemu/domain-1-HostedEngineLocal/monitor.sock,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
> -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1
> -boot menu=off,strict=on -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4
> -drive file=/var/tmp/localvm1ClXud/images/918bbfc1-d599-4170-
> 9a92-1ac417bf7658/bb8b3078-fddb-4ce3-8da0-0a191768a357,
> format=qcow2,if=none,id=drive-virtio-disk0 -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-
> virtio-disk0,id=virtio-disk0,bootindex=1 -drive
> file=/var/tmp/localvm1ClXud/seed.iso,format=raw,if=none,id=drive-ide0-0-0,readonly=on
> -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev
> tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=
> hostnet0,id=net0,mac=00:16:3e:15:7b:27,bus=pci.0,addr=0x3 -chardev
> pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
> -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/
> target/domain-1-HostedEngineLocal/org.qemu.guest_agent.0,server,nowait
> -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=
> charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc 127.0.0.1:0
> -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -object
> rng-random,id=objrng0,filename=/dev/random -device
> virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 -msg timestamp=on
> 2018-02-01T09:29:05.771459Z qemu-kvm: -chardev pty,id=charserial0: char
> device redirected to /dev/pts/3 (label charserial0)
> 2018-02-01T09:34:19.445774Z qemu-kvm: terminating on signal 15 from pid
> 6052 (/usr/sbin/libvirtd)
> 2018-02-01 09:34:19.668+0000: shutting down, reason=shutdown
>
> In /var/log/messages:
>
> Feb  1 10:29:05 ov42 systemd-machined: New machine
> qemu-1-HostedEngineLocal.
> Feb  1 10:29:05 ov42 systemd: Started Virtual Machine
> qemu-1-HostedEngineLocal.
> Feb  1 10:29:05 ov42 systemd: Starting Virtual Machine
> qemu-1-HostedEngineLocal.
> Feb  1 10:29:05 ov42 kvm: 1 guest now active
> Feb  1 10:29:06 ov42 python: ansible-command Invoked with warn=True
> executable=None _uses_shell=True
>  _raw_params=virsh -r net-dhcp-leases default | grep -i 00:16:3e:15:7b:27
> | awk '{ print $5 }' | cut
>  -f1 -d'/' removes=None creates=None chdir=None stdin=None
> Feb  1 10:29:07 ov42 kernel: virbr0: port 2(vnet0) entered learning state
> Feb  1 10:29:09 ov42 kernel: virbr0: port 2(vnet0) entered forwarding state
> Feb  1 10:29:09 ov42 kernel: virbr0: topology change detected, propagating
> Feb  1 10:29:09 ov42 NetworkManager[749]: <info>  [1517477349.5180] device
> (virbr0): link connected
> Feb  1 10:29:16 ov42 python: ansible-command Invoked with warn=True
> executable=None _uses_shell=True
>  _raw_params=virsh -r net-dhcp-leases default | grep -i 00:16:3e:15:7b:27
> | awk '{ print $5 }' | cut
>  -f1 -d'/' removes=None creates=None chdir=None stdin=None
> Feb  1 10:29:27 ov42 python: ansible-command Invoked with warn=True
> executable=None _uses_shell=True
>  _raw_params=virsh -r net-dhcp-leases default | grep -i 00:16:3e:15:7b:27
> | awk '{ print $5 }' | cut
>  -f1 -d'/' removes=None creates=None chdir=None stdin=None
> Feb  1 10:29:30 ov42 dnsmasq-dhcp[6322]: DHCPDISCOVER(virbr0)
> 00:16:3e:15:7b:27
> Feb  1 10:29:30 ov42 dnsmasq-dhcp[6322]: DHCPOFFER(virbr0) 192.168.122.200
> 00:16:3e:15:7b:27
> Feb  1 10:29:30 ov42 dnsmasq-dhcp[6322]: DHCPREQUEST(virbr0)
> 192.168.122.200 00:16:3e:15:7b:27
> Feb  1 10:29:30 ov42 dnsmasq-dhcp[6322]: DHCPACK(virbr0) 192.168.122.200
> 00:16:3e:15:7b:27
> . . .
> Feb  1 10:34:00 ov42 systemd: Starting Virtualization daemon...
> Feb  1 10:34:00 ov42 python: ansible-ovirt_hosts_facts Invoked with
> pattern=name=ov42.mydomain status=up fetch_nested=False
> nested_attributes=[] auth={'ca_file': None, 'url': '
> https://ov42she.mydomain/ovirt-engine/api', 'insecure': True, 'kerberos':
> False, 'compress': True, 'headers': None, 'token': 'GOK2wLFZ0PIs1GbXVQjNW-
> yBlUtZoGRa2I92NkCkm6lwdlQV-dUdP5EjInyGGN_zEVEHFKgR6nuZ-eIlfaM_lw',
> 'timeout': 0}
> Feb  1 10:34:03 ov42 systemd: Started Virtualization daemon.
> Feb  1 10:34:03 ov42 systemd: Reloading.
> Feb  1 10:34:03 ov42 systemd: [/usr/lib/systemd/system/ip6tables.service:3]
> Failed to add dependency on syslog.target,iptables.service, ignoring:
> Invalid argument
> Feb  1 10:34:03 ov42 systemd: Cannot add dependency job for unit
> lvm2-lvmetad.socket, ignoring: Unit is masked.
> Feb  1 10:34:03 ov42 systemd: Starting Cockpit Web Service...
> Feb  1 10:34:03 ov42 dnsmasq[6322]: read /etc/hosts - 4 addresses
> Feb  1 10:34:03 ov42 dnsmasq[6322]: read /var/lib/libvirt/dnsmasq/default.addnhosts
> - 0 addresses
> Feb  1 10:34:03 ov42 dnsmasq-dhcp[6322]: read /var/lib/libvirt/dnsmasq/
> default.hostsfile
> Feb  1 10:34:03 ov42 systemd: Started Cockpit Web Service.
> Feb  1 10:34:03 ov42 cockpit-ws: Using certificate:
> /etc/cockpit/ws-certs.d/0-self-signed.cert
> Feb  1 10:34:03 ov42 libvirtd: 2018-02-01 09:34:03.840+0000: 6076: info :
> libvirt version: 3.2.0, package: 14.el7_4.7 (CentOS BuildSystem <
> http://bugs.centos.org>, 2018-01-04-19:31:34, c1bm.rdu2.centos.org)
> Feb  1 10:34:03 ov42 libvirtd: 2018-02-01 09:34:03.840+0000: 6076: info :
> hostname: ov42.mydomain
> Feb  1 10:34:03 ov42 libvirtd: 2018-02-01 09:34:03.840+0000: 6076: error :
> virDirOpenInternal:2829 : cannot open directory '/var/tmp/localvm7I0SSJ/
> images/918bbfc1-d599-4170-9a92-1ac417bf7658': No such file or directory
> Feb  1 10:34:03 ov42 libvirtd: 2018-02-01 09:34:03.841+0000: 6076: error :
> storageDriverAutostart:204 : internal error: Failed to autostart storage
> pool '918bbfc1-d599-4170-9a92-1ac417bf7658': cannot open directory
> '/var/tmp/localvm7I0SSJ/images/918bbfc1-d599-4170-9a92-1ac417bf7658': No
> such file or directory
> Feb  1 10:34:03 ov42 libvirtd: 2018-02-01 09:34:03.841+0000: 6076: error :
> virDirOpenInternal:2829 : cannot open directory '/var/tmp/localvm7I0SSJ':
> No such file or directory
> Feb  1 10:34:03 ov42 libvirtd: 2018-02-01 09:34:03.841+0000: 6076: error :
> storageDriverAutostart:204 : internal error: Failed to autostart storage
> pool 'localvm7I0SSJ': cannot open directory '/var/tmp/localvm7I0SSJ': No
> such file or directory
> Feb  1 10:34:03 ov42 systemd: Stopping Suspend/Resume Running libvirt
> Guests...
> Feb  1 10:34:04 ov42 libvirt-guests.sh: Running guests on
> qemu+tls://ov42.mydomain/system URI: HostedEngineLocal
> Feb  1 10:34:04 ov42 libvirt-guests.sh: Shutting down guests on
> qemu+tls://ov42.mydomain/system URI...
> Feb  1 10:34:04 ov42 libvirt-guests.sh: Starting shutdown on guest:
> HostedEngineLocal
>

You definitely hit this one:
https://bugzilla.redhat.com/show_bug.cgi?id=1539040
host-deploy stops libvirt-guests, triggering a shutdown of all the running
VMs (including the HE one).

We rebuilt host-deploy with a fix for that today.
It affects only hosts where libvirt-guests has already been configured by a
4.2 host-deploy in the past.
As a workaround, you have to manually stop libvirt-guests and deconfigure
it in /etc/sysconfig/libvirt-guests.conf before running hosted-engine-setup
again.
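
Something along these lines should do it (just a sketch; please double-check
the exact lines that host-deploy added on your host before editing):

# systemctl stop libvirt-guests
# systemctl disable libvirt-guests

and then comment out (or revert) the ON_BOOT/ON_SHUTDOWN entries that
host-deploy wrote into /etc/sysconfig/libvirt-guests.conf, so that the
service does not shut the VMs down again on the next attempt.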


> If I understood correctly, it seems that libvirtd took charge of the IP
> assignment, using the default 192.168.122.x network, while my host and my
> engine should be on 10.4.4.x...??
>

This is absolutely fine.
Let me explain: with the new ansible-based flow we completely reworked the
hosted-engine deployment flow.
In the past hosted-engine-setup directly prepared the host, the storage, the
network and a VM in advance via vdsm, and the user had to wait for the
engine at the end to auto-import everything, with a lot of possible issues
in the middle.

Now hosted-engine-setup, doing everything via ansible, bootstraps a local
VM on local storage over the default NATted libvirt network (that's why you
temporarily see that address) and deploys ovirt-engine there.
Then hosted-engine-setup uses the engine running on that bootstrap local VM
to set up everything else (storage, network, vm...) using the well-known and
tested engine APIs.
Only at the end does it migrate the disk of the local VM over the disk
created by the engine on the shared storage, and ovirt-ha-agent will boot
the engine VM from there as usual.
More than that, at this point we don't need auto-import code on the engine
side, since all the involved entities are already known to the engine: it
created them itself.
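
If you want to watch the bootstrap VM while the deployment runs, its
temporary address on the default libvirt network can be checked with
something like:

# virsh -r net-dhcp-leases default

which is exactly the command you can see ansible invoking in your
/var/log/messages excerpt above; once the flow completes, the local
bootstrap VM (and its lease) goes away.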



> Currently on host, after the failed deploy, I have:
>
> # brctl show
> bridge name bridge id STP enabled interfaces
> ;vdsmdummy; 8000.000000000000 no
> ovirtmgmt 8000.001a4a17015d no eth0
> virbr0 8000.52540084b832 yes virbr0-nic
>
> BTW: on the host the network is managed by NetworkManager. That is
> supported now in the upcoming 4.2.1, isn't it?
>

Yes, it is.


>
> Gianluca
>