On Tue, Oct 16, 2018 at 3:23 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Hello,
I'm sending a message with a dedicated subject on this topic (second attempt, because the first one does not seem to be present in the oVirt archive).
Also, the reply to my other related message does not seem to be visible in the list archive page for some reason.

It seems this nasty problem in nested virtualization with the pc-i440fx-rhel7.X.0 machine type (X >= 3) affects not only vSphere as the main hypervisor for nested KVM, but also other hypervisors (Hyper-V) and other machine types. If I understood correctly, it could be due to a bug in KVM, and therefore in the kernel.

 
In the meantime I'm trying a workaround to set up a 3-host HCI environment using 3 VMs inside ESXi.
My approach:
- cp -p /usr/libexec/qemu-kvm /usr/libexec/qemu-kvm.orig
- rm /usr/libexec/qemu-kvm
- create a new /usr/libexec/qemu-kvm that is a wrapper:
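#!/bin/bash
# Wrapper: drop any "-machine <value>" pair passed by the caller and
# force the pc-i440fx-rhel7.2.0 machine type instead.

args=()
while [ $# -gt 0 ]; do
    case "$1" in
    -machine)
        # skip the option and, if present, its value
        shift
        [ $# -gt 0 ] && shift
        ;;
    *)
        args+=("$1")
        shift ;;
    esac
done

exec /usr/libexec/qemu-kvm.orig -machine pc-i440fx-rhel7.2.0 "${args[@]}"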
- chmod 755 /usr/libexec/qemu-kvm
- chcon system_u:object_r:qemu_exec_t:s0 /usr/libexec/qemu-kvm
- chcon system_u:object_r:qemu_exec_t:s0 /usr/libexec/qemu-kvm.orig
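
To check the argument rewriting without actually launching QEMU, something like the following could be used. It is only a sketch: filter_machine_args is a hypothetical helper name, not part of the wrapper above, and it just prints (one per line) the arguments that the wrapper would hand to qemu-kvm.orig.

```shell
# Hypothetical dry-run helper: mirrors the wrapper's filtering logic,
# but prints the resulting argument list instead of exec'ing QEMU.
filter_machine_args() {
    # The forced machine type always comes first, as in the wrapper.
    printf '%s\n' -machine pc-i440fx-rhel7.2.0
    while [ $# -gt 0 ]; do
        case "$1" in
        -machine)
            # Drop the option and, if present, its value.
            shift
            [ $# -gt 0 ] && shift
            ;;
        *)
            printf '%s\n' "$1"
            shift
            ;;
        esac
    done
}

# Example call with made-up arguments:
filter_machine_args -name guest -machine pc-i440fx-rhel7.5.0,accel=kvm -m 500
```

Note that the whole value following -machine is discarded, so any sub-options the caller passed along with the machine type (for example an accel= setting) are lost as well. The short form -M, which QEMU also accepts, would not be caught by this filter.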

I then proceed with my setup from Cockpit.
Everything goes well (local hosted engine VM created from the appliance, engine-setup done, host addition done, storage domain for the engine done), but then the deployment reaches a step where guestfish comes into play and I get the error below.

Running ps just before guestfish fails, I see:
[root@ovirt01 ~]# ps -ef|grep guestf
root      28812  28807  5 16:55 pts/1    00:00:00 guestfish -a /var/tmp/localvmxmSf0U/images/65f7f081-4d9e-43ae-926f-25807f075f1d/a0a00e73-d3ea-4b9b-bd26-06fe189931f2 --rw -i copy-in /var/tmp/localvmxmSf0U/ifcfg-eth0 /etc/sysconfig/network-scripts : selinux-relabel /etc/selinux/targeted/contexts/files/file_contexts /etc/sysconfig/network-scripts/ifcfg-eth0 force:true
root      28833  28812 33 16:55 pts/1    00:00:00 /usr/libexec/qemu-kvm.orig -machine pc-i440fx-rhel7.2.0 -global virtio-blk-pci.scsi=off -nodefconfig -enable-fips -nodefaults -display none -cpu host -m 500 -no-reboot -rtc driftfix=slew -no-hpet -global kvm-pit.lost_tick_policy=discard -kernel /var/tmp/.guestfs-0/appliance.d/kernel -initrd /var/tmp/.guestfs-0/appliance.d/initrd -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -device virtio-scsi-pci,id=scsi -drive file=/var/tmp/localvmxmSf0U/images/65f7f081-4d9e-43ae-926f-25807f075f1d/a0a00e73-d3ea-4b9b-bd26-06fe189931f2,cache=writeback,id=hd0,if=none -device scsi-hd,drive=hd0 -drive file=/var/tmp/.guestfs-0/appliance.d/root,snapshot=on,id=appliance,cache=unsafe,if=none,format=raw -device scsi-hd,drive=appliance -device virtio-serial-pci -serial stdio -chardev socket,path=/tmp/libguestfsAdBLA9/guestfsd.sock,id=channel0 -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 -append panic=1 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=/dev/sdb selinux=0 quiet TERM=xterm-256color
root      28834  28812  0 16:55 pts/1    00:00:00 guestfish -a /var/tmp/localvmxmSf0U/images/65f7f081-4d9e-43ae-926f-25807f075f1d/a0a00e73-d3ea-4b9b-bd26-06fe189931f2 --rw -i copy-in /var/tmp/localvmxmSf0U/ifcfg-eth0 /etc/sysconfig/network-scripts : selinux-relabel /etc/selinux/targeted/contexts/files/file_contexts /etc/sysconfig/network-scripts/ifcfg-eth0 force:true

But then I get this in the GUI:

libguestfs: error: appliance closed the connection unexpectedly.\nThis usually means the libguestfs appliance crashed
Complete output
. . .
[ INFO ] TASK [Copy configuration files to the right location on host]
[ INFO ] TASK [Copy configuration archive to storage]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Initialize metadata volume]
[ INFO ] changed: [localhost]
[ INFO ] TASK [include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Find the local appliance image]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Set local_vm_disk_path]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Generate DHCP network configuration for the engine VM]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [Generate static network configuration for the engine VM]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Inject network configuration with guestfish]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["guestfish", "-a", "/var/tmp/localvmxmSf0U/images/65f7f081-4d9e-43ae-926f-25807f075f1d/a0a00e73-d3ea-4b9b-bd26-06fe189931f2", "--rw", "-i", "copy-in", "/var/tmp/localvmxmSf0U/ifcfg-eth0", "/etc/sysconfig/network-scripts", ":", "selinux-relabel", "/etc/selinux/targeted/contexts/files/file_contexts", "/etc/sysconfig/network-scripts/ifcfg-eth0", "force:true"], "delta": "0:00:01.821590", "end": "2018-10-16 16:55:12.044900", "msg": "non-zero return code", "rc": 1, "start": "2018-10-16 16:55:10.223310", "stderr": "libguestfs: error: appliance closed the connection unexpectedly.\nThis usually means the libguestfs appliance crashed.\nDo:\n export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1\nand run the command again. For further information, read:\n http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\nYou can also run 'libguestfs-test-tool' and post the *complete* output\ninto a bug report or message to the libguestfs mailing list.\nlibguestfs: error: /usr/libexec/qemu-kvm killed by signal 6 (Aborted).\nTo see full error messages you may need to enable debugging.\nDo:\n export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1\nand run the command again. For further information, read:\n http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\nYou can also run 'libguestfs-test-tool' and post the *complete* output\ninto a bug report or message to the libguestfs mailing list.\nlibguestfs: error: guestfs_launch failed.\nThis usually means the libguestfs appliance failed to start or crashed.\nDo:\n export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1\nand run the command again. 
For further information, read:\n http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\nYou can also run 'libguestfs-test-tool' and post the *complete* output\ninto a bug report or message to the libguestfs mailing list.", "stderr_lines": ["libguestfs: error: appliance closed the connection unexpectedly.", "This usually means the libguestfs appliance crashed.", "Do:", " export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1", "and run the command again. For further information, read:", " http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs", "You can also run 'libguestfs-test-tool' and post the *complete* output", "into a bug report or message to the libguestfs mailing list.", "libguestfs: error: /usr/libexec/qemu-kvm killed by signal 6 (Aborted).", "To see full error messages you may need to enable debugging.", "Do:", " export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1", "and run the command again. For further information, read:", " http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs", "You can also run 'libguestfs-test-tool' and post the *complete* output", "into a bug report or message to the libguestfs mailing list.", "libguestfs: error: guestfs_launch failed.", "This usually means the libguestfs appliance failed to start or crashed.", "Do:", " export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1", "and run the command again. For further information, read:", " http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs", "You can also run 'libguestfs-test-tool' and post the *complete* output", "into a bug report or message to the libguestfs mailing list."], "stdout": "", "stdout_lines": []}

Any hint on how to debug the guestfish problem? In particular, where should I put the suggested debug environment variables so that Cockpit picks them up, and how can I tell whether or not this is related to running nested inside ESXi?
The nodes are ovirt-ng-nodes based on ovirt-node-ng-4.2.6.1-0.20180913.0
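
For a manual reproduction on the host, outside of Cockpit, the two variables named in the error output could be set for a single command as sketched below. run_with_guestfs_debug is a hypothetical helper name and the log paths are only examples; LIBGUESTFS_DEBUG, LIBGUESTFS_TRACE and libguestfs-test-tool are the ones mentioned in the libguestfs error message itself.

```shell
# Hypothetical helper (not part of libguestfs): run a command with the
# debug variables from the error message set, and capture both stdout
# and stderr in a log file given as the first argument.
run_with_guestfs_debug() {
    log_file="$1"
    shift
    LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 "$@" > "$log_file" 2>&1
}

# Example invocations (commented out because they need libguestfs
# installed; the log file paths are arbitrary):
# run_with_guestfs_debug /var/tmp/guestfs-test-tool.log libguestfs-test-tool
# run_with_guestfs_debug /var/tmp/guestfish-run.log guestfish --version
```

This does not by itself make Cockpit pass the variables to the deployment task; it would only help to see whether the appliance also crashes when launched by hand on the same host, presumably going through the same /usr/libexec/qemu-kvm wrapper.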

Thanks,
Gianluca