
Hello,
I had a similar problem in the past with an ESXi 6.0 U2 host where I defined a VM and used it as an oVirt node for nested virtualization. Running an L2 VM on this "virtual" node went well as long as the custom emulated machine value was "pc-i440fx-rhel7.2.0". At some point the default became 7.3 and the L2 VM was no longer able to boot: it stayed at a blank screen. The same holds up to now in 4.2.6.1, where this default seems to be 7.5.

I'm having the same problem in a mini lab on a NUC6 where I have ESXi 6.7 installed and am trying to set up a hosted-engine environment. This problem prevents me from configuring the hosted engine, because the L2 VM that should become the hosted engine freezes (?) during its first startup, and I see indeed that the qemu-kvm process is started with the option "-machine pc-i440fx-rhel7.5.0".

Granted that I'm in an unsupported configuration, I would like to understand whether I can correct something and proceed. So some questions:

1) How can I understand the difference between running qemu-kvm with "-machine pc-i440fx-rhel7.2.0" and "-machine pc-i440fx-rhel7.5.0", so that I can then dig into ESXi parameters and possibly tune the settings of the VM that becomes my oVirt hypervisor?

2) Do I have any chance to configure my oVirt node so that, when it runs the hosted-engine deploy, it starts the engine VM with the 7.2 machine type? Is there any conf file on the host where I can somehow force 7.2 compatibility or set it as the default?

3) During a Gluster-based hosted-engine deployment, at the first "local" startup, how can I connect to the console of the hosted-engine VM and see where it is blocked?

Thanks in advance,
Gianluca
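[Editor's note] For question 1, a first step before comparing the two machine types is simply confirming which one the engine VM actually got, by pulling the "-machine" value out of the running qemu-kvm command line. A minimal sketch (the machine_of helper and the sample command line are illustrative, not from oVirt):

```shell
#!/bin/sh
# Sketch: extract the -machine value from a qemu-kvm command line.
# The machine_of helper and the sample command line are illustrative.

machine_of() {
    # Split the command line into words and print the word after "-machine".
    printf '%s\n' "$1" | tr ' ' '\n' | grep -A1 '^-machine$' | tail -n1
}

# On a real host you would feed it the live process command line, e.g.:
#   machine_of "$(ps -o cmd= -C qemu-kvm | head -n1)"
machine_of "/usr/libexec/qemu-kvm -name HostedEngine -machine pc-i440fx-rhel7.5.0 -m 4096"
# → pc-i440fx-rhel7.5.0
```

To see which machine types a given qemu-kvm build supports at all, `/usr/libexec/qemu-kvm -machine help` lists them. For question 3, `hosted-engine --console` should attach to the engine VM's console once the VM exists; and for question 2, my understanding (an assumption, not verified here) is that the hosted-engine VM definition (vm.conf) carries an emulatedMachine value that could in principle be edited.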

On Fri, Sep 28, 2018 at 12:06 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Hello,
it seems this nasty problem in nested virt, using pc-i440fx-rhel7.X.0 machine types with X >= 3, impacts not only vSphere as the main hypervisor for nested KVM, but other hypervisors too (Hyper-V), and other machine types too, and could be due to a bug in KVM, so in the kernel, if I understood correctly.

According to this link below
https://bugs.launchpad.net/qemu/+bug/1636217
and its comment by Roman Kagan in June this year:

"
This is a KVM bug. It has been fixed in mainstream Linux in

commit d391f1207067268261add0485f0f34503539c5b0
Author: Vitaly Kuznetsov <email address hidden>
Date:   Thu Jan 25 16:37:07 2018 +0100

    x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested

    I was investigating an issue with seabios >= 1.10 which stopped working
    for nested KVM on Hyper-V. The problem appears to be in
    handle_ept_violation() function: when we do fast mmio we need to skip
    the instruction so we do kvm_skip_emulated_instruction(). This, however,
    depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
    However, this is not the case.

    Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
    EPT MISCONFIG occurs. While on real hardware it was observed to be set,
    some hypervisors follow the spec and don't set it; we end up advancing
    IP with some random value.

    I checked with Microsoft and they confirmed they don't fill
    VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.

    Fix the issue by doing instruction skip through emulator when running
    nested.

    Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae
    Suggested-by: Radim Krčmář <email address hidden>
    Suggested-by: Paolo Bonzini <email address hidden>
    Signed-off-by: Vitaly Kuznetsov <email address hidden>
    Acked-by: Michael S. Tsirkin <email address hidden>
    Signed-off-by: Radim Krčmář <email address hidden>

Although the commit mentions Hyper-V as L0 hypervisor, the same problem pertains to ESXi. The commit is included in v4.16.
"

Is it possible to backport the fix to the kernel provided by plain RHEL/CentOS hosts and/or RHVH/ovirt-ng nodes?

Thanks,
Gianluca