[lago-devel] [ovirt-devel] OST: vm_run fails for me (basic-suite-master)

Michal Skrivanek mskrivan at redhat.com
Fri Feb 10 10:16:58 UTC 2017


> On 10 Feb 2017, at 10:26, Michal Skrivanek <mskrivan at redhat.com> wrote:
> 
> 
>> On 9 Feb 2017, at 16:16, Ondrej Svoboda <osvoboda at redhat.com> wrote:
>> 
>> Do you mean https://github.com/lago-project/lago/pull/398 which has been merged for over a month?
>> 
>> The second sentence in the PR (below) is contradicted by newer, non-recognized CPUs, such as Skylake.
> 
> How/why? Westmere should have been selected in that case

And if it was and didn’t work for you then it is a nested virtualization compatibility bug you should report to QEMU/KVM folks.
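
Before filing such a report it may help to record the two prerequisites already discussed in this thread: the 'nested' module parameter and the vmx/svm CPU flag. A minimal sketch, assuming Python 3 and the standard /sys and /proc text formats; nested_enabled and virt_flags are hypothetical helper names, not part of Lago or OST:

```python
def nested_enabled(param_text):
    """Interpret the contents of /sys/module/<kvm module>/parameters/nested.

    Assumption: the value is 'Y'/'N' or '1'/'0' depending on module/kernel.
    """
    return param_text.strip() in ('Y', 'y', '1')


def virt_flags(cpuinfo_text):
    """Return which of 'vmx'/'svm' appear in the first 'flags' line
    of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(':')
        if key.strip() == 'flags':
            return {'vmx', 'svm'} & set(value.split())
    return set()
```

On a host or a Lago VM you would feed these the contents of /sys/module/kvm_intel/parameters/nested and /proc/cpuinfo respectively, and attach the results to the bug report.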

> 
>> 
>> "This patch fixes the problems by selecting a minimum reasonable CPU model for the given hardware platform. Westmere is selected unless older or non-Intel hardware is used."
>> 
>> On Thu, Feb 9, 2017 at 4:07 PM, Michal Skrivanek <mskrivan at redhat.com> wrote:
>> What happened to Milan's PR from a while ago addressing this exact situation?
>> 
>> On 08 Feb 2017, at 16:04, Ondrej Svoboda <osvoboda at redhat.com> wrote:
>> 
>>> In my case, simply adding Skylake-Client as a supported CPU family did the trick: https://github.com/lago-project/lago/pull/448
>>> 
>>> I wonder if Westmere is a good fallback -- it works for you on Broadwell, right?
>>> 
>>> On Wed, Feb 8, 2017 at 1:58 PM, Nadav Goldin <ngoldin at redhat.com> wrote:
>>> I would first try testing it without OST, because in OST it will pick
>>> the CPU via the cluster family (which is controlled in virt.py). You
>>> can try specifying the 'cpu_model' in the init file, skipping the 'cpu
>>> family' logic, something like:
>>> 
>>> > cat LagoInitFile
>>> domains:
>>>   vm-el73:
>>>     memory: 2048
>>>     service_provider: systemd
>>>     cpu_model: Broadwell
>>>     nics:
>>>       - net: lago
>>>     disks:
>>>       - template_name: el7.3-base
>>>         type: template
>>>         name: root
>>>         dev: vda
>>>         format: qcow2
>>> nets:
>>>   lago:
>>>     type: nat
>>>     dhcp:
>>>       start: 100
>>>       end: 254
>>>     management: true
>>>     dns_domain_name: lago.local
>>> 
>>> > lago init && lago start
>>> 
>>> Then install lago again in the VM, copy the same init file, and check
>>> whether it works for different values of cpu_model. That would give
>>> us a hint about how to solve this. The 'cpu_model' basically translates
>>> to this XML definition in libvirt:
>>>   <cpu mode='custom' match='exact'>
>>>     <model fallback='allow'>Broadwell</model>
>>>     <topology sockets='2' cores='1' threads='1'/>
>>>     <feature policy='optional' name='vmx'/>
>>>     <feature policy='optional' name='svm'/>
>>>   </cpu>
>>> 
>>> I also tried manually changing it to host-passthrough, but it still
>>> failed with the same error. The problem is that the 'kvm_put_msrs:
>>> Assertion `ret == n' failed.' error gives no indication of where it
>>> failed (or whether the CPU is missing a flag). Maybe there is a way to
>>> debug this at the qemu/kvm level? I'm not sure.
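
For trying several cpu_model values, the <cpu> element quoted above can be generated instead of edited by hand. A minimal sketch, assuming Python 3.8+ and only the standard library; build_cpu_xml is a hypothetical helper, not Lago's actual code, and its defaults simply mirror the XML quoted above:

```python
import xml.etree.ElementTree as ET


def build_cpu_xml(model, sockets=2, cores=1, threads=1):
    """Build a libvirt <cpu> element like the one quoted above.

    Hypothetical helper for experimentation; not taken from lago/virt.py.
    """
    cpu = ET.Element('cpu', mode='custom', match='exact')
    model_el = ET.SubElement(cpu, 'model', fallback='allow')
    model_el.text = model
    ET.SubElement(
        cpu, 'topology',
        sockets=str(sockets), cores=str(cores), threads=str(threads),
    )
    # Both virtualization flags are optional, so the same definition
    # can be used on Intel (vmx) and AMD (svm) hosts.
    for flag in ('vmx', 'svm'):
        ET.SubElement(cpu, 'feature', policy='optional', name=flag)
    return ET.tostring(cpu, encoding='unicode')


if __name__ == '__main__':
    for candidate in ('Westmere', 'Broadwell'):
        print(build_cpu_xml(candidate))
```

The output can be pasted into the domain XML (for example via 'virsh edit') to compare models one at a time.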
>>> 
>>> On Wed, Feb 8, 2017 at 1:18 PM, Ondrej Svoboda <osvoboda at redhat.com> wrote:
>>> > It is a Skylake-H, and I can see it is not mentioned in lago/virt.py.
>>> >
>>> > I guess I'll step through the code (as well as other places discovered by
>>> > 'git grep cpu') and see if I could solve this by adding the Skylake family
>>> > to _CPU_FAMILIES.
>>> >
>>> > Do you have other pointers?
>>> >
>>> > Thanks,
>>> > Ondra
>>> >
>>> > On Tue, Feb 7, 2017 at 10:40 PM, Nadav Goldin <ngoldin at redhat.com> wrote:
>>> >>
>>> >> What is the host CPU you are using?
>>> >> I came across the same error a few days ago, but without running OST. I
>>> >> tried running with Lago:
>>> >> fc24 host -> el7 vm -> el7 vm.
>>> >>
>>> >> I have a slight suspicion that it is related to the CPU model we
>>> >> configure in libvirt. I tried a few combinations (host-passthrough,
>>> >> pinning down the CPU model), but it always failed with the same error:
>>> >> kvm_put_msrs: Assertion `ret == n' failed.
>>> >>
>>> >> My CPU is Broadwell btw.
>>> >>
>>> >>
>>> >> Milan, any ideas? Do you think it might be related?
>>> >>
>>> >> Nadav.
>>> >>
>>> >>
>>> >>
>>> >> On Tue, Feb 7, 2017 at 11:14 PM, Ondrej Svoboda <osvoboda at redhat.com> wrote:
>>> >> > Yes, I stated that in my message.
>>> >> >
>>> >> > root at osvoboda-t460p /home/src/ovirt-system-tests (git)-[master] # cat
>>> >> > /sys/module/kvm_intel/parameters/nested
>>> >> > :(
>>> >> > Y
>>> >> >
>>> >> > On Tue, Feb 7, 2017 at 1:39 PM, Eyal Edri <eedri at redhat.com> wrote:
>>> >> >>
>>> >> >> Did you follow the instructions on [1] ?
>>> >> >>
>>> >> >> Specifically, verifying that 'cat /sys/module/kvm_intel/parameters/nested'
>>> >> >> gives you 'Y'.
>>> >> >>
>>> >> >> [1]
>>> >> >>
>>> >> >> http://ovirt-system-tests.readthedocs.io/en/latest/docs/general/installation.html
>>> >> >>
>>> >> >> On Tue, Feb 7, 2017 at 2:29 PM, Ondrej Svoboda <osvoboda at redhat.com> wrote:
>>> >> >>>
>>> >> >>> Hi everyone,
>>> >> >>>
>>> >> >>> Even though I have nested virtualization enabled in my Arch Linux
>>> >> >>> system which I use to run OST, vm_run is the first test to fail in
>>> >> >>> 004_basic_sanity (followed by snapshots_merge and suspend_resume_vm).
>>> >> >>>
>>> >> >>> Can you point me to what I might be missing? I believe I get the same
>>> >> >>> failure even on Fedora.
>>> >> >>>
>>> >> >>> This is what host0's CPU capabilities look like (vmx is there):
>>> >> >>> [root at lago-basic-suite-master-host0 ~]# cat /proc/cpuinfo
>>> >> >>> processor    : 0
>>> >> >>> vendor_id    : GenuineIntel
>>> >> >>> cpu family    : 6
>>> >> >>> model        : 44
>>> >> >>> model name    : Westmere E56xx/L56xx/X56xx (Nehalem-C)
>>> >> >>> stepping    : 1
>>> >> >>> microcode    : 0x1
>>> >> >>> cpu MHz        : 2711.988
>>> >> >>> cache size    : 16384 KB
>>> >> >>> physical id    : 0
>>> >> >>> siblings    : 1
>>> >> >>> core id        : 0
>>> >> >>> cpu cores    : 1
>>> >> >>> apicid        : 0
>>> >> >>> initial apicid    : 0
>>> >> >>> fpu        : yes
>>> >> >>> fpu_exception    : yes
>>> >> >>> cpuid level    : 11
>>> >> >>> wp        : yes
>>> >> >>> flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm constant_tsc rep_good nopl xtopology pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm arat tpr_shadow vnmi flexpriority ept vpid
>>> >> >>> bogomips    : 5423.97
>>> >> >>> clflush size    : 64
>>> >> >>> cache_alignment    : 64
>>> >> >>> address sizes    : 40 bits physical, 48 bits virtual
>>> >> >>> power management:
>>> >> >>>
>>> >> >>> journalctl -b on host0 shows that libvirt complains about NUMA configuration:
>>> >> >>>
>>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: libvirt version: 2.0.0, package: 10.el7_3.4 (CentOS BuildSystem <http://bugs.centos.org>, 2017-01-17-23:37:48, c1bm.rdu2.centos.org)
>>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: ovirtmgmt: port 2(vnet0) entered disabled state
>>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: device vnet0 left promiscuous mode
>>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: ovirtmgmt: port 2(vnet0) entered disabled state
>>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: hostname: lago-basic-suite-master-host0.lago.local
>>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: Unable to read from monitor: Connection reset by peer
>>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: internal error: qemu unexpectedly closed the monitor: 2017-02-07T11:33:23.058571Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
>>> >> >>>
>>> >> >>> 2017-02-07T11:33:23.058826Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config
>>> >> >>>
>>> >> >>> qemu-kvm: /builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c:1736: kvm_put_msrs: Assertion `ret == n' failed.
>>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 NetworkManager[657]: <info> [1486467203.1025] device (vnet0): state change: disconnected -> unmanaged (reason 'unmanaged') [30 10 3]
>>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 kvm[22059]: 0 guests now active
>>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 systemd-machined[22044]: Machine qemu-1-vm0 terminated.
>>> >> >>>
>>> >> >>> Thanks,
>>> >> >>> Ondra
>>> >> >>>
>>> >> >>> _______________________________________________
>>> >> >>> Devel mailing list
>>> >> >>> Devel at ovirt.org
>>> >> >>> http://lists.ovirt.org/mailman/listinfo/devel
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> Eyal Edri
>>> >> >> Associate Manager
>>> >> >> RHV DevOps
>>> >> >> EMEA ENG Virtualization R&D
>>> >> >> Red Hat Israel
>>> >> >>
>>> >> >> phone: +972-9-7692018
>>> >> >> irc: eedri (on #tlv #rhev-dev #rhev-integ)
>>> >> >
>>> >> >
>>> >> >
>>> >
>>> >
>>> 
>> 
> 
