[lago-devel] [ovirt-devel] OST: vm_run fails for me (basic-suite-master)
Michal Skrivanek
mskrivan at redhat.com
Fri Feb 10 09:26:28 UTC 2017
> On 9 Feb 2017, at 16:16, Ondrej Svoboda <osvoboda at redhat.com> wrote:
>
> Do you mean https://github.com/lago-project/lago/pull/398, which has been merged for over a month?
>
> The second sentence in the PR description (quoted below) is contradicted by newer, unrecognized CPUs such as Skylake.
How/why? Westmere should have been selected in that case
>
> "This patch fixes the problems by selecting a minimum reasonable CPU model for the given hardware platform. Westmere is selected unless older or non-Intel hardware is used."
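To make the expectation concrete, here is a minimal Python sketch of the rule as the PR description words it. It is not the code from lago/virt.py: the function name pick_cpu_model, the list of pre-Westmere models, and the choice to keep the host model for older or non-Intel hardware are all assumptions made for illustration.

```python
# Hypothetical sketch of the rule quoted from the PR description.
# This is NOT the implementation in lago/virt.py.

# Assumed list of Intel models older than Westmere (libvirt model names).
OLDER_INTEL_MODELS = {"Conroe", "Penryn", "Nehalem"}


def pick_cpu_model(vendor, host_model):
    """Return the guest CPU model under the quoted rule."""
    if vendor != "Intel":
        # Assumption: non-Intel hosts keep their own model.
        return host_model
    if host_model in OLDER_INTEL_MODELS:
        # Assumption: pre-Westmere Intel hosts keep their own model.
        return host_model
    # Every other Intel host, including models the code does not
    # recognize, falls back to Westmere.
    return "Westmere"


print(pick_cpu_model("Intel", "Skylake-Client"))
print(pick_cpu_model("Intel", "Nehalem"))
```

Under this reading, an unrecognized Intel model such as Skylake-Client would still end up with Westmere, which is why the reported failure is surprising.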
>
> On Thu, Feb 9, 2017 at 4:07 PM, Michal Skrivanek <mskrivan at redhat.com> wrote:
> What happened to Milan's PR from a while ago addressing this exact situation?
>
> On 08 Feb 2017, at 16:04, Ondrej Svoboda <osvoboda at redhat.com> wrote:
>
>> In my case, simply adding Skylake-Client as a supported CPU family did the trick: https://github.com/lago-project/lago/pull/448
>>
>> I wonder if Westmere is a good fallback -- it works for you on Broadwell, right?
>>
>> On Wed, Feb 8, 2017 at 1:58 PM, Nadav Goldin <ngoldin at redhat.com> wrote:
>> I would first try testing it without OST, because in OST it will pick
>> the CPU via the cluster family (which is controlled in virt.py). You
>> can try specifying 'cpu_model' in the init file, skipping the 'cpu
>> family' logic, something like:
>>
>> > cat LagoInitFile
>> domains:
>>   vm-el73:
>>     memory: 2048
>>     service_provider: systemd
>>     cpu_model: Broadwell
>>     nics:
>>       - net: lago
>>     disks:
>>       - template_name: el7.3-base
>>         type: template
>>         name: root
>>         dev: vda
>>         format: qcow2
>> nets:
>>   lago:
>>     type: nat
>>     dhcp:
>>       start: 100
>>       end: 254
>>     management: true
>>     dns_domain_name: lago.local
>>
>> > lago init && lago start
>>
>> Then install lago again in the VM, copy the same init file, and check
>> whether it works for you with different values of cpu_model - that
>> would give us a hint how to solve this. The 'cpu_model' basically
>> translates to this XML definition in libvirt:
>> <cpu mode='custom' match='exact'>
>>   <model fallback='allow'>Broadwell</model>
>>   <topology sockets='2' cores='1' threads='1'/>
>>   <feature policy='optional' name='vmx'/>
>>   <feature policy='optional' name='svm'/>
>> </cpu>
>>
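For trying several models quickly, the XML fragment quoted above can be generated with a few lines of Python. This is only a sketch using the standard library's xml.etree.ElementTree; the helper name build_cpu_element is hypothetical and this is not how lago itself builds the domain XML.

```python
import xml.etree.ElementTree as ET


def build_cpu_element(model, sockets=2, cores=1, threads=1):
    """Build a <cpu> element shaped like the fragment quoted above.

    Hypothetical helper for experiments, not part of lago.
    """
    cpu = ET.Element("cpu", mode="custom", match="exact")
    model_el = ET.SubElement(cpu, "model", fallback="allow")
    model_el.text = model
    ET.SubElement(
        cpu,
        "topology",
        sockets=str(sockets),
        cores=str(cores),
        threads=str(threads),
    )
    # Both virtualization flags are optional, so the same definition
    # can be used on Intel (vmx) and AMD (svm) hosts.
    for flag in ("vmx", "svm"):
        ET.SubElement(cpu, "feature", policy="optional", name=flag)
    return cpu


for candidate in ("Westmere", "Broadwell", "Skylake-Client"):
    print(ET.tostring(build_cpu_element(candidate), encoding="unicode"))
```

Each printed fragment can be pasted into the domain definition to test one model at a time.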
>> I also tried manually editing it to host-passthrough, but it still
>> failed with the same error. The thing is that the 'kvm_put_msrs:
>> Assertion `ret == n' failed.' error doesn't give any indication of
>> where it failed (or whether the CPU is missing a flag). Maybe there is
>> a way to debug this at the qemu/kvm level? I'm not sure.
>>
>> On Wed, Feb 8, 2017 at 1:18 PM, Ondrej Svoboda <osvoboda at redhat.com> wrote:
>> > It is a Skylake-H, and I can see it is not mentioned in lago/virt.py.
>> >
>> > I guess I'll step through the code (as well as other places discovered by
>> > 'git grep cpu') and see if I could solve this by adding the Skylake family
>> > to _CPU_FAMILIES.
>> >
>> > Do you have other pointers?
>> >
>> > Thanks,
>> > Ondra
>> >
>> > On Tue, Feb 7, 2017 at 10:40 PM, Nadav Goldin <ngoldin at redhat.com> wrote:
>> >>
>> >> What is the host CPU you are using?
>> >> I came across the same error a few days ago, but without running OST.
>> >> I tried running with Lago:
>> >> fc24 host -> el7 vm -> el7 vm.
>> >>
>> >> I have a slight suspicion that it is related to the CPU model we
>> >> configure in libvirt. I tried a few combinations
>> >> (host-passthrough, pinning down the CPU model), but it
>> >> always failed with the same error:
>> >> kvm_put_msrs: Assertion `ret == n' failed.
>> >>
>> >> My CPU is Broadwell, btw.
>> >>
>> >>
>> >> Milan, any ideas? Do you think it might be related?
>> >>
>> >> Nadav.
>> >>
>> >>
>> >>
>> >> On Tue, Feb 7, 2017 at 11:14 PM, Ondrej Svoboda <osvoboda at redhat.com> wrote:
>> >> > Yes, I stated that in my message.
>> >> >
>> >> > root at osvoboda-t460p /home/src/ovirt-system-tests (git)-[master] # cat
>> >> > /sys/module/kvm_intel/parameters/nested
>> >> > :(
>> >> > Y
>> >> >
>> >> > On Tue, Feb 7, 2017 at 1:39 PM, Eyal Edri <eedri at redhat.com> wrote:
>> >> >>
>> >> >> Did you follow the instructions in [1]?
>> >> >>
>> >> >> Specifically, check that 'cat /sys/module/kvm_intel/parameters/nested'
>> >> >> gives you 'Y'.
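The same check can be scripted. The sketch below reads the kvm_intel module parameter named above; the function name nested_enabled is hypothetical, and accepting "1" in addition to "Y" is an assumption to cover kernels that report the parameter as an integer.

```python
from pathlib import Path

# Path taken from the instructions quoted above (Intel hosts only).
NESTED_PARAM = Path("/sys/module/kvm_intel/parameters/nested")


def nested_enabled(param_path=NESTED_PARAM):
    """Return True if the kvm_intel 'nested' parameter reads as enabled."""
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        # Missing file: kvm_intel is not loaded, or this is not an Intel host.
        return False
    # Assumption: some kernels print "1" rather than "Y".
    return value in ("Y", "1")


if __name__ == "__main__":
    print("nested virtualization enabled:", nested_enabled())
```

A True result only confirms the module parameter; it does not rule out problems with the CPU model configured for the guest.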
>> >> >>
>> >> >> [1] http://ovirt-system-tests.readthedocs.io/en/latest/docs/general/installation.html
>> >> >>
>> >> >> On Tue, Feb 7, 2017 at 2:29 PM, Ondrej Svoboda <osvoboda at redhat.com> wrote:
>> >> >>>
>> >> >>> Hi everyone,
>> >> >>>
>> >> >>> Even though I have nested virtualization enabled on the Arch Linux
>> >> >>> system that I use to run OST, vm_run is the first test to fail in
>> >> >>> 004_basic_sanity (followed by snapshots_merge and suspend_resume_vm).
>> >> >>>
>> >> >>> Can you point me to what I might be missing? I believe I get the same
>> >> >>> failure even on Fedora.
>> >> >>>
>> >> >>> This is what host0's CPU capabilities look like (vmx is there):
>> >> >>> [root at lago-basic-suite-master-host0 ~]# cat /proc/cpuinfo
>> >> >>> processor : 0
>> >> >>> vendor_id : GenuineIntel
>> >> >>> cpu family : 6
>> >> >>> model : 44
>> >> >>> model name : Westmere E56xx/L56xx/X56xx (Nehalem-C)
>> >> >>> stepping : 1
>> >> >>> microcode : 0x1
>> >> >>> cpu MHz : 2711.988
>> >> >>> cache size : 16384 KB
>> >> >>> physical id : 0
>> >> >>> siblings : 1
>> >> >>> core id : 0
>> >> >>> cpu cores : 1
>> >> >>> apicid : 0
>> >> >>> initial apicid : 0
>> >> >>> fpu : yes
>> >> >>> fpu_exception : yes
>> >> >>> cpuid level : 11
>> >> >>> wp : yes
>> >> >>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm constant_tsc rep_good nopl xtopology pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm arat tpr_shadow vnmi flexpriority ept vpid
>> >> >>> bogomips : 5423.97
>> >> >>> clflush size : 64
>> >> >>> cache_alignment : 64
>> >> >>> address sizes : 40 bits physical, 48 bits virtual
>> >> >>> power management:
>> >> >>>
>> >> >>> journalctl -b on host0 shows that libvirt complains about NUMA
>> >> >>> configuration:
>> >> >>>
>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: libvirt version: 2.0.0, package: 10.el7_3.4 (CentOS BuildSystem <http://bugs.centos.org>, 2017-01-17-23:37:48, c1bm.rdu2.centos.org)
>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: ovirtmgmt: port 2(vnet0) entered disabled state
>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: device vnet0 left promiscuous mode
>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: ovirtmgmt: port 2(vnet0) entered disabled state
>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: hostname: lago-basic-suite-master-host0.lago.local
>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: Unable to read from monitor: Connection reset by peer
>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: internal error: qemu unexpectedly closed the monitor: 2017-02-07T11:33:23.058571Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
>> >> >>>
>> >> >>> 2017-02-07T11:33:23.058826Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config
>> >> >>>
>> >> >>> qemu-kvm: /builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c:1736: kvm_put_msrs: Assertion `ret == n' failed.
>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 NetworkManager[657]: <info> [1486467203.1025] device (vnet0): state change: disconnected -> unmanaged (reason 'unmanaged') [30 10 3]
>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 kvm[22059]: 0 guests now active
>> >> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 systemd-machined[22044]: Machine qemu-1-vm0 terminated.
>> >> >>>
>> >> >>> Thanks,
>> >> >>> Ondra
>> >> >>>
>> >> >>> _______________________________________________
>> >> >>> Devel mailing list
>> >> >>> Devel at ovirt.org
>> >> >>> http://lists.ovirt.org/mailman/listinfo/devel
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Eyal Edri
>> >> >> Associate Manager
>> >> >> RHV DevOps
>> >> >> EMEA ENG Virtualization R&D
>> >> >> Red Hat Israel
>> >> >>
>> >> >> phone: +972-9-7692018
>> >> >> irc: eedri (on #tlv #rhev-dev #rhev-integ)
>> >> >
>> >> >
>> >> >
>> >
>> >
>>
>