[lago-devel] [ovirt-devel] OST: vm_run fails for me (basic-suite-master)

Nadav Goldin ngoldin at redhat.com
Wed Feb 8 12:58:33 UTC 2017


I would first try testing it without OST, because OST picks the CPU
via the cluster family (which is controlled in virt.py). You can try
specifying 'cpu_model' in the init file, which skips the 'cpu family'
logic, something like:

> cat LagoInitFile
domains:
  vm-el73:
    memory: 2048
    service_provider: systemd
    cpu_model: Broadwell
    nics:
      - net: lago
    disks:
      - template_name: el7.3-base
        type: template
        name: root
        dev: vda
        format: qcow2
nets:
  lago:
    type: nat
    dhcp:
      start: 100
      end: 254
    management: true
    dns_domain_name: lago.local

> lago init && lago start

Then install lago again in the VM, copy the same init file, and check
whether it works for you with different cpu_model values; that would
give us a hint on how to solve this. The 'cpu_model' basically
translates to this XML definition in libvirt:
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Broadwell</model>
    <topology sockets='2' cores='1' threads='1'/>
    <feature policy='optional' name='vmx'/>
    <feature policy='optional' name='svm'/>
  </cpu>
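
As a rough illustration of that mapping (not lago's actual code), the
element above can be rebuilt with the Python standard library. The
function name build_cpu_xml and its parameters are made up for this
sketch; the attribute values are copied from the snippet above:

```python
import xml.etree.ElementTree as ET


def build_cpu_xml(cpu_model, sockets=2, cores=1, threads=1):
    """Return a libvirt <cpu> element pinned to cpu_model, as a string.

    Hypothetical helper for illustration; lago's real code may differ.
    """
    cpu = ET.Element("cpu", mode="custom", match="exact")
    model = ET.SubElement(cpu, "model", fallback="allow")
    model.text = cpu_model
    ET.SubElement(
        cpu, "topology",
        sockets=str(sockets), cores=str(cores), threads=str(threads),
    )
    # vmx (Intel) and svm (AMD) are requested as optional, so the guest
    # gets them only when the host CPU actually supports them.
    for flag in ("vmx", "svm"):
        ET.SubElement(cpu, "feature", policy="optional", name=flag)
    return ET.tostring(cpu, encoding="unicode")


print(build_cpu_xml("Broadwell"))
```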

I also tried manually editing it to host-passthrough, but it still
failed with the same error. The thing is that the 'kvm_put_msrs:
Assertion `ret == n' failed.' error doesn't give any indication of
where it failed (or whether the CPU is missing a flag). Maybe there is
a way to debug this at the qemu/kvm level? I'm not sure.
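
To try several cpu_model values in turn, one option is to generate one
init file per model. This is only a sketch: the list of candidate
models, the file naming and the helper name are my own assumptions, and
the template is just the LagoInitFile shown above.

```python
import os
import tempfile

# The init file from above, with the CPU model left as a placeholder.
INIT_TEMPLATE = """\
domains:
  vm-el73:
    memory: 2048
    service_provider: systemd
    cpu_model: {cpu_model}
    nics:
      - net: lago
    disks:
      - template_name: el7.3-base
        type: template
        name: root
        dev: vda
        format: qcow2
nets:
  lago:
    type: nat
    dhcp:
      start: 100
      end: 254
    management: true
    dns_domain_name: lago.local
"""

# Assumed candidates; replace with the models your libvirt knows about.
CANDIDATE_MODELS = ["Broadwell", "Haswell", "SandyBridge", "Westmere"]


def write_init_files(out_dir, models=CANDIDATE_MODELS):
    """Write LagoInitFile.<model> for each model and return the paths."""
    paths = []
    for model in models:
        path = os.path.join(out_dir, "LagoInitFile." + model)
        with open(path, "w") as handle:
            handle.write(INIT_TEMPLATE.format(cpu_model=model))
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in write_init_files(tempfile.mkdtemp()):
        print(path)
```

Each generated file can then be copied to LagoInitFile in a fresh
directory and run with 'lago init && lago start' as above.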






On Wed, Feb 8, 2017 at 1:18 PM, Ondrej Svoboda <osvoboda at redhat.com> wrote:
> It is a Skylake-H, and I can see it is not mentioned in lago/virt.py.
>
> I guess I'll step through the code (as well as other places discovered by
> 'git grep cpu') and see if I could solve this by adding the Skylake family
> to _CPU_FAMILIES.
>
> Do you have other pointers?
>
> Thanks,
> Ondra
>
> On Tue, Feb 7, 2017 at 10:40 PM, Nadav Goldin <ngoldin at redhat.com> wrote:
>>
>> What is the host CPU you are using?
>> I came across the same error a few days ago, but without running OST;
>> I tried running with Lago:
>> fc24 host -> el7 vm -> el7 vm.
>>
>> I have a slight suspicion that it is related to the CPU model we
>> configure in libvirt. I tried a few combinations (host-passthrough,
>> pinning down the CPU model), but it always failed with the same
>> error:
>> kvm_put_msrs: Assertion `ret == n' failed.
>>
>> My CPU is Broadwell btw.
>>
>>
>> Milan, any ideas? Do you think it might be related?
>>
>> Nadav.
>>
>>
>>
>> On Tue, Feb 7, 2017 at 11:14 PM, Ondrej Svoboda <osvoboda at redhat.com>
>> wrote:
>> > Yes, I stated that in my message.
>> >
>> > root at osvoboda-t460p /home/src/ovirt-system-tests (git)-[master] # cat
>> > /sys/module/kvm_intel/parameters/nested
>> > :(
>> > Y
>> >
>> > On Tue, Feb 7, 2017 at 1:39 PM, Eyal Edri <eedri at redhat.com> wrote:
>> >>
>> >> Did you follow the instructions on [1] ?
>> >>
>> >> Specifically, verifying  ' cat /sys/module/kvm_intel/parameters/nested
>> >> '
>> >> gives you 'Y'.
>> >>
>> >> [1]
>> >>
>> >> http://ovirt-system-tests.readthedocs.io/en/latest/docs/general/installation.html
>> >>
>> >> On Tue, Feb 7, 2017 at 2:29 PM, Ondrej Svoboda <osvoboda at redhat.com>
>> >> wrote:
>> >>>
>> >>> Hi everyone,
>> >>>
>> >>> Even though I have nested virtualization enabled in my Arch Linux
>> >>> system
>> >>> which I use to run OST, vm_run is the first test to fail in
>> >>> 004_basic_sanity
>> >>> (followed by snapshots_merge and suspend_resume_vm).
>> >>>
>> >>> Can you point me to what I might be missing? I believe I get the same
>> >>> failure even on Fedora.
>> >>>
>> >>> This is what host0's CPU capabilities look like (vmx is there):
>> >>> [root at lago-basic-suite-master-host0 ~]# cat /proc/cpuinfo
>> >>> processor    : 0
>> >>> vendor_id    : GenuineIntel
>> >>> cpu family    : 6
>> >>> model        : 44
>> >>> model name    : Westmere E56xx/L56xx/X56xx (Nehalem-C)
>> >>> stepping    : 1
>> >>> microcode    : 0x1
>> >>> cpu MHz        : 2711.988
>> >>> cache size    : 16384 KB
>> >>> physical id    : 0
>> >>> siblings    : 1
>> >>> core id        : 0
>> >>> cpu cores    : 1
>> >>> apicid        : 0
>> >>> initial apicid    : 0
>> >>> fpu        : yes
>> >>> fpu_exception    : yes
>> >>> cpuid level    : 11
>> >>> wp        : yes
>> >>> flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
>> >>> mca
>> >>> cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm constant_tsc
>> >>> rep_good
>> >>> nopl xtopology pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 x2apic
>> >>> popcnt aes
>> >>> hypervisor lahf_lm arat tpr_shadow vnmi flexpriority ept vpid
>> >>> bogomips    : 5423.97
>> >>> clflush size    : 64
>> >>> cache_alignment    : 64
>> >>> address sizes    : 40 bits physical, 48 bits virtual
>> >>> power management:
>> >>>
>> >>> journalctl -b on host0 shows that libvirt complains about NUMA
>> >>> configuration:
>> >>>
>> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: libvirt
>> >>> version: 2.0.0, package: 10.el7_3.4 (CentOS BuildSystem
>> >>> <http://bugs.centos.org>, 2017-01-17-23:37:48, c1bm.rdu2.centos.org)
>> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: ovirtmgmt: port
>> >>> 2(vnet0) entered disabled state
>> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: device vnet0
>> >>> left
>> >>> promiscuous mode
>> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: ovirtmgmt: port
>> >>> 2(vnet0) entered disabled state
>> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]:
>> >>> hostname:
>> >>> lago-basic-suite-master-host0.lago.local
>> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: Unable
>> >>> to
>> >>> read from monitor: Connection reset by peer
>> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]:
>> >>> internal
>> >>> error: qemu unexpectedly closed the monitor:
>> >>> 2017-02-07T11:33:23.058571Z
>> >>> qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 1 2 3 4 5 6 7
>> >>> 8 9
>> >>> 10 11 12 13 14 15
>> >>>
>> >>> 2017-02-07T11:33:23.058826Z qemu-kvm: warning: All CPU(s) up to
>> >>> maxcpus
>> >>> should be described in NUMA config
>> >>>
>> >>> qemu-kvm:
>> >>> /builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c:1736: kvm_put_msrs:
>> >>> Assertion `ret == n' failed.
>> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 NetworkManager[657]:
>> >>> <info>
>> >>> [1486467203.1025] device (vnet0): state change: disconnected ->
>> >>> unmanaged
>> >>> (reason 'unmanaged') [30 10 3]
>> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 kvm[22059]: 0 guests now
>> >>> active
>> >>> Feb 07 06:33:23 lago-basic-suite-master-host0 systemd-machined[22044]:
>> >>> Machine qemu-1-vm0 terminated.
>> >>>
>> >>> Thanks,
>> >>> Ondra
>> >>>
>> >>> _______________________________________________
>> >>> Devel mailing list
>> >>> Devel at ovirt.org
>> >>> http://lists.ovirt.org/mailman/listinfo/devel
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Eyal Edri
>> >> Associate Manager
>> >> RHV DevOps
>> >> EMEA ENG Virtualization R&D
>> >> Red Hat Israel
>> >>
>> >> phone: +972-9-7692018
>> >> irc: eedri (on #tlv #rhev-dev #rhev-integ)
>> >
>> >
>> >
>
>


