Re: [lago-devel] [ovirt-devel] OST: vm_run fails for me (basic-suite-master)

What is the host CPU you are using? I came across the same error a few days ago, but without running OST. I tried running with Lago: fc24 host -> el7 vm -> el7 vm.
I have a slight suspicion that it is related to the CPU model we configure in libvirt. I tried a few combinations (host-passthrough, pinning down the CPU model), but it always failed with the same error: kvm_put_msrs: Assertion `ret == n' failed.
My CPU is Broadwell, btw.
Milan, any ideas? Do you think it might be related?
Nadav.
On Tue, Feb 7, 2017 at 11:14 PM, Ondrej Svoboda <osvoboda@redhat.com> wrote:
Yes, I stated that in my message.
root@osvoboda-t460p /home/src/ovirt-system-tests (git)-[master] # cat /sys/module/kvm_intel/parameters/nested :( Y
On Tue, Feb 7, 2017 at 1:39 PM, Eyal Edri <eedri@redhat.com> wrote:
Did you follow the instructions on [1] ?
Specifically, verifying ' cat /sys/module/kvm_intel/parameters/nested ' gives you 'Y'.
[1] http://ovirt-system-tests.readthedocs.io/en/latest/docs/general/installation...
On Tue, Feb 7, 2017 at 2:29 PM, Ondrej Svoboda <osvoboda@redhat.com> wrote:
Hi everyone,
Even though I have nested virtualization enabled in my Arch Linux system which I use to run OST, vm_run is the first test to fail in 004_basic_sanity (followed by snapshots_merge and suspend_resume_vm).
Can you point me to what I might be missing? I believe I get the same failure even on Fedora.
This is what host0's CPU capabilities look like (vmx is there):
[root@lago-basic-suite-master-host0 ~]# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Westmere E56xx/L56xx/X56xx (Nehalem-C)
stepping : 1
microcode : 0x1
cpu MHz : 2711.988
cache size : 16384 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm constant_tsc rep_good nopl xtopology pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm arat tpr_shadow vnmi flexpriority ept vpid
bogomips : 5423.97
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
journalctl -b on host0 shows that libvirt complains about NUMA configuration:
Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: libvirt version: 2.0.0, package: 10.el7_3.4 (CentOS BuildSystem <http://bugs.centos.org>, 2017-01-17-23:37:48, c1bm.rdu2.centos.org)
Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: ovirtmgmt: port 2(vnet0) entered disabled state
Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: device vnet0 left promiscuous mode
Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: ovirtmgmt: port 2(vnet0) entered disabled state
Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: hostname: lago-basic-suite-master-host0.lago.local
Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: Unable to read from monitor: Connection reset by peer
Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: internal error: qemu unexpectedly closed the monitor: 2017-02-07T11:33:23.058571Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2017-02-07T11:33:23.058826Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config
qemu-kvm: /builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c:1736: kvm_put_msrs: Assertion `ret == n' failed.
Feb 07 06:33:23 lago-basic-suite-master-host0 NetworkManager[657]: <info> [1486467203.1025] device (vnet0): state change: disconnected -> unmanaged (reason 'unmanaged') [30 10 3]
Feb 07 06:33:23 lago-basic-suite-master-host0 kvm[22059]: 0 guests now active
Feb 07 06:33:23 lago-basic-suite-master-host0 systemd-machined[22044]: Machine qemu-1-vm0 terminated.
Thanks, Ondra
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
-- Eyal Edri Associate Manager RHV DevOps EMEA ENG Virtualization R&D Red Hat Israel
phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)
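
The two host-side checks mentioned in this message (the kvm_intel "nested" parameter and the vmx flag in /proc/cpuinfo) can be scripted. The following is only a minimal sketch: the function names are made up for illustration, and accepting both 'Y' and '1' as "enabled" is an assumption meant to cover modules that report the parameter as a number.

```python
from pathlib import Path


def nested_enabled(param_text: str) -> bool:
    """Interpret the content of .../kvm_intel/parameters/nested.

    Assumption: 'Y' or '1' both mean that nested virtualization is on.
    """
    return param_text.strip() in ("Y", "1")


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the flags of the first processor found in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def main() -> None:
    param = Path("/sys/module/kvm_intel/parameters/nested")
    if param.exists():
        print("nested:", nested_enabled(param.read_text()))
    else:
        print("nested: kvm_intel parameter not found")
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("vmx flag:", "vmx" in cpu_flags(cpuinfo.read_text()))


if __name__ == "__main__":
    main()
```

As the thread shows, both checks can pass while the nested guest still dies on the kvm_put_msrs assertion, so this only rules out the most basic misconfiguration.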

It is a Skylake-H, and I can see it is not mentioned in lago/virt.py.
I guess I'll step through the code (as well as other places discovered by 'git grep cpu') and see if I can solve this by adding the Skylake family to _CPU_FAMILIES.
Do you have other pointers?
Thanks, Ondra
On Tue, Feb 7, 2017 at 10:40 PM, Nadav Goldin <ngoldin@redhat.com> wrote:
What is the host CPU you are using? I came across the same error a few days ago, but without running OST. I tried running with Lago: fc24 host -> el7 vm -> el7 vm.
I have a slight suspicion that it is related to the CPU model we configure in libvirt. I tried a few combinations (host-passthrough, pinning down the CPU model), but it always failed with the same error: kvm_put_msrs: Assertion `ret == n' failed.
My CPU is Broadwell, btw.
Milan, any ideas? Do you think it might be related?
Nadav.

I would first try testing it without OST, because in OST it will pick the CPU via the cluster family (which is controlled in virt.py). You can try specifying the 'cpu_model' in the init file, skipping the 'cpu family' logic, something like:
cat LagoInitFile
domains:
  vm-el73:
    memory: 2048
    service_provider: systemd
    cpu_model: Broadwell
    nics:
      - net: lago
    disks:
      - template_name: el7.3-base
        type: template
        name: root
        dev: vda
        format: qcow2
nets:
  lago:
    type: nat
    dhcp:
      start: 100
      end: 254
    management: true
    dns_domain_name: lago.local
lago init && lago start
Then install lago again in the VM, copy the same init file, and check which values of cpu_model work for you; that would give us a hint on how to solve this. The 'cpu_model' basically translates to this XML definition in libvirt:
<cpu mode='custom' match='exact'>
  <model fallback='allow'>Broadwell</model>
  <topology sockets='2' cores='1' threads='1'/>
  <feature policy='optional' name='vmx'/>
  <feature policy='optional' name='svm'/>
</cpu>
I also tried manually editing it to host-passthrough, but it still failed with the same error. The thing is that the 'kvm_put_msrs: Assertion `ret == n' failed.' error doesn't give any indication of where it failed (or whether the CPU is missing a flag). Maybe there is a way to debug this at the qemu/kvm level? I'm not sure.
On Wed, Feb 8, 2017 at 1:18 PM, Ondrej Svoboda <osvoboda@redhat.com> wrote:
It is a Skylake-H, and I can see it is not mentioned in lago/virt.py.
I guess I'll step through the code (as well as other places discovered by 'git grep cpu') and see if I could solve this by adding the Skylake family to _CPU_FAMILIES.
Do you have other pointers?
Thanks, Ondra
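
The libvirt <cpu> element that Nadav describes for a given 'cpu_model' could be produced along these lines. This is a sketch under assumptions, not Lago's actual code: build_cpu_xml and its parameters are hypothetical names, and the topology values are simply copied from the example in the message.

```python
import xml.etree.ElementTree as ET


def build_cpu_xml(cpu_model: str, sockets: int = 2) -> str:
    """Build a libvirt <cpu> element for a custom CPU model (hypothetical helper)."""
    cpu = ET.Element("cpu", {"mode": "custom", "match": "exact"})
    model = ET.SubElement(cpu, "model", {"fallback": "allow"})
    model.text = cpu_model
    ET.SubElement(
        cpu,
        "topology",
        {"sockets": str(sockets), "cores": "1", "threads": "1"},
    )
    # vmx (Intel) and svm (AMD) are optional so that nesting is exposed
    # whenever the host offers it.
    for flag in ("vmx", "svm"):
        ET.SubElement(cpu, "feature", {"policy": "optional", "name": flag})
    return ET.tostring(cpu, encoding="unicode")


if __name__ == "__main__":
    print(build_cpu_xml("Broadwell"))
```

Swapping the model name (for example Westmere instead of Broadwell) is then a one-argument change, which makes it easy to try several models against the failing nested guest.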

In my case, simply adding Skylake-Client as a supported CPU family did the trick: https://github.com/lago-project/lago/pull/448
I wonder if Westmere is a good fallback -- it works for you on Broadwell, right?
On Wed, Feb 8, 2017 at 1:58 PM, Nadav Goldin <ngoldin@redhat.com> wrote:
I would first try testing it without OST, because in OST it will pick the CPU via the cluster family (which is controlled in virt.py). You can try specifying the 'cpu_model' in the init file, skipping the 'cpu family' logic, something like:
cat LagoInitFile
domains:
  vm-el73:
    memory: 2048
    service_provider: systemd
    cpu_model: Broadwell
    nics:
      - net: lago
    disks:
      - template_name: el7.3-base
        type: template
        name: root
        dev: vda
        format: qcow2
nets:
  lago:
    type: nat
    dhcp:
      start: 100
      end: 254
    management: true
    dns_domain_name: lago.local
lago init && lago start
Then install lago again in the VM, copy the same init file, and check which values of cpu_model work for you; that would give us a hint on how to solve this. The 'cpu_model' basically translates to this XML definition in libvirt:
<cpu mode='custom' match='exact'>
  <model fallback='allow'>Broadwell</model>
  <topology sockets='2' cores='1' threads='1'/>
  <feature policy='optional' name='vmx'/>
  <feature policy='optional' name='svm'/>
</cpu>
I also tried manually editing it to host-passthrough, but it still failed with the same error. The thing is that the 'kvm_put_msrs: Assertion `ret == n' failed.' error doesn't give any indication of where it failed (or whether the CPU is missing a flag). Maybe there is a way to debug this at the qemu/kvm level? I'm not sure.
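
To try several 'cpu_model' values as suggested in the quoted message, the init file can be rendered once per model. The sketch below is an illustration only: render_init_files is a hypothetical helper, the list of models is arbitrary, and the YAML indentation is an assumption, since the init file appears flattened in the archive.

```python
# Template based on the init file quoted above; only cpu_model varies.
INIT_TEMPLATE = """\
domains:
  vm-el73:
    memory: 2048
    service_provider: systemd
    cpu_model: {cpu_model}
    nics:
      - net: lago
    disks:
      - template_name: el7.3-base
        type: template
        name: root
        dev: vda
        format: qcow2
nets:
  lago:
    type: nat
    dhcp:
      start: 100
      end: 254
    management: true
    dns_domain_name: lago.local
"""


def render_init_files(cpu_models):
    """Return a {cpu_model: init file text} mapping, one entry per model."""
    return {m: INIT_TEMPLATE.format(cpu_model=m) for m in cpu_models}


if __name__ == "__main__":
    for model, text in render_init_files(["Westmere", "Broadwell"]).items():
        print("# --- LagoInitFile for", model, "---")
        print(text)
```

Each rendered text would be saved as LagoInitFile in its own directory before running 'lago init && lago start' there, first on the host and then inside the first-level VM.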

What happened to Milan's PR from a while ago addressing this exact situation?
On 08 Feb 2017, at 16:04, Ondrej Svoboda <osvoboda@redhat.com> wrote:
In my case, simply adding Skylake-Client as a supported CPU family did the trick: https://github.com/lago-project/lago/pull/448
I wonder if Westmere is a good fallback -- it works for you on Broadwell, right?
On Wed, Feb 8, 2017 at 1:58 PM, Nadav Goldin <ngoldin@redhat.com> wrote:
I would first try testing it without OST, because in OST it will pick the CPU via the cluster family (which is controlled in virt.py). You can try specifying the 'cpu_model' in the init file, skipping the 'cpu family' logic, something like:
cat LagoInitFile
domains:
  vm-el73:
    memory: 2048
    service_provider: systemd
    cpu_model: Broadwell
    nics:
      - net: lago
    disks:
      - template_name: el7.3-base
        type: template
        name: root
        dev: vda
        format: qcow2
nets:
  lago:
    type: nat
    dhcp:
      start: 100
      end: 254
    management: true
    dns_domain_name: lago.local
lago init && lago start
Then install lago again in the VM, copy the same init file, and check which values of cpu_model work for you; that would give us a hint on how to solve this. The 'cpu_model' basically translates to this XML definition in libvirt:
<cpu mode='custom' match='exact'>
  <model fallback='allow'>Broadwell</model>
  <topology sockets='2' cores='1' threads='1'/>
  <feature policy='optional' name='vmx'/>
  <feature policy='optional' name='svm'/>
</cpu>
I also tried manually editing it to host-passthrough, but it still failed with the same error. The thing is that the 'kvm_put_msrs: Assertion `ret == n' failed.' error doesn't give any indication of where it failed (or whether the CPU is missing a flag). Maybe there is a way to debug this at the qemu/kvm level? I'm not sure.

Do you mean https://github.com/lago-project/lago/pull/398, which has been merged for over a month? The second sentence in the PR (below) is contradicted by newer, unrecognized CPUs such as Skylake.
"This patch fixes the problems by selecting a minimum reasonable CPU model for the given hardware platform. Westmere is selected unless older or non-Intel hardware is used."
On Thu, Feb 9, 2017 at 4:07 PM, Michal Skrivanek <mskrivan@redhat.com> wrote:
What happened to Milan's PR from a while ago addressing this exact situation?
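
The fallback behaviour discussed here (a known CPU model maps to its own family, anything unrecognized lands on Westmere) can be pictured as a table lookup. The sketch below is purely illustrative: KNOWN_FAMILIES, the family labels and pick_cpu_family are made-up names, and the table is not the content of _CPU_FAMILIES in lago/virt.py.

```python
# Hypothetical lookup table for illustration only; the real _CPU_FAMILIES
# in lago/virt.py is not reproduced in this thread.
KNOWN_FAMILIES = {
    "Westmere": "westmere-family",
    "Broadwell": "broadwell-family",
}


def pick_cpu_family(host_model, known=None, fallback="Westmere"):
    """Return (family_label, used_fallback) for a host CPU model name."""
    table = KNOWN_FAMILIES if known is None else known
    if host_model in table:
        return table[host_model], False
    # Unrecognized models (for example a newer CPU) land on the fallback.
    return table[fallback], True


if __name__ == "__main__":
    # Without a Skylake entry the lookup silently falls back.
    print(pick_cpu_family("Skylake-Client"))
    # Adding an entry, in the spirit of PR 448, avoids the fallback.
    extended = dict(KNOWN_FAMILIES)
    extended["Skylake-Client"] = "skylake-family"
    print(pick_cpu_family("Skylake-Client", known=extended))
```

Returning the used_fallback flag alongside the label is a design choice of this sketch; it would let a caller log a warning when a host CPU is not recognized instead of failing later with an opaque qemu assertion.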
On 08 Feb 2017, at 16:04, Ondrej Svoboda <osvoboda@redhat.com> wrote:
In my case, simply adding Skylake-Client a supported CPU family did the trick: https://github.com/lago-project/lago/pull/448
i wonder if Westmere is a good fallback -- it works for you on Broadwell, right?
On Wed, Feb 8, 2017 at 1:58 PM, Nadav Goldin <ngoldin@redhat.com> wrote:
I would first try testing it without OST, because in OST it will pick the CPU via the cluster family(which is controlled in virt.py). You can try specifying the 'cpu_model' in the init file, skipping the 'cpu family' logic, something like:
cat LagoInitFile domains: vm-el73: memory: 2048 service_provider: systemd cpu_model: Broadwell nics: - net: lago disks: - template_name: el7.3-base type: template name: root dev: vda format: qcow2 nets: lago: type: nat dhcp: start: 100 end: 254 management: true dns_domain_name: lago.local
lago init && lago start
Then install lago again in the VM, copy the same init file, and check if for different combinations of cpu_model it works for you - would give us a hint how to solve this. The 'cpu_model' basically translates to this xml definition in libvirt: <cpu mode='custom' match='exact'> <model fallback='allow'>Broadwell</model> <topology sockets='2' cores='1' threads='1'/> <feature policy='optional' name='vmx'/> <feature policy='optional' name='svm'/> </cpu>
I tried manually editing it also to host-passthrough, but still failed on the same error. The thing is that the 'kvm_put_msrs: Assertion `ret == n' failed.' error doesn't give any indication where it failed(or if the cpu is missing a flag), maybe there is a way to debug this at qemu/kvm level? I'm not sure.
It is a Skylake-H, and I can see it is not mentioned in lago/virt.py.
I guess I'll step through the code (as well as other places discovered by 'git grep cpu') and see if I could solve this by adding the Skylake family to _CPU_FAMILIES.
Do you have other pointers?
Thanks, Ondra
On Tue, Feb 7, 2017 at 10:40 PM, Nadav Goldin <ngoldin@redhat.com> wrote:
What is the host CPU you are using? I came across the same error few days ago, but without running OST, I tried running with Lago: fc24 host -> el7 vm -> el7 vm.
I have a slight suspect that it is related to the CPU model we configure in libvirt, I tried a mixture of few combinations(host-pass-through, pinning down the CPU model), but it always failed on the same error: kvm_put_msrs: Assertion `ret == n' failed.
My CPU is Broadwell btw.
Milan, any ideas? you think it might be related?
Nadav.
On Tue, Feb 7, 2017 at 11:14 PM, Ondrej Svoboda <osvoboda@redhat.com> wrote:
Yes, I stated that in my message.
root@osvoboda-t460p /home/src/ovirt-system-tests (git)-[master] #
cat
/sys/module/kvm_intel/parameters/nested :( Y
On Tue, Feb 7, 2017 at 1:39 PM, Eyal Edri <eedri@redhat.com> wrote:
Did you follow the instructions on [1] ?
Specifically, verifying ' cat /sys/module/kvm_intel/paramete
rs/nested
' gives you 'Y'.
[1]
http://ovirt-system-tests.readthedocs.io/en/latest/docs/gene ral/installation.html
On Tue, Feb 7, 2017 at 2:29 PM, Ondrej Svoboda <osvoboda@redhat.com> wrote:
>
> Hi everyone,
>
> Even though I have nested virtualization enabled in my Arch Linux system
> which I use to run OST, vm_run is the first test to fail in 004_basic_sanity
> (followed by snapshots_merge and suspend_resume_vm).
>
> Can you point me to what I might be missing? I believe I get the same
> failure even on Fedora.
>
> This is what host0's CPU capabilities look like (vmx is there):
> [root@lago-basic-suite-master-host0 ~]# cat /proc/cpuinfo
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 44
> model name : Westmere E56xx/L56xx/X56xx (Nehalem-C)
> stepping : 1
> microcode : 0x1
> cpu MHz : 2711.988
> cache size : 16384 KB
> physical id : 0
> siblings : 1
> core id : 0
> cpu cores : 1
> apicid : 0
> initial apicid : 0
> fpu : yes
> fpu_exception : yes
> cpuid level : 11
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm constant_tsc rep_good nopl xtopology pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm arat tpr_shadow vnmi flexpriority ept vpid
> bogomips : 5423.97
> clflush size : 64
> cache_alignment : 64
> address sizes : 40 bits physical, 48 bits virtual
> power management:
>
> journalctl -b on host0 shows that libvirt complains about NUMA configuration:
>
> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: libvirt version: 2.0.0, package: 10.el7_3.4 (CentOS BuildSystem <http://bugs.centos.org>, 2017-01-17-23:37:48, c1bm.rdu2.centos.org)
> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: ovirtmgmt: port 2(vnet0) entered disabled state
> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: device vnet0 left promiscuous mode
> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: ovirtmgmt: port 2(vnet0) entered disabled state
> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: hostname: lago-basic-suite-master-host0.lago.local
> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: Unable to read from monitor: Connection reset by peer
> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: internal error: qemu unexpectedly closed the monitor: 2017-02-07T11:33:23.058571Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> 2017-02-07T11:33:23.058826Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config
> qemu-kvm: /builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c:1736: kvm_put_msrs: Assertion `ret == n' failed.
> Feb 07 06:33:23 lago-basic-suite-master-host0 NetworkManager[657]: <info> [1486467203.1025] device (vnet0): state change: disconnected -> unmanaged (reason 'unmanaged') [30 10 3]
> Feb 07 06:33:23 lago-basic-suite-master-host0 kvm[22059]: 0 guests now active
> Feb 07 06:33:23 lago-basic-suite-master-host0 systemd-machined[22044]: Machine qemu-1-vm0 terminated.
>
> Thanks,
> Ondra
>
> _______________________________________________
> Devel mailing list
> Devel@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
-- Eyal Edri Associate Manager RHV DevOps EMEA ENG Virtualization R&D Red Hat Israel
phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)

On 9 Feb 2017, at 16:16, Ondrej Svoboda <osvoboda@redhat.com> wrote:
> Do you mean https://github.com/lago-project/lago/pull/398 which has been merged for over a month?
> The second sentence in the PR (below) is contradicted by newer, non-recognized CPUs, such as Skylake.
How/why? Westmere should have been selected in that case.
> "This patch fixes the problems by selecting a minimum reasonable CPU model for the given hardware platform. Westmere is selected unless older or non-Intel hardware is used."
On Thu, Feb 9, 2017 at 4:07 PM, Michal Skrivanek <mskrivan@redhat.com> wrote:
What happened to Milan's PR from a while ago addressing this exact situation?
On 08 Feb 2017, at 16:04, Ondrej Svoboda <osvoboda@redhat.com> wrote:
In my case, simply adding Skylake-Client as a supported CPU family did the trick: https://github.com/lago-project/lago/pull/448
I wonder if Westmere is a good fallback -- it works for you on Broadwell, right?
On Wed, Feb 8, 2017 at 1:58 PM, Nadav Goldin <ngoldin@redhat.com> wrote:
I would first try testing it without OST, because in OST it will pick the CPU via the cluster family (which is controlled in virt.py). You can try specifying the 'cpu_model' in the init file, skipping the 'cpu family' logic, something like:

cat LagoInitFile
domains:
  vm-el73:
    memory: 2048
    service_provider: systemd
    cpu_model: Broadwell
    nics:
      - net: lago
    disks:
      - template_name: el7.3-base
        type: template
        name: root
        dev: vda
        format: qcow2
nets:
  lago:
    type: nat
    dhcp:
      start: 100
      end: 254
    management: true
    dns_domain_name: lago.local

lago init && lago start
Then install lago again in the VM, copy the same init file, and check whether it works for you with different values of cpu_model; that would give us a hint about how to solve this. The 'cpu_model' basically translates to this XML definition in libvirt:

<cpu mode='custom' match='exact'>
  <model fallback='allow'>Broadwell</model>
  <topology sockets='2' cores='1' threads='1'/>
  <feature policy='optional' name='vmx'/>
  <feature policy='optional' name='svm'/>
</cpu>
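To make it easier to compare several cpu_model values side by side, the <cpu> element quoted above can be generated programmatically. This is only a sketch built with Python's standard library; the function name `cpu_xml` is hypothetical and this is not the code Lago itself uses:

```python
import xml.etree.ElementTree as ET


def cpu_xml(model: str, sockets: int = 2, cores: int = 1, threads: int = 1) -> str:
    """Build a libvirt <cpu> element shaped like the one quoted in the thread."""
    cpu = ET.Element("cpu", mode="custom", match="exact")
    ET.SubElement(cpu, "model", fallback="allow").text = model
    ET.SubElement(
        cpu, "topology",
        sockets=str(sockets), cores=str(cores), threads=str(threads),
    )
    # Both virtualization flags are optional so the same XML fits Intel and AMD hosts.
    for flag in ("vmx", "svm"):
        ET.SubElement(cpu, "feature", policy="optional", name=flag)
    return ET.tostring(cpu, encoding="unicode")


# Print one definition per candidate model to try in the nested VM.
for candidate in ("Westmere", "Broadwell", "Skylake-Client"):
    print(cpu_xml(candidate))
```

Note that ElementTree emits double-quoted attributes, which libvirt accepts just like the single-quoted form above.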
I also tried manually editing it to host-passthrough, but it still failed with the same error. The thing is that the 'kvm_put_msrs: Assertion `ret == n' failed.' error doesn't give any indication of where it failed (or whether the CPU is missing a flag). Maybe there is a way to debug this at the qemu/kvm level? I'm not sure.
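One possible starting point for debugging at that level: in QEMU, kvm_put_msrs asserts that the KVM_SET_MSRS ioctl accepted every MSR it was handed, so the kernel log of the machine running the failing qemu-kvm may name the MSR write that was refused. The sketch below filters such messages out of a log dump. It rests on assumptions: the message wording differs between kernel versions, the regular expression is a guess at the common form, and the sample lines are made up for illustration rather than taken from the failing host.

```python
import re

# Assumption: KVM reports rejected or ignored MSR writes with the word "wrmsr"
# followed by a hexadecimal MSR index; the exact wording varies between kernels.
WRMSR = re.compile(r"wrmsr:?\s+(0x[0-9a-fA-F]+)")


def rejected_msrs(kernel_log: str) -> list[str]:
    """Return the distinct MSR indices mentioned in wrmsr messages, in first-seen order."""
    seen: list[str] = []
    for line in kernel_log.splitlines():
        match = WRMSR.search(line)
        if match and match.group(1).lower() not in seen:
            seen.append(match.group(1).lower())
    return seen


# Made-up sample lines for illustration only.
SAMPLE = """\
kvm [22059]: vcpu0 unhandled wrmsr: 0x1d9 data 0x0
kvm [22059]: vcpu0 unhandled rdmsr: 0x611
kvm [22059]: vcpu1 unhandled wrmsr: 0x1d9 data 0x0
"""

print(rejected_msrs(SAMPLE))
```

In practice the input would be the output of `dmesg` or `journalctl -k`, collected on the machine where the nested guest fails to start and, if nothing shows up there, on the level below it.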
On Wed, Feb 8, 2017 at 1:18 PM, Ondrej Svoboda <osvoboda@redhat.com> wrote:
It is a Skylake-H, and I can see it is not mentioned in lago/virt.py.
I guess I'll step through the code (as well as other places discovered by 'git grep cpu') and see if I could solve this by adding the Skylake family to _CPU_FAMILIES.
Do you have other pointers?
Thanks, Ondra
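The family lookup discussed in this message can be pictured roughly as follows. This is a sketch only: the table contents and the name `pick_cpu_model` are hypothetical and do not reproduce what lago/virt.py contains; the Westmere fallback follows the description of PR 398 quoted elsewhere in this thread.

```python
# Hypothetical table; the real _CPU_FAMILIES in lago/virt.py may look different.
KNOWN_MODELS = {
    "Westmere": "Westmere",
    "SandyBridge": "SandyBridge",
    "Haswell": "Haswell",
    "Broadwell": "Broadwell",
}

# Fallback described in the PR text: the minimum reasonable Intel model.
FALLBACK_MODEL = "Westmere"


def pick_cpu_model(host_model: str, known: dict = KNOWN_MODELS) -> str:
    """Return the guest CPU model for a host model, falling back when it is unknown."""
    return known.get(host_model, FALLBACK_MODEL)


print(pick_cpu_model("Broadwell"))
print(pick_cpu_model("Skylake-Client"))

# Adding the missing family makes the lookup stop falling back.
extended = dict(KNOWN_MODELS, **{"Skylake-Client": "Skylake-Client"})
print(pick_cpu_model("Skylake-Client", extended))
```

Whether an unrecognized host should fall back to an older model or be added to the table is exactly the open question in this thread; the sketch only shows the two behaviours next to each other.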

On 10 Feb 2017, at 10:26, Michal Skrivanek <mskrivan@redhat.com> wrote:
> On 9 Feb 2017, at 16:16, Ondrej Svoboda <osvoboda@redhat.com> wrote:
>> Do you mean https://github.com/lago-project/lago/pull/398 which has been merged for over a month?
>> The second sentence in the PR (below) is contradicted by newer, non-recognized CPUs, such as Skylake.
> How/why? Westmere should have been selected in that case.
And if it was, and it didn’t work for you, then it is a nested virtualization compatibility bug that you should report to the QEMU/KVM folks.
>> "This patch fixes the problems by selecting a minimum reasonable CPU model for the given hardware platform. Westmere is selected unless older or non-Intel hardware is used."
participants (3)
- Michal Skrivanek
- Nadav Goldin
- Ondrej Svoboda