Strange network performance on VirtIO VM NIC

Hello all.

I have a peculiar problem here which perhaps others may have had or know about and can advise. I have a Virtual Machine with 2 VirtIO NICs. This VM serves around 1 Gbps of traffic with thousands of clients connecting to it. When I run a packet-loss test against the IP pinned to NIC1, the loss varies from 3% to 10%. When I run the same test on NIC2, the packet loss is consistently 0%.

From what I gather, it may have something to do with a possible lack of multi-queue VirtIO, where NIC1 is managed by a single CPU which might be hitting 100% and causing this packet loss.

Looking at this reference (https://fedoraproject.org/wiki/Features/MQ_virtio_net) I see one way to test it is to start the VM with 4 queues (for example), but checking the qemu-kvm process I don't see the option present. Is there any way I can force it from the Engine? This other reference (https://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature) points in the same direction: starting the VM with queues=N.

Also, trying to increase the TX ring buffer within the guest with ethtool -g eth0 is not possible.

Oh, by the way, the load on the VM is significantly high even though CPU usage doesn't average above 50%-60%.

Thanks
Fernando

On 03/17/2017 11:11 AM, FERNANDO FREDIANI wrote:
Check for NIC errors on the host. There have been numerous issues with Windows VMs not being able to handle certain features of the better NICs on the host. By turning those features off on the host, the VM may be able to cope again.

Here is a snippet from a support case we had here:

"There have been no occurrences of the ixgbe driver issue in the logs since the fix went in at roughly Jan 3 22:50:11 2016, until now: Tue Jan 5 15:28:02 2016. Only large-receive-offload was turned off, with:

# ethtool -K eth0 lro off
# ethtool -K eth1 lro off"

By making that change on all of the hosts, the Windows VMs all recovered. This is likely not your exact issue, but it's included here to show that some guest OSes can have issues with host NIC features that the VM does not support. The issue may even be visible in the error logs on the host, as these were.
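[A hedged sketch of the kind of host-side check described above; eth0 is an example interface name, and which offloads to disable is situation-dependent:]

```shell
# List the offload features currently enabled on a host NIC:
ethtool -k eth0 | grep -E 'large-receive-offload|generic-receive-offload|tcp-segmentation-offload'

# Disable large receive offload, as in the support case quoted above:
ethtool -K eth0 lro off

# Look for growing error/drop counters in the driver statistics:
ethtool -S eth0 | grep -iE 'drop|err|miss'
```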

Hello Phil. Thanks for the tips.

I have checked the hosts and all four 1 Gb NICs use the tg3 driver and are "Broadcom Gigabit Ethernet BCM5720", so they should all behave the same. As I use 3 bonded interfaces on each host where the VM connects, I have downed each of the 3 one at a time to see if any of them could be adding this packet loss, but that changed nothing.

Interesting is: I have another server with exactly the same hardware which is not a hypervisor; it runs CentOS 6 with a newer kernel (4.5.0-1) and has no packet loss at all, even with high traffic, while the oVirt Node runs CentOS 7.3 (oVirt-Node-NG 4.1) with kernel 3.10.0-514.6.1. Could it possibly be anything related to the kernel version? Should I try to upgrade the oVirt-Node kernel, or rather install a minimal CentOS 7, use the newer kernel on it, and use that as a hypervisor instead of oVirt-Node-NG?

From what I could gather searching all day about this issue, it makes sense that it is something related to NIC buffers or multiqueue, but I am not sure yet what is the best way to address it: whether to add the queues=N option when starting up the Virtual Machine, change something in the NIC driver config on the host, or even try a different driver/kernel version.

Please note something I mentioned in the previous message: if I run the packet-loss test against each of the 2 VirtIO NICs on the same VM, the busy one (NIC1) shows packet loss and the one without much traffic (NIC2) doesn't. All the traffic going to all VMs on the host passes through the same bond interface, so if it were something related to the physical NICs it would show packet loss on the second vNIC as well. Or am I missing anything here?

Thanks
Fernando

2017-03-17 14:53 GMT-03:00 Phil Meyer <phil@unixlords.com>:
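[The slave-by-slave bond test described above can be sketched as follows; bond0 and em1 are example names, not taken from the thread:]

```shell
# Inspect the bond: member links, their status, and the hash policy in use.
cat /proc/net/bonding/bond0

# Take one member out of service, re-run the packet-loss test, restore it,
# then repeat for the next member.
ip link set em1 down
# ...re-run the loss test against the VM's IP here...
ip link set em1 up

# Confirm the driver (tg3) and firmware version on each member NIC:
ethtool -i em1
```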
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

On Fri, Mar 17, 2017 at 6:11 PM, FERNANDO FREDIANI <fernando.frediani@upx.com> wrote:
Looking at this reference (https://fedoraproject.org/wiki/Features/MQ_virtio_net) I see one way to test it is start the VM with 4 queues (for example), but checking on the qemu-kvm process I don't see the option present. Any way I can force it from the Engine?
I don't see a need for multi-queue for 1Gbps. Can you share the host statistics, the network configuration, the qemu-kvm command line, etc.? What is the difference between NIC1 and NIC2, in the way they are connected to the outside world?
Oh, by the way, the load on the VM is significantly high even though CPU usage doesn't average above 50%-60%.
Load = latest 'top' results? Vs. CPU usage? Can mean a lot of processes waiting for CPU and doing very little - typical for web servers, for example. What is occupying the CPU? Y.
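[The load-vs-CPU distinction raised above can be checked like this; a hedged sketch, since load average counts runnable and uninterruptible (D-state) tasks and can therefore be high while CPU usage sits at 50-60%:]

```shell
# 1-, 5- and 15-minute load averages, runnable/total task counts, last PID:
cat /proc/loadavg

# Count tasks currently runnable (R) or in uninterruptible sleep (D);
# a large number here with moderate CPU usage suggests waiting, not computing.
ps -eo stat= | grep -c '^[RD]' || true
```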

Hello Yaniv.

It also looks to me initially that for 1 Gbps multi-queue would not be necessary; however, the Virtual Machine is relatively busy, and the CPU needed to process the traffic may (or may not) be competing with the processes running in the guest.

The network is as follows: 3 x 1 Gb interfaces bonded together with the layer2+3 hash algorithm, through which the VMs connect to the outside world. vNIC1 and vNIC2 in the VM are the same VirtIO NIC type. These vNICs are connected to the same VLAN, and they are both able to output 1 Gbps of throughput each at the same time in iperf tests, as the bond below has 3 Gb capacity.

Please note something interesting I mentioned previously: all traffic currently goes in and out via vNIC1, which is showing packet loss (3% to 10%) in the tests conducted. vNIC2 has zero traffic, and if the same tests are conducted against it, it shows 0% packet loss. At first impression, if it were something related to the bond or even to the physical NICs on the host, it should show packet loss on ANY of the vNICs, as the traffic flows through the same physical NIC and bond, but that is not the case.
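[The simultaneous dual-vNIC iperf test described above can be sketched as follows; the IPs are placeholders for the two vNIC addresses, with an iperf server assumed to be listening on a remote box:]

```shell
# Stream to both vNIC IPs at the same time; if both sustain ~1 Gbps,
# the 3 Gb bond underneath is not the bottleneck.
iperf -c 192.0.2.10 -t 30 &   # traffic toward vNIC1's IP
iperf -c 192.0.2.11 -t 30     # traffic toward vNIC2's IP, concurrently
wait
```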
This is the qemu-kvm command the Host is executing: /usr/libexec/qemu-kvm -name guest=VM_NAME_REPLACED,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-6-VM_NAME_REPLACED/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu SandyBridge -m 4096 -realtime mlock=off -smp 4,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0-3,mem=4096 -uuid 57ffc2ed-fec5-47d6-bfb1-60c728737bd2 -smbios type=1,manufacturer=oVirt,product=oVirt Node,version=7-3.1611.el7.centos,serial=4C4C4544-0043-5610-804B-B1C04F4E3232,uuid=57ffc2ed-fec5-47d6-bfb1-60c728737bd2 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-6-VM_NAME_REPLACED/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2017-03-17T01:12:39,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x7 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/rhev/data-center/2325e1a4-c702-469c-82eb-ff43baa06d44/8dcd90f4-c0f0-47db-be39-5b49685acc04/images/ebe10e75-799a-439e-bc52-551b894c34fa/1a73cd53-0e51-4e49-8631-38cf571f6bb9,format=qcow2,if=none,id=drive-scsi0-0-0-0,serial=ebe10e75-799a-439e-bc52-551b894c34fa,cache=none,werror=stop,rerror=stop,aio=native -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -drive file=/rhev/data-center/2325e1a4-c702-469c-82eb-ff43baa06d44/8dcd90f4-c0f0-47db-be39-5b49685acc04/images/db401b27-006d-494c-a1ee-1d37810710c8/664cffe6-52f8-429d-8bb9-2f43fa7a468f,format=qcow2,if=none,id=drive-scsi0-0-0-1,serial=db401b27-006d-494c-a1ee-1d37810710c8,cache=none,werror=stop,rerror=stop,aio=native -device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=36 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:60,bus=pci.0,addr=0x3 -netdev tap,fd=37,id=hostnet1,vhost=on,vhostfd=38 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:1a:4a:16:01:61,bus=pci.0,addr=0x4 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/57ffc2ed-fec5-47d6-bfb1-60c728737bd2.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/57ffc2ed-fec5-47d6-bfb1-60c728737bd2.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -vnc 192.168.100.19:2,password -k pt-br -spice tls-port=5903,addr=192.168.100.19,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=default,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -k pt-br -device qxl-vga,id=video0,ram_size=67108864,vram_size=33554432,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -msg timestamp=on

Load in the VM is relatively high (20 to 30) and CPU usage is between 50% and 60%, with occasional peaks of 100% on one of the vCPUs. There are a lot of processes running in the VM, similar to web servers, which account for this CPU usage.
The only guess I have so far is that traffic on vNIC1 is being handled by one of the vCPUs, which occasionally hits 100% due to some of the processes, while traffic on vNIC2 is handled by another vCPU which is not that busy, which would explain the 0% packet loss. BUT, should a VirtIO vNIC use CPU from within the guest? Does that make any sense?

Thanks
Fernando
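[The guess above, that one vCPU is servicing all of vNIC1's traffic, can be checked from inside the guest; a hedged sketch, with the IRQ number as an example only:]

```shell
# Per-CPU interrupt counts for each virtio queue: if one column dominates
# for net0's queue, a single vCPU is handling all of that NIC's traffic.
grep virtio /proc/interrupts

# If so, a queue's IRQ can be steered to a less busy vCPU,
# e.g. IRQ 25 to CPU 2 (affinity mask 0x4):
echo 4 > /proc/irq/25/smp_affinity
```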

On Mon, Mar 20, 2017 at 5:14 PM, FERNANDO FREDIANI <fernando.frediani@upx.com> wrote:
The network is as follows: 3 x 1 Gb interfaces bonded together with the layer2+3 hash algorithm, through which the VMs connect to the outside world.
Is your host NUMA-capable (multiple sockets)? Are all your interfaces connected to the same socket, or is one perhaps on the 'other' socket (a different PCI bus, etc.)? That can introduce latency. In general, you want to align everything, from the host (driver interrupts) all the way up to the guest, so the processing happens on the same socket.

Layer 2+3 may or may not give you good distribution across the physical links, depending on the traffic. Layer 3+4 hashing is better, but is not entirely compliant with all vendors/equipment.

vNIC1 and vNIC2 in the VMs are the same VirtIO NIC types. These vNICs are connected to the same VLAN and they are both able to output 1Gbps throughput each at the same time in iperf tests as the bond below has 3Gb capacity.
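[The hash-policy point above can be illustrated with a toy calculation. This is a deliberate simplification of the scheme in the kernel bonding documentation (the real layer2+3 hash also mixes in MAC addresses): a flow's addresses are hashed and reduced modulo the slave count, so one src/dst pair always rides the same 1 Gb link no matter how much traffic it carries.]

```shell
# Toy model of bond slave selection under an address-based hash policy.
slave_for() {   # usage: slave_for <src-key> <dst-key> <num-slaves>
  echo $(( ($1 ^ $2) % $3 ))
}
slave_for 10 20 3    # the same endpoint pair always picks the same slave
slave_for 10 21 3    # a different peer may land on a different slave
```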
Linux is not always happy with multiple interfaces on the same L2 network. I think there are some parameters that need to be set to make it happy?
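[The parameters alluded to above are most likely the ARP behaviour sysctls ("ARP flux" with two NICs on one subnet). A hedged sketch; the values and the decision to apply them are assumptions, not advice given in the thread:]

```shell
# Reply to ARP requests only on the interface that owns the target address:
sysctl -w net.ipv4.conf.all.arp_ignore=1
# When sending ARP, prefer the egress interface's own address:
sysctl -w net.ipv4.conf.all.arp_announce=2
# Loose reverse-path filtering, so replies may leave a different NIC:
sysctl -w net.ipv4.conf.all.rp_filter=2
```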
Only guess I could have so far is that traffic on NIC1 is being handeled by one of the vCPUs which eventually get 100% due to some of the processes while traffic on NIC2 is handled by another vCPU which is not that busy and explains the 0% packet loss. BUT, should VirtIO vNIC use CPU from within the Guest ? Does it make any sense ?
Thanks
That can explain it. Ideally, you need to also streamline the processing in the guest. The relevant application should be on the same NUMA node as the vCPU processing the virtio-net interrupts. In your case, the VM sees a single NUMA node - does that match the underlying host architecture as well? Y.
On 18/03/2017 12:53, Yaniv Kaul wrote:
On Fri, Mar 17, 2017 at 6:11 PM, FERNANDO FREDIANI < fernando.frediani@upx.com> wrote:
Hello all.
I have a peculiar problem here which perhaps others may have had or know about and can advise.
I have Virtual Machine with 2 VirtIO NICs. This VM serves around 1Gbps of traffic with thousands of clients connecting to it. When I do a packet loss test to the IP pinned to NIC1 it varies from 3% to 10% of packet loss. When I run the same test on NIC2 the packet loss is consistently 0%.
From what I gather it may have something to do with a possible lack of Multi Queue VirtIO, where NIC1 is managed by a single CPU which might be hitting 100% and causing this packet loss.
Looking at this reference (https://fedoraproject.org/wiki/Features/MQ_virtio_net) I see one way to test it is to start the VM with 4 queues (for example), but checking the qemu-kvm process I don't see that option present. Is there any way I can force it from the Engine ?
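(For readers of the archive: one commonly documented way to expose multiqueue from the Engine side is a vNIC-profile custom device property. The sketch below assumes an oVirt 4.x Engine and the property name `queues`; verify against your version's documentation before applying it.)

```shell
# Sketch, assuming oVirt 4.x: register a "queues" custom device property
# for interface devices, restart the Engine, then set e.g. queues=4 on
# the relevant vNIC profile in the Administration Portal.
engine-config -s "CustomDeviceProperties={type=interface;prop={queues=[1-9][0-9]*}}"
systemctl restart ovirt-engine

# After restarting the VM, the qemu-kvm command line should show the
# extra queues (queues=N on the tap netdev, mq=on on virtio-net-pci).
```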
I don't see a need for multi-queue for 1Gbps. Can you share the host statistics, the network configuration, the qemu-kvm command line, etc.? What is the difference between NIC1 and NIC2, in the way they are connected to the outside world?
This other reference (https://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature) points in the same direction about starting the VM with queues=N
Also, trying to increase the TX ring buffer within the guest with ethtool -g eth0 is not possible.
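For anyone debugging a similar setup, these are the usual guest-side checks for a virtio NIC (a diagnostic fragment, not verified on this particular VM; eth0 is a placeholder name):

```shell
# Guest-side diagnostics for a virtio NIC (eth0 is a placeholder).
ethtool -g eth0        # RX/TX ring sizes; older virtio-net drivers
                       # answer "Operation not supported", as seen above
ethtool -l eth0        # number of combined queues (multiqueue)
ethtool -S eth0        # per-queue counters, if the driver exposes them
ip -s link show eth0   # RX/TX drops and errors at the interface level
```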
Oh, by the way, the Load on the VM is significantly high even though the CPU usage isn't above 50% - 60% on average.
Load = latest 'top' results? Vs. CPU usage? Can mean a lot of processes waiting for CPU and doing very little - typical for web servers, for example. What is occupying the CPU? Y.
Thanks Fernando
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

Hi Yaniv On 21/03/2017 06:19, Yaniv Kaul wrote:
Is your host with NUMA support (multiple sockets) ? Are all your interfaces connected to the same socket? Perhaps one is on the 'other' socket (a different PCI bus, etc.)? This can introduce latency. In general, you would want to align everything, from host (interrupts of the drivers) all the way to the guest to perform the processing on the same socket.
I believe so it is. Look:
~]# dmesg | grep -i numa
[    0.000000] Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[    0.693082] pci_bus 0000:00: on NUMA node 0
[    0.696457] pci_bus 0000:40: on NUMA node 1
[    0.700678] pci_bus 0000:3f: on NUMA node 0
[    0.704844] pci_bus 0000:7f: on NUMA node 1
The thing is, if it was something affecting the underlying network layer (drivers for the physical NICs, for example) it would affect all traffic to the VM, not just the traffic going in/out via vNIC1, right ?
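The single-vCPU theory from earlier in the thread can be checked directly inside the guest (a diagnostic fragment; exact interrupt names vary by driver version):

```shell
# Run inside the guest: each virtio-net queue has its own interrupt
# line. If vNIC1's input interrupt is serviced by only one vCPU, and
# that vCPU is busy, the packet loss pattern described above fits.
grep virtio /proc/interrupts             # per-CPU interrupt counts per queue
grep -E 'NET_RX|NET_TX' /proc/softirqs   # per-CPU softirq distribution
```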
Layer 2+3 may or may not provide you with good distribution across the physical links, depending on the traffic. Layer 3+4 hashing is better, but is not entirely compliant with all vendors/equipment.
Yes, I have tested with both and both work well. I have settled on layer2+3 as it balances the traffic as equally as layer3+4 for my scenario. Initially I guessed it could be the bonding, but I ruled that out when I tested with another physical interface that doesn't have any bonding, and the problem was the same for the VM in question.
Linux is not always happy with multiple interfaces on the same L2 network. I think there are some params needed to be set to make it happy? Yes, you are right, and knowing that I have configured PBR using iproute2, which makes Linux happy in this scenario. Works like a charm.
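For the archive, a minimal sketch of that kind of iproute2 policy-based routing (all addresses, table names, and interface names below are invented placeholders, not the poster's actual configuration): each source IP gets its own routing table so replies leave via the interface they arrived on.

```shell
# Hypothetical addresses/interfaces; adapt to the real environment.
echo "100 nic1" >> /etc/iproute2/rt_tables
echo "101 nic2" >> /etc/iproute2/rt_tables

# One default route per table, one per interface.
ip route add default via 192.0.2.1 dev eth0 table nic1
ip route add default via 192.0.2.1 dev eth1 table nic2

# Select the table based on the source address of the reply.
ip rule add from 192.0.2.10 table nic1   # IP pinned to NIC1
ip rule add from 192.0.2.11 table nic2   # IP pinned to NIC2
```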
That can explain it. Ideally, you need to also streamline the processing in the guest. The relevant application should be on the same NUMA node as the vCPU processing the virtio-net interrupts. In your case, the VM sees a single NUMA node - does that match the underlying host architecture as well? Not sure. The command line from qemu-kvm is automatically generated by oVirt. Perhaps some extra option to be changed under Advanced Parameters in the VM CPU configuration ? Also I was wondering if enabling "IO Threads Enabled" under Resource Allocation could be of any help.
To finish, I am more inclined to believe the problem is restricted to the VM, not to the Host (drivers, physical NICs, etc), given the packet loss happens on vNIC1 and not on vNIC2 when it has no traffic. If it were at the Host level or in the bonding, it would affect the whole VM traffic on either vNIC. As a last resort I am considering adding an extra 2 vCPUs to the VM, but I guess that will only lessen the problem. Does anyone think that "Threads per Core" or IO Thread could be a better choice ? Thanks Fernando

On Tue, Mar 21, 2017 at 5:00 PM, FERNANDO FREDIANI < fernando.frediani@upx.com> wrote:
Hi Yaniv On 21/03/2017 06:19, Yaniv Kaul wrote:
Is your host with NUMA support (multiple sockets) ? Are all your interfaces connected to the same socket? Perhaps one is on the 'other' socket (a different PCI bus, etc.)? This can introduce latency. In general, you would want to align everything, from host (interrupts of the drivers) all the way to the guest to perform the processing on the same socket.
I believe so it is. Look:
~]# dmesg | grep -i numa
[    0.000000] Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[    0.693082] pci_bus 0000:00: on NUMA node 0
[    0.696457] pci_bus 0000:40: on NUMA node 1
[    0.700678] pci_bus 0000:3f: on NUMA node 0
[    0.704844] pci_bus 0000:7f: on NUMA node 1
So there are 2 NUMA nodes on the host? And where are the NICs located?
The thing is, if it was something affecting the underlying network layer (drivers for the physical NICs, for example) it would affect all traffic to the VM, not just the traffic going in/out via vNIC1, right ?
Most likely.
Layer 2+3 may or may not provide you with good distribution across the physical links, depending on the traffic. Layer 3+4 hashing is better, but is not entirely compliant with all vendors/equipment.
Yes, I have tested with both and both work well. I have settled on layer2+3 as it balances the traffic as equally as layer3+4 for my scenario. Initially I guessed it could be the bonding, but I ruled that out when I tested with another physical interface that doesn't have any bonding, and the problem was the same for the VM in question.
Linux is not always happy with multiple interfaces on the same L2 network. I think there are some params needed to be set to make it happy?
Yes, you are right, and knowing that I have configured PBR using iproute2, which makes Linux happy in this scenario. Works like a charm.
BTW, since those are virtual interfaces, why do you need two on the same VLAN?
That can explain it. Ideally, you need to also streamline the processing in the guest. The relevant application should be on the same NUMA node as the vCPU processing the virtio-net interrupts. In your case, the VM sees a single NUMA node - does that match the underlying host architecture as well?
Not sure. The command line from qemu-kvm is automatically generated by oVirt. Perhaps some extra option to be changed under Advanced Parameters in the VM CPU configuration ? Also I was wondering if enabling "IO Threads Enabled" under Resource Allocation could be of any help.
IO threads are for IO (= storage, perhaps it's not clear and we need to clarify it) and only useful with large number of disks (and IO of course).
To finish, I am more inclined to believe the problem is restricted to the VM, not to the Host (drivers, physical NICs, etc), given the packet loss happens on vNIC1 and not on vNIC2 when it has no traffic. If it were at the Host level or in the bonding, it would affect the whole VM traffic on either vNIC. As a last resort I am considering adding an extra 2 vCPUs to the VM, but I guess that will only lessen the problem. Does anyone think that "Threads per Core" or IO Thread could be a better choice ?
Are you using hyper-threading on the host? Otherwise, I'm not sure threads per core would help. Y.
Thanks Fernando

Hello Yaniv. I have new information about this scenario: I have load-balanced the requests between both vNICs, so each is receiving/sending half of the traffic on average, and the packet loss, although it still exists, has lowered to 1% - 2% (which was expected, as the CPU processing this traffic is now shared by more than one CPU at a time). However the Load on the VM is still high, probably due to the interrupts. Find below, in-line, the answers to some of your points: On 21/03/2017 12:31, Yaniv Kaul wrote:
So there are 2 NUMA nodes on the host? And where are the NICs located?
I tried to search for how to check that, but couldn't find out how. Could you give me a hint ?
BTW, since those are virtual interfaces, why do you need two on the same VLAN?
Very good question. It's because of a specific situation where I need 2 MAC addresses in order to balance the traffic across a LAG on a switch which does only layer 2 hashing.
Are you using hyper-threading on the host? Otherwise, I'm not sure threads per core would help. Yes, I have hyper-threading enabled on the Host. Is it worth enabling it ?
Thanks Fernando

On Tue, Mar 21, 2017 at 8:14 PM, FERNANDO FREDIANI < fernando.frediani@upx.com> wrote:
Hello Yaniv.
I have new information about this scenario: I have load-balanced the requests between both vNICs, so each is receiving/sending half of the traffic on average, and the packet loss, although it still exists, has lowered to 1% - 2% (which was expected, as the CPU processing this traffic is now shared by more than one CPU at a time). However the Load on the VM is still high, probably due to the interrupts.
Find below in-line the answers to some of your points:
On 21/03/2017 12:31, Yaniv Kaul wrote:
So there are 2 NUMA nodes on the host? And where are the NICs located?
I tried to search for how to check it but couldn't find how. Could you give me a hint?
I believe 'lspci -vmm' should provide you with node information per PCI device. 'numactl' can also provide interesting information.
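Putting those hints together, a quick check could look like this on the host (the PCI address 0000:04:00.0 is a placeholder; find the real one with the first command):

```shell
# List NICs and their PCI addresses
lspci | grep -i ethernet

# NUMA node a given NIC is attached to (-1 means no affinity reported);
# 0000:04:00.0 is a placeholder PCI address
cat /sys/bus/pci/devices/0000:04:00.0/numa_node

# Host NUMA topology: nodes, CPUs per node, memory per node
numactl --hardware

# Per-device NUMANode field, as suggested above
lspci -vmm | grep -E '^(Slot|Device|NUMANode)'
```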
BTW, since those are virtual interfaces, why do you need two on the same VLAN?
Very good question. It's because of a specific situation where I need 2 MAC addresses in order to balance the traffic across a LAG on a switch which does only layer-2 hashing.
Are you using hyper-threading on the host? Otherwise, I'm not sure threads per core would help.
Yes, I have hyper-threading enabled on the Host. Is it worth enabling it?
Depends on the workload. Some benefit from it, some don't. I wouldn't in your case (it mainly benefits the case of many VMs with a small number of vCPUs each). Y.
Thanks Fernando
On 18/03/2017 12:53, Yaniv Kaul wrote:
On Fri, Mar 17, 2017 at 6:11 PM, FERNANDO FREDIANI <fernando.frediani@upx.com> wrote:
Hello all.
I have a peculiar problem here which perhaps others may have had or know about and can advise.
I have a Virtual Machine with 2 VirtIO NICs. This VM serves around 1Gbps of traffic with thousands of clients connecting to it. When I do a packet loss test to the IP pinned to NIC1, the packet loss varies from 3% to 10%. When I run the same test on NIC2, the packet loss is consistently 0%.
From what I gather, it may have something to do with a possible lack of multi-queue VirtIO, where NIC1 is managed by a single CPU which might be hitting 100% and causing this packet loss.
Looking at this reference (https://fedoraproject.org/wiki/Features/MQ_virtio_net) I see one way to test it is to start the VM with 4 queues (for example), but checking the qemu-kvm process I don't see the option present. Is there any way I can force it from the Engine?
I don't see a need for multi-queue for 1Gbps. Can you share the host statistics, the network configuration, the qemu-kvm command line, etc.? What is the difference between NIC1 and NIC2, in the way they are connected to the outside world?
This other reference (https://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature) points in the same direction, about starting the VM with queues=N.
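For reference, what that page describes amounts to something like the following on the qemu-kvm command line (the values and the MAC address are illustrative; under oVirt the Engine/VDSM builds this command line, so it would normally not be edited by hand):

```shell
# Host side: a tap netdev with 4 queues and vhost on, paired with a
# virtio-net device with mq=on and vectors = 2*queues + 2 (the formula
# from the linux-kvm Multiqueue wiki page)
qemu-kvm ... \
  -netdev tap,id=hostnet1,vhost=on,queues=4 \
  -device virtio-net-pci,netdev=hostnet1,mq=on,vectors=10,mac=52:54:00:12:34:56

# Guest side: enable the extra queues on the interface (eth0 assumed)
ethtool -L eth0 combined 4
```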
Also, trying to increase the TX ring buffer within the guest with ethtool -g eth0 is not possible.
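For completeness, the ring-buffer commands look like this (note that -g only reads the settings; -G writes them; on virtio-net the resize typically fails with "Operation not supported" on older kernel/QEMU combinations, which would match what you are seeing):

```shell
# Read current and maximum RX/TX ring sizes (read-only)
ethtool -g eth0

# Attempt to enlarge the rings; on virtio-net this often fails with
# "Operation not supported" unless the host exposes configurable queues
ethtool -G eth0 rx 1024 tx 1024
```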
Oh, by the way, the load on the VM is significantly high even though the CPU usage isn't above 50% - 60% on average.
Load = latest 'top' results? Vs. CPU usage? It can mean a lot of processes waiting for CPU and doing very little - typical for web servers, for example. What is occupying the CPU? Y.
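That distinction can be checked quickly from inside the guest; a minimal sketch (Linux guest with procps assumed):

```shell
# 1-minute load average vs. CPU count: load persistently above the CPU
# count means runnable (or uninterruptible) tasks are queuing
read load1 _ < /proc/loadavg
cpus=$(nproc)
echo "1-min load: $load1 / CPUs: $cpus"

# Who is actually occupying the CPU: one batch iteration of top,
# sorted by CPU usage, trimmed to the top processes
top -b -n 1 -o '%CPU' | head -n 15
```

High load with moderate CPU usage usually means many tasks blocked in the D state (I/O or, here, possibly softirq pressure) rather than CPU-bound work.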
Thanks Fernando
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
participants (3)
- FERNANDO FREDIANI
- Phil Meyer
- Yaniv Kaul