On Mon, Mar 20, 2017 at 5:14 PM, FERNANDO FREDIANI <
fernando.frediani(a)upx.com> wrote:
Hello Yaniv.
It also looked to me initially that multi-queue would not be necessary for
1Gbps. However, the Virtual Machine is relatively busy, so the CPU needed to
process the traffic may (or may not) be competing with the processes running
in the guest.
The network is as follows: 3 x 1 Gb interfaces bonded together with the
layer2+3 hash algorithm, through which the VMs connect to the outside world.
Does your host have NUMA support (multiple sockets)? Are all your interfaces
connected to the same socket? Perhaps one is on the 'other' socket (a
different PCI bus, etc.)? This can introduce latency.
In general, you would want to align everything, from host (interrupts of
the drivers) all the way to the guest to perform the processing on the same
socket.
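One way to check the alignment described above is to read the NUMA node that
Linux reports for each physical NIC in sysfs. The sketch below is only an
illustration: the interface names in the usage note are hypothetical, and it
assumes the usual /sys/class/net/<iface>/device/numa_node layout, which
reports -1 when the device has no NUMA affinity.

```python
from pathlib import Path


def nic_numa_nodes(ifaces, sysfs_root="/sys/class/net"):
    """Return {interface: NUMA node} using the sysfs numa_node attribute.

    The value is None when the attribute cannot be read (for example a
    virtual interface with no backing PCI device).
    """
    nodes = {}
    for name in ifaces:
        path = Path(sysfs_root) / name / "device" / "numa_node"
        try:
            nodes[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            nodes[name] = None
    return nodes


def all_on_same_node(nodes):
    """True when every NIC with a known node (>= 0) sits on the same node."""
    known = {n for n in nodes.values() if n is not None and n >= 0}
    return len(known) <= 1
```

For example, nic_numa_nodes(["em1", "em2", "p1p1"]) on the host (the names
are placeholders for the three bonded interfaces) would show whether one of
the bond members hangs off the other socket.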
Layer 2+3 may or may not provide you with good distribution across the
physical links, depending on the traffic. Layer 3+4 hashing is better, but
is not entirely compliant with all vendors/equipment.
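To illustrate why the hash policy matters, here is a simplified sketch of the
two transmit hash policies, loosely following the formulas in the kernel
bonding documentation. It is not the kernel's exact implementation (the real
code picks specific header bytes and handles fragments and non-IP traffic),
and the addresses and ports used with it are made up.

```python
import ipaddress


def _fold(h, n_links):
    # Mix the upper bits into the lower ones, then pick a link.
    h ^= h >> 16
    h ^= h >> 8
    return h % n_links


def layer2_3(src_mac, dst_mac, src_ip, dst_ip, n_links, ethertype=0x0800):
    """layer2+3: only MAC addresses, packet type and IP addresses are hashed."""
    h = (src_mac ^ dst_mac ^ ethertype) & 0xFFFFFFFF
    h ^= int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return _fold(h, n_links)


def layer3_4(src_ip, dst_ip, src_port, dst_port, n_links):
    """layer3+4: the TCP/UDP ports take part in the hash as well."""
    h = (src_port << 16) | dst_port
    h ^= int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return _fold(h, n_links)
```

With layer2+3 the ports are not an input at all, so every flow between the
same pair of hosts lands on the same physical link. With layer3+4, several
connections between the same two hosts can be spread over the three links.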
vNIC1 and vNIC2 in the VM are the same VirtIO NIC type. These vNICs are
connected to the same VLAN and each of them is able to output 1Gbps of
throughput at the same time in iperf tests, as the bond underneath has 3Gb
capacity.
Linux is not always happy with multiple interfaces on the same L2 network.
I think some parameters need to be set to make it happy?
Please note something interesting I mentioned previously: all traffic
currently goes in and out via vNIC1, which shows packet loss (3% to 10%) in
the tests conducted. NIC2 has zero traffic, and when the same tests are
conducted against it, it shows 0% packet loss.
At first impression, if it were something related to the bond or even to the
physical NICs on the Host, it should show packet loss for ANY of the vNICs,
as the traffic flows through the same physical NIC and bond, but that is not
the case.
This is the qemu-kvm command the Host is executing:
/usr/libexec/qemu-kvm -name guest=VM_NAME_REPLACED,debug-threads=on -S
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/
qemu/domain-6-VM_NAME_REPLACED/master-key.aes -machine
pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu SandyBridge -m 4096 -realtime
mlock=off -smp 4,maxcpus=16,sockets=16,cores=1,threads=1 -numa
node,nodeid=0,cpus=0-3,mem=4096 -uuid 57ffc2ed-fec5-47d6-bfb1-60c728737bd2
-smbios type=1,manufacturer=oVirt,product=oVirt Node,version=7-3.1611.el7.
centos,serial=4C4C4544-0043-5610-804B-B1C04F4E3232,uuid=
57ffc2ed-fec5-47d6-bfb1-60c728737bd2 -no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-6-
VM_NAME_REPLACED/monitor.sock,server,nowait -mon
chardev=charmonitor,id=monitor,mode=control
-rtc base=2017-03-17T01:12:39,driftfix=slew -global
kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x7 -device
virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5
-drive if=none,id=drive-ide0-1-0,readonly=on -device
ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive
file=/rhev/data-center/2325e1a4-c702-469c-82eb-ff43baa06d44/8dcd90f4-c0f0-
47db-be39-5b49685acc04/images/ebe10e75-799a-439e-bc52-
551b894c34fa/1a73cd53-0e51-4e49-8631-38cf571f6bb9,format=
qcow2,if=none,id=drive-scsi0-0-0-0,serial=ebe10e75-799a-
439e-bc52-551b894c34fa,cache=none,werror=stop,rerror=stop,aio=native
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-
scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -drive file=/rhev/data-center/
2325e1a4-c702-469c-82eb-ff43baa06d44/8dcd90f4-c0f0-
47db-be39-5b49685acc04/images/db401b27-006d-494c-a1ee-
1d37810710c8/664cffe6-52f8-429d-8bb9-2f43fa7a468f,format=
qcow2,if=none,id=drive-scsi0-0-0-1,serial=db401b27-006d-
494c-a1ee-1d37810710c8,cache=none,werror=stop,rerror=stop,aio=native
-device
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1
-netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=36 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:60,bus=pci.0,addr=0x3
-netdev tap,fd=37,id=hostnet1,vhost=on,vhostfd=38 -device
virtio-net-pci,netdev=hostnet1,id=net1,mac=00:1a:4a:16:01:61,bus=pci.0,addr=0x4
-chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/
57ffc2ed-fec5-47d6-bfb1-60c728737bd2.com.redhat.rhevm.vdsm,server,nowait
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=
charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev
socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/
57ffc2ed-fec5-47d6-bfb1-60c728737bd2.org.qemu.guest_agent.0,server,nowait
-device virtserialport,bus=virtio-serial0.0,nr=2,chardev=
charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev
spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-
serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0
-vnc 192.168.100.19:2,password -k pt-br -spice tls-port=5903,addr=192.168.
100.19,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=
default,tls-channel=main,tls-channel=display,tls-channel=
inputs,tls-channel=cursor,tls-channel=playback,tls-channel=
record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on
-k pt-br -device qxl-vga,id=video0,ram_size=67108864,vram_size=33554432,
vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -incoming defer -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -object
rng-random,id=objrng0,filename=/dev/urandom -device
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -msg timestamp=on
Load in the VM is relatively high (20 to 30) and CPU usage is between 50%
and 60%, with occasional peaks of 100% on one of the vCPUs. There are a lot
of processes running in the VM, similar to Web Servers, which are using this
amount of CPU.
The only guess I have so far is that traffic on NIC1 is being handled by one
of the vCPUs, which occasionally hits 100% due to some of the processes,
while traffic on NIC2 is handled by another vCPU which is not that busy,
which would explain the 0% packet loss. BUT, should a VirtIO vNIC use CPU
from within the Guest?
Does that make any sense?
Thanks
That can explain it. Ideally, you need to also streamline the processing
in the guest. The relevant application should be on the same NUMA node as
the vCPU processing the virtio-net interrupts.
In your case, the VM sees a single NUMA node - does that match the
underlying host architecture as well?
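A quick way to test the "one busy vCPU" guess inside the guest is to look at
how the virtio interrupts are spread over the vCPUs in /proc/interrupts. The
sketch below parses that format; the sample text is invented for illustration
and is not taken from the VM in this thread, and which virtioN device
corresponds to which vNIC has to be checked on the guest itself.

```python
# Invented sample in the /proc/interrupts layout (not real data).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 24:          0          0          0          0   PCI-MSI 49152-edge      virtio0-config
 25:    9812345         12          0          3   PCI-MSI 49153-edge      virtio0-input.0
 26:     734211          0          5          0   PCI-MSI 49154-edge      virtio0-output.0
 27:          0          0          0          0   PCI-MSI 65536-edge      virtio1-config
 28:          7          0      40112          0   PCI-MSI 65537-edge      virtio1-input.0
 29:          2          0      38001          0   PCI-MSI 65538-edge      virtio1-output.0
"""


def virtio_irqs_per_cpu(text):
    """Sum interrupt counts per CPU for each virtio device."""
    lines = text.splitlines()
    cpus = lines[0].split()
    totals = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[-1].startswith("virtio"):
            continue
        device = fields[-1].split("-")[0]  # "virtio0-input.0" -> "virtio0"
        counts = [int(x) for x in fields[1:1 + len(cpus)]]
        per_cpu = totals.setdefault(device, [0] * len(cpus))
        for i, count in enumerate(counts):
            per_cpu[i] += count
    return cpus, totals


def busiest_cpu(cpus, per_cpu):
    """Name of the CPU that handled the most interrupts for one device."""
    return cpus[max(range(len(per_cpu)), key=per_cpu.__getitem__)]
```

On a real guest the input would be open("/proc/interrupts").read(). If nearly
all interrupts of the lossy vNIC land on the vCPU that peaks at 100%, that
would support the explanation above.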
Y.
Fernando
On 18/03/2017 12:53, Yaniv Kaul wrote:
On Fri, Mar 17, 2017 at 6:11 PM, FERNANDO FREDIANI <
fernando.frediani(a)upx.com> wrote:
> Hello all.
>
> I have a peculiar problem here which perhaps others may have had or know
> about and can advise.
>
> I have a Virtual Machine with 2 VirtIO NICs. This VM serves around 1Gbps of
> traffic with thousands of clients connecting to it. When I run a packet loss
> test against the IP pinned to NIC1, the loss varies from 3% to 10%. When
> I run the same test on NIC2 the packet loss is consistently 0%.
>
> From what I gather it may have something to do with a possible lack of
> Multi Queue VirtIO, where NIC1 is managed by a single CPU which might be
> hitting 100% and causing this packet loss.
>
> Looking at this reference
> (https://fedoraproject.org/wiki/Features/MQ_virtio_net) I see one way to
> test it is to start the VM with 4 queues (for example), but checking the
> qemu-kvm process I don't see that option present. Is there any way I can
> force it from the Engine?
>
I don't see a need for multi-queue for 1Gbps.
Can you share the host statistics, the network configuration, the qemu-kvm
command line, etc.?
What is the difference between NIC1 and NIC2, in the way they are
connected to the outside world?
>
> This other reference
> (https://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature) points in
> the same direction, starting the VM with queues=N
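For reference, the linux-kvm.org page mentioned above describes the multiqueue
options as queues=N on the tap netdev plus mq=on and vectors=2N+2 on the
virtio-net-pci device. The helper below only builds those two arguments as a
sketch; the function name and the default netdev id are made up here, and it
does not show how (or whether) the oVirt Engine can pass them.

```python
def multiqueue_args(queues, netdev_id="hostnet0"):
    """Build qemu arguments for a multiqueue virtio-net device.

    vectors = 2N + 2: one RX and one TX vector per queue pair, plus two
    more for configuration changes and the control queue.
    """
    if queues < 1:
        raise ValueError("queues must be at least 1")
    vectors = 2 * queues + 2
    return [
        "-netdev", f"tap,id={netdev_id},vhost=on,queues={queues}",
        "-device", f"virtio-net-pci,mq=on,vectors={vectors},netdev={netdev_id}",
    ]
```

For 4 queues this gives vectors=10. According to the same page, the extra
queues also have to be enabled inside the guest with
"ethtool -L eth0 combined 4" before they are used.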
>
> Also trying to increase the TX ring buffer within the guest with ethtool
> -g eth0 is not possible.
>
> Oh, by the way, the Load on the VM is significantly high even though CPU
> usage isn't above 50% - 60% on average.
>
Load = latest 'top' results? Vs. CPU usage? Can mean a lot of processes
waiting for CPU and doing very little - typical for web servers, for
example. What is occupying the CPU?
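To put the load figure in context: on Linux the load average counts tasks
that are runnable or in uninterruptible sleep, so it is usually read relative
to the number of CPUs. A small sketch of that arithmetic, using the 4 vCPUs
from the -smp option and the load of 20 to 30 reported in this thread (the
function names are just for illustration):

```python
import os


def load_per_cpu(load_avg, n_cpus):
    """Average number of runnable or blocked tasks per CPU."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return load_avg / n_cpus


def current_load_per_cpu():
    """1-minute load average of this machine divided by its CPU count."""
    one_minute, _five, _fifteen = os.getloadavg()
    return load_per_cpu(one_minute, os.cpu_count() or 1)


# Figures from the thread: load 20 to 30 on a guest with 4 vCPUs.
low, high = load_per_cpu(20, 4), load_per_cpu(30, 4)
```

That works out to between 5 and 7.5 tasks per vCPU waiting or running at any
moment, which can coexist with 50% to 60% average CPU usage if most of those
tasks run only briefly.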
Y.
>
> Thanks
> Fernando
>
>
>
> _______________________________________________
> Users mailing list
> Users(a)ovirt.org
>
http://lists.ovirt.org/mailman/listinfo/users
>
>