Hello Phil.
Thanks for the tips.
I have checked the hosts and all four 1 Gb NICs use the tg3 driver and are
"Broadcom Gigabit Ethernet BCM5720", so they should all behave the same.
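For reference, this is roughly how the driver and firmware can be
double-checked on each host (eth0 is just an example interface name):
# ethtool -i eth0            (driver, driver version and firmware version)
# lspci | grep -i ethernet   (should list the BCM5720 ports)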
As I use 3 bonded interfaces on each Host that the VM connects through, I
have downed each of the 3 one at a time to see if any of them could be adding
this packet loss, but that changed nothing.
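Something like this should also show whether the bond or any individual
slave is accumulating errors or drops (bond0 and eth0 are just example names):
# cat /proc/net/bonding/bond0               (bond mode, active slaves, link failure counts)
# ip -s link show                           (per-interface RX/TX errors and drops)
# ethtool -S eth0 | grep -i -e err -e drop  (per-NIC driver statistics)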
Interestingly, I have another server with exactly the same hardware which is
not a Hypervisor; it runs CentOS 6 with a newer kernel, 4.5.0-1, and has no
packet loss at all even with high traffic, while the oVirt Node runs CentOS
7.3 (oVirt-Node-NG 4.1) but with kernel 3.10.0-514.6.1.
Could it possibly be related to the kernel version? Should I try to upgrade
the oVirt-Node kernel, or rather install a Minimal CentOS 7, use the newer
kernel on it and use it as a Hypervisor instead of oVirt-Node-NG?
From what I could gather after searching all day about this issue, it makes
sense that it is something related to NIC buffers or multiqueue, but I am not
sure yet what the best way to address it is: adding the queues=N option when
starting up the Virtual Machine, changing something in the NIC driver config
on the Host, or even trying a different driver/kernel version.
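I still don't know whether this can be set from the Engine, but from the
linux-kvm.org Multiqueue page I mention below, my understanding is that it is
normally enabled in the libvirt interface definition and then activated
inside the guest, roughly like this (the queue count of 4 and the eth0 name
are only examples):

  <interface type='bridge'>
    ...                          (source/mac elements unchanged)
    <model type='virtio'/>
    <driver name='vhost' queues='4'/>
  </interface>

and then inside the guest:
# ethtool -L eth0 combined 4     (enable 4 queues on the vNIC)
# ethtool -l eth0                (confirm current/maximum queue counts)

For the NIC buffer side, on the host the ring sizes can presumably be checked
and, if the tg3 driver allows it, increased:
# ethtool -g eth0                (current and maximum RX/TX ring sizes)
# ethtool -G eth0 rx 2048        (example value only; must stay within the maximum reported above)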
Please note something I mentioned in the previous message: if I run the
packet loss test against each of the 2 VirtIO NICs on the same VM, the busy
one (NIC1) has packet loss and the one without much traffic (NIC2) doesn't.
All the traffic going up to the VMs on the Host passes through the same bond
interface, so if it were something related to the physical NICs it would
show packet loss on the second vNIC as well. Or am I missing something here?
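If the single-queue theory is right, I would expect one vCPU inside the guest
to be saturated by softirq work for NIC1. Roughly the checks I have in mind
(eth0 standing for the busy vNIC, as an example):
# cat /proc/interrupts | grep virtio   (how the virtio input/output interrupts map to CPUs)
# mpstat -P ALL 1                      (per-CPU %soft; one CPU pegged would point to this)
# cat /proc/net/softnet_stat           (2nd column = packets dropped because the backlog was full)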
Thanks
Fernando
2017-03-17 14:53 GMT-03:00 Phil Meyer <phil(a)unixlords.com>:
On 03/17/2017 11:11 AM, FERNANDO FREDIANI wrote:
> Hello all.
>
> I have a peculiar problem here which perhaps others may have had or
> know about and can advise.
>
> I have a Virtual Machine with 2 VirtIO NICs. This VM serves around 1Gbps
> of traffic with thousands of clients connecting to it. When I do a
> packet loss test to the IP pinned to NIC1 it varies from 3% to 10% of
> packet loss. When I run the same test on NIC2 the packet loss is
> consistently 0%.
>
> From what I gather it may have something to do with a possible lack of
> Multi Queue VirtIO, where NIC1 is managed by a single CPU which might be
> hitting 100% and causing this packet loss.
>
> Looking at this reference
> (https://fedoraproject.org/wiki/Features/MQ_virtio_net) I see one way
> to test it is to start the VM with 4 queues (for example), but checking
> on the qemu-kvm process I don't see the option present. Is there any way
> I can force it from the Engine?
>
> This other reference
> (https://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature) points
> in the same direction about starting the VM with queues=N
>
> Also trying to increase the TX ring buffer within the guest with
> ethtool -g eth0 is not possible.
>
> Oh, by the way, the load on the VM is significantly high even though the
> CPU usage isn't above 50% - 60% on average.
>
> Thanks
> Fernando
Check for NIC errors on the host. There have been numerous issues with
Windows VMs
not being able to handle certain features of better NICs on the host.
By turning those features off on the host, the VM may be able to cope
again.
Here is a snippet from a support case we had here:
"
There have been no occurrences of the ixgbe driver issue in the logs
since the fix went in at roughly: Jan 3 22:50:11 2016 until now: Tue
Jan 5 15:28:02 2016
Only large-receive-offload was turned off with:
# ethtool -K eth0 lro off
# ethtool -K eth1 lro off
"
By making that change on all of the hosts, the Windows VMs all recovered.
This is likely not your exact issue, but it's included here to show that
some guest OSes can have issues with host NIC features that the VM does not
support.
The issue may even be seen in the error logs on the host, as these were.
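The host-side offload state and error counters can be inspected with
something like this (eth0 as an example name; adjust for the tg3 interfaces
and the bond):
# ethtool -k eth0 | grep -i offload    (current offload feature state, including lro)
# ethtool -S eth0 | grep -i error      (driver-level error counters)
# dmesg | grep -i tg3                  (driver messages in the host logs)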