<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Hi Yaniv<br>
</p>
On 21/03/2017 06:19, Yaniv Kaul wrote:<br>
<blockquote
cite="mid:CAJgorsaVtaNyz4iJf=ZeRGkZE_n32JKYq9oxEUzoKRQyr8VD5w@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra">Does your host have NUMA support
(multiple sockets)? Are all your interfaces connected to the
same socket? Perhaps one is on the 'other' socket (a different
PCI bus, etc.)? This can introduce latency.
<div class="gmail_quote">
<div>In general, you would want to align everything, from
host (interrupts of the drivers) all the way to the guest
to perform the processing on the same socket.</div>
</div>
</div>
</div>
</blockquote>
I believe it is. Look:<br>
~]# dmesg | grep -i numa<br>
[ 0.000000] Enabling automatic NUMA balancing. Configure with
numa_balancing= or the kernel.numa_balancing sysctl<br>
[ 0.693082] pci_bus 0000:00: on NUMA node 0<br>
[ 0.696457] pci_bus 0000:40: on NUMA node 1<br>
[ 0.700678] pci_bus 0000:3f: on NUMA node 0<br>
[ 0.704844] pci_bus 0000:7f: on NUMA node 1<br>
<br>
The thing is, if something were affecting the underlying network
layer (the drivers for the physical NICs, for example), it would
affect all traffic to the VM, not just the traffic going in and out
via vNIC1, right?<br>
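<br>
For anyone who wants to check which NUMA node each physical NIC sits
on, here is a minimal sketch. It assumes the usual Linux sysfs layout
(/sys/class/net/IFACE/device/numa_node); the function name is just
something I made up for illustration.<br>
<pre>
```python
from pathlib import Path


def nic_numa_nodes(sys_class_net="/sys/class/net"):
    """Map each network interface to the NUMA node of its PCI device.

    Virtual interfaces (bonds, bridges, taps) have no backing PCI
    device and are reported as None. A value of -1 means the kernel
    did not record a node for that device.
    """
    base = Path(sys_class_net)
    if not base.is_dir():
        return {}
    nodes = {}
    for iface in sorted(base.iterdir()):
        if not iface.is_dir():
            continue  # skip plain files such as bonding_masters
        numa_file = iface / "device" / "numa_node"
        try:
            nodes[iface.name] = int(numa_file.read_text().strip())
        except (OSError, ValueError):
            nodes[iface.name] = None
    return nodes


if __name__ == "__main__":
    for name, node in nic_numa_nodes().items():
        print(name, "virtual/unknown" if node is None else node)
```
</pre>
If the NICs behind the two vNICs report different nodes, that would be
worth following up; if they report the same node, NUMA placement of the
NICs is less likely to explain the difference.<br>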
<blockquote
cite="mid:CAJgorsaVtaNyz4iJf=ZeRGkZE_n32JKYq9oxEUzoKRQyr8VD5w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>Layer 2+3 may or may not provide you with good
distribution across the physical links, depending on the
traffic. Layer 3+4 hashing is better, but is not entirely
compliant with all vendors/equipment.</div>
</div>
</div>
</div>
</blockquote>
Yes, I have tested with both and both work well. I have settled on
layer2+3, as it balances the traffic as evenly as layer3+4 does in
my scenario.<br>
Initially I guessed it could be the bonding, but I ruled that out
when I tested with another physical interface that has no bonding
and the same problem occurred for the VM in question.<br>
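<br>
To illustrate the difference between the two hash policies, here is a
toy sketch. It is a simplified version of the XOR-and-fold formulas
described in the kernel bonding documentation, not the exact kernel
computation, and all addresses and ports are made up.<br>
<pre>
```python
import ipaddress


def _fold(h):
    # Mix the high bits into the low ones before taking the modulo.
    h ^= h >> 16
    h ^= h >> 8
    return h


def _ip(addr):
    return int(ipaddress.ip_address(addr))


def layer2_3(src_mac, dst_mac, src_ip, dst_ip, links):
    """Simplified layer2+3: hash MAC and IP addresses only."""
    h = src_mac[-1] ^ dst_mac[-1]
    h ^= _ip(src_ip) ^ _ip(dst_ip)
    return _fold(h) % links


def layer3_4(src_ip, dst_ip, src_port, dst_port, links):
    """Simplified layer3+4: hash IP addresses and TCP/UDP ports."""
    h = src_port * 65536 + dst_port
    h ^= _ip(src_ip) ^ _ip(dst_ip)
    return _fold(h) % links


if __name__ == "__main__":
    # Hypothetical client and server, two links in the bond.
    client_mac = bytes.fromhex("525400000001")
    server_mac = bytes.fromhex("525400000002")
    ports = range(40000, 40008)
    print("layer2+3:", [layer2_3(client_mac, server_mac,
                                 "192.0.2.50", "192.0.2.10", 2)
                        for _ in ports])
    print("layer3+4:", [layer3_4("192.0.2.50", "192.0.2.10", p, 80, 2)
                        for p in ports])
```
</pre>
With layer2+3 every connection between the same two hosts lands on the
same link, while layer3+4 can spread them by port. With thousands of
distinct clients, as in my case, the addresses alone already vary
enough, which would be consistent with both policies balancing equally
well here.<br>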
<blockquote
cite="mid:CAJgorsaVtaNyz4iJf=ZeRGkZE_n32JKYq9oxEUzoKRQyr8VD5w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>Linux is not always happy with multiple interfaces on
the same L2 network. I think there are some params needed
to be set to make it happy?</div>
</div>
</div>
</div>
</blockquote>
Yes, you are right. Knowing that, I have configured PBR (policy-based
routing) using iproute2, which keeps Linux happy in this scenario. It
works like a charm.<br>
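<br>
For the archives, this is roughly the shape of such a PBR setup, as a
small sketch that only prints the iproute2 commands. The interface
names, addresses, gateway and table numbers are hypothetical
placeholders, not my real configuration.<br>
<pre>
```python
def pbr_commands(iface, address, network, gateway, table):
    """Return iproute2 commands giving one interface its own routing table.

    Traffic sourced from `address` is looked up in `table`, so replies
    leave through the interface that owns that address.
    """
    return [
        f"ip route add {network} dev {iface} src {address} table {table}",
        f"ip route add default via {gateway} dev {iface} table {table}",
        f"ip rule add from {address} table {table}",
    ]


if __name__ == "__main__":
    # Two interfaces on the same L2 network (documentation addresses).
    nics = [
        ("eth0", "192.0.2.10", 100),
        ("eth1", "192.0.2.11", 101),
    ]
    for iface, address, table in nics:
        for cmd in pbr_commands(iface, address, "192.0.2.0/24",
                                "192.0.2.1", table):
            print(cmd)
```
</pre>
The script changes nothing on the system; the printed commands would
have to be run as root and adapted to the real addressing.<br>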
<blockquote
cite="mid:CAJgorsaVtaNyz4iJf=ZeRGkZE_n32JKYq9oxEUzoKRQyr8VD5w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<div><br>
</div>
<div>That can explain it. Ideally, you need to also
streamline the processing in the guest. The relevant
application should be on the same NUMA node as the vCPU
processing the virtio-net interrupts.</div>
<div>In your case, the VM sees a single NUMA node - does
that match the underlying host architecture as well?</div>
</div>
</div>
</div>
</blockquote>
Not sure. The qemu-kvm command line is generated automatically by
oVirt. Perhaps there is some extra option to change under Advanced
Parameters in the VM CPU configuration? I was also wondering whether
enabling "IO Threads Enabled" under Resource Allocation could be of
any help.<br>
<br>
To finish, I am more inclined to think the problem is restricted to
the VM and not the host (drivers, physical NICs, etc.), given that the
packet loss happens on vNIC1 and not on vNIC2, which carries no
traffic. If it were at the host level or in the bonding, it would
affect the whole VM's traffic on both vNICs.<br>
As a last resort I am considering adding an extra 2 vCPUs to the VMs,
but I guess that will only reduce the problem. Does anyone think that
"Threads per Core" or IO Threads could be a better choice?<br>
<br>
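One way to check, inside the guest, whether a single vCPU is doing
most of the receive work is to look at the NET_RX row of
/proc/softirqs. A minimal sketch follows, assuming the usual layout of
that file (a header row of CPU names, then one row per softirq type);
the function name is my own.<br>
<pre>
```python
def net_rx_share(softirqs_text):
    """Return {cpu_name: share of NET_RX softirqs} from /proc/softirqs text."""
    lines = softirqs_text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "NET_RX:":
            counts = [int(v) for v in fields[1:]]
            total = sum(counts) or 1  # avoid dividing by zero
            return {cpu: n / total for cpu, n in zip(cpus, counts)}
    return {}


if __name__ == "__main__":
    try:
        with open("/proc/softirqs") as f:
            shares = net_rx_share(f.read())
    except OSError:
        shares = {}
    for cpu, share in shares.items():
        print(f"{cpu}: {share:.1%}")
```
</pre>
The counters are cumulative since boot, so comparing two samples taken
a few seconds apart gives a better picture of the current
distribution. If one vCPU carries nearly all of NET_RX, extra vCPUs
alone would probably not help much without multiple queues.<br>
<br>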
Thanks<br>
Fernando<br>
<br>
<blockquote
cite="mid:CAJgorsaVtaNyz4iJf=ZeRGkZE_n32JKYq9oxEUzoKRQyr8VD5w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div bgcolor="#FFFFFF" text="#000000"><span class="HOEnZb"></span>
<div>
<div class="h5"> <br>
<div class="m_7680788519611111480moz-cite-prefix">On
18/03/2017 12:53, Yaniv Kaul wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Fri, Mar 17, 2017 at
6:11 PM, FERNANDO FREDIANI <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:fernando.frediani@upx.com"
target="_blank">fernando.frediani@upx.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>Hello all.<br>
<br>
</div>
I have a peculiar problem
here which perhaps others
may have had or know about
and can advise.<br>
<br>
</div>
I have a virtual machine with 2
VirtIO NICs. This VM serves
around 1Gbps of traffic with
thousands of clients
connecting to it. When I run a
packet loss test against the IP
pinned to NIC1, the loss varies
from 3% to 10%. When
I run the same test on NIC2,
the packet loss is
consistently 0%.<br>
<br>
</div>
From what I gather, it may have
something to do with a possible
lack of multi-queue VirtIO, where
NIC1 is handled by a single CPU
which might be hitting 100% and
causing this packet loss.<br>
<br>
</div>
Looking at this reference (<a
moz-do-not-send="true"
href="https://fedoraproject.org/wiki/Features/MQ_virtio_net"
target="_blank">https://fedoraproject.org/wik<wbr>i/Features/MQ_virtio_net</a>)
I see one way to test it is to start
the VM with 4 queues (for
example), but checking the
qemu-kvm process I don't see
that option present. Is there any
way I can force it from the Engine?<br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>I don't see a need for multi-queue for
1Gbps.</div>
<div>Can you share the host statistics, the
network configuration, the qemu-kvm command
line, etc.?</div>
<div>What is the difference between NIC1 and
NIC2, in the way they are connected to the
outside world?</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div dir="ltr">
<div>
<div>
<div>
<div><br>
</div>
This other reference (<a
moz-do-not-send="true"
href="https://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature"
target="_blank">https://www.linux-kvm.org/pag<wbr>e/Multiqueue#Enable_MQ_feature</a><wbr>)
points in the same direction:
starting the VM with queues=N<br>
<br>
</div>
<div>Also trying to increase the TX
ring buffer within the guest with
ethtool -g eth0 is not possible.<br>
</div>
<div><br>
</div>
Oh, by the way, the load on the VM is
significantly high even though CPU
usage isn't above 50% - 60% on
average.<br>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Load = latest 'top' results? Vs. CPU
usage? Can mean a lot of processes waiting
for CPU and doing very little - typical for
web servers, for example. What is occupying
the CPU?</div>
<div>Y.</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div dir="ltr">
<div>
<div><br>
</div>
Thanks<span
class="m_7680788519611111480HOEnZb"><font
color="#888888"><br>
</font></span></div>
<span class="m_7680788519611111480HOEnZb"><font
color="#888888">Fernando<br>
<div>
<div>
<div>
<div><br>
<br>
</div>
</div>
</div>
</div>
</font></span></div>
<br>
______________________________<wbr>_________________<br>
Users mailing list<br>
<a moz-do-not-send="true"
href="mailto:Users@ovirt.org"
target="_blank">Users@ovirt.org</a><br>
<a moz-do-not-send="true"
href="http://lists.ovirt.org/mailman/listinfo/users"
rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman<wbr>/listinfo/users</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</body>
</html>