Hi Johan,
Could you check whether you see the following in your dmesg output or messages log file?
[1123306.014288] ------------[ cut here ]------------
[1123306.014302] WARNING: at net/core/dev.c:2189 skb_warn_bad_offload+0xcd/0xda()
[1123306.014306] : caps=(0x0000000200004849, 0x0000000000000000) len=330 data_len=276 gso_size=276 gso_type=1 ip_summed=1
[1123306.014308] Modules linked in: vhost_net macvtap macvlan ip6table_filter ip6_tables iptable_filter ip_tables ebt_arp ebtable_nat ebtables tun scsi_transport_iscsi iTCO_wdt iTCO_vendor_support dm_service_time intel_powerclamp coretemp intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd pcspkr sb_edac edac_core i2c_i801 lpc_ich mfd_core mei_me mei wmi ioatdma shpchp ipmi_devintf ipmi_si ipmi_msghandler acpi_power_meter acpi_pad 8021q garp mrp bridge stp llc bonding dm_multipath xfs libcrc32c sd_mod crc_t10dif crct10dif_common ast syscopyarea sysfillrect sysimgblt drm_kms_helper ttm crc32c_intel igb drm ahci ixgbe i2c_algo_bit libahci libata mdio i2c_core ptp megaraid_sas pps_core dca dm_mirror dm_region_hash dm_log dm_mod
[1123306.014360] CPU: 30 PID: 0 Comm: swapper/30 Tainted: G W -------------- 3.10.0-229.1.2.el7.x86_64 #1
[1123306.014362] Hardware name: Supermicro SYS-2028TP-HC1TR/X10DRT-PT, BIOS 1.1 08/03/2015
[1123306.014364] ffff881fffc439a8 5326fb90ad1041ea ffff881fffc43960 ffffffff81604afa
[1123306.014371] ffff881fffc43998 ffffffff8106e34b ffff881fcebb0500 ffff881fce88c000
[1123306.014376] 0000000000000001 0000000000000001 ffff881fcebb0500 ffff881fffc43a00
[1123306.014381] Call Trace:
[1123306.014383] <IRQ> [<ffffffff81604afa>] dump_stack+0x19/0x1b
[1123306.014396] [<ffffffff8106e34b>] warn_slowpath_common+0x6b/0xb0
[1123306.014399] [<ffffffff8106e3ec>] warn_slowpath_fmt+0x5c/0x80
[1123306.014405] [<ffffffff812db093>] ? ___ratelimit+0x93/0x100
[1123306.014409] [<ffffffff816076c3>] skb_warn_bad_offload+0xcd/0xda
[1123306.014425] [<ffffffff814fdeb9>] __skb_gso_segment+0x79/0xb0
[1123306.014429] [<ffffffff814fe1c2>] dev_hard_start_xmit+0x1a2/0x580
[1123306.014438] [<ffffffffa0168790>] ? deliver_clone+0x50/0x50 [bridge]
[1123306.014443] [<ffffffff8151df1e>] sch_direct_xmit+0xee/0x1c0
[1123306.014447] [<ffffffff814fe798>] dev_queue_xmit+0x1f8/0x4a0
[1123306.014453] [<ffffffffa016880b>] br_dev_queue_push_xmit+0x7b/0xc0 [bridge]
[1123306.014458] [<ffffffffa0168a22>] br_forward_finish+0x22/0x60 [bridge]
[1123306.014464] [<ffffffffa0168ae0>] __br_forward+0x80/0xf0 [bridge]
[1123306.014469] [<ffffffffa0168ebb>] br_forward+0x8b/0xa0 [bridge]
[1123306.014476] [<ffffffffa0169e65>] br_handle_frame_finish+0x175/0x410 [bridge]
[1123306.014481] [<ffffffffa016a275>] br_handle_frame+0x175/0x260 [bridge]
[1123306.014485] [<ffffffff814fc112>] __netif_receive_skb_core+0x282/0x870
[1123306.014490] [<ffffffff8101b589>] ? read_tsc+0x9/0x10
[1123306.014493] [<ffffffff814fc718>] __netif_receive_skb+0x18/0x60
[1123306.014497] [<ffffffff814fc7a0>] netif_receive_skb+0x40/0xd0
[1123306.014500] [<ffffffff814fd2b0>] napi_gro_receive+0x80/0xb0
[1123306.014512] [<ffffffffa00cde2c>] ixgbe_clean_rx_irq+0x7ac/0xb30 [ixgbe]
[1123306.014519] [<ffffffffa00cf07b>] ixgbe_poll+0x4bb/0x930 [ixgbe]
[1123306.014524] [<ffffffff814fcb62>] net_rx_action+0x152/0x240
[1123306.014528] [<ffffffff81077bf7>] __do_softirq+0xf7/0x290
[1123306.014533] [<ffffffff8161635c>] call_softirq+0x1c/0x30
[1123306.014539] [<ffffffff81015de5>] do_softirq+0x55/0x90
[1123306.014543] [<ffffffff81077f95>] irq_exit+0x115/0x120
[1123306.014546] [<ffffffff81616ef8>] do_IRQ+0x58/0xf0
[1123306.014551] [<ffffffff8160c0ed>] common_interrupt+0x6d/0x6d
[1123306.014553] <EOI> [<ffffffff814aa6d2>] ? cpuidle_enter_state+0x52/0xc0
[1123306.014561] [<ffffffff814aa6c8>] ? cpuidle_enter_state+0x48/0xc0
[1123306.014565] [<ffffffff814aa805>] cpuidle_idle_call+0xc5/0x200
[1123306.014569] [<ffffffff8101d21e>] arch_cpu_idle+0xe/0x30
[1123306.014574] [<ffffffff810c6945>] cpu_startup_entry+0xf5/0x290
[1123306.014580] [<ffffffff810423ca>] start_secondary+0x1ba/0x230
[1123306.014582] ---[ end trace 4d5a1bc838e1fcc0 ]---
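A quick way to look for this warning, as a rough sketch: the helper below counts the WARNING lines that mention skb_warn_bad_offload in whatever kernel log text it is given on stdin. The function name is made up for illustration, and /var/log/messages is assumed to be where your syslog ends up (the usual place on EL7).

```shell
#!/bin/sh
# Hypothetical helper: count skb_warn_bad_offload WARNING lines in kernel
# log text read from stdin. Call-trace lines that only mention the symbol
# are not counted, so the result is one per warning.
count_bad_offload_warnings() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep exit 0.
    grep -c 'WARNING: .*skb_warn_bad_offload' || true
}

# Typical use on the host (may need root):
#   dmesg | count_bad_offload_warnings
#   count_bad_offload_warnings < /var/log/messages   # assumed log location
```

Any count above zero means the warning shown above has been logged at least once.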
If so, then could you try the following:
ethtool -K <nic name> lro off
Do this for all the 10G Intel NICs and check whether the problem still exists.
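If there are several of these NICs, something like the following might save some typing. It is only a sketch: it assumes all the 10G Intel cards are bound to the ixgbe driver (as in the trace above) and that sysfs exposes the driver link under /sys/class/net/<nic>/device/driver. The function names are made up for illustration.

```shell
#!/bin/sh
# Print the names of network interfaces bound to a given kernel driver.
# $1: driver name (e.g. ixgbe)
# $2: sysfs net directory (optional, defaults to /sys/class/net)
list_ifaces_by_driver() {
    driver=$1
    netdir=${2:-/sys/class/net}
    for path in "$netdir"/*; do
        link="$path/device/driver"
        # Virtual devices (bridges, bonds, taps) have no driver link; skip them.
        [ -L "$link" ] || continue
        if [ "$(basename "$(readlink "$link")")" = "$driver" ]; then
            basename "$path"
        fi
    done
}

# Turn LRO off on every interface bound to the given driver (default: ixgbe).
disable_lro_for_driver() {
    for nic in $(list_ifaces_by_driver "${1:-ixgbe}"); do
        echo "disabling LRO on $nic"
        ethtool -K "$nic" lro off
    done
}

# Run as root on the host:
#   disable_lro_for_driver ixgbe
# Check the result per NIC with:
#   ethtool -k <nic name> | grep large-receive-offload
```

Note that a setting applied with ethtool -K does not survive a reboot, so this is for testing whether LRO is the culprit rather than a permanent fix.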
Kind regards,
Jurriën Bloemen
On 17-03-16 09:49, Johan Kooijman wrote:
Hi all,
Since we upgraded to the latest oVirt Node running 7.2, we're seeing that nodes become unavailable after a while. A node runs fine, with a couple of VMs on it, until it becomes unresponsive. At that moment it doesn't even respond to ICMP. It comes back by itself after a while, but oVirt fences the machine before that time and restarts the VMs elsewhere.
Engine tells me this message:
VDSM host09 command failed: Message timeout which can be caused by communication issues
Is anyone else experiencing these issues with ixgbe drivers? I'm running on Intel X540-AT2 cards.
--
Met vriendelijke groeten / With kind regards,
Johan Kooijman
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users