
Hi Johan,

Could you check whether you see the following in your dmesg output or messages log file?

[1123306.014288] ------------[ cut here ]------------
[1123306.014302] WARNING: at net/core/dev.c:2189 skb_warn_bad_offload+0xcd/0xda()
[1123306.014306] : caps=(0x0000000200004849, 0x0000000000000000) len=330 data_len=276 gso_size=276 gso_type=1 ip_summed=1
[1123306.014308] Modules linked in: vhost_net macvtap macvlan ip6table_filter ip6_tables iptable_filter ip_tables ebt_arp ebtable_nat ebtables tun scsi_transport_iscsi iTCO_wdt iTCO_vendor_support dm_service_time intel_powerclamp coretemp intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd pcspkr sb_edac edac_core i2c_i801 lpc_ich mfd_core mei_me mei wmi ioatdma shpchp ipmi_devintf ipmi_si ipmi_msghandler acpi_power_meter acpi_pad 8021q garp mrp bridge stp llc bonding dm_multipath xfs libcrc32c sd_mod crc_t10dif crct10dif_common ast syscopyarea sysfillrect sysimgblt drm_kms_helper ttm crc32c_intel igb drm ahci ixgbe i2c_algo_bit libahci libata mdio i2c_core ptp megaraid_sas pps_core dca dm_mirror dm_region_hash dm_log dm_mod
[1123306.014360] CPU: 30 PID: 0 Comm: swapper/30 Tainted: G W -------------- 3.10.0-229.1.2.el7.x86_64 #1
[1123306.014362] Hardware name: Supermicro SYS-2028TP-HC1TR/X10DRT-PT, BIOS 1.1 08/03/2015
[1123306.014364] ffff881fffc439a8 5326fb90ad1041ea ffff881fffc43960 ffffffff81604afa
[1123306.014371] ffff881fffc43998 ffffffff8106e34b ffff881fcebb0500 ffff881fce88c000
[1123306.014376] 0000000000000001 0000000000000001 ffff881fcebb0500 ffff881fffc43a00
[1123306.014381] Call Trace:
[1123306.014383] <IRQ> [<ffffffff81604afa>] dump_stack+0x19/0x1b
[1123306.014396] [<ffffffff8106e34b>] warn_slowpath_common+0x6b/0xb0
[1123306.014399] [<ffffffff8106e3ec>] warn_slowpath_fmt+0x5c/0x80
[1123306.014405] [<ffffffff812db093>] ? ___ratelimit+0x93/0x100
[1123306.014409] [<ffffffff816076c3>] skb_warn_bad_offload+0xcd/0xda
[1123306.014425] [<ffffffff814fdeb9>] __skb_gso_segment+0x79/0xb0
[1123306.014429] [<ffffffff814fe1c2>] dev_hard_start_xmit+0x1a2/0x580
[1123306.014438] [<ffffffffa0168790>] ? deliver_clone+0x50/0x50 [bridge]
[1123306.014443] [<ffffffff8151df1e>] sch_direct_xmit+0xee/0x1c0
[1123306.014447] [<ffffffff814fe798>] dev_queue_xmit+0x1f8/0x4a0
[1123306.014453] [<ffffffffa016880b>] br_dev_queue_push_xmit+0x7b/0xc0 [bridge]
[1123306.014458] [<ffffffffa0168a22>] br_forward_finish+0x22/0x60 [bridge]
[1123306.014464] [<ffffffffa0168ae0>] __br_forward+0x80/0xf0 [bridge]
[1123306.014469] [<ffffffffa0168ebb>] br_forward+0x8b/0xa0 [bridge]
[1123306.014476] [<ffffffffa0169e65>] br_handle_frame_finish+0x175/0x410 [bridge]
[1123306.014481] [<ffffffffa016a275>] br_handle_frame+0x175/0x260 [bridge]
[1123306.014485] [<ffffffff814fc112>] __netif_receive_skb_core+0x282/0x870
[1123306.014490] [<ffffffff8101b589>] ? read_tsc+0x9/0x10
[1123306.014493] [<ffffffff814fc718>] __netif_receive_skb+0x18/0x60
[1123306.014497] [<ffffffff814fc7a0>] netif_receive_skb+0x40/0xd0
[1123306.014500] [<ffffffff814fd2b0>] napi_gro_receive+0x80/0xb0
[1123306.014512] [<ffffffffa00cde2c>] ixgbe_clean_rx_irq+0x7ac/0xb30 [ixgbe]
[1123306.014519] [<ffffffffa00cf07b>] ixgbe_poll+0x4bb/0x930 [ixgbe]
[1123306.014524] [<ffffffff814fcb62>] net_rx_action+0x152/0x240
[1123306.014528] [<ffffffff81077bf7>] __do_softirq+0xf7/0x290
[1123306.014533] [<ffffffff8161635c>] call_softirq+0x1c/0x30
[1123306.014539] [<ffffffff81015de5>] do_softirq+0x55/0x90
[1123306.014543] [<ffffffff81077f95>] irq_exit+0x115/0x120
[1123306.014546] [<ffffffff81616ef8>] do_IRQ+0x58/0xf0
[1123306.014551] [<ffffffff8160c0ed>] common_interrupt+0x6d/0x6d
[1123306.014553] <EOI> [<ffffffff814aa6d2>] ? cpuidle_enter_state+0x52/0xc0
[1123306.014561] [<ffffffff814aa6c8>] ? cpuidle_enter_state+0x48/0xc0
[1123306.014565] [<ffffffff814aa805>] cpuidle_idle_call+0xc5/0x200
[1123306.014569] [<ffffffff8101d21e>] arch_cpu_idle+0xe/0x30
[1123306.014574] [<ffffffff810c6945>] cpu_startup_entry+0xf5/0x290
[1123306.014580] [<ffffffff810423ca>] start_secondary+0x1ba/0x230
[1123306.014582] ---[ end trace 4d5a1bc838e1fcc0 ]---

If so, could you try the following:

ethtool -K <nic name> lro off

Do this for all the 10G Intel NICs and check whether the problem still exists.

Kind regards,

Jurriën Bloemen

On 17-03-16 09:49, Johan Kooijman wrote:

Hi all,

Since we upgraded to the latest oVirt Node running 7.2, we're seeing that nodes become unavailable after a while. A node runs fine, with a couple of VMs on it, until it becomes unresponsive. At that moment it doesn't even respond to ICMP. It comes back by itself after a while, but oVirt fences the machine before that time and restarts the VMs elsewhere.

The engine reports this message:

VDSM host09 command failed: Message timeout which can be caused by communication issues

Is anyone else experiencing these issues with ixgbe drivers? I'm running on Intel X540-AT2 cards.

--
Met vriendelijke groeten / With kind regards,
Johan Kooijman

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

This message (including any attachments) may contain information that is privileged or confidential. If you are not the intended recipient, please notify the sender and delete this email immediately from your systems and destroy all copies of it.
You may not, directly or indirectly, use, disclose, distribute, print or copy this email or any part of it if you are not the intended recipient.
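
For the suggestion above to run "ethtool -K <nic name> lro off" on all the 10G Intel NICs, a loop along these lines may save some typing. This is only a sketch under stated assumptions: the function name, the DRY_RUN switch, the SYSFS_NET override and the "ixgbe" default are illustrative and not from the thread. It assumes the 10G cards are the ones bound to the ixgbe driver, as the trace and the X540-AT2 mention suggest. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: turn LRO off on every interface bound to a given driver.
# Hypothetical helper, not part of oVirt or ethtool.

SYSFS_NET="${SYSFS_NET:-/sys/class/net}"   # where the kernel lists interfaces
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print commands, 0 = run them

lro_off_for_driver() {
    driver="$1"
    for dev in "$SYSFS_NET"/*; do
        # Bridges, bonds and taps have no device/driver link; skip them.
        [ -e "$dev/device/driver" ] || continue
        # The link target ends in the driver name, e.g. .../drivers/ixgbe
        bound=$(basename "$(readlink "$dev/device/driver")")
        [ "$bound" = "$driver" ] || continue
        nic=$(basename "$dev")
        if [ "$DRY_RUN" = "1" ]; then
            echo "ethtool -K $nic lro off"
        else
            ethtool -K "$nic" lro off
        fi
    done
}

# Default to ixgbe when no driver name is given on the command line.
lro_off_for_driver "${1:-ixgbe}"
```

Saved as, say, lro-off.sh (a made-up name), "sh lro-off.sh" lists the commands, and "DRY_RUN=0 sh lro-off.sh" applies them as root. "ethtool -k <nic name>" should then show large-receive-offload as off. Settings made with ethtool -K do not survive a reboot by themselves, so the change would need to be made persistent separately if it helps.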