<div dir="ltr">Hi Jurrien,<div><br></div><div>I don&#39;t see anything in the logs on the nodes themselves. The only relevant entries are in the engine log, which shows it losing connectivity to the host.</div><div>This is definitely related to CentOS 7.1/7.2: downgrading the hosts to ovirt-iso 3.5 resolves the issue.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Mar 18, 2016 at 9:01 AM, Bloemen, Jurriën <span dir="ltr">&lt;<a href="mailto:Jurrien.Bloemen@dmc.amcnetworks.com" target="_blank">Jurrien.Bloemen@dmc.amcnetworks.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">



<div bgcolor="#FFFFFF" text="#000000">
<tt>Hi Johan,<br>
<br>
Could you check whether you see the following in your dmesg output or messages log file?<br>
<br>
[1123306.014288] ------------[ cut here ]------------<br>
[1123306.014302] WARNING: at net/core/dev.c:2189 skb_warn_bad_offload+0xcd/0xda()<br>
[1123306.014306] : caps=(0x0000000200004849, 0x0000000000000000) len=330 data_len=276 gso_size=276 gso_type=1 ip_summed=1<br>
[1123306.014308] Modules linked in: vhost_net macvtap macvlan ip6table_filter ip6_tables iptable_filter ip_tables ebt_arp ebtable_nat ebtables tun scsi_transport_iscsi iTCO_wdt iTCO_vendor_support dm_service_time intel_powerclamp coretemp intel_rapl kvm_intel
 kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd pcspkr sb_edac edac_core i2c_i801 lpc_ich mfd_core mei_me mei wmi ioatdma shpchp ipmi_devintf ipmi_si ipmi_msghandler acpi_power_meter acpi_pad 8021q garp mrp bridge stp llc bonding dm_multipath
 xfs libcrc32c sd_mod crc_t10dif crct10dif_common ast syscopyarea sysfillrect sysimgblt drm_kms_helper ttm crc32c_intel igb drm ahci ixgbe i2c_algo_bit libahci libata mdio i2c_core ptp megaraid_sas pps_core dca dm_mirror dm_region_hash dm_log dm_mod<br>
[1123306.014360] CPU: 30 PID: 0 Comm: swapper/30 Tainted: G        W   --------------   3.10.0-229.1.2.el7.x86_64 #1<br>
[1123306.014362] Hardware name: Supermicro SYS-2028TP-HC1TR/X10DRT-PT, BIOS 1.1 08/03/2015<br>
[1123306.014364]  ffff881fffc439a8 5326fb90ad1041ea ffff881fffc43960 ffffffff81604afa<br>
[1123306.014371]  ffff881fffc43998 ffffffff8106e34b ffff881fcebb0500 ffff881fce88c000<br>
[1123306.014376]  0000000000000001 0000000000000001 ffff881fcebb0500 ffff881fffc43a00<br>
[1123306.014381] Call Trace:<br>
[1123306.014383]  &lt;IRQ&gt;  [&lt;ffffffff81604afa&gt;] dump_stack+0x19/0x1b<br>
[1123306.014396]  [&lt;ffffffff8106e34b&gt;] warn_slowpath_common+0x6b/0xb0<br>
[1123306.014399]  [&lt;ffffffff8106e3ec&gt;] warn_slowpath_fmt+0x5c/0x80<br>
[1123306.014405]  [&lt;ffffffff812db093&gt;] ? ___ratelimit+0x93/0x100<br>
[1123306.014409]  [&lt;ffffffff816076c3&gt;] skb_warn_bad_offload+0xcd/0xda<br>
[1123306.014425]  [&lt;ffffffff814fdeb9&gt;] __skb_gso_segment+0x79/0xb0<br>
[1123306.014429]  [&lt;ffffffff814fe1c2&gt;] dev_hard_start_xmit+0x1a2/0x580<br>
[1123306.014438]  [&lt;ffffffffa0168790&gt;] ? deliver_clone+0x50/0x50 [bridge]<br>
[1123306.014443]  [&lt;ffffffff8151df1e&gt;] sch_direct_xmit+0xee/0x1c0<br>
[1123306.014447]  [&lt;ffffffff814fe798&gt;] dev_queue_xmit+0x1f8/0x4a0<br>
[1123306.014453]  [&lt;ffffffffa016880b&gt;] br_dev_queue_push_xmit+0x7b/0xc0 [bridge]<br>
[1123306.014458]  [&lt;ffffffffa0168a22&gt;] br_forward_finish+0x22/0x60 [bridge]<br>
[1123306.014464]  [&lt;ffffffffa0168ae0&gt;] __br_forward+0x80/0xf0 [bridge]<br>
[1123306.014469]  [&lt;ffffffffa0168ebb&gt;] br_forward+0x8b/0xa0 [bridge]<br>
[1123306.014476]  [&lt;ffffffffa0169e65&gt;] br_handle_frame_finish+0x175/0x410 [bridge]<br>
[1123306.014481]  [&lt;ffffffffa016a275&gt;] br_handle_frame+0x175/0x260 [bridge]<br>
[1123306.014485]  [&lt;ffffffff814fc112&gt;] __netif_receive_skb_core+0x282/0x870<br>
[1123306.014490]  [&lt;ffffffff8101b589&gt;] ? read_tsc+0x9/0x10<br>
[1123306.014493]  [&lt;ffffffff814fc718&gt;] __netif_receive_skb+0x18/0x60<br>
[1123306.014497]  [&lt;ffffffff814fc7a0&gt;] netif_receive_skb+0x40/0xd0<br>
[1123306.014500]  [&lt;ffffffff814fd2b0&gt;] napi_gro_receive+0x80/0xb0<br>
[1123306.014512]  [&lt;ffffffffa00cde2c&gt;] ixgbe_clean_rx_irq+0x7ac/0xb30 [ixgbe]<br>
[1123306.014519]  [&lt;ffffffffa00cf07b&gt;] ixgbe_poll+0x4bb/0x930 [ixgbe]<br>
[1123306.014524]  [&lt;ffffffff814fcb62&gt;] net_rx_action+0x152/0x240<br>
[1123306.014528]  [&lt;ffffffff81077bf7&gt;] __do_softirq+0xf7/0x290<br>
[1123306.014533]  [&lt;ffffffff8161635c&gt;] call_softirq+0x1c/0x30<br>
[1123306.014539]  [&lt;ffffffff81015de5&gt;] do_softirq+0x55/0x90<br>
[1123306.014543]  [&lt;ffffffff81077f95&gt;] irq_exit+0x115/0x120<br>
[1123306.014546]  [&lt;ffffffff81616ef8&gt;] do_IRQ+0x58/0xf0<br>
[1123306.014551]  [&lt;ffffffff8160c0ed&gt;] common_interrupt+0x6d/0x6d<br>
[1123306.014553]  &lt;EOI&gt;  [&lt;ffffffff814aa6d2&gt;] ? cpuidle_enter_state+0x52/0xc0<br>
[1123306.014561]  [&lt;ffffffff814aa6c8&gt;] ? cpuidle_enter_state+0x48/0xc0<br>
[1123306.014565]  [&lt;ffffffff814aa805&gt;] cpuidle_idle_call+0xc5/0x200<br>
[1123306.014569]  [&lt;ffffffff8101d21e&gt;] arch_cpu_idle+0xe/0x30<br>
[1123306.014574]  [&lt;ffffffff810c6945&gt;] cpu_startup_entry+0xf5/0x290<br>
[1123306.014580]  [&lt;ffffffff810423ca&gt;] start_secondary+0x1ba/0x230<br>
[1123306.014582] ---[ end trace 4d5a1bc838e1fcc0 ]---<br>
<br>
If so, then could you try the following:<br>
<br>
ethtool -K &lt;nic name&gt; lro off<br>
<br>
Do this for all the 10G Intel NICs and check whether the problem still exists.
<br>
<br>
</tt>
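<p>A minimal sketch of the two steps above. It assumes Python 3 on the host. It also assumes the 10G Intel NICs are exactly the interfaces whose sysfs <code>device/driver</code> link points at <code>ixgbe</code>, the driver that appears in the trace above. The function names are hypothetical helpers, not part of oVirt, VDSM or ethtool. The sketch only builds the <code>ethtool -K</code> argument lists so they can be reviewed before being run as root:</p>
<pre>
```python
import os
import re

# Signature of the kernel warning quoted above (skb_warn_bad_offload in net/core/dev.c).
WARNING_RE = re.compile(r"WARNING: at net/core/dev\.c:\d+ skb_warn_bad_offload")


def log_has_bad_offload_warning(log_text: str) -> bool:
    """Return True if the kernel log text contains the skb_warn_bad_offload warning."""
    return WARNING_RE.search(log_text) is not None


def ixgbe_interfaces(sysfs_net: str = "/sys/class/net") -> list:
    """List interfaces bound to ixgbe (assumed here to be the 10G Intel NICs)."""
    names = []
    for name in sorted(os.listdir(sysfs_net)):
        driver_link = os.path.join(sysfs_net, name, "device", "driver")
        # Virtual interfaces (bridges, bonds, vnetX) have no device/driver link.
        if os.path.islink(driver_link):
            if os.path.basename(os.readlink(driver_link)) == "ixgbe":
                names.append(name)
    return names


def lro_off_commands(interfaces) -> list:
    """Build one 'ethtool -K NIC lro off' argument list per interface."""
    return [["ethtool", "-K", name, "lro", "off"] for name in interfaces]
```
</pre>
<p>For example, pass the output of <code>dmesg</code> to <code>log_has_bad_offload_warning()</code>. If it returns True, review and run each command from <code>lro_off_commands(ixgbe_interfaces())</code>, for instance with <code>subprocess.run</code>.</p>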
<div>

<div style="color:rgb(0,0,0)">
<p class="MsoNormal" style="font-size:14px;font-family:Calibri,sans-serif;margin:0cm 0cm 0.0001pt">
<b><font color="#2c8cb6" face="Arial,sans-serif"><span style="font-size:10pt">K</span><span style="font-size:13px">i</span><span style="font-size:10pt">nd regards,</span></font></b></p>
<p class="MsoNormal" style="font-size:11pt;font-family:Calibri,sans-serif;margin:0cm 0cm 0.0001pt">
<b><span style="font-size:10pt;font-family:Arial,sans-serif;color:rgb(44,140,182)"> </span></b></p>
<p class="MsoNormal" style="font-size:14px;font-family:Calibri,sans-serif;margin:0cm 0cm 0.0001pt">
<b style="font-size:11pt"><span style="font-size:10pt;font-family:Arial,sans-serif;color:rgb(44,140,182)">Jurriën Bloemen</span></b><b style="font-size:11pt"><span style="font-size:10pt;font-family:Arial,sans-serif;color:gray"><br>
</span></b><font color="#808080" face="Arial,sans-serif"><span style="font-size:10pt"></span></font></p>
<br>
</div>
</div><div><div class="h5">
<div>On 17-03-16 09:49, Johan Kooijman wrote:<br>
</div>
</div></div><blockquote type="cite"><div><div class="h5">
<div dir="ltr">Hi all,
<div><br>
</div>
<div>Since we upgraded to the latest oVirt Node running 7.2, we&#39;re seeing nodes become unavailable after a while. A node runs fine, with a couple of VMs on it, until it becomes unresponsive; at that point it doesn&#39;t even respond to ICMP. It comes back by itself after a while, but oVirt fences the machine before then and restarts the VMs elsewhere.</div>
<div><br>
</div>
<div>Engine tells me this message:</div>
<div><br>
</div>
<div>VDSM host09 command failed: Message timeout which can be caused by communication issues</div>
<div><br>
</div>
<div>Is anyone else experiencing these issues with the ixgbe driver? I&#39;m running Intel X540-AT2 cards.<br clear="all">
<div><br>
</div>
-- <br>
<div>
<div dir="ltr">Met vriendelijke groeten / With kind regards,<br>
Johan Kooijman<br>
</div>
</div>
</div>
</div>
<br>
<fieldset></fieldset> <br>
</div></div><span class=""><pre>_______________________________________________
Users mailing list
<a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a>
<a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a>
</pre>
</span></blockquote>
<br>
</div>

<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr">Met vriendelijke groeten / With kind regards,<br>Johan Kooijman<br></div></div>
</div>