Dan,
Thanks for your interest and assistance.
----- Original Message -----
From: "Dan Kenigsberg" <danken(a)redhat.com>
To: "Lewis Shobbrook" <lew(a)redgrid.net>
Cc: users(a)ovirt.org
Sent: Friday, October 3, 2014 7:00:54 AM
Subject: Re: [ovirt-users] Falsely detected network device status
On Thu, Oct 02, 2014 at 12:14:40AM +1000, Lewis Shobbrook wrote:
> Hi All,
>
> I've encountered a network device that is being falsely detected as
> down.
> On adding a new network to a 3.4 hosted engine, one of the two
> hosts is displaying a component interface as down with the
> webadmin console.
Which network did you add? On top of which nics? could you share the
setupNetwork command from vdsm.log?
Here's the output relating to setupNetworks below (from vdsm.log on the working node).
Thread-14::DEBUG::2014-09-30 14:53:29,487::BindingXMLRPC::1067::vds::(wrapper) client
[172.17.171.247]::call setupNetworks with ({'EVD_DMZ': {'nic':
'eth1', 'STP': 'no', 'bridged': 'true'}}, {},
{'connectivityCheck': 'true', 'connectivityTimeout': 120}) {}
flowID [1be04f8a]
Thread-13::DEBUG::2014-09-30 14:53:29,488::BindingXMLRPC::1067::vds::(wrapper) client
[172.17.171.247]::call ping with () {} flowID [1be04f8a]
Thread-13::DEBUG::2014-09-30 14:53:29,490::BindingXMLRPC::1074::vds::(wrapper) return ping
with {'status': {'message': 'Done', 'code': 0}}
Thread-13::DEBUG::2014-09-30 14:53:29,999::BindingXMLRPC::1067::vds::(wrapper) client
[172.17.171.247]::call ping with () {} flowID [1be04f8a]
Thread-13::DEBUG::2014-09-30 14:53:30,005::BindingXMLRPC::1074::vds::(wrapper) return ping
with {'status': {'message': 'Done', 'code': 0}}
Thread-13::DEBUG::2014-09-30 14:53:30,508::BindingXMLRPC::1067::vds::(wrapper) client
[172.17.171.247]::call ping with () {} flowID [1be04f8a]
Thread-13::DEBUG::2014-09-30 14:53:30,509::BindingXMLRPC::1074::vds::(wrapper) return ping
with {'status': {'message': 'Done', 'code': 0}}
Thread-13::DEBUG::2014-09-30 14:53:31,012::BindingXMLRPC::1067::vds::(wrapper) client
[172.17.171.247]::call ping with () {} flowID [1be04f8a]
Thread-13::DEBUG::2014-09-30 14:53:31,012::BindingXMLRPC::1074::vds::(wrapper) return ping
with {'status': {'message': 'Done', 'code': 0}}
Thread-13::DEBUG::2014-09-30 14:53:31,515::BindingXMLRPC::1067::vds::(wrapper) client
[172.17.171.247]::call ping with () {} flowID [1be04f8a]
Thread-13::DEBUG::2014-09-30 14:53:31,516::BindingXMLRPC::1074::vds::(wrapper) return ping
with {'status': {'message': 'Done', 'code': 0}}
Thread-14::DEBUG::2014-09-30 14:53:31,697::BindingXMLRPC::1074::vds::(wrapper) return
setupNetworks with {'status': {'message': 'Done', 'code':
0}}
Thread-5598871::DEBUG::2014-09-30 14:53:31,816::BindingXMLRPC::1067::vds::(wrapper) client
[172.17.170.100]::call getCapabilities with () {}
Thread-5598871::DEBUG::2014-09-30 14:53:31,818::utils::642::root::(execCmd) '/sbin/ip
route show to 0.0.0.0/0 table all' (cwd None)
Thread-5598871::DEBUG::2014-09-30 14:53:31,842::utils::662::root::(execCmd) SUCCESS:
<err> = ''; <rc> = 0
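For reference, the setupNetworks call captured at the top of that log takes three positional arguments (networks, bondings, options). Decoded into Python, the payload from the log is as follows; the rollback meaning of connectivityCheck is my understanding of vdsm's behaviour, not something shown in the log itself:

```python
# The three arguments vdsm's setupNetworks received, as decoded from the
# log line above (argument names per the vdsm API: networks, bondings, options).
networks = {'EVD_DMZ': {'nic': 'eth1',       # attach the bridge to eth1
                        'STP': 'no',         # spanning tree disabled
                        'bridged': 'true'}}  # create a Linux bridge
bondings = {}                                # no bonds touched
options = {'connectivityCheck': 'true',      # roll back if engine connectivity is lost
           'connectivityTimeout': 120}       # ... within 120 seconds
```

The repeated ping calls that follow in the log are the engine exercising that connectivity check before the 120-second window expires.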
On our problem node, surprisingly, there is no setupNetworks entry at all, but I thought
I'd mention this entry, which seems odd:
Thread-5032587::INFO::2014-09-30 14:53:43,559::netinfo::500::root::(_getNetInfo) Obtaining
info for net virbr0.
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/vdsm/netinfo.py", line 479, in
_getNetInfo
data.update({'ports': ports(iface), 'stp': bridge_stp_state(iface),
File "/usr/lib64/python2.6/site-packages/vdsm/netinfo.py", line 197, in ports
return os.listdir('/sys/class/net/' + bridge + '/brif')
OSError: [Errno 2] No such file or directory: '/sys/class/net/virbr0/brif'
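That traceback reads like a race: virbr0 (libvirt's default bridge) was still listed under /sys/class/net when netinfo started, but its brif directory was gone by the time ports() ran. A defensive sketch of that lookup, as an illustration of the race rather than vdsm's actual fix:

```python
import errno
import os

def bridge_ports(bridge):
    """List the interfaces enslaved to a bridge, tolerating the bridge
    being removed between enumeration and this lookup."""
    try:
        return os.listdir('/sys/class/net/' + bridge + '/brif')
    except OSError as e:
        if e.errno == errno.ENOENT:
            return []  # bridge vanished under us: report no ports
        raise
```

With a guard like that, a disappearing virbr0 would yield an empty port list instead of killing _getNetInfo.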
Roughly 4 seconds later we start to see the VM migrations, with the node set inactive;
apparently unrelated to the connectivityTimeout set in setupNetworks.
> Ethtool confirms the link status as up and connectivity through the
> corresponding interfaces of the network has been proven.
> Shortly after adding the network I noted that the node with the
> falsely detected iface status, was set Non-Operational.
> All running VM's evacuated.
>
> The engine.log showing...
> INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler_Worker-33) [2daa7788] Correlation ID:
> null, Call Stack: null, Custom Event ID: -1, Message: Host
> hosted_engine_2 moved to Non-Operational state because interfaces
> 'eth1' are down but are needed by networks 'EVD_DMZ' in the
> current cluster
Do you have /var/log/vdsm/connectivity.log (it's a recent addition)?
It should have traces of eth1 going down. Does it? When did it
happen? What else happened at that time on the host?
No connectivity log unfortunately.
The strange thing is that the interface is up and stays up, but the correct state is not
detected or updated.
The other, working node is of identical hardware, connected to the same switch.
I've reactivated the problem node by removing the default cluster setting of
"require all" for the troubled logical network.
Given the misadventure here, I'm wondering if "require all" is the safest
default.
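As a cross-check next time it happens, the kernel's own view of the link can be read straight from sysfs, independently of vdsm's reporting. A small sketch; the net_root parameter is my addition so the function can be exercised without a live /sys:

```python
import os

def iface_operstate(iface, net_root='/sys/class/net'):
    """Return the kernel's operational state for an interface
    ('up', 'down', 'unknown', ...), or None if the interface is absent.
    net_root is parameterised so the lookup can be tested offline."""
    path = os.path.join(net_root, iface, 'operstate')
    try:
        with open(path) as f:
            return f.read().strip()
    except IOError:
        return None
```

On the problem node, iface_operstate('eth1') returning 'up' while the webadmin shows the NIC as down would confirm the fault lies in the reporting path rather than in the link itself.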
>
> 2 other interfaces for ovirtmgmt and another VM network on this
> same host have long functioned perfectly well.
>
> When assigning the network to the interface I see the following in
> messages...
>
> Oct 1 20:57:39 lx004 kernel: ADDRCONF(NETDEV_UP): eth1: link is
> not ready
> Oct 1 20:57:39 lx004 kernel: 8021q: adding VLAN 0 to HW filter on
> device eth1
> Oct 1 20:57:39 lx004 kernel: device eth1 entered promiscuous mode
> Oct 1 20:57:40 lx004 kernel: igb: eth1 NIC Link is Up 100 Mbps
> Full Duplex, Flow Control: RX
> Oct 1 20:57:40 lx004 kernel: ADDRCONF(NETDEV_CHANGE): eth1: link
> becomes ready
> Oct 1 20:57:40 lx004 kernel: EVD_DMZ: port 1(eth1) entering
> forwarding state
>
> Looks as it should to me, brctl also shows the bridge is as it
> should be.
>
> # brctl show
> bridge name bridge id STP enabled interfaces
> ;vdsmdummy; 8000.000000000000 no
> EVD_DMZ 8000.0025901abc49 no eth1
> PXE 8000.0007e9a5a197 no eth3
> ovirtmgmt 8000.0025901abc48 no eth0
>
> Not able to see anything relevant in vdsm.log
>
> vdsClient -s 0 getVdsStats
> does not list the new network, but instead lists an older deleted
> one.
Could you elaborate on that? What's reported there exactly, and what
should be? Is getVdsCaps up-to-date?
Yes, getVdsCaps is up to date and lists the new network, but getVdsStats lists an old,
deleted network (ALS_SCO); only on our problem node.
Here's a snippet from the stats output on the troubled node...
anonHugePages = '11114'
cpuIdle = '96.55'
cpuLoad = '0.65'
cpuSys = '0.96'
cpuSysVdsmd = '1.87'
cpuUser = '2.49'
cpuUserVdsmd = '6.09'
dateTime = '2014-10-04T12:58:49 GMT'
diskStats = {'/tmp': {'free': '44526'},
'/var/log': {'free': '44526'},
'/var/log/core': {'free': '44526'},
'/var/run/vdsm/': {'free': '44526'}}
elapsedTime = '6469523'
generationID = '9e1d568d-7070-4c2d-8450-e0e56ccba533'
haScore = 2400
haStats = {'active': True,
'configured': True,
'globalMaintenance': False,
'localMaintenance': False,
'score': 2400}
ksmCpu = 0
ksmPages = 100
ksmState = False
memAvailable = 21479
memCommitted = 0
memFree = 21735
memShared = 0
memUsed = '53'
momStatus = 'inactive'
netConfigDirty = 'False'
network = {';vdsmdummy;': {'name': ';vdsmdummy;',
'rxDropped': '0',
'rxErrors': '0',
'rxRate': '0.0',
'speed': '1000',
'state': 'down',
'txDropped': '0',
'txErrors': '0',
'txRate': '0.0'},
'ALS_SCO': {'name': 'ALS_SCO',
'rxDropped': '0',
'rxErrors': '0',
'rxRate': '0.0',
'speed': '1000',
'state': 'up',
'txDropped': '0',
'txErrors': '0',
'txRate': '0.0'},
'PXE': {'name': 'PXE',
...
The working node, meanwhile, lists the stats for the new network as expected.
...
network = {';vdsmdummy;': {'name': ';vdsmdummy;',
'rxDropped': '0',
'rxErrors': '0',
'rxRate': '0.0',
'speed': '1000',
'state': 'down',
'txDropped': '0',
'txErrors': '0',
'txRate': '0.0'},
'EVD_DMZ': {'name': 'EVD_DMZ',
'rxDropped': '0',
'rxErrors': '0',
'rxRate': '0.0',
'speed': '1000',
'state': 'up',
'txDropped': '0',
'txErrors': '0',
'txRate': '0.0'},
'PXE': {'name': 'PXE',
...
Do you think perhaps the stats update could be falling over the same obstacle as the
interface status update for eth1?
There is an orphaned VM that was left in a locked state following a failed attempt to
commit a snapshot change some time ago.
I'm wondering if this might be causing an exception.
I've seen details of SQL cleanups employed against VMs in similar states, but
have not yet had time to attempt such surgery.
Due to a critical failure on a legacy system, this proof-of-concept oVirt deployment was
promoted from semi-active to active service with the addition of this new network.
The added pain of this development is that I now lack the same freedom to break things in
the name of science.
Can't help myself but smile at the comedy of it.
Cheers,
Lew