On Tue, Sep 4, 2018 at 10:42 AM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Tue, Sep 4, 2018 at 9:02 AM Edward Haas <ehaas@redhat.com> wrote:
Hello Florian,

Thanks for checking the patch and posting the bug.

You need to restart vdsmd and supervdsmd.
It should not affect running VMs, but there is always a risk that something unexpected happens. Perhaps try it on one host first and then proceed with the others.

Thanks,
Edy.

I'm having a similar problem in a 3-host oVirt test cluster, with these notifications every day on 1 Gbit adapters.
I have bond0 on em1 and em2, and then the VLANs bond0.65, bond0.68 and bond0.167 defined for the VMs.
I get these warnings:
Message:Host ov300 has network interface which exceeded the defined threshold [95%] (em1: transmit rate[98%], receive rate [0%])
even though I think the 3 VMs running on this host actually generate only a few MB/s of traffic.
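To put numbers on why 98% looks implausible: utilization is the observed rate divided by the link speed. Below is a minimal Python sketch, assuming a 1 Gbit/s (1000 Mbit/s) link and taking "a few MB/s" as 5 MB/s (an illustrative figure, not a measurement from these hosts). The function name is hypothetical and is not taken from VDSM.

```python
def utilization_percent(rate_bytes_per_s: float, link_speed_mbps: float) -> float:
    """Return link utilization as a percentage of the link speed (hypothetical helper)."""
    rate_mbps = rate_bytes_per_s * 8 / 1_000_000  # convert bytes/s to Mbit/s
    return 100.0 * rate_mbps / link_speed_mbps


if __name__ == "__main__":
    # 5 MB/s on a 1 Gbit/s link: 40 Mbit/s out of 1000 Mbit/s.
    print(round(utilization_percent(5_000_000, 1000), 1))
```

With these assumptions the result is about 4%, far below a 95% threshold, so a 98% reading for such traffic suggests that either the measured rate or the speed used as the denominator is wrong.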
I applied the changes to the 3 hosts. 

I noticed that, due to dependencies, it is sufficient to restart supervdsmd: vdsmd is then restarted automatically as well. Is that correct?

In my case, for each of the 3 hosts, after restarting supervdsmd I got messages like these, but with no impact on running VMs:

VDSM ov300 command GetStatsAsyncVDS failed: Broken pipe 9/4/18 9:07:52 AM
Host ov300 is not responding. It will stay in Connecting state for a grace period of 61 seconds and after that an attempt to fence the host will be issued. 9/4/18 9:07:52 AM
No faulty multipath paths on host ov300 9/4/18 9:07:58 AM
Executing power management status on Host ov300 using Proxy Host ov200 and Fence Agent ipmilan:10.10.193.103. 9/4/18 9:07:58 AM
Status of host ov300 was set to Up. 9/4/18 9:07:58 AM
Host ov300 power management was verified successfully. 9/4/18 9:07:58 AM

Please note that when doing this on the SPM host you could also get these:

VDSM ov301 command SpmStatusVDS failed: Broken pipe 9/4/18 9:10:00 AM
Host ov301 is not responding. It will stay in Connecting state for a grace period of 81 seconds and after that an attempt to fence the host will be issued. 9/4/18 9:10:00 AM
Invalid status on Data Center MYDC. Setting Data Center status to Non Responsive (On host ov301, Error: Network error during communication with the Host.). 9/4/18 9:10:00 AM

with reassignment of the SPM role:
VDSM command GetStoragePoolInfoVDS failed: Heartbeat exceeded 9/4/18 9:10:12 AM
Storage Pool Manager runs on Host ov200 (Address: ov200), Data Center MYDC. 9/4/18 9:10:14 AM

It is probably safer to manually move the SPM role to another host before restarting supervdsmd on the current SPM host.
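A rolling restart following this advice (one host at a time, the SPM host last) could be sketched in Python as below. Assumptions: passwordless root SSH to each host, and the host names ov200/ov300/ov301 with ov301 as SPM, taken from the logs above. The helper names are hypothetical. The script does not move the SPM role (do that from the engine first) and does not wait for each host to come back Up before continuing; it only prints the commands unless `dry_run=False`.

```python
import subprocess
from typing import List, Sequence


def restart_order(hosts: Sequence[str], spm_host: str) -> List[str]:
    """Order hosts so the current SPM host is handled last."""
    others = [h for h in hosts if h != spm_host]
    return others + ([spm_host] if spm_host in hosts else [])


def restart_command() -> List[str]:
    # Restarting supervdsmd was observed to restart vdsmd too; listing both
    # units explicitly avoids relying on that dependency.
    return ["systemctl", "restart", "supervdsmd", "vdsmd"]


def run_on_host(host: str, dry_run: bool = True) -> List[str]:
    """Build (and optionally run) the restart command over SSH for one host."""
    cmd = ["ssh", f"root@{host}"] + restart_command()
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    for host in restart_order(["ov200", "ov300", "ov301"], spm_host="ov301"):
        print(" ".join(run_on_host(host)))  # dry run: print only
```

Keeping the SPM host last gives you a chance to check the first hosts before touching the one holding the SPM role.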

Let's see whether I get any threshold messages this evening.

BTW, one question: I see iface.Type.NIC and now also iface.Type.BOND in the code. Don't you think you should also handle the network teaming option available in RHEL 7, as described here:
That applies only if using the new network teaming implementation is supported in oVirt, and I'm not sure about that...
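To illustrate the question, here is a minimal Python sketch of how a TEAM type could sit next to NIC and BOND when resolving the speed used for the threshold check. Everything here is hypothetical: `IfaceType`, `Iface` and `effective_speed_mbps` are not VDSM's actual `iface` module, and how the patch really uses these types is not shown in this thread. Summing member speeds is a simplification that fits load-balancing modes but not active-backup.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class IfaceType(Enum):
    NIC = "nic"
    BOND = "bond"
    TEAM = "team"  # hypothetical addition for RHEL 7 network teaming
    VLAN = "vlan"


@dataclass
class Iface:
    name: str
    type: IfaceType
    speed_mbps: Optional[int] = None                     # only set for NICs here
    ports: List["Iface"] = field(default_factory=list)   # bond/team members
    parent: Optional["Iface"] = None                     # VLAN base device


def effective_speed_mbps(iface: Iface) -> int:
    """Resolve the speed to compare traffic against (simplified sketch)."""
    if iface.type is IfaceType.NIC:
        return iface.speed_mbps or 0
    if iface.type in (IfaceType.BOND, IfaceType.TEAM):
        # Simplification: aggregate = sum of member speeds.
        return sum(effective_speed_mbps(p) for p in iface.ports)
    if iface.type is IfaceType.VLAN:
        return effective_speed_mbps(iface.parent) if iface.parent else 0
    raise ValueError(f"unsupported interface type: {iface.type}")
```

For the setup described above, bond0.65 would resolve through bond0 to em1 and em2; a team device would take the same path as a bond.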

There are no immediate plans to support it in VDSM.
We are evaluating options to change the way we interact with host networking, which may open the door for team and others to get in.


Thanks,
Gianluca