On Tue, Sep 4, 2018 at 10:42 AM, Gianluca Cecchi <gianluca.cecchi(a)gmail.com>
wrote:
On Tue, Sep 4, 2018 at 9:02 AM Edward Haas <ehaas(a)redhat.com>
wrote:
> Hello Florian,
>
> Thanks for checking the patch and posting the bug.
>
> You need to restart vdsmd and supervdsmd.
> It should not affect running VM/s, but you always have a risk that
> something unexpected can happen. Perhaps try it on a host and then proceed
> with others.
>
> Thanks,
> Edy.
>
I'm having a similar problem in a 3-host oVirt test cluster, with these
notifications every day on 1 Gbit adapters.
I have bond0 on em1 and em2, and then the bond0.65, bond0.68 and bond0.167
VLANs defined for the VMs.
I get these warnings:
Message:Host ov300 has network interface which exceeded the defined threshold [95%] (em1:
transmit rate[98%], receive rate [0%])
when actually I think the 3 VMs running on this host generate only a few
MB/s of traffic.
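
As a rough sanity check of the figures above, link utilization can be
estimated from a byte counter delta and the nominal link speed. This is
only illustrative arithmetic, not the code VDSM or the engine uses; the
function name and the 5 MB/s sample are assumptions made for the example.

```python
def utilization_percent(bytes_delta, interval_s, speed_mbps):
    """Return one-direction link utilization in percent.

    bytes_delta: bytes transferred during the sampling interval
    interval_s:  length of the sampling interval in seconds
    speed_mbps:  nominal link speed in Mbit/s (1000 for a 1 Gbit NIC)
    """
    if interval_s <= 0 or speed_mbps <= 0:
        raise ValueError("interval and speed must be positive")
    bits_per_second = bytes_delta * 8 / interval_s
    return 100.0 * bits_per_second / (speed_mbps * 1_000_000)


# Hypothetical sample: 5 MB/s sustained for 15 seconds on a 1 Gbit adapter.
print(utilization_percent(5_000_000 * 15, 15, 1000))  # 4.0
```

With numbers like these, a few MB/s should come out at a few percent,
nowhere near the 95% threshold, which is why the 98% reading looks wrong.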
I applied the changes to the 3 hosts.
I notice that, due to dependencies, it is sufficient to restart supervdsmd
and vdsmd will then be restarted automatically, correct?
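
One way to check this on a host is to look at the dependency properties
systemd reports for vdsmd. The sketch below assumes that a restart
propagates through Requires=, BindsTo= or PartOf=, as documented for
systemd units; the helper names and the sample output are hypothetical,
and what vdsmd.service really declares should be verified on the host.

```python
import subprocess

# Unit properties through which a restart of the listed unit propagates.
PROPAGATING = ("Requires", "BindsTo", "PartOf")


def restart_propagates(show_output, dependency):
    """Return True if `dependency` is listed in a propagating property.

    show_output is text in `systemctl show` format, one KEY=VALUE per line.
    """
    for line in show_output.splitlines():
        key, _, value = line.partition("=")
        if key in PROPAGATING and dependency in value.split():
            return True
    return False


def query_unit(unit):
    """Ask systemd for the dependency properties of `unit` (needs systemd)."""
    cmd = ["systemctl", "show", "--property=" + ",".join(PROPAGATING), unit]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical output, not captured from a real host.
sample = "Requires=supervdsmd.service basic.target\nBindsTo=\nPartOf=\n"
print(restart_propagates(sample, "supervdsmd.service"))  # True
```

On a real host this would be called as
restart_propagates(query_unit("vdsmd.service"), "supervdsmd.service").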
In my case, on each of the 3 hosts, after restarting supervdsmd I got
messages like these, but without any impact on running VMs:
VDSM ov300 command GetStatsAsyncVDS failed: Broken pipe 9/4/18 9:07:52 AM
Host ov300 is not responding. It will stay in Connecting state for a grace
period of 61 seconds and after that an attempt to fence the host will be
issued. 9/4/18 9:07:52 AM
No faulty multipath paths on host ov300 9/4/18 9:07:58 AM
Executing power management status on Host ov300 using Proxy Host ov200 and
Fence Agent ipmilan:10.10.193.103. 9/4/18 9:07:58 AM
Status of host ov300 was set to Up. 9/4/18 9:07:58 AM
Host ov300 power management was verified successfully. 9/4/18 9:07:58 AM
Please note that when doing this on the SPM host you could also get these:
VDSM ov301 command SpmStatusVDS failed: Broken pipe 9/4/18 9:10:00 AM
Host ov301 is not responding. It will stay in Connecting state for a grace
period of 81 seconds and after that an attempt to fence the host will be
issued. 9/4/18 9:10:00 AM
Invalid status on Data Center MYDC. Setting Data Center status to Non
Responsive (On host ov301, Error: Network error during communication with
the Host.). 9/4/18 9:10:00 AM
with reassignment of SPM role:
VDSM command GetStoragePoolInfoVDS failed: Heartbeat exceeded 9/4/18
9:10:12 AM
Storage Pool Manager runs on Host ov200 (Address: ov200), Data Center
MYDC. 9/4/18 9:10:14 AM
It is probably safer to manually move the SPM role to another host before
restarting supervdsmd on that host.
Let's see this evening if I will get any message about thresholds.
BTW, one question: I see iface.Type.NIC and now also iface.Type.BOND in the
code. Don't you think you should also handle the network teaming option
available in RHEL 7, as described here:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/ch-configure_network_teaming
?
This applies only if using the new network teaming implementation is
supported in oVirt, and I'm not sure about that...
There are no immediate plans to support it in VDSM.
We are evaluating options for changing the way we interact with host
networking, which may open the door to supporting team and others.