Wrong network threshold limit warnings on 4.2.5

Good morning,

since we upgraded to version 4.2.5, we get a lot of warnings about network interfaces exceeding the defined threshold limit. For example:

Aug 31, 2018, 7:54:05 AM Host xxx has network interface which exceeded the defined threshold [95%] (enp9s0.80: transmit rate[100%], receive rate [12%])

This is a 10 Gbit interface, and according to our monitoring software, which collects network statistics every 10 s, the TX bandwidth was 150 Mbit maximum at this time, so far away from 100%.

Could it be that the engine detected the wrong interface speed, or is there a calculation error? In the engine, this host shows 10000 Mbps for all interfaces.

I have now checked all these warnings on our different hosts, and they happen every time we go over 100 Mbit, which is quite often.

Can I disable these warnings, since we have this covered by our monitoring software anyway?

If you need any logs, please ask.

BR Florian Schmid

I've been seeing these warnings myself, on 1Gb ovirtmanagement (glusterFS is 10Gbe backend). I haven't correlated to network graphs yet but I don't know what would be happening on my management network that would be exhausting 1Gb network.

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/2NFL3O66IN4Z6H...

If you manage to recreate this, please collect a few samples of what the hypervisor reports back. Run the command:

    vdsm-client Host getStats

The engine calculates the rate based on this information (and the agent collects it from /sys/class/net/<device>/statistics/).

Please also mention which OS the hosts are running.

Thanks,
Edy
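To make the dependency on the reported speed concrete, here is a minimal sketch of how a utilization percentage could be derived from two getStats samples. It is not the engine's actual formula, which this thread does not show. It assumes that the "tx"/"rx" counters are byte counts and that "speed" is in Mbps; the function name utilization_percent is made up for this illustration, and counter wrap-around is ignored.

```python
def utilization_percent(prev, curr, direction="tx"):
    """Percent of link capacity used between two samples (illustrative only).

    Assumes 'tx'/'rx' are cumulative byte counters and 'speed' is in Mbps.
    """
    elapsed = float(curr["sampleTime"]) - float(prev["sampleTime"])
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    delta_bytes = int(curr[direction]) - int(prev[direction])
    bits_per_second = delta_bytes * 8 / elapsed
    capacity_bps = int(curr["speed"]) * 1_000_000  # Mbps -> bit/s
    return 100.0 * bits_per_second / capacity_bps


if __name__ == "__main__":
    # 187,500,000 bytes in 10 s is 150 Mbit/s of traffic.
    prev = {"tx": "0", "sampleTime": 0.0, "speed": "10000"}
    curr = {"tx": "187500000", "sampleTime": 10.0, "speed": "10000"}
    print(utilization_percent(prev, curr))

    # Same traffic, but with the speed reported ten times too low.
    curr_wrong = dict(curr, speed="1000")
    print(utilization_percent(prev, curr_wrong))
```

Whatever the exact formula is, any calculation of this shape scales inversely with the reported speed, so a speed that is reported ten times too low inflates the percentage tenfold for the same traffic.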

Hi Edward,

I got some alarms today from a server and I have checked your command there (not at the time the issue happened!).

Hosts are on latest patch level CentOS 7.5 and oVirt 4.2.5.

Example:

    cat /sys/class/net/enp9s0/speed
    10000
    cat /sys/class/net/enp9s0.80/speed
    10000
    cat /sys/class/net/vm-int-nfs/speed
    cat: /sys/class/net/vm-int-nfs/speed: invalid argument    <- this is the bridge for the VMs

vdsm-client Host getStats ->

    ...
    "enp9s0": {
        "rxErrors": "0",
        "name": "enp9s0",
        "tx": "3335325754762",
        "txDropped": "0",
        "sampleTime": 1535970960.602359,
        "rx": "5916567956502",
        "txErrors": "0",
        "state": "up",
        "speed": "10000",
        "rxDropped": "0"
    },
    ...
    "enp9s0.80": {
        "rxErrors": "0",
        "name": "enp9s0.80",
        "tx": "3180024039398",
        "txDropped": "0",
        "sampleTime": 1535970960.602359,
        "rx": "5669421065686",
        "txErrors": "0",
        "state": "up",
        "speed": "1000",
        "rxDropped": "0"
    },
    ...
    "vm-int-nfs": {
        "rxErrors": "0",
        "name": "vm-int-nfs",
        "tx": "508",
        "txDropped": "0",
        "sampleTime": 1535970960.602359,
        "rx": "4428568",
        "txErrors": "0",
        "state": "up",
        "speed": "1000",
        "rxDropped": "0"
    },
    ...

As you see here, vdsm is reporting the wrong speed for the vlan devices.

BR Florian Schmid
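The manual comparison above (sysfs speed versus the speed in the getStats output) could be automated roughly as follows. This is only a sketch: the helper names read_sysfs_speed and speed_mismatches are hypothetical, and the function expects a dictionary of per-interface entries shaped like the ones pasted in this thread, which you would have to extract from the getStats output yourself. Devices whose speed cannot be read, such as bridges that return "Invalid argument", are skipped.

```python
import os


def read_sysfs_speed(device, sysfs_root="/sys/class/net"):
    """Return the kernel-reported speed in Mbps, or None if unavailable."""
    path = os.path.join(sysfs_root, device, "speed")
    try:
        with open(path) as f:
            value = int(f.read().strip())
    except (OSError, ValueError):
        # Bridges and some virtual devices fail here (e.g. EINVAL on read).
        return None
    return value if value > 0 else None


def speed_mismatches(reported, sysfs_root="/sys/class/net"):
    """List (device, reported_speed, kernel_speed) where the two differ.

    'reported' maps interface names to entries with a 'speed' field,
    as in the getStats excerpts above (an assumption of this sketch).
    """
    mismatches = []
    for name, stats in sorted(reported.items()):
        kernel_speed = read_sysfs_speed(name, sysfs_root)
        if kernel_speed is None:
            continue
        reported_speed = int(stats["speed"])
        if reported_speed != kernel_speed:
            mismatches.append((name, reported_speed, kernel_speed))
    return mismatches
```

With the values from this host, enp9s0.80 would be listed (1000 reported, 10000 in sysfs), while the vm-int-nfs bridge would be skipped because its speed file cannot be read.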

Hello Edward,

I am also seeing this problem, it's on our ovirtmgmt.

    cat /sys/class/net/eno49/speed
    10000
    cat /sys/class/net/eno49.20/speed
    10000
    cat /sys/class/net/ovirtmgmt/speed
    cat: /sys/class/net/ovirtmgmt/speed: Invalid argument

vdsm-client Host getStats ->

    ...
    "eno49": {
        "rxErrors": "0",
        "name": "eno49",
        "tx": "3456777",
        "txDropped": "0",
        "sampleTime": 1535974190.687987,
        "rx": "121362321",
        "txErrors": "0",
        "state": "up",
        "speed": "10000",
        "rxDropped": "2"
    },
    "eno49.20": {
        "rxErrors": "0",
        "name": "eno49.20",
        "tx": "3384452",
        "txDropped": "0",
        "sampleTime": 1535974190.687987,
        "rx": "115884579",
        "txErrors": "0",
        "state": "up",
        "speed": "1000",
        "rxDropped": "0"
    },
    "ovirtmgmt": {
        "rxErrors": "0",
        "name": "ovirtmgmt",
        "tx": "3383804",
        "txDropped": "0",
        "sampleTime": 1535974190.687987,
        "rx": "115710919",
        "txErrors": "0",
        "state": "up",
        "speed": "1000",
        "rxDropped": "0"
    },

Regards,
Paul S.

To view the terms under which this email is distributed, please go to:- http://disclaimer.leedsbeckett.ac.uk/disclaimer/disclaimer.html
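The pattern in both reports (a VLAN device reported at 1000 while its underlying NIC is at 10000) can also be spotted from the getStats data alone. The sketch below assumes the naming convention seen in this thread, where a VLAN device is called <parent>.<tag>; that convention is not guaranteed on every host, and the function name vlan_speed_mismatches is hypothetical.

```python
def vlan_speed_mismatches(stats):
    """Find VLAN devices whose reported speed differs from their parent NIC.

    'stats' maps interface names to entries with a 'speed' field.
    A device counts as a VLAN only if it is named '<parent>.<numeric tag>'
    and '<parent>' is also present in 'stats' (assumption of this sketch).
    """
    mismatches = []
    for name, entry in sorted(stats.items()):
        parent, sep, tag = name.rpartition(".")
        if not sep or not tag.isdigit() or parent not in stats:
            continue
        parent_speed = stats[parent]["speed"]
        if entry["speed"] != parent_speed:
            mismatches.append((name, entry["speed"], parent, parent_speed))
    return mismatches


if __name__ == "__main__":
    sample = {
        "eno49": {"speed": "10000"},
        "eno49.20": {"speed": "1000"},
        "ovirtmgmt": {"speed": "1000"},
    }
    for vlan, vlan_speed, nic, nic_speed in vlan_speed_mismatches(sample):
        print(f"{vlan} reports {vlan_speed} Mbps, but {nic} reports {nic_speed} Mbps")
```

Bridges such as ovirtmgmt are not covered by this check, since their name does not identify the underlying device.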

Indeed looks like a nasty bug.

Could you please open a bug on this? https://tinyurl.com/ya7crjhf

If you can, could you also verify the fix? https://gerrit.ovirt.org/#/c/94132/

Thanks,
Edy.

Hello Edward,

I raised a bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1625098

I will try your patch.

Best regards,
Florian

Hello Edward, I have applied the patch and it looks very good! vdsm-client Host getStats -> ... "enp9s0.88": { "rxErrors": "0", "name": "enp9s0.88", "tx": "1226", "txDropped": "0", "sampleTime": 1536043097.701361, "rx": "98642", "txErrors": "0", "state": "up", "speed": "10000", "rxDropped": "0" }, ... Bridge devices still have only 1000 configured: "vm-int-dev": { "rxErrors": "0", "name": "vm-int-dev", "tx": "578", "txDropped": "0", "sampleTime": 1536043097.701361, "rx": "27843284", "txErrors": "0", "state": "up", "speed": "1000", "rxDropped": "0" }, One important question: I want to apply this patch without upgrading all hosts, because this is a huge task. When I apply that patch only to this particular file, which service do I need to restart? I have restarted now all three vdsm services, but I think, I can't do that while VMs are running on the hosts, do I? LG Florian Von: "Florian Schmid" <fschmid@ubimet.com> An: "edwardh" <edwardh@redhat.com> CC: "users" <users@ovirt.org> Gesendet: Dienstag, 4. September 2018 08:32:09 Betreff: [ovirt-users] Re: Wrong network threshold limit warnings on 4.2.5 Hello Edward, raised a bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1625098 I will try your patch. LG Florian Von: "Edward Haas" <ehaas@redhat.com> An: "p staniforth" <P.Staniforth@leedsbeckett.ac.uk>, "Florian Schmid" <fschmid@ubimet.com> CC: "users" <users@ovirt.org> Gesendet: Montag, 3. September 2018 16:42:25 Betreff: Re: Wrong network threshold limit warnings on 4.2.5 Indeed looks like a nasty bug. Could you please open a bug on this? [ https://tinyurl.com/ya7crjhf | https://tinyurl.com/ya7crjhf ] If you can, could you also verify the fix? [ https://gerrit.ovirt.org/#/c/94132/ | https://gerrit.ovirt.org/#/c/94132/ ] Thanks, Edy. On Mon, Sep 3, 2018 at 2:32 PM, Staniforth, Paul < [ mailto:P.Staniforth@leedsbeckett.ac.uk | P.Staniforth@leedsbeckett.ac.uk ] > wrote: Hello Edward, I am also seeing this problem, it's on our ovirtmgmt. 
cat /sys/class/net/eno49/speed
10000
cat /sys/class/net/eno49.20/speed
10000
cat /sys/class/net/ovirtmgmt/speed
cat: /sys/class/net/ovirtmgmt/speed: Invalid argument

vdsm-client Host getStats -> ...
"eno49": { "rxErrors": "0", "name": "eno49", "tx": "3456777", "txDropped": "0", "sampleTime": 1535974190.687987, "rx": "121362321", "txErrors": "0", "state": "up", "speed": "10000", "rxDropped": "2" },
"eno49.20": { "rxErrors": "0", "name": "eno49.20", "tx": "3384452", "txDropped": "0", "sampleTime": 1535974190.687987, "rx": "115884579", "txErrors": "0", "state": "up", "speed": "1000", "rxDropped": "0" },
"ovirtmgmt": { "rxErrors": "0", "name": "ovirtmgmt", "tx": "3383804", "txDropped": "0", "sampleTime": 1535974190.687987, "rx": "115710919", "txErrors": "0", "state": "up", "speed": "1000", "rxDropped": "0" },

Regards,
Paul S.

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/O2JKN5QYZUOV4T...
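One possible way to spot the discrepancy shown above is to compare the kernel's speed value with the one VDSM reports for the same device. The Python sketch below is illustrative only: the function name and both dictionaries are assumptions made for this example (the values are hand-copied from the output above, not fetched through any VDSM API), and `None` stands for a device such as a bridge whose `speed` file cannot be read.

```python
def find_speed_mismatches(sysfs_speeds, vdsm_network):
    """Return {device: (sysfs_mbps, vdsm_mbps)} where the two sources disagree.

    sysfs_speeds: device name -> speed in Mbps read from
                  /sys/class/net/<device>/speed, or None if unreadable.
    vdsm_network: device name -> stats dict with a "speed" string,
                  shaped like the getStats excerpts in this thread.
    """
    mismatches = {}
    for name, stats in vdsm_network.items():
        kernel_speed = sysfs_speeds.get(name)
        if kernel_speed is None:
            # No kernel value to compare against (e.g. a bridge device).
            continue
        reported = int(stats["speed"])
        if reported != kernel_speed:
            mismatches[name] = (kernel_speed, reported)
    return mismatches


# Values copied by hand from the message above (hypothetical input format).
sysfs_speeds = {"eno49": 10000, "eno49.20": 10000, "ovirtmgmt": None}
vdsm_network = {
    "eno49": {"speed": "10000"},
    "eno49.20": {"speed": "1000"},
    "ovirtmgmt": {"speed": "1000"},
}

print(find_speed_mismatches(sysfs_speeds, vdsm_network))
```

With these inputs only the VLAN device is flagged; the bridge is skipped because the kernel offers no value to compare against.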

Hello Florian,
Thanks for checking the patch and posting the bug.
You need to restart vdsmd and supervdsmd. It should not affect running VM/s, but you always have a risk that something unexpected can happen. Perhaps try it on a host and then proceed with others.
Thanks, Edy.
On Tue, Sep 4, 2018 at 9:45 AM, Florian Schmid <fschmid@ubimet.com> wrote:
Hello Edward,
I have applied the patch and it looks very good!
vdsm-client Host getStats -> ...
"enp9s0.88": { "rxErrors": "0", "name": "enp9s0.88", "tx": "1226", "txDropped": "0", "sampleTime": 1536043097.701361, "rx": "98642", "txErrors": "0", "state": "up", "speed": "10000", "rxDropped": "0" },
...
Bridge devices still have only 1000 configured:
"vm-int-dev": { "rxErrors": "0", "name": "vm-int-dev", "tx": "578", "txDropped": "0", "sampleTime": 1536043097.701361, "rx": "27843284", "txErrors": "0", "state": "up", "speed": "1000", "rxDropped": "0" },
One important question: I want to apply this patch without upgrading all hosts, because that would be a huge task. If I apply the patch only to this particular file, which services do I need to restart? I have now restarted all three vdsm services, but I assume I can't do that while VMs are running on the hosts, can I?
Kind regards, Florian
------------------------------ *From: *"Florian Schmid" <fschmid@ubimet.com> *To: *"edwardh" <edwardh@redhat.com> *Cc: *"users" <users@ovirt.org> *Sent: *Tuesday, 4 September 2018 08:32:09 *Subject: *[ovirt-users] Re: Wrong network threshold limit warnings on 4.2.5
Hello Edward,
I have raised a bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1625098
I will try your patch.
Kind regards, Florian
------------------------------ *From: *"Edward Haas" <ehaas@redhat.com> *To: *"p staniforth" <P.Staniforth@leedsbeckett.ac.uk>, "Florian Schmid" <fschmid@ubimet.com> *Cc: *"users" <users@ovirt.org> *Sent: *Monday, 3 September 2018 16:42:25 *Subject: *Re: Wrong network threshold limit warnings on 4.2.5
Indeed looks like a nasty bug. Could you please open a bug on this? https://tinyurl.com/ya7crjhf
If you can, could you also verify the fix? https://gerrit.ovirt.org/#/c/94132/
Thanks, Edy.
On Mon, Sep 3, 2018 at 2:32 PM, Staniforth, Paul < P.Staniforth@leedsbeckett.ac.uk> wrote:
Hello Edward,
I am also seeing this problem, it's on our ovirtmgmt.
cat /sys/class/net/eno49/speed
10000
cat /sys/class/net/eno49.20/speed
10000
cat /sys/class/net/ovirtmgmt/speed
cat: /sys/class/net/ovirtmgmt/speed: Invalid argument
vdsm-client Host getStats -> ...
"eno49": { "rxErrors": "0", "name": "eno49", "tx": "3456777", "txDropped": "0", "sampleTime": 1535974190.687987, "rx": "121362321", "txErrors": "0", "state": "up", "speed": "10000", "rxDropped": "2" },
"eno49.20": { "rxErrors": "0", "name": "eno49.20", "tx": "3384452", "txDropped": "0", "sampleTime": 1535974190.687987, "rx": "115884579", "txErrors": "0", "state": "up", "speed": "1000", "rxDropped": "0" },
"ovirtmgmt": { "rxErrors": "0", "name": "ovirtmgmt", "tx": "3383804", "txDropped": "0", "sampleTime": 1535974190.687987, "rx": "115710919", "txErrors": "0", "state": "up", "speed": "1000", "rxDropped": "0" },
Regards,
Paul S.
------------------------------ *From:* Florian Schmid <fschmid@ubimet.com> *Sent:* 03 September 2018 11:44 *To:* edwardh@redhat.com *Cc:* users *Subject:* [ovirt-users] Re: Wrong network threshold limit warnings on 4.2.5
Hi Edward,
I got some alarms today from a server and I have checked your command there. (not at the time the issue happened!!) Hosts are on latest patch level CentOS 7.5 and oVirt 4.2.5
Example:
cat /sys/class/net/enp9s0/speed
10000
cat /sys/class/net/enp9s0.80/speed
10000
cat /sys/class/net/vm-int-nfs/speed
cat: /sys/class/net/vm-int-nfs/speed: invalid argument <- this is the bridge for the VMs
vdsm-client Host getStats -> ...
"enp9s0": { "rxErrors": "0", "name": "enp9s0", "tx": "3335325754762", "txDropped": "0", "sampleTime": 1535970960.602359, "rx": "5916567956502", "txErrors": "0", "state": "up", "speed": "10000", "rxDropped": "0" },
...
"enp9s0.80": { "rxErrors": "0", "name": "enp9s0.80", "tx": "3180024039398", "txDropped": "0", "sampleTime": 1535970960.602359, "rx": "5669421065686", "txErrors": "0", "state": "up", "speed": "1000", "rxDropped": "0" },
...
"vm-int-nfs": { "rxErrors": "0", "name": "vm-int-nfs", "tx": "508", "txDropped": "0", "sampleTime": 1535970960.602359, "rx": "4428568", "txErrors": "0", "state": "up", "speed": "1000", "rxDropped": "0" },
...
As you see here, vdsm is reporting the wrong speed for the vlan devices.
BR Florian Schmid
------------------------------ *From: *"Edward Haas" <ehaas@redhat.com> *To: *"Jayme" <jaymef@gmail.com>, "Florian Schmid" <fschmid@ubimet.com> *Cc: *"users" <users@ovirt.org>, "Alona Kaplan" <alkaplan@redhat.com> *Sent: *Monday, 3 September 2018 11:38:25 *Subject: *Re: [ovirt-users] Re: Wrong network threshold limit warnings on 4.2.5
If you manage to recreate this, please collect a few samples of what the hypervisor reports back. Run the command: vdsm-client Host getStats
The Engine calculates the rate based on this information (and the agent collects it from /sys/class/net/<device>/statistics/).
Please also mention which OS your hosts are running.
Thanks, Edy
On Fri, Aug 31, 2018 at 5:35 PM, Jayme <jaymef@gmail.com> wrote:
I've been seeing these warnings myself, on 1Gb ovirtmanagement (glusterFS is 10Gbe backend). I haven't correlated to network graphs yet but I don't know what would be happening on my management network that would be exhausting 1Gb network.
On Fri, Aug 31, 2018 at 3:27 AM Florian Schmid <fschmid@ubimet.com> wrote:
Good morning,
since we upgraded to version 4.2.5, we have been getting a lot of warnings about network interfaces exceeding the defined threshold limits.
For example: Aug 31, 2018, 7:54:05 AM Host xxx has network interface which exceeded the defined threshold [95%] (enp9s0.80: transmit rate[100%], receive rate [12%])
This is a 10 Gbit interface, and according to our monitoring software, which collects network statistics every 10s, the TX bandwidth was 150 Mbit at most at this time, so far away from 100%.
Could it be that the engine detected the wrong interface speed, or that there is a calculation error? In the engine, I have 10000 Mbps for all interfaces of this host.
I have now checked all those warnings on our different hosts, and they happen every time we go over 100 Mbit, which is quite often...
Can I maybe disable these warnings, since we have this covered by our monitoring software anyway?
If you need any logs, please ask.
BR Florian Schmid
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/2NFL3O66IN4Z6HUK45WQFXRUBMQDUY7P/

On Tue, Sep 4, 2018 at 9:02 AM Edward Haas <ehaas@redhat.com> wrote:
Hello Florian,
Thanks for checking the patch and posting the bug.
You need to restart vdsmd and supervdsmd. It should not affect running VM/s, but you always have a risk that something unexpected can happen. Perhaps try it on a host and then proceed with others.
Thanks, Edy.
I'm having a similar problem in a 3-host oVirt test cluster, with these notifications every day on 1Gbit adapters. I have bond0 on em1 and em2, and then bond0.65, bond0.68 and bond0.167 VLANs defined for the VMs. I get these warnings:

Message:Host ov300 has network interface which exceeded the defined threshold [95%] (em1: transmit rate[98%], receive rate [0%])

when actually I think the 3 VMs running on this host generate only a few MB/s of traffic. I applied the changes to the 3 hosts.

I notice that due to dependencies it is sufficient to restart supervdsmd and then vdsmd will also be restarted automatically, correct?

In my case, for each of the 3 hosts, after restarting supervdsmd I got messages like these, but without impact on running VMs:

VDSM ov300 command GetStatsAsyncVDS failed: Broken pipe 9/4/18 9:07:52 AM
Host ov300 is not responding. It will stay in Connecting state for a grace period of 61 seconds and after that an attempt to fence the host will be issued. 9/4/18 9:07:52 AM
No faulty multipath paths on host ov300 9/4/18 9:07:58 AM
Executing power management status on Host ov300 using Proxy Host ov200 and Fence Agent ipmilan:10.10.193.103. 9/4/18 9:07:58 AM
Status of host ov300 was set to Up. 9/4/18 9:07:58 AM
Host ov300 power management was verified successfully. 9/4/18 9:07:58 AM

Please note that when doing this on the SPM host you could also get these:

VDSM ov301 command SpmStatusVDS failed: Broken pipe 9/4/18 9:10:00 AM
Host ov301 is not responding. It will stay in Connecting state for a grace period of 81 seconds and after that an attempt to fence the host will be issued. 9/4/18 9:10:00 AM
Invalid status on Data Center MYDC. Setting Data Center status to Non Responsive (On host ov301, Error: Network error during communication with the Host.). 9/4/18 9:10:00 AM

with reassignment of SPM role:

VDSM command GetStoragePoolInfoVDS failed: Heartbeat exceeded 9/4/18 9:10:12 AM
Storage Pool Manager runs on Host ov200 (Address: ov200), Data Center MYDC. 9/4/18 9:10:14 AM

Probably safer to manually move the SPM before restarting supervdsmd on that host.

Let's see this evening if I will get any message about thresholds.

BTW: one question. I see in the code iface.Type.NIC and now also iface.Type.BOND. Don't you think that you should also manage the network teaming option available in RHEL 7, as described here: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/htm... ? This only applies if using the new network teaming implementation is supported in oVirt, and I'm not sure about that...

Thanks, Gianluca
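To put the figures from this thread in proportion, here is a rough Python sketch of how a utilisation percentage can be derived from two byte-counter samples and a link speed. It is not the Engine's actual code: the function name and the formula are assumptions made for illustration, and the 5 MB/s figure is a hypothetical stand-in for "a few MB/s".

```python
def utilisation_percent(bytes_before, bytes_after, seconds, speed_mbps):
    """Share of link capacity used between two counter samples, in percent.

    bytes_before/bytes_after: tx (or rx) byte counters at the two samples.
    seconds: time between the samples.
    speed_mbps: link speed in Mbit/s, as in the "speed" field of getStats.
    """
    bits_sent = (bytes_after - bytes_before) * 8
    capacity_bits = speed_mbps * 1_000_000 * seconds
    return 100.0 * bits_sent / capacity_bits


# Hypothetical: 5 MB/s for 10 s on a 1 Gbit adapter.
print(utilisation_percent(0, 50_000_000, 10, 1000))

# 150 Mbit/s for 10 s (187,500,000 bytes), evaluated against the real
# 10 Gbit speed and against a wrongly reported 1 Gbit speed.
print(utilisation_percent(0, 187_500_000, 10, 10000))
print(utilisation_percent(0, 187_500_000, 10, 1000))
```

Under this assumed formula, a speed that is reported ten times too low inflates the computed percentage tenfold, so a correct speed value matters for any threshold check.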

On Tue, Sep 4, 2018 at 10:42 AM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Tue, Sep 4, 2018 at 9:02 AM Edward Haas <ehaas@redhat.com> wrote:
Hello Florian,
Thanks for checking the patch and posting the bug.
You need to restart vdsmd and supervdsmd. It should not affect running VM/s, but you always have a risk that something unexpected can happen. Perhaps try it on a host and then proceed with others.
Thanks, Edy.
I'm having a similar problem in a 3-host oVirt test cluster, with these notifications every day on 1Gbit adapters. I have bond0 on em1 and em2, and then bond0.65, bond0.68 and bond0.167 VLANs defined for the VMs. I get these warnings:
Message:Host ov300 has network interface which exceeded the defined threshold [95%] (em1: transmit rate[98%], receive rate [0%])
when actually I think the 3 VMs running on this host generate only a few MB/s of traffic. I applied the changes to the 3 hosts.
I notice that due to dependencies it is sufficient to restart supervdsmd and then vdsmd will also be restarted automatically, correct?
In my case, for each of the 3 hosts, after restarting supervdsmd I got messages like these, but without impact on running VMs:
VDSM ov300 command GetStatsAsyncVDS failed: Broken pipe 9/4/18 9:07:52 AM
Host ov300 is not responding. It will stay in Connecting state for a grace period of 61 seconds and after that an attempt to fence the host will be issued. 9/4/18 9:07:52 AM
No faulty multipath paths on host ov300 9/4/18 9:07:58 AM
Executing power management status on Host ov300 using Proxy Host ov200 and Fence Agent ipmilan:10.10.193.103. 9/4/18 9:07:58 AM
Status of host ov300 was set to Up. 9/4/18 9:07:58 AM
Host ov300 power management was verified successfully. 9/4/18 9:07:58 AM
Please note that when doing this on the SPM host you could also get these:
VDSM ov301 command SpmStatusVDS failed: Broken pipe 9/4/18 9:10:00 AM
Host ov301 is not responding. It will stay in Connecting state for a grace period of 81 seconds and after that an attempt to fence the host will be issued. 9/4/18 9:10:00 AM
Invalid status on Data Center MYDC. Setting Data Center status to Non Responsive (On host ov301, Error: Network error during communication with the Host.). 9/4/18 9:10:00 AM
with reassignment of SPM role:
VDSM command GetStoragePoolInfoVDS failed: Heartbeat exceeded 9/4/18 9:10:12 AM
Storage Pool Manager runs on Host ov200 (Address: ov200), Data Center MYDC. 9/4/18 9:10:14 AM
Probably safer to manually move the SPM before restarting supervdsmd on that host.
Let's see this evening if I will get any message about thresholds.
BTW: one question. I see in the code iface.Type.NIC and now also iface.Type.BOND. Don't you think that you should manage also the network teaming option available in RH EL 7, as described here: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/ch-configure_network_teaming ? This only if it is supported to use the new network teaming implementation in oVirt, and I'm not sure about it...
There are no immediate plans to support it in VDSM. We are evaluating options to change the way we interact with host networking, which may open the door for team devices and others.
Thanks, Gianluca
participants (5)
- Edward Haas
- Florian Schmid
- Gianluca Cecchi
- Jayme
- Staniforth, Paul