Hi Strahil,
the server uses Intel NICs with ixgbe and igb kernel drivers.
I did upgrade the firmware to the latest available version (through the Dell
Lifecycle Controller).
I also tried replacing the network card itself but without success.
As this issue did not arise when running Debian 10, or even oVirt Node
before adding it to the cluster, I don't think it's hardware related. For
my testing I mounted my oVirt datastore manually on the fresh install of
oVirt Node (using the ISO) and then copied a large ISO file to the local
disk. This saturates the NIC at the full 1 Gbit/s I have available there
for a good 5 to 10 minutes.
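For anyone who wants to repeat this kind of copy test and get a number out of it, here is a minimal sketch in Python. It only times a file copy from the mounted datastore to local disk and reports the average rate. The function name `timed_copy` and the paths in the comment are placeholders, not anything oVirt provides.

```python
import shutil
import time
from pathlib import Path
from typing import Tuple


def timed_copy(src: Path, dst: Path, chunk_size: int = 16 * 1024 * 1024) -> Tuple[int, float, float]:
    """Copy src to dst and return (bytes copied, seconds elapsed, average Mbit/s)."""
    start = time.monotonic()
    with src.open("rb") as fin, dst.open("wb") as fout:
        # Stream the file in large chunks so memory use stays flat.
        shutil.copyfileobj(fin, fout, length=chunk_size)
    elapsed = time.monotonic() - start
    size = dst.stat().st_size
    # Guard against a zero interval on very small files.
    mbit_per_s = (size * 8 / 1_000_000) / elapsed if elapsed > 0 else float("inf")
    return size, elapsed, mbit_per_s


# Example call (hypothetical paths, adjust to the actual mount point):
# size, secs, rate = timed_copy(Path("/mnt/datastore/large.iso"), Path("/var/tmp/large.iso"))
```

The reported rate includes local disk write time and page-cache effects, so it is only a rough indicator of what the NIC is carrying.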
Also the administration through cockpit works perfectly before adding it
to the cluster.
As soon as I add the node to the cluster the trouble starts.
1. oVirt reports that the install has failed on this host
2. the node logs (kernel log) adapter resets on some interfaces (even
ones that aren't UP)
3. the engine loses connection to the host and declares it "Unresponsive"
4. the node becomes unmanageable through cockpit or ssh because the
connection is lost repeatedly.
5. the fencing agent reboots the node (if fencing is enabled)
6. node comes up and gets added to the cluster (oVirt says the node is
in state UP)
7. repeat from step 2
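To put numbers on step 2, a rough sketch like the following could count reset-related kernel messages per driver. The message fragments in the pattern are assumptions about what ixgbe and igb typically log; the exact wording depends on the kernel and driver version, so adjust the pattern to what the node actually prints. `count_resets` is a made-up helper name.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed message fragments; wording varies by kernel and driver version.
RESET_PATTERN = re.compile(
    r"\b(ixgbe|igb)\b.*(reset adapter|tx unit hang|timed out)",
    re.IGNORECASE,
)


def count_resets(lines: Iterable[str]) -> Counter:
    """Count kernel log lines that look like adapter resets, keyed by driver name."""
    counts: Counter = Counter()
    for line in lines:
        match = RESET_PATTERN.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts
```

One way to use it is to pipe the output of `journalctl -k` into a small script that calls `count_resets(sys.stdin)` and prints the result, once with the host active in the cluster and once in maintenance, and compare the two.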
It seems that this behavior stops when I put the node into maintenance.
Then I can even mount the Datastore manually and transfer large ISOs
without it dropping the connection.
This is all very strange and I don't understand what causes this.
Thank you.
--
Best regards
Tivon Häberlein
On 11.07.2021 13:51, Strahil Nikolov wrote:
Are you sure it's not a HW issue?
Try to update the server to the latest firmware and test again. At least it
won't hurt.
Best Regards,
Strahil Nikolov
On Sat, Jul 10, 2021 at 14:45, Tivon Häberlein
<tivon.haeberlein(a)secges.de> wrote:
Hi,
I've been trying to get oVirt Node 4.4.6 up and running on my Dell
R620 hosts but am facing a strange issue where seemingly all
network adapters get reset at random times after install.
The interfaces reset as soon as a bit of traffic is flowing
through them.
Also the logs show NFS timeouts.
This only happens after I have installed the host using the oVirt
engine and it also only happens when the host is connected to the
engine. When the host is in maintenance mode it also seems to work
fine.
The host and networks work fine when it's by itself (I tested right
after install using the ISO and also after I have removed the host
from the cluster).
I can't figure out why this is happening. Am I missing something?
I've been stuck on this for the last couple of weeks, a bit of
help would be much appreciated.
Thank you!
My cluster is looking like this:
Engine: oVirt 4.4.6 - CentOS Linux release 8.3.2011
host1: oVirt 4.4 repository on CentOS Linux release 8.4.2105
host2: oVirt 4.4 repository on CentOS Linux release 8.4.2105
host3 (this is the one I'm trying to install): oVirt node 4.4.6
--
Best regards
Tivon Häberlein