Hi Nathaniel,
Thanks for your time, and sorry for the late reply.
Even though my NICs don't use the e1000e driver, I took a Broadcom
NIC from the stash and gave it a try.
I'm happy to report that the interfaces don't seem to reset with the
Broadcom NIC.
This points to a driver issue with the Intel NICs I have been trying.
I still can't get the host into an operational state because of "Failed to
connect Host n3 to Storage Pool cl1", even though NFS is mounted properly,
but I think this is a different issue.
If you want, I can reproduce the issue and grab all the logs to maybe find a
fix other than "get a Broadcom NIC" for the community.
To be honest though, I think this can just be added to the "weird driver
fuckups in CentOS" list if we start digging.
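
If the Intel case does get reproduced, a small filter along these lines could pull the relevant kernel messages out of the log before the node gets fenced. This is only a sketch: the message patterns are assumptions about what the ixgbe/igb drivers typically log, not strings taken from this host, and find_reset_events is a made-up helper name.

```python
import re
from typing import Iterable, List

# Assumed message fragments; adjust them to whatever the kernel log
# on the affected host actually contains.
EVENT_PATTERN = re.compile(
    r"reset adapter|tx unit hang|nic link is (up|down)",
    re.IGNORECASE,
)

# Drivers mentioned in this thread.
DRIVER_PATTERN = re.compile(r"\b(ixgbe|igb|e1000e)\b", re.IGNORECASE)


def find_reset_events(lines: Iterable[str]) -> List[str]:
    """Return the log lines that name one of the drivers above and
    contain one of the assumed reset/link message fragments."""
    hits = []
    for line in lines:
        if DRIVER_PATTERN.search(line) and EVENT_PATTERN.search(line):
            hits.append(line.rstrip("\n"))
    return hits
```

It could be fed with saved kernel log output (for example from `journalctl -k`), passing an open file or `sys.stdin` as `lines`.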
--
Best regards
Tivon Häberlein
On 13.07.2021 01:07, Nathaniel Roach via Users wrote:
On 12/7/21 11:44 pm, Nathaniel Roach via Users wrote:
>
> Do you get anything in the logs at all? For something like this I
> would expect it to show in syslog from the kernel.
>
> It really does sound like the e1000e issue, but will probably have a
> different fix - I first encountered it on a router when I was pushing
> more than 100 Mbps in *and then back out* the same NIC. Otherwise it
> wouldn't happen at all. That would explain why it's not an issue in
> maintenance mode and why downloading an image works fine.
>
> On 12/7/21 7:57 am, Tivon Häberlein wrote:
>>
>> Hi Strahil,
>>
>> the server uses Intel NICs with ixgbe and igb kernel drivers.
>> I did upgrade the firmware to the latest available one (through the Dell
>> Lifecycle Controller).
>> I also tried replacing the network card itself but without success.
>>
>> As this issue did not arise when running Debian 10 or even oVirt
>> Node before adding it to the cluster, I don't think it's hardware
>> related. For my testing I mounted my oVirt Datastore manually on the
>> fresh install of oVirt Node (using the ISO) and then copied a large
>> ISO file to the local disk. This fills the NIC up to the full 1
>> Gbit/s I have available there for a good 5 to 10 minutes.
>> Also the administration through cockpit works perfectly before
>> adding it to the cluster.
>>
>> As soon as I add the node to the cluster the trouble starts.
>> 1. oVirt reports that the install has failed on this host
>> 2. the node logs (kernel log) adapter resets on some interfaces
>> (even ones that aren't UP)
>>
Having read your message again, are you able to capture these log
events before the node gets fenced (or just disable fencing for the time)?
>>
>> 3. the engine loses connection to the host and declares it
>> "Unresponsive"
>> 4. the node becomes unmanageable through cockpit or ssh because the
>> connection is lost repeatedly.
>> 5. the fencing agent reboots the node (if fencing is enabled)
>> 6. node comes up and gets added to the cluster (oVirt says the node
>> is in state UP)
>> 7. repeat from step 2
>>
>> It seems that this behavior stops when I put the node into
>> maintenance. Then I can even mount the Datastore manually and
>> transfer large ISOs without it dropping the connection.
>>
>> This is all very strange and I don't understand what causes this.
>>
>> Thank you.
>>
>> --
>> Best regards
>> Tivon Häberlein
>> On 11.07.2021 13:51, Strahil Nikolov wrote:
>>> Are you sure it's not a HW issue?
>>> Try to update the server to the latest firmware and test again. At least
>>> it won't hurt.
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>>> On Sat, Jul 10, 2021 at 14:45, Tivon Häberlein
>>> <tivon.haeberlein(a)secges.de> wrote:
>>>
>>> Hi,
>>>
>>> I've been trying to get oVirt Node 4.4.6 up and running on my
>>> Dell r620 hosts but am facing a strange issue where seemingly
>>> all network adapters get reset at random times after install.
>>> The interfaces reset as soon as a bit of traffic is flowing
>>> through them.
>>> Also the logs show nfs timeouts.
>>>
>>> This only happens after I have installed the host using the
>>> oVirt engine and it also only happens when the host is
>>> connected to the engine. When the host is in maintenance mode
>>> it also seems to work fine.
>>>
>>> The host and networks work fine when it's by itself (I tested
>>> right after install using the ISO and also after I have removed
>>> the host from the cluster).
>>>
>>> I can't figure out why this is happening. Am I missing something?
>>> I've been stuck on this for the last couple of weeks, a bit of
>>> help would be much appreciated.
>>>
>>> Thank you!
>>>
>>> My cluster is looking like this:
>>> Engine: oVirt 4.4.6 - CentOS Linux release 8.3.2011
>>> host1: oVirt 4.4 repository on CentOS Linux release 8.4.2105
>>> host2: oVirt 4.4 repository on CentOS Linux release 8.4.2105
>>> host3 (this is the one I'm trying to install): oVirt node 4.4.6
>>>
>>> --
>>> Best regards
>>> Tivon Häberlein
>>>
>>> _______________________________________________
>>> Users mailing list -- users(a)ovirt.org
>>> To unsubscribe send an email to users-leave(a)ovirt.org
>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/UQP3S4LFWGE...
>>>
>>
> --
>
> *Nathaniel Roach*
>
>
--
*Nathaniel Roach*
_______________________________________________
Users mailing list -- users(a)ovirt.org
To unsubscribe send an email to users-leave(a)ovirt.org
Privacy Statement:
https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DRECREHLNKL...