Just a point of clarification: on all of these hosts, one interface is connected
to my 1Gbps switch, and the other interface is connected to my 10Gbps switch.
For Host 1 specifically,
enp4s0f0 is physically connected to one switch.
eno1 is physically connected to another.
But those interfaces are also bridged - and controlled - by oVirt itself.
Is it possible that oVirt took them down for some reason?
I don't know what that reason might be.
Sent with ProtonMail Secure Email.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, May 10, 2021 7:14 PM, David White via Users <users(a)ovirt.org> wrote:
I'm not sure what to make of this, but looking at
/var/log/messages on all 3 of the hosts, it appears that the kernel disabled my oVirt
networks at exactly the same time on all 3 hosts.
This occurred twice this morning, once around 8am and again around 8:30am:
ovirtmgmt is the storage network.
Private is the front-end network.
I actually don't have *any* backup storage domains currently, and
no backups to speak of, so backups couldn't have been the cause this morning.
My goal for this week is to install a 4th physical server with some spinning disks, and
expose those as an NFS mount point so that I can build a backup domain.
I also hope to get the 10Gbps network cards installed on the remaining two hosts, to get
10Gbps connectivity up and running between all 3 of the HCI hosts.
Host 1
May 10 08:00:23 cha1-storage kernel: tg3 0000:04:00.0 enp4s0f0: Link is down
May 10 08:00:23 cha1-storage kernel: ovirtmgmt: port 1(enp4s0f0) entered disabled state
May 10 08:00:23 cha1-storage kernel: tg3 0000:01:00.0 eno1: Link is down
May 10 08:00:24 cha1-storage kernel: Private: port 1(eno1) entered disabled state
{snip}
May 10 08:01:10 cha1-storage kernel: tg3 0000:01:00.0 eno1: Link is up at 1000 Mbps, full duplex
May 10 08:01:10 cha1-storage kernel: tg3 0000:01:00.0 eno1: Flow control is off for TX and off for RX
May 10 08:01:10 cha1-storage kernel: tg3 0000:01:00.0 eno1: EEE is disabled
May 10 08:01:10 cha1-storage kernel: Private: port 1(eno1) entered blocking state
May 10 08:01:10 cha1-storage kernel: Private: port 1(eno1) entered forwarding state
May 10 08:01:10 cha1-storage NetworkManager[1805]: <info> [1620648070.6021] device (eno1): carrier: link connected
May 10 08:30:01 cha1-storage kernel: tg3 0000:04:00.0 enp4s0f0: Link is down
May 10 08:30:01 cha1-storage kernel: ovirtmgmt: port 1(enp4s0f0) entered disabled state
May 10 08:30:01 cha1-storage systemd[1]: Starting system activity accounting tool...
May 10 08:30:01 cha1-storage systemd[1]: sysstat-collect.service: Succeeded.
May 10 08:30:01 cha1-storage systemd[1]: Started system activity accounting tool.
May 10 08:30:01 cha1-storage kernel: tg3 0000:01:00.0 eno1: Link is down
May 10 08:30:02 cha1-storage kernel: Private: port 1(eno1) entered disabled state
{snip}
May 10 08:30:47 cha1-storage kernel: tg3 0000:01:00.0 eno1: Link is up at 1000 Mbps, full duplex
May 10 08:30:47 cha1-storage kernel: tg3 0000:01:00.0 eno1: Flow control is off for TX and off for RX
May 10 08:30:47 cha1-storage kernel: tg3 0000:01:00.0 eno1: EEE is disabled
May 10 08:30:47 cha1-storage kernel: Private: port 1(eno1) entered blocking state
May 10 08:30:47 cha1-storage kernel: Private: port 1(eno1) entered forwarding state
May 10 08:30:47 cha1-storage NetworkManager[1805]: <info> [1620649847.8592] device (eno1): carrier: link connected
May 10 08:30:47 cha1-storage NetworkManager[1805]: <info> [1620649847.8602] device (Private): carrier: link connected
Host 2
May 10 08:00:23 cha2-storage kernel: ixgbe 0000:01:00.1 eno2: NIC Link is Down
May 10 08:00:23 cha2-storage kernel: ovirtmgmt: port 1(eno2) entered disabled state
May 10 08:00:23 cha2-storage kernel: ixgbe 0000:01:00.0 eno1: NIC Link is Down
May 10 08:00:24 cha2-storage kernel: Private: port 1(eno1) entered disabled state
{snip}
May 10 08:01:10 cha2-storage kernel: ixgbe 0000:01:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: None
May 10 08:01:10 cha2-storage kernel: Private: port 1(eno1) entered blocking state
May 10 08:01:10 cha2-storage kernel: Private: port 1(eno1) entered forwarding state
May 10 08:01:10 cha2-storage NetworkManager[16957]: <info> [1620648070.1303] device (eno1): carrier: link connected
{snip}
May 10 08:30:01 cha2-storage kernel: ixgbe 0000:01:00.1 eno2: NIC Link is Down
May 10 08:30:01 cha2-storage kernel: ovirtmgmt: port 1(eno2) entered disabled state
May 10 08:30:01 cha2-storage systemd[1]: Starting system activity accounting tool...
May 10 08:30:01 cha2-storage systemd[1]: sysstat-collect.service: Succeeded.
May 10 08:30:01 cha2-storage systemd[1]: Started system activity accounting tool.
May 10 08:30:01 cha2-storage kernel: ixgbe 0000:01:00.0 eno1: NIC Link is Down
May 10 08:30:02 cha2-storage kernel: Private: port 1(eno1) entered disabled state
{snip}
May 10 08:30:47 cha2-storage kernel: ixgbe 0000:01:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: None
May 10 08:30:47 cha2-storage kernel: Private: port 1(eno1) entered blocking state
May 10 08:30:47 cha2-storage kernel: Private: port 1(eno1) entered forwarding state
May 10 08:30:47 cha2-storage NetworkManager[16957]: <info> [1620649847.5041] device (eno1): carrier: link connected
Host 3
May 10 08:00:23 cha3-storage kernel: tg3 0000:01:00.0 eno1: Link is down
May 10 08:00:24 cha3-storage journal[2196]: Guest agent is not responding: Guest agent not available for now
May 10 08:00:24 cha3-storage kernel: Private: port 1(eno1) entered disabled state
May 10 08:00:30 cha3-storage journal[2196]: Guest agent is not responding: Guest agent not available for now
{snip}
May 10 08:01:10 cha3-storage sanlock[1477]: 2021-05-10 08:01:10 490364 [17727]: s4 renewal error -107 delta_length 0 last_success 490310
May 10 08:01:10 cha3-storage kernel: tg3 0000:01:00.0 eno1: Link is up at 1000 Mbps, full duplex
May 10 08:01:10 cha3-storage kernel: tg3 0000:01:00.0 eno1: Flow control is off for TX and off for RX
May 10 08:01:10 cha3-storage kernel: tg3 0000:01:00.0 eno1: EEE is disabled
May 10 08:01:10 cha3-storage kernel: Private: port 1(eno1) entered blocking state
May 10 08:01:10 cha3-storage kernel: Private: port 1(eno1) entered forwarding state
May 10 08:01:10 cha3-storage NetworkManager[1812]: <info> [1620648070.5575] device (eno1): carrier: link connected
{snip}
May 10 08:30:01 cha3-storage kernel: tg3 0000:04:00.0 enp4s0f0: Link is down
May 10 08:30:01 cha3-storage kernel: ovirtmgmt: port 1(enp4s0f0) entered disabled state
May 10 08:30:01 cha3-storage kernel: tg3 0000:01:00.0 eno1: Link is down
May 10 08:30:02 cha3-storage kernel: Private: port 1(eno1) entered disabled state
{snip}
May 10 08:30:48 cha3-storage kernel: tg3 0000:01:00.0 eno1: Link is up at 1000 Mbps, full duplex
May 10 08:30:48 cha3-storage kernel: tg3 0000:01:00.0 eno1: Flow control is off for TX and off for RX
May 10 08:30:48 cha3-storage kernel: tg3 0000:01:00.0 eno1: EEE is disabled
May 10 08:30:48 cha3-storage kernel: Private: port 1(eno1) entered blocking state
May 10 08:30:48 cha3-storage kernel: Private: port 1(eno1) entered forwarding state
May 10 08:30:48 cha3-storage NetworkManager[1812]: <info> [1620649848.0309] device (eno1): carrier: link connected
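
To line these link-down events up across hosts, a small script along the following lines could help. This is only a sketch: the regular expression and the function name `group_link_down` are illustrative assumptions, and the pattern is written against the tg3 and ixgbe message formats shown in the excerpts above, so other drivers may need a different pattern.

```python
import re
from collections import defaultdict

# Assumed pattern, based on the syslog lines quoted above, e.g.:
#   May 10 08:00:23 cha1-storage kernel: tg3 0000:04:00.0 enp4s0f0: Link is down
#   May 10 08:00:23 cha2-storage kernel: ixgbe 0000:01:00.1 eno2: NIC Link is Down
LINK_DOWN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"\S+\s+\S+\s+(?P<iface>\S+):\s+(?:NIC\s+)?Link is down",
    re.IGNORECASE,
)


def group_link_down(lines):
    """Map each timestamp to a sorted list of (host, interface) pairs that lost link."""
    events = defaultdict(set)
    for line in lines:
        match = LINK_DOWN.match(line.strip())
        if match:
            events[match.group("ts")].add((match.group("host"), match.group("iface")))
    return {ts: sorted(pairs) for ts, pairs in events.items()}


if __name__ == "__main__":
    # A few lines copied from the excerpts above; in practice, feed in the
    # concatenated /var/log/messages content from all three hosts.
    sample = [
        "May 10 08:00:23 cha1-storage kernel: tg3 0000:04:00.0 enp4s0f0: Link is down",
        "May 10 08:00:23 cha2-storage kernel: ixgbe 0000:01:00.1 eno2: NIC Link is Down",
        "May 10 08:00:23 cha3-storage kernel: tg3 0000:01:00.0 eno1: Link is down",
        "May 10 08:01:10 cha1-storage kernel: tg3 0000:01:00.0 eno1: Link is up at 1000 Mbps, full duplex",
    ]
    for ts, pairs in sorted(group_link_down(sample).items()):
        print(ts, pairs)
```

Any timestamp that lists interfaces from more than one host is a candidate for a shared cause rather than a per-host problem.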
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, May 10, 2021 3:01 PM, Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
> The symptoms are similar to a loss of quorum (like in a network
> outage/disruption).
>
> Check the gluster logs for any indication of the root cause.
> As you have only one gigabit network, consider enabling the cluster.choose-local option,
> which will make the FUSE client try to read from the local brick instead of a remote one.
>
> Theoretically, congestion on the storage network could be the root
> cause, but this is usually a symptom and not the real problem. Maybe you have too many
> backups running in parallel?
>
> Best Regards,
> Strahil Nikolov
>
> > On Mon, May 10, 2021 at 19:13, David White via Users
> > <users(a)ovirt.org> wrote: