The setup you describe seems correct.

Please try to isolate the issue by passing traffic directly between the hosts, taking the VMs out of the equation.
You may also consider connecting the hosts directly to each other, to rule out a switch problem.


I'm testing Oracle RAC with 2 Oracle Linux VMs inside an oVirt 4.0.6 environment.
They run on two different hosts.
I would like to configure the RAC intracluster communication with jumbo frames.
At the VM level the network adapter is eth1 (mapped to VLAN 95 on the oVirt hosts).
On the oVirt side I configured a VM-enabled VLAN network with mtu=9000.
I verified that on the host side I have:

vlan95: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        ether 00:1c:c4:ab:be:ba  txqueuelen 1000  (Ethernet)
        RX packets 61706  bytes 3631426 (3.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3  bytes 258 (258.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

and I am able to run
ping -M do -s 8972 ip
from each host to the other one.
In the VMs I configured the same MTU=9000 in ifcfg-eth1.
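
For reference, the 8972-byte payload used in the ping test follows from the MTU minus the headers. A minimal sketch of that arithmetic, assuming a 20-byte IPv4 header without options and the 8-byte ICMP header (the function name is only illustrative):

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet.
# Assumption: 20-byte IPv4 header (no options) plus 8-byte ICMP header.
icmp_payload_for_mtu() {
    mtu=$1
    echo $(( mtu - 20 - 8 ))
}

# icmp_payload_for_mtu 9000   prints 8972
# icmp_payload_for_mtu 1500   prints 1472
```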

But inside the VMs it works erratically: the same ping test succeeds between the VMs, but the Oracle checks sometimes pass and sometimes report communication errors.
During the initial cluster configuration, the second node fails to start the cluster.
I tried 5-6 times, and then also tried setting mtu=8000 inside the VMs, supposing there is some sort of inner overhead to account for (such as 2 times 28 bytes), but to no avail.
As soon as I set MTU=1500 on the VM side, the cluster forms without any problem.
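
Rather than guessing a value such as 8000, one option is to search for the largest payload that actually gets through with the "don't fragment" bit set. The following is only a sketch: PEER is a placeholder for the other VM's address, the function names are made up for illustration, and it assumes that the lower bound passes and that success is monotonic in packet size. It will not catch intermittent loss like the one the Oracle checks seem to hit.

```shell
# probe SIZE: succeeds if a DF ping with SIZE bytes of payload gets a reply.
# PEER is a placeholder; set it to the other VM's address before use.
probe() {
    ping -M do -c 2 -W 1 -s "$1" "$PEER" >/dev/null 2>&1
}

# largest_passing LO HI [PROBE_FN]
# Binary search between LO and HI for the largest size PROBE_FN accepts.
# Assumes LO passes and that every size below a passing size also passes.
largest_passing() {
    lo=$1 hi=$2 fn=${3:-probe}
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))
        if "$fn" "$mid"; then
            lo=$mid
        else
            hi=$(( mid - 1 ))
        fi
    done
    echo "$lo"
}

# Example: PEER=<address of the other VM>; largest_passing 1472 8972
```

A result of 8972 would suggest the path carries full 9000-byte frames for ICMP; anything lower points at the hop where the size is clipped.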
I can live without jumbo frames in this particular case, because this is only a test, but the question remains which best practices to put in place if I want to use jumbo frames.

One thing I notice is that on the VM side there were many drops while the interface MTU was 9000, for example:

eth1      Link encap:Ethernet  HWaddr 00:1A:4A:17:01:57 
          inet addr:  Bcast:  Mask:
          RX packets:93046 errors:0 dropped:54964 overruns:0 frame:0
          TX packets:26258 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:25726242 (24.5 MiB)  TX bytes:33573207 (32.0 MiB)

On the host side I see drops at the bond0 level only:

[root@ovmsrv05 ~]# brctl show
bridge name     bridge id           STP enabled   interfaces
;vdsmdummy;     8000.000000000000   no
vlan100         8000.001cc446ef73   no            bond1.100
vlan65          8000.001cc446ef73   no            bond1.65
vlan95          8000.001cc4abbeba   no            bond0.95

bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 9000
        ether 00:1c:c4:ab:be:ba  txqueuelen 1000  (Ethernet)
        RX packets 2855175  bytes 3126868334 (2.9 GiB)
        RX errors 0  dropped 11686  overruns 0  frame 0
        TX packets 1012849  bytes 478702140 (456.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

bond0.95: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        ether 00:1c:c4:ab:be:ba  txqueuelen 1000  (Ethernet)
        RX packets 100272  bytes 27125992 (25.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 42355  bytes 40833904 (38.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vlan95: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        ether 00:1c:c4:ab:be:ba  txqueuelen 1000  (Ethernet)
        RX packets 62576  bytes 3719175 (3.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3  bytes 258 (258.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vnet2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet6 fe80::fc1a:4aff:fe17:157  prefixlen 64  scopeid 0x20<link>
        ether fe:1a:4a:17:01:57  txqueuelen 1000  (Ethernet)
        RX packets 21014  bytes 24139492 (23.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 85777  bytes 21089777 (20.1 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
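
To put the drop counters in proportion, here is a small sketch that expresses "dropped" as a percentage of "RX packets". The helper name is hypothetical, and whether the kernel counts dropped frames inside the RX packets total depends on the driver, so the figure should be read as a rough ratio only.

```shell
# drop_percent PACKETS DROPPED
# Prints DROPPED as a percentage of PACKETS with one decimal place.
drop_percent() {
    LC_ALL=C awk -v p="$1" -v d="$2" \
        'BEGIN { if (p > 0) printf "%.1f\n", 100 * d / p; else print "n/a" }'
}

# Guest eth1:  drop_percent 93046 54964     prints 59.1
# Host bond0:  drop_percent 2855175 11686   prints 0.4
```

By this measure the guest interface drops far more than the host bond does.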

[root@ovmsrv05 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: enp3s0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: enp3s0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1c:c4:ab:be:ba
Slave queue ID: 0

Slave Interface: enp5s0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1c:c4:ab:be:bc
Slave queue ID: 0
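
Since the bond slaves' MTU is not shown above, a quick way to confirm that every hop in the chain carries the same MTU is to read it from sysfs. This is a sketch only: the function name is made up, and SYSFS_NET is an assumed override that exists purely so the function can be tried against a test directory; by default it reads the standard /sys/class/net location.

```shell
# check_mtu EXPECTED IFACE...
# Prints one line per interface whose MTU differs from EXPECTED (or is missing)
# and returns non-zero if any mismatch was found.
check_mtu() {
    expected=$1; shift
    rc=0
    for ifc in "$@"; do
        # SYSFS_NET is a hypothetical override for testing; normally unset.
        actual=$(cat "${SYSFS_NET:-/sys/class/net}/$ifc/mtu" 2>/dev/null) || actual=missing
        if [ "$actual" != "$expected" ]; then
            echo "$ifc: mtu $actual (expected $expected)"
            rc=1
        fi
    done
    return $rc
}

# Example with the interface names from the output above, run on each host:
# check_mtu 9000 enp3s0 enp5s0 bond0 bond0.95 vlan95 vnet2
```

The same function can be run inside each guest as `check_mtu 9000 eth1`.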

Any hints?
Thanks in advance,

