[ovirt-users] MacPoolRanges not working as expected
Xavier Naveira
xnaveira at gmail.com
Tue Oct 28 09:48:59 UTC 2014
On 10/27/2014 10:48 AM, Moti Asayag wrote:
>
>
> ----- Original Message -----
>> From: "Xavier Naveira" <xnaveira at gmail.com>
>> To: users at ovirt.org
>> Sent: Monday, October 27, 2014 10:42:47 AM
>> Subject: [ovirt-users] MacPoolRanges not working as expected
>>
>> Hi everyone,
>>
>> First of all I'd like to say that we have been using oVirt successfully
>> for more than a year, creating an automated deploy system with the help
>> of Foreman and Puppet.
>>
>> That being said, we're currently facing the first serious problem and
>> we'd appreciate some help.
>>
>> Everything was working fine until we exhausted the default
>> MacPoolRanges. After looking for a solution to the error message we
>> found this document:
>> http://www.ovirt.org/Engine_config_examples#MacPoolRanges
>>
>> Following the instructions on it we executed the following commands:
>>
>> First we found out what our current pool was:
>>
>> # engine-config -g MacPoolRanges
>> MacPoolRanges: 00:1a:4a:24:26:00-00:1a:4a:24:26:ff version: general
>>
>> So we proceeded to expand it:
>>
>> # engine-config -s "MacPoolRanges=00:1a:4a:24:26:00-00:1a:4a:24:27:ff"
>> # service ovirt-engine restart
>>
>> After this we were able to create new machines but none of them seemed
>> to have network.
>>
>> After some unsuccessful troubleshooting we restored the original pool
>> and instead added a new one:
>>
>> # engine-config -s "MacPoolRanges=00:1a:4a:24:26:00-00:1a:4a:24:26:ff"
>> # service ovirt-engine restart
>>
>> # engine-config -s "MacPoolRanges=00:1a:4a:24:26:00-00:1a:4a:24:26:ff,00:1a:4a:24:27:00-00:1a:4a:24:27:ff"
>>
>> # service ovirt-engine restart
>>
>> After doing this and testing the creation of a new host, everything
>> seemed to work fine.
>>
>> The problem is that after the successful creation of some hosts, the
>> original problem, where the new hosts didn't seem to have network,
>> reappeared.
>>
>> Trying to narrow down the problem, what we've found out so far is:
>>
>> This oVirt environment kickstarts hosts via PXE. When trying to PXE boot
>> a new host, the DHCP process fails (timeout).
>>
>> Tracing the network packets, we are able to see that the virtual host
>> sends the dhcp request, the dhcp server receives it and acknowledges it
>> and it sends the dhcp offer back. The dhcp offer reaches the hypervisor
>> to the vnetxx network interface BUT it doesn't go further and it doesn't
>> reach the virtual host. This behavior is consistent through different
>> hypervisors and vlans, including the ones that have been used/created
>> before the problems appeared.
>>
>> The only pattern that we've been able to identify so far is through
>> issuing the command "brctl showmacs <bridge_name>"
>>
>> This command lists the MAC addresses for the interfaces connected to the
>> bridge. In the cases where everything works fine the output looks like this:
>>
>> port no   mac addr            is local?   ageing timer
>>   2       00:1a:4a:24:27:e0   no             0.01
>>   2       fe:1a:4a:24:27:e0   yes            0.00
>>
>> The virtual host MAC address begins with "00" and it has a corresponding
>> address beginning with "fe" which is assigned to the "vnetxx" interface
>> in the hypervisor.
>>
>> In the cases where the virtual host doesn't get the dhcp answers the
>> output of "brctl showmacs <bridge_name>" is:
>>
>> port no   mac addr            is local?   ageing timer
>>   6       fe:1a:4a:24:27:a0   yes            0.00
>>
>
> It seems that the vm has no vnic connected to the destined bridge. Which
> is weird due to the fact that you've noticed outgoing traffic from the vm
> to the DHCP server.
>
> Could you verify it by dumping the xml used to create the vm? It could be
> obtained either from /var/log/vdsm/vdsm.log or by "virsh -r dumpxml <domain_id>",
> and the domain id could be obtained by "virsh -r list" or "vdsClient -s 0 list table".
> This will allow us to verify there is an actual interface device configured
> for that vm with the expected mac address and connected to the expected bridge.
>
> What is the nature of fe:1a:4a:24:27:a0? Where did it come from?
> Just to make sure - the expectation is for the virtual host to have a single
> interface only, with a mac address as allocated from the mac addresses pool.
>
> ovirt enables the nwfilter vdsm-no-mac-spoofing on libvirt to prevent spoofing
> of the assigned vnic mac address. But that should be confirmed.
>
>> That is, the actual virtual host's MAC address is missing from the bridge.
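
The pattern described above (a local "fe:" address on the bridge with no matching guest address) can be checked with a short script. This is only a rough sketch: the function name is made up for illustration, and it assumes the four-column "brctl showmacs" layout quoted in this thread and that the vnet interface MAC is the guest MAC with the first octet replaced by "fe".

```python
def unmatched_vnet_macs(showmacs_output):
    """Return local fe:... addresses that have no learned guest MAC
    with the same last five octets on the bridge."""
    local, learned = set(), set()
    for line in showmacs_output.splitlines():
        fields = line.split()
        # Data rows look like: <port> <mac> <yes|no> <ageing timer>.
        # The header row splits into more fields and is skipped.
        if len(fields) != 4 or fields[2] not in ("yes", "no"):
            continue
        mac = fields[1].lower()
        if fields[2] == "yes":
            local.add(mac)
        else:
            learned.add(mac)

    # Compare on the last five octets only (assumption: vnet MAC is
    # the guest MAC with "fe" as the first octet).
    learned_suffixes = {mac[3:] for mac in learned}
    return sorted(
        mac for mac in local
        if mac.startswith("fe:") and mac[3:] not in learned_suffixes
    )


working = """port no mac addr is local? ageing timer
2 00:1a:4a:24:27:e0 no 0.01
2 fe:1a:4a:24:27:e0 yes 0.00
"""

failing = """port no mac addr is local? ageing timer
6 fe:1a:4a:24:27:a0 yes 0.00
"""

print(unmatched_vnet_macs(working))  # []
print(unmatched_vnet_macs(failing))  # ['fe:1a:4a:24:27:a0']
```

An empty list only means every vnet interface has a learned guest address at that moment. Learned entries age out, so a quiet guest could also show up here.
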
>>
>> We haven't been able to find a detailed explanation on how the network
>> internals of oVirt should work but hopefully someone in this list can
>> point us to the right resource.
>>
>> Thank you.
>>
>> Xavier.
>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
Hi Moti,
Thank you for your answer.
I spent all of yesterday working on this with one of your colleagues
over IRC.
Let me go straight to the point: It wasn't an oVirt problem :)
It was, as often happens, a chain of unfortunate circumstances.
First we hit the MacPoolRanges limit. As described, we had some
problems expanding it, but we found a way. Right after that, APPARENTLY
the same problem reappeared, so we focused on the MacPoolRanges issue
and tried to debug what seemed to be some kind of network problem in the
virtualization stack.
Analyzing the network traffic, we realized that the actual problem wasn't
anywhere near oVirt: we weren't receiving the DHCP ACK (the rest of the
packets were going through just fine). This was caused by a default value
in the firewall (we use DHCP relays, and there was a limit of 256 in the
routing table used by them).
So the problem was reaching the same limit (256) in two different systems
that affected the same feature (the ability to get networking to work and
thus PXE boot the hosts). Increasing both limits solved it.
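
For anyone who wants to check how close they are to the first limit, here is a minimal sketch that counts the addresses in a MacPoolRanges value. The helper names are made up for illustration, and it assumes a comma-separated list of inclusive, non-overlapping START-END ranges, as in the values quoted above.

```python
def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to its 48-bit integer value."""
    return int(mac.replace(":", ""), 16)


def pool_size(ranges):
    """Count the addresses in a comma-separated list of inclusive
    START-END MAC ranges (assumes the ranges do not overlap)."""
    total = 0
    for item in ranges.split(","):
        start, end = item.strip().split("-")
        total += mac_to_int(end) - mac_to_int(start) + 1
    return total


default_pool = "00:1a:4a:24:26:00-00:1a:4a:24:26:ff"
expanded_pool = default_pool + ",00:1a:4a:24:27:00-00:1a:4a:24:27:ff"

print(pool_size(default_pool))   # 256
print(pool_size(expanded_pool))  # 512
```

The default range holds exactly 256 addresses, the same number as the firewall limit, which is why the two problems looked like one.
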
Thank you so much for the answer, and keep up the excellent work on oVirt!
Xavier