Dear Dominik,
Thank you for your active engagement with this issue and for all your contributions so far.
“I have another idea: If you assign a new range to the cluster which creates new vNICs/VMs, the problem should be gone, as long as all MACs in the new range are not yet used.
This way all newly generated MAC addresses are created in the new range.”
As of yesterday, every cluster has its own unique MAC address pool, so this should keep us safe. The last nightly test run confirmed it: the VMs were deployed without any MAC-related issues.
“The bright side is that this looks like no vNIC with a duplicated MAC address is created.”
Indeed, that is the case.
“The "last moment" check is done to prevent the unintended creation of duplicate MAC addresses.”
“The logic is that the MAC address is created in the context of the associated mac pool. In this mac pool, the relevant MAC address is not used.
But I agree, a MAC address should be handled globally, not in any context.”
I fully agree. It makes sense especially when the same MAC pool, whether Default or another one, is used across all data centers/clusters.
“Can you share a specific example of what the issues looked like in the beginning, as there was only a single MAC pool with the single range
56:6f:ef:88:00:00 to 56:6f:ef:88:ff:ff ?”
I will go through the logs on the oVirt engine and VDSM to find an exact case. The first indicator for us was ISC DHCP reporting that it could not assign an IP to a MAC, because DHCP had already assigned an IP to that same MAC, which actually belonged to another VM. We could not pinpoint those VMs in oVirt, because it always happened to VMs spawned by Jenkins during nightly runs. We therefore initially assumed a race condition, as the tests create a lot of VMs (~70) in a short amount of time, first via the Jenkins oVirt plugin and then via the Ansible controller.
We currently have two large production environments and one decent-sized staging environment, so if we can contribute by testing this further, let me know.
-----
kind regards/met vriendelijke groeten
Marko Vrgotic
Sr. System Engineer @ System Administration
ActiveVideo
o: +31 (35) 6774131
e: m.vrgotic@activevideo.com
w: www.activevideo.com
ActiveVideo Networks BV. Mediacentrum 3745 Joop van den Endeplein 1.1217 WJ Hilversum, The Netherlands. The
information contained in this message may be legally privileged and confidential. It is intended to be read only by the individual or entity to whom it is addressed or by their designee. If the reader of this message is not the intended recipient, you are
on notice that any distribution of this message, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or ActiveVideo Networks, LLC by telephone at +1 408.931.9200 and delete or destroy any
copy of this message.
From: Dominik Holler <dholler@redhat.com>
Date: Wednesday, 5 February 2020 at 17:54
To: "Vrgotic, Marko" <M.Vrgotic@activevideo.com>
Cc: Yedidyah Bar David <didi@redhat.com>, "users@ovirt.org" <users@ovirt.org>, Darko Stojchev <D.Stojchev@activevideo.com>
Subject: Re: [ovirt-users] oVirt MAC Pool question
On Wed, Feb 5, 2020 at 1:33 PM Vrgotic, Marko <M.Vrgotic@activevideo.com> wrote:
Hi Dominik,
Unfortunately, we have to make a change on a main cluster as well. Collisions keep happening.
I have another idea: If you assign a new range to the cluster which creates new vNICs/VMs, the problem should be gone, as long as all MACs in the new range are not yet used. This way all newly generated MAC addresses are created in the new range.
The bright side is that this looks like no vNIC with a duplicated MAC address is created.
I will keep a closer eye on it over the next few days and try to collect more information.
Honestly, I do not understand why the vNIC MAC address is checked at the last moment,
The "last moment" check is done to prevent the unintended creation of duplicate MAC addresses.
instead of being checked at the very beginning of the creation,
The logic is that the MAC address is created in the context of the associated mac pool. In this mac pool, the relevant MAC address is not used.
But I agree, a MAC address should be handled globally, not in any context.
especially if the same MAC pool is used by default, across all clusters / datacenters.
Can you share a specific example of what the issues looked like in the beginning, as there was only a single MAC pool with the single range
56:6f:ef:88:00:00 to 56:6f:ef:88:ff:ff ?
Assuming we are affected by the aforementioned bug, and it seems we are, will it be fixed in one of the later 4.3.4 releases?
We are currently running 4.3.4.3 and planning to upgrade to 4.3.4.7 soon, but I did not see this bug fixed in any of the releases.
I am optimistic that we can find a workaround for your scenario, before the bug is fixed.
If there is anything else I can do to collect more info or to better monitor this specific situation, please let me know.
Kindly awaiting your reply.
From: Dominik Holler <dholler@redhat.com>
Date: Tuesday, 4 February 2020 at 12:47
To: "Vrgotic, Marko" <M.Vrgotic@activevideo.com>
Cc: Yedidyah Bar David <didi@redhat.com>, "users@ovirt.org" <users@ovirt.org>, Darko Stojchev <D.Stojchev@activevideo.com>
Subject: Re: [ovirt-users] oVirt MAC Pool question
On Mon, Feb 3, 2020 at 12:10 PM Vrgotic, Marko <M.Vrgotic@activevideo.com> wrote:
Hi Dominik,
Thank you – please find the sql query output file attached.
In addition, today, while spawning a set of VMs (we use Ansible for this 98% of the time), we got this message:
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[MAC Address 56:6f:ef:88:0b:23 is already in use by VM or snapshot: fred-appcloud.]". HTTP response code is 409.
fatal: [suilen-th-scalernode3.avinity.tv -> localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[MAC Address 56:6f:ef:88:0b:23 is already in use by VM or snapshot: fred-appcloud.]\". HTTP response code is 409."}
We do not define MAC addresses in ovirt_vm or cloud_init_nics; we always let oVirt assign the MAC address.
Thanks for your input!
I would be very interested to know whether you can confirm that no virtual NIC with a duplicated MAC address is created.
The issue you are running into might be
Bug 1760170 - If an in-use MAC is held by a VM on a different cluster, the engine does not attempt to get the next free MAC.
What happens in this bug is that a new MAC address is generated, which is not yet used inside the mac pool.
But oVirt runs a final check before creating the vNIC, to ensure that no duplicated MAC is used across all managed VMs (across all mac pools),
which fails because the MAC is used in a cluster that is associated with another mac pool.
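The mechanism described above can be illustrated with a small toy model. This is only a sketch and not oVirt's actual code: `ToyMacPool`, `create_vnic`, and the VM names are hypothetical, and the pool range is shortened for readability. The pool tracks only the MACs it handed out itself, while the final check looks at every vNIC the engine manages.

```python
from typing import Dict, Optional, Set


def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)


def int_to_mac(value: int) -> str:
    """Convert an integer back to colon-separated hex notation."""
    raw = f"{value:012x}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


class ToyMacPool:
    """Hypothetical pool that only knows about the MACs it allocated itself."""

    def __init__(self, from_mac: str, to_mac: str) -> None:
        self.first = mac_to_int(from_mac)
        self.last = mac_to_int(to_mac)
        self.used: Set[int] = set()

    def allocate(self) -> Optional[str]:
        """Return the lowest MAC in the range not yet used in THIS pool."""
        for candidate in range(self.first, self.last + 1):
            if candidate not in self.used:
                self.used.add(candidate)
                return int_to_mac(candidate)
        return None


def create_vnic(pool: ToyMacPool, all_vnics: Dict[str, str], vm_name: str) -> str:
    """Allocate from the pool, then run a global 'last moment' check."""
    mac = pool.allocate()
    if mac is None:
        raise RuntimeError("MAC pool exhausted")
    # Global check across all managed VMs, independent of any pool.
    if mac in all_vnics:
        raise RuntimeError(
            f"MAC Address {mac} is already in use by VM or snapshot: {all_vnics[mac]}."
        )
    all_vnics[mac] = vm_name
    return mac


if __name__ == "__main__":
    # A VM in another cluster (other pool) already holds a MAC from this range.
    vnics = {"56:6f:ef:88:00:00": "vm-in-other-cluster"}
    pool = ToyMacPool("56:6f:ef:88:00:00", "56:6f:ef:88:00:03")
    try:
        create_vnic(pool, vnics, "new-vm")
    except RuntimeError as err:
        print(err)
```

In this model the pool considers the address free, but the global check rejects it, which matches the shape of the 409 error shown earlier. A pool whose range contains no address that is already held elsewhere does not hit the check.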
Would a workaround like adjusting all MAC addresses of all vNICs according to the mac pool ranges which are associated with the cluster be achievable for you, e.g. by re-creating the vNICs?
In addition, I have enabled “deny duplicates;” and “one-lease-per-client;” on our isc-dhcp for the subnet, to try to prevent this.
As mentioned in my previous email, last week I already switched to a unique pool per cluster. Here are the ranges:
[root@ovirt-engine ~]# /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select from_mac, to_mac from mac_pools, mac_pool_ranges where id=mac_pool_id"
from_mac | to_mac
-------------------+-------------------
56:6f:ef:88:00:00 | 56:6f:ef:88:ff:ff
56:6f:ef:86:00:00 | 56:6f:ef:86:ff:ff
56:6f:ef:82:00:00 | 56:6f:ef:82:ff:ff
56:6f:ef:84:00:00 | 56:6f:ef:84:ff:ff
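As a quick sanity check on ranges like the ones listed above, the following sketch verifies that no two ranges overlap and counts the addresses in each. The helper names (`mac_to_int`, `range_size`, `find_overlaps`) are made up for this illustration; only the range values come from the query output.

```python
from itertools import combinations
from typing import List, Tuple

MacRange = Tuple[str, str]

# Ranges copied from the engine-psql.sh output above.
RANGES: List[MacRange] = [
    ("56:6f:ef:88:00:00", "56:6f:ef:88:ff:ff"),
    ("56:6f:ef:86:00:00", "56:6f:ef:86:ff:ff"),
    ("56:6f:ef:82:00:00", "56:6f:ef:82:ff:ff"),
    ("56:6f:ef:84:00:00", "56:6f:ef:84:ff:ff"),
]


def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer for comparisons."""
    return int(mac.replace(":", ""), 16)


def range_size(from_mac: str, to_mac: str) -> int:
    """Number of addresses in an inclusive range."""
    return mac_to_int(to_mac) - mac_to_int(from_mac) + 1


def find_overlaps(ranges: List[MacRange]) -> List[Tuple[MacRange, MacRange]]:
    """Return every pair of inclusive ranges that share at least one address."""
    overlaps = []
    for a, b in combinations(ranges, 2):
        if mac_to_int(a[0]) <= mac_to_int(b[1]) and mac_to_int(b[0]) <= mac_to_int(a[1]):
            overlaps.append((a, b))
    return overlaps


if __name__ == "__main__":
    for from_mac, to_mac in RANGES:
        print(from_mac, to_mac, range_size(from_mac, to_mac))
    print("overlapping pairs:", find_overlaps(RANGES))
```

Note that non-overlapping ranges only protect newly created vNICs; existing vNICs keep whatever MAC they already have.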
Kindly awaiting your reply.
If additional information is required, I will be happy to provide.
From: Dominik Holler <dholler@redhat.com>
Date: Monday, 3 February 2020 at 10:11
To: "Vrgotic, Marko" <M.Vrgotic@activevideo.com>
Cc: Yedidyah Bar David <didi@redhat.com>, "users@ovirt.org" <users@ovirt.org>
Subject: Re: [ovirt-users] oVirt MAC Pool question
On Fri, Jan 31, 2020 at 9:56 AM Vrgotic, Marko <M.Vrgotic@activevideo.com> wrote:
Dear Yedidyah,
We are actually seeing collisions, which is why I reached out in the first place.
Strangely, it did not happen until a few weeks ago, and since then I have seen it multiple times.
I am interested in reproducing this issue.
Can you please describe how this situation was created?
Are the related virtual NICs created in oVirt, or are they imported?
Is it still possible to create duplicates, or do you have a backup of the db during this period?
Can you please share the output of
/usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select vm_interface.mac_addr,vm_interface.vm_guid,vm_interface.name,vm_static.cluster_id,cluster.mac_pool_id,mac_pools.allow_duplicate_mac_addresses,mac_pool_ranges.from_mac,mac_pool_ranges.to_mac from (((( vm_interface left join vm_static on vm_interface.vm_guid = vm_static.vm_guid) left join cluster on vm_static.cluster_id = cluster.cluster_id) left join mac_pools on cluster.mac_pool_id = mac_pools.id) left join mac_pool_ranges on mac_pools.id = mac_pool_ranges.mac_pool_id) order by vm_interface.mac_addr;"
and point us to the virtual NICs or VMs which contained the duplicated MACs?
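To spot duplicated MACs in the output of a query like the one above, a small script along these lines may help. It is a sketch that assumes psql's default aligned output (a header line, a dashed separator, and `|`-separated rows); the sample rows and GUID values are made up for illustration. Since the joins can repeat a vNIC once per pool range, NICs are de-duplicated by (vm_guid, name) before counting.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Made-up sample in psql's default aligned format (not real data).
SAMPLE = """
     mac_addr      | vm_guid | name
-------------------+---------+------
 56:6f:ef:88:0b:23 | guid-a  | nic1
 56:6f:ef:88:0b:23 | guid-b  | nic1
 56:6f:ef:88:0b:24 | guid-c  | nic1
 56:6f:ef:88:0b:24 | guid-c  | nic1
(4 rows)
"""


def parse_psql_table(text: str) -> List[Dict[str, str]]:
    """Parse aligned psql output into a list of dicts keyed by column name."""
    rows: List[Dict[str, str]] = []
    header = None
    for line in text.splitlines():
        if "|" not in line:
            continue  # skips blank lines, the dashed separator and '(N rows)'
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def find_duplicate_macs(rows: List[Dict[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each MAC held by more than one distinct vNIC to those vNICs."""
    nics_by_mac = defaultdict(set)
    for row in rows:
        mac = row.get("mac_addr", "").lower()
        if mac:
            nics_by_mac[mac].add((row.get("vm_guid", ""), row.get("name", "")))
    return {mac: sorted(nics) for mac, nics in nics_by_mac.items() if len(nics) > 1}


if __name__ == "__main__":
    for mac, nics in find_duplicate_macs(parse_psql_table(SAMPLE)).items():
        print(mac, nics)
```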
For now I am simply going to create a new MAC pool for each of the clusters and switch to it, hoping it's not going to affect existing VMs.
This will affect only newly created virtual NICs.
Regarding planning, had I known that the same MAC pool is created across datacenters/clusters, I would have taken it into account.
Even if the MAC pool is shared, the MAC addresses should be unique across all datacenters managed by a single oVirt Engine.
Relying on common sense, I just did not expect this to be the case, but it is my fault; I should have applied a trust-but-verify approach.
-----
kind regards/met vriendelijke groeten
Marko Vrgotic
Sr. System Engineer
ActiveVideo
e: m.vrgotic@activevideo.com
On 26/01/2020, 07:45, "Yedidyah Bar David" <didi@redhat.com> wrote:
On Thu, Jan 23, 2020 at 2:30 PM Vrgotic, Marko
<M.Vrgotic@activevideo.com> wrote:
>
> Hi Yedidyah,
>
> Thank you for your update.
>
> This platform started with 4.3 deployment.
> The Default mac address pool, apparently on all Clusters (5) is:
> from_mac | to_mac
> -------------------+-------------------
> 56:6f:ef:88:00:00 | 56:6f:ef:88:ff:ff
>
I think I misled you, or for some reason didn't understand your
original post. The default pool is for "everything". I thought you
were referring to different setups (separate engines), and the bug I
mentioned about changing the default was aimed at that scenario.
Inside a single engine, there is only one default.
You should not see collisions *inside* it. Do you? The engine should
know not to allocate the same MAC to two different NICs.
> Interestingly enough, I am also not able to add another mac pool to Default. I can only create a new one,
Correct.
> let's say MacPool2, and also create only a single pool inside. The option to add a second MAC range under the same name is grayed out, whether I log in to the Administration Portal as SuperUser or Admin.
Indeed. You can only change it, not add a new one with the same name.
>
> Never mind, it is as so, but I am still not "happiest" with:
> > Question2: Would there be an harming effect on existing VMs if the default mac pool would be changed?
> => I am pretty certain it's harmless, but didn't try that myself.
> The reason is that I have 600 VMs on a 5-cluster setup in production. If I make the change where it is currently required and we are wrong, it's going to affect almost half of those existing VMs. I did test the change on staging, and it did not seem to have any harmful effect, but that environment has only about 5 VMs at the moment.
>
> I will run some additional tests on staging to see if I can get more comfortable before making change in production, but if anyone else can contribute boosting the confidence, please let me know.
Ccing Dominik from the network team.
I am pretty certain that people do change/add pools live, but guess
not often - I guess most people plan ahead and then don't touch.
Groetjes,
--
Didi