Dear Dominik,
Thank you for your active engagement with this issue and for all your contributions so far.
“I have another idea: If you assign a new range to the cluster which creates new
vNICs/VMs, the problem should be gone, as long as all MACs in the new range are not yet
used. This way all new generated MAC addresses are created in the new range.”
Since yesterday, each cluster has its own unique MAC address pool, so this should keep us
safe. This was confirmed when the latest nightly test VMs were deployed without any
MAC-related issues.
“The bright side is that this looks like no vNIC with a duplicated MAC address is
created.”
Indeed, that is the case.
“The "last moment" check is done, to prevent the unintended creation of
duplicate MAC addresses.”
“The logic is that the MAC address is created in the context of the associated mac pool.
In this mac pool, the relevant MAC address is not used.
But I agree, a MAC address should be handled global, not in any context.”
I fully agree. This especially makes sense when the same MAC pool, whether Default or
another one, is used across all datacenters/clusters.
“Can you share a specific example how the issues looked like in the beginning, as there
was only a single MAC pool with the single range
56:6f:ef:88:00:00 to 56:6f:ef:88:ff:ff ?”
I will go through the logs on the oVirt engine and vdsm to find an exact case. The first
indicator for us was ISC DHCP reporting that it could not assign an IP to a MAC because an
IP had already been assigned to that same MAC by DHCP, and that lease actually belonged to
another VM. We could not pinpoint those VMs in oVirt, as it always happened to VMs spawned
by Jenkins during nightly runs, so we initially assumed a race condition: the tests create
a lot of VMs (~70) in a short amount of time, via the Jenkins oVirt plugin and then via the
Ansible controller.
We currently have two big production environments and one decent-sized staging
environment, so if we can contribute by testing this further, let me know.
-----
kind regards/met vriendelijke groeten
Marko Vrgotic
Sr. System Engineer @ System Administration
ActiveVideo
o: +31 (35) 6774131
e: m.vrgotic@activevideo.com
w: www.activevideo.com
ActiveVideo Networks BV. Mediacentrum 3745 Joop van den Endeplein 1.1217 WJ Hilversum, The
Netherlands. The information contained in this message may be legally privileged and
confidential. It is intended to be read only by the individual or entity to whom it is
addressed or by their designee. If the reader of this message is not the intended
recipient, you are on notice that any distribution of this message, in any form, is
strictly prohibited. If you have received this message in error, please immediately
notify the sender and/or ActiveVideo Networks, LLC by telephone at +1 408.931.9200 and
delete or destroy any copy of this message.
From: Dominik Holler <dholler(a)redhat.com>
Date: Wednesday, 5 February 2020 at 17:54
To: "Vrgotic, Marko" <M.Vrgotic(a)activevideo.com>
Cc: Yedidyah Bar David <didi(a)redhat.com>, "users(a)ovirt.org"
<users(a)ovirt.org>, Darko Stojchev <D.Stojchev(a)activevideo.com>
Subject: Re: [ovirt-users] oVirt MAC Pool question
On Wed, Feb 5, 2020 at 1:33 PM Vrgotic, Marko <M.Vrgotic@activevideo.com> wrote:
Hi Dominik,
Unfortunately, we have to make a change on a main cluster as well. Collisions keep
happening.
I have another idea: If you assign a new range to the cluster which creates new vNICs/VMs,
the problem should be gone, as long as all MACs in the new range are not yet used. This
way all new generated MAC addresses are created in the new range.
The bright side is that this looks like no vNIC with a duplicated MAC address is
created.
I will keep a closer eye on it over the next few days and try to collect more information.
Honestly, I do not understand why the vNIC MAC address is checked at the last moment,
The "last moment" check is done, to prevent the unintended creation of duplicate
MAC addresses.
instead of being checked at the very beginning of the creation,
The logic is that the MAC address is created in the context of the associated mac pool. In
this mac pool, the relevant MAC address is not used.
But I agree, a MAC address should be handled global, not in any context.
especially if the same MAC pool is used by default across all clusters/datacenters.
Can you share a specific example how the issues looked like in the beginning, as there
was only a single MAC pool with the single range
56:6f:ef:88:00:00 to 56:6f:ef:88:ff:ff ?
Assuming we are affected by the aforementioned bug, and it seems we are, will this bug be
fixed in one of the later 4.3.4 releases?
We are currently running 4.3.4.3 and planning to upgrade to 4.3.4.7 soon, but I did not
see this bug fixed in any of the releases.
I am optimistic that we can find a workaround for your scenario, before the bug is
fixed.
If there is anything else I can do to collect more info or to better monitor this specific
situation, please let me know.
Kindly awaiting your reply.
From: Dominik Holler <dholler@redhat.com>
Date: Tuesday, 4 February 2020 at 12:47
To: "Vrgotic, Marko" <M.Vrgotic@activevideo.com>
Cc: Yedidyah Bar David <didi@redhat.com>, "users@ovirt.org" <users@ovirt.org>,
Darko Stojchev <D.Stojchev@activevideo.com>
Subject: Re: [ovirt-users] oVirt MAC Pool question
On Mon, Feb 3, 2020 at 12:10 PM Vrgotic, Marko <M.Vrgotic@activevideo.com> wrote:
Hi Dominik,
Thank you – please find the SQL query output file attached.
In addition, today, while spawning a set of VMs (we use Ansible about 98% of the time), we
got this message:
An exception occurred during task execution. To see the full traceback, use -vvv. The
error was: ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is
"[MAC Address 56:6f:ef:88:0b:23 is already in use by VM or snapshot:
fred-appcloud.]". HTTP response code is 409.
fatal: [suilen-th-scalernode3.avinity.tv -> localhost]: FAILED! => {"changed": false,
"msg": "Fault reason is \"Operation Failed\". Fault detail is \"[MAC Address
56:6f:ef:88:0b:23 is already in use by VM or snapshot: fred-appcloud.]\". HTTP
response code is 409."}
We do not define MAC addresses in ovirt_vm or cloud_init_nics; we always let oVirt assign
the MAC address.
Thanks for your input!
I would be very interested to know whether you can confirm that no virtual NIC with a
duplicated MAC address is created.
The issue you are running into might be Bug 1760170
(https://bugzilla.redhat.com/show_bug.cgi?id=1760170) - If an in-use MAC is held by a VM
on a different cluster, the engine does not attempt to get the next free MAC.
What happens in this bug is that a new MAC address is generated, which is not yet used
inside the mac pool.
But oVirt runs a final check before creating the vNIC, to ensure that no duplicated MAC is
used across all managed VMs (across all mac pools),
which fails because the MAC is used in a cluster that is associated with another mac
pool.
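To make the failure mode described above concrete, here is a toy Python model. It is only a sketch of the described behaviour, not the engine's code: MacPool, create_vnic and MacInUseError are hypothetical names, and the pool range is shortened for illustration. The pool only tracks its own allocations, so it can hand out an address that a VM in another cluster already holds, and the engine-wide check then rejects it.

```python
class MacInUseError(Exception):
    """Raised when the engine-wide check finds the MAC on an existing vNIC."""


class MacPool:
    """Toy pool that only knows about the addresses it handed out itself."""

    def __init__(self, first, last):
        self.first, self.last = first, last
        self.used = set()

    def allocate(self):
        # Return the lowest address this pool has not handed out yet.
        for mac in range(self.first, self.last + 1):
            if mac not in self.used:
                self.used.add(mac)
                return mac
        raise RuntimeError("pool exhausted")


def create_vnic(pool, macs_in_use_engine_wide):
    """Allocate from the pool, then apply the engine-wide 'last moment' check."""
    mac = pool.allocate()
    if mac in macs_in_use_engine_wide:
        # Modelled as in the bug description: fail, do not try the next free MAC.
        raise MacInUseError(f"MAC {mac:012x} is already in use")
    macs_in_use_engine_wide.add(mac)
    return mac


# 56:6f:ef:88:0b:23 is held by a VM whose cluster uses a different pool.
engine_wide = {0x566FEF880B23}
pool = MacPool(0x566FEF880B23, 0x566FEF880B2F)  # shortened, hypothetical range

try:
    create_vnic(pool, engine_wide)
    outcome = "created"
except MacInUseError:
    outcome = "rejected"

print(outcome)
```

What the real engine does with the rejected address afterwards is not modelled here.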
Would a workaround like adjusting all MAC addresses of all vNICs according to the mac pool
ranges which are associated with the cluster be achievable for you, e.g. by re-creating
the vNICs?
In addition, I have enabled “deny duplicates;” and “one-lease-per-client;” on our isc-dhcp
server for the subnet, in order to try to prevent this.
As mentioned in my previous email, I already switched last week to a unique pool per
cluster; here are the ranges:
[root@ovirt-engine ~]# /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select
from_mac, to_mac from mac_pools, mac_pool_ranges where id=mac_pool_id"
from_mac | to_mac
-------------------+-------------------
56:6f:ef:88:00:00 | 56:6f:ef:88:ff:ff
56:6f:ef:86:00:00 | 56:6f:ef:86:ff:ff
56:6f:ef:82:00:00 | 56:6f:ef:82:ff:ff
56:6f:ef:84:00:00 | 56:6f:ef:84:ff:ff
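A quick way to double-check that per-cluster ranges like these do not overlap is a small Python sketch such as the one below. The helper names mac_to_int and overlapping_pairs are made up for this example; the ranges are the four listed above.

```python
from itertools import combinations


def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to its 48-bit integer value."""
    return int(mac.replace(":", ""), 16)


def overlapping_pairs(ranges):
    """Return every pair of (from_mac, to_mac) ranges sharing at least one address."""
    pairs = []
    for (a_from, a_to), (b_from, b_to) in combinations(ranges, 2):
        # Two closed intervals overlap when each one starts before the other ends.
        if mac_to_int(a_from) <= mac_to_int(b_to) and mac_to_int(b_from) <= mac_to_int(a_to):
            pairs.append(((a_from, a_to), (b_from, b_to)))
    return pairs


POOL_RANGES = [
    ("56:6f:ef:88:00:00", "56:6f:ef:88:ff:ff"),
    ("56:6f:ef:86:00:00", "56:6f:ef:86:ff:ff"),
    ("56:6f:ef:82:00:00", "56:6f:ef:82:ff:ff"),
    ("56:6f:ef:84:00:00", "56:6f:ef:84:ff:ff"),
]

print(overlapping_pairs(POOL_RANGES))
```

An empty list means the ranges are disjoint. This only checks the ranges themselves; it says nothing about existing vNICs that still carry a MAC from another cluster's range.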
Kindly awaiting your reply.
If additional information is required, I will be happy to provide.
From: Dominik Holler <dholler@redhat.com>
Date: Monday, 3 February 2020 at 10:11
To: "Vrgotic, Marko" <M.Vrgotic@activevideo.com>
Cc: Yedidyah Bar David <didi@redhat.com>, "users@ovirt.org" <users@ovirt.org>
Subject: Re: [ovirt-users] oVirt MAC Pool question
On Fri, Jan 31, 2020 at 9:56 AM Vrgotic, Marko <M.Vrgotic@activevideo.com> wrote:
Dear Yedidyah,
We are actually seeing collisions, which is why I reached out in the first place.
Strangely, it did not happen until a few weeks ago, and since then I have seen it multiple
times.
I am interested in reproducing this issue.
Can you please describe how this situation was created?
Are the related virtual NICs created in oVirt, or are they imported?
Is it still possible to create duplicates, or do you have a backup of the db during this
period?
Can you please share the output of
/usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select
vm_interface.mac_addr, vm_interface.vm_guid, vm_interface.name, vm_static.cluster_id,
cluster.mac_pool_id, mac_pools.allow_duplicate_mac_addresses, mac_pool_ranges.from_mac,
mac_pool_ranges.to_mac
from (((( vm_interface left join vm_static on vm_interface.vm_guid = vm_static.vm_guid)
left join cluster on vm_static.cluster_id = cluster.cluster_id) left join mac_pools on
cluster.mac_pool_id = mac_pools.id) left join mac_pool_ranges
on mac_pools.id = mac_pool_ranges.mac_pool_id) order by
vm_interface.mac_addr;"
and point us to the virtual NICs or VMs which contained the duplicated MACs?
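Once the query output is available, the duplicated MACs can be picked out with a few lines of Python. This is only a sketch: it assumes the first three columns (mac_addr, vm_guid, name) have already been parsed into tuples, and the function name duplicated_macs and the sample rows are made up for illustration. Because the query joins mac_pool_ranges, the same vNIC can appear on several rows when its pool has more than one range, so rows are de-duplicated per vNIC first.

```python
from collections import defaultdict


def duplicated_macs(rows):
    """Map each MAC that sits on more than one vNIC to the vNICs carrying it.

    rows: iterable of (mac_addr, vm_guid, nic_name) tuples.
    """
    owners = defaultdict(set)
    for mac, vm_guid, nic_name in rows:
        # A set collapses repeated rows for the same vNIC; MACs compare case-insensitively.
        owners[mac.lower()].add((vm_guid, nic_name))
    return {mac: sorted(nics, key=str) for mac, nics in owners.items() if len(nics) > 1}


# Hypothetical sample rows, not taken from the attached output.
sample_rows = [
    ("56:6f:ef:88:0b:23", "vm-a", "nic1"),
    ("56:6F:EF:88:0B:23", "vm-b", "nic1"),
    ("56:6f:ef:88:0b:24", "vm-c", "nic1"),
    ("56:6f:ef:88:0b:24", "vm-c", "nic1"),  # same vNIC listed twice, not a duplicate
]

print(duplicated_macs(sample_rows))
```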
For now I am simply going to create a new MAC pool for each of the clusters and switch to
it, hoping it's not going to affect existing VMs.
This will affect only newly created virtual NICs.
Regarding planning, had I known that the same MAC pool is used across
datacenters/clusters, I would have taken it into account.
Even if the MAC pool is shared, the MAC addresses should be unique across all datacenters
managed by a single oVirt Engine.
Relying on common sense, I just did not expect this to be the case, but that is my fault;
I should have applied a trust-but-verify approach.
On 26/01/2020, 07:45, "Yedidyah Bar David" <didi@redhat.com> wrote:
On Thu, Jan 23, 2020 at 2:30 PM Vrgotic, Marko <M.Vrgotic@activevideo.com> wrote:
Hi Yedidyah,
Thank you for your update.
This platform started with 4.3 deployment.
The Default mac address pool, apparently on all Clusters (5) is:
from_mac | to_mac
-------------------+-------------------
56:6f:ef:88:00:00 | 56:6f:ef:88:ff:ff
I think I misled you, or for some reason didn't understand your
original post. The default pool is for "everything". I thought you
were referring to different setups - separate engines - and the bug I
mentioned about changing the default addressed that scenario.
Inside a single engine, there is only one default.
You should not see collisions *inside* it. Do you? The engine should
know not to allocate the same MAC to two different NICs.
Interestingly enough, I am also not able to add another mac pool to
Default. I can only create a new one,
Correct.
let's say MacPool2, and also create only a single pool inside.
The option to add a second MAC range under the same name is grayed out, whether I log in
to the Administration Portal as SuperUser or Admin.
Indeed. You can only change it, not add a new one with the same name.
Never mind, it is what it is, but I am still not the "happiest" with:
> Question2: Would there be a harmful effect on existing VMs if the default mac
> pool would be changed?
=> I am pretty certain it's harmless, but didn't try that myself.
The reason is that I have 600 VMs on a 5-cluster setup in production. If I make the change
where it is currently required and we are wrong, it is going to affect almost half of
those existing VMs. I did test the change on staging, and it did not seem to have any
harmful effect, but that environment has only about 5 VMs at the moment.
I will run some additional tests on staging to see if I can get more comfortable before
making the change in production, but if anyone else can help boost my confidence,
please let me know.
Ccing Dominik from the network team.
I am pretty certain that people do change/add pools live, but guess
not often - I guess most people plan ahead and then don't touch.
Groetjes,
--
Didi