On Mon, Aug 9, 2021 at 4:01 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
On Mon, Aug 9, 2021 at 2:42 PM Yedidyah Bar David <didi(a)redhat.com> wrote:
>
> On Mon, Aug 9, 2021 at 1:43 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
> >
> > On Mon, Aug 9, 2021 at 10:35 AM Yedidyah Bar David <didi(a)redhat.com> wrote:
> > >
> > > On Sun, Aug 8, 2021 at 5:42 PM Code Review <gerrit(a)ovirt.org> wrote:
> > > >
> > > > From Jenkins CI <jenkins(a)ovirt.org>:
> > > >
> > > > Jenkins CI has posted comments on this change. ( https://gerrit.ovirt.org/c/ovirt-system-tests/+/115392 )
> > > >
> > > > Change subject: HE: Use node image
> > > > ......................................................................
> > > >
> > > >
> > > > Patch Set 13: Continuous-Integration-1
> > > >
> > > > Build Failed
> > >
> > > While trying to deactivate a host, the engine wanted to migrate a VM
> > > (vm0) from host-0 to host-1. vdsm log of host-0 says:
> > >
> > > 2021-08-08 14:31:10,076+0000 ERROR (migsrc/cde311f9) [virt.vm]
> > > (vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') Failed to migrate
> > > (migration:503)
> > > Traceback (most recent call last):
> > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 477, in _regular_run
> > >     time.time(), machineParams
> > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 578, in _startUnderlyingMigration
> > >     self._perform_with_conv_schedule(duri, muri)
> > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 667, in _perform_with_conv_schedule
> > >     self._perform_migration(duri, muri)
> > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 596, in _perform_migration
> > >     self._migration_flags)
> > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 159, in call
> > >     return getattr(self._vm._dom, name)(*a, **kw)
> > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
> > >     ret = attr(*args, **kwargs)
> > >   File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
> > >     ret = f(*args, **kwargs)
> > >   File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
> > >     return func(inst, *args, **kwargs)
> > >   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2126, in migrateToURI3
> > >     raise libvirtError('virDomainMigrateToURI3() failed')
> > > libvirt.libvirtError: Unsafe migration: Migration without shared
> > > storage is unsafe
> >
> > Please share the vm xml:
> >
> > sudo virsh -r dumpxml vm-name
>
> I think you should be able to find a dump of it in vdsm.log:
>
> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/186...
>
> I think the first line of starting a migration is:
>
> 2021-08-08 14:31:08,350+0000 DEBUG (jsonrpc/4) [jsonrpc.JsonRpcServer]
> Calling 'VM.migrate' in bridge with {'vmID':
> 'cde311f9-9a33-4eb9-8338-fa22ff49edc2', 'params':
>
> A few lines later:
>
> 2021-08-08 14:31:08,387+0000 DEBUG (migsrc/cde311f9)
> [virt.metadata.Descriptor] dumped metadata for
> cde311f9-9a33-4eb9-8338-fa22ff49edc2: <?xml version='1.0'
> encoding='utf-8'?>
> <vm>
> <balloonTarget type="int">98304</balloonTarget>
This is not the vm xml but the metadata xml.
OK
Looking at the logs on both hosts:
[nsoffer@sparse ost]$ head -1 *vdsm.log
==> host0-vdsm.log <==
2021-08-08 13:16:04,676+0000 INFO (MainThread) [vds] (PID: 65169) I am the actual vdsm 4.40.80.3.12.git6d67b935b ost-he-basic-suite-master-host-0 (4.18.0-326.el8.x86_64) (vdsmd:162)
==> host1-vdsm.log <==
2021-08-08 15:40:54,367+0200 INFO (MainThread) [vds] (PID: 23005) I am the actual vdsm 4.40.80.4.5.git4309a3949 ost-he-basic-suite-master-host-1 (4.18.0-326.el8.x86_64) (vdsmd:162)
- The hosts' clocks are using different time zones (+0000 vs +0200) - is this intended?
- You are testing different versions of vdsm - is this intended?
Both of these are a result of the patch this was run for, which makes host-0 use ovirt-node for the he-basic suite.
We have about 60 errors:
$ grep 'Migration without shared storage is unsafe' host0-vdsm.log | wc -l
60
Looking at the first migration that failed, the vm xml is here:
2021-08-08 14:20:34,127+0000 INFO (vm/cde311f9) [virt.vm]
(vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') <?xml version='1.0' encoding='utf-8'?>
<domain xmlns:ns0="http://libvirt.org/schemas/domain/qemu/1.0"
        xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm">
    <name>vm0</name>
    <uuid>cde311f9-9a33-4eb9-8338-fa22ff49edc2</uuid>
...
The relevant parts for storage are:
<disk device="disk" snapshot="no"
type="file">
<driver cache="none" error_policy="stop"
io="threads"
iothread="1" name="qemu" type="qcow2" />
<source
file="/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87">
<seclabel model="dac" relabel="no" />
</source>
<target bus="virtio" dev="vda" />
<serial>20002ad2-4a97-4d2f-b3fc-c103477b5b91</serial>
<boot order="1" />
<alias name="ua-20002ad2-4a97-4d2f-b3fc-c103477b5b91" />
<address bus="0x05" domain="0x0000"
function="0x0"
slot="0x00" type="pci" />
</disk>
<disk device="disk" snapshot="no"
type="block">
<driver cache="none" error_policy="stop"
io="native"
name="qemu" type="raw" />
<source dev="/dev/mapper/36001405bc9d94e4419b4b80a2f702e2f">
<seclabel model="dac" relabel="no" />
</source>
<target bus="scsi" dev="sda" />
<serial>738c8486-8929-44ec-9083-69327bde9c65</serial>
<alias name="ua-738c8486-8929-44ec-9083-69327bde9c65" />
<address bus="0" controller="0" target="0"
type="drive" unit="0" />
</disk>
So we have one qcow2 disk on file storage, and one direct lun.
On the destination, the first migration attempt is here:
2021-08-08 16:31:08,437+0200 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer]
Calling 'VM.migrationCreate' in bridge with {'vmID':
'cde311f9-9a33-4eb9-8338-fa22ff49edc2', 'params': {'_srcDomXML':
'<domain type=\'kvm\' id=\'6\' xmlns:qemu=\'http://libvirt.org/
...
We prepare the qcow2 disk:
2021-08-08 16:31:09,313+0200 INFO (vm/cde311f9) [vdsm.api] FINISH
prepareImage return={'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
'info': {'type': 'file', 'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'},
'imgVolumesInfo': [{'domainID':
'46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
'20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
'1d3f07dc-b481-492f-a2a6-7c46689d82ba', 'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba',
'leasePath':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba.lease',
'leaseOffset': 0}, {'domainID':
'46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
'20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
'614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
'leasePath':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87.lease',
'leaseOffset': 0}, {'domainID':
'46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
'20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
'a4309ef3-01bb-45db-8bf7-0f9498a7feeb', 'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb',
'leasePath':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb.lease',
'leaseOffset': 0}]} from=internal,
task_id=f2dfbacb-154c-4f5b-b57d-affcbf419691 (api:54)
2021-08-08 16:31:09,314+0200 INFO (vm/cde311f9) [vds] prepared volume
path:
/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
(clientIF:518)
And the direct lun:
2021-08-08 16:31:09,315+0200 INFO (vm/cde311f9) [vdsm.api] START
appropriateDevice(guid='36001405bc9d94e4419b4b80a2f702e2f',
thiefId='cde311f9-9a33-4eb9-8338-fa22ff49edc2', deviceType='mpath')
from=internal, task_id=220f1c1f-caec-4327-a157-7a4fab3b54a5 (api:48)
2021-08-08 16:31:09,550+0200 INFO (vm/cde311f9) [vdsm.api] FINISH
appropriateDevice return={'truesize': '21474836480',
'apparentsize':
'21474836480', 'path':
'/dev/mapper/36001405bc9d94e4419b4b80a2f702e2f'} from=internal,
task_id=220f1c1f-caec-4327-a157-7a4fab3b54a5 (api:54)
2021-08-08 16:31:09,550+0200 INFO (vm/cde311f9) [vds] prepared volume
path: /dev/mapper/36001405bc9d94e4419b4b80a2f702e2f (clientIF:518)
The interesting thing is that the qcow2 disk is using a different path on the source and destination VMs:
source:
/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
destination:
/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
On the source we have:
/rhev/data-center/pool-id/domain-id/images/image-id/volume-id
On the destination we have:
/rhev/data-center/mnt/mountpoint/domain-id/images/image-id/volume-id
Both lead to the same disk, but libvirt probably compared the strings and decided that we don't have shared storage.
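
To see that the two strings really point at the same volume, one can resolve the symlinks on the source host. A minimal sketch (untested, using the two paths from the logs above, and assuming the /rhev/data-center/<pool-id>/<domain-id> entry is the usual symlink to the mount point):

import os

# Paths copied from the source and destination logs above.
src = ("/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/"
       "46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/"
       "20002ad2-4a97-4d2f-b3fc-c103477b5b91/"
       "614abd56-4d4f-4412-aa2a-3f7bad2f3a87")
dst = ("/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/"
       "46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/"
       "20002ad2-4a97-4d2f-b3fc-c103477b5b91/"
       "614abd56-4d4f-4412-aa2a-3f7bad2f3a87")

# Comparing the raw strings - this is what a check based only on the
# XML paths would see:
print(src == dst)  # False

# Comparing after resolving symlinks - expected True when run on the
# source host, where the pool-id path is a link into /rhev/data-center/mnt:
print(os.path.realpath(src) == os.path.realpath(dst))

If the second comparison prints True on the host, the paths differ only in the symlink, not in the underlying storage.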
It may be new validation in libvirt, or maybe the engine changed the way the disk path is added to the XML recently?
And/or maybe this is related to one of the hosts being ovirt-node?
Some time ago Michal pushed this, to do the same on other suites:
https://gerrit.ovirt.org/c/ovirt-system-tests/+/115913
and it did pass CI. Artifacts are gone by now, but it does also test migrations.
I have now rebased it; let's see how it goes.
In vdsm we use os.path.realpath() when we compare disk paths, since we know that we can have different paths to the same volume.
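
For illustration only (a sketch of the idea, not the actual vdsm code), such a comparison looks roughly like:

import os

def same_volume(path_a, path_b):
    # Resolve symlinks before comparing, since the same volume can be
    # reachable via different paths (pool-id link vs. mount point path).
    return os.path.realpath(path_a) == os.path.realpath(path_b)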
This kind of issue is likely to be reproducible without ost.
Nir
Thanks,
--
Didi