On Mon, Aug 9, 2021 at 4:01 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
On Mon, Aug 9, 2021 at 2:42 PM Yedidyah Bar David <didi(a)redhat.com> wrote:
>
> On Mon, Aug 9, 2021 at 1:43 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
> >
> > > On Mon, Aug 9, 2021 at 10:35 AM Yedidyah Bar David <didi(a)redhat.com> wrote:
> > >
> > > > On Sun, Aug 8, 2021 at 5:42 PM Code Review <gerrit(a)ovirt.org> wrote:
> > > >
> > > > From Jenkins CI <jenkins(a)ovirt.org>:
> > > >
> > > > Jenkins CI has posted comments on this change. ( https://gerrit.ovirt.org/c/ovirt-system-tests/+/115392 )
> > > >
> > > > Change subject: HE: Use node image
> > > >
> > > > ......................................................................
> > > >
> > > >
> > > > Patch Set 13: Continuous-Integration-1
> > > >
> > > > Build Failed
> > >
> > > While trying to deactivate a host, the engine wanted to migrate a VM
> > > (vm0) from host-0 to host-1. vdsm log of host-0 says:
> > >
> > > 2021-08-08 14:31:10,076+0000 ERROR (migsrc/cde311f9) [virt.vm] (vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') Failed to migrate (migration:503)
> > > Traceback (most recent call last):
> > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 477, in _regular_run
> > >     time.time(), machineParams
> > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 578, in _startUnderlyingMigration
> > >     self._perform_with_conv_schedule(duri, muri)
> > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 667, in _perform_with_conv_schedule
> > >     self._perform_migration(duri, muri)
> > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 596, in _perform_migration
> > >     self._migration_flags)
> > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 159, in call
> > >     return getattr(self._vm._dom, name)(*a, **kw)
> > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
> > >     ret = attr(*args, **kwargs)
> > >   File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
> > >     ret = f(*args, **kwargs)
> > >   File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
> > >     return func(inst, *args, **kwargs)
> > >   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2126, in migrateToURI3
> > >     raise libvirtError('virDomainMigrateToURI3() failed')
> > > libvirt.libvirtError: Unsafe migration: Migration without shared storage is unsafe
> >
> > Please share the vm xml:
> >
> > sudo virsh -r dumpxml vm-name
>
> I think you should be able to find a dump of it in vdsm.log:
>
> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/186...
>
> I think the first line of starting a migration is:
>
> 2021-08-08 14:31:08,350+0000 DEBUG (jsonrpc/4) [jsonrpc.JsonRpcServer] Calling 'VM.migrate' in bridge with {'vmID': 'cde311f9-9a33-4eb9-8338-fa22ff49edc2', 'params':
>
> A few lines later:
>
> 2021-08-08 14:31:08,387+0000 DEBUG (migsrc/cde311f9) [virt.metadata.Descriptor] dumped metadata for cde311f9-9a33-4eb9-8338-fa22ff49edc2: <?xml version='1.0' encoding='utf-8'?>
> <vm>
> <balloonTarget type="int">98304</balloonTarget>
This is not the vm xml but the metadata xml.
Looking at the logs on both hosts:
[nsoffer@sparse ost]$ head -1 *vdsm.log
==> host0-vdsm.log <==
2021-08-08 13:16:04,676+0000 INFO (MainThread) [vds] (PID: 65169) I am the actual vdsm 4.40.80.3.12.git6d67b935b ost-he-basic-suite-master-host-0 (4.18.0-326.el8.x86_64) (vdsmd:162)
==> host1-vdsm.log <==
2021-08-08 15:40:54,367+0200 INFO (MainThread) [vds] (PID: 23005) I am the actual vdsm 4.40.80.4.5.git4309a3949 ost-he-basic-suite-master-host-1 (4.18.0-326.el8.x86_64) (vdsmd:162)
- The hosts' clocks are using different time zones (+0000 vs +0200); is this intended?
- You are testing different versions of vdsm - is this intended?
We have about 60 errors:
$ grep 'Migration without shared storage is unsafe' host0-vdsm.log | wc -l
60
Looking at the first migration that failed, the vm xml is here:
2021-08-08 14:20:34,127+0000 INFO (vm/cde311f9) [virt.vm] (vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') <?xml version='1.0' encoding='utf-8'?>
<domain xmlns:ns0="http://libvirt.org/schemas/domain/qemu/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm">
    <name>vm0</name>
    <uuid>cde311f9-9a33-4eb9-8338-fa22ff49edc2</uuid>
...
The relevant parts for storage are:
    <disk device="disk" snapshot="no" type="file">
        <driver cache="none" error_policy="stop" io="threads" iothread="1" name="qemu" type="qcow2" />
        <source file="/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87">
            <seclabel model="dac" relabel="no" />
        </source>
        <target bus="virtio" dev="vda" />
        <serial>20002ad2-4a97-4d2f-b3fc-c103477b5b91</serial>
        <boot order="1" />
        <alias name="ua-20002ad2-4a97-4d2f-b3fc-c103477b5b91" />
        <address bus="0x05" domain="0x0000" function="0x0" slot="0x00" type="pci" />
    </disk>
    <disk device="disk" snapshot="no" type="block">
        <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw" />
        <source dev="/dev/mapper/36001405bc9d94e4419b4b80a2f702e2f">
            <seclabel model="dac" relabel="no" />
        </source>
        <target bus="scsi" dev="sda" />
        <serial>738c8486-8929-44ec-9083-69327bde9c65</serial>
        <alias name="ua-738c8486-8929-44ec-9083-69327bde9c65" />
        <address bus="0" controller="0" target="0" type="drive" unit="0" />
    </disk>
So we have one qcow2 disk on file storage, and one direct lun.
On the destination, the first migration attempt is here:
2021-08-08 16:31:08,437+0200 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer] Calling 'VM.migrationCreate' in bridge with {'vmID': 'cde311f9-9a33-4eb9-8338-fa22ff49edc2', 'params': {'_srcDomXML': '<domain type=\'kvm\' id=\'6\' xmlns:qemu=\'http://libvirt.org/
...
We prepare the qcow2 disk:
2021-08-08 16:31:09,313+0200 INFO (vm/cde311f9) [vdsm.api] FINISH
prepareImage return={'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
'info': {'type': 'file', 'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'},
'imgVolumesInfo': [{'domainID':
'46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
'20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
'1d3f07dc-b481-492f-a2a6-7c46689d82ba', 'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba',
'leasePath':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba.lease',
'leaseOffset': 0}, {'domainID':
'46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
'20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
'614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
'leasePath':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87.lease',
'leaseOffset': 0}, {'domainID':
'46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
'20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
'a4309ef3-01bb-45db-8bf7-0f9498a7feeb', 'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb',
'leasePath':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb.lease',
'leaseOffset': 0}]} from=internal,
task_id=f2dfbacb-154c-4f5b-b57d-affcbf419691 (api:54)
2021-08-08 16:31:09,314+0200 INFO (vm/cde311f9) [vds] prepared volume
path:
/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
(clientIF:518)
And the direct lun:
2021-08-08 16:31:09,315+0200 INFO (vm/cde311f9) [vdsm.api] START
appropriateDevice(guid='36001405bc9d94e4419b4b80a2f702e2f',
thiefId='cde311f9-9a33-4eb9-8338-fa22ff49edc2', deviceType='mpath')
from=internal, task_id=220f1c1f-caec-4327-a157-7a4fab3b54a5 (api:48)
2021-08-08 16:31:09,550+0200 INFO (vm/cde311f9) [vdsm.api] FINISH
appropriateDevice return={'truesize': '21474836480',
'apparentsize':
'21474836480', 'path':
'/dev/mapper/36001405bc9d94e4419b4b80a2f702e2f'} from=internal,
task_id=220f1c1f-caec-4327-a157-7a4fab3b54a5 (api:54)
2021-08-08 16:31:09,550+0200 INFO (vm/cde311f9) [vds] prepared volume
path: /dev/mapper/36001405bc9d94e4419b4b80a2f702e2f (clientIF:518)
The interesting thing is that the qcow2 disk is using a different path on the source and destination VMs:
source:
/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
destination:
/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
On the source we have:
/rhev/data-center/pool-id/domain-id/images/image-id/volume-id
On the destination we have:
/rhev/data-center/mnt/mountpoint/domain-id/images/image-id/volume-id
Both lead to the same disk, but libvirt probably compared the strings and decided that we don't have shared storage.
It may be a new validation in libvirt, or maybe the engine changed the way the disk path is added to the XML recently?
In vdsm we use os.path.realpath() when we compare disk paths since we know
that we can have different paths to the same volume.
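For example, on the source host both strings should resolve to the same file (assuming the pool-id directory under /rhev/data-center is the usual vdsm symlink to the mount path):
# both should print the /rhev/data-center/mnt/... path if the symlink is in place
realpath /rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
realpath /rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87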
This kind of issue is likely to be reproducible without ost.
Trying to understand why the drive path is different on the source:
The source path was created by a snapshot:
2021-08-08 14:19:40,219+0000 INFO (tasks/9) [storage.Volume] Request to create snapshot 20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87 of volume 20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba with capacity 117440512 (fileVolume:528)
Vdsm reports the volume as:
2021-08-08 14:19:56,944+0000 INFO (jsonrpc/6) [vdsm.api] FINISH
getVolumeInfo return={'info': {'uuid':
'614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'type': 'SPARSE',
'format':
'COW', 'disktype': 'DATA', 'voltype': 'LEAF',
'capacity': '117440512',
'parent': '1d3f07dc-b481-492f-a2a6-7c46689d82ba', 'description':
'',
'pool': '', 'domain':
'46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'image':
'20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'ctime': '1628432381',
'mtime': '0', 'legality': 'LEGAL', 'generation':
0, 'apparentsize':
'196616', 'truesize': '200704', 'status': 'OK',
'lease': {'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87.lease',
'offset': 0, 'owners': [], 'version': None}, 'children':
[]}}
from=::ffff:192.168.200.61,44352,
flow_id=fe1c52bf-a1ea-462f-ac32-14cd8f16c411,
task_id=14914c43-a38e-4692-9753-78ac2ddfbe5d (api:54)
(Using the /rhev/data-center/mnt/mountpoint/...)
The image was prepared later:
2021-08-08 14:20:32,047+0000 INFO (vm/cde311f9) [vdsm.api] START prepareImage(sdUUID='46fa5761-bb9e-46be-8f1c-35f4b03d0203', spUUID='7d97ea80-f849-11eb-ac79-5452d501341a', imgUUID='20002ad2-4a97-4d2f-b3fc-c103477b5b91', leafUUID='614abd56-4d4f-4412-aa
...
2021-08-08 14:20:32,726+0000 INFO (vm/cde311f9) [vdsm.api] FINISH prepareImage return={'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
'info': {'type': 'file', 'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'},
'imgVolumesInfo': [{'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID': '1d3f07dc-b481-492f-a2a6-7c46689d82ba',
'path': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba',
'leasePath': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba.lease', 'leaseOffset': 0},
{'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID': '614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
'path': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
'leasePath': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87.lease', 'leaseOffset': 0},
{'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID': 'a4309ef3-01bb-45db-8bf7-0f9498a7feeb',
'path': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb',
'leasePath': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb.lease', 'leaseOffset': 0}]}
from=internal, task_id=92af0386-5c0f-4495-9af2-9d5c47d02c48 (api:54)
2021-08-08 14:20:32,726+0000 INFO (vm/cde311f9) [vds] prepared volume path: /rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87 (clientIF:518)
And prepareImage returns the /rhev/data-center/mnt/... path - this path should be used by the VM!
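On the source host this can also be checked against the running VM, e.g.:
# vm0 is the domain name from the xml above
sudo virsh -r dumpxml vm0 | grep 'source file'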
But then we see this suspicious log:
2021-08-08 14:20:33,975+0000 INFO (vm/cde311f9) [virt.vm]
(vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') drive 'vda' path:
'file=/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'
->
'*file=/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'
(storagexml:341)
And we see the wrong paths in the vm xml:
    <disk device="disk" snapshot="no" type="file">
        <driver cache="none" error_policy="stop" io="threads" iothread="1" name="qemu" type="qcow2" />
        <source file="/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87">
            <seclabel model="dac" relabel="no" />
        </source>
So it looks like code in storagexml is *changing* the path returned from storage (/rhev/data-center/mnt/...), replacing it with another path (/rhev/data-center/pool-id/...), which is incorrect and confuses libvirt.
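To see what runs around storagexml:341 from the log above, something like this on the host should work (install path assumed from the other vdsm modules in the traceback):
sed -n '330,350p' /usr/lib/python3.6/site-packages/vdsm/virt/vmdevices/storagexml.py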
I don't think we changed anything in this area in vdsm, so this is likely a new validation added in libvirt, revealing an old bug in vdsm/engine.
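To check the libvirt side, it may help to compare the builds on both hosts, e.g.:
rpm -q libvirt-daemon qemu-kvm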
Nir