
On Mon, Aug 9, 2021 at 4:29 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Mon, Aug 9, 2021 at 4:01 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Mon, Aug 9, 2021 at 2:42 PM Yedidyah Bar David <didi@redhat.com> wrote:
On Mon, Aug 9, 2021 at 1:43 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Mon, Aug 9, 2021 at 10:35 AM Yedidyah Bar David <didi@redhat.com> wrote:
On Sun, Aug 8, 2021 at 5:42 PM Code Review <gerrit@ovirt.org> wrote:
From Jenkins CI <jenkins@ovirt.org>:
Jenkins CI has posted comments on this change. ( https://gerrit.ovirt.org/c/ovirt-system-tests/+/115392 )
Change subject: HE: Use node image
......................................................................
Patch Set 13: Continuous-Integration-1
Build Failed
While trying to deactivate a host, the engine wanted to migrate a VM (vm0) from host-0 to host-1. vdsm log of host-0 says:
2021-08-08 14:31:10,076+0000 ERROR (migsrc/cde311f9) [virt.vm] (vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') Failed to migrate (migration:503)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 477, in _regular_run
    time.time(), machineParams
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 578, in _startUnderlyingMigration
    self._perform_with_conv_schedule(duri, muri)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 667, in _perform_with_conv_schedule
    self._perform_migration(duri, muri)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 596, in _perform_migration
    self._migration_flags)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 159, in call
    return getattr(self._vm._dom, name)(*a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2126, in migrateToURI3
    raise libvirtError('virDomainMigrateToURI3() failed')
libvirt.libvirtError: Unsafe migration: Migration without shared storage is unsafe
Please share the vm xml:
sudo virsh -r dumpxml vm-name
I think you should be able to find a dump of it in vdsm.log:
https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/18650/...
I think the first log line for this migration is:
2021-08-08 14:31:08,350+0000 DEBUG (jsonrpc/4) [jsonrpc.JsonRpcServer] Calling 'VM.migrate' in bridge with {'vmID': 'cde311f9-9a33-4eb9-8338-fa22ff49edc2', 'params':
A few lines later:
2021-08-08 14:31:08,387+0000 DEBUG (migsrc/cde311f9) [virt.metadata.Descriptor] dumped metadata for cde311f9-9a33-4eb9-8338-fa22ff49edc2: <?xml version='1.0' encoding='utf-8'?>
<vm>
  <balloonTarget type="int">98304</balloonTarget>
This is not the vm xml but the metadata xml.
Looking at the logs on both hosts:
[nsoffer@sparse ost]$ head -1 *vdsm.log
==> host0-vdsm.log <==
2021-08-08 13:16:04,676+0000 INFO (MainThread) [vds] (PID: 65169) I am the actual vdsm 4.40.80.3.12.git6d67b935b ost-he-basic-suite-master-host-0 (4.18.0-326.el8.x86_64) (vdsmd:162)

==> host1-vdsm.log <==
2021-08-08 15:40:54,367+0200 INFO (MainThread) [vds] (PID: 23005) I am the actual vdsm 4.40.80.4.5.git4309a3949 ost-he-basic-suite-master-host-1 (4.18.0-326.el8.x86_64) (vdsmd:162)
- The hosts' clocks are using different time zones (+0000 vs +0200) - is this intended?
- You are testing different versions of vdsm - is this intended?
We have about 60 errors:

$ grep 'Migration without shared storage is unsafe' host0-vdsm.log | wc -l
60
Looking at the first migration that failed, the vm xml is here:
2021-08-08 14:20:34,127+0000 INFO (vm/cde311f9) [virt.vm] (vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') <?xml version='1.0' encoding='utf-8'?>
<domain xmlns:ns0="http://libvirt.org/schemas/domain/qemu/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm">
  <name>vm0</name>
  <uuid>cde311f9-9a33-4eb9-8338-fa22ff49edc2</uuid>
...
The relevant parts for storage are:
<disk device="disk" snapshot="no" type="file"> <driver cache="none" error_policy="stop" io="threads" iothread="1" name="qemu" type="qcow2" /> <source file="/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87"> <seclabel model="dac" relabel="no" /> </source> <target bus="virtio" dev="vda" /> <serial>20002ad2-4a97-4d2f-b3fc-c103477b5b91</serial> <boot order="1" /> <alias name="ua-20002ad2-4a97-4d2f-b3fc-c103477b5b91" /> <address bus="0x05" domain="0x0000" function="0x0" slot="0x00" type="pci" /> </disk> <disk device="disk" snapshot="no" type="block"> <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw" /> <source dev="/dev/mapper/36001405bc9d94e4419b4b80a2f702e2f"> <seclabel model="dac" relabel="no" /> </source> <target bus="scsi" dev="sda" /> <serial>738c8486-8929-44ec-9083-69327bde9c65</serial> <alias name="ua-738c8486-8929-44ec-9083-69327bde9c65" /> <address bus="0" controller="0" target="0" type="drive" unit="0" /> </disk>
So we have one qcow2 disk on file storage, and one direct lun.
On the destination, the first migration attempt is here:
2021-08-08 16:31:08,437+0200 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer] Calling 'VM.migrationCreate' in bridge with {'vmID': 'cde311f9-9a33-4eb9-8338-fa22ff49edc2', 'params': {'_srcDomXML': '<domain type=\'kvm\' id=\'6\' xmlns:qemu=\'http://libvirt.org/ ...
We prepare the qcow2 disk:
2021-08-08 16:31:09,313+0200 INFO (vm/cde311f9) [vdsm.api] FINISH prepareImage return={'path': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'info': {'type': 'file', 'path': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'}, 'imgVolumesInfo': [{'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID': '1d3f07dc-b481-492f-a2a6-7c46689d82ba', 'path': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba', 'leasePath': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba.lease', 'leaseOffset': 0}, {'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID': '614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'path': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'leasePath': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87.lease', 'leaseOffset': 0}, {'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID': 'a4309ef3-01bb-45db-8bf7-0f9498a7feeb', 'path': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb', 'leasePath': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb.lease', 'leaseOffset': 0}]} from=internal, task_id=f2dfbacb-154c-4f5b-b57d-affcbf419691 (api:54)
2021-08-08 16:31:09,314+0200 INFO (vm/cde311f9) [vds] prepared volume path: /rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87 (clientIF:518)
And the direct lun:
2021-08-08 16:31:09,315+0200 INFO (vm/cde311f9) [vdsm.api] START appropriateDevice(guid='36001405bc9d94e4419b4b80a2f702e2f', thiefId='cde311f9-9a33-4eb9-8338-fa22ff49edc2', deviceType='mpath') from=internal, task_id=220f1c1f-caec-4327-a157-7a4fab3b54a5 (api:48)
2021-08-08 16:31:09,550+0200 INFO (vm/cde311f9) [vdsm.api] FINISH appropriateDevice return={'truesize': '21474836480', 'apparentsize': '21474836480', 'path': '/dev/mapper/36001405bc9d94e4419b4b80a2f702e2f'} from=internal, task_id=220f1c1f-caec-4327-a157-7a4fab3b54a5 (api:54)
2021-08-08 16:31:09,550+0200 INFO (vm/cde311f9) [vds] prepared volume path: /dev/mapper/36001405bc9d94e4419b4b80a2f702e2f (clientIF:518)
The interesting thing is that the qcow2 disk is using a different path on the source and destination vms:
source: /rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
destination: /rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
On the source we have: /rhev/data-center/pool-id/domain-id/images/image-id/volume-id
On the destination we have: /rhev/data-center/mnt/mountpoint/domain-id/images/image-id/volume-id
Both lead to the same disk, but libvirt probably compared the strings and decided that we don't have shared storage.
It may be a new validation in libvirt, or maybe the engine recently changed the way the disk path is added to the xml?
In vdsm we use os.path.realpath() when comparing disk paths, since we know we can have different paths to the same volume.
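For illustration, a minimal sketch (not vdsm code) of the difference, using the two paths from the logs above; it assumes the pool-id path is a symlink into /rhev/data-center/mnt, which is how vdsm lays out /rhev/data-center on a host:

    import os

    # Source path (pool-id form) and destination path (mnt form),
    # copied from the migration logs above.
    src = ("/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a"
           "/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images"
           "/20002ad2-4a97-4d2f-b3fc-c103477b5b91"
           "/614abd56-4d4f-4412-aa2a-3f7bad2f3a87")
    dst = ("/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1"
           "/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images"
           "/20002ad2-4a97-4d2f-b3fc-c103477b5b91"
           "/614abd56-4d4f-4412-aa2a-3f7bad2f3a87")

    # What a plain string comparison sees:
    print(src == dst)                                      # False
    # What a symlink-aware comparison sees (True only on a host
    # where the pool-id path actually is such a symlink):
    print(os.path.realpath(src) == os.path.realpath(dst))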
This kind of issue is likely to be reproducible without OST.
Trying to understand why the drive path is different on the source:
The source path was created by a snapshot:
2021-08-08 14:19:40,219+0000 INFO (tasks/9) [storage.Volume] Request to create snapshot 20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87 of volume 20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba with capacity 117440512 (fileVolume:528)
Vdsm reports the volume as:
2021-08-08 14:19:56,944+0000 INFO (jsonrpc/6) [vdsm.api] FINISH getVolumeInfo return={'info': {'uuid': '614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'type': 'SPARSE', 'format': 'COW', 'disktype': 'DATA', 'voltype': 'LEAF', 'capacity': '117440512', 'parent': '1d3f07dc-b481-492f-a2a6-7c46689d82ba', 'description': '', 'pool': '', 'domain': '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'image': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'ctime': '1628432381', 'mtime': '0', 'legality': 'LEGAL', 'generation': 0, 'apparentsize': '196616', 'truesize': '200704', 'status': 'OK', 'lease': {'path': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87.lease', 'offset': 0, 'owners': [], 'version': None}, 'children': []}} from=::ffff:192.168.200.61,44352, flow_id=fe1c52bf-a1ea-462f-ac32-14cd8f16c411, task_id=14914c43-a38e-4692-9753-78ac2ddfbe5d (api:54)
(Using the /rhev/data-center/mnt/mountpoint/...)
The image was prepared later:
2021-08-08 14:20:32,047+0000 INFO (vm/cde311f9) [vdsm.api] START prepareImage(sdUUID='46fa5761-bb9e-46be-8f1c-35f4b03d0203', spUUID='7d97ea80-f849-11eb-ac79-5452d501341a', imgUUID='20002ad2-4a97-4d2f-b3fc-c103477b5b91', leafUUID='614abd56-4d4f-4412-aa ...
2021-08-08 14:20:32,726+0000 INFO (vm/cde311f9) [vdsm.api] FINISH prepareImage return={'path': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'info': {'type': 'file', 'path': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'}, 'imgVolumesInfo': [{'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID': '1d3f07dc-b481-492f-a2a6-7c46689d82ba', 'path': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba', 'leasePath': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba.lease', 'leaseOffset': 0}, {'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID': '614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'path': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'leasePath': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87.lease', 'leaseOffset': 0}, {'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID': 'a4309ef3-01bb-45db-8bf7-0f9498a7feeb', 'path': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb', 'leasePath': '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb.lease', 'leaseOffset': 0}]} from=internal, task_id=92af0386-5c0f-4495-9af2-9d5c47d02c48 (api:54)
2021-08-08 14:20:32,726+0000 INFO (vm/cde311f9) [vds] prepared volume path: /rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87 (clientIF:518)
...
2021-08-08 14:20:32,726+0000 INFO (vm/cde311f9) [vds] prepared volume path: /rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87 (clientIF:518)
And prepareImage returns the /rhev/data-center/mnt/... path - this path should be used by the VM!
But then we see this suspicious log:
2021-08-08 14:20:33,975+0000 INFO (vm/cde311f9) [virt.vm] (vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') drive 'vda' path: 'file=/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87' -> '*file=/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87' (storagexml:341)
And we see the wrong paths in the vm xml:
<disk device="disk" snapshot="no" type="file"> <driver cache="none" error_policy="stop" io="threads" iothread="1" name="qemu" type="qcow2" /> <source file="/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87"> <seclabel model="dac" relabel="no" /> </source>
So it looks like code in storagexml is *changing* the path returned from storage (/rhev/data-center/mnt/...) to another path (/rhev/data-center/pool-id/...), which is incorrect and confuses libvirt.
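If so, a fix would presumably normalize the path before it is written into the domain XML. A minimal sketch of the idea (a hypothetical helper, not the actual storagexml code):

    import os

    def canonical_drive_path(path):
        # Hypothetical helper: resolve any /rhev/data-center/<pool-id>/...
        # symlink to the real /rhev/data-center/mnt/... mount path, so the
        # source and destination domain XML describe the disk with the same
        # string and libvirt's shared-storage check can match them.
        return os.path.realpath(path)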
I don't think we changed anything in this area in vdsm, so this is likely a new validation added in libvirt, revealing an old bug in vdsm/engine.
For the record, I rebased the OST patch, which triggered a check-patch run, which passed:
https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/18658/

It didn't have to migrate vm0 while deactivating the host (during test_012, at least) - it was probably already on host-1, not sure why. So it's not proof that the above issue was "fixed".

--
Didi