On Fri, Jan 22, 2016 at 5:15 PM, Pavel Gashev <Pax(a)acronis.com> wrote:
Nir,
On 21/01/16 23:55, "Nir Soffer" <nsoffer(a)redhat.com> wrote:
>Live migration starts by creating a snapshot, then copying the disks to the new
>storage, and then mirroring the active layer so both the old and the new disks
>are the same. Finally we switch to the new disk, and delete the old disk.
>
>So probably the issue is in the mirroring step. This is most likely a
>qemu issue.
Thank you for the clarification. It gave me the idea to check the consistency of
the old disk.
I performed the following testing:
1. Create a VM on MS NFS
2. Initiate live disk migration to another storage
3. Preserve the source files before oVirt removes them by creating hard links to
them in another directory
4. Shut down the VM
5. Create another VM and move the preserved files to the place where the new
disk files are located
6. Check consistency of filesystem in both VMs
The source disk is consistent. The destination disk is corrupted.
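Step 3 above could be scripted roughly as follows. This is only a sketch of the hard-link idea, not what was actually run: the function name and both directory arguments are hypothetical, and it assumes the two directories are on the same filesystem, since hard links cannot cross filesystems.

```python
import os


def preserve_with_hard_links(src_dir, keep_dir):
    """Hard-link every regular file in src_dir into keep_dir.

    A hard link keeps the data reachable after the original name is
    removed, so the files survive the cleanup of the source directory.
    Both paths are placeholders supplied by the caller.
    """
    os.makedirs(keep_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            # Skip subdirectories and other non-regular entries.
            continue
        dst = os.path.join(keep_dir, name)
        os.link(src, dst)
        linked.append(dst)
    return linked
```

The links have to exist before the migration finishes, so this would be run while the disk copy is still in progress.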
>
>I'll try to get instructions for this from libvirt developers. If this
>happens with libvirt alone, this is a libvirt or qemu bug, and there is
>little we (ovirt) can do about it.
I've tried to reproduce the mirroring of the active layer:
1. Create two thin-provisioned VMs from the same template on different
storages.
2. Start VM1
3. virsh blockcopy VM1 vda /rhev/data-center/...path.to.disk.of.VM2.. --wait --verbose --reuse-external --shallow
4. virsh blockjob VM1 vda --abort --pivot
5. Shut down VM1
6. Start VM2. Boot in recovery mode and check the filesystem.
I tried this a dozen times. Everything works fine. No data corruption.
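To repeat steps 3 and 4 many times, the two virsh invocations can be generated by a small helper. This is a sketch only: the function name is made up, the destination path is a placeholder that must be replaced with the real path of the VM2 disk, and the flags are copied verbatim from the commands above.

```python
import shlex


def blockcopy_commands(domain, disk, dest_path):
    """Return the two virsh argument lists used in steps 3 and 4.

    dest_path is a placeholder for the pre-created destination volume;
    the flags are exactly the ones listed in the steps above.
    """
    copy = ["virsh", "blockcopy", domain, disk, dest_path,
            "--wait", "--verbose", "--reuse-external", "--shallow"]
    pivot = ["virsh", "blockjob", domain, disk, "--abort", "--pivot"]
    return copy, pivot


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in blockcopy_commands("VM1", "vda", "/path/to/disk/of/VM2"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) on the host to actually execute it.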
If you take the same VM and do a live storage migration in oVirt, is the
file system corrupted after the migration?
What is the guest OS? Did you try with more than one?
Ideas?
Thanks for this research!
The next step is to open a bug with the logs I requested in my last
message. Please mark the bug as urgent.
I'm adding Kevin (from qemu) and Eric (from libvirt); hopefully they
can tell if the virsh flow is indeed identical to what ovirt does, and
what the next step for debugging this should be.
Ovirt uses blockCopy if available (it should have been available
everywhere for some time), and falls back to blockRebase otherwise.
Do you see this warning?
blockCopy not supported, using blockRebase
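A quick way to check for it is to scan the vdsm log for that exact message. The helper below is a minimal sketch; the function name is made up, and the log location in the usage note is an assumption that should be adjusted to the host.

```python
def find_fallback_warnings(lines):
    """Return (line_number, text) for every line containing the warning.

    lines can be any iterable of strings, such as an open log file.
    """
    needle = "blockCopy not supported, using blockRebase"
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, 1)
            if needle in line]
```

Typical use, assuming the log is at /var/log/vdsm/vdsm.log: open the file and pass the file object to find_fallback_warnings(). An empty result means the blockCopy path was used.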
For reference, this is the relevant code in ovirt for the mirroring
part. The mirroring starts with diskReplicateStart(), and ends with
diskReplicateFinish(). I removed the parts about managing vdsm state
and left the calls to libvirt.
    def diskReplicateFinish(self, srcDisk, dstDisk):
        ...
        blkJobInfo = self._dom.blockJobInfo(drive.name, 0)
        ...
        if srcDisk != dstDisk:
            self.log.debug("Stopping the disk replication switching to the "
                           "destination drive: %s", dstDisk)
            blockJobFlags = libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT
            ...
        else:
            self.log.debug("Stopping the disk replication remaining on the "
                           "source drive: %s", dstDisk)
            blockJobFlags = 0
        ...
        try:
            # Stopping the replication
            self._dom.blockJobAbort(drive.name, blockJobFlags)
        except Exception:
            self.log.exception("Unable to stop the replication for"
                               " the drive: %s", drive.name)
        ...
    def _startDriveReplication(self, drive):
        destxml = drive.getReplicaXML().toprettyxml()
        self.log.debug("Replicating drive %s to %s", drive.name, destxml)

        flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |
                 libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT)

        # TODO: Remove fallback when using libvirt >= 1.2.9.
        try:
            self._dom.blockCopy(drive.name, destxml, flags=flags)
        except libvirt.libvirtError as e:
            if e.get_error_code() != libvirt.VIR_ERR_NO_SUPPORT:
                raise

            self.log.warning("blockCopy not supported, using blockRebase")

            base = drive.diskReplicate["path"]
            self.log.debug("Replicating drive %s to %s", drive.name, base)

            flags = (libvirt.VIR_DOMAIN_BLOCK_REBASE_COPY |
                     libvirt.VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT |
                     libvirt.VIR_DOMAIN_BLOCK_REBASE_SHALLOW)

            if drive.diskReplicate["diskType"] == DISK_TYPE.BLOCK:
                flags |= libvirt.VIR_DOMAIN_BLOCK_REBASE_COPY_DEV

            self._dom.blockRebase(drive.name, base, flags=flags)