[ovirt-users] Using Microsoft NFS server as storage domain

Nir Soffer nsoffer at redhat.com
Fri Jan 22 20:47:05 UTC 2016


On Fri, Jan 22, 2016 at 5:15 PM, Pavel Gashev <Pax at acronis.com> wrote:
> Nir,
>
>
> On 21/01/16 23:55, "Nir Soffer" <nsoffer at redhat.com> wrote:
>>Live migration starts by creating a snapshot, then copying the disks to the
>>new storage, and then mirroring the active layer so both the old and the new
>>disks are the same. Finally we switch to the new disk, and delete the old disk.
>>
>>So probably the issue is in the mirroring step. This is most likely a
>>qemu issue.
>
> Thank you for the clarification. This gave me the idea to check the consistency of the old disk.
>
> I performed the following test:
> 1. Create a VM on MS NFS
> 2. Initiate live disk migration to another storage
> 3. Catch the source files before oVirt removes them, by creating hard links to them in another directory
> 4. Shut down the VM
> 5. Create another VM and move the caught files to the place where the new disk files are located
> 6. Check consistency of the filesystem in both VMs
>
>
> The source disk is consistent. The destination disk is corrupted.
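
The hard-link trick in step 3 can be sketched in Python roughly as follows. This is only an illustration of the idea, not what was actually run: the function name and the directory layout in demo() are made up, and both directories are assumed to be on the same filesystem (or the same NFS export), since hard links cannot cross filesystems.

```python
import os
import tempfile


def preserve_with_hard_links(src_dir, keep_dir):
    """Hard-link every regular file in src_dir into keep_dir.

    Once a second link exists, removing the original name does not free
    the data, so the files survive deletion of the source.
    """
    os.makedirs(keep_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(keep_dir, name))
            linked.append(name)
    return linked


def demo():
    """Simulate the flow in a temporary directory (hypothetical layout)."""
    with tempfile.TemporaryDirectory() as root:
        image_dir = os.path.join(root, "image")
        keep_dir = os.path.join(root, "keep")
        os.makedirs(image_dir)
        volume = os.path.join(image_dir, "volume")
        with open(volume, "wb") as f:
            f.write(b"disk data")

        linked = preserve_with_hard_links(image_dir, keep_dir)

        # Simulate the source being removed after the migration.
        os.remove(volume)

        with open(os.path.join(keep_dir, "volume"), "rb") as f:
            return linked, f.read()
```

The links have to be created while the source files still exist, that is, before the migration finishes and the source is removed.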
>
>>
>>I'll try to get instructions for this from libvirt developers. If this
>>happens with libvirt alone, this is a libvirt or qemu bug, and there is
>>little we (ovirt) can do about it.
>
>
> I've tried to reproduce the mirroring of the active layer:
>
> 1. Create two thin-provisioned VMs from the same template on different storage domains.
> 2. Start VM1
> 3. virsh blockcopy VM1 vda /rhev/data-center/...path.to.disk.of.VM2.. --wait --verbose --reuse-external --shallow
> 4. virsh blockjob VM1 vda --abort --pivot
> 5. Shut down VM1
> 6. Start VM2. Boot in recovery mode and check the filesystem.
>
> I tried this a dozen times. Everything works fine. No data corruption.
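
To repeat steps 3 and 4 many times, the two virsh calls could be wrapped in a small script along these lines. This is a sketch only: the function names are made up, the destination path must be supplied by the caller (the real path is elided above), and the options are exactly the ones quoted in the steps.

```python
import subprocess


def blockcopy_argv(domain, disk, dest_path):
    """Build the virsh blockcopy command from step 3."""
    return ["virsh", "blockcopy", domain, disk, dest_path,
            "--wait", "--verbose", "--reuse-external", "--shallow"]


def pivot_argv(domain, disk):
    """Build the virsh blockjob command from step 4."""
    return ["virsh", "blockjob", domain, disk, "--abort", "--pivot"]


def mirror_and_pivot(domain, disk, dest_path, run=subprocess.check_call):
    """Run the mirror, then pivot to the destination.

    `run` is injectable so the flow can be dry-run without virsh; by
    default a failing command raises CalledProcessError and the pivot
    is not attempted.
    """
    run(blockcopy_argv(domain, disk, dest_path))
    run(pivot_argv(domain, disk))
```

Passing run=print gives a dry run that only shows the commands.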

If you take the same VM and do a live storage migration in oVirt, is the
file system corrupted after the migration?

What is the guest OS? Did you try with more than one?

>
>
> Ideas?

Thanks for this research!

The next step is to open a bug with the logs I requested in my last
message. Please mark the bug
as urgent.

I'm adding Kevin (from qemu) and Eric (from libvirt), hopefully they
can tell if the virsh flow is
indeed identical to what ovirt does, and what should be the next step
for debugging this.

oVirt uses blockCopy if available (it should have been available everywhere
for some time), and otherwise falls back to blockRebase. Do you see this
warning?

    blockCopy not supported, using blockRebase
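
A quick way to check for this warning is to scan the vdsm log on the host, for example with a small helper like the one below. This is a sketch: the function names are made up, and the log path (/var/log/vdsm/vdsm.log) is an assumption about the host's setup and may differ.

```python
WARNING_TEXT = "blockCopy not supported, using blockRebase"


def find_fallback_warnings(lines):
    """Return the lines that contain the blockRebase fallback warning."""
    return [line.rstrip("\n") for line in lines if WARNING_TEXT in line]


def scan_log(path="/var/log/vdsm/vdsm.log"):
    """Scan a log file; the default path is an assumption, adjust as needed."""
    with open(path, errors="replace") as f:
        return find_fallback_warnings(f)
```

An empty result from scan_log() means the warning was not logged in that file; rotated logs would need to be checked separately.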

For reference, this is the relevant code in ovirt for the mirroring
part. The mirroring starts with diskReplicateStart(), and ends with
diskReplicateFinish(). I removed the parts about managing vdsm state
and left the calls to libvirt.

    def diskReplicateFinish(self, srcDisk, dstDisk):
        ...
        blkJobInfo = self._dom.blockJobInfo(drive.name, 0)
        ...
        if srcDisk != dstDisk:
            self.log.debug("Stopping the disk replication switching to the "
                           "destination drive: %s", dstDisk)
            blockJobFlags = libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT
        ...
        else:
            self.log.debug("Stopping the disk replication remaining on the "
                           "source drive: %s", dstDisk)
            blockJobFlags = 0
        ...
        try:
            # Stopping the replication
            self._dom.blockJobAbort(drive.name, blockJobFlags)
        except Exception:
            self.log.exception("Unable to stop the replication for"
                               " the drive: %s", drive.name)
        ...

    def _startDriveReplication(self, drive):
        destxml = drive.getReplicaXML().toprettyxml()
        self.log.debug("Replicating drive %s to %s", drive.name, destxml)

        flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |
                 libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT)

        # TODO: Remove fallback when using libvirt >= 1.2.9.
        try:
            self._dom.blockCopy(drive.name, destxml, flags=flags)
        except libvirt.libvirtError as e:
            if e.get_error_code() != libvirt.VIR_ERR_NO_SUPPORT:
                raise

            self.log.warning("blockCopy not supported, using blockRebase")

            base = drive.diskReplicate["path"]
            self.log.debug("Replicating drive %s to %s", drive.name, base)

            flags = (libvirt.VIR_DOMAIN_BLOCK_REBASE_COPY |
                     libvirt.VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT |
                     libvirt.VIR_DOMAIN_BLOCK_REBASE_SHALLOW)

            if drive.diskReplicate["diskType"] == DISK_TYPE.BLOCK:
                flags |= libvirt.VIR_DOMAIN_BLOCK_REBASE_COPY_DEV

            self._dom.blockRebase(drive.name, base, flags=flags)
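
To make the control flow easier to follow without a libvirt host, here is a stripped-down model of the two decisions above: which call starts the mirror, and which flags end it. This is not vdsm code. FakeDomain and NoSupportError are hypothetical stand-ins that only record calls, and the pivot flag is passed in as a parameter rather than taken from libvirt.

```python
class NoSupportError(Exception):
    """Stand-in for a libvirtError carrying VIR_ERR_NO_SUPPORT."""


class FakeDomain:
    """Hypothetical stand-in for a libvirt domain; it only records calls."""

    def __init__(self, supports_block_copy):
        self.supports_block_copy = supports_block_copy
        self.calls = []

    def blockCopy(self, disk, destxml):
        if not self.supports_block_copy:
            raise NoSupportError("blockCopy not supported")
        self.calls.append(("blockCopy", disk))

    def blockRebase(self, disk, base):
        self.calls.append(("blockRebase", disk))


def start_replication(dom, disk, destxml, base):
    """Try blockCopy first; fall back to blockRebase only on 'no support'."""
    try:
        dom.blockCopy(disk, destxml)
        return "blockCopy"
    except NoSupportError:
        dom.blockRebase(disk, base)
        return "blockRebase"


def finish_flags(src_disk, dst_disk, pivot_flag):
    """Pivot when the destination differs from the source, else plain abort."""
    return pivot_flag if src_disk != dst_disk else 0
```

Any other error from blockCopy propagates to the caller, as in the original code, so the fallback is only taken when the call is reported as unsupported.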


