
Hi *,

I’m a bit frustrated, so please excuse any harshness in this mail.

Whose idea was it to place qcow on logical volumes anyway?

I was shrinking a hard disc: first the filesystems inside the VM, then the partitions inside the VM, then the LV… then I wanted to convert the LV to a compressed qcow2 file for transport, and it told me that the source is corrupted. Huh?

I had already wondered why I was unable to inspect the LV on the host the usual way (kpartx -v -a /dev/VG/LV after finding out, with “virsh --readonly -c qemu:///system domblklist VM_NAME”, which LV is the right one).

Turns out that ovirt stores qcow on LVs instead of raw images ☹

Well, vgcfgrestore to my rescue:
- vgcfgrestore -l VG_NAME
- vgcfgrestore -f /etc/… VG_NAME

The image was still marked as corrupted, but exported fine. I could not write it back to the LV as preallocated, which seems to be what ovirt does, because qemu-img doesn’t wish to do that when the target is a special device (not a regular file). Meh.

Does ovirt handle raw images on LV, and if so, how can we enable this for new VMs? If not, whyever the hell not? And whose “great” idea was this anyway?

Thanks in advance,
//mirabilos

On March 4, 2020 6:05:14 PM GMT+02:00, Thorsten Glaser <t.glaser@tarent.de> wrote:
Hi *,
I’m a bit frustrated, so please excuse any harshness in this mail.
Whose idea was it to place qcow on logical volumes anyway?
I was shrinking a hard disc: first the filesystems inside the VM, then the partitions inside the VM, then the LV… then I wanted to convert the LV to a compressed qcow2 file for transport, and it told me that the source is corrupted. Huh?
I had already wondered why I was unable to inspect the LV on the host the usual way (kpartx -v -a /dev/VG/LV after finding out, with “virsh --readonly -c qemu:///system domblklist VM_NAME”, which LV is the right one).
Turns out that ovirt stores qcow on LVs instead of raw images ☹
Well, vgcfgrestore to my rescue:
- vgcfgrestore -l VG_NAME
- vgcfgrestore -f /etc/… VG_NAME
The image was still marked as corrupted, but exported fine. I could not write it back to the LV as preallocated, which seems to be what ovirt does, because qemu-img doesn’t wish to do that when the target is a special device (not a regular file). Meh.
Does ovirt handle raw images on LV, and if so, how can we enable this for new VMs? If not, whyever the hell not? And whose “great” idea was this anyway?
Thanks in advance, //mirabilos
Hey Thorsten,

That was harsh! I know a lot of Germans who want to be in control, yet that is not the case here.

What are you actually trying to do? Are you trying to sparsify your VM's disks? Are you sure that this approach is the correct one? I always thought that a storage migration always sparsifies the VM's disk. Maybe I'm wrong... who knows.

Anyway, if you have recommendations, this is the place, but be more diplomatic.

Best Regards,
Strahil Nikolov

Hi all,

calmer now, sorry for the initial mail, but I was under some pressure.

Strahil Nikolov dixit:
What are you actually trying to do ? Are you trying to sparsify your VM's disks ?
No, not at all. The thing I really needed to do was to prepare a VM on an oVirt system and then export it as a qcow2 file for uploading to a “cloud” hoster. We originally created the hard disc larger than what turned out to be permitted by the cloud hoster, so I had to shrink it.
Are you sure that this approach is the correct one?
I’m very experienced with emulation and virtualisation, and for regular management this approach is correct; I have done so with no problems for years, in a libvirt/virt-manager environment, with various storage backends.

Gianluca Cecchi dixit:
Anyway the base idea is to provide thin provisioning disks when you have block storage (SAN, iSCSI)
But when I have LVM, I reserve an LV with enough space for the virtual HDD, and that will reserve the space entirely ahead of use. Thin provisioning makes zero sense on LVM, and “raw” is massively faster with less overhead, so it is the natural format to use with LVs as backing store.
I don't want to comment on the method, but in the order I would follow:
- filesystem
- LV
- PV (supposing your PV is on top of a partition, as you seem to write)
- partition
Huh? No. The setup is pretty standard:
- hardware server with several physical HDDs
- then either
  * RAID over those HDDs, and the RAID device is the (only) PV, or
  * one partition on each of these HDDs is a PV
- one volume group
- in which one LV per virtual HDD (that is, one or more per VM)

Everything else happens inside the VM. A standard VM would get one LV assigned as its only virtual HDD (IDE, SCSI or VirtIO) and then have a partition table on that and guest partitions inside, such as /boot, / and swap on Linux or boot and C: on Windows.
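Roughly, in commands (device and volume names made up for illustration):

  pvcreate /dev/md0                         # the RAID device (or one partition per HDD)
  vgcreate vg_guests /dev/md0               # one volume group
  lvcreate -L 100G -n vm1_disk0 vg_guests   # one LV per virtual HDD
  # the guest then sees /dev/vg_guests/vm1_disk0 as its whole disc
  # and puts its own partition table on it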
Difficult in general with the method above to compute exact sizes to avoid corruption.
Trivial if the LV uses the raw format for the guest HDD. (OK, trivial for me, who has been known to use a hex editor to partition a HDD when no fdisk is near, and who has written his own boot managers.)
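With raw, checking boils down to comparing two numbers, e.g.:

  qemu-img info /dev/VG/LV          # virtual size of the guest disc
  blockdev --getsize64 /dev/VG/LV   # size of the LV in bytes; for raw the two match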
It turned out in the past (and I was one of the impacted guys), that if inside a VM you created PVs on whole virtual disks, this LVM structure was somehow exposed to the underlying LVM structure of the host, with nasty impacts in some activities.
Yes, but this didn’t happen here (guest was Windows anyway), and you can do that as long as the VG names differ. (You’ll want to NOT have os-prober installed on the host, though.) No LVM was used in the guest.
BTW: as you can see from the documentation link I sent in the previous reply, you get raw images on LVM (so when you have block based underlying storage domain) when you configure the virtual disk as preallocated.
Interesting, because that doesn’t appear to be what has happened here. I did not initially create the VM, so I cannot know 100%, but the size of the LV was precisely a little more than the size of the guest HDD, and qemu-img info reports that the LV contains a qcow2 in “preallocated” mode.

(Incidentally, I was unable to replicate that: when using qemu-img to convert *to* qcow2 and telling it to use an LV as destination, it refuses to preallocate because it’s a special device, not a regular file, hence my surprise at how oVirt even did it.)

Nir Soffer dixit:
On Wed, Mar 4, 2020 at 6:13 PM Thorsten Glaser <t.glaser@tarent.de> wrote: ...
I was shrinking a hard disc: first the filesystems inside the VM, then the partitions inside the VM, then the LV
This is the point where you probably corrupted your image.
Yes, I know — but only because I was expecting the LV to contain the guest HDD in “raw” format, because that’s literally the only thing that makes sense for LVs.
oVirt does not support shrinking existing disks. If you want to do this you must know what you are doing.
As I said above, I know what I’m doing and have done this in various environments over the years without any problems; what caused this is that oVirt did not meet common industry expectations.
… then I wanted to convert the LV to a compressed qcow2 file for transport, and it told me that the source is corrupted. Huh?
You corrupted it by shrinking the LV without checking the end of the image.
Next time try:
$ qemu-img check /dev/vg/lv
...
Image end offset: 123456789
You must not shrink the LV to less than the image end offset.
No, I would not have needed to shrink the LV at all in the first place; I would have needed to use qemu-img resize --shrink. Because my next step was qemu-img convert -p -c /dev/… foo.qcow, the size of the LV did not matter, only the size of the guest image; for “raw” format that *is* the size of the LV (or of the file, if one uses files), for qcow2 it isn’t.
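In other words, something along these lines would have been the safe order (the target size is made up; resize needs --shrink to make an image smaller):

  # shrink the virtual disc inside the qcow2 (after shrinking filesystems and partitions in the guest)
  qemu-img resize -f qcow2 --shrink /dev/VG/LV 50G
  # then export as a compressed qcow2 file for transport; the LV size is untouched
  qemu-img convert -p -c -f qcow2 -O qcow2 /dev/VG/LV foo.qcow2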
I had already wondered why I was unable to inspect the LV on the host the usual way (kpartx -v -a /dev/VG/LV after finding out, with “virsh --readonly -c qemu:///system domblklist VM_NAME”, which LV is the right one).
By the way, there is qemu-nbd, which can “mount” a qcow image; I found that out later and mention it here in case someone else needs it, and for the list archives.
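For completeness, the qemu-nbd route looks roughly like this (device name and format are examples):

  modprobe nbd max_part=8
  qemu-nbd --read-only -f qcow2 --connect=/dev/nbd0 /dev/VG/LV
  # partitions then appear as /dev/nbd0p1, /dev/nbd0p2, … and can be inspected or mounted
  qemu-nbd --disconnect /dev/nbd0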
Turns out that ovirt stores qcow on LVs instead of raw images ☹
I think this is documented. Did you read the storage admin guide before playing with the underlying logical volumes?
Incidentally I haven’t, because I was “thrown” onto oVirt, and I was extrapolating from preexisting libvirt knowledge and industry standards. I haven’t had enough time to get to know oVirt well enough (and must confess I dislike it enough not to do so if I have the choice; it’s very over-complicated, graphical, slow and complex).
Well, vgcfgrestore to my rescue:
- vgcfgrestore -l VG_NAME
- vgcfgrestore -f /etc/… VG_NAME
This may be too late if another disk is using segments you removed from the original LV, but it seems you were lucky this time.
That cannot happen, because for that someone would have needed to create a new LV or resize an existing one on that VG in the meantime, which, thankfully, nobody could have done… because I was the only one working on that system at that time. On a busier system, yes, that would’ve been a problem (and would have cost tons more time to fix up, which, perhaps, explains my unhappiness).
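For the archives, the recovery boiled down to something like this (the archive file name below is invented; use whichever one vgcfgrestore -l lists from before the bad lvreduce):

  vgcfgrestore -l VG_NAME
  # file name below is an example; pick one shown by the command above
  vgcfgrestore -f /etc/lvm/archive/VG_NAME_00042.vg VG_NAME
  # refresh (or deactivate/reactivate) the LV so the kernel sees the restored size
  lvchange --refresh VG_NAME/LV_NAME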
You cannot change image format for existing disk.
That wasn’t what I tried; I only wanted to make the disc “not corrupted” again, and “qemu-img check” ran into errors, so I copied back the conversion target.
But you can delete the VM disk, upload the modified disk (e.g. via the UI or SDK) and attach the disk to the VM.
Or you can create a new empty preallocated disk, copy the image directly to the disk using qemu-img, and then attach the disk to the VM.
Does that mean oVirt _can_ handle raw guest discs?
qemu-img convert works with block devices. You can enable DEBUG log level in vdsm to check how vdsm runs qemu-img.
Ah okay. Will do if/when I’m curious enough, thanks.
Does ovirt handle raw images on LV, and if so, how can we enable this for new VMs? If not, whyever the hell not? And whose “great” idea was this anyway?
oVirt supports raw format of course, and this is the default format for disks on iSCSI/FC storage domain.
But not on LVM? Whyever?
You probably chose "thin" when you created the disk. This means qcow2 format.
As I said, I didn’t initially create it, but the LV was certainly created at guest size plus qcow2 overhead, and the qcow2 itself had the “preallocated” attribute.

Looking at the lvcreate manpage, thin LVs exist, but they need a thin pool, and while “lvs” says there are thin volumes on this host…

  LV     VG   Attr       LSize  Pool   Origin
  pool00 onn  twi-aotz--  3.53t
  root   onn  Vri---tz-k <3.51t pool00
  […]

… the VMs are not on it, and not thin:

  […]
  26e50a25-a91b-498b-9d1b-52552e03c935 8e3e2c99-b806-452c-bfb7-1542ebabade4 -wi-ao---- 500.00g
  5e3d9fca-02ac-4d69-9f1b-dde499097e40 8e3e2c99-b806-452c-bfb7-1542ebabade4 -wi------- 160.00g
  […]

bye,
//mirabilos

On Wed, Mar 4, 2020 at 5:12 PM Thorsten Glaser <t.glaser@tarent.de> wrote:
Hi *,
I’m a bit frustrated, so please excuse any harshness in this mail.
I try... ;-)
Whose idea was it to place qcow on logical volumes anyway?
Not mine: I'm a final user and sometimes a contributor of ideas/solutions... and problems...

Anyway, the base idea is to provide thin-provisioned disks when you have block storage (SAN, iSCSI). The alternative would have been to implement a cluster file system on top of the SAN/iSCSI LUNs (such as VMFS is in vSphere or OCFS2 in Oracle Virtualization). But I think none of the existing solutions (e.g. GFS) was considered (and in my opinion indeed is not) robust and fast enough to manage a workload with many hypervisors (so distributed consumers of the cluster file system's files) and many users (VMs) on each one of the hypervisors.

I think you could read this:
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/htm...

"If the virtual disk is thinly provisioned, a 1 GB logical volume is created. The logical volume is continuously monitored by the host on which the virtual machine is running. As soon as the usage nears a threshold the host notifies the SPM, and the SPM extends the logical volume by 1 GB. The host is responsible for resuming the virtual machine after the logical volume has been extended. If the virtual machine goes into a paused state it means that the SPM could not extend the disk in time. This occurs if the SPM is too busy or if there is not enough storage space."

Note that you can modify parameters customizing the cited "threshold" and also the size of each extension (default 1 GB) when needed.
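If I remember correctly, those parameters live in vdsm's configuration on the hosts, roughly like the snippet below; the option names and defaults here are from memory, so treat them as an assumption and verify against your vdsm.conf documentation before relying on them:

  # /etc/vdsm/vdsm.conf (names and defaults from memory, treat as an assumption)
  [irs]
  # extend when a thin volume is this percent full
  volume_utilization_percent = 50
  # size of each extension step in MiB (1 GiB by default)
  volume_utilization_chunk_mb = 1024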
I was shrinking a hard disc: first the filesystems inside the VM, then the partitions inside the VM, then the LV… then I wanted to convert the LV to a compressed qcow2 file for transport, and it told me that the source is corrupted. Huh?
I don't want to comment on the method, but in the order I would follow:
- filesystem
- LV
- PV (supposing your PV is on top of a partition, as you seem to write)
- partition

It is difficult in general, with the method above, to compute exact sizes to avoid corruption. In general you have to be conservative... at the cost of eventually losing some MBs. I have done something similar in the past (stopping at the level of the LV, because I needed space for other LVs on the same VG, so no PV and partition resize was involved).
I had already wondered why I was unable to inspect the LV on the host the usual way (kpartx -v -a /dev/VG/LV after finding out, with “virsh --readonly -c qemu:///system domblklist VM_NAME”, which LV is the right one).
It turned out in the past (and I was one of the impacted guys) that if you created PVs on whole virtual disks inside a VM, this LVM structure was somehow exposed to the underlying LVM structure of the host, with nasty impacts on some activities. In my case the impacts were on live storage migration and on deleting a disk of a VM.

At that time (beginning of 2018) Red Hat Support was very helpful (it was an RHV environment), in particular Olimp Bockowski. This resulted in some bugzillas and solutions, some of them:

"RHV: Hosts boot with Guest LVs activated"
https://access.redhat.com/solutions/2662261
https://bugzilla.redhat.com/show_bug.cgi?id=1450114
https://bugzilla.redhat.com/show_bug.cgi?id=1449968
https://bugzilla.redhat.com/show_bug.cgi?id=1202595

There is also a filter tool available:
https://bugzilla.redhat.com/show_bug.cgi?id=1522926

So, based on the opened bugzillas and final users' problems, it was decided, correctly in my opinion, to hide all the information apart from what is necessary. For example, on a plain CentOS host in 4.3.8 I have:

[root@ov200 ~]# lvs
  LV   VG Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root cl -wi-ao---- <119.12g
  swap cl -wi-ao----   16.00g
[root@ov200 ~]#

If you want to display the information and avoid the predefined LVM filters on hypervisors (use only in case of problems or debugging!), you can bypass the configuration using the standard "--config" option. This switch was very useful when debugging problems with VM disks and gives you the whole real LVM structure, also with the tags used by oVirt:

[root@ov200 ~]# lvs --config 'global { use_lvmetad=0 } devices { filter = [ "a|.*/|" ] } ' -o +tags
  LV                                   VG                                   Attr       LSize    Pool Origin Data% Meta% Move Log Cpy%Sync Convert LV Tags
  root                                 cl                                   -wi-ao---- <119.12g
  swap                                 cl                                   -wi-ao----   16.00g
  01e20442-21e2-4237-abdb-6e919bb1f522 fa33df49-b09d-4f86-9719-ede649542c21 -wi-------   20.00g  IU_24d917f3-0858-45a0-a7a4-eba8b28b2a58,MD_47,PU_00000000-0000-0000-0000-000000000000
  ...

But all of the above is in the past. If you have a 4.3 environment you don't have to worry about it.

HIH a little understanding,
Gianluca

On Wed, Mar 4, 2020 at 5:12 PM Thorsten Glaser <t.glaser@tarent.de> wrote:
Does ovirt handle raw images on LV, and if so, how can we enable this for new VMs? If not, whyever the hell not? And whose “great” idea was this anyway?
BTW: as you can see from the documentation link I sent in the previous reply, you get raw images on LVM (so when you have a block-based underlying storage domain) when you configure the virtual disk as preallocated.

Gianluca

On Wed, Mar 4, 2020 at 6:13 PM Thorsten Glaser <t.glaser@tarent.de> wrote: ...
I was shrinking a hard disc: first the filesystems inside the VM, then the partitions inside the VM, then the LV
This is the point where you probably corrupted your image. oVirt does not support shrinking existing disks. If you want to do this you must know what you are doing.
… then I wanted to convert the LV to a compressed qcow2 file for transport, and it told me that the source is corrupted. Huh?
You corrupted it by shrinking the LV without checking the end of the image. Next time try:

$ qemu-img check /dev/vg/lv
...
Image end offset: 123456789

You must not shrink the LV to less than the image end offset.
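To see how far you can go, compare that offset with the current LV size, for example:

  $ lvs --units b -o lv_name,lv_size VG_NAME/LV_NAME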
I had already wondered why I was unable to inspect the LV on the host the usual way (kpartx -v -a /dev/VG/LV after finding out, with “virsh --readonly -c qemu:///system domblklist VM_NAME”, which LV is the right one).
Turns out that ovirt stores qcow on LVs instead of raw images ☹
I think this is documented. Did you read the storage admin guide before playing with the underlying logical volumes?
Well, vgcfgrestore to my rescue:
- vgcfgrestore -l VG_NAME
- vgcfgrestore -f /etc/… VG_NAME
This may be too late if another disk is using segments you removed from the original LV, but it seems you were lucky this time.
The image was still marked as corrupted, but exported fine. I could not write it back to the LV as preallocated,
You cannot change image format for existing disk. But you can delete the VM disk, upload the modified disk (e.g. via the UI or SDK) and attach the disk to the VM. Or you can create a new empty preallocated disk, copy the image directly to the disk using qemu-img, and then attach the disk to the VM.
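Roughly, the direct copy would be something like this once the new disk's LV path is known (-n makes qemu-img write into the existing block device instead of trying to create the target; the paths here are placeholders):

  qemu-img convert -p -n -f qcow2 -O raw modified.qcow2 /dev/VG/new_disk_lv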
which seems to be what ovirt does, because qemu-img doesn’t wish to do that when the target is a special device (not a regular file). Meh.
qemu-img convert works with block devices. You can enable DEBUG log level in vdsm to check how vdsm runs qemu-img.
Does ovirt handle raw images on LV, and if so, how can we enable this for new VMs? If not, whyever the hell not? And whose “great” idea was this anyway?
oVirt supports raw format of course, and this is the default format for disks on an iSCSI/FC storage domain. You probably chose "thin" when you created the disk. This means qcow2 format.

Nir
participants (4):
- Gianluca Cecchi
- Nir Soffer
- Strahil Nikolov
- Thorsten Glaser