[ovirt-users] [Qemu-block] qcow2 images corruption

Nicolas Ecarnot nicolas at ecarnot.net
Wed Feb 14 14:51:43 UTC 2018



https://framadrop.org/r/Lvvr392QZo#/wOeYUUlHQAtkUw1E+x2YdqTqq21Pbic6OPBIH0TjZE=

On 14/02/2018 at 00:01, John Snow wrote:
> 
> 
> On 02/13/2018 04:41 AM, Kevin Wolf wrote:
>> On 07.02.2018 at 18:06, Nicolas Ecarnot wrote:
>>> TL;DR: qcow2 images keep getting corrupted. Any workaround?
>>
>> Not without knowing the cause.
>>
>> The first thing to make sure is that the image isn't touched by a second
>> process while QEMU is running a VM. The classic one is using 'qemu-img
>> snapshot' on the image of a running VM, which is instant corruption (and
>> newer QEMU versions have locking in place to prevent this), but we have
>> seen more absurd cases of things outside QEMU tampering with the image
>> when we were investigating previous corruption reports.
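>>
>> (As an illustration of what that locking catches -- the paths here are just
>> examples and the exact message depends on the QEMU version -- a recent
>> qemu-img will refuse to open an image that a running VM holds, and
>> read-only inspection of a live image needs the -U/--force-share flag:)
>>
>> $ qemu-img snapshot -c snap1 /path/to/vm.qcow2
>> qemu-img: Could not open '/path/to/vm.qcow2': Failed to get "write" lock
>> Is another process using the image [/path/to/vm.qcow2]?
>>
>> $ qemu-img info -U /path/to/vm.qcow2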
>>
>> This covers the majority of all reports; we haven't had a real
>> corruption caused by a QEMU bug in ages.
>>
>>> After having found (https://access.redhat.com/solutions/1173623) the right
>>> logical volume hosting the qcow2 image, I can run qemu-img check on it.
>>> - On 80% of my VMs, I find no errors.
>>> - On 15% of them, I find leaked cluster errors that I can correct using
>>> "qemu-img check -r all".
>>> - On 5% of them, I find leaked cluster errors and further fatal errors,
>>> which cannot be corrected with qemu-img.
>>> In rare cases, qemu-img can correct them but destroys large parts of the
>>> image (which becomes unusable); in other cases it cannot correct them at all.
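>>>
>>> For reference, the commands involved look roughly like this (the LV path is
>>> just a placeholder for the device found via the procedure above):
>>>
>>> # activate the LV backing the disk, then run a read-only check:
>>> lvchange -ay /dev/<vg_name>/<lv_name>
>>> qemu-img check /dev/<vg_name>/<lv_name>
>>> # attempt repairs, either leaks only or everything qemu-img can fix:
>>> qemu-img check -r leaks /dev/<vg_name>/<lv_name>
>>> qemu-img check -r all /dev/<vg_name>/<lv_name>
>>> lvchange -an /dev/<vg_name>/<lv_name>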
>>
>> It would be good if you could make the 'qemu-img check' output available
>> somewhere.
>>
>> It would be even better if we could have a look at the respective image.
>> I seem to remember that John (CCed) had a few scripts to analyse
>> corrupted qcow2 images, maybe we would be able to see something there.
>>
> 
> Hi! I did write a pretty simplistic tool for trying to tell the shape of
> a corruption at a glance. It seems to work pretty similarly to the other
> tool you already found, but it won't hurt anything to run it:
> 
> https://github.com/jnsnow/qcheck
> 
> (Actually, that other tool looks like it has an awful lot of options.
> I'll have to check it out.)
> 
> It can print a really upsetting amount of data (especially for very
> corrupt images), but in the default case, the simple setting should do
> the trick just fine.
> 
> You could always put the output from this tool in a pastebin too; it
> might help me visualize the problem a bit more -- I find seeing the
> exact offsets and locations of all the various tables and things
> to be pretty helpful.
> 
> You can also always use the "deluge" option and compress it if you want,
> just don't let it print to your terminal:
> 
> jsnow@probe (dev) ~/s/qcheck> ./qcheck -xd
> /home/bos/jsnow/src/qemu/bin/git/install_test_f26.qcow2 > deluge.log;
> and ls -sh deluge.log
> 4.3M deluge.log
> 
> but it compresses down very well:
> 
> jsnow@probe (dev) ~/s/qcheck> 7z a -t7z -m0=ppmd deluge.ppmd.7z deluge.log
> jsnow@probe (dev) ~/s/qcheck> ls -s deluge.ppmd.7z
> 316 deluge.ppmd.7z
> 
> So I suppose if you want to send along:
> (1) The basic output without any flags, in a pastebin
> (2) The zipped deluge output, just in case
> 
> and I will try my hand at guessing what went wrong.
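> 
> Concretely, that could look something like this (the image path is just an
> example):
> 
> ./qcheck /path/to/image.qcow2 > basic.log
> ./qcheck -xd /path/to/image.qcow2 > deluge.log
> 7z a -t7z -m0=ppmd deluge.ppmd.7z deluge.log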
> 
> 
> (Also, maybe my tool will totally choke on your image, who knows. It
> hasn't received an overwhelming amount of testing apart from when I go
> to use it personally and inevitably wind up displeased with how it
> handles certain situations, so ...)
> 
>>> What I read that is similar to my case is:
>>> - usage of qcow2
>>> - heavy disk I/O
>>> - using the virtio-blk driver
>>>
>>> In the Proxmox thread, they tend to say that using virtio-scsi is the
>>> solution. I have asked this question of the oVirt experts
>>> (https://lists.ovirt.org/pipermail/users/2018-February/086753.html), but it's
>>> not clear the driver is to blame.
>>
>> This seems very unlikely. The corruption you're seeing is in the qcow2
>> metadata, not only in the guest data. If anything, virtio-scsi exercises
>> more qcow2 code paths than virtio-blk, so any potential bug that affects
>> virtio-blk should also affect virtio-scsi, but not the other way around.
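>>
>> (Concretely, the difference is only in how the disk is attached to the
>> guest; the same qcow2 block driver sits underneath in both cases. On a
>> plain QEMU command line that is roughly -- illustrative only, not the
>> exact oVirt/libvirt invocation:)
>>
>> # virtio-blk:
>> -drive file=/dev/vg/lv,format=qcow2,if=none,id=drive0 \
>> -device virtio-blk-pci,drive=drive0
>>
>> # virtio-scsi:
>> -device virtio-scsi-pci,id=scsi0 \
>> -drive file=/dev/vg/lv,format=qcow2,if=none,id=drive0 \
>> -device scsi-hd,drive=drive0,bus=scsi0.0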
>>
>>> I agree with the answer Yaniv Kaul gave me, saying I have to properly
>>> report the issue, so I'd like to know what specific information I can
>>> give you now.
>>
>> To be honest, debugging corruption after the fact is pretty hard. We'd
>> need the 'qemu-img check' output and ideally the image to do anything,
>> but I can't promise that anything would come out of this.
>>
>> Best would be a reproducer, or at least some operation that you can link
>> to the appearance of the corruption. Then we could take a more targeted
>> look at the respective code.
>>
>>> As you can imagine, all this setup is in production, and for most of the
>>> VMs I cannot "play" with them. Moreover, we have launched a campaign of
>>> nightly stopping every VM, running qemu-img check on them one by one, then
>>> booting them again. So it might take some time before I find another
>>> corrupted image (which I'll carefully store for debugging).
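>>>
>>> (Roughly, that nightly pass amounts to something like the following, run
>>> only while each VM is shut down; the list file and paths are site-specific
>>> examples:)
>>>
>>> while read lv; do                      # one LV path per line
>>>     lvchange -ay "$lv"
>>>     qemu-img check "$lv" >> /var/log/qcow2-check.log 2>&1
>>>     lvchange -an "$lv"
>>> done < /root/vm-disk-lvs.txt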
>>>
>>> Other information: we very rarely do snapshots, but I can imagine that
>>> automated migrations of VMs could trigger similar behavior on qcow2
>>> images.
>>
>> To my knowledge, oVirt only uses external snapshots and creates them
>> with QMP. This should be perfectly safe because from the perspective of
>> the qcow2 image being snapshotted, it just means that it gets no new
>> write requests.
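>>
>> (For illustration, such an external snapshot is just a new qcow2 overlay
>> whose backing file is the old image, so the old image only stops receiving
>> writes; the offline equivalent, with example paths, would be:)
>>
>> qemu-img create -f qcow2 \
>>     -o backing_file=/path/to/current.qcow2,backing_fmt=qcow2 \
>>     /path/to/new-overlay.qcow2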
>>
>> Migration is something more involved, and if you could relate the
>> problem to migration, that would certainly be something to look into. In
>> that case, it would be important to know more about the setup, e.g. is
>> it migration with shared or non-shared storage?
>>
>>> Last point about the versions we use: yes, it's old; yes, we're planning to
>>> upgrade, but we don't know when.
>>
>> That would be helpful, too. Nothing is more frustrating than debugging a
>> bug in an old version only to find that it's already fixed in the
>> current version (well, except maybe debugging and finding nothing).
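>>
>> (For completeness, the usual way to capture that on a RHEL/oVirt-style host
>> is something like:)
>>
>> rpm -qa | grep -E 'qemu|libvirt|vdsm'
>> qemu-img --version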
>>
>> Kevin
>>
> And, looking at your other email:
> 
> "- In the case of oVirt, we are here allowing tens of hosts to connect
> to the same LUN. This LUN is then managed by a classical LVM setup, but
> I see here no notion of concurrent access management. To date, I still
> haven't understood how was managed these concurrent access to the same
> LUN with no crash."
> 
> I'm hoping someone else on the list can chime in on whether this is safe or
> not -- I'm not really familiar with how oVirt does things, but as long as the
> rest of the stack is sound and nothing else is touching the qcow2 data
> area, we should be OK, I'd hope.
> 
> (Though the last big qcow2 corruption I had to debug wound up being in
> the storage stack and not in QEMU, so I have some prejudices here)
> 
> 
> 
> 
> Anyway, I'll try to help as best I'm able, but no promises.
> 
> --js
> 



-- 
Nicolas ECARNOT

