We're seeing this in RHEV 3.5 with snapshot management on VMs with multiple
disks. It would be awesome to have an "fsck"-type script that could be run
daily and report any problems with the snapshot disks.
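
A rough, read-only sketch of such a check, built on the queries Markus
posted below (the database name and psql invocation are assumptions,
adjust to your setup), which could be dropped into cron on the engine
host:

#!/bin/bash
# Report snapshot disks the engine has marked illegal (imagestatus = 4)
# and snapshots stuck in a non-OK state. Read-only, changes nothing.
su - postgres -c "psql engine -t -c \"select image_group_id, imagestatus from images where imagestatus = 4;\""
su - postgres -c "psql engine -t -c \"select description, status from snapshots where status <> 'OK';\""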
On Mon, Apr 18, 2016 at 10:59 PM, Clint Boggio <clint(a)theboggios.com> wrote:
Markus, thank you so much for the information. I'll be focusing on
resolution of this problem this week and I'll keep you in the loop.
On Apr 18, 2016, at 7:39 AM, Markus Stockhausen <stockhausen(a)collogia.de> wrote:
>> From: users-bounces(a)ovirt.org [users-bounces(a)ovirt.org] on behalf of Clint Boggio [clint(a)theboggios.com]
>> Sent: Monday, 18 April 2016 14:16
>> To: users(a)ovirt.org
>> Subject: [ovirt-users] Disks Illegal State
>>
>> oVirt 3.6, 4-node cluster with a dedicated engine. The main storage
>> domain is iSCSI; the ISO and export domains are NFS.
>>
>> Several of my VM snapshot disks show as being in an "illegal state".
>> The system will not allow me to manipulate the snapshots in any way,
>> clone the active system, or create a new snapshot.
>>
>> In the logs I see that the system complains about not being able to
>> "get volume size for xxx", and also that the system appears to believe
>> the image is "locked" and currently in the middle of a snapshot operation.
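>>
>> (For reference, those messages are in /var/log/ovirt-engine/engine.log
>> on the engine and /var/log/vdsm/vdsm.log on the hosts; a quick, rough
>> grep along these lines turns them up:)
>>
>> # grep -iE "illegal|volume size|locked" /var/log/ovirt-engine/engine.log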
>>
>> Of the VMs with this status, one rebooted and was lost due to "cannot
>> get volume size for domain xxx".
>>
>> I fear that in this current condition, should any of the other
>> machines reboot, they too will be lost.
>>
>> How can I troubleshoot this problem further, and hopefully alleviate
>> the condition?
>>
>> Thank you for your help.
>
> Hi Clint,
>
> for us the problem always boils down to the following steps. It might
> be simpler in our case because we use NFS for all of our domains and
> have direct access to the image files.
>
> 1) Check whether the snapshot disks are currently in use. Capture the
> qemu command line with "ps -ef" on the nodes; there you can see which
> images qemu was started with. For each of those files check the
> backing chain:
>
> # qemu-img info /rhev/.../bbd05dd8-c3bf-4d15-9317-73040e04abae
> image: bbd05dd8-c3bf-4d15-9317-73040e04abae
> file format: qcow2
> virtual size: 50G (53687091200 bytes)
> disk size: 133M
> cluster_size: 65536
> backing file: ../f8ebfb39-2ac6-4b87-b193-4204d1854edc/595b95f4-ce1a-4298-bd27-3f6745ae4e4c
> backing file format: raw
> Format specific information:
> compat: 0.10
>
> # qemu-img info .../595b95f4-ce1a-4298-bd27-3f6745ae4e4c (see above)
> ...
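>
> If your qemu-img is recent enough it can walk the whole chain in one
> call, and the image paths can be pulled straight out of the process
> list. A rough, untested sketch (the grep pattern is an assumption,
> adjust it to your storage paths):
>
> # ps -ef | grep qemu | grep -o 'file=/rhev/[^, ]*'
> # qemu-img info --backing-chain /rhev/.../bbd05dd8-c3bf-4d15-9317-73040e04abae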
>
> I don't know how you can accomplish this on iSCSI (with LVM-based
> images inside, IIRC). We usually follow the backing chain and test
> whether all the files exist and are linked correctly, and especially
> whether everything matches the oVirt GUI. I guess this is the most
> important part for you.
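>
> (Untested sketch for block domains: oVirt keeps each volume as an LV
> in a VG named after the storage domain UUID, with the parent volume
> recorded in the LV tags, so something along these lines might let you
> inspect the chain there as well:)
>
> # lvs -o vg_name,lv_name,lv_tags <storage-domain-uuid>
> # lvchange -ay <storage-domain-uuid>/<volume-uuid>
> # qemu-img info /dev/<storage-domain-uuid>/<volume-uuid>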
>
> 2) In most of our cases everything is fine and only the oVirt database
> is wrong, so we fix it ourselves at our own risk. Given your situation
> I do not recommend that for you; it is included here for documentation
> purposes only.
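>
> (If you do try this, take a backup of the engine first. oVirt ships
> engine-backup for exactly that; the file and log names below are
> placeholders:)
>
> engine# engine-backup --mode=backup --file=engine-backup.tar.bz2 --log=engine-backup.log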
>
> engine# su - postgres
>> psql engine postgres
>
>> select image_group_id, imagestatus from images where imagestatus = 4;
>> ... list of illegal images
>> update images set imagestatus = 1 where imagestatus = 4 and <other criteria>;
>> commit;
>
>> select description, status from snapshots where status <> 'OK';
>> ... list of locked snapshots
>> update snapshots set status = 'OK' where status <> 'OK' and <other criteria>;
>> commit;
>
>> \q
>
> Restart the engine and everything should be in sync again.
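>
> (On the engine host that is typically:)
>
> engine# service ovirt-engine restart    # or: systemctl restart ovirt-engine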
>
> Best regards.
>
> Markus