In my case, Markus, the backing disks are MIA and show only as bright
red broken symbolic links. Using the postgres commands to set them as
OK would be folly, and would likely cause more trouble. If the snapshot
disks are truly gone (and they are), what procedure would I use to inform
the database and set the VMs to a usable status again?
On Mon, 2016-04-18 at 12:39 +0000, Markus Stockhausen wrote:
>
> From: users-bounces(a)ovirt.org [users-bounces(a)ovirt.org] on
> behalf of Clint Boggio [clint(a)theboggios.com]
> Sent: Monday, 18 April 2016 14:16
> To: users(a)ovirt.org
> Subject: [ovirt-users] Disks Illegal State
>
> OVirt 3.6, 4 node cluster with dedicated engine. Main storage
> domain is iscsi, ISO and Export domains are NFS.
>
> Several of my VM snapshot disks show to be in an "illegal state".
> The system will not allow me to manipulate the snapshots in any
> way, nor clone the active system, or create a new snapshot.
>
> In the logs I see that the system complains about not being able to
> "get volume size for xxx", and also that the system appears to
> believe that the image is "locked" and is currently in the snapshot
> process.
>
> Of the VMs with this status, one rebooted and was lost due to
> "cannot get volume size for domain xxx".
>
> I fear that in this current condition, should any of the other
> machines reboot, they too will be lost.
>
> How can I troubleshoot this problem further, and hopefully
> alleviate the condition ?
>
> Thank you for your help.
Hi Clint,

for us the problem always boils down to the following steps. It might be
simpler for us, as we use NFS for all of our domains and have direct
access to the image files.

1) Check if the snapshot disks are currently used. Capture the qemu
command line with a "ps -ef" on the nodes. There you can see what images
qemu is started with. For each of the files check the backing chain:
# qemu-img info /rhev/.../bbd05dd8-c3bf-4d15-9317-73040e04abae
image: bbd05dd8-c3bf-4d15-9317-73040e04abae
file format: qcow2
virtual size: 50G (53687091200 bytes)
disk size: 133M
cluster_size: 65536
backing file: ../f8ebfb39-2ac6-4b87-b193-4204d1854edc/595b95f4-ce1a-4298-bd27-3f6745ae4e4c
backing file format: raw
Format specific information:
compat: 0.10
# qemu-img info .../595b95f4-ce1a-4298-bd27-3f6745ae4e4c (see above)
...
I don't know how you can accomplish this on iSCSI (with LVM-based images
inside, IIRC). We usually follow the backing chain and test whether all
the files exist and are linked correctly, and especially whether
everything matches the oVirt GUI. I guess this is the most important part
for you.
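
The manual chain check in step 1 can be sketched as a small script. This is only a minimal sketch for file-based (NFS) domains, not a tested tool: the function names are made up for illustration, and it assumes a qemu-img that supports "info --output=json" and reports the backing file under the "backing-filename" key. It stops at the first image that does not exist (a broken symbolic link counts as missing).

```python
import json
import os
import subprocess


def qemu_img_info(path):
    """Return the parsed output of `qemu-img info --output=json` for one image."""
    out = subprocess.check_output(["qemu-img", "info", "--output=json", path])
    return json.loads(out)


def walk_backing_chain(path, get_info=qemu_img_info, exists=os.path.exists):
    """Follow backing files from `path` down to the base image.

    Returns a list of (path, present) tuples and stops at the first
    missing file. `get_info` and `exists` are injectable (hypothetical
    hooks) so the walk can be tried without real images.
    """
    chain = []
    seen = set()
    while path is not None:
        path = os.path.normpath(path)
        if path in seen:  # guard against a chain that loops back on itself
            break
        seen.add(path)
        if not exists(path):  # os.path.exists is False for broken symlinks
            chain.append((path, False))
            break
        chain.append((path, True))
        backing = get_info(path).get("backing-filename")
        if backing is None:
            path = None  # reached the base image
        else:
            # A relative backing path is relative to the overlay's directory.
            path = os.path.join(os.path.dirname(path), backing)
    return chain
```

Calling walk_backing_chain() with the image path taken from the qemu command line returns one entry per image in the chain; any entry marked False is a missing or unlinked file.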
2) In most of our cases everything is fine and only the oVirt database is
wrong, so we fix it at our own risk. Because of your explanation I do not
recommend that for you. It is just for documentation purposes.
engine# su - postgres

psql engine postgres

select image_group_id,imagestatus from images where imagestatus = 4;
... list of illegal images
update images set imagestatus = 1 where imagestatus = 4 and <other criteria>;
commit;

select description,status from snapshots where status <> 'OK';
... list of locked snapshots
update snapshots set status = 'OK' where status <> 'OK' and <other criteria>;
commit;

\q
Restart the engine and everything should be in sync again.
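
To illustrate the "<other criteria>" idea, here is a minimal sketch of narrowing the update to specific image groups and refusing to commit if it touches a different number of rows than the select showed. It runs against an in-memory SQLite table as a stand-in; the real engine database is PostgreSQL, and only the table and column names (images, image_group_id, imagestatus) are taken from the statements above. The function name and the expected_rows check are assumptions for illustration. It is for documentation only and carries the same "at our own risk" caveat.

```python
import sqlite3

ILLEGAL, OK = 4, 1  # imagestatus values used in the statements above


def reset_illegal_images(conn, image_group_ids, expected_rows):
    """Set imagestatus from ILLEGAL to OK for the given image groups only.

    `expected_rows` is the row count the operator saw in the SELECT.
    If the UPDATE changes a different number of rows, roll back and raise.
    """
    if not image_group_ids:
        return 0
    marks = ",".join("?" for _ in image_group_ids)
    cur = conn.cursor()
    cur.execute(
        "UPDATE images SET imagestatus = ? "
        "WHERE imagestatus = ? AND image_group_id IN (%s)" % marks,
        [OK, ILLEGAL, *image_group_ids],
    )
    if cur.rowcount != expected_rows:
        conn.rollback()  # nothing is changed if the count is a surprise
        raise RuntimeError(
            "expected %d rows, update touched %d; rolled back"
            % (expected_rows, cur.rowcount)
        )
    conn.commit()
    return cur.rowcount
```

The point of the design is that the update is never broader than the list of image groups that were checked by hand, and a mismatch leaves the table untouched.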
Best regards.
Markus