Yeah, that looks like the fix Red Hat GSS came up with. Note that it's
only online snapshots we've seen the problem with, never offline, but
YMMV.

What version of oVirt are you running? We're running RHEV 3.5.7 in prod
and test environments but 3.6.5 in dev, and we've not had a recurrence
of this problem in dev since moving to 3.6.x.

CC

On Mon, Jun 13, 2016 at 1:42 AM, gregor <gregor_forum@catrix.at> wrote:

Hi,

I solved my problem. Here are the steps, but be careful: if you don't
know what these commands do and how to restore from backup, don't
follow this:

- ssh to the host
- systemctl stop ovirt-engine
- backup the database with "engine-backup"
- navigate to the image files
- backup the images: sudo -u vdsm rsync -av <uuid> <uuid_backup>
- check which one is the backing file: qemu-img info <file>
- check for damage: qemu-img check <file>
- qemu-img commit <snapshot file>
- rename the <snapshot file> plus its .lease and .meta files so they
  can't be accessed (a sketch of this whole sequence follows below)
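
To make the file-level part concrete, here is a hedged sketch of the
commands above; every name in angle brackets is a placeholder and the
image path depends on your storage domain layout, so adjust before
running anything:

    systemctl stop ovirt-engine
    engine-backup --mode=backup --file=engine.backup --log=engine-backup.log
    cd /rhev/data-center/<sp_uuid>/<sd_uuid>/images
    sudo -u vdsm rsync -av <disk_uuid>/ <disk_uuid>_backup/   # copy the image directory
    cd <disk_uuid>
    qemu-img info <snapshot_file>    # "backing file:" shows the parent volume
    qemu-img check <snapshot_file>   # look for corruption before committing
    qemu-img commit <snapshot_file>  # merge the snapshot into its backing file
    mv <snapshot_file> <snapshot_file>.bak
    mv <snapshot_file>.lease <snapshot_file>.lease.bak
    mv <snapshot_file>.meta <snapshot_file>.meta.bak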

- vmname=srv03
- db=engine
- sudo -u postgres psql $db -c "SELECT b.disk_alias, s.description,
  s.snapshot_id, i.creation_date, s.status, i.imagestatus, i.size,
  i.image_group_id, i.vm_snapshot_id, i.image_guid, i.parentid, i.active
  FROM images AS i JOIN snapshots AS s ON (i.vm_snapshot_id =
  s.snapshot_id) LEFT JOIN vm_static AS v ON (s.vm_id = v.vm_guid) JOIN
  base_disks AS b ON (i.image_group_id = b.disk_id) WHERE v.vm_name =
  '$vmname' ORDER BY creation_date, description, disk_alias"
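
The same query is used again at the end to verify the result, so (my
addition, not part of the original steps) it can be handy to wrap it in
a small shell function; it assumes $db and $vmname are set as above:

    chainquery() {
        sudo -u postgres psql $db -c "SELECT b.disk_alias, s.description,
            s.snapshot_id, i.creation_date, s.status, i.imagestatus,
            i.size, i.image_group_id, i.vm_snapshot_id, i.image_guid,
            i.parentid, i.active
            FROM images AS i
            JOIN snapshots AS s ON (i.vm_snapshot_id = s.snapshot_id)
            LEFT JOIN vm_static AS v ON (s.vm_id = v.vm_guid)
            JOIN base_disks AS b ON (i.image_group_id = b.disk_id)
            WHERE v.vm_name = '$vmname'
            ORDER BY creation_date, description, disk_alias"
    }
    chainquery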

- note the image_guid and parentid of the broken snapshot and of the
  active snapshot; the active one is the image_guid whose parentid is
  00000000-0000-0000-0000-000000000000
- igid_active=<active uuid>
- igid_broken=<broken uuid>
- the parentid of the broken snapshot's image_guid must be the same as
  the active snapshot's image_guid
- note the snapshot ids
- sid_active=<id of the active snapshot, the one with the all-zero parentid>
- sid_broken=<id of the broken snapshot>
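
If you want to double-check which volume is the active one, the
all-zero parentid can be queried directly (a sketch using the same
tables as the query above):

    sudo -u postgres psql $db -c "SELECT i.image_guid, i.parentid,
        s.snapshot_id
        FROM images AS i
        JOIN snapshots AS s ON (i.vm_snapshot_id = s.snapshot_id)
        LEFT JOIN vm_static AS v ON (s.vm_id = v.vm_guid)
        WHERE v.vm_name = '$vmname'
        AND i.parentid = '00000000-0000-0000-0000-000000000000'"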

- delete the broken snapshot
- sudo -u postgres psql $db -c "DELETE FROM snapshots AS s WHERE
  s.snapshot_id = '$sid_broken'"

- pid_new=00000000-0000-0000-0000-000000000000
- sudo -u postgres psql $db -c "SELECT * FROM images WHERE
  vm_snapshot_id = '$sid_active' AND image_guid = '$igid_broken'"
- sudo -u postgres psql $db -c "DELETE FROM images WHERE vm_snapshot_id
  = '$sid_broken' AND image_guid = '$igid_active'"
- sudo -u postgres psql $db -c "SELECT * FROM image_storage_domain_map
  WHERE image_id = '$igid_broken'"
- sudo -u postgres psql $db -c "DELETE FROM image_storage_domain_map
  WHERE image_id = '$igid_broken'"
- sudo -u postgres psql $db -c "UPDATE images SET image_guid =
  '$igid_active', parentid = '$pid_new' WHERE vm_snapshot_id =
  '$sid_active' AND image_guid = '$igid_broken'"
- sudo -u postgres psql $db -c "SELECT * FROM image_storage_domain_map"
- storid=<storage_domain_id>
- diskprofileid=<disk_profile_id>
- sudo -u postgres psql $db -c "INSERT INTO image_storage_domain_map
  (image_id, storage_domain_id, disk_profile_id) VALUES ('$igid_broken',
  '$storid', '$diskprofileid')"
  (a note on running these statements inside a transaction follows below)
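
Since these statements are destructive, it may be safer (my suggestion,
not part of the original procedure) to run them in one psql session
inside a transaction, so a wrong DELETE can still be rolled back:

    sudo -u postgres psql $db
    engine=# BEGIN;
    engine=# -- run the DELETE / UPDATE / INSERT statements from above
    engine=# -- re-run the chain SELECT and inspect the rows
    engine=# COMMIT;   -- or ROLLBACK; if anything looks wrong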

- check values
- sudo -u postgres psql $db -c "SELECT b.disk_alias, s.description,
  s.snapshot_id, i.creation_date, s.status, i.imagestatus, i.size,
  i.image_group_id, i.vm_snapshot_id, i.image_guid, i.parentid, i.active
  FROM images AS i JOIN snapshots AS s ON (i.vm_snapshot_id =
  s.snapshot_id) LEFT JOIN vm_static AS v ON (s.vm_id = v.vm_guid) JOIN
  base_disks AS b ON (i.image_group_id = b.disk_id) WHERE v.vm_name =
  '$vmname' ORDER BY creation_date, description, disk_alias"
  (if sudo prints 'could not change directory to "/root/Backups/oVirt"'
  it is harmless; it only means the postgres user cannot read your
  current working directory)

- check for errors (a hedged note on where to look follows below)
- engine-setup --offline
- systemctl start ovirt-engine
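
For "check for errors", a suggestion on where to look once the engine
is back up (log locations are the usual defaults, adjust if your setup
differs):

    journalctl -u ovirt-engine -e              # recent engine service messages
    tail -n 100 /var/log/ovirt-engine/engine.log
    grep -i error /var/log/ovirt-engine/setup/ovirt-engine-setup-*.log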

Now you should have a clean state and a working VM ;-)

What was tested:
- powering the VM up and down

What does not work:
- It's not possible to make offline snapshots; online snapshots were
  not tested because I won't get into that kind of trouble again. It
  took many hours until the machine was up again.

PLEASE be careful and don't destroy your host and VM !!!

cheers
gregor

On 12/06/16 13:40, Colin Coe wrote:
> We've seen this with both Linux and Windows VMs. I'm guessing that
> you've had failures on this VM in both snapshot create and delete
> operations. oVirt/RHEV 3.5 seems particularly affected. I'm told that
> oVirt 3.6.7 has the last of the fixes for these known snapshot problems.
>
> My original email was worded badly. I meant that qemu-img gives
> "backing filename too long" errors. You may have seen this in your logs.
>
> Note also that you may be seeing an entirely unrelated problem.
>
> You may wish to post your VDSM logs and the qemu log from
> /var/lib/libvirt/qemu/<vm_name>.log
>
> Hope this helps
>
> CC
>
>
> On Sun, Jun 12, 2016 at 4:45 PM, gregor <gregor_forum@catrix.at> wrote:
>
> Sounds bad. Recreating the VM is not an option because this is a
> production VM. During testing I had to recreate it more than once.
> oVirt works perfectly with Linux VMs, but when it comes to Windows
> VMs we get lots of problems.
>
> Which OS did you use on the problematic VM?
>
> cheers
> gregor
>
> On 11/06/16 19:22, Anantha Raghava wrote:
> > Hi,
> >
> > I have observed this behaviour too.
> >
> > When we take a snapshot, the main VM from which the snapshot was
> > taken is shut down and a new VM named external-<VMName> comes to
> > life. We cannot get the original VM back to life, but a clone
> > starts functioning.
> >
> > We cannot remove the snapshot whether or not the VM is running. I
> > had to remove the entire VM that came to life with the snapshot and
> > recreate the whole VM from scratch. Luckily the VM was not yet in
> > production, hence we could afford it.
> >
> > First, I could not understand why, when a snapshot is created, the
> > VM with the snapshot comes to life and starts running instead of
> > the original VM.
> >
> > Is it necessary that we shut down the VM before taking snapshots?
> > A snapshot is supposed to be a backup of the original VM which, as
> > I understand it, should not come to life unless we restore it by
> > cloning.
> >
> > --
> >
> > Thanks & Regards,
> >
> > Anantha Raghava
> >
> >
> > On Saturday 11 June 2016 08:09 PM, gregor wrote:
> >> Hi,
> >>
> >> a VM has snapshots which I am unable to remove while the VM is up.
> >> Therefore I powered down the Windows Server 2012 VM. The snapshots
> >> still can't be removed and the VM can't boot anymore !!!
> >>
> >> This is the message from engine.log
> >>
> >> ------------------
> >> Message: VM srv03 is down with error. Exit message: Bad volume
> >> specification
> >> ------------------
> >>
> >> Cloning is not possible; I get:
> >> ------------------
> >> Message: VDSM command failed: Image is not a legal chain
> >> ------------------
> >>
> >> All other VMs can be powered down and started without any problem.
> >> What can I do?
> >> This is very important because right now no one can work :-( !!!
> >>
> >> cheers
> >> gregor