Yeah, that looks like the fix Red Hat GSS came up with. Note that it's
only online snapshots we've seen the problem with, never offline, but
YMMV.

What version of oVirt are you running? We're running RHEV 3.5.7 in prod
and test environments but 3.6.5 in dev, and we've not had a recurrence
of this problem in dev since moving to 3.6.x.

CC

On Mon, Jun 13, 2016 at 1:42 AM, gregor <gregor_forum@catrix.at> wrote:

Hi,

I solved my problem. Here are the steps, but be careful: if you don't
know what these commands do and how to restore from backup, don't
follow this:

- ssh to the host
- systemctl stop ovirt-engine
- backup the database with "engine-backup"
- navigate to the image files
- backup the images: sudo -u vdsm rsync -av <uuid> <uuid_backup>
- check which one is the backing file: qemu-img info <file>
- check for damage: qemu-img check <file>
- qemu-img commit <snapshot file>
- rename the <snapshot file> plus its .lease and .meta files so they
  can't be accessed (a sketch of this whole sequence follows below)
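
To make the file-level part concrete, here is a hedged sketch of the
commands above; every name in angle brackets is a placeholder and the
image path depends on your storage domain layout, so adjust before
running anything:

    systemctl stop ovirt-engine
    engine-backup --mode=backup --file=engine.backup --log=engine-backup.log
    cd /rhev/data-center/<sp_uuid>/<sd_uuid>/images
    sudo -u vdsm rsync -av <disk_uuid>/ <disk_uuid>_backup/   # copy the image directory
    cd <disk_uuid>
    qemu-img info <snapshot_file>    # "backing file:" shows the parent volume
    qemu-img check <snapshot_file>   # look for corruption before committing
    qemu-img commit <snapshot_file>  # merge the snapshot into its backing file
    mv <snapshot_file> <snapshot_file>.bak
    mv <snapshot_file>.lease <snapshot_file>.lease.bak
    mv <snapshot_file>.meta <snapshot_file>.meta.bak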

- vmname=srv03
- db=engine
- sudo -u postgres psql $db -c "SELECT b.disk_alias, s.description,
  s.snapshot_id, i.creation_date, s.status, i.imagestatus, i.size,
  i.image_group_id, i.vm_snapshot_id, i.image_guid, i.parentid, i.active
  FROM images AS i JOIN snapshots AS s ON (i.vm_snapshot_id =
  s.snapshot_id) LEFT JOIN vm_static AS v ON (s.vm_id = v.vm_guid) JOIN
  base_disks AS b ON (i.image_group_id = b.disk_id) WHERE v.vm_name =
  '$vmname' ORDER BY creation_date, description, disk_alias"
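
The same query is used again at the end to verify the result, so (my
addition, not part of the original steps) it can be handy to wrap it in
a small shell function; it assumes $db and $vmname are set as above:

    chainquery() {
        sudo -u postgres psql $db -c "SELECT b.disk_alias, s.description,
            s.snapshot_id, i.creation_date, s.status, i.imagestatus,
            i.size, i.image_group_id, i.vm_snapshot_id, i.image_guid,
            i.parentid, i.active
            FROM images AS i
            JOIN snapshots AS s ON (i.vm_snapshot_id = s.snapshot_id)
            LEFT JOIN vm_static AS v ON (s.vm_id = v.vm_guid)
            JOIN base_disks AS b ON (i.image_group_id = b.disk_id)
            WHERE v.vm_name = '$vmname'
            ORDER BY creation_date, description, disk_alias"
    }
    chainquery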

- note the image_guid and parentid of the broken snapshot and of the
  active snapshot; the active one is the image_guid whose parentid is
  00000000-0000-0000-0000-000000000000
- igid_active=<active uuid>
- igid_broken=<broken uuid>
- the parentid of the broken snapshot's image_guid must be the same as
  the active snapshot's image_guid
- note the snapshot ids
- sid_active=<id of the active snapshot, the one with the all-zero parentid>
- sid_broken=<id of the broken snapshot>
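
If you want to double-check which volume is the active one, the
all-zero parentid can be queried directly (a sketch using the same
tables as the query above):

    sudo -u postgres psql $db -c "SELECT i.image_guid, i.parentid,
        s.snapshot_id
        FROM images AS i
        JOIN snapshots AS s ON (i.vm_snapshot_id = s.snapshot_id)
        LEFT JOIN vm_static AS v ON (s.vm_id = v.vm_guid)
        WHERE v.vm_name = '$vmname'
        AND i.parentid = '00000000-0000-0000-0000-000000000000'"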

- delete the broken snapshot
- sudo -u postgres psql $db -c "DELETE FROM snapshots AS s WHERE
  s.snapshot_id = '$sid_broken'"

- pid_new=00000000-0000-0000-0000-000000000000
- sudo -u postgres psql $db -c "SELECT * FROM images WHERE
  vm_snapshot_id = '$sid_active' AND image_guid = '$igid_broken'"
- sudo -u postgres psql $db -c "DELETE FROM images WHERE vm_snapshot_id
  = '$sid_broken' AND image_guid = '$igid_active'"
- sudo -u postgres psql $db -c "SELECT * FROM image_storage_domain_map
  WHERE image_id = '$igid_broken'"
- sudo -u postgres psql $db -c "DELETE FROM image_storage_domain_map
  WHERE image_id = '$igid_broken'"
- sudo -u postgres psql $db -c "UPDATE images SET image_guid =
  '$igid_active', parentid = '$pid_new' WHERE vm_snapshot_id =
  '$sid_active' AND image_guid = '$igid_broken'"
- sudo -u postgres psql $db -c "SELECT * FROM image_storage_domain_map"
- storid=<storage_domain_id>
- diskprofileid=<disk_profile_id>
- sudo -u postgres psql $db -c "INSERT INTO image_storage_domain_map
  (image_id, storage_domain_id, disk_profile_id) VALUES ('$igid_broken',
  '$storid', '$diskprofileid')"
  (a note on running these statements inside a transaction follows below)
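
Since these statements are destructive, it may be safer (my suggestion,
not part of the original procedure) to run them in one psql session
inside a transaction, so a wrong DELETE can still be rolled back:

    sudo -u postgres psql $db
    engine=# BEGIN;
    engine=# -- run the DELETE / UPDATE / INSERT statements from above
    engine=# -- re-run the chain SELECT and inspect the rows
    engine=# COMMIT;   -- or ROLLBACK; if anything looks wrong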

- check values
- sudo -u postgres psql $db -c "SELECT b.disk_alias, s.description,
  s.snapshot_id, i.creation_date, s.status, i.imagestatus, i.size,
  i.image_group_id, i.vm_snapshot_id, i.image_guid, i.parentid, i.active
  FROM images AS i JOIN snapshots AS s ON (i.vm_snapshot_id =
  s.snapshot_id) LEFT JOIN vm_static AS v ON (s.vm_id = v.vm_guid) JOIN
  base_disks AS b ON (i.image_group_id = b.disk_id) WHERE v.vm_name =
  '$vmname' ORDER BY creation_date, description, disk_alias"
  (if sudo prints 'could not change directory to "/root/Backups/oVirt"'
  it is harmless; it only means the postgres user cannot read your
  current working directory)

- check for errors (a hedged note on where to look follows below)
- engine-setup --offline
- systemctl start ovirt-engine
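
For "check for errors", a suggestion on where to look once the engine
is back up (log locations are the usual defaults, adjust if your setup
differs):

    journalctl -u ovirt-engine -e              # recent engine service messages
    tail -n 100 /var/log/ovirt-engine/engine.log
    grep -i error /var/log/ovirt-engine/setup/ovirt-engine-setup-*.log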

Now you should have a clean state and a working VM ;-)

What was tested:
- powering the VM up and down

What does not work:
- It's not possible to make offline snapshots; online snapshots were
  not tested because I won't get into that kind of trouble again. It
  took many hours until the machine was up again.

PLEASE be careful and don't destroy your host and VM !!!

cheers
gregor

On 12/06/16 13:40, Colin Coe wrote:
> We've seen this with both Linux and Windows VMs. I'm guessing that
> you've had failures on this VM in both snapshot create and delete
> operations. oVirt/RHEV 3.5 seems particularly affected. I'm told that
> oVirt 3.6.7 has the last of the fixes for these known snapshot problems.
>
> My original email was worded badly. I meant that qemu-img gives
> "backing filename too long" errors. You may have seen this in your logs.
>
> Note also that you may be seeing an entirely unrelated problem.
>
> You may wish to post your VDSM logs and the qemu log from
> /var/lib/libvirt/qemu/<vm_name>.log
>
> Hope this helps
>
> CC
>
>
> On Sun, Jun 12, 2016 at 4:45 PM, gregor <gregor_forum@catrix.at> wrote:
>
> Sounds bad. Recreating the VM is not an option because this is a
> production VM. During testing I had to recreate it more than once.
> oVirt works perfectly with Linux VMs, but when it comes to Windows
> VMs we get lots of problems.
>
> Which OS did you use on the problematic VM?
>
> cheers
> gregor
>
> On 11/06/16 19:22, Anantha Raghava wrote:
> > Hi,
> >
> > I have observed this behaviour too.
> >
> > When we take a snapshot, the main VM from which the snapshot was
> > taken is shut down and a new VM named external-<VMName> comes to
> > life. We cannot get the original VM back to life, but a clone
> > starts functioning.
> >
> > We cannot remove the snapshot whether or not the VM is running. I
> > had to remove the entire VM that came to life with the snapshot and
> > recreate the whole VM from scratch. Luckily the VM was not yet in
> > production, hence we could afford it.
> >
> > First, I could not understand why, when a snapshot is created, the
> > VM with the snapshot comes to life and starts running instead of
> > the original VM.
> >
> > Is it necessary that we shut down the VM before taking snapshots?
> > A snapshot is supposed to be a backup of the original VM which, as
> > I understand it, should not come to life unless we restore it by
> > cloning.
> >
> > --
> >
> > Thanks & Regards,
> >
> > Anantha Raghava
> >
> >
> > On Saturday 11 June 2016 08:09 PM, gregor wrote:
> >> Hi,
> >>
> >> a VM has snapshots which I am unable to remove while the VM is up.
> >> Therefore I powered down the Windows Server 2012 VM. The snapshots
> >> still can't be removed and the VM can't boot anymore !!!
> >>
> >> This is the message from engine.log
> >>
> >> ------------------
> >> Message: VM srv03 is down with error. Exit message: Bad volume
> >> specification
> >> ------------------
> >>
> >> Cloning is not possible; I get:
> >> ------------------
> >> Message: VDSM command failed: Image is not a legal chain
> >> ------------------
> >>
> >> All other VMs can be powered down and started without any problem.
> >> What can I do?
> >> This is very important because right now no one can work :-( !!!
> >>
> >> cheers
> >> gregor