[ovirt-users] Can't start VM after shutdown

Colin Coe colin.coe at gmail.com
Sun Jun 12 22:03:26 UTC 2016


Yeah, that looks like the fix Red Hat GSS came up with.  Note that it is only
with online snapshots that we've seen the problem, never offline, but YMMV.

What version of oVirt are you running?  We're running RHEV 3.5.7 in prod
and test environments but 3.6.5 in dev, and we've not had a recurrence of
this problem in dev since moving to 3.6.x.

CC

On Mon, Jun 13, 2016 at 1:42 AM, gregor <gregor_forum at catrix.at> wrote:

> Hi,
>
> I solved my problem; here are the steps. Be careful: if you don't
> know what these commands do and how to restore from a backup, don't follow
> this:
>
> - ssh to the host
> - systemctl stop ovirt-engine
> - back up the database with "engine-backup"
> - navigate to the image files
> - back up the images: sudo -u vdsm rsync -av <uuid> <uuid_backup>
> - check which one is the backing file: qemu-img info <file>
> - check for damage: qemu-img check <file>
> - qemu-img commit <snapshot file>
> - rename the <snapshot file> plus its .lease and .meta files so they can't be accessed
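>
> A minimal sketch of the image-side commands above (the /rhev/data-center
> path is only the usual mount point and the names in angle brackets are
> placeholders, adjust both to your environment; <img> stands for the
> snapshot/leaf file):
>
> - cd /rhev/data-center/<sp_uuid>/<sd_uuid>/images/<image_group_uuid>
> - sudo -u vdsm rsync -av <uuid> <uuid_backup>    # copy before touching anything
> - qemu-img info <img>      # shows the backing file of the leaf
> - qemu-img check <img>     # verify the image is not corrupted
> - qemu-img commit <img>    # merge the leaf back into its backing file
> - mv <img> <img>.bak && mv <img>.lease <img>.lease.bak && mv <img>.meta <img>.meta.bak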
>
> - vmname=srv03
> - db=engine
> - sudo -u postgres psql $db -c "SELECT b.disk_alias, s.description,
> s.snapshot_id, i.creation_date, s.status, i.imagestatus, i.size,
> i.image_group_id, i.vm_snapshot_id, i.image_guid, i.parentid, i.active
> FROM images as i JOIN snapshots AS s ON (i.vm_snapshot_id =
> s.snapshot_id) LEFT JOIN vm_static AS v ON (s.vm_id = v.vm_guid) JOIN
> base_disks AS b ON (i.image_group_id = b.disk_id) WHERE v.vm_name =
> '$vmname' ORDER BY creation_date, description, disk_alias"
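>
> Since the exact same query is run again at the end to verify the result, it
> can be wrapped in a small shell function (the name list_chain is mine, not
> part of oVirt):
>
> - list_chain() { sudo -u postgres psql $db -c "SELECT b.disk_alias,
> s.description, s.snapshot_id, i.creation_date, s.status, i.imagestatus,
> i.size, i.image_group_id, i.vm_snapshot_id, i.image_guid, i.parentid,
> i.active FROM images AS i JOIN snapshots AS s ON (i.vm_snapshot_id =
> s.snapshot_id) LEFT JOIN vm_static AS v ON (s.vm_id = v.vm_guid) JOIN
> base_disks AS b ON (i.image_group_id = b.disk_id) WHERE v.vm_name =
> '$vmname' ORDER BY creation_date, description, disk_alias"; }
> - list_chain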
>
> - note the image_guid and parentid of the broken snapshot and of the
> active snapshot; the active one is the image_guid whose parentid is
> 00000000-0000-0000-0000-000000000000
> - igid_active=<active uuid>
> - igid_broken=<broken uuid>
> - the parentid of the broken snapshot's image_guid must be the same as
> the active snapshot's image_guid (see the illustration just below)
> - note the snapshot ids
> - sid_active=<id of the active snapshot with parentid 0000...>
> - sid_broken=<id of the broken snapshot>
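>
> As an illustration with made-up UUIDs (not values from a real system), the
> two rows should relate like this:
>
> - active image:  image_guid = aaaaaaaa-... , parentid = 00000000-0000-0000-0000-000000000000
> - broken image:  image_guid = bbbbbbbb-... , parentid = aaaaaaaa-...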
>
> - delete the broken snapshot
> - sudo -u postgres psql $db -c "DELETE FROM snapshots AS s WHERE
> s.snapshot_id = '$sid_broken'"
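>
> Running the matching SELECT first, as is done for the images table below,
> confirms that exactly one row will be removed (same condition, just a
> read-only check):
>
> - sudo -u postgres psql $db -c "SELECT * FROM snapshots WHERE snapshot_id
> = '$sid_broken'"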
>
> - pid_new=00000000-0000-0000-0000-000000000000
> - sudo -u postgres psql $db -c "SELECT * FROM images WHERE
> vm_snapshot_id = '$sid_active' AND image_guid = '$igid_broken'"
> - sudo -u postgres psql $db -c "DELETE FROM images WHERE vm_snapshot_id
> = '$sid_broken' AND image_guid = '$igid_active'"
> - sudo -u postgres psql $db -c "SELECT * FROM image_storage_domain_map
> WHERE image_id = '$igid_broken'"
> - sudo -u postgres psql $db -c "DELETE FROM image_storage_domain_map
> WHERE image_id = '$igid_broken'"
> - sudo -u postgres psql $db -c "UPDATE images SET image_guid =
> '$igid_active', parentid = '$pid_new' WHERE vm_snapshot_id =
> '$sid_active' AND image_guid = '$igid_broken'"
> - sudo -u postgres psql $db -c "SELECT * FROM image_storage_domain_map"
> - storid=<storage_domain_id>
> - diskprofileid=<disk_profile_id>
> - sudo -u postgres psql $db -c "INSERT INTO image_storage_domain_map
> (image_id, storage_domain_id, disk_profile_id) VALUES ('$igid_broken',
> '$storid', '$diskprofileid')"
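>
> The modifying statements of this block can also be grouped into one
> transaction, so that a typo in one of the variables does not leave the
> tables half-modified and everything can still be aborted with ROLLBACK
> instead of COMMIT (a sketch; the statements are identical to the ones
> above, and the unquoted here-document lets the shell expand the variables):
>
> - sudo -u postgres psql $db <<SQL
> BEGIN;
> DELETE FROM images WHERE vm_snapshot_id = '$sid_broken' AND image_guid = '$igid_active';
> DELETE FROM image_storage_domain_map WHERE image_id = '$igid_broken';
> UPDATE images SET image_guid = '$igid_active', parentid = '$pid_new'
>   WHERE vm_snapshot_id = '$sid_active' AND image_guid = '$igid_broken';
> INSERT INTO image_storage_domain_map (image_id, storage_domain_id, disk_profile_id)
>   VALUES ('$igid_broken', '$storid', '$diskprofileid');
> COMMIT;
> SQL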
>
> - check values
> - sudo -u postgres psql $db -c "SELECT b.disk_alias, s.description,
> s.snapshot_id, i.creation_date, s.status, i.imagestatus, i.size,
> i.image_group_id, i.vm_snapshot_id, i.image_guid, i.parentid, i.active
> FROM images as i JOIN snapshots AS s ON (i.vm_snapshot_id =
> s.snapshot_id) LEFT JOIN vm_static AS v ON (s.vm_id = v.vm_guid) JOIN
> base_disks AS b ON (i.image_group_id = b.disk_id) WHERE v.vm_name =
> '$vmname' ORDER BY creation_date, description, disk_alias"
>
> - check for errors
> - engine-setup --offline
> - systemctl start ovirt-engine
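>
> Once the engine is running again, watching the engine log while starting
> the VM shows whether the "Bad volume specification" error is gone (default
> location on the engine host):
>
> - tail -f /var/log/ovirt-engine/engine.log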
>
> Now you should have a clean state and a working VM ;-)
>
> What was tested:
> - Power up and down the VM
>
> What does not work:
> - It's not possible to make offline snapshots; online snapshots were not
> tested because I will not get into such trouble again. It took many hours
> until the machine was up again.
>
> PLEASE be aware and don't destroy your Host and VM !!!
>
> cheers
> gregor
>
> On 12/06/16 13:40, Colin Coe wrote:
> > We've seen this with both Linux and Windows VMs.  I'm guessing that
> > you've had failures on this VM in both snapshot create and delete
> > operations.  oVirt/RHEV 3.5 seems particularly affected.  I'm told that
> > oVirt 3.6.7 has the last of the fixes for these known snapshot problems.
> >
> > My original email was worded badly.  I meant that qemu-img gives
> > "backing filename too long" errors.  You may have seen this in your logs.
> >
> > Note also that you may be seeing an entirely unrelated problem.
> >
> > You may wish to post your VDSM logs and the qemu log from
> > /var/lib/libvirt/qemu/<vm_name>.log
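> >
> > For example (default log locations; adjust if your setup differs):
> >
> >   grep -i "backing filename too long" /var/log/vdsm/vdsm.log
> >   less /var/lib/libvirt/qemu/<vm_name>.log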
> >
> > Hope this helps
> >
> > CC
> >
> >
> > On Sun, Jun 12, 2016 at 4:45 PM, gregor <gregor_forum at catrix.at
> > <mailto:gregor_forum at catrix.at>> wrote:
> >
> >     Sounds bad. Recreating the VM is not an option because this is a
> >     production VM; during testing I already had to recreate it more than
> >     once. oVirt works perfectly with Linux VMs, but when it comes to
> >     Windows VMs we get lots of problems.
> >
> >     Which OS did you use on the problematic VM?
> >
> >     cheers
> >     gregor
> >
> >     On 11/06/16 19:22, Anantha Raghava wrote:
> >     > Hi,
> >     >
> >     > Even I observed this behaviour.
> >     >
> >     > When we take the snapshot, the main VM from which the snapshot was
> >     > taken is shut down and a new VM named external-<VMName> comes to
> >     > life. We cannot get the original VM back to life, but a clone starts
> >     > functioning.
> >     >
> >     > We cannot remove the snapshot whether or not the VM is running. I
> >     > had to remove the entire VM that came to life with the snapshot and
> >     > recreate the entire VM from scratch. Luckily the VM was not yet in
> >     > production, hence we could afford it.
> >     >
> >     > At first I could not understand why, when a snapshot is created, the
> >     > VM with the snapshot comes to life and starts running rather than
> >     > the original VM.
> >     >
> >     > Is it necessary that we shut down the VM before taking snapshots?
> >     > A snapshot is supposed to be a backup of the original VM which, as I
> >     > understand it, should not come to life unless we restore it by cloning.
> >     >
> >     > --
> >     >
> >     > Thanks & Regards,
> >     >
> >     > Anantha Raghava
> >     >
> >     >
> >     > On Saturday 11 June 2016 08:09 PM, gregor wrote:
> >     >> Hi,
> >     >>
> >     >> a VM has snapshots which cannot be removed while the VM is up.
> >     >> Therefore I powered down the Windows Server 2012 VM. The snapshots
> >     >> still cannot be removed and the VM can't boot anymore !!!
> >     >>
> >     >> This is the message from engine.log
> >     >>
> >     >> ------------------
> >     >> Message: VM srv03 is down with error. Exit message: Bad volume
> >     >> specification
> >     >> ------------------
> >     >>
> >     >> Cloning is not possible; I get:
> >     >> ------------------
> >     >> Message: VDSM command failed: Image is not a legal chain
> >     >> ------------------
> >     >>
> >     >> All other VMs can be powered down and started without any problem.
> >     >> What can I do?
> >     >> This is very important because now no one can work :-( !!!
> >     >>
> >     >> cheers
> >     >> gregor
> >     >> _______________________________________________
> >     >> Users mailing list
> >     >> Users at ovirt.org <mailto:Users at ovirt.org>
> >     >> http://lists.ovirt.org/mailman/listinfo/users
> >     >
> >     _______________________________________________
> >     Users mailing list
> >     Users at ovirt.org <mailto:Users at ovirt.org>
> >     http://lists.ovirt.org/mailman/listinfo/users
> >
> >
>

