[ovirt-users] Can't start VM after shutdown

Colin Coe colin.coe at gmail.com
Mon Jun 13 09:48:27 UTC 2016


Initially we only saw this on VMs with 2 or more disks. Subsequently we
confirmed that it does happen on single disk VMs also.

CC

On Jun 13, 2016 5:12 PM, "gregor" <gregor_forum at catrix.at> wrote:

> The VM has two disks, both VirtIO. Testing has now made it clear that
> the problem occurs only with two disks. When I select only one disk for
> the snapshot, it works.
> Is this an oVirt problem, or is it not possible to use two disks on a
> VM in oVirt?
>
> Do you also have two or more disks on your VM?
>
> Here are the test results:
> -------------------------
> What does not work:
> - Export the VM: Failed with error "ImageIsNotLegalChain and code 262"
> - Clone the VM: Failed with error "IRSErrorException: Image is not a
> legal chain" with the ID of the second disk.
>
> After removing the second disk:
> - Create offline snapshot: Works
> - Remove offline snapshot: After two hours I ran "engine-setup
> --offline" to clean up the locked snapshot!!!
> - Export the VM: Works
> - Import the exported VM: Works
> - Add Disk to the imported VM: Works
> - Create offline snapshot of the imported VM: Failed
> - Clone the VM: Works
> - Add Disk to the cloned VM: Works
> - Create offline snapshot of the cloned VM: Failed
>
> What works:
> - Make offline snapshot only with the system disk: Works
> - Remove offline snapshot of the system disk: Works
> - Make online snapshot only with the system disk: Works
> - Remove online snapshot of the system disk: Works
>
> cheers
> gregor
>
> On 12/06/16 19:42, gregor wrote:
> > Hi,
> >
> > I solved my problem. Here are the steps, but be careful: if you don't
> > know what the commands do and how to restore from a backup, don't
> > follow this:
> >
> > - ssh to the host
> > - systemctl stop ovirt-engine
> > - backup the database with "engine-backup"
> > - navigate to the image files
> > - backup the images: sudo -u vdsm rsync -av <uuid> <uuid_backup>
> > - check which one is the backing file: qemu-img info <file>
> > - check for damage: qemu-img check <file>
> > - qemu-img commit <snapshot file>
> > - rename the <snapshot file> and its .lease and .meta files so they
> > can't be accessed
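Before running `qemu-img commit`, the backing relationship found with `qemu-img info` can be double-checked programmatically. This is a minimal Python sketch, assuming qemu-img's `info --backing-chain --output=json` mode, which prints a JSON array with one object per image, top image first. It further assumes that each entry's `full-backing-filename` matches the `filename` of the next entry. The helper names `backing_chain` and `chain_problems` are hypothetical. The sketch only reports links that don't line up and repairs nothing. If a backing file is missing entirely, qemu-img itself exits with an error, and `subprocess.run(..., check=True)` raises `CalledProcessError`.

```python
import json
import subprocess
from typing import Any


def backing_chain(path: str) -> list[dict[str, Any]]:
    """Return the backing chain of `path`, top image first.

    `qemu-img info --backing-chain --output=json` prints a JSON array with
    one object per image in the chain.
    """
    result = subprocess.run(
        ["qemu-img", "info", "--backing-chain", "--output=json", path],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def chain_problems(chain: list[dict[str, Any]]) -> list[str]:
    """Report every link whose recorded backing file is not the next image."""
    problems = []
    for upper, lower in zip(chain, chain[1:]):
        # Prefer the resolved path; fall back to the name stored in the image.
        backing = upper.get("full-backing-filename", upper.get("backing-filename"))
        if backing != lower.get("filename"):
            problems.append(
                f"{upper.get('filename')}: backing file {backing!r} "
                f"is not the next image {lower.get('filename')!r}"
            )
    return problems
```

For example, `print(chain_problems(backing_chain("<snapshot file>")))` should print an empty list when every link in the chain is consistent.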
> >
> > - vmname=srv03
> > - db=engine
> > - sudo -u postgres psql $db -c "SELECT b.disk_alias, s.description,
> > s.snapshot_id, i.creation_date, s.status, i.imagestatus, i.size,
> > i.image_group_id, i.vm_snapshot_id, i.image_guid, i.parentid, i.active
> > FROM images as i JOIN snapshots AS s ON (i.vm_snapshot_id =
> > s.snapshot_id) LEFT JOIN vm_static AS v ON (s.vm_id = v.vm_guid) JOIN
> > base_disks AS b ON (i.image_group_id = b.disk_id) WHERE v.vm_name =
> > '$vmname' ORDER BY creation_date, description, disk_alias"
> >
> > - note the image_guid and parentid of the broken snapshot and of the
> > active snapshot; the active one is the image_guid whose parentid is
> > 00000000-0000-0000-0000-000000000000
> > - igid_active=<active uuid>
> > - igid_broken=<broken uuid>
> > - the parentid of the broken snapshot's image_guid must be the same as
> > the active snapshot's image_guid
> > - note the snapshot IDs
> > - sid_active=<id of the active snapshot with parent id 000000>
> > - sid_broken=<id of the broken snapshot>
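The identification rule in the steps above can be encoded so the IDs are not picked out by eye. Under that rule, the "active" image has parentid 00000000-0000-0000-0000-000000000000, and the broken snapshot's parentid equals the active image_guid. This Python sketch assumes the rows of the SELECT have already been fetched and filtered to a single disk (one image_group_id). `ImageRow`, `find_active_and_broken` and `shell_assignments` are hypothetical names. The rule is the one described in this message and is not offered as a general description of oVirt's schema.

```python
from dataclasses import dataclass

NULL_UUID = "00000000-0000-0000-0000-000000000000"


@dataclass
class ImageRow:
    """The columns of the SELECT that the rule needs (hypothetical type)."""
    image_guid: str
    parentid: str
    vm_snapshot_id: str


def find_active_and_broken(rows: list[ImageRow]) -> tuple[ImageRow, ImageRow]:
    """Apply the rule from the steps above to the rows of ONE disk."""
    roots = [r for r in rows if r.parentid == NULL_UUID]
    if len(roots) != 1:
        raise ValueError(f"expected one image with a null parentid, found {len(roots)}")
    active = roots[0]
    # The broken snapshot's parentid must equal the active image_guid.
    children = [r for r in rows if r.parentid == active.image_guid]
    if len(children) != 1:
        raise ValueError(f"expected one image with parentid {active.image_guid}, "
                         f"found {len(children)}")
    return active, children[0]


def shell_assignments(active: ImageRow, broken: ImageRow) -> str:
    """Format the IDs as the shell variables used by the later commands."""
    return "\n".join([
        f"igid_active={active.image_guid}",
        f"igid_broken={broken.image_guid}",
        f"sid_active={active.vm_snapshot_id}",
        f"sid_broken={broken.vm_snapshot_id}",
    ])
```

If the rows don't match the expected shape, the sketch raises instead of guessing. That seems the safer choice before running DELETE statements against the engine database.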
> >
> > - delete the broken snapshot
> > - sudo -u postgres psql $db -c "DELETE FROM snapshots AS s WHERE
> > s.snapshot_id = '$sid_broken'"
> >
> > - pid_new=00000000-0000-0000-0000-000000000000
> > - sudo -u postgres psql $db -c "SELECT * FROM images WHERE
> > vm_snapshot_id = '$sid_active' AND image_guid = '$igid_broken'"
> > - sudo -u postgres psql $db -c "DELETE FROM images WHERE vm_snapshot_id
> > = '$sid_broken' AND image_guid = '$igid_active'"
> > - sudo -u postgres psql $db -c "SELECT * FROM image_storage_domain_map
> > WHERE image_id = '$igid_broken'"
> > - sudo -u postgres psql $db -c "DELETE FROM image_storage_domain_map
> > WHERE image_id = '$igid_broken'"
> > - sudo -u postgres psql $db -c "UPDATE images SET image_guid =
> > '$igid_active', parentid = '$pid_new' WHERE vm_snapshot_id =
> > '$sid_active' AND image_guid = '$igid_broken'"
> > - sudo -u postgres psql $db -c "SELECT * FROM image_storage_domain_map"
> > - storid=<storage_domain_id>
> > - diskprofileid=<disk_profile_id>
> > - sudo -u postgres psql $db -c "INSERT INTO image_storage_domain_map
> > (image_id, storage_domain_id, disk_profile_id) VALUES ('$igid_broken',
> > '$storid', '$diskprofileid')"
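Each `psql -c` call above commits on its own, so a mistake halfway through cannot be undone except from the engine-backup. This Python sketch emits the same DELETE/UPDATE/INSERT statements, in the same order, as one transaction that is deliberately left open. The idea is to load it in an interactive `sudo -u postgres psql engine` session with `\i <file>`, re-run the check SELECT, and then type `COMMIT;` or `ROLLBACK;` yourself. The function names are hypothetical. Every value is validated with Python's `uuid.UUID`, so nothing except a well-formed UUID ever ends up between the quotes. The variable names follow the steps above, with `storid` used for the storage domain ID.

```python
import uuid

NULL_UUID = str(uuid.UUID(int=0))  # 00000000-0000-0000-0000-000000000000


def _uuid(value: str) -> str:
    """Reject anything that is not a UUID, so it is safe to quote in SQL."""
    return str(uuid.UUID(value))


def repair_script(sid_broken: str, sid_active: str, igid_broken: str,
                  igid_active: str, storid: str, diskprofileid: str) -> str:
    """Return the DELETE/UPDATE/INSERT steps above as one open transaction."""
    sb, sa, ib, ia, st, dp = map(_uuid, (sid_broken, sid_active, igid_broken,
                                         igid_active, storid, diskprofileid))
    return "\n".join([
        "BEGIN;",
        f"DELETE FROM snapshots WHERE snapshot_id = '{sb}';",
        f"DELETE FROM images WHERE vm_snapshot_id = '{sb}' AND image_guid = '{ia}';",
        f"DELETE FROM image_storage_domain_map WHERE image_id = '{ib}';",
        f"UPDATE images SET image_guid = '{ia}', parentid = '{NULL_UUID}'"
        f" WHERE vm_snapshot_id = '{sa}' AND image_guid = '{ib}';",
        "INSERT INTO image_storage_domain_map"
        " (image_id, storage_domain_id, disk_profile_id)"
        f" VALUES ('{ib}', '{st}', '{dp}');",
        # No COMMIT on purpose: inspect the result first, then decide by hand.
        "-- check with the SELECT from the steps above, then COMMIT; or ROLLBACK;",
        "",
    ])
```

Save the returned text to a file that the postgres user can read. If psql exits while the transaction is still open, PostgreSQL rolls it back.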
> >
> > - check values
> > - sudo -u postgres psql $db -c "SELECT b.disk_alias, s.description,
> > s.snapshot_id, i.creation_date, s.status, i.imagestatus, i.size,
> > i.image_group_id, i.vm_snapshot_id, i.image_guid, i.parentid, i.active
> > FROM images as i JOIN snapshots AS s ON (i.vm_snapshot_id =
> > s.snapshot_id) LEFT JOIN vm_static AS v ON (s.vm_id = v.vm_guid) JOIN
> > base_disks AS b ON (i.image_group_id = b.disk_id) WHERE v.vm_name =
> > '$vmname' ORDER BY creation_date, description, disk_alias"
> >
> > - check for errors
> > - engine-setup --offline
> > - systemctl start ovirt-engine
> >
> > Now you should have a clean state and a working VM ;-)
> >
> > What was tested:
> > - Power up and down the VM
> >
> > What does not work:
> > - It's not possible to make offline snapshots. Online snapshots were not
> > tested because I don't want to get into such trouble again; it took many
> > hours until the machine was up again.
> >
> > PLEASE be aware and don't destroy your Host and VM !!!
> >
> > cheers
> > gregor
> >
> > On 12/06/16 13:40, Colin Coe wrote:
> >> We've seen this with both Linux and Windows VMs.  I'm guessing that
> >> you've had failures on this VM in both snapshot create and delete
> >> operations.  oVirt/RHEV 3.5 seems particularly affected.  I'm told that
> >> oVirt 3.6.7 has the last of the fixes for these known snapshot problems.
> >>
> >> My original email was worded wrong.  I meant that qemu-img gives
> >> "backing filename too long" errors.  You may have seen this in your
> >> logs.
> >>
> >> Note also that you may be seeing an entirely unrelated problem.
> >>
> >> You may wish to post your VDSM logs and the qemu log from
> >> /var/log/libvirt/qemu/<vm_name>.log
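Before posting the logs, they can be searched for the error strings quoted in this thread. This Python sketch does a plain case-insensitive substring search. The pattern list comes only from messages in this thread, and the function names are hypothetical. Pass it the VDSM log (commonly /var/log/vdsm/vdsm.log, which is an assumption about the default install) and the qemu log mentioned above.

```python
from pathlib import Path
from typing import Iterable, Iterator

# Error strings quoted in this thread (matched case-insensitively).
PATTERNS = (
    "backing filename too long",
    "Image is not a legal chain",
    "ImageIsNotLegalChain",
    "Bad volume specification",
)


def matching_lines(lines: Iterable[str],
                   patterns: Iterable[str] = PATTERNS) -> Iterator[tuple[int, str]]:
    """Yield (line number, line) for every line containing one of `patterns`."""
    needles = [p.lower() for p in patterns]
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(n in lowered for n in needles):
            yield number, line.rstrip("\n")


def scan(path: str) -> None:
    """Print matches from one log file as path:line: text."""
    with Path(path).open(errors="replace") as fh:
        for number, line in matching_lines(fh):
            print(f"{path}:{number}: {line}")
```

A hit on "backing filename too long" would point towards the snapshot problem Colin describes. No hits does not rule anything out.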
> >>
> >> Hope this helps
> >>
> >> CC
> >>
> >>
> >> On Sun, Jun 12, 2016 at 4:45 PM, gregor <gregor_forum at catrix.at> wrote:
> >>
> >>     Sounds bad. Recreating the VM is not an option because this is a
> >>     production VM. During testing I needed to recreate it more than once.
> >>     oVirt works perfectly with Linux VMs, but when it comes to Windows VMs
> >>     we get lots of problems.
> >>
> >>     Which OS did you use on the problematic VM?
> >>
> >>     cheers
> >>     gregor
> >>
> >>     On 11/06/16 19:22, Anantha Raghava wrote:
> >>     > Hi,
> >>     >
> >>     > I have observed this behaviour too.
> >>     >
> >>     > When we take the snapshot, the original VM from which the snapshot
> >>     > was taken is shut down and a new VM named external-<VMName> comes to
> >>     > life. We cannot bring the original VM back to life, but a clone
> >>     > starts functioning.
> >>     >
> >>     > We cannot remove the snapshot whether or not the VM is running. I
> >>     > had to remove the entire VM that came to life with the snapshot and
> >>     > recreate it from scratch. Luckily the VM was not yet in production,
> >>     > so we could afford it.
> >>     >
> >>     > At first I could not understand why, when a snapshot is created,
> >>     > the VM with the snapshot comes to life and starts running instead
> >>     > of the original VM.
> >>     >
> >>     > Is it necessary to shut down the VM before taking snapshots? As I
> >>     > understand it, a snapshot is supposed to be a backup of the original
> >>     > VM that should not come to life unless we restore it by cloning.
> >>     >
> >>     > --
> >>     >
> >>     > Thanks & Regards,
> >>     >
> >>     > Anantha Raghava
> >>     >
> >>     >
> >>     > On Saturday 11 June 2016 08:09 PM, gregor wrote:
> >>     >> Hi,
> >>     >>
> >>     >> a VM has snapshots that cannot be removed while the VM is up.
> >>     >> Therefore I powered down the Windows Server 2012 VM. The snapshots
> >>     >> still cannot be removed and the VM can't boot anymore!!!
> >>     >>
> >>     >> This is the message from engine.log
> >>     >>
> >>     >> ------------------
> >>     >> Message: VM srv03 is down with error. Exit message: Bad volume specification
> >>     >> ------------------
> >>     >>
> >>     >> Cloning is not possible; I get:
> >>     >> ------------------
> >>     >> Message: VDSM command failed: Image is not a legal chain
> >>     >> ------------------
> >>     >>
> >>     >> All other VMs can be powered down and started without any problem.
> >>     >> What can I do?
> >>     >> This is very important because now no one can work :-( !!!
> >>     >>
> >>     >> cheers
> >>     >> gregor
> >>     >> _______________________________________________
> >>     >> Users mailing list
> >>     >> Users at ovirt.org <mailto:Users at ovirt.org>
> >>     >> http://lists.ovirt.org/mailman/listinfo/users
> >>     >
> >
>