[ovirt-users] Can't start VM after shutdown

Mon Jun 13 09:11:53 UTC 2016

The VM has two disks; both are VirtIO. During testing it became clear that
the problem occurs only with two disks. When I select only one disk for
the snapshot, it works.
Is this a problem with oVirt, or is it not possible to use two disks on a
VM in oVirt?

Do you also have two or more disks on your VM?

Here are the test results:
-------------------------
What does not work:
- Export the VM: Failed with error "ImageIsNotLegalChain and code 262"
- Clone the VM: Failed with error "IRSErrorException: Image is not a
legal chain", reporting the ID of the second disk.
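
To see which operations hit the chain error, the engine log can be filtered for the messages quoted in this thread. This is only a minimal sketch: it assumes the default log location /var/log/ovirt-engine/engine.log, and the function name chain_errors is made up for illustration.

```shell
# Print the log lines that contain one of the error strings seen in this thread.
# Assumption: the default engine log path; pass another file as the first
# argument if your installation logs elsewhere.
chain_errors() {
    log=${1:-/var/log/ovirt-engine/engine.log}
    grep -E 'ImageIsNotLegalChain|Image is not a legal chain|Bad volume specification' "$log"
}
```

Calling chain_errors without arguments reads the assumed default log; the surrounding lines of each match usually carry the image and volume IDs.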

After removing the second disk:
- Create offline snapshot: Works
- Remove offline snapshot: After two hours I ran "engine-setup
--offline" to clear the locked snapshot !!!
- Export the VM: Works
- Import the exported VM: Works
- Add Disk to the imported VM: Works
- Create offline snapshot of the imported VM: Failed
- Clone the VM: Works
- Add Disk to the cloned VM: Works
- Create offline snapshot of the cloned VM: Failed

What works:
- Make offline snapshot only with the system disk: Works
- Remove offline snapshot of the system disk: Works
- Make online snapshot only with the system disk: Works
- Remove online snapshot of the system disk: Works

cheers
gregor

On 12/06/16 19:42, gregor wrote:
> Hi,
> 
> I solved my problem. Here are the steps, but be careful: if you don't
> know what the commands do and how to restore from a backup, don't follow this:
> 
> - ssh to the host
> - systemctl stop ovirt-engine
> - backup the database with "engine-backup"
> - navigate to the image files
> - backup the images: sudo -u vdsm rsync -av <uuid> <uuid_backup>
> - check which one is the backing file: qemu-img info <file>
> - check for damages: qemu-img check <file>
> - qemu-img commit <snapshot file>
> - rename the <snapshot file> and its .lease and .meta files so they can't be accessed
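
For the "check which one is the backing file" step, the relevant line can be pulled out of the qemu-img info output. A minimal sketch, assuming the human-readable output format with a "backing file: ..." line; the helper name backing_file_from_info is hypothetical:

```shell
# Read `qemu-img info` output on stdin and print only the backing file name.
# Prints nothing when the image has no backing file (i.e. it is a base image).
# The "backing file format:" line is not matched by the pattern.
backing_file_from_info() {
    sed -n 's/^backing file: \(.*\)$/\1/p' | sed 's/ (actual path: .*)$//'
}

# Usage (run as the vdsm user inside the image directory):
#   qemu-img info <file> | backing_file_from_info
```

qemu-img info --backing-chain <file> prints the same information for every image in the chain in one go.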
> 
> - vmname=srv03
> - db=engine
> - sudo -u postgres psql $db -c "SELECT b.disk_alias, s.description,
> s.snapshot_id, i.creation_date, s.status, i.imagestatus, i.size,
> i.image_group_id, i.vm_snapshot_id, i.image_guid, i.parentid, i.active
> FROM images as i JOIN snapshots AS s ON (i.vm_snapshot_id =
> s.snapshot_id) LEFT JOIN vm_static AS v ON (s.vm_id = v.vm_guid) JOIN
> base_disks AS b ON (i.image_group_id = b.disk_id) WHERE v.vm_name =
> '$vmname' ORDER BY creation_date, description, disk_alias"
> 
> - note the image_guid and parentid of the broken snapshot and of the
> active snapshot; the active state is the image_guid with the parentid
> 00000000-0000-0000-0000-000000000000
> - igid_active=<active uuid>
> - igid_broken=<broken uuid>
> - the parentid of the broken snapshot's image_guid must be the
> same as the active snapshot's image_guid
> - note the snapshot id
> - sid_active=<id of the active snapshot with parentid 000000>
> - sid_broken=<id of the broken snapshot>
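
Because these values are pasted into DELETE and UPDATE statements later on, it may be worth checking that each one really looks like a UUID before going further. A small sketch; the function name is_uuid is an illustrative assumption, not part of oVirt:

```shell
# Succeed only if the argument is a 36-character UUID such as
# 00000000-0000-0000-0000-000000000000 (hex digits in 8-4-4-4-12 groups).
is_uuid() {
    [ "${#1}" -eq 36 ] &&
        printf '%s\n' "$1" |
        grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Usage: stop before touching the database if any noted value is malformed.
# for v in "$igid_active" "$igid_broken" "$sid_active" "$sid_broken"; do
#     is_uuid "$v" || { echo "not a UUID: $v" >&2; exit 1; }
# done
```

This only catches copy-and-paste mistakes; it cannot tell whether a well-formed UUID is the right one.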
> 
> - delete the broken snapshot
> - sudo -u postgres psql $db -c "DELETE FROM snapshots AS s WHERE
> s.snapshot_id = '$sid_broken'"
> 
> - pid_new=00000000-0000-0000-0000-000000000000
> - sudo -u postgres psql $db -c "SELECT * FROM images WHERE
> vm_snapshot_id = '$sid_active' AND image_guid = '$igid_broken'"
> - sudo -u postgres psql $db -c "DELETE FROM images WHERE vm_snapshot_id
> = '$sid_broken' AND image_guid = '$igid_active'"
> - sudo -u postgres psql $db -c "SELECT * FROM image_storage_domain_map
> WHERE image_id = '$igid_broken'"
> - sudo -u postgres psql $db -c "DELETE FROM image_storage_domain_map
> WHERE image_id = '$igid_broken'"
> - sudo -u postgres psql $db -c "UPDATE images SET image_guid =
> '$igid_active', parentid = '$pid_new' WHERE vm_snapshot_id =
> '$sid_active' AND image_guid = '$igid_broken'"
> - sudo -u postgres psql $db -c "SELECT * FROM image_storage_domain_map"
> - storid=<storage_domain_id>
> - diskprofileid=<disk_profile_id>
> - sudo -u postgres psql $db -c "INSERT INTO image_storage_domain_map
> (image_id, storage_domain_id, disk_profile_id) VALUES ('$igid_broken',
> '$storid', '$diskprofileid')"
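
The modifying statements above are run one by one, so a failure halfway leaves the database in a mixed state. One possible alternative is to send them as a single transaction. The sketch below copies the statements unchanged from the list and does not verify that they are right for your schema; the function name build_fix_sql is hypothetical, and the variables are assumed to be set as described above (including storid).

```shell
# Emit the DELETE/UPDATE/INSERT statements from the steps above as one
# transaction. Assumes sid_active, sid_broken, igid_active, igid_broken,
# pid_new, storid and diskprofileid are already set in the shell.
build_fix_sql() {
    cat <<EOF
BEGIN;
DELETE FROM snapshots AS s WHERE s.snapshot_id = '$sid_broken';
DELETE FROM images WHERE vm_snapshot_id = '$sid_broken' AND image_guid = '$igid_active';
DELETE FROM image_storage_domain_map WHERE image_id = '$igid_broken';
UPDATE images SET image_guid = '$igid_active', parentid = '$pid_new'
  WHERE vm_snapshot_id = '$sid_active' AND image_guid = '$igid_broken';
INSERT INTO image_storage_domain_map (image_id, storage_domain_id, disk_profile_id)
  VALUES ('$igid_broken', '$storid', '$diskprofileid');
COMMIT;
EOF
}

# Usage: review the output first, then pipe it to psql. With ON_ERROR_STOP=1
# psql stops at the first error, so COMMIT is never reached and nothing is kept.
#   build_fix_sql
#   build_fix_sql | sudo -u postgres psql -v ON_ERROR_STOP=1 "$db"
```

The SELECT statements from the list are left out on purpose; they only display rows and can still be run separately before and after.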
> 
> - check values
> - sudo -u postgres psql $db -c "SELECT b.disk_alias, s.description,
> s.snapshot_id, i.creation_date, s.status, i.imagestatus, i.size,
> i.image_group_id, i.vm_snapshot_id, i.image_guid, i.parentid, i.active
> FROM images as i JOIN snapshots AS s ON (i.vm_snapshot_id =
> s.snapshot_id) LEFT JOIN vm_static AS v ON (s.vm_id = v.vm_guid) JOIN
> base_disks AS b ON (i.image_group_id = b.disk_id) WHERE v.vm_name =
> '$vmname' ORDER BY creation_date, description, disk_alias"
> 
> - check for errors
> - engine-setup --offline
> - systemctl start ovirt-engine
> 
> Now you should have a clean state and a working VM ;-)
> 
> What was tested:
> - Power up and down the VM
> 
> What does not work:
> - It's not possible to make offline snapshots. Online snapshots were not
> tested because I don't want to get into such trouble again; it took many
> hours to get the machine up again.
> 
> PLEASE be aware and don't destroy your Host and VM !!!
> 
> cheers
> gregor
> 
> On 12/06/16 13:40, Colin Coe wrote:
>> We've seen this with both Linux and Windows VMs.  I'm guessing that
>> you've had failures on this VM in both snapshot create and delete
>> operations.  oVirt/RHEV 3.5 seems particularly affected.  I'm told that
>> oVirt 3.6.7 has the last of the fixes for these known snapshot problems.
>>
>> My original email was worded wrong.  I meant that qemu-img gives
>> "backing filename too long" errors.  You may have seen this in your logs.
>>
>> Note also that you may be seeing an entirely unrelated problem.
>>
>> You may wish to post your VDSM logs and the qemu log from
>> /var/lib/libvirt/qemu/<vm_name>.log
>>
>> Hope this helps
>>
>> CC
>>
>>
>> On Sun, Jun 12, 2016 at 4:45 PM, gregor <gregor_forum at catrix.at
>> <mailto:gregor_forum at catrix.at>> wrote:
>>
>>     Sounds bad. Recreating the VM is not an option because this is a
>>     production VM. During testing I needed to recreate it more than once.
>>     oVirt works perfectly with Linux VMs, but when it comes to Windows VMs
>>     we get lots of problems.
>>
>>     Which OS did you use on the problematic VM?
>>
>>     cheers
>>     gregor
>>
>>     On 11/06/16 19:22, Anantha Raghava wrote:
>>     > Hi,
>>     >
>>     > I observed this behaviour as well.
>>     >
>>     > When we take the snapshot, the main VM from which the snapshot was
>>     > taken is shut down and a new VM named external-<VMName> comes to
>>     > life. We cannot bring the original VM back to life, but a clone
>>     > starts functioning.
>>     >
>>     > We cannot remove the snapshot whether or not the VM is running. I
>>     > had to remove the entire VM that came to life with the snapshot and
>>     > recreate the entire VM from scratch. Luckily the VM was not yet in
>>     > production, so I could afford it.
>>     >
>>     > At first I could not understand why, when a snapshot is created, the
>>     > VM with the snapshot comes to life and starts running and not the
>>     > original VM.
>>     >
>>     > Is it necessary to shut down the VM before taking snapshots?
>>     > As I understand it, a snapshot is supposed to be a backup of the
>>     > original VM that should not come to life unless we restore it by
>>     > cloning.
>>     >
>>     > --
>>     >
>>     > Thanks & Regards,
>>     >
>>     > Anantha Raghava
>>     >
>>     >
>>     > On Saturday 11 June 2016 08:09 PM, gregor wrote:
>>     >> Hi,
>>     >>
>>     >> a VM has snapshots which cannot be removed while the VM is up.
>>     >> Therefore I powered down the Windows Server 2012 VM. The snapshots
>>     >> still cannot be removed and the VM can't boot anymore !!!
>>     >>
>>     >> This is the message from engine.log
>>     >>
>>     >> ------------------
>>     >> Message: VM srv03 is down with error. Exit message: Bad volume specification
>>     >> ------------------
>>     >>
>>     >> Clone is not possible; I get:
>>     >> ------------------
>>     >> Message: VDSM command failed: Image is not a legal chain
>>     >> ------------------
>>     >>
>>     >> All other VMs can be powered down and started without any problem.
>>     >> What can I do?
>>     >> This is very important because now no one can work :-( !!!
>>     >>
>>     >> cheers
>>     >> gregor
>>     >> _______________________________________________
>>     >> Users mailing list
>>     >> Users at ovirt.org <mailto:Users at ovirt.org>
>>     >> http://lists.ovirt.org/mailman/listinfo/users
>>     >
>>
>>