[ovirt-users] Can't remove snapshot

Nir Soffer nsoffer at redhat.com
Fri Mar 18 19:10:50 UTC 2016


On Fri, Mar 18, 2016 at 7:55 PM, Nathanaël Blanchet <blanchet at abes.fr> wrote:
> Hello,
>
> I can create a snapshot when none exists, but I'm not able to remove it
> afterwards.

Did you try to remove it while the VM was running?

> This affects many of my VMs, and once they are stopped they can't boot
> anymore because of the illegal status of their disks, which leaves me in a
> critical situation:
>
> VM fedora23 is down with error. Exit message: Unable to get volume size for
> domain 5ef8572c-0ab5-4491-994a-e4c30230a525 volume
> e5969faa-97ea-41df-809b-cc62161ab1bc
>
> Since I didn't initiate any live merge, am I affected by this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1306741?
> I'm running 3.6.2; will upgrading to 3.6.3 solve this issue?

If you tried to remove a snapshot while the VM was running, you did
initiate a live merge, and this bug may affect you.

Adding Greg, who can provide more info about this.

>
> 2016-03-18 18:26:57,652 ERROR
> [org.ovirt.engine.core.bll.RemoveSnapshotCommand]
> (org.ovirt.thread.pool-8-thread-39) [a1e222d] Ending command
> 'org.ovirt.engine.core.bll.RemoveSnapshotCommand' with failure.
> 2016-03-18 18:26:57,663 ERROR
> [org.ovirt.engine.core.bll.RemoveSnapshotCommand]
> (org.ovirt.thread.pool-8-thread-39) [a1e222d] Could not delete image
> '46e9ecc8-e168-4f4d-926c-e769f5df1f2c' from snapshot
> '88fcf167-4302-405e-825f-ad7e0e9f6564'
> 2016-03-18 18:26:57,678 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (org.ovirt.thread.pool-8-thread-39) [a1e222d] Correlation ID: a1e222d, Job
> ID: 00d3e364-7e47-4022-82ff-f772cd79d4a1, Call Stack: null, Custom Event ID:
> -1, Message: Due to partial snapshot removal, Snapshot 'test' of VM
> 'fedora23' now contains only the following disks: 'fedora23_Disk1'.
> 2016-03-18 18:26:57,695 ERROR
> [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand]
> (org.ovirt.thread.pool-8-thread-39) [724e99fd] Ending command
> 'org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand' with failure.
> 2016-03-18 18:26:57,708 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandlin
>
> Thank you for your help.
>
>
> On 23/02/2016 19:51, Greg Padgett wrote:
>>
>> On 02/22/2016 07:10 AM, Marcelo Leandro wrote:
>>>
>>> Hello,
>>>
>>> Will the snapshot bug be fixed in oVirt 3.6.3?
>>>
>>> thanks.
>>>
>>
>> Hi Marcelo,
>>
>> Yes, the bug below (bug 1301709) is now targeted to 3.6.3.
>>
>> Thanks,
>> Greg
>>
>>> 2016-02-18 11:34 GMT-03:00 Adam Litke <alitke at redhat.com>:
>>>>
>>>> On 18/02/16 10:37 +0100, Rik Theys wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 02/17/2016 05:29 PM, Adam Litke wrote:
>>>>>>
>>>>>>
>>>>>> On 17/02/16 11:14 -0500, Greg Padgett wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 02/17/2016 03:42 AM, Rik Theys wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 02/16/2016 10:52 PM, Greg Padgett wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 02/16/2016 08:50 AM, Rik Theys wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>   From the above I conclude that the disk with id that ends with
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Similar to what I wrote to Marcelo above in the thread, I'd recommend
>>>>>>>>> running the "VM disk info gathering tool" attached to [1].  It's the
>>>>>>>>> best way to ensure the merge was completed and determine which image
>>>>>>>>> is the "bad" one that is no longer in use by any volume chains.
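For illustration only -- this is not the disk info tool attached to [1], just
a minimal sketch of the same kind of check, assuming you know the path of the
top (active) volume and run it on a host where that volume is not currently
in use:

    import json
    import subprocess
    import sys

    def backing_chain(top_volume_path):
        # List every volume in the chain, active (top) volume first.
        out = subprocess.check_output(
            ["qemu-img", "info", "--backing-chain", "--output=json",
             top_volume_path])
        return [entry["filename"] for entry in json.loads(out)]

    if __name__ == "__main__":
        # e.g. /rhev/data-center/<dc_uuid>/<sd_uuid>/images/<img_uuid>/<vol_uuid>
        for path in backing_chain(sys.argv[1]):
            print(path)

A volume of the disk that never shows up in this chain is the unused, "bad"
one.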
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I've run the disk info gathering tool and this outputs (for the
>>>>>>>> affected VM):
>>>>>>>>
>>>>>>>> VM lena
>>>>>>>>      Disk b2390535-744f-4c02-bdc8-5a897226554b
>>>>>>>> (sd:a7ba2db3-517c-408a-8b27-ea45989d6416)
>>>>>>>>      Volumes:
>>>>>>>>          24d78600-22f4-44f7-987b-fbd866736249
>>>>>>>>
>>>>>>>> The ID of the volume is the ID of the snapshot that is marked
>>>>>>>> "illegal". So the "bad" image would be the dc39 one, which according
>>>>>>>> to the UI is in use by the "Active VM" snapshot. Does this make sense?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> It looks accurate.  Live merges are "backwards" merges, so the merge
>>>>>>> would have pushed data from the volume associated with "Active VM"
>>>>>>> into the volume associated with the snapshot you're trying to remove.
>>>>>>>
>>>>>>> Upon completion, we "pivot" so that the VM uses that older volume, and
>>>>>>> we update the engine database to reflect this (basically we re-associate
>>>>>>> that older volume with, in your case, "Active VM").
>>>>>>>
>>>>>>> In your case, it seems the pivot operation was done, but the database
>>>>>>> wasn't updated to reflect it.  Given snapshot/image associations
>>>>>>> e.g.:
>>>>>>>
>>>>>>>   VM Name  Snapshot Name  Volume
>>>>>>>   -------  -------------  ------
>>>>>>>   My-VM    Active VM      123-abc
>>>>>>>   My-VM    My-Snapshot    789-def
>>>>>>>
>>>>>>> My-VM in your case is actually running on volume 789-def.  If you run
>>>>>>> the db fixup script and supply ("My-VM", "My-Snapshot", "123-abc")
>>>>>>> (note the volume is the newer, "bad" one), then it will switch the
>>>>>>> volume association for you and remove the invalid entries.
>>>>>>>
>>>>>>> Of course, I'd shut down the VM, and back up the db beforehand.
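To make the fixup concrete: conceptually it just re-points the surviving
volume at the "Active VM" snapshot and deletes the leftover row. A heavily
hedged sketch -- the table and column names below are assumptions about the
engine schema, shown only to explain the idea; use the fixup script attached
to the bug for the real procedure:

    import psycopg2

    BAD_VOLUME      = "123-abc"  # newer volume still tied to "Active VM"
    GOOD_VOLUME     = "789-def"  # older volume the VM actually runs on now
    ACTIVE_SNAPSHOT = "<uuid of the 'Active VM' snapshot>"

    conn = psycopg2.connect("dbname=engine user=engine")
    with conn, conn.cursor() as cur:
        # Re-associate the surviving volume with the "Active VM" snapshot and
        # mark it usable again (imagestatus 1 == OK is an assumption).
        cur.execute(
            "UPDATE images SET vm_snapshot_id = %s, active = true, "
            "imagestatus = 1 WHERE image_guid = %s",
            (ACTIVE_SNAPSHOT, GOOD_VOLUME))
        # Drop the leftover entry for the volume that was merged away.
        cur.execute("DELETE FROM images WHERE image_guid = %s", (BAD_VOLUME,))

As Greg says: VM down and engine database backed up before touching anything.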
>>>>>
>>>>>
>>>>>
>>>>> I've executed the sql script and it seems to have worked. Thanks!
>>>>>
>>>>>>> "Active VM" should now be unused; it previously (pre-merge) was the
>>>>>>> data written since the snapshot was taken.  Normally the larger
>>>>>>> actual
>>>>>>> size might be from qcow format overhead.  If your listing above is
>>>>>>> complete (ie one volume for the vm), then I'm not sure why the base
>>>>>>> volume would have a larger actual size than virtual size.
>>>>>>>
>>>>>>> Adam, Nir--any thoughts on this?
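One quick way to sanity-check the sizes, assuming a block (iSCSI/FC) storage
domain where each volume is an LV named after the volume UUID inside a VG
named after the storage domain UUID -- both naming details are assumptions,
and the UUIDs below are just the ones from the listing above:

    import json
    import subprocess

    SD  = "a7ba2db3-517c-408a-8b27-ea45989d6416"   # storage domain UUID
    VOL = "24d78600-22f4-44f7-987b-fbd866736249"   # volume UUID

    def lv_size_bytes(vg, lv):
        # Allocated size of the LV backing the volume.
        out = subprocess.check_output(
            ["lvs", "--noheadings", "--nosuffix", "--units", "b",
             "-o", "lv_size", "%s/%s" % (vg, lv)])
        return int(float(out.decode().strip()))

    def virtual_size_bytes(path):
        # Guest-visible size of the image (the LV must be active to read it).
        out = subprocess.check_output(
            ["qemu-img", "info", "--output=json", path])
        return json.loads(out)["virtual-size"]

    print("LV size:      %d" % lv_size_bytes(SD, VOL))
    print("virtual size: %d" % virtual_size_bytes("/dev/%s/%s" % (SD, VOL)))

An LV size far above the virtual size on a raw base volume would point at the
inflation Adam mentions below.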
>>>>>>
>>>>>>
>>>>>>
>>>>>> There is a bug which has caused inflation of the snapshot volumes when
>>>>>> performing a live merge.  We are submitting fixes for 3.5, 3.6, and
>>>>>> master right at this moment.
>>>>>
>>>>>
>>>>>
>>>>> Which bug number is assigned to this bug? Will upgrading to a release
>>>>> with a fix reduce the disk usage again?
>>>>
>>>>
>>>>
>>>> See https://bugzilla.redhat.com/show_bug.cgi?id=1301709 for the bug.
>>>> It's about a clone disk failure after the problem occurs.
>>>> Unfortunately, there is not an automatic way to repair the raw base
>>>> volumes if they were affected by this bug.  They will need to be
>>>> manually shrunk using lvreduce if you are certain that they are
>>>> inflated.
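If you do end up shrinking manually, a hedged sketch with the same block
storage naming assumptions as above; reducing below the virtual size destroys
data, so double-check the sizes, do it with the VM down, and back everything
up first:

    import subprocess

    def shrink_raw_base_volume(sd_uuid, vol_uuid, virtual_size_bytes):
        # Reduce the LV backing an inflated raw base volume back to its
        # virtual size. LVM rounds to extent boundaries; --force skips the
        # confirmation prompt, so only call this once you are certain.
        lv = "%s/%s" % (sd_uuid, vol_uuid)
        subprocess.check_call(
            ["lvreduce", "--force", "--size", "%db" % virtual_size_bytes, lv])

Run it from a host that sees the storage domain VG, e.g. the SPM.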
>>>>
>>>>
>>>> --
>>>> Adam Litke
>>>>
>
>
> --
> Nathanaël Blanchet
>
> Network supervision
> IT Infrastructure Department
> 227 avenue Professeur-Jean-Louis-Viala
> 34193 MONTPELLIER CEDEX 5
> Tél. 33 (0)4 67 54 84 55
> Fax  33 (0)4 67 54 84 14
> blanchet at abes.fr
>
>


