[ovirt-users] Bug in Snapshot Removing

Soeren Malchow soeren.malchow at mcon.net
Thu Jun 11 11:00:31 UTC 2015


We are still having this problem and cannot figure out what to do. I have
already sent the logs as a download. Can I do anything else to help?




On 04/06/15 17:08, "Soeren Malchow" <soeren.malchow at mcon.net> wrote:

>Hi,
>
>I would send those, but unfortunately we did not think about the journals
>getting deleted after a reboot.
>
>I have just made the journals persistent on the servers, and we are trying
>to trigger the error again. Last time we only got halfway through the VMs
>when removing the snapshots, so there is a good chance that it comes up
>again.
>
>Also, libvirt logs to the journal, not to libvirtd.log. I would send the
>journal directly to you and Eric via our data exchange servers.
>
>
>Soeren 
>
>On 04/06/15 16:17, "Adam Litke" <alitke at redhat.com> wrote:
>
>>On 04/06/15 13:08 +0000, Soeren Malchow wrote:
>>>Hi Adam, Hi Eric,
>>>
>>>We had this issue again a few minutes ago.
>>>
>>>One machine went down exactly the same way as described. The machine had
>>>only one snapshot, and it was the only snapshot that was removed. Before
>>>that, in the same script run, we processed 15 other VMs, some with no
>>>snapshots, some with one and some with several.
>>>
>>>Can I provide anything from the logs that helps?
>>
>>Let's start with the libvirtd.log on that host.  It might be rather
>>large so we may need to find a creative place to host it.
>>
>>>
>>>Regards
>>>Soeren
>>>
>>>
>>>
>>>On 03/06/15 18:07, "Soeren Malchow" <soeren.malchow at mcon.net> wrote:
>>>
>>>>Hi,
>>>>
>>>>This is not happening every time. The last time I had this, a script
>>>>was running, and something like the 9th VM and the 23rd VM had a
>>>>problem. It is not always the same VMs, and it is not about the OS (it
>>>>happens for Windows and Linux alike).
>>>>
>>>>As I said, it also happened when I tried to remove the snapshots
>>>>sequentially. Here is the code (I know it is probably not the most
>>>>elegant way, but I am not a developer); the code actually has correct
>>>>indentation.
>>>>
>>>><― snip ―>
>>>>
>>>>print("Snapshot deletion")
>>>>try:
>>>>    time.sleep(300)
>>>>    Connect()
>>>>    vms = api.vms.list()
>>>>    for vm in vms:
>>>>        print("Deleting snapshots for %s " % vm.name)
>>>>        snapshotlist = vm.snapshots.list()
>>>>        for snapshot in snapshotlist:
>>>>            if snapshot.description != "Active VM":
>>>>                time.sleep(30)
>>>>                snapshot.delete()
>>>>                try:
>>>>                    while api.vms.get(name=vm.name).snapshots.get(id=snapshot.id).snapshot_status == "locked":
>>>>                        print("Waiting for snapshot %s on %s deletion to finish" % (snapshot.description, vm.name))
>>>>                        time.sleep(60)
>>>>                except Exception as e:
>>>>                    print("Snapshot %s does not exist anymore" % snapshot.description)
>>>>        print("Snapshot deletion for %s done" % vm.name)
>>>>    print("Deletion of snapshots done")
>>>>    api.disconnect()
>>>>except Exception as e:
>>>>    print("Something went wrong when deleting the snapshots\n%s" % str(e))
>>>>
>>>>
>>>>
>>>><― snip ―>
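
The wait loop in the script above polls until the snapshot is no longer
"locked" and has no upper bound on how long it waits. A minimal sketch of
the same polling idea with a timeout is shown below. It is not part of the
original script: the function name wait_for_snapshot_removal, the callable
get_status and the three return strings are hypothetical names chosen for
illustration. It assumes get_status returns the snapshot status string, or
None once the snapshot no longer exists.

```python
import time


def wait_for_snapshot_removal(get_status, timeout=3600, interval=60):
    """Poll get_status() until the snapshot is gone, unlocked, or we time out.

    get_status is a caller-supplied (hypothetical) callable that returns the
    snapshot status string, or None once the snapshot no longer exists.

    Returns "removed", "unlocked" or "timeout".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status is None:
            # Snapshot can no longer be found: removal finished.
            return "removed"
        if status != "locked":
            # Snapshot still exists but is not locked; the removal may have
            # failed or may not have started yet. Let the caller decide.
            return "unlocked"
        time.sleep(interval)
    return "timeout"


# Usage with a fake status source standing in for the real API call.
fake_statuses = iter(["locked", "locked", None])
result = wait_for_snapshot_removal(lambda: next(fake_statuses),
                                   timeout=10, interval=0)
print(result)
```

In the script above, get_status would wrap the existing status lookup; the
"unlocked" and "timeout" results give the script a place to log the VM name
and stop before moving on to the next snapshot.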
>>>>
>>>>
>>>>Cheers
>>>>Soeren
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>On 03/06/15 15:20, "Adam Litke" <alitke at redhat.com> wrote:
>>>>
>>>>>On 03/06/15 07:36 +0000, Soeren Malchow wrote:
>>>>>>Dear Adam
>>>>>>
>>>>>>First we were using a Python script that was working on 4 threads and
>>>>>>therefore removing 4 snapshots at a time throughout the cluster; that
>>>>>>still caused problems.
>>>>>>
>>>>>>Now I have taken the snapshot removal out of the threaded part and I am
>>>>>>just looping through each snapshot on each VM one after another, even
>>>>>>with "sleeps" in between, but the problem remains.
>>>>>>But I am getting the impression that it is a problem with the number
>>>>>>of snapshots that are deleted in a certain time. If I delete manually
>>>>>>and one after another (meaning every 10 min or so) I do not have
>>>>>>problems. If I delete manually and do several at once, and on one VM
>>>>>>start the next one just after one finished, the risk seems to increase.
>>>>>
>>>>>Hmm.  In our lab we extensively tested removing a snapshot for a VM
>>>>>with 4 disks.  This means 4 block jobs running simultaneously.  Less
>>>>>than 10 minutes later (closer to 1 minute) we would remove a second
>>>>>snapshot for the same VM (again involving 4 block jobs).  I guess we
>>>>>should rerun this flow on a fully updated CentOS 7.1 host to see about
>>>>>local reproduction.  Seems your case is much simpler than this though.
>>>>>Is this happening every time or intermittently?
>>>>>
>>>>>>I do not think it is the number of VMs, because we had this on hosts
>>>>>>with only 3 or 4 VMs running.
>>>>>>
>>>>>>I will try restarting libvirt and see what happens.
>>>>>>
>>>>>>We are not using RHEL 7.1, only CentOS 7.1.
>>>>>>
>>>>>>Is there anything else we can look at when this happens again?
>>>>>
>>>>>I'll defer to Eric Blake for the libvirt side of this.  Eric, would
>>>>>enabling debug logging in libvirtd help to shine some light on the
>>>>>problem?
>>>>>
>>>>>--
>>>>>Adam Litke
>>>>
>>>>_______________________________________________
>>>>Users mailing list
>>>>Users at ovirt.org
>>>>http://lists.ovirt.org/mailman/listinfo/users
>>>
>>
>>-- 
>>Adam Litke
>


