Hi,
This is not happening every time. The last time I had this, a script was
running, and something like the 9th VM and the 23rd VM had a problem. It is
not always the same VMs, and it does not depend on the OS (it happens for
Windows and Linux alike).
As I said, it also happened when I tried to remove the snapshots
sequentially. Here is the code (I know it is probably not the most elegant
way, but I am not a developer); the actual code has correct indentation.
<― snip ―>
print("Snapshot deletion")
try:
    time.sleep(300)
    Connect()
    vms = api.vms.list()
    for vm in vms:
        print("Deleting snapshots for %s " % vm.name)
        snapshotlist = vm.snapshots.list()
        for snapshot in snapshotlist:
            # Skip the "Active VM" entry, which is not a removable snapshot
            if snapshot.description != "Active VM":
                time.sleep(30)
                snapshot.delete()
                try:
                    # Poll until the snapshot is no longer locked
                    while (api.vms.get(name=vm.name)
                           .snapshots.get(id=snapshot.id)
                           .snapshot_status == "locked"):
                        print("Waiting for snapshot %s on %s deletion to finish"
                              % (snapshot.description, vm.name))
                        time.sleep(60)
                except Exception as e:
                    # An exception here is taken to mean the snapshot is gone
                    print("Snapshot %s does not exist anymore"
                          % snapshot.description)
        print("Snapshot deletion for %s done" % vm.name)
    print("Deletion of snapshots done")
    api.disconnect()
except Exception as e:
    print("Something went wrong when deleting the snapshots\n%s" % str(e))
<― snip ―>
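The wait loop in the script has no upper bound, so a snapshot that stays
"locked" would block the run forever. A minimal sketch of the same polling
with a timeout follows. The names wait_while_locked and get_status, and the
default timings, are hypothetical and not part of the oVirt SDK; get_status
stands for whatever lookup returns the current snapshot status, or None once
the snapshot is gone.

```python
import time


def wait_while_locked(get_status, interval=60, timeout=3600):
    """Poll get_status() until it stops returning "locked".

    get_status is a caller-supplied (hypothetical) callable that returns the
    snapshot status string, or None once the snapshot no longer exists.
    Returns True if the lock cleared, False if the timeout expired first.
    """
    deadline = time.time() + timeout
    while get_status() == "locked":
        if time.time() >= deadline:
            return False
        time.sleep(interval)
    return True


# Stand-in for the real lookup: "locked" twice, then gone (None).
fake_statuses = iter(["locked", "locked", None])
finished = wait_while_locked(lambda: next(fake_statuses),
                             interval=0.01, timeout=5)
print("finished: %s" % finished)
```

In the script, get_status would wrap the existing
api.vms.get(name=vm.name).snapshots.get(id=snapshot.id) lookup, and a False
result could be logged so that a stuck deletion is visible instead of silent.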
Cheers
Soeren
On 03/06/15 15:20, "Adam Litke" <alitke(a)redhat.com> wrote:
On 03/06/15 07:36 +0000, Soeren Malchow wrote:
>Dear Adam
>
>First we were using a Python script that was working on 4 threads and
>therefore removing 4 snapshots at a time throughout the cluster; that
>still caused problems.
>
>Now I took the snapshot removal out of the threaded part and I am just
>looping through each snapshot on each VM one after another, even with
>"sleeps" in between, but the problem remains.
>But I am getting the impression that it is a problem with the number of
>snapshots that are deleted in a certain time. If I delete manually and
>one after another (meaning every 10 min or so) I do not have problems; if
>I delete manually and do several at once, and on one VM the next one just
>after one finished, the risk seems to increase.
Hmm. In our lab we extensively tested removing a snapshot for a VM
with 4 disks. This means 4 block jobs running simultaneously. Less
than 10 minutes later (closer to 1 minute) we would remove a second
snapshot for the same VM (again involving 4 block jobs). I guess we
should rerun this flow on a fully updated CentOS 7.1 host to see about
local reproduction. Seems your case is much simpler than this though.
Is this happening every time or intermittently?
>I do not think it is the number of VMs, because we had this on hosts with
>only 3 or 4 VMs running.
>
>I will try restarting libvirt and see what happens.
>
>We are not using RHEL 7.1, only CentOS 7.1.
>
>Is there anything else we can look at when this happens again?
I'll defer to Eric Blake for the libvirt side of this. Eric, would
enabling debug logging in libvirtd help to shine some light on the
problem?
--
Adam Litke