[ovirt-users] Bug in Snapshot Removing

Soeren Malchow soeren.malchow at mcon.net
Wed Jun 3 07:36:01 UTC 2015


Dear Adam

First we were using a Python script that ran 4 threads and therefore
removed 4 snapshots at a time throughout the cluster; that still caused
problems.

Now I have taken the snapshot removal out of the threaded part and I am just
looping through each snapshot on each VM one after another, even with
"sleeps" in between, but the problem remains.
My impression, though, is that it depends on the number of snapshots
deleted within a certain time. If I delete manually, one after another
(meaning every 10 min or so), I do not have problems. If I delete manually,
several at once, and on one VM start the next one right after the previous
one finished, the risk seems to increase.
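The strictly sequential, paced removal described above can be sketched as follows. This is a minimal sketch under stated assumptions: `start_removal` and `is_done` are hypothetical callables standing in for whatever the real script uses to start a live merge and check whether it has finished (they are not oVirt SDK calls), and the 600-second default pause only mirrors the "every 10 min or so" spacing that did not cause problems.

```python
import time
from typing import Callable, Iterable, List, Tuple


def remove_sequentially(
    jobs: Iterable[Tuple[str, str]],
    start_removal: Callable[[str, str], None],  # hypothetical: starts one live merge
    is_done: Callable[[str, str], bool],        # hypothetical: True once that merge finished
    pause: float = 600.0,                       # gap between removals (~10 min, an assumption)
    poll: float = 10.0,                         # how often to re-check a running merge
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, str]]:
    """Remove snapshots strictly one at a time across all VMs.

    Each removal is started only after the previous one has reported
    completion, and a fixed pause is inserted between removals.
    """
    finished = []
    for index, (vm, snapshot) in enumerate(jobs):
        if index > 0:
            sleep(pause)  # space out removals instead of firing them back to back
        start_removal(vm, snapshot)
        while not is_done(vm, snapshot):
            sleep(poll)  # wait for this merge before touching the next snapshot
        finished.append((vm, snapshot))
    return finished
```

Passing `sleep` in as a parameter keeps the pacing testable without real waits; a version for production use would probably also want a timeout on the polling loop so a stuck merge is reported instead of waited on forever.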

I do not think it is the number of VMs, because we had this on hosts with
only 3 or 4 VMs running.

I will try restarting libvirtd and see what happens.

We are not using RHEL 7.1, only CentOS 7.1.

Is there anything else we can look at when this happens again?
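One thing that may be worth checking when this happens again is whether the libvirtd "cannot acquire state change lock" timeouts and the vdsm "Error getting block job info" errors cluster in the minutes when removals were started. The sketch below assumes saved journal output with one entry per line in the `Mon DD HH:MM:SS host process[pid]: message` form seen in the log excerpt quoted in this thread; the two message strings come from that excerpt, while `count_per_minute` and the pattern names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumed one-line journal format: "Jun 01 01:33:45 host process[pid]: message".
# The "ts" group keeps the timestamp only down to the minute, for bucketing.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}):\d{2} (?P<host>\S+) "
    r"(?P<proc>[\w.-]+)\[\d+\]: (?P<msg>.*)$"
)

# Substrings taken from the libvirtd and vdsm log lines in this thread.
PATTERNS = {
    "lock_timeout": "cannot acquire state change lock",
    "block_job_error": "Error getting block job info",
}


def count_per_minute(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count matching journal entries per minute for each pattern."""
    counts = {name: Counter() for name in PATTERNS}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip wrapped continuation lines and anything unparsable
        for name, needle in PATTERNS.items():
            if needle in match.group("msg"):
                counts[name][match.group("ts")] += 1
    return counts
```

Lining these per-minute counts up against the times at which removals were started would show whether the errors follow bursts of block jobs or appear independently of them.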

Regards
Soeren 



On 02/06/15 18:53, "Adam Litke" <alitke at redhat.com> wrote:

>Hello Soeren.
>
>I've started to look at this issue and I'd agree that at first glance
>it looks like a libvirt issue.  The 'cannot acquire state change lock'
>messages suggest a locking bug or severe contention at least.  To help
>me better understand the problem I have a few questions about your
>setup.
>
>From your earlier report it appears that you have 15 VMs running on
>the failing host.  Are you attempting to remove snapshots from all VMs
>at the same time?  Have you tried with fewer concurrent operations?
>I'd be curious to understand if the problem is connected to the
>number of VMs running or the number of active block jobs.
>
>Have you tried RHEL-7.1 as a hypervisor host?
>
>Rather than rebooting the host, does restarting libvirtd cause the VMs
>to become responsive again?  Note that this operation may cause the
>host to move to Unresponsive state in the UI for a short period of
>time.
>
>Thanks for your report.
>
>On 31/05/15 23:39 +0000, Soeren Malchow wrote:
>>And sorry, another update: it does partly kill the VM. It was still
>>pingable when I wrote the last mail, but no SSH and no SPICE console
>>was possible.
>>
>>From: Soeren Malchow <soeren.malchow at mcon.net>
>>Date: Monday 1 June 2015 01:35
>>To: Soeren Malchow <soeren.malchow at mcon.net>,
>>"libvirt-users at redhat.com" <libvirt-users at redhat.com>, users
>><users at ovirt.org>
>>Subject: Re: [ovirt-users] Bug in Snapshot Removing
>>
>>Small addition again:
>>
>>This error shows up in the log while removing snapshots WITHOUT
>>rendering the VMs unresponsive:
>>
>>Jun 01 01:33:45 mc-dc3ham-compute-02-live.mc.mcon.net libvirtd[1657]: Timed out during operation: cannot acquire state change lock
>>Jun 01 01:33:45 mc-dc3ham-compute-02-live.mc.mcon.net vdsm[6839]: vdsm vm.Vm ERROR vmId=`56848f4a-cd73-4eda-bf79-7eb80ae569a9`::Error getting block job info
>>Traceback (most recent call last):
>>  File "/usr/share/vdsm/virt/vm.py", line 5759, in queryBlockJobs…
>>
>>
>>From: Soeren Malchow <soeren.malchow at mcon.net>
>>Date: Monday 1 June 2015 00:56
>>To: "libvirt-users at redhat.com" <libvirt-users at redhat.com>, users
>><users at ovirt.org>
>>Subject: [ovirt-users] Bug in Snapshot Removing
>>
>>Dear all
>>
>>I am not sure whether my previous mail just did not get any attention
>>among all the other mails, so this time it is also going to the libvirt
>>mailing list.
>>
>>I am experiencing a problem with VMs becoming unresponsive when removing
>>snapshots (Live Merge), and I think there is a serious problem.
>>
>>Here are the previous mails:
>>
>>http://lists.ovirt.org/pipermail/users/2015-May/033083.html
>>
>>The problem is on a system with everything on the latest version, CentOS
>>7.1 and oVirt 3.5.2.1, with all upgrades applied.
>>
>>This problem did NOT exist before upgrading to CentOS 7.1, in an
>>environment running oVirt 3.5.0 and 3.5.1 and Fedora 20 with the
>>libvirt-preview repo enabled.
>>
>>I think this is a bug in libvirt, not oVirt itself, but I am not sure.
>>The actual file throwing the exception is in VDSM
>>(/usr/share/vdsm/virt/vm.py, line 697).
>>
>>We are very willing to help, test and supply log files in any way we can.
>>
>>Regards
>>Soeren
>>
>
>>_______________________________________________
>>Users mailing list
>>Users at ovirt.org
>>http://lists.ovirt.org/mailman/listinfo/users
>
>
>-- 
>Adam Litke



