Re: [ovirt-users] Bug in Snapshot Removing

Small addition again:

This error shows up in the log while removing snapshots WITHOUT rendering the Vms unresponsive

—
Jun 01 01:33:45 mc-dc3ham-compute-02-live.mc.mcon.net libvirtd[1657]: Timed out during operation: cannot acquire state change lock
Jun 01 01:33:45 mc-dc3ham-compute-02-live.mc.mcon.net vdsm[6839]: vdsm vm.Vm ERROR vmId=`56848f4a-cd73-4eda-bf79-7eb80ae569a9`::Error getting block job info
    Traceback (most recent call last):
      File "/usr/share/vdsm/virt/vm.py", line 5759, in queryBlockJobs…
—

From: Soeren Malchow <soeren.malchow@mcon.net>
Date: Monday 1 June 2015 00:56
To: "libvirt-users@redhat.com" <libvirt-users@redhat.com>, users <users@ovirt.org>
Subject: [ovirt-users] Bug in Snapshot Removing

Dear all

I am not sure if the mail just did not get any attention between all the mails and this time it is also going to the libvirt mailing list.

I am experiencing a problem with VM becoming unresponsive when removing Snapshots (Live Merge) and i think there is a serious problem.

Here are the previous mails,

http://lists.ovirt.org/pipermail/users/2015-May/033083.html

The problem is on a system with everything on the latest version, CentOS 7.1 and ovirt 3.5.2.1 all upgrades applied.

This Problem did NOT exist before upgrading to CentOS 7.1 with an environment running ovirt 3.5.0 and 3.5.1 and Fedora 20 with the libvirt-preview repo activated.

I think this is a bug in libvirt, not ovirt itself, but i am not sure. The actual file throwing the exception is in VDSM (/usr/share/vdsm/virt/vm.py, line 697).

We are very willing to help, test and supply log files in any way we can.

Regards
Soeren

And sorry, another update, it does kill the VM partly, it was still pingable when i wrote the last mail, but no ssh and no spice console possible

From: Soeren Malchow <soeren.malchow@mcon.net>
Date: Monday 1 June 2015 01:35
To: Soeren Malchow <soeren.malchow@mcon.net>, "libvirt-users@redhat.com" <libvirt-users@redhat.com>, users <users@ovirt.org>
Subject: Re: [ovirt-users] Bug in Snapshot Removing

Small addition again:

This error shows up in the log while removing snapshots WITHOUT rendering the Vms unresponsive

—
Jun 01 01:33:45 mc-dc3ham-compute-02-live.mc.mcon.net libvirtd[1657]: Timed out during operation: cannot acquire state change lock
Jun 01 01:33:45 mc-dc3ham-compute-02-live.mc.mcon.net vdsm[6839]: vdsm vm.Vm ERROR vmId=`56848f4a-cd73-4eda-bf79-7eb80ae569a9`::Error getting block job info
    Traceback (most recent call last):
      File "/usr/share/vdsm/virt/vm.py", line 5759, in queryBlockJobs…
—

From: Soeren Malchow <soeren.malchow@mcon.net>
Date: Monday 1 June 2015 00:56
To: "libvirt-users@redhat.com" <libvirt-users@redhat.com>, users <users@ovirt.org>
Subject: [ovirt-users] Bug in Snapshot Removing

Dear all

I am not sure if the mail just did not get any attention between all the mails and this time it is also going to the libvirt mailing list.

I am experiencing a problem with VM becoming unresponsive when removing Snapshots (Live Merge) and i think there is a serious problem.

Here are the previous mails,

http://lists.ovirt.org/pipermail/users/2015-May/033083.html

The problem is on a system with everything on the latest version, CentOS 7.1 and ovirt 3.5.2.1 all upgrades applied.

This Problem did NOT exist before upgrading to CentOS 7.1 with an environment running ovirt 3.5.0 and 3.5.1 and Fedora 20 with the libvirt-preview repo activated.

I think this is a bug in libvirt, not ovirt itself, but i am not sure. The actual file throwing the exception is in VDSM (/usr/share/vdsm/virt/vm.py, line 697).

We are very willing to help, test and supply log files in any way we can.

Regards
Soeren

Adam, can you take a look at this please?

Thanks!

----- Original Message -----
From: "Soeren Malchow" <soeren.malchow@mcon.net>
To: "Soeren Malchow" <soeren.malchow@mcon.net>, libvirt-users@redhat.com, "users" <users@ovirt.org>
Sent: Monday, June 1, 2015 2:39:24 AM
Subject: Re: [ovirt-users] Bug in Snapshot Removing

And sorry, another update, it does kill the VM partly, it was still pingable when i wrote the last mail, but no ssh and no spice console possible

From: Soeren Malchow <soeren.malchow@mcon.net>
Date: Monday 1 June 2015 01:35
To: Soeren Malchow <soeren.malchow@mcon.net>, "libvirt-users@redhat.com" <libvirt-users@redhat.com>, users <users@ovirt.org>
Subject: Re: [ovirt-users] Bug in Snapshot Removing
Small addition again:
This error shows up in the log while removing snapshots WITHOUT rendering the Vms unresponsive

—
Jun 01 01:33:45 mc-dc3ham-compute-02-live.mc.mcon.net libvirtd[1657]: Timed out during operation: cannot acquire state change lock
Jun 01 01:33:45 mc-dc3ham-compute-02-live.mc.mcon.net vdsm[6839]: vdsm vm.Vm ERROR vmId=`56848f4a-cd73-4eda-bf79-7eb80ae569a9`::Error getting block job info
    Traceback (most recent call last):
      File "/usr/share/vdsm/virt/vm.py", line 5759, in queryBlockJobs…
—

From: Soeren Malchow <soeren.malchow@mcon.net>
Date: Monday 1 June 2015 00:56
To: "libvirt-users@redhat.com" <libvirt-users@redhat.com>, users <users@ovirt.org>
Subject: [ovirt-users] Bug in Snapshot Removing
Dear all
I am not sure if the mail just did not get any attention between all the mails and this time it is also going to the libvirt mailing list.
I am experiencing a problem with VM becoming unresponsive when removing Snapshots (Live Merge) and i think there is a serious problem.
Here are the previous mails,

http://lists.ovirt.org/pipermail/users/2015-May/033083.html

The problem is on a system with everything on the latest version, CentOS 7.1 and ovirt 3.5.2.1 all upgrades applied.

This Problem did NOT exist before upgrading to CentOS 7.1 with an environment running ovirt 3.5.0 and 3.5.1 and Fedora 20 with the libvirt-preview repo activated.

I think this is a bug in libvirt, not ovirt itself, but i am not sure. The actual file throwing the exception is in VDSM (/usr/share/vdsm/virt/vm.py, line 697).

We are very willing to help, test and supply log files in any way we can.
Regards Soeren

Hello Soeren.

I've started to look at this issue and I'd agree that at first glance it looks like a libvirt issue. The 'cannot acquire state change lock' messages suggest a locking bug or severe contention at least. To help me better understand the problem I have a few questions about your setup.

From your earlier report it appears that you have 15 VMs running on the failing host. Are you attempting to remove snapshots from all VMs at the same time? Have you tried with fewer concurrent operations? I'd be curious to understand if the problem is connected to the number of VMs running or the number of active block jobs.

Have you tried RHEL-7.1 as a hypervisor host?

Rather than rebooting the host, does restarting libvirtd cause the VMs to become responsive again? Note that this operation may cause the host to move to Unresponsive state in the UI for a short period of time.

Thanks for your report.

On 31/05/15 23:39 +0000, Soeren Malchow wrote:
And sorry, another update, it does kill the VM partly, it was still pingable when i wrote the last mail, but no ssh and no spice console possible
From: Soeren Malchow <soeren.malchow@mcon.net>
Date: Monday 1 June 2015 01:35
To: Soeren Malchow <soeren.malchow@mcon.net>, "libvirt-users@redhat.com" <libvirt-users@redhat.com>, users <users@ovirt.org>
Subject: Re: [ovirt-users] Bug in Snapshot Removing
Small addition again:
This error shows up in the log while removing snapshots WITHOUT rendering the Vms unresponsive
—
Jun 01 01:33:45 mc-dc3ham-compute-02-live.mc.mcon.net libvirtd[1657]: Timed out during operation: cannot acquire state change lock
Jun 01 01:33:45 mc-dc3ham-compute-02-live.mc.mcon.net vdsm[6839]: vdsm vm.Vm ERROR vmId=`56848f4a-cd73-4eda-bf79-7eb80ae569a9`::Error getting block job info
    Traceback (most recent call last):
      File "/usr/share/vdsm/virt/vm.py", line 5759, in queryBlockJobs…
—

From: Soeren Malchow <soeren.malchow@mcon.net>
Date: Monday 1 June 2015 00:56
To: "libvirt-users@redhat.com" <libvirt-users@redhat.com>, users <users@ovirt.org>
Subject: [ovirt-users] Bug in Snapshot Removing
Dear all
I am not sure if the mail just did not get any attention between all the mails and this time it is also going to the libvirt mailing list.
I am experiencing a problem with VM becoming unresponsive when removing Snapshots (Live Merge) and i think there is a serious problem.
Here are the previous mails,
http://lists.ovirt.org/pipermail/users/2015-May/033083.html
The problem is on a system with everything on the latest version, CentOS 7.1 and ovirt 3.5.2.1 all upgrades applied.
This Problem did NOT exist before upgrading to CentOS 7.1 with an environment running ovirt 3.5.0 and 3.5.1 and Fedora 20 with the libvirt-preview repo activated.
I think this is a bug in libvirt, not ovirt itself, but i am not sure. The actual file throwing the exception is in VDSM (/usr/share/vdsm/virt/vm.py, line 697).
We are very willing to help, test and supply log files in any way we can.
Regards Soeren
-- Adam Litke

Dear Adam

First we were using a python script that was working on 4 threads and therefore removing 4 snapshots at a time throughout the cluster, that still caused problems.

Now i took the snapshot removing out of the threaded part and i am just looping through each snapshot on each VM one after another, even with "sleeps" in between, but the problem remains. But i am getting the impression that it is a problem with the amount of snapshots that are deleted in a certain time: if i delete manually and one after another (meaning every 10 min or so) i do not have problems; if i delete manually and do several at once, and on one VM the next one just after one finished, the risk seems to increase. (A rough sketch of the threaded loop is at the end of this mail.)

I do not think it is the number of VMS because we had this on hosts with only 3 or 4 Vms running

I will try restarting the libvirt and see what happens.

We are not using RHEL 7.1 only CentOS 7.1

Is there anything else we can look at when this happens again ?

Regards
Soeren

On 02/06/15 18:53, "Adam Litke" <alitke@redhat.com> wrote:
Hello Soeren.
I've started to look at this issue and I'd agree that at first glance it looks like a libvirt issue. The 'cannot acquire state change lock' messages suggest a locking bug or severe contention at least. To help me better understand the problem I have a few questions about your setup.
From your earlier report it appears that you have 15 VMs running on the failing host. Are you attempting to remove snapshots from all VMs at the same time? Have you tried with fewer concurrent operations? I'd be curious to understand if the problem is connected to the number of VMs running or the number of active block jobs.
Have you tried RHEL-7.1 as a hypervisor host?
Rather than rebooting the host, does restarting libvirtd cause the VMs to become responsive again? Note that this operation may cause the host to move to Unresponsive state in the UI for a short period of time.
Thanks for your report.
On 31/05/15 23:39 +0000, Soeren Malchow wrote:
And sorry, another update, it does kill the VM partly, it was still pingable when i wrote the last mail, but no ssh and no spice console possible
From: Soeren Malchow <soeren.malchow@mcon.net>
Date: Monday 1 June 2015 01:35
To: Soeren Malchow <soeren.malchow@mcon.net>, "libvirt-users@redhat.com" <libvirt-users@redhat.com>, users <users@ovirt.org>
Subject: Re: [ovirt-users] Bug in Snapshot Removing
Small addition again:
This error shows up in the log while removing snapshots WITHOUT rendering the Vms unresponsive
—
Jun 01 01:33:45 mc-dc3ham-compute-02-live.mc.mcon.net libvirtd[1657]: Timed out during operation: cannot acquire state change lock
Jun 01 01:33:45 mc-dc3ham-compute-02-live.mc.mcon.net vdsm[6839]: vdsm vm.Vm ERROR vmId=`56848f4a-cd73-4eda-bf79-7eb80ae569a9`::Error getting block job info
    Traceback (most recent call last):
      File "/usr/share/vdsm/virt/vm.py", line 5759, in queryBlockJobs…
—

From: Soeren Malchow <soeren.malchow@mcon.net>
Date: Monday 1 June 2015 00:56
To: "libvirt-users@redhat.com" <libvirt-users@redhat.com>, users <users@ovirt.org>
Subject: [ovirt-users] Bug in Snapshot Removing
Dear all
I am not sure if the mail just did not get any attention between all the mails and this time it is also going to the libvirt mailing list.
I am experiencing a problem with VM becoming unresponsive when removing Snapshots (Live Merge) and i think there is a serious problem.
Here are the previous mails,
http://lists.ovirt.org/pipermail/users/2015-May/033083.html
The problem is on a system with everything on the latest version, CentOS 7.1 and ovirt 3.5.2.1 all upgrades applied.
This Problem did NOT exist before upgrading to CentOS 7.1 with an environment running ovirt 3.5.0 and 3.5.1 and Fedora 20 with the libvirt-preview repo activated.
I think this is a bug in libvirt, not ovirt itself, but i am not sure. The actual file throwing the exception is in VDSM (/usr/share/vdsm/virt/vm.py, line 697).
We are very willing to help, test and supply log files in any way we can.
Regards Soeren
-- Adam Litke
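PS: the threaded loop mentioned at the top of this mail looked roughly like this. A sketch from memory, not the exact script; it is Python 2 like the sequential version, and it assumes an already connected ovirtsdk `api` object (the `Connect()` helper is not shown).

import time
from Queue import Queue
from threading import Thread

WORKERS = 4  # four snapshot removals in flight at once, as described above

def worker():
    while True:
        vm_name, snap = jobs.get()
        try:
            snap.delete()
            # poll until the engine releases the snapshot lock
            while api.vms.get(name=vm_name).snapshots.get(
                    id=snap.id).snapshot_status == "locked":
                time.sleep(60)
        except Exception:
            pass  # the snapshot record disappears once the merge finishes
        finally:
            jobs.task_done()

jobs = Queue()
for _ in range(WORKERS):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for vm in api.vms.list():
    for snap in vm.snapshots.list():
        if snap.description != "Active VM":
            jobs.put((vm.name, snap))
jobs.join()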

On 03/06/15 07:36 +0000, Soeren Malchow wrote:
Dear Adam
First we were using a python script that was working on 4 threads and therefore removing 4 snapshots at a time throughout the cluster, that still caused problems.

Now i took the snapshot removing out of the threaded part and i am just looping through each snapshot on each VM one after another, even with "sleeps" in between, but the problem remains. But i am getting the impression that it is a problem with the amount of snapshots that are deleted in a certain time: if i delete manually and one after another (meaning every 10 min or so) i do not have problems; if i delete manually and do several at once, and on one VM the next one just after one finished, the risk seems to increase.
Hmm. In our lab we extensively tested removing a snapshot for a VM with 4 disks. This means 4 block jobs running simultaneously. Less than 10 minutes later (closer to 1 minute) we would remove a second snapshot for the same VM (again involving 4 block jobs). I guess we should rerun this flow on a fully updated CentOS 7.1 host to see about local reproduction. Seems your case is much simpler than this though. Is this happening every time or intermittently?
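In rough code, the flow we tested looks like the following. A sketch only: the VM name is a placeholder, and `api` is assumed to be an already connected ovirtsdk object like the one your script uses.

import time

def wait_unlocked(vm_name, snap_id):
    # poll the engine until the snapshot unlocks; the record simply
    # disappears once the live merge completes, hence the except
    try:
        while api.vms.get(name=vm_name).snapshots.get(
                id=snap_id).snapshot_status == "locked":
            time.sleep(10)
    except Exception:
        pass

vm = api.vms.get(name="merge-test-vm")  # placeholder: a VM with 4 disks
for snap in vm.snapshots.list():
    if snap.description != "Active VM":
        snap.delete()                    # one block job per disk: 4 at once
        wait_unlocked(vm.name, snap.id)
        time.sleep(60)                   # ~1 minute before the next snapshot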
I do not think it is the number of VMS because we had this on hosts with only 3 or 4 Vms running
I will try restarting the libvirt and see what happens.
We are not using RHEL 7.1 only CentOS 7.1
Is there anything else we can look at when this happens again ?
I'll defer to Eric Blake for the libvirt side of this. Eric, would enabling debug logging in libvirtd help to shine some light on the problem? -- Adam Litke

Hi,

This is not happening every time, the last time i had this, it was a script running, and something like the 9th VM and the 23rd VM had a problem, and it is not always the same VMS, it is not about the OS (happens for Windows and Linux alike)

And as i said it also happened when i tried to remove the snapshots sequentially, here is the code (i know it is probably not the elegant way, but i am not a developer) and the code actually has correct indentions.

<― snip ―>
print "Snapshot deletion"
try:
    time.sleep(300)
    Connect()
    vms = api.vms.list()
    for vm in vms:
        print ("Deleting snapshots for %s ") % vm.name
        snapshotlist = vm.snapshots.list()
        for snapshot in snapshotlist:
            if snapshot.description != "Active VM":
                time.sleep(30)
                snapshot.delete()
                try:
                    while api.vms.get(name=vm.name).snapshots.get(id=snapshot.id).snapshot_status == "locked":
                        print ("Waiting for snapshot %s on %s deletion to finish") % (snapshot.description, vm.name)
                        time.sleep(60)
                except Exception as e:
                    print ("Snapshot %s does not exist anymore") % snapshot.description
        print ("Snapshot deletion for %s done") % vm.name
    print ("Deletion of snapshots done")
    api.disconnect()
except Exception as e:
    print ("Something went wrong when deleting the snapshots\n%s") % str(e)
<― snip ―>

Cheers
Soeren

On 03/06/15 15:20, "Adam Litke" <alitke@redhat.com> wrote:
On 03/06/15 07:36 +0000, Soeren Malchow wrote:
Dear Adam
First we were using a python script that was working on 4 threads and therefore removing 4 snapshots at a time throughout the cluster, that still caused problems.

Now i took the snapshot removing out of the threaded part and i am just looping through each snapshot on each VM one after another, even with "sleeps" in between, but the problem remains. But i am getting the impression that it is a problem with the amount of snapshots that are deleted in a certain time: if i delete manually and one after another (meaning every 10 min or so) i do not have problems; if i delete manually and do several at once, and on one VM the next one just after one finished, the risk seems to increase.
Hmm. In our lab we extensively tested removing a snapshot for a VM with 4 disks. This means 4 block jobs running simultaneously. Less than 10 minutes later (closer to 1 minute) we would remove a second snapshot for the same VM (again involving 4 block jobs). I guess we should rerun this flow on a fully updated CentOS 7.1 host to see about local reproduction. Seems your case is much simpler than this though. Is this happening every time or intermittently?
I do not think it is the number of VMS because we had this on hosts with only 3 or 4 Vms running
I will try restarting the libvirt and see what happens.
We are not using RHEL 7.1 only CentOS 7.1
Is there anything else we can look at when this happens again ?
I'll defer to Eric Blake for the libvirt side of this. Eric, would enabling debug logging in libvirtd help to shine some light on the problem?
-- Adam Litke

Hi Adam, Hi Eric,

We had this issue again a few minutes ago.

One machine went down exactly the same way as described, the machine had only one snapshot and it was the only snapshot that was removed, before that in the same scriptrun we deleted the snapshots of 15 other Vms, some without, some with 1 and some with several snapshots.

Can i provide anything from the logs that helps ?

Regards
Soeren

On 03/06/15 18:07, "Soeren Malchow" <soeren.malchow@mcon.net> wrote:
Hi,
This is not happening every time, the last time i had this, it was a script running, and something like the 9th VM and the 23rd VM had a problem, and it is not always the same VMS, it is not about the OS (happens for Windows and Linux alike)
And as i said it also happened when i tried to remove the snapshots sequentially, here is the code (i know it is probably not the elegant way, but i am not a developer) and the code actually has correct indentions.
<― snip ―>
print "Snapshot deletion" try: time.sleep(300) Connect() vms = api.vms.list() for vm in vms: print ("Deleting snapshots for %s ") % vm.name snapshotlist = vm.snapshots.list() for snapshot in snapshotlist: if snapshot.description != "Active VM": time.sleep(30) snapshot.delete() try: while api.vms.get(name=vm.name).snapshots.get(id=snapshot.id).snapshot_status == "locked": print("Waiting for snapshot %s on %s deletion to finish") % (snapshot.description, vm.name) time.sleep(60) except Exception as e: print ("Snapshot %s does not exist anymore") % snapshot.description print ("Snapshot deletion for %s done") % vm.name print ("Deletion of snapshots done") api.disconnect() except Exception as e: print ("Something went wrong when deleting the snapshots\n%s") % str(e)
<― snip ―>
Cheers Soeren
On 03/06/15 15:20, "Adam Litke" <alitke@redhat.com> wrote:
On 03/06/15 07:36 +0000, Soeren Malchow wrote:
Dear Adam
First we were using a python script that was working on 4 threads and therefore removing 4 snapshots at a time throughout the cluster, that still caused problems.

Now i took the snapshot removing out of the threaded part and i am just looping through each snapshot on each VM one after another, even with "sleeps" in between, but the problem remains. But i am getting the impression that it is a problem with the amount of snapshots that are deleted in a certain time: if i delete manually and one after another (meaning every 10 min or so) i do not have problems; if i delete manually and do several at once, and on one VM the next one just after one finished, the risk seems to increase.
Hmm. In our lab we extensively tested removing a snapshot for a VM with 4 disks. This means 4 block jobs running simultaneously. Less than 10 minutes later (closer to 1 minute) we would remove a second snapshot for the same VM (again involving 4 block jobs). I guess we should rerun this flow on a fully updated CentOS 7.1 host to see about local reproduction. Seems your case is much simpler than this though. Is this happening every time or intermittently?
I do not think it is the number of VMS because we had this on hosts with only 3 or 4 Vms running
I will try restarting the libvirt and see what happens.
We are not using RHEL 7.1 only CentOS 7.1
Is there anything else we can look at when this happens again ?
I'll defer to Eric Blake for the libvirt side of this. Eric, would enabling debug logging in libvirtd help to shine some light on the problem?
-- Adam Litke

On 04/06/15 13:08 +0000, Soeren Malchow wrote:
Hi Adam, Hi Eric,
We had this issue again a few minutes ago.
One machine went down exactly the same way as described, the machine had only one snapshot and it was the only snapshot that was removed, before that in the same scriptrun we deleted the snapshots of 15 other Vms, some without, some with 1 and some with several snapshots.
Can i provide anything from the logs that helps ?
Let's start with the libvirtd.log on that host. It might be rather large so we may need to find a creative place to host it.
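If it is too big to attach anywhere, even a crude cut helps, e.g. a few lines of Python that keep only the window around the event. A sketch: the timestamp prefix format and the window are assumptions, adjust them to what your log actually contains.

import sys

# Usage: python trim_log.py libvirtd.log > window.log
# Assumes each line starts with a timestamp like
# "2015-06-04 13:05:12.345+0000: ..." so a plain string
# comparison on the fixed-width prefix works.
START = "2015-06-04 13:00"
END = "2015-06-04 13:30"

with open(sys.argv[1]) as logfile:
    for line in logfile:
        if START <= line[:len(START)] <= END:
            sys.stdout.write(line)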
Regards Soeren
On 03/06/15 18:07, "Soeren Malchow" <soeren.malchow@mcon.net> wrote:
Hi,
This is not happening every time, the last time i had this, it was a script running, and something like the 9th VM and the 23rd VM had a problem, and it is not always the same VMS, it is not about the OS (happens for Windows and Linux alike)
And as i said it also happened when i tried to remove the snapshots sequentially, here is the code (i know it is probably not the elegant way, but i am not a developer) and the code actually has correct indentions.
<― snip ―>
print "Snapshot deletion" try: time.sleep(300) Connect() vms = api.vms.list() for vm in vms: print ("Deleting snapshots for %s ") % vm.name snapshotlist = vm.snapshots.list() for snapshot in snapshotlist: if snapshot.description != "Active VM": time.sleep(30) snapshot.delete() try: while api.vms.get(name=vm.name).snapshots.get(id=snapshot.id).snapshot_status == "locked": print("Waiting for snapshot %s on %s deletion to finish") % (snapshot.description, vm.name) time.sleep(60) except Exception as e: print ("Snapshot %s does not exist anymore") % snapshot.description print ("Snapshot deletion for %s done") % vm.name print ("Deletion of snapshots done") api.disconnect() except Exception as e: print ("Something went wrong when deleting the snapshots\n%s") % str(e)
<― snip ―>
Cheers Soeren
On 03/06/15 15:20, "Adam Litke" <alitke@redhat.com> wrote:
On 03/06/15 07:36 +0000, Soeren Malchow wrote:
Dear Adam
First we were using a python script that was working on 4 threads and therefore removing 4 snapshots at a time throughout the cluster, that still caused problems.

Now i took the snapshot removing out of the threaded part and i am just looping through each snapshot on each VM one after another, even with "sleeps" in between, but the problem remains. But i am getting the impression that it is a problem with the amount of snapshots that are deleted in a certain time: if i delete manually and one after another (meaning every 10 min or so) i do not have problems; if i delete manually and do several at once, and on one VM the next one just after one finished, the risk seems to increase.
Hmm. In our lab we extensively tested removing a snapshot for a VM with 4 disks. This means 4 block jobs running simultaneously. Less than 10 minutes later (closer to 1 minute) we would remove a second snapshot for the same VM (again involving 4 block jobs). I guess we should rerun this flow on a fully updated CentOS 7.1 host to see about local reproduction. Seems your case is much simpler than this though. Is this happening every time or intermittently?
I do not think it is the number of VMS because we had this on hosts with only 3 or 4 Vms running
I will try restarting the libvirt and see what happens.
We are not using RHEL 7.1 only CentOS 7.1
Is there anything else we can look at when this happens again ?
I'll defer to Eric Blake for the libvirt side of this. Eric, would enabling debug logging in libvirtd help to shine some light on the problem?
-- Adam Litke
-- Adam Litke

Hi,

I would send those, but unfortunately we did not think about the journals getting deleted after a reboot.

I just made the journals persistent on the servers, we are trying to trigger the error again; last time we only got half way through the VMs when removing the snapshots so we have a good chance that it comes up again.

Also the libvirt logs to the journal not to libvirtd.log, i would send the journal directly to you and Eric via our data exchange servers

Soeren

On 04/06/15 16:17, "Adam Litke" <alitke@redhat.com> wrote:
On 04/06/15 13:08 +0000, Soeren Malchow wrote:
Hi Adam, Hi Eric,
We had this issue again a few minutes ago.
One machine went down exactly the same way as described, the machine had only one snapshot and it was the only snapshot that was removed, before that in the same scriptrun we deleted the snapshots of 15 other Vms, some without, some with 1 and some with several snapshots.
Can i provide anything from the logs that helps ?
Let's start with the libvirtd.log on that host. It might be rather large so we may need to find a creative place to host it.
Regards Soeren
On 03/06/15 18:07, "Soeren Malchow" <soeren.malchow@mcon.net> wrote:
Hi,
This is not happening every time, the last time i had this, it was a script running, and something like the 9th VM and the 23rd VM had a problem, and it is not always the same VMS, it is not about the OS (happens for Windows and Linux alike)
And as i said it also happened when i tried to remove the snapshots sequentially, here is the code (i know it is probably not the elegant way, but i am not a developer) and the code actually has correct indentions.
<― snip ―>
print "Snapshot deletion" try: time.sleep(300) Connect() vms = api.vms.list() for vm in vms: print ("Deleting snapshots for %s ") % vm.name snapshotlist = vm.snapshots.list() for snapshot in snapshotlist: if snapshot.description != "Active VM": time.sleep(30) snapshot.delete() try: while api.vms.get(name=vm.name).snapshots.get(id=snapshot.id).snapshot_status == "locked": print("Waiting for snapshot %s on %s deletion to finish") % (snapshot.description, vm.name) time.sleep(60) except Exception as e: print ("Snapshot %s does not exist anymore") % snapshot.description print ("Snapshot deletion for %s done") % vm.name print ("Deletion of snapshots done") api.disconnect() except Exception as e: print ("Something went wrong when deleting the snapshots\n%s") % str(e)
<― snip ―>
Cheers Soeren
On 03/06/15 15:20, "Adam Litke" <alitke@redhat.com> wrote:
On 03/06/15 07:36 +0000, Soeren Malchow wrote:
Dear Adam
First we were using a python script that was working on 4 threads and therefore removing 4 snapshots at a time throughout the cluster, that still caused problems.

Now i took the snapshot removing out of the threaded part and i am just looping through each snapshot on each VM one after another, even with "sleeps" in between, but the problem remains. But i am getting the impression that it is a problem with the amount of snapshots that are deleted in a certain time: if i delete manually and one after another (meaning every 10 min or so) i do not have problems; if i delete manually and do several at once, and on one VM the next one just after one finished, the risk seems to increase.
Hmm. In our lab we extensively tested removing a snapshot for a VM with 4 disks. This means 4 block jobs running simultaneously. Less than 10 minutes later (closer to 1 minute) we would remove a second snapshot for the same VM (again involving 4 block jobs). I guess we should rerun this flow on a fully updated CentOS 7.1 host to see about local reproduction. Seems your case is much simpler than this though. Is this happening every time or intermittently?
I do not think it is the number of VMS because we had this on hosts with only 3 or 4 Vms running
I will try restarting the libvirt and see what happens.
We are not using RHEL 7.1 only CentOS 7.1
Is there anything else we can look at when this happens again ?
I'll defer to Eric Blake for the libvirt side of this. Eric, would enabling debug logging in libvirtd help to shine some light on the problem?
-- Adam Litke
-- Adam Litke

We are still having this problem and we can not figure out what to do, i sent the logs already as download, can i do anything else to help ?

On 04/06/15 17:08, "Soeren Malchow" <soeren.malchow@mcon.net> wrote:
Hi,
I would send those, but unfortunately we did not think about the journals getting deleted after a reboot.
I just made the journals persistent on the servers, we are trying to trigger the error again; last time we only got half way through the VMs when removing the snapshots so we have a good chance that it comes up again.
Also the libvirt logs to the journal not to libvirtd.log, i would send the journal directly to you and Eric via our data exchange servers
Soeren
On 04/06/15 16:17, "Adam Litke" <alitke@redhat.com> wrote:
On 04/06/15 13:08 +0000, Soeren Malchow wrote:
Hi Adam, Hi Eric,
We had this issue again a few minutes ago.
One machine went down exactly the same way as described, the machine had only one snapshot and it was the only snapshot that was removed, before that in the same scriptrun we deleted the snapshots of 15 other Vms, some without, some with 1 and some with several snapshots.
Can i provide anything from the logs that helps ?
Let's start with the libvirtd.log on that host. It might be rather large so we may need to find a creative place to host it.
Regards Soeren
On 03/06/15 18:07, "Soeren Malchow" <soeren.malchow@mcon.net> wrote:
Hi,
This is not happening every time, the last time i had this, it was a script running, and something like the 9th VM and the 23rd VM had a problem, and it is not always the same VMS, it is not about the OS (happens for Windows and Linux alike)
And as i said it also happened when i tried to remove the snapshots sequentially, here is the code (i know it is probably not the elegant way, but i am not a developer) and the code actually has correct indentions.
<― snip ―>
print "Snapshot deletion" try: time.sleep(300) Connect() vms = api.vms.list() for vm in vms: print ("Deleting snapshots for %s ") % vm.name snapshotlist = vm.snapshots.list() for snapshot in snapshotlist: if snapshot.description != "Active VM": time.sleep(30) snapshot.delete() try: while api.vms.get(name=vm.name).snapshots.get(id=snapshot.id).snapshot_status == "locked": print("Waiting for snapshot %s on %s deletion to finish") % (snapshot.description, vm.name) time.sleep(60) except Exception as e: print ("Snapshot %s does not exist anymore") % snapshot.description print ("Snapshot deletion for %s done") % vm.name print ("Deletion of snapshots done") api.disconnect() except Exception as e: print ("Something went wrong when deleting the snapshots\n%s") % str(e)
<― snip ―>
Cheers Soeren
On 03/06/15 15:20, "Adam Litke" <alitke@redhat.com> wrote:
On 03/06/15 07:36 +0000, Soeren Malchow wrote:
Dear Adam
First we were using a python script that was working on 4 threads and therefore removing 4 snapshots at a time throughout the cluster, that still caused problems.

Now i took the snapshot removing out of the threaded part and i am just looping through each snapshot on each VM one after another, even with "sleeps" in between, but the problem remains. But i am getting the impression that it is a problem with the amount of snapshots that are deleted in a certain time: if i delete manually and one after another (meaning every 10 min or so) i do not have problems; if i delete manually and do several at once, and on one VM the next one just after one finished, the risk seems to increase.
Hmm. In our lab we extensively tested removing a snapshot for a VM with 4 disks. This means 4 block jobs running simultaneously. Less than 10 minutes later (closer to 1 minute) we would remove a second snapshot for the same VM (again involving 4 block jobs). I guess we should rerun this flow on a fully updated CentOS 7.1 host to see about local reproduction. Seems your case is much simpler than this though. Is this happening every time or intermittently?
I do not think it is the number of VMS because we had this on hosts with only 3 or 4 Vms running
I will try restarting the libvirt and see what happens.
We are not using RHEL 7.1 only CentOS 7.1
Is there anything else we can look at when this happens again ?
I'll defer to Eric Blake for the libvirt side of this. Eric, would enabling debug logging in libvirtd help to shine some light on the problem?
-- Adam Litke
-- Adam Litke

On 11/06/15 11:00 +0000, Soeren Malchow wrote:
We are still having this problem and we can not figure out what to do, i sent the logs already as download, can i do anything else to help ?
Hi. I'm sorry but I don't have any new information for you yet. One thing you could do is create a new bug for this issue so we can track it better. Please try to include as much information as possible from this discussion (including relevant log files) in your report. So far you are the only one reporting these issues so we'll want to work to narrow down the specific scenario that is causing this problem and get the right people working on the solution.
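If it helps, a few lines of Python can collect the basics for the report in one shot. A sketch only: the package list and the journal window are suggestions, and it assumes the systemd journal on the host is readable.

import subprocess

# Collect package versions and recent libvirtd journal messages
# into one file to attach to the bug report.
with open("bug-report-info.txt", "w") as out:
    for pkg in ["libvirt", "qemu-kvm", "vdsm", "ovirt-engine-sdk-python"]:
        subprocess.call(["rpm", "-q", pkg], stdout=out)
    subprocess.call(["journalctl", "-u", "libvirtd", "--since", "yesterday"],
                    stdout=out)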
On 04/06/15 17:08, "Soeren Malchow" <soeren.malchow@mcon.net> wrote:
Hi,
I would send those, but unfortunately we did not think about the journals getting deleted after a reboot.
I just made the journals persistent on the servers, we are trying to trigger the error again; last time we only got half way through the VMs when removing the snapshots so we have a good chance that it comes up again.
Also the libvirt logs to the journal not to libvirtd.log, i would send the journal directly to you and Eric via our data exchange servers
Soeren
On 04/06/15 16:17, "Adam Litke" <alitke@redhat.com> wrote:
On 04/06/15 13:08 +0000, Soeren Malchow wrote:
Hi Adam, Hi Eric,
We had this issue again a few minutes ago.
One machine went down exactly the same way as described, the machine had only one snapshot and it was the only snapshot that was removed, before that in the same scriptrun we deleted the snapshots of 15 other Vms, some without, some with 1 and some with several snapshots.
Can i provide anything from the logs that helps ?
Let's start with the libvirtd.log on that host. It might be rather large so we may need to find a creative place to host it.
Regards Soeren
On 03/06/15 18:07, "Soeren Malchow" <soeren.malchow@mcon.net> wrote:
Hi,
This is not happening every time, the last time i had this, it was a script running, and something like the 9th VM and the 23rd VM had a problem, and it is not always the same VMS, it is not about the OS (happens for Windows and Linux alike)
And as i said it also happened when i tried to remove the snapshots sequentially, here is the code (i know it is probably not the elegant way, but i am not a developer) and the code actually has correct indentions.
<― snip ―>
print "Snapshot deletion" try: time.sleep(300) Connect() vms = api.vms.list() for vm in vms: print ("Deleting snapshots for %s ") % vm.name snapshotlist = vm.snapshots.list() for snapshot in snapshotlist: if snapshot.description != "Active VM": time.sleep(30) snapshot.delete() try: while api.vms.get(name=vm.name).snapshots.get(id=snapshot.id).snapshot_status == "locked": print("Waiting for snapshot %s on %s deletion to finish") % (snapshot.description, vm.name) time.sleep(60) except Exception as e: print ("Snapshot %s does not exist anymore") % snapshot.description print ("Snapshot deletion for %s done") % vm.name print ("Deletion of snapshots done") api.disconnect() except Exception as e: print ("Something went wrong when deleting the snapshots\n%s") % str(e)
<― snip ―>
Cheers Soeren
On 03/06/15 15:20, "Adam Litke" <alitke@redhat.com> wrote:
On 03/06/15 07:36 +0000, Soeren Malchow wrote:
>Dear Adam
>
>First we were using a python script that was working on 4 threads and
>therefore removing 4 snapshots at a time throughout the cluster, that
>still caused problems.
>
>Now i took the snapshot removing out of the threaded part and i am just
>looping through each snapshot on each VM one after another, even with
>"sleeps" in between, but the problem remains.
>But i am getting the impression that it is a problem with the amount of
>snapshots that are deleted in a certain time: if i delete manually and
>one after another (meaning every 10 min or so) i do not have problems;
>if i delete manually and do several at once, and on one VM the next one
>just after one finished, the risk seems to increase.
Hmm. In our lab we extensively tested removing a snapshot for a VM with 4 disks. This means 4 block jobs running simultaneously. Less than 10 minutes later (closer to 1 minute) we would remove a second snapshot for the same VM (again involving 4 block jobs). I guess we should rerun this flow on a fully updated CentOS 7.1 host to see about local reproduction. Seems your case is much simpler than this though. Is this happening every time or intermittently?
>I do not think it is the number of VMS because we had this on hosts
>with only 3 or 4 Vms running
>
>I will try restarting the libvirt and see what happens.
>
>We are not using RHEL 7.1 only CentOS 7.1
>
>Is there anything else we can look at when this happens again ?
I'll defer to Eric Blake for the libvirt side of this. Eric, would enabling debug logging in libvirtd help to shine some light on the problem?
-- Adam Litke
-- Adam Litke
-- Adam Litke
participants (3)
- Adam Litke
- Allon Mureinik
- Soeren Malchow