Deleting large snapshots blocks the whole cluster

Hi,

we have some really large snapshots left over from a migration. Since our store is almost full, we have to delete them now.

Some snapshots are already around 1 TB.

The last time we tried to delete a ~500 GB snapshot, the delete task blocked all I/O on the disk store, and the whole cluster and all hosts became unavailable.

Is there a way to delete large snapshots in a "humane" way that will not block everything?

Cheers,
Stefan

Do you see swapping on the SPM? If so, running echo 3 > drop_caches at regular intervals could help.

Markus

On 27.10.2014 at 10:57, Stefan Wendler <stefan.wendler@tngtech.com> wrote:
> Is there a way to delete large snapshots in a "humane" way that will not
> block everything?

****************************************************************************
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.

E-mails sent over the internet may have been written under a wrong name or been manipulated. That is why this message sent as an e-mail is not a legally binding declaration of intention.

Collogia Unternehmensberatung AG, Ubierring 11, D-50678 Köln
Executive board: Kadir Akin, Dr. Michael Höhnerbach
President of the supervisory board: Hans Kristian Langva
Registry office: district court Cologne
Register number: HRB 52 497
****************************************************************************
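One way to answer Markus's question about swapping on the SPM is to compare the kernel's swap counters before and after a snapshot deletion. The sketch below is only an illustration, not an oVirt tool: it assumes a Linux host that exposes the `pswpin` and `pswpout` counters in `/proc/vmstat`, and the function names are made up for this example.

```python
from typing import Dict, Tuple


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse '/proc/vmstat'-style text ('name value' per line) into a dict."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly "<name> <non-negative integer>".
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before: str, after: str) -> Tuple[int, int]:
    """Return (pages swapped in, pages swapped out) between two samples."""
    b, a = parse_vmstat(before), parse_vmstat(after)
    return a["pswpin"] - b["pswpin"], a["pswpout"] - b["pswpout"]


def read_vmstat(path: str = "/proc/vmstat") -> str:
    """Read the raw counter text; the default path is the Linux location."""
    with open(path) as f:
        return f.read()
```

Taking one `read_vmstat()` sample before the deletion starts and another while it runs, then passing both to `swap_delta`, shows whether the host swapped in that window. Both counters are in pages, and a non-zero delta only says that swapping happened, not what caused it.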

Hi,

do you mean during snapshot deletion or in general?

In general, we do not have any swap usage at all, on any of the 4 hosts.

We had some swap I/O on the host that is SPM most of the time, but not during the period when we tried to delete the snapshot.

Cheers,
Stefan

On 10/27/14 11:08, Markus Stockhausen wrote:
> Do you see swapping on the SPM? If yes a regular echo 3 > drop_caches
> could help.

> From: Stefan Wendler [stefan.wendler@tngtech.com]
> Sent: Monday, 27 October 2014 11:39
> To: Markus Stockhausen
> Cc: users@ovirt.org
> Subject: Re: [ovirt-users] Deleting large snapshots blocks the whole cluster
>
> Hi,
>
> do you mean during snapshot deletion or in general?
>
> In general we do not have any swap usage at all. On none of the 4 hosts.
>
> We had some swap I/O on the host that is SPM most of the time. But not
> during that period where we tried to delete the snapshot.

I thought of the bug described in RH Bugzilla 1138690, "[SCALE] snapshot deletion -> heavy swapping on SPM". In case you have no access to it (as it is RHEV tagged):

qemu-img reads through the page cache during snapshot deletion. A new flag was introduced in August this year that makes it possible to bypass the page cache. This flag must be provided by oVirt/RHEV. The target release is 3.6.

Until then, the only workaround is to drop the page cache manually at regular intervals on the hypervisor, and especially on the SPM node.

Markus
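The workaround Markus describes, dropping the page cache at regular intervals, could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not something oVirt or RHEV ships. It assumes a Linux host, root privileges, and the standard `/proc/sys/vm/drop_caches` interface; the function names, the interval and the round count are illustrative. It calls `os.sync()` first because the kernel only drops clean cache pages.

```python
import os
import time

# Standard Linux location; writing here requires root.
DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"


def drop_page_cache(path: str = DROP_CACHES_PATH, mode: int = 3) -> None:
    """Flush dirty pages, then ask the kernel to drop clean caches.

    mode 1 = page cache, 2 = dentries and inodes, 3 = both.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    os.sync()  # dirty pages cannot be dropped, so write them out first
    with open(path, "w") as f:
        f.write(f"{mode}\n")


def drop_periodically(interval_s: float, rounds: int,
                      path: str = DROP_CACHES_PATH,
                      sleep=time.sleep) -> int:
    """Drop caches `rounds` times, sleeping `interval_s` seconds in between.

    Returns the number of drops performed.
    """
    done = 0
    for i in range(rounds):
        drop_page_cache(path)
        done += 1
        if i + 1 < rounds:
            sleep(interval_s)
    return done
```

For example, `drop_periodically(60, 120)` run as root on the SPM host would drop the caches once a minute for about two hours. Whether that interval is enough to keep the host out of swap during a large snapshot deletion is something to verify on the system itself; a cron job running the same `echo 3 > /proc/sys/vm/drop_caches` command is an equally valid way to do it.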

I see. Then I will keep an eye on this tonight, when we have a huge "all ovirt downtime" just to delete those snapshots ^^

Thanks and cheers,
Stefan

On 10/27/14 13:07, Markus Stockhausen wrote:
> Until then the only workaround is to drop the page cache manually at regular
> intervals on the hypervisor, and especially on the SPM node.
participants (2)
- Markus Stockhausen
- Stefan Wendler