OVirt 4.1.2 - trim/discard on HDD/XFS/NFS contraproductive

Hi,

we just set up a new oVirt 4.1.2 cluster. It is a quite normal HDD/XFS/NFS stack that worked quite well with 4.0 in the past. Inside the VMs we use XFS too.

To our surprise we observe abysmally high I/O during mkfs.xfs and fstrim inside the VM. A simple example:

Step 1: Create a 100G thin disk
Result 1: Disk occupies ~10M on storage

Step 2: Format the disk inside the VM with mkfs.xfs
Result 2: Disk occupies 100G on storage

Changing the discard flag on the disk does not have any effect.

Am I missing something?

Best regards.

Markus

****************************************************************************
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.

E-mails sent over the internet may have been written under a wrong name or been manipulated. That is why this message sent as an e-mail is not a legally binding declaration of intention.

Collogia Unternehmensberatung AG
Ubierring 11
D-50678 Köln

Executive board: Kadir Akin, Dr. Michael Höhnerbach
President of the supervisory board: Hans Kristian Langva
Registry office: district court Cologne
Register number: HRB 52 497
****************************************************************************
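The "~10M" versus "100G" figures in the report are the space the image file actually occupies on the storage, not its logical size. As a minimal sketch (not something posted in the thread), assuming a Linux host where the image is a plain file on the NFS export, the two numbers can be compared like this; the function and class names are made up for illustration:

```python
import os
from typing import NamedTuple


class DiskUsage(NamedTuple):
    apparent: int   # logical file size in bytes (what the guest sees)
    allocated: int  # bytes actually backed by blocks on the storage


def disk_usage(path: str) -> DiskUsage:
    """Return the apparent and the allocated size of a file in bytes."""
    st = os.stat(path)
    # On Linux, st_blocks is counted in 512-byte units, independent of
    # the filesystem block size.
    return DiskUsage(apparent=st.st_size, allocated=st.st_blocks * 512)
```

For a freshly created thin disk, `allocated` should be far below `apparent`; after the mkfs.xfs run described above, the two would be roughly equal.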

Hi Markus,

AFAIK, mkfs.xfs tries to discard all the blocks before formatting the device. If you don't want it to do that, you can use the -K option of mkfs.xfs ("Do not attempt to discard blocks at mkfs time").

In oVirt 4.1 we introduced the "Enable Discard" flag for a virtual machine's disk. When enabled, qemu is configured to pass on live UNMAP SCSI commands from the guest to the underlying storage.
If you don't need live discarding, shut down the VM and disable the "Enable Discard" option. That will cause qemu to ignore the live UNMAP SCSI commands coming from the guest and not pass them on to the underlying storage.
Note that this makes fstrim completely redundant, as the purpose of the command is to discard unused blocks under the given path.

Regards,
Idan
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
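To make the -K suggestion concrete, here is a small sketch that only builds the mkfs.xfs command line and prints it, without running anything. The helper name and the device path /dev/vdb are assumptions for illustration; the -K flag is the one quoted in the message above.

```python
from typing import List


def mkfs_xfs_command(device: str, discard: bool = True) -> List[str]:
    """Build the argument list for running mkfs.xfs on `device`.

    With discard=False the -K flag is added, which tells mkfs.xfs not
    to attempt to discard blocks at mkfs time.
    """
    cmd = ["mkfs.xfs"]
    if not discard:
        cmd.append("-K")
    cmd.append(device)
    return cmd


# /dev/vdb is a hypothetical guest disk, used only for illustration.
print(" ".join(mkfs_xfs_command("/dev/vdb", discard=False)))
# prints: mkfs.xfs -K /dev/vdb
```

The list form can be handed to `subprocess.run` inside the guest if the format step is scripted.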

On Sunday, June 18, 2017 at 9:01, Idan Shaby <ishaby@redhat.com> wrote:

> Hi Markus,
>
> AFAIK, mkfs.xfs tries to discard all the blocks before formatting the device. If you don't want it to do that, you can use the -K option of mkfs.xfs ("Do not attempt to discard blocks at mkfs time").
>
> In oVirt 4.1 we introduced the "Enable Discard" flag for a virtual machine's disk. When enabled, qemu is configured to pass on live UNMAP SCSI commands from the guest to the underlying storage.
> If you don't need live discarding, shut down the VM and disable the "Enable Discard" option. That will cause qemu to ignore the live UNMAP SCSI commands coming from the guest and not pass them on to the underlying storage.
> Note that this makes fstrim completely redundant, as the purpose of the command is to discard unused blocks under the given path.

I think we need a bug for this, both for documenting this issue and to investigate why discarding unused blocks allocates and zeroes all blocks. This behaviour is unhelpful.

Markus, can you check if performing the same discard from the host leads to the same result?

On June 18, 2017 at 08:00, Idan Shaby <ishaby@redhat.com> wrote:

> If you don't need live discarding, shut down the VM and disable the "Enable Discard" option. That will cause qemu to ignore the live UNMAP SCSI commands coming from the guest and not pass them on to the underlying storage.
> Note that this makes fstrim completely redundant, as the purpose of the command is to discard unused blocks under the given path.

Redundant? Useless, you mean? As I understand it, the purpose of fstrim is to send SCSI UNMAP commands in a batch, instead of mount -o discard, which sends them synchronously.

Thanks for all your feedback.

I'm trying to collect all the information in BZ1462504.

Right, but I just wanted to emphasize that disabling "Enable Discard" for that disk will cause qemu to ignore these UNMAP commands and not pass them on to the underlying storage.
So if you've got this flag disabled, there's no reason to use fstrim. It makes sense to use it only when "Enable Discard" is turned on.

Regards,
Idan

On Sun, Jun 18, 2017 at 11:13 AM, Fabrice Bacchella <fabrice.bacchella@orange.fr> wrote:

> Redundant? Useless, you mean? As I understand it, the purpose of fstrim is to send SCSI UNMAP commands in a batch, instead of mount -o discard, which sends them synchronously.

On Sat, Jun 17, 2017 at 1:25 AM, Markus Stockhausen <stockhausen@collogia.de> wrote:

> To our surprise we observe abysmally high I/O during mkfs.xfs and fstrim inside the VM. A simple example:
>
> Step 1: Create a 100G thin disk
> Result 1: Disk occupies ~10M on storage
>
> Step 2: Format the disk inside the VM with mkfs.xfs
> Result 2: Disk occupies 100G on storage
>
> Changing the discard flag on the disk does not have any effect.

Are you sure it's discarding, at all?
1. NFS: only NFSv4.2 supports discard. Is that the case in your setup?
2. What's the value of /sys/block/<disk>/queue/discard_granularity?
3. Can you share the mkfs.xfs command line?
4. Are you sure it's not a raw-sparse image?

Y.
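Question 2 above can be answered inside the guest by reading the sysfs attribute directly. A minimal sketch, assuming a Linux guest; the function name and the `sysfs_root` parameter are made up here (the parameter exists only so the lookup can be pointed at a test directory):

```python
from pathlib import Path
from typing import Optional


def discard_granularity(disk: str, sysfs_root: str = "/sys") -> Optional[int]:
    """Return the discard granularity of `disk` in bytes.

    Returns None if the sysfs attribute does not exist. A value of 0
    means the block device does not advertise discard support.
    """
    attr = Path(sysfs_root) / "block" / disk / "queue" / "discard_granularity"
    try:
        return int(attr.read_text().strip())
    except FileNotFoundError:
        return None
```

Calling `discard_granularity("vdb")` for a hypothetical disk vdb returns 0 when the guest sees no discard support at all, which would already rule out discard as the cause of the writes.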

> From: Yaniv Kaul [ykaul@redhat.com]
> Sent: Sunday, June 18, 2017 09:58
> To: Markus Stockhausen
> Cc: Ovirt Users
> Subject: Re: [ovirt-users] OVirt 4.1.2 - trim/discard on HDD/XFS/NFS contraproductive
>
> Are you sure it's discarding, at all?
> 1. NFS: only NFSv4.2 supports discard. Is that the case in your setup?
> 2. What's the value of /sys/block/<disk>/queue/discard_granularity?
> 3. Can you share the mkfs.xfs command line?
> 4. Are you sure it's not a raw-sparse image?

The questions should be answered in BZ1462504. When talking about thin provisioned disks I'm only referring to the oVirt disk option, so I might be mixing something up here. Nevertheless, the following is more than strange to me:

- Create disk image: file on storage is small
- Format inside VM: file on storage is fully allocated
- Move the disk in oVirt to another NFS storage: file is small again

That means:

- mkfs.xfs inside the VM, and therefore qemu, is hammering (empty) data into all blocks
- But this data must be zeros, as it can be compacted afterwards

Best regards.

Markus
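The inference above (fully allocated, yet compactable, so the written data must be zeros) could be checked on the storage host by scanning the image contents. This is only a sketch under that assumption; the function name and the 1 MiB chunk size are arbitrary choices, not anything from the thread:

```python
from typing import Tuple


def count_zero_chunks(path: str, chunk_size: int = 1024 * 1024) -> Tuple[int, int]:
    """Return (zero_chunks, total_chunks) for the file at `path`.

    A chunk counts as zero when every byte in it is 0x00.
    """
    zero_block = bytes(chunk_size)  # reference block of zero bytes
    zero_chunks = 0
    total_chunks = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total_chunks += 1
            # The last chunk may be shorter than chunk_size.
            if chunk == zero_block[:len(chunk)]:
                zero_chunks += 1
    return zero_chunks, total_chunks
```

Reading through a hole in a sparse file also yields zeros, so this says something about the content only. It is meaningful here when combined with the allocated size: an image that is fully allocated but consists almost entirely of zero chunks matches the behaviour described.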
participants (5)
- Fabrice Bacchella
- Idan Shaby
- Markus Stockhausen
- Nir Soffer
- Yaniv Kaul