Hi,

I'm testing the nightly builds.
In general, my experience after reinstalling from scratch and having
solved my storage issues has been very good.
Today I tried "Make Template" and ran into some UX problems I would
like to share with you.
I would also like to discuss a possible optimization for copying images
using SEEK_HOLE.
1) "Make Template" UX and performance problems:
Source disk and destination disk (template) were on the same StorageDomain.
By accident (probably something very common) the SPM was set on an
external host (that was not hosting this StorageDomain), so the whole
image data went out and back to the same source machine.
This obviously takes very long (hours), while copying the sparse files
directly only takes about 10 [s] with the below optimization.

While making the templates, I believe I restarted VDSM or rebooted the
SPM, so the tasks went stale. My fault again.
I was able to remove the stale tasks in Engine by suspending the VMs,
stopping VDSM so the host was marked as non-responsive, and using
"confirm host has been rebooted".
Setting the host to maintenance in order to confirm it was rebooted was
not possible because it still had asynchronous tasks running.
Aren't these tasks' PIDs being checked to see whether they are still
alive?
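
For reference, here is a minimal Python sketch of the kind of liveness
check I have in mind; the task-to-PID mapping is my assumption, I do
not know how VDSM actually tracks its task processes internally:

    import errno
    import os

    def pid_is_alive(pid):
        # Signal 0 performs error checking only; no signal is sent.
        try:
            os.kill(pid, 0)
        except OSError as e:
            if e.errno == errno.ESRCH:   # no such process
                return False
            if e.errno == errno.EPERM:   # exists, owned by another user
                return True
            raise
        return True

    # Hypothetical task table: task id -> PID of the worker process.
    tasks = {"task-uuid-1": 12345}
    stale = [t for t, pid in tasks.items() if not pid_is_alive(pid)]

Stale entries could then be cleaned up automatically instead of
blocking maintenance mode.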

2) I saw that VDSM was running "/usr/bin/qemu-img convert":

In this case, I believe it is enough to just copy the images instead of
converting them.
I ran some tests and found that "cp --sparse=always" is the best way to
copy images to gluster mounts: it is faster, and the resulting files
are still sparse ('du' reports exactly the same sizes).
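
The same comparison 'du' makes can be scripted to verify that a copy
stayed sparse; a small Python sketch (the paths are made-up examples):

    import os

    def disk_usage(path):
        # st_blocks is counted in 512-byte units; this is the same
        # number 'du' is based on.
        return os.stat(path).st_blocks * 512

    src = "/rhev/data-center/mnt/example/source.img"  # hypothetical
    dst = "/rhev/data-center/mnt/example/copy.img"    # hypothetical
    print("apparent size:", os.stat(src).st_size)
    print("src on disk:  ", disk_usage(src))
    print("dst on disk:  ", disk_usage(dst))  # equals src if still sparse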

But I also discovered a bottleneck.
When copying sparse files (e.g. a 2 TB sparse image that only uses 800
MB on disk, a common scenario when we create templates from fresh
installs), 'cp' behaves differently depending on whether we are reading
from a gluster mount or from a filesystem that supports SEEK_HOLE
(available in kernels >= 3.1):

a) If we read from a gluster mount, 'cp' reads the full 2 TB of zeros,
even though it only writes the non-zero data (iotop shows the 'cp'
process reading those 2 TB). Only the sparse writing is optimized.

b) If we read from a filesystem that supports SEEK_HOLE (ext4, xfs,
etc.), 'cp' only reads the non-zero content, so reading and writing
takes about 10 s instead of hours.

It seems gluster is not using SEEK_HOLE for reading (?!).
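
A quick way to test whether a mount reports holes at all is to ask for
the first hole in a known-sparse file: if the filesystem does not
support SEEK_HOLE, the kernel falls back to treating the whole file as
one data extent, so the first "hole" is reported at end-of-file. A
Python sketch (needs Python >= 3.3; the path is a made-up example):

    import os

    def first_hole(path):
        fd = os.open(path, os.O_RDONLY)
        try:
            # Offset of the first hole at or after offset 0.
            hole = os.lseek(fd, 0, os.SEEK_HOLE)
            size = os.fstat(fd).st_size
        finally:
            os.close(fd)
        return hole, size

    hole, size = first_hole("/mnt/gluster/images/sparse.img")
    if hole == size:
        print("no holes reported (SEEK_HOLE unsupported, or file dense)")
    else:
        print("first hole at offset", hole)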

Considering that the source image is not modified during the "Make
Template" process and that we have access to the gluster bricks, it is
possible to 'cp' the source image directly from the brick (which sits
on a SEEK_HOLE-supporting FS) instead of reading through the gluster
mount.
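
For completeness, this is the core of what 'cp' does on a SEEK_HOLE
filesystem, as a minimal Python sketch: walk the data extents with
SEEK_DATA/SEEK_HOLE and copy only those, recreating the holes with
truncate (the brick path is a made-up example, and error handling is
omitted):

    import os

    def sparse_copy(src, dst):
        in_fd = os.open(src, os.O_RDONLY)
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            size = os.fstat(in_fd).st_size
            offset = 0
            while offset < size:
                try:
                    # Start of the next data extent at or after offset.
                    data = os.lseek(in_fd, offset, os.SEEK_DATA)
                except OSError:  # ENXIO: only holes remain
                    break
                # End of that data extent.
                hole = os.lseek(in_fd, data, os.SEEK_HOLE)
                os.lseek(in_fd, data, os.SEEK_SET)
                os.lseek(out_fd, data, os.SEEK_SET)
                remaining = hole - data
                while remaining > 0:
                    chunk = os.read(in_fd, min(remaining, 1 << 20))
                    if not chunk:
                        break
                    os.write(out_fd, chunk)
                    remaining -= len(chunk)
                offset = hole
            # Restore the apparent size; the skipped ranges stay holes.
            os.ftruncate(out_fd, size)
        finally:
            os.close(in_fd)
            os.close(out_fd)

    sparse_copy("/gluster/brick1/images/disk.img", "/tmp/disk.img")

This only ever reads what the filesystem reports as data, which is
exactly why reading from the brick is fast and reading from the mount
is not.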

The difference is really impressive (seconds instead of hours).
I tested it by cloning some VMs, and it works.

Am I missing something?
Maybe there is a gluster optimization to enable SEEK_HOLE support on
gluster mounts?