
On 10/02/2017 11:05 AM, Jason Keltz wrote:
On 10/02/2017 11:00 AM, Yaniv Kaul wrote:
On Mon, Oct 2, 2017 at 5:57 PM, Jason Keltz <jas@cse.yorku.ca <mailto:jas@cse.yorku.ca>> wrote:
On 10/02/2017 10:51 AM, Yaniv Kaul wrote:
On Mon, Oct 2, 2017 at 5:14 PM, Jason Keltz <jas@cse.yorku.ca <mailto:jas@cse.yorku.ca>> wrote:
On 10/02/2017 01:22 AM, Yaniv Kaul wrote:
On Mon, Oct 2, 2017 at 5:11 AM, Jason Keltz <jas@cse.yorku.ca <mailto:jas@cse.yorku.ca>> wrote:
Hi.
For my data domain, I have one NFS server with a large RAID filesystem (9 TB). I'm only using 2 TB of that at the moment. Today, my NFS server hung with the following error:
xfs: possible memory allocation deadlock in kmem_alloc
Can you share more of the log so we'll see what happened before and after? Y.
Here is the engine log from yesterday... the problem started around 14:29. http://www.eecs.yorku.ca/~jas/ovirt-debug/10012017/engine-log.txt
Here is the vdsm log on one of the virtualization hosts, virt01: http://www.eecs.yorku.ca/~jas/ovirt-debug/10012017/vdsm.log.2
Doing further investigation, I found that the XFS error messages didn't start yesterday. You'll see they started at the very end of the day on September 23. See:
http://www.eecs.yorku.ca/~jas/ovirt-debug/messages-20170924
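(For anyone who wants to check their own server, grepping the rotated logs should be enough to spot this; a minimal example:

# grep 'possible memory allocation deadlock' /var/log/messages*

Nothing fancier than that.)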
Our storage guys do NOT think it's an XFS fragmentation issue, but we'll be looking at it.
This is an interesting thread to read because the problem sounds quite similar:
http://oss.sgi.com/archives/xfs/2016-03/msg00447.html

In particular, quoted from that:
> XFS maintains the full extent list for an active inode in memory,

> As it is, yes, the memory allocation problem is with the in-core extent tree, and we've known about it for some time. The issue is that as memory gets fragmented, the top level indirection array grows too large to be allocated as a contiguous chunk. When this happens really depends on memory load, uptime and the way the extent tree is being modified.
So in my case, I have a bunch of big XFS disk images for virtual disks. As the files are big, with many extents, keeping all that information in memory at the same time may be the culprit. Having many extents per se isn't the problem; having enough memory to store all that information simultaneously may be. Possible solutions would be to increase the default extent size of the volume (which I'm not sure how to do), defragment the disks (and hence have fewer extents), or potentially add more memory to the file server. It has 64G.
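(Regarding the extent size hint: from the xfs_io man page, it looks like a hint can be queried per file, or set on a directory so that new files inherit it. I haven't tried this myself; the 16m value and the images path below are just placeholders:

# xfs_io -r -c "extsize" /path/to/images/<disk-image>
# xfs_io -c "extsize 16m" /path/to/images

Existing files would presumably still need a defrag or copy to benefit.)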
Hmmm... almost sorry to hear that because that would be easy to "fix"...
They continued on the 24th, then on the 26th... I think there were a few "hangs" at those times that people were complaining about, but we didn't catch the problem. However, the errors hit big time yesterday at 14:27... see here:
http://www.eecs.yorku.ca/~jas/ovirt-debug/messages-20171001
If you want any other logs, I'm happy to provide them. I just don't know exactly what to provide.
Do you know if I can run the XFS defrag command live? Rather than doing it disk by disk, I'd rather just run it on the whole filesystem. There really aren't that many files, since it's just oVirt disk images. However, I don't understand the implications for running VMs. I wouldn't want to do anything to create more downtime.
Should be enough to copy the disks to make them less fragmented.
Yes, but this requires downtime... there's plenty of additional storage, though, so this would fix things well.
Live storage migration could be used. Y.
I had upgraded the engine server plus 4 virtualization hosts from 4.1.1 to current on September 20, along with upgrading them from CentOS 7.3 to CentOS 7.4. virtfs, the NFS file server, was running CentOS 7.3 and kernel vmlinuz-3.10.0-514.16.1.el7.x86_64. Only yesterday did I upgrade it to CentOS 7.4 and hence kernel vmlinuz-3.10.0-693.2.2.el7.x86_64.
I believe the problem is fully XFS related, and not oVirt at all. Although, I must admit, oVirt didn't help either. When I rebooted the file server, the ISO and export domains were immediately active, but the data domain took quite a long time. I kept trying to activate it, and it couldn't do it. I couldn't make a host the SPM. I found that the data domain directory on the virtualization host was a "stale NFS file handle". I rebooted one of the virtualization hosts (virt1) and tried to make it the SPM. Again, it wouldn't work. Finally, I ended up putting everything into maintenance mode, then activating just that host, and I was able to make it the SPM. I was then able to bring everything up. I would have expected oVirt to handle the problem a little more gracefully and give me more information, because I was sweating thinking I had to restore all the VMs!
Stale NFS is on our todo list to handle. Quite challenging.
Thanks..
I didn't think when I chose XFS as the filesystem for my virtualization NFS server that I would have to defragment the filesystem manually. This is like the old days of running Norton SpeedDisk to defrag my 386...
We are still not convinced it's an issue - but we'll look into it (and perhaps ask for more stats and data).
Thanks!
Y.
Thanks for any help you can provide...
Jason.
All 4 virtualization hosts of course had problems since there was no longer any storage.
In the end, it seems like the problem is related to XFS fragmentation...
I read this great blog here:
https://blog.codecentric.de/en/2017/04/xfs-possible-memory-allocation-deadlock-kmem_alloc/
In short, I tried this:
# xfs_db -r -c "frag -f" /dev/sdb1
actual 4314253, ideal 43107, fragmentation factor 99.00%
Apparently the fragmentation factor doesn't mean much, but the fact that the "actual" number of extents is considerably higher than the "ideal" number suggests that it may be the problem.
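(For what it's worth, the factor appears to be computed as (actual - ideal) / actual, so here (4314253 - 43107) / 4314253 ≈ 0.99, i.e. 99.00%. It saturates near 100% as soon as "actual" dwarfs "ideal", which is why the percentage alone isn't very informative.)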
I saw that many of my virtual disks that are written to a lot have, of course, a lot of extents...
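(To count extents on an individual image, something like this should work; the image path is a placeholder:

# xfs_bmap <disk-image> | tail -n +2 | wc -l

or just "filefrag <disk-image>", which reports the extent count directly. Note xfs_bmap also lists holes, so the count is approximate.)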
For example, on our main web server disk image, there were 247,597 extents alone! I took the web server down, and ran the XFS defrag command on the disk...
# xfs_fsr -v 9a634692-1302-471f-a92e-c978b2b67fd0
9a634692-1302-471f-a92e-c978b2b67fd0
extents before:247597 after:429 DONE 9a634692-1302-471f-a92e-c978b2b67fd0
247,597 before and 429 after! WOW!
Are virtual disks a problem with XFS? Why isn't this memory allocation deadlock issue more prevalent? I do see this article mentioned in many web posts. I don't specifically see any recommendation *not* to use XFS for the data domain, though.
I was running CentOS 7.3 on the file server, but before rebooting the server, I upgraded to the latest kernel and CentOS 7.4 in the hope that, if there was a kernel issue, this would solve it.
I took a few virtual systems down, and ran the defrag on the disks. However, with over 30 virtual systems, I don't really want to do this individually. I was wondering if I could run xfs_fsr on all the disks LIVE? It says in the manual that you can run it live, but I can't see how this would be good when a system is using that disk, and I don't want to deal with major corruption across the board. Any thoughts?
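(From the xfs_fsr man page, it looks like you can point it at the whole device and bound the run time, e.g.:

# xfs_fsr -v -t 7200 /dev/sdb1

where -t limits the run to 7200 seconds. But whether that's safe underneath running VMs is exactly what I'm unsure about.)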
Thanks,
Jason.
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users