<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br>
    <div class="moz-cite-prefix">On 10/02/2017 11:05 AM, Jason Keltz
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:b3e2ab23-7bab-ddc2-e230-a650f87a1773@cse.yorku.ca">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div class="moz-cite-prefix">On 10/02/2017 11:00 AM, Yaniv Kaul
        wrote:<br>
      </div>
      <blockquote type="cite"
cite="mid:CAJgorsb2ctuEaTpNkzvixsDSjF-_ABH6JDMgw5X03WUgZgbo2A@mail.gmail.com">
        <div dir="ltr"><br>
          <div class="gmail_extra"><br>
            <div class="gmail_quote">On Mon, Oct 2, 2017 at 5:57 PM,
              Jason Keltz <span dir="ltr">&lt;<a
                  href="mailto:jas@cse.yorku.ca" target="_blank"
                  moz-do-not-send="true">jas@cse.yorku.ca</a>&gt;</span>
              wrote:<br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div text="#000000" bgcolor="#FFFFFF"><span class=""> <br>
                    <div class="m_3456688468548054330moz-cite-prefix">On
                      10/02/2017 10:51 AM, Yaniv Kaul wrote:<br>
                    </div>
                    <blockquote type="cite">
                      <div dir="ltr"><br>
                        <div class="gmail_extra"><br>
                          <div class="gmail_quote">On Mon, Oct 2, 2017
                            at 5:14 PM, Jason Keltz <span dir="ltr">&lt;<a
                                href="mailto:jas@cse.yorku.ca"
                                target="_blank" moz-do-not-send="true">jas@cse.yorku.ca</a>&gt;</span>
                            wrote:<br>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex">
                              <div text="#000000" bgcolor="#FFFFFF"><span>
                                  <br>
                                  <div
                                    class="m_3456688468548054330m_-6564063642909371047moz-cite-prefix">On
                                    10/02/2017 01:22 AM, Yaniv Kaul
                                    wrote:<br>
                                  </div>
                                  <blockquote type="cite">
                                    <div dir="ltr"><br>
                                      <div class="gmail_extra"><br>
                                        <div class="gmail_quote">On Mon,
                                          Oct 2, 2017 at 5:11 AM, Jason
                                          Keltz <span dir="ltr">&lt;<a
href="mailto:jas@cse.yorku.ca" target="_blank" moz-do-not-send="true">jas@cse.yorku.ca</a>&gt;</span>
                                          wrote:<br>
                                          <blockquote
                                            class="gmail_quote"
                                            style="margin:0 0 0
                                            .8ex;border-left:1px #ccc
                                            solid;padding-left:1ex">Hi.<br>
                                            <br>
                                            For my data domain, I have
                                            one NFS server with a large
                                            RAID filesystem (9 TB).<br>
                                            I'm only using 2 TB of that
                                            at the moment. Today, my NFS
                                            server  hung with<br>
                                            the following error:<br>
                                            <br>
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              xfs: possible memory
                                              allocation deadlock in
                                              kmem_alloc<br>
                                            </blockquote>
                                          </blockquote>
                                          <div><br>
                                          </div>
                                          <div>Can you share more of the
                                            log so we'll see what
                                            happened before and after?</div>
                                          <div>Y.</div>
                                        </div>
                                      </div>
                                    </div>
                                  </blockquote>
                                </span><span class="">
                                  <blockquote type="cite">
                                    <div dir="ltr">
                                      <div class="gmail_extra">
                                        <div class="gmail_quote">
                                          <blockquote
                                            class="gmail_quote"
                                            style="margin:0 0 0
                                            .8ex;border-left:1px #ccc
                                            solid;padding-left:1ex">
                                            <div text="#000000"
                                              bgcolor="#FFFFFF"> <br>
                                              Here is the engine log from
                                              yesterday. The problem
                                              started around 14:29.<br>
                                              <a
                                                class="m_3456688468548054330m_-6564063642909371047moz-txt-link-freetext"
href="http://www.eecs.yorku.ca/%7Ejas/ovirt-debug/10012017/engine-log.txt"
                                                target="_blank"
                                                moz-do-not-send="true">http://www.eecs.yorku.ca/~jas/<wbr>ovirt-debug/10012017/engine-lo<wbr>g.txt</a><br>
                                              <br>
                                              Here is the vdsm log on
                                              one of the virtualization
                                              hosts, virt01:<br>
                                              <a
                                                class="m_3456688468548054330m_-6564063642909371047moz-txt-link-freetext"
href="http://www.eecs.yorku.ca/%7Ejas/ovirt-debug/10012017/vdsm.log.2"
                                                target="_blank"
                                                moz-do-not-send="true">http://www.eecs.yorku.ca/~jas/<wbr>ovirt-debug/10012017/vdsm.log.<wbr>2</a><br>
                                              <br>
                                              Doing further
                                              investigation, I found
                                              that the XFS error
                                              messages didn't start
                                              yesterday.  You'll see
                                              they started at the very
                                              end of the day on
                                              September 23.  See:<br>
                                              <br>
                                              <a
                                                class="m_3456688468548054330m_-6564063642909371047moz-txt-link-freetext"
href="http://www.eecs.yorku.ca/%7Ejas/ovirt-debug/messages-20170924"
                                                target="_blank"
                                                moz-do-not-send="true">http://www.eecs.yorku.ca/~jas/<wbr>ovirt-debug/messages-20170924</a>
                                              <br>
                                            </div>
                                          </blockquote>
                                          <div><br>
                                          </div>
                                          <div>Our storage guys do NOT
                                            think it's an XFS
                                            fragmentation issue, but
                                            we'll be looking at it.</div>
                                          <div> </div>
                                        </div>
                                      </div>
                                    </div>
                                  </blockquote>
                                </span></div>
                            </blockquote>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </span></div>
              </blockquote>
            </div>
          </div>
        </div>
      </blockquote>
    </blockquote>
    This is an interesting thread to read because the problem sounds
    quite similar:<br>
    <br>
    <a class="moz-txt-link-freetext" href="http://oss.sgi.com/archives/xfs/2016-03/msg00447.html">http://oss.sgi.com/archives/xfs/2016-03/msg00447.html</a><br>
    <br>
    In particular, quoted from that:<br>
    <blockquote type="cite">
      <pre>XFS maintains the full extent list for an active inode in memory,</pre>
    </blockquote>
    <blockquote type="cite">
      <pre>As it is, yes, the memory allocation problem is with the in-core
extent tree, and we've known about it for some time. The issue is
that as memory gets fragmented, the top level indirection array
grows too large to be allocated as a contiguous chunk. When this
happens really depends on memory load, uptime and the way the extent
tree is being modified.
</pre>
    </blockquote>
    <br>
    So in my case, I have a bunch of big virtual disk images stored on
    XFS.  Because the files are large and have many extents, keeping all
    of that extent information in memory at the same time may be the
    culprit.  Having many extents per se isn't the problem, but having
    enough memory to store all of that information simultaneously may
    be.  Possible solutions would be to increase the default extent size
    of the volume (which I'm not sure how to do), defragment the disk so
    that there are fewer extents, or potentially add more memory to the
    file server, which has 64G.<br>
    <br>
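    To put rough numbers on the memory side of this, here is a small
    Python sketch of the arithmetic.  It assumes about 16 bytes per
    in-memory extent record; that figure is an assumption for
    illustration and is not stated anywhere in this thread, and the
    names ASSUMED_BYTES_PER_EXTENT, extent_list_bytes and to_mib are
    hypothetical.  The extent counts are the ones reported further down
    in the quoted message.<br>
    <pre>
```python
# Assumption: each in-memory extent record costs about 16 bytes.
# That figure is NOT from the thread; adjust it for your kernel.
ASSUMED_BYTES_PER_EXTENT = 16


def extent_list_bytes(extent_count, bytes_per_extent=ASSUMED_BYTES_PER_EXTENT):
    """Approximate memory needed to hold an extent list of the given length."""
    return extent_count * bytes_per_extent


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


# Extent counts reported in the quoted message below.
web_server_image = 247597   # one disk image, before running xfs_fsr
whole_filesystem = 4314253  # "actual" value from xfs_db frag

for label, count in (("web server image", web_server_image),
                     ("whole filesystem", whole_filesystem)):
    size = to_mib(extent_list_bytes(count))
    print(f"{label}: {count} extents, about {size:.1f} MiB")
```
</pre>
    If that assumption is roughly right, the totals are modest next to
    64G, so the point in the quote above about needing one contiguous
    chunk of fragmented memory may matter more than the total amount.<br>
    <br>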
    <blockquote type="cite"
      cite="mid:b3e2ab23-7bab-ddc2-e230-a650f87a1773@cse.yorku.ca">
      <blockquote type="cite"
cite="mid:CAJgorsb2ctuEaTpNkzvixsDSjF-_ABH6JDMgw5X03WUgZgbo2A@mail.gmail.com">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div text="#000000" bgcolor="#FFFFFF"><span class="">
                    <blockquote type="cite">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex">
                              <div text="#000000" bgcolor="#FFFFFF"><span
                                  class="">
                                  <blockquote type="cite">
                                    <div dir="ltr">
                                      <div class="gmail_extra">
                                        <div class="gmail_quote"> </div>
                                      </div>
                                    </div>
                                  </blockquote>
                                </span> Hmmm... almost sorry to hear
                                that because that would be easy to
                                "fix"...  <br>
                                <span class=""> <br>
                                  <blockquote type="cite">
                                    <div dir="ltr">
                                      <div class="gmail_extra">
                                        <div class="gmail_quote">
                                          <blockquote
                                            class="gmail_quote"
                                            style="margin:0 0 0
                                            .8ex;border-left:1px #ccc
                                            solid;padding-left:1ex">
                                            <div text="#000000"
                                              bgcolor="#FFFFFF"> <br>
                                              They continued on the
                                              24th, then on the 26th...
                                              I think there were a few
                                              "hangs" at those times
                                              that people were
                                              complaining about, but we
                                              didn't catch the problem.
                                              However, the errors hit
                                              big time yesterday at
                                              14:27. See here:<br>
                                              <br>
                                              <a
                                                class="m_3456688468548054330m_-6564063642909371047moz-txt-link-freetext"
href="http://www.eecs.yorku.ca/%7Ejas/ovirt-debug/messages-20171001"
                                                target="_blank"
                                                moz-do-not-send="true">http://www.eecs.yorku.ca/~jas/<wbr>ovirt-debug/messages-20171001</a><br>
                                              <br>
                                              If you want any other
                                              logs, I'm happy to provide
                                              them.  I just don't know
                                              exactly what to provide.<br>
                                              <br>
                                              Do you know if I can run
                                              the XFS defrag command
                                              live? Rather than going
                                              disk by disk, I'd prefer
                                              to run it on the whole
                                              filesystem.  There really
                                              aren't that many files
                                              since it's just ovirt disk
                                              images.  However, I don't
                                              understand the
                                              implications for running
                                              VMs.  I wouldn't want to
                                              do anything to create more
                                              downtime.<br>
                                            </div>
                                          </blockquote>
                                          <div><br>
                                          </div>
                                          <div>Should be enough to copy
                                            the disks to make them less
                                            fragmented.</div>
                                          <div> </div>
                                        </div>
                                      </div>
                                    </div>
                                  </blockquote>
                                </span> Yes, but this requires
                                downtime. There's plenty of
                                additional storage, though, so this
                                would fix things well.</div>
                            </blockquote>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </span></div>
              </blockquote>
              <div><br>
              </div>
              <div>Live storage migration could be used.</div>
              <div>Y.</div>
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div text="#000000" bgcolor="#FFFFFF"><span class=""><br>
                    <br>
                    <blockquote type="cite">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex">
                              <div text="#000000" bgcolor="#FFFFFF"> <br>
                                I had upgraded the engine server + 4
                                virtualization hosts from 4.1.1 to
                                current on September 20 along with
                                upgrading them from CentOS 7.3 to CentOS
                                7.4.  virtfs, the NFS file server, was
                                running CentOS 7.3 and kernel
                                vmlinuz-3.10.0-514.16.1.el7.x86_64.
                                Only yesterday did I upgrade it to
                                CentOS 7.4 and hence kernel
                                vmlinuz-3.10.0-693.2.2.el7.x86_64.<br>
                                <br>
                                I believe the problem is fully XFS
                                related, and not ovirt at all.  
                                Although, I must admit, ovirt didn't
                                help either.  When I rebooted the file
                                server, the iso and export domains were
                                immediately active, but the data domain
                                took quite a long time.  I kept trying
                                to activate it, and it couldn't do it. 
                                I couldn't make a host an SPM.  I found
                                that the data domain directory on the
                                virtualization host was a "stale NFS
                                file handle".  I rebooted one of the
                                virtualization hosts (virt1), and tried
                                to make it the SPM.  Again, it wouldn't
                                work.  Finally, I ended up putting
                                everything into maintenance mode, then
                                activating just that host, and I was
                                able to make it the SPM.  I was then
                                able to bring everything up.  I would have
                                expected ovirt to handle the problem a
                                little more gracefully, and give me more
                                information because I was sweating
                                thinking I had to restore all the VMs!<br>
                              </div>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>Stale NFS is on our todo list to
                              handle. Quite challenging.</div>
                            <div> </div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </span> Thanks..<span class=""><br>
                    <br>
                    <blockquote type="cite">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex">
                              <div text="#000000" bgcolor="#FFFFFF"> <br>
                                I didn't think when I chose XFS as the
                                filesystem for my virtualization NFS
                                server that I would have to defragment
                                the filesystem manually.  This is like
                                the old days of running Norton SpeedDisk
                                to defrag my 386...<br>
                              </div>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>We are still not convinced it's an
                              issue - but we'll look into it (and
                              perhaps ask for more stats and data).</div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </span> Thanks!
                  <div>
                    <div class="h5"><br>
                      <br>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div class="gmail_extra">
                            <div class="gmail_quote">
                              <div>Y.</div>
                              <div> </div>
                              <blockquote class="gmail_quote"
                                style="margin:0 0 0 .8ex;border-left:1px
                                #ccc solid;padding-left:1ex">
                                <div text="#000000" bgcolor="#FFFFFF"> <br>
                                  Thanks for any help you can provide...<span
                                    class="m_3456688468548054330HOEnZb"><font
                                      color="#888888"><br>
                                      <br>
                                      Jason.</font></span>
                                  <div>
                                    <div class="m_3456688468548054330h5"><br>
                                      <br>
                                      <blockquote type="cite">
                                        <div dir="ltr">
                                          <div class="gmail_extra">
                                            <div class="gmail_quote">
                                              <div> </div>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">
                                                </blockquote>
                                                <br>
                                                All 4 virtualization
                                                hosts of course had
                                                problems since there was
                                                no<br>
                                                longer any storage.<br>
                                                <br>
                                                In the end, it seems
                                                like the problem is
                                                related to XFS
                                                fragmentation...<br>
                                                <br>
                                                I read this great blog
                                                here:<br>
                                                <br>
                                                <a
href="https://blog.codecentric.de/en/2017/04/xfs-possible-memory-allocation-deadlock-kmem_alloc/"
                                                  rel="noreferrer"
                                                  target="_blank"
                                                  moz-do-not-send="true">https://blog.codecentric.de/en<wbr>/2017/04/xfs-possible-memory-a<wbr>llocation-deadlock-kmem_alloc/</a><br>
                                                <br>
                                                In short, I tried this:<br>
                                                <br>
                                                # xfs_db -r -c "frag -f"
                                                /dev/sdb1<br>
                                                actual 4314253, ideal
                                                43107, fragmentation
                                                factor 99.00%<br>
                                                <br>
                                                Apparently the
                                                fragmentation factor
                                                doesn't mean much, but
                                                the fact that the<br>
                                                "actual" number of
                                                extents is considerably
                                                higher than the "ideal"
                                                number suggests that it<br>
                                                may be the problem.<br>
                                                <br>
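                                                As a sketch of why the
                                                factor says so little,
                                                the Python below parses
                                                the summary line above
                                                and recomputes the
                                                percentage.  The formula
                                                (actual - ideal) / actual
                                                is an assumption that
                                                happens to reproduce the
                                                99.00% shown; the names
                                                parse_frag and
                                                fragmentation_factor are
                                                hypothetical.<br>
                                                <pre>
```python
import re

# Sample line copied from the xfs_db output quoted above.
FRAG_LINE = "actual 4314253, ideal 43107, fragmentation factor 99.00%"


def parse_frag(line):
    """Return (actual, ideal) extent counts from an xfs_db frag summary line."""
    match = re.search(r"actual (\d+), ideal (\d+)", line)
    if match is None:
        raise ValueError(f"unrecognised frag line: {line!r}")
    return int(match.group(1)), int(match.group(2))


def fragmentation_factor(actual, ideal):
    """Percentage of extents beyond the ideal count (assumed formula)."""
    return (actual - ideal) * 100.0 / actual


actual, ideal = parse_frag(FRAG_LINE)
print(f"factor: {fragmentation_factor(actual, ideal):.2f}%")
print(f"actual/ideal ratio: {actual / ideal:.1f}")
```
</pre>
                                                Under that formula the
                                                percentage flattens out
                                                close to 100 once
                                                "actual" is many times
                                                "ideal", so the ratio of
                                                the two counts is the
                                                more telling number.<br>
                                                <br>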
                                                I saw that many of my
                                                virtual disks that are
                                                written to a lot have,
                                                of course,<br>
                                                a lot of extents...<br>
                                                <br>
                                                For example, on our main
                                                web server disk image,
                                                there were 247,597<br>
                                                extents alone!  I took
                                                the web server down, and
                                                ran the XFS defrag<br>
                                                command on the disk...<br>
                                                <br>
                                                # xfs_fsr -v
                                                9a634692-1302-471f-a92e-c978b2<wbr>b67fd0<br>
9a634692-1302-471f-a92e-c978b2<wbr>b67fd0<br>
                                                extents before:247597
                                                after:429 DONE
                                                9a634692-1302-471f-a92e-c978b2<wbr>b67fd0<br>
                                                <br>
                                                247,597 before and 429
                                                after!  WOW!<br>
                                                <br>
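                                                For anyone collecting
                                                these numbers across
                                                many disk images, here
                                                is a minimal Python
                                                sketch that pulls the
                                                before and after counts
                                                out of an xfs_fsr -v
                                                result line like the one
                                                above.  The helper name
                                                parse_fsr is
                                                hypothetical, and the
                                                pattern assumes the line
                                                format shown in this
                                                message.<br>
                                                <pre>
```python
import re

# Line taken from the xfs_fsr -v output quoted above.
FSR_LINE = ("extents before:247597 after:429 DONE "
            "9a634692-1302-471f-a92e-c978b2b67fd0")


def parse_fsr(line):
    """Return (before, after) extent counts, or None if the line does not match."""
    match = re.search(r"extents before:(\d+) after:(\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


before, after = parse_fsr(FSR_LINE)
print(f"{before} extents reduced to {after}, "
      f"roughly {before / after:.0f}x fewer")
```
</pre>
                                                <br>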
                                                Are virtual disks a
                                                problem with XFS?  Why
                                                isn't this memory
                                                allocation<br>
                                                deadlock issue more
                                                prevalent?  I do see
                                                this article mentioned
                                                in many<br>
                                                web posts.  I don't
                                                specifically see any
                                                recommendation to *not*
                                                use<br>
                                                XFS for the data domain
                                                though.<br>
                                                <br>
                                                I was running CentOS 7.3
                                                on the file server, but
                                                before rebooting the
                                                server,<br>
                                                I upgraded to the latest
                                                kernel and CentOS 7.4 in
                                                the hope that, if there<br>
                                                was a kernel issue, this
                                                would solve it.<br>
                                                <br>
                                                I took a few virtual
                                                systems down, and ran
                                                the defrag on the
                                                disks.  However,<br>
                                                with over 30 virtual
                                                systems, I don't really
                                                want to do this
                                                individually.<br>
                                                I was wondering if I
                                                could run xfs_fsr on all
                                                the disks LIVE?  It says
                                                in the<br>
                                                manual that you can run
                                                it live, but I can't see
                                                how this would be good
                                                when<br>
                                                a system is using that
                                                disk, and I don't want
                                                to deal with major<br>
                                                corruption across the
                                                board. Any thoughts?<br>
                                                <br>
                                                Thanks,<br>
                                                <br>
                                                Jason.<br>
                                                 <br>
______________________________<wbr>_________________<br>
                                                Users mailing list<br>
                                                <a
                                                  href="mailto:Users@ovirt.org"
                                                  target="_blank"
                                                  moz-do-not-send="true">Users@ovirt.org</a><br>
                                                <a
                                                  href="http://lists.ovirt.org/mailman/listinfo/users"
                                                  rel="noreferrer"
                                                  target="_blank"
                                                  moz-do-not-send="true">http://lists.ovirt.org/mailman<wbr>/listinfo/users</a><br>
                                              </blockquote>
                                            </div>
                                            <br>
                                          </div>
                                        </div>
                                      </blockquote>
                                      <br>
                                    </div>
                                  </div>
                                </div>
                              </blockquote>
                            </div>
                            <br>
                          </div>
                        </div>
                      </blockquote>
                      <br>
                    </div>
                  </div>
                </div>
              </blockquote>
            </div>
            <br>
          </div>
        </div>
      </blockquote>
      <br>
      <br>
    </blockquote>
    <br>
  </body>
</html>