<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<div class="moz-cite-prefix">On 10/02/2017 11:05 AM, Jason Keltz
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:b3e2ab23-7bab-ddc2-e230-a650f87a1773@cse.yorku.ca">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div class="moz-cite-prefix">On 10/02/2017 11:00 AM, Yaniv Kaul
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAJgorsb2ctuEaTpNkzvixsDSjF-_ABH6JDMgw5X03WUgZgbo2A@mail.gmail.com">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Oct 2, 2017 at 5:57 PM,
Jason Keltz <span dir="ltr"><<a
href="mailto:jas@cse.yorku.ca" target="_blank"
moz-do-not-send="true">jas@cse.yorku.ca</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span class=""> <br>
<div class="m_3456688468548054330moz-cite-prefix">On
10/02/2017 10:51 AM, Yaniv Kaul wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Oct 2, 2017
at 5:14 PM, Jason Keltz <span dir="ltr"><<a
href="mailto:jas@cse.yorku.ca"
target="_blank" moz-do-not-send="true">jas@cse.yorku.ca</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span>
<br>
<div
class="m_3456688468548054330m_-6564063642909371047moz-cite-prefix">On
10/02/2017 01:22 AM, Yaniv Kaul
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon,
Oct 2, 2017 at 5:11 AM, Jason
Keltz <span dir="ltr"><<a
href="mailto:jas@cse.yorku.ca" target="_blank" moz-do-not-send="true">jas@cse.yorku.ca</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">Hi.<br>
<br>
For my data domain, I have
one NFS server with a large
RAID filesystem (9 TB).<br>
I'm only using 2 TB of that
at the moment. Today, my NFS
server hung with<br>
the following error:<br>
<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
xfs: possible memory
allocation deadlock in
kmem_alloc<br>
</blockquote>
</blockquote>
<div><br>
</div>
<div>Can you share more of the
log so we'll see what
happened before and after?</div>
<div>Y.</div>
</div>
</div>
</div>
</blockquote>
</span><span class="">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000"
bgcolor="#FFFFFF"> <br>
Here is the engine log from yesterday; the problem started around 14:29.<br>
<a href="http://www.eecs.yorku.ca/%7Ejas/ovirt-debug/10012017/engine-log.txt"
target="_blank">http://www.eecs.yorku.ca/~jas/ovirt-debug/10012017/engine-log.txt</a><br>
<br>
Here is the vdsm log on
one of the virtualization
hosts, virt01:<br>
<a href="http://www.eecs.yorku.ca/%7Ejas/ovirt-debug/10012017/vdsm.log.2"
target="_blank">http://www.eecs.yorku.ca/~jas/ovirt-debug/10012017/vdsm.log.2</a><br>
<br>
Doing further
investigation, I found
that the XFS error
messages didn't start
yesterday. You'll see
they started at the very
end of the day on
September 23. See:<br>
<br>
<a href="http://www.eecs.yorku.ca/%7Ejas/ovirt-debug/messages-20170924"
target="_blank">http://www.eecs.yorku.ca/~jas/ovirt-debug/messages-20170924</a>
<br>
</div>
</blockquote>
<div><br>
</div>
<div>Our storage guys do NOT
think it's an XFS
fragmentation issue, but
we'll be looking at it.</div>
<div> </div>
</div>
</div>
</div>
</blockquote>
</span></div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</span></div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</blockquote>
This thread is an interesting read because the problem sounds
quite similar:<br>
<br>
<a class="moz-txt-link-freetext" href="http://oss.sgi.com/archives/xfs/2016-03/msg00447.html">http://oss.sgi.com/archives/xfs/2016-03/msg00447.html</a><br>
<br>
In particular, quoted from that:<br>
<blockquote type="cite">
<pre>XFS maintains the full extent list for an active inode in memory,</pre>
</blockquote>
<blockquote type="cite">
<pre>As it is, yes, the memory allocation problem is with the in-core
extent tree, and we've known about it for some time. The issue is
that as memory gets fragmented, the top level indirection array
grows too large to be allocated as a contiguous chunk. When this
happens really depends on memory load, uptime and the way the extent
tree is being modified.
</pre>
</blockquote>
<br>
So in my case, I have a number of large disk images on XFS backing the
virtual disks. Since the files are large and have many extents, keeping
all of that extent information in memory at once may be the culprit.
Having many extents per se isn't the problem; not being able to allocate
enough contiguous memory to hold all of that information simultaneously
may be. Possible solutions would be to increase the default extent size
on the volume (which I'm not sure how to do; one possibility is sketched
below), to defragment the disk images so they have fewer extents, or to
add more memory to the file server, which currently has 64G. <br>
<br>
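On the extent-size idea: this is only a rough sketch of what I might try, not something I have tested here, and the paths and the 32 MB value below are placeholders. XFS supports per-file and per-directory extent size hints via xfs_io, and a hint set on a directory is inherited by files created under it afterwards, so new disk images would be allocated in larger chunks:<br>
<pre># Set a 32 MB extent size hint on the directory that holds the disk
# images (placeholder path); files created in it later inherit the hint.
xfs_io -c "extsize 32m" /exports/data/images

# Read the hint back to confirm it took effect.
xfs_io -c "extsize" /exports/data/images

# The hint does not change existing files; those would still need a
# defrag pass (or a copy) to reduce their extent counts, for example:
xfs_fsr -v /exports/data/images/IMAGE_UUID/VOLUME_UUID
</pre>
The hint only affects future allocations, so it would mainly help images that are still growing or being rewritten; it does nothing for the fragmentation that is already there.<br>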
<blockquote type="cite"
cite="mid:b3e2ab23-7bab-ddc2-e230-a650f87a1773@cse.yorku.ca">
<blockquote type="cite"
cite="mid:CAJgorsb2ctuEaTpNkzvixsDSjF-_ABH6JDMgw5X03WUgZgbo2A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span class="">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span
class="">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote"> </div>
</div>
</div>
</blockquote>
</span> Hmmm... I'm almost sorry to
hear that, because that would be easy
to "fix"... <br>
<span class=""> <br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000"
bgcolor="#FFFFFF"> <br>
They continued on the
24th, then on the 26th...
I think there were a few
"hangs" at those times
that people complained
about, but we didn't
catch the problem.
However, the errors hit
big time yesterday at
14:27... see here:<br>
<br>
<a href="http://www.eecs.yorku.ca/%7Ejas/ovirt-debug/messages-20171001"
target="_blank">http://www.eecs.yorku.ca/~jas/ovirt-debug/messages-20171001</a><br>
<br>
If you want any other
logs, I'm happy to provide
them. I just don't know
exactly what to provide.<br>
<br>
Do you know if I can run
the XFS defrag command
live? Rather than doing it
disk by disk, I'd rather
just run it on the whole
filesystem. There really
aren't that many files,
since it's just oVirt disk
images. However, I don't
understand the
implications for running
VMs, and I wouldn't want to
do anything that creates
more downtime.<br>
</div>
</blockquote>
<div><br>
</div>
<div>Should be enough to copy
the disks to make them less
fragmented.</div>
<div> </div>
</div>
</div>
</div>
</blockquote>
</span> Yes, but that requires
downtime. There's plenty of
additional storage, though, so this
would fix things well.</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</span></div>
</blockquote>
<div><br>
</div>
<div>Live storage migration could be used.</div>
<div>Y.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> <br>
I had upgraded the engine server and the 4
virtualization hosts from 4.1.1 to
current on September 20, along with
upgrading them from CentOS 7.3 to CentOS
7.4. virtfs, the NFS file server, was
still running CentOS 7.3 with kernel
vmlinuz-3.10.0-514.16.1.el7.x86_64.
Only yesterday did I upgrade it to
CentOS 7.4 and hence kernel
vmlinuz-3.10.0-693.2.2.el7.x86_64.<br>
<br>
I believe the problem is entirely XFS
related and not oVirt at all, although
I must admit oVirt didn't help either.
When I rebooted the file server, the
iso and export domains became active
immediately, but the data domain took
quite a long time. I kept trying to
activate it, but it wouldn't activate,
and I couldn't make any host the SPM.
I found that the data domain directory
on the virtualization host was a "stale
NFS file handle". I rebooted one of the
virtualization hosts (virt1) and tried
to make it the SPM; again, it wouldn't
work. Finally, I ended up putting
everything into maintenance mode, then
activating just that host, and I was
able to make it the SPM and bring
everything back up. I would have
expected oVirt to handle the problem a
little more gracefully and give me more
information, because I was sweating,
thinking I had to restore all the VMs!<br>
</div>
</blockquote>
<div><br>
</div>
<div>Handling stale NFS is on our
to-do list. Quite challenging.</div>
<div> </div>
</div>
</div>
</div>
</blockquote>
</span> Thanks..<span class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> <br>
When I chose XFS as the filesystem
for my virtualization NFS server, I
didn't think I would have to
defragment it manually. This is like
the old days of running Norton SpeedDisk
to defrag my 386...<br>
</div>
</blockquote>
<div><br>
</div>
<div>We are still not convinced it's an
issue - but we'll look into it (and
perhaps ask for more stats and data).</div>
</div>
</div>
</div>
</blockquote>
</span> Thanks!
<div>
<div class="h5"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>Y.</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> <br>
Thanks for any help you can provide...<span
class="m_3456688468548054330HOEnZb"><font
color="#888888"><br>
<br>
Jason.</font></span>
<div>
<div class="m_3456688468548054330h5"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<br>
All 4 virtualization
hosts of course had
problems since there was
no<br>
longer any storage.<br>
<br>
In the end, it seems
like the problem is
related to XFS
fragmentation...<br>
<br>
I read this great blog
here:<br>
<br>
<a href="https://blog.codecentric.de/en/2017/04/xfs-possible-memory-allocation-deadlock-kmem_alloc/"
rel="noreferrer"
target="_blank">https://blog.codecentric.de/en/2017/04/xfs-possible-memory-allocation-deadlock-kmem_alloc/</a><br>
<br>
In short, I tried this:<br>
<br>
# xfs_db -r -c "frag -f" /dev/sdb1<br>
actual 4314253, ideal 43107, fragmentation factor 99.00%<br>
<br>
Apparently the fragmentation
factor itself doesn't mean much,
but the fact that the "actual"
number of extents is considerably
higher than the "ideal" number
suggests that this may be the
problem.<br>
<br>
I saw that many of my
virtual disks that are
written to a lot have,
of course, a lot of
extents...<br>
<br>
For example, our main
web server's disk image
alone had 247,597 extents!
I took the web server down
and ran the XFS defrag
command on the disk...<br>
<br>
# xfs_fsr -v 9a634692-1302-471f-a92e-c978b2b67fd0<br>
9a634692-1302-471f-a92e-c978b2b67fd0<br>
extents before:247597 after:429 DONE 9a634692-1302-471f-a92e-c978b2b67fd0<br>
<br>
247,597 extents before and 429 after! Wow!<br>
<br>
Are virtual disks a
problem with XFS? Why
isn't this memory
allocation deadlock issue
more prevalent? I do see
this article mentioned in
many web posts. I don't
specifically see any
recommendation *not* to
use XFS for the data
domain, though.<br>
<br>
I was running CentOS 7.3
on the file server, but
before rebooting the
server, I upgraded to
CentOS 7.4 and the latest
kernel in the hope that,
if there was a kernel
issue, this would solve it.<br>
<br>
I took a few virtual
systems down and ran the
defrag on their disks.
However, with over 30
virtual systems, I don't
really want to do this
individually. I was
wondering if I could run
xfs_fsr on all the disks
LIVE? The manual says you
can run it live, but I
can't see how that would
be safe while a system is
using the disk, and I
don't want to deal with
major corruption across
the board. Any thoughts?<br>
<br>
Thanks,<br>
<br>
Jason.<br>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
Users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Users@ovirt.org">Users@ovirt.org</a>
<a class="moz-txt-link-freetext" href="http://lists.ovirt.org/mailman/listinfo/users">http://lists.ovirt.org/mailman/listinfo/users</a>
</pre>
</blockquote>
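One more thought on the defrag question quoted above: before copying or live-migrating disks, it may be worth surveying which images are actually badly fragmented, so only the worst ones need to be touched. This is only a rough sketch; DATA_DOMAIN below is a placeholder for wherever the oVirt data domain really lives on the NFS server:<br>
<pre># Count extents per disk image with filefrag (it works on XFS via the
# FIEMAP ioctl) and list the worst offenders first.
find /exports/DATA_DOMAIN/images -type f -exec filefrag {} + \
    | sort -t: -k2 -rn | head -20

# For a single image, xfs_bmap shows the extent layout in detail:
xfs_bmap -v /exports/DATA_DOMAIN/images/IMAGE_UUID/VOLUME_UUID
</pre>
That way the handful of images with hundreds of thousands of extents can be migrated or defragmented first, instead of taking all 30+ VMs down.<br>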
<br>
</body>
</html>