<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 10/02/2017 11:00 AM, Yaniv Kaul
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAJgorsb2ctuEaTpNkzvixsDSjF-_ABH6JDMgw5X03WUgZgbo2A@mail.gmail.com">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Mon, Oct 2, 2017 at 5:57 PM, Jason
            Keltz <span dir="ltr">&lt;<a href="mailto:jas@cse.yorku.ca"
                target="_blank" moz-do-not-send="true">jas@cse.yorku.ca</a>&gt;</span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF"><span class=""> <br>
                  <div class="m_3456688468548054330moz-cite-prefix">On
                    10/02/2017 10:51 AM, Yaniv Kaul wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr"><br>
                      <div class="gmail_extra"><br>
                        <div class="gmail_quote">On Mon, Oct 2, 2017 at
                          5:14 PM, Jason Keltz <span dir="ltr">&lt;<a
                              href="mailto:jas@cse.yorku.ca"
                              target="_blank" moz-do-not-send="true">jas@cse.yorku.ca</a>&gt;</span>
                          wrote:<br>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                            <div text="#000000" bgcolor="#FFFFFF"><span>
                                <br>
                                <div
                                  class="m_3456688468548054330m_-6564063642909371047moz-cite-prefix">On
                                  10/02/2017 01:22 AM, Yaniv Kaul wrote:<br>
                                </div>
                                <blockquote type="cite">
                                  <div dir="ltr"><br>
                                    <div class="gmail_extra"><br>
                                      <div class="gmail_quote">On Mon,
                                        Oct 2, 2017 at 5:11 AM, Jason
                                        Keltz <span dir="ltr">&lt;<a
                                            href="mailto:jas@cse.yorku.ca"
                                            target="_blank"
                                            moz-do-not-send="true">jas@cse.yorku.ca</a>&gt;</span>
                                        wrote:<br>
                                        <blockquote class="gmail_quote"
                                          style="margin:0 0 0
                                          .8ex;border-left:1px #ccc
                                          solid;padding-left:1ex">Hi.<br>
                                          <br>
                                          For my data domain, I have one
                                          NFS server with a large RAID
                                          filesystem (9 TB).<br>
                                          I'm only using 2 TB of that at
                                          the moment. Today, my NFS
                                          server  hung with<br>
                                          the following error:<br>
                                          <br>
                                          <blockquote
                                            class="gmail_quote"
                                            style="margin:0 0 0
                                            .8ex;border-left:1px #ccc
                                            solid;padding-left:1ex">
                                            xfs: possible memory
                                            allocation deadlock in
                                            kmem_alloc<br>
                                          </blockquote>
                                        </blockquote>
                                        <div><br>
                                        </div>
                                        <div>Can you share more of the
                                          log so we'll see what happened
                                          before and after?</div>
                                        <div>Y.</div>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                              </span><span class="">
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div class="gmail_extra">
                                      <div class="gmail_quote">
                                        <blockquote class="gmail_quote"
                                          style="margin:0 0 0
                                          .8ex;border-left:1px #ccc
                                          solid;padding-left:1ex">
                                          <div text="#000000"
                                            bgcolor="#FFFFFF"> <br>
                                            Here is the engine log from
                                            yesterday. The problem
                                            started around 14:29.<br>
                                            <a
                                              class="m_3456688468548054330m_-6564063642909371047moz-txt-link-freetext"
href="http://www.eecs.yorku.ca/%7Ejas/ovirt-debug/10012017/engine-log.txt"
                                              target="_blank"
                                              moz-do-not-send="true">http://www.eecs.yorku.ca/~jas/<wbr>ovirt-debug/10012017/engine-lo<wbr>g.txt</a><br>
                                            <br>
                                            Here is the vdsm log on one
                                            of the virtualization hosts,
                                            virt01:<br>
                                            <a
                                              class="m_3456688468548054330m_-6564063642909371047moz-txt-link-freetext"
href="http://www.eecs.yorku.ca/%7Ejas/ovirt-debug/10012017/vdsm.log.2"
                                              target="_blank"
                                              moz-do-not-send="true">http://www.eecs.yorku.ca/~jas/<wbr>ovirt-debug/10012017/vdsm.log.<wbr>2</a><br>
                                            <br>
                                            Doing further investigation,
                                            I found that the XFS error
                                            messages didn't start
                                            yesterday.  You'll see they
                                            started at the very end of
                                            the day on September 23. 
                                            See:<br>
                                            <br>
                                            <a
                                              class="m_3456688468548054330m_-6564063642909371047moz-txt-link-freetext"
href="http://www.eecs.yorku.ca/%7Ejas/ovirt-debug/messages-20170924"
                                              target="_blank"
                                              moz-do-not-send="true">http://www.eecs.yorku.ca/~jas/<wbr>ovirt-debug/messages-20170924</a>
                                            <br>
                                          </div>
                                        </blockquote>
                                        <div><br>
                                        </div>
                                        <div>Our storage guys do NOT
                                          think it's an XFS
                                          fragmentation issue, but we'll
                                          be looking at it.</div>
                                        <div> </div>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                              </span> Hmmm... almost sorry to hear that
                              because that would be easy to "fix"...  <br>
                              <span class=""> <br>
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div class="gmail_extra">
                                      <div class="gmail_quote">
                                        <blockquote class="gmail_quote"
                                          style="margin:0 0 0
                                          .8ex;border-left:1px #ccc
                                          solid;padding-left:1ex">
                                          <div text="#000000"
                                            bgcolor="#FFFFFF"> <br>
                                            They continued on the 24th,
                                            then on the 26th. I think
                                            there were a few "hangs"
                                            around those times that
                                            people complained about, but
                                            we didn't catch the problem. 
                                            However, the errors hit hard
                                            yesterday at 14:27; see
                                            here:<br>
                                            <br>
                                            <a
                                              class="m_3456688468548054330m_-6564063642909371047moz-txt-link-freetext"
href="http://www.eecs.yorku.ca/%7Ejas/ovirt-debug/messages-20171001"
                                              target="_blank"
                                              moz-do-not-send="true">http://www.eecs.yorku.ca/~jas/<wbr>ovirt-debug/messages-20171001</a><br>
                                            <br>
                                            If you want any other logs,
                                            I'm happy to provide them. 
                                            I just don't know exactly
                                            what to provide.<br>
                                            <br>
                                            Do you know if I can run the
                                            XFS defrag command live?
                                            Rather than going disk by
                                            disk, I'd rather just run it
                                            on the whole filesystem. 
                                            There really aren't that
                                            many files since it's just
                                            ovirt disk images.  However,
                                            I don't understand the
                                            implications for running
                                            VMs.  I wouldn't want to do
                                            anything to create more
                                            downtime.<br>
                                          </div>
                                        </blockquote>
                                        <div><br>
                                        </div>
                                        <div>Should be enough to copy
                                          the disks to make them less
                                          fragmented.</div>
                                        <div> </div>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                              </span> Yes, but this requires downtime.
                              There's plenty of additional storage,
                              though, so this would fix things well.</div>
                          </blockquote>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </span></div>
            </blockquote>
            <div><br>
            </div>
            <div>Live storage migration could be used.</div>
            <div>Y.</div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF"><span class=""><br>
                  <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                            <div text="#000000" bgcolor="#FFFFFF"> <br>
                              I had upgraded the engine server and the 4
                              virtualization hosts from 4.1.1 to current
                              on September 20, along with upgrading them
                              from CentOS 7.3 to CentOS 7.4.  virtfs,
                              the NFS file server, was running CentOS
                              7.3 and kernel
                              vmlinuz-3.10.0-514.16.1.el7.x86_64. 
                              Only yesterday did I upgrade it to CentOS
                              7.4 and hence kernel
                              vmlinuz-3.10.0-693.2.2.el7.x86_64.<br>
                              <br>
                              I believe the problem is entirely XFS
                              related, and not ovirt at all.  I must
                              admit, though, that ovirt didn't help
                              either.  When I rebooted the file server,
                              the iso and export domains were
                              immediately active, but the data domain
                              took quite a long time.  I kept trying to
                              activate it, without success, and I
                              couldn't make any host the SPM.  I found
                              that the data domain directory on the
                              virtualization host was a "stale NFS file
                              handle".  I rebooted one of the
                              virtualization hosts (virt1), and tried to
                              make it the SPM.  Again, it wouldn't
                              work.  Finally, I ended up putting
                              everything into maintenance mode, then
                              activating just that host, and I was able
                              to make it the SPM.  I was then able to
                              bring everything up.  I would have
                              expected ovirt to handle the problem a
                              little more gracefully and give me more
                              information, because I was sweating
                              thinking I had to restore all the VMs!<br>
                            </div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>Handling stale NFS file handles is on our
                            to-do list. It's quite challenging.</div>
                          <div> </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </span> Thanks..<span class=""><br>
                  <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                            <div text="#000000" bgcolor="#FFFFFF"> <br>
                              When I chose XFS as the filesystem for my
                              virtualization NFS server, I didn't think
                              I would have to defragment the filesystem
                              manually.  This is like the old days of
                              running Norton SpeedDisk to defrag my
                              386...<br>
                            </div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>We are still not convinced it's an issue
                            - but we'll look into it (and perhaps ask
                            for more stats and data).</div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </span> Thanks!
                <div>
                  <div class="h5"><br>
                    <br>
                    <blockquote type="cite">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div>Y.</div>
                            <div> </div>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex">
                              <div text="#000000" bgcolor="#FFFFFF"> <br>
                                Thanks for any help you can provide...<span
                                  class="m_3456688468548054330HOEnZb"><font
                                    color="#888888"><br>
                                    <br>
                                    Jason.</font></span>
                                <div>
                                  <div class="m_3456688468548054330h5"><br>
                                    <br>
                                    <blockquote type="cite">
                                      <div dir="ltr">
                                        <div class="gmail_extra">
                                          <div class="gmail_quote">
                                            <div> </div>
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              <br>
                                              All 4 virtualization hosts
                                              of course had problems
                                              since there was no<br>
                                              longer any storage.<br>
                                              <br>
                                              In the end, it seems like
                                              the problem is related to
                                              XFS fragmentation...<br>
                                              <br>
                                              I read this great blog
                                              here:<br>
                                              <br>
                                              <a
href="https://blog.codecentric.de/en/2017/04/xfs-possible-memory-allocation-deadlock-kmem_alloc/"
                                                rel="noreferrer"
                                                target="_blank"
                                                moz-do-not-send="true">https://blog.codecentric.de/en<wbr>/2017/04/xfs-possible-memory-a<wbr>llocation-deadlock-kmem_alloc/</a><br>
                                              <br>
                                              In short, I tried this:<br>
                                              <br>
                                              # xfs_db -r -c "frag -f"
                                              /dev/sdb1<br>
                                              actual 4314253, ideal
                                              43107, fragmentation
                                              factor 99.00%<br>
                                              <br>
                                              Apparently the
                                              fragmentation factor
                                              doesn't mean much, but the
                                              fact that the<br>
                                              "actual" number of extents
                                              is considerably higher
                                              than the "ideal" number
                                              suggests that this<br>
                                              may be the problem.<br>
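                                              <br>
                                              A rough sketch of why the
                                              factor says so little,
                                              assuming xfs_db computes it
                                              as (actual - ideal) /
                                              actual and prints the
                                              summary line in the format
                                              shown above. The function
                                              names here are made up for
                                              illustration. The factor
                                              already reaches 90% at an
                                              average of 10 extents per
                                              file, so the plain
                                              actual/ideal ratio is
                                              probably the more useful
                                              number to watch:<br>
<pre>
```python
import re

# Assumed format of the summary line printed by: xfs_db -r -c "frag -f" DEVICE
FRAG_LINE = re.compile(r"actual (\d+), ideal (\d+)")


def parse_frag(line):
    """Return (actual, ideal) extent counts from an xfs_db frag summary line."""
    match = FRAG_LINE.search(line)
    if match is None:
        raise ValueError("not an xfs_db frag summary line: %r" % line)
    return int(match.group(1)), int(match.group(2))


def frag_factor(actual, ideal):
    """Percentage of extents beyond the ideal count (assumed xfs_db formula)."""
    if actual == 0:
        return 0.0
    return (actual - ideal) * 100.0 / actual


def extent_ratio(actual, ideal):
    """How many times more extents exist than the ideal layout would need."""
    if ideal == 0:
        return 0.0
    return actual / ideal


if __name__ == "__main__":
    actual, ideal = parse_frag(
        "actual 4314253, ideal 43107, fragmentation factor 99.00%")
    print("factor %.2f%%" % frag_factor(actual, ideal))
    print("ratio  %.2f" % extent_ratio(actual, ideal))
    # The factor saturates quickly: 10 extents per file is already 90%.
    print("10 extents/file:  %.2f%%" % frag_factor(10, 1))
    print("100 extents/file: %.2f%%" % frag_factor(100, 1))
```
</pre>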
                                              <br>
                                              I saw that many of my
                                              virtual disks that are
                                              written to a lot have, of
                                              course,<br>
                                              a lot of extents...<br>
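                                              <br>
                                              To find the worst images
                                              without checking each one
                                              by hand, something like
                                              the following could count
                                              extents per file. This is
                                              only a sketch: it assumes
                                              xfs_bmap (from xfsprogs)
                                              prints the file name on
                                              the first line and then
                                              one line per extent, with
                                              holes marked "hole", and
                                              the helper names are
                                              hypothetical:<br>
<pre>
```python
import subprocess


def count_extents(bmap_output):
    """Count allocated extents in the text printed by: xfs_bmap FILE"""
    count = 0
    # The first line is assumed to be "FILE:" (or "FILE: no extents").
    for line in bmap_output.splitlines()[1:]:
        fields = line.split()
        # Extent lines are assumed to look like "0: [0..7]: 96..103";
        # unallocated ranges end in "hole" and are not counted.
        if len(fields) == 3 and fields[2] != "hole":
            count += 1
    return count


def extents_of(path):
    """Run xfs_bmap on one file; needs xfsprogs and a file on XFS."""
    result = subprocess.run(
        ["xfs_bmap", path], capture_output=True, text=True, check=True)
    return count_extents(result.stdout)
```
</pre>
                                              Calling extents_of() on
                                              each disk image and
                                              sorting by the result
                                              would show which images
                                              are worth defragmenting
                                              first.<br>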
                                              <br>
                                              For example, on our main
                                              web server disk image,
                                              there were 247,597<br>
                                              extents alone!  I took the
                                              web server down, and ran
                                              the XFS defrag<br>
                                              command on the disk...<br>
                                              <br>
                                              # xfs_fsr -v
                                              9a634692-1302-471f-a92e-c978b2b67fd0<br>
9a634692-1302-471f-a92e-c978b2b67fd0<br>
                                              extents before:247597
                                              after:429 DONE
                                              9a634692-1302-471f-a92e-c978b2b67fd0<br>
                                              <br>
                                              247,597 before and 429
                                              after!  WOW!<br>
                                              <br>
                                              Are virtual disks a
                                              problem with XFS?  Why
                                              isn't this memory
                                              allocation<br>
                                              deadlock issue more
                                              prevalent?  I do see this
                                              article mentioned in many<br>
                                              web posts.  I don't
                                              specifically see any
                                              recommendation to *not*
                                              use<br>
                                              XFS for the data domain
                                              though.<br>
                                              <br>
                                              I was running CentOS 7.3
                                              on the file server, but
                                              before rebooting the
                                              server,<br>
                                              I upgraded to the latest
                                              kernel and CentOS 7.4 in
                                              the hope that, if there<br>
                                              was a kernel issue, this
                                              would solve it.<br>
                                              <br>
                                              I took a few virtual
                                              systems down, and ran the
                                              defrag on the disks. 
                                              However,<br>
                                              with over 30 virtual
                                              systems, I don't really
                                              want to do this
                                              individually.<br>
                                              I was wondering if I could
                                              run xfs_fsr on all the
                                              disks LIVE?  It says in
                                              the<br>
                                              manual that you can run it
                                              live, but I can't see how
                                              this would be good when<br>
                                              a system is using that
                                              disk, and I don't want to
                                              deal with major<br>
                                              corruption across the
                                              board. Any thoughts?<br>
                                              <br>
                                              Thanks,<br>
                                              <br>
                                              Jason.<br>
                                               <br>
______________________________<wbr>_________________<br>
                                              Users mailing list<br>
                                              <a
                                                href="mailto:Users@ovirt.org"
                                                target="_blank"
                                                moz-do-not-send="true">Users@ovirt.org</a><br>
                                              <a
                                                href="http://lists.ovirt.org/mailman/listinfo/users"
                                                rel="noreferrer"
                                                target="_blank"
                                                moz-do-not-send="true">http://lists.ovirt.org/mailman<wbr>/listinfo/users</a><br>
                                            </blockquote>
                                          </div>
                                          <br>
                                        </div>
                                      </div>
                                    </blockquote>
                                    <br>
                                  </div>
                                </div>
                              </div>
                            </blockquote>
                          </div>
                          <br>
                        </div>
                      </div>
                    </blockquote>
                    <br>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>