[ovirt-users] xfs fragmentation problem caused data domain to hang
Jason Keltz
jas at cse.yorku.ca
Mon Oct 2 15:05:51 UTC 2017
On 10/02/2017 11:00 AM, Yaniv Kaul wrote:
>
>
> On Mon, Oct 2, 2017 at 5:57 PM, Jason Keltz <jas at cse.yorku.ca> wrote:
>
>
> On 10/02/2017 10:51 AM, Yaniv Kaul wrote:
>>
>>
>> On Mon, Oct 2, 2017 at 5:14 PM, Jason Keltz <jas at cse.yorku.ca> wrote:
>>
>>
>> On 10/02/2017 01:22 AM, Yaniv Kaul wrote:
>>>
>>>
>>> On Mon, Oct 2, 2017 at 5:11 AM, Jason Keltz <jas at cse.yorku.ca> wrote:
>>>
>>> Hi.
>>>
>>> For my data domain, I have one NFS server with a large
>>> RAID filesystem (9 TB).
>>> I'm only using 2 TB of that at the moment. Today, my NFS
>>> server hung with
>>> the following error:
>>>
>>> xfs: possible memory allocation deadlock in kmem_alloc
>>>
>>>
>>> Can you share more of the log so we'll see what happened
>>> before and after?
>>> Y.
>>>
>>>
>>> Here is the engine log from yesterday.. the problem started
>>> around 14:29.
>>> http://www.eecs.yorku.ca/~jas/ovirt-debug/10012017/engine-log.txt
>>>
>>> Here is the vdsm log on one of the virtualization hosts,
>>> virt01:
>>> http://www.eecs.yorku.ca/~jas/ovirt-debug/10012017/vdsm.log.2
>>>
>>> Doing further investigation, I found that the XFS error
>>> messages didn't start yesterday. You'll see they
>>> started at the very end of the day on September 23. See:
>>>
>>> http://www.eecs.yorku.ca/~jas/ovirt-debug/messages-20170924
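>>>
>>> (I found the earlier occurrences with a quick grep over the
>>> rotated logs on the file server, assuming the standard
>>> /var/log locations:
>>>
>>> # grep -l 'possible memory allocation deadlock' /var/log/messages*
>>>
>>> which is what pointed me at messages-20170924.)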
>>>
>>>
>>>
>>> Our storage guys do NOT think it's an XFS fragmentation
>>> issue, but we'll be looking at it.
>> Hmmm... almost sorry to hear that because that would be easy
>> to "fix"...
>>
>>>
>>> They continued on the 24th, then on the 26th... I think
>>> there were a few "hangs" at those times that people were
>>> complaining about, but we didn't catch the problem.
>>> However, the errors hit big time yesterday at 14:27...
>>> see here:
>>>
>>> http://www.eecs.yorku.ca/~jas/ovirt-debug/messages-20171001
>>>
>>> If you want any other logs, I'm happy to provide them. I
>>> just don't know exactly what to provide.
>>>
>>> Do you know if I can run the XFS defrag command live?
>>> Rather than doing it disk by disk, I'd rather just run it
>>> on the whole filesystem. There really aren't that many
>>> files since it's just ovirt disk images. However, I don't
>>> understand the implications for running VMs. I wouldn't
>>> want to do anything that creates more downtime.
>>>
>>>
>>> Should be enough to copy the disks to make them less fragmented.
>> Yes, but that requires downtime.. there's plenty of
>> additional storage, though, so this would fix things well.
>>
>
> Live storage migration could be used.
> Y.
>
>
>
>>
>> I had upgraded the engine server + 4 virtualization hosts
>> from 4.1.1 to current on September 20 along with upgrading
>> them from CentOS 7.3 to CentOS 7.4. virtfs, the NFS file
>> server, was running CentOS 7.3 and kernel
>> vmlinuz-3.10.0-514.16.1.el7.x86_64. Only yesterday did I
>> upgrade it to CentOS 7.4 and hence kernel
>> vmlinuz-3.10.0-693.2.2.el7.x86_64.
>>
>> I believe the problem is fully XFS related, and not ovirt at
>> all. Although, I must admit, ovirt didn't help either. When
>> I rebooted the file server, the iso and export domains were
>> immediately active, but the data domain took quite a long
>> time. I kept trying to activate it, and it wouldn't activate.
>> I couldn't make any host the SPM. I found that the data domain
>> directory on the virtualization host was a "stale NFS file
>> handle". I rebooted one of the virtualization hosts (virt1)
>> and tried to make it the SPM. Again, that wouldn't work.
>> Finally, I ended up putting everything into maintenance mode,
>> then activating just that host, and I was able to make it the
>> SPM. I was then able to bring everything up. I would have
>> expected ovirt to handle the problem a little more gracefully
>> and give me more information, because I was sweating, thinking
>> I had to restore all the VMs!
>>
>>
>> Stale NFS is on our todo list to handle. Quite challenging.
> Thanks..
>
>>
>> When I chose XFS as the filesystem for my virtualization NFS
>> server, I didn't think I would have to defragment the
>> filesystem manually. This is like the old days of running
>> Norton SpeedDisk to defrag my 386...
>>
>>
>> We are still not convinced it's an issue - but we'll look into it
>> (and perhaps ask for more stats and data).
> Thanks!
>
>
>> Y.
>>
>>
>> Thanks for any help you can provide...
>>
>> Jason.
>>
>>
>>>
>>> All 4 virtualization hosts of course had problems since
>>> there was no
>>> longer any storage.
>>>
>>> In the end, it seems like the problem is related to XFS
>>> fragmentation...
>>>
>>> I read this great blog here:
>>>
>>> https://blog.codecentric.de/en/2017/04/xfs-possible-memory-allocation-deadlock-kmem_alloc/
>>> <https://blog.codecentric.de/en/2017/04/xfs-possible-memory-allocation-deadlock-kmem_alloc/>
>>>
>>> In short, I tried this:
>>>
>>> # xfs_db -r -c "frag -f" /dev/sdb1
>>> actual 4314253, ideal 43107, fragmentation factor 99.00%
>>>
>>> Apparently the fragmentation factor doesn't mean much,
>>> but the fact that the "actual" number of extents is
>>> considerably higher than the "ideal" number suggests that
>>> this may be the problem.
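>>>
>>> (As far as I can tell, the factor is just
>>> (actual - ideal) / actual, i.e.
>>> (4314253 - 43107) / 4314253 = 99.0%, so it climbs toward
>>> 100% as soon as "actual" is much larger than "ideal".)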
>>>
>>> I saw that many of my virtual disks that are written to
>>> a lot have, of course,
>>> a lot of extents...
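>>>
>>> (filefrag is a quick way to get that per-file extent count.
>>> The path below is just a placeholder for wherever a disk
>>> image lives on the file server:
>>>
>>> # filefrag /path/to/disk-image
>>>
>>> xfs_bmap on the same file shows the full extent list if you
>>> want the detail.)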
>>>
>>> For example, on our main web server disk image, there
>>> were 247,597
>>> extents alone! I took the web server down, and ran the
>>> XFS defrag
>>> command on the disk...
>>>
>>> # xfs_fsr -v 9a634692-1302-471f-a92e-c978b2b67fd0
>>> 9a634692-1302-471f-a92e-c978b2b67fd0
>>> extents before:247597 after:429 DONE
>>> 9a634692-1302-471f-a92e-c978b2b67fd0
>>>
>>> 247,597 before and 429 after! WOW!
>>>
>>> Are virtual disks a problem with XFS? Why isn't this
>>> memory allocation deadlock issue more prevalent? I do see
>>> this article mentioned in many web posts, but I don't see
>>> any specific recommendation *not* to use XFS for the data
>>> domain.
>>>
>>> I was running CentOS 7.3 on the file server, but before
>>> rebooting the server, I upgraded to the latest kernel and
>>> CentOS 7.4, in the hope that if there was a kernel issue,
>>> this would solve it.
>>>
>>> I took a few virtual systems down and ran the defrag on
>>> their disks. However, with over 30 virtual systems, I
>>> don't really want to do this individually. I was wondering
>>> if I could run xfs_fsr on all the disks LIVE. The manual
>>> says you can run it live, but I can't see how this would
>>> be good while a system is using that disk, and I don't
>>> want to deal with major corruption across the board. Any
>>> thoughts?
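>>>
>>> Concretely, what I had in mind (just a sketch, using the
>>> same device as the xfs_db check above) was letting it walk
>>> the whole filesystem on the file server:
>>>
>>> # xfs_fsr -v /dev/sdb1
>>>
>>> but I'm nervous about doing that underneath live VMs, so
>>> I'd love to hear from anyone who has tried it.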
>>>
>>> Thanks,
>>>
>>> Jason.
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>
>>>
>>
>>
>
>