This host has NO VMs running on it, only 3 running cluster-wide (including
the engine, which is on its own storage):
top - 10:44:41 up 1 day, 17:10,  1 user,  load average: 15.86, 14.33, 13.39
Tasks: 381 total,   1 running, 379 sleeping,   1 stopped,   0 zombie
%Cpu(s):  2.7 us,  2.1 sy,  0.0 ni, 89.0 id,  6.1 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem : 32764284 total,   338232 free,   842324 used, 31583728 buff/cache
KiB Swap: 12582908 total, 12258660 free,   324248 used.  31076748 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
13279 root      20   0 2380708  37628   4396 S  51.7  0.1   3768:03 glusterfsd
13273 root      20   0 2233212  20460   4380 S  17.2  0.1 105:50.44 glusterfsd
13287 root      20   0 2233212  20608   4340 S   4.3  0.1  34:27.20 glusterfsd
16205 vdsm       0 -20 5048672  88940  13364 S   1.3  0.3   0:32.69 vdsmd
16300 vdsm      20   0  608488  25096   5404 S   1.3  0.1   0:05.78 python
 1109 vdsm      20   0 3127696  44228   8552 S   0.7  0.1  18:49.76 ovirt-ha-broker
25555 root      20   0       0      0      0 S   0.7  0.0   0:00.13 kworker/u64:3
   10 root      20   0       0      0      0 S   0.3  0.0   4:22.36 rcu_sched
  572 root       0 -20       0      0      0 S   0.3  0.0   0:12.02 kworker/1:1H
  797 root      20   0       0      0      0 S   0.3  0.0   1:59.59 kdmwork-253:2
  877 root       0 -20       0      0      0 S   0.3  0.0   0:11.34 kworker/3:1H
 1028 root      20   0       0      0      0 S   0.3  0.0   0:35.35 xfsaild/dm-10
 1869 root      20   0 1496472  10540   6564 S   0.3  0.0   2:15.46 python
 3747 root      20   0       0      0      0 D   0.3  0.0   0:01.21 kworker/u64:1
10979 root      15  -5  723504  15644   3920 S   0.3  0.0  22:46.27 glusterfs
15085 root      20   0  680884  10792   4328 S   0.3  0.0   0:01.13 glusterd
16102 root      15  -5 1204216  44948  11160 S   0.3  0.1   0:18.61 supervdsmd
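
For reference, a load average this high with ~89% idle CPU usually points at
processes stuck in uninterruptible (D-state) sleep waiting on disk or network
I/O, rather than at the CPU itself. A rough sketch of how to see who is
waiting (iostat and pidstat come from the sysstat package, which may need
installing first):

  # processes currently in uninterruptible sleep (D state)
  ps -eo state,pid,user,comm | awk '$1=="D"'

  # per-device utilization and await times, refreshed every 5 seconds
  iostat -x 5

  # per-process disk I/O, also from sysstat
  pidstat -d 5
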
At the moment, the engine is barely usable and my other VMs appear to be
unresponsive. Two are on one host, one on another, and none on the third.
On Sat, Jul 7, 2018 at 10:38 AM, Jim Kusznir <jim(a)palousetech.com> wrote:
I run 4-7 VMs, and most of them have 2GB RAM. I have 2 VMs with 4GB.
RAM hasn't been an issue until recent ovirt/gluster upgrades. Storage has
always been slow, especially with these drives. However, even watching
network utilization on my switch, the gig-e links never max out.
The loadavg issues and unresponsive behavior started with yesterday's
ovirt updates. I now have one VM with low I/O that lives on a separate
storage volume (data, which is fully SSD-backed, instead of data-hdd, which
was having the issues). I moved it to an ovirt host with no other VMs on it,
one that had freshly been rebooted. Before it had this one VM on it,
loadavg was <0.5. Now it's up in the 20's, with only one low-disk-I/O, 4GB
RAM VM on the host.
This tells me there's now a new problem, separate from Gluster. I don't
have any non-gluster storage available to test with. I did notice that the
last update included a new kernel, and it appears it's the qemu-kvm
processes that are now consuming far more CPU than they used to.
Are there any known issues? I'm going to reboot into my previous kernel
to see if it's kernel-caused.
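
A rough sketch of how I plan to do that, assuming the usual EL7 grub2 layout
(the entry index below is only an example; the right number has to be read
off the menu list first):

  # list installed kernel menu entries; index 0 is the newest
  awk -F\' '/^menuentry/ {print i++ " : " $2}' /etc/grub2.cfg

  # boot the previous kernel (index 1 here) for the next boot only
  # (assumes GRUB_DEFAULT=saved, the EL7 default)
  grub2-reboot 1 && reboot

  # or make it the permanent default instead
  grub2-set-default 1
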
--Jim
On Fri, Jul 6, 2018 at 11:07 PM, Johan Bernhardsson <johan(a)kafit.se>
wrote:
> That is a single SATA drive that is slow on random I/O and that has to be
> synced with 2 other servers. Gluster works synchronously, so one write has
> to be written and acknowledged on all three nodes.
>
> So you have one bottleneck in I/O on the drives and one on the network, and
> depending on how many virtual servers you have and how much RAM they take,
> you might have a memory bottleneck as well.
>
> Load spikes when you have a wait somewhere and are overusing capacity.
> But it's not only CPU that load is counted on. It is waiting for resources,
> so it can be memory or network or drives.
>
> How many virtual servers do you run and how much RAM do they consume?
>
> On July 7, 2018 09:51:42 Jim Kusznir <jim(a)palousetech.com> wrote:
>
>> In case it matters, the data-hdd gluster volume uses these hard drives:
>>
>>
>> https://www.amazon.com/gp/product/B01M1NHCZT/ref=oh_aui_detailpage_o05_s00?ie=UTF8&psc=1
>>
>> This is in a Dell R610 with PERC6/i (one drive per server, configured as
>> a single drive volume to pass it through as its own /dev/sd* device).
>> Inside the OS, it's partitioned with lvm_thin, then an LVM volume formatted
>> with XFS and mounted as /gluster/brick3, with the data-hdd volume created
>> inside that.
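>>
>> (For anyone curious, a rough sketch of how a brick laid out like that is
>> typically built -- the device, VG, and LV names below are made up for
>> illustration, not my actual ones:)
>>
>>   pvcreate /dev/sdb
>>   vgcreate gluster_vg /dev/sdb
>>   # a thin pool, then a thin LV carved out of it
>>   lvcreate -L 900G -T gluster_vg/thinpool
>>   lvcreate -V 900G -T gluster_vg/thinpool -n brick3
>>   mkfs.xfs -i size=512 /dev/gluster_vg/brick3
>>   mkdir -p /gluster/brick3
>>   mount /dev/gluster_vg/brick3 /gluster/brick3   # plus a matching fstab entry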
>>
>> --Jim
>>
>> On Fri, Jul 6, 2018 at 10:45 PM, Jim Kusznir <jim(a)palousetech.com>
>> wrote:
>>
>>> So, I'm still at a loss... It sounds like it's either insufficient
>>> ram/swap or insufficient network. It seems to be neither now. At this
>>> point, it appears that gluster is just "broke" and killing my systems
>>> for no discernible reason. Here are details, all from the same system
>>> (currently running 3 VMs):
>>>
>>> [root@ovirt3 ~]# w
>>> 22:26:53 up 36 days, 4:34, 1 user, load average: 42.78, 55.98, 53.31
>>> USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
>>> root pts/0 192.168.8.90 22:26 2.00s 0.12s 0.11s w
>>>
>>> bwm-ng reports the highest data usage was about 6MB/s during this test
>>> (and that was combined; I have two different gig networks: one gluster
>>> network (primary VM storage) runs on one, and the other network handles
>>> everything else).
>>>
>>> [root@ovirt3 ~]# free -m
>>>               total        used        free      shared  buff/cache   available
>>> Mem:          31996       13236         232          18       18526       18195
>>> Swap:         16383        1475       14908
>>>
>>> top - 22:32:56 up 36 days,  4:41,  1 user,  load average: 17.99, 39.69, 47.66
>>> Tasks: 407 total,   1 running, 405 sleeping,   1 stopped,   0 zombie
>>> %Cpu(s):  8.6 us,  2.1 sy,  0.0 ni, 87.6 id,  1.6 wa,  0.0 hi,  0.1 si,  0.0 st
>>> KiB Mem : 32764284 total,   228296 free, 13541952 used, 18994036 buff/cache
>>> KiB Swap: 16777212 total, 15246200 free,  1531012 used. 18643960 avail Mem
>>>
>>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>>> 30036 qemu      20   0 6872324   5.2g  13532 S 144.6 16.5 216:14.55 /usr/libexec/qemu-kvm -name guest=BillingWin,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/v+
>>> 28501 qemu      20   0 5034968   3.6g  12880 S  16.2 11.7  73:44.99 /usr/libexec/qemu-kvm -name guest=FusionPBX,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/va+
>>>  2694 root      20   0 2169224  12164   3108 S   5.0  0.0   3290:42 /usr/sbin/glusterfsd -s ovirt3.nwfiber.com --volfile-id data.ovirt3.nwfiber.com.gluster-brick2-data -p /var/run/+
>>> 14293 root      15  -5  944700  13356   4436 S   4.0  0.0  16:32.15 /usr/sbin/glusterfs --volfile-server=192.168.8.11 --volfile-server=192.168.8.12 --volfile-server=192.168.8.13 --+
>>> 25100 vdsm       0 -20 6747440 107868  12836 S   2.3  0.3  21:35.20 /usr/bin/python2 /usr/share/vdsm/vdsmd
>>> 28971 qemu      20   0 2842592   1.5g  13548 S   1.7  4.7 241:46.49 /usr/libexec/qemu-kvm -name guest=unifi.palousetech.com,debug-threads=on -S -object secret,id=masterKey0,format=+
>>> 12095 root      20   0  162276   2836   1868 R   1.3  0.0   0:00.25 top
>>>  2708 root      20   0 1906040  12404   3080 S   1.0  0.0   1083:33 /usr/sbin/glusterfsd -s ovirt3.nwfiber.com --volfile-id engine.ovirt3.nwfiber.com.gluster-brick1-engine -p /var/+
>>> 28623 qemu      20   0 4749536   1.7g  12896 S   0.7  5.5   4:30.64 /usr/libexec/qemu-kvm -name guest=billing.nwfiber.com,debug-threads=on -S -object secret,id=masterKey0,format=ra+
>>>    10 root      20   0       0      0      0 S   0.3  0.0 215:54.72 [rcu_sched]
>>>  1030 sanlock   rt   0  773804  27908   2744 S   0.3  0.1  35:55.61 /usr/sbin/sanlock daemon
>>>  1890 zabbix    20   0   83904   1696   1612 S   0.3  0.0  24:30.63 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
>>>  2722 root      20   0 1298004   6148   2580 S   0.3  0.0  38:10.82 /usr/sbin/glusterfsd -s ovirt3.nwfiber.com --volfile-id iso.ovirt3.nwfiber.com.gluster-brick4-iso -p /var/run/gl+
>>>  6340 root      20   0       0      0      0 S   0.3  0.0   0:04.30 [kworker/7:0]
>>> 10652 root      20   0       0      0      0 S   0.3  0.0   0:00.23 [kworker/u64:2]
>>> 14724 root      20   0 1076344  17400   3200 S   0.3  0.1  10:04.13 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -+
>>> 22011 root      20   0       0      0      0 S   0.3  0.0   0:05.04 [kworker/10:1]
>>>
>>>
>>> Not sure why the system load dropped other than I was trying to take a
>>> picture of it :)
>>>
>>> In any case, it appears that at this time, I have plenty of swap, RAM,
>>> and network capacity, and yet things are still running very sluggish; I'm
>>> still getting e-mails from servers complaining about loss of communication
>>> with something or another; I still get e-mails from the engine about bad
>>> engine status, then recovery, etc.
>>>
>>> I've shut down 2/3 of my VMs, too... just trying to keep the critical
>>> ones operating.
>>>
>>> At this point, I don't believe the problem is the memory leak itself, but
>>> it does seem to be triggered by it: all my problems started when I got
>>> low-RAM warnings from one of my 3 nodes and began recovery efforts in
>>> response.
>>>
>>> I do really like the idea / concept behind glusterfs, but I really have
>>> to figure out why it's performed so poorly from day one, and why it's
>>> caused 95% of my outages (including several large ones lately). If I can
>>> get it stable, reliable, and well performing, then I'd love to keep it.
>>> If I can't, then perhaps NFS is the way to go? I don't like the single
>>> point of failure aspect of it, but the other NAS boxes I run for clients
>>> (central storage for Windows boxes) have been very solid; if I could get
>>> that kind of reliability for my ovirt stack, it would be a substantial
>>> improvement. Currently, it seems about every other month I have a
>>> gluster-induced outage.
>>>
>>> Sometimes I wonder if hyperconverged itself is the issue, but my
>>> infrastructure doesn't justify three servers at the same location... I
>>> might be able to do two, but even that seems like it's pushing it.
>>>
>>> Looks like I can upgrade to 10G for about $900. I can order a
>>> dual-Xeon Supermicro 12-disk server, loaded with 2TB WD Enterprise disks,
>>> a pair of SSDs for the OS, 32GB RAM, and 2.67GHz CPUs for about $720
>>> delivered. I've got to do something to improve my reliability; I can't
>>> keep going the way I have been....
>>>
>>> --Jim
>>>
>>>
>>> On Fri, Jul 6, 2018 at 9:13 PM, Johan Bernhardsson <johan(a)kafit.se>
>>> wrote:
>>>
>>>> Load like that is mostly I/O based: either the machine is swapping or
>>>> the network is too slow. Check I/O wait in top.
>>>>
>>>> And the problem where the OOM killer kills off gluster: that means you
>>>> don't monitor RAM usage on the servers? Either gluster is eating all
>>>> your RAM, swap gets really I/O intensive, and the process is then killed
>>>> off, or you have the wrong swap settings in sysctl.conf (there are tons
>>>> of broken guides that recommend setting swappiness to 0, but that
>>>> disables swap on newer kernels. The proper swappiness for swapping only
>>>> when necessary is 1, or a sufficiently low number like 10; the default
>>>> is 60).
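>>>>
>>>> (Roughly like this, as a sketch -- the drop-in file name is just an
>>>> example:)
>>>>
>>>>   # check the current value
>>>>   cat /proc/sys/vm/swappiness
>>>>   # set it persistently and apply it without a reboot
>>>>   echo "vm.swappiness = 10" >> /etc/sysctl.d/99-swappiness.conf
>>>>   sysctl -p /etc/sysctl.d/99-swappiness.conf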
>>>>
>>>>
>>>> Moving to NFS will not improve things. You will get more memory back
>>>> since gluster isn't running, and that is good. But you will have a
>>>> single node that can fail with all your storage, it would still be on
>>>> 1 gigabit only, and your three-node cluster would easily saturate that
>>>> link.
>>>>
>>>> On July 7, 2018 04:13:13 Jim Kusznir <jim(a)palousetech.com> wrote:
>>>>
>>>>> So far it does not appear to be helping much. I'm still getting VMs
>>>>> locking up and all kinds of notices from the ovirt engine about
>>>>> non-responsive hosts. I'm still seeing load averages in the 20-30 range.
>>>>>
>>>>> Jim
>>>>>
>>>>> On Fri, Jul 6, 2018, 3:13 PM Jim Kusznir <jim(a)palousetech.com> wrote:
>>>>>
>>>>>> Thank you for the advice and help
>>>>>>
>>>>>> I do plan on going to 10Gbps networking; I haven't quite jumped off
>>>>>> that cliff yet, though.
>>>>>>
>>>>>> I did put my data-hdd (main VM storage volume) onto a dedicated
>>>>>> 1Gbps network, and I've watched throughput on that and never seen
>>>>>> more than 60MB/s achieved (as reported by bwm-ng). I have a separate
>>>>>> 1Gbps network for communication and ovirt migration, but I wanted to
>>>>>> break that up further (separate out VM traffic from migration/mgmt
>>>>>> traffic). My three SSD-backed gluster volumes run on the main network
>>>>>> too, as I haven't been able to get them to move to the new network
>>>>>> (which I was trying to use as all gluster). I tried bonding, but that
>>>>>> seemed to reduce performance rather than improve it.
>>>>>>
>>>>>> --Jim
>>>>>>
>>>>>> On Fri, Jul 6, 2018 at 2:52 PM, Jamie Lawrence <jlawrence(a)squaretrade.com> wrote:
>>>>>>
>>>>>>> Hi Jim,
>>>>>>>
>>>>>>> I don't have any targeted suggestions, because there isn't much to
>>>>>>> latch on to. I can say Gluster replica three (no arbiters) on
>>>>>>> dedicated servers serving a couple Ovirt VM clusters here have not
>>>>>>> had these sorts of issues.
>>>>>>>
>>>>>>> I suspect your long heal times (and the resultant long periods of
>>>>>>> high load) are at least partly related to 1G networking. That is just
>>>>>>> a matter of IO - heals of VMs involve moving a lot of bits. My
>>>>>>> cluster uses 10G bonded NICs on the gluster and ovirt boxes for
>>>>>>> storage traffic and separate bonded 1G for ovirtmgmt and
>>>>>>> communication with other machines/people, and we're occasionally
>>>>>>> hitting the bandwidth ceiling on the storage network. I'm starting to
>>>>>>> think about 40/100G, different ways of splitting up intensive
>>>>>>> systems, and considering iSCSI for specific volumes, although I
>>>>>>> really don't want to go there.
>>>>>>>
>>>>>>> I don't run FreeNAS[1], but I do run FreeBSD as storage servers for
>>>>>>> their excellent ZFS implementation, mostly for backups. ZFS will make
>>>>>>> your `heal` problem go away, but not your bandwidth problems, which
>>>>>>> become worse (because of fewer NICs pushing traffic). 10G hardware is
>>>>>>> not exactly in the impulse-buy territory, but if you can, I'd
>>>>>>> recommend doing some testing using it. I think at least some of your
>>>>>>> problems are related.
>>>>>>>
>>>>>>> If that's not possible, my next stops would be optimizing everything
>>>>>>> I could about sharding and healing, and tuning the shard size, to
>>>>>>> squeeze as much performance out of 1G as I could, but that will only
>>>>>>> go so far.
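>>>>>>>
>>>>>>> (Off the top of my head, the sort of knobs I mean -- the values are
>>>>>>> only examples and the option names should be checked against your
>>>>>>> gluster version; shard-block-size in particular shouldn't be changed
>>>>>>> once a volume already has data, hence the `get` rather than `set`:)
>>>>>>>
>>>>>>>   gluster volume get data-hdd features.shard-block-size
>>>>>>>   gluster volume set data-hdd cluster.shd-max-threads 4
>>>>>>>   gluster volume set data-hdd cluster.shd-wait-qlength 2048
>>>>>>>   gluster volume set data-hdd cluster.data-self-heal-algorithm full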
>>>>>>>
>>>>>>> -j
>>>>>>>
>>>>>>> [1] FreeNAS is just a storage-tuned FreeBSD with a GUI.
>>>>>>>
>>>>>>> > On Jul 6, 2018, at 1:19 PM, Jim Kusznir <jim(a)palousetech.com> wrote:
>>>>>>> >
>>>>>>> > hi all:
>>>>>>> >
>>>>>>> > Once again my production ovirt cluster is collapsing in on itself.
>>>>>>> > My servers are intermittently unavailable or degrading, customers
>>>>>>> > are noticing and calling in. This seems to be yet another gluster
>>>>>>> > failure that I haven't been able to pin down.
>>>>>>> >
>>>>>>> > I posted about this a while ago, but didn't get anywhere (no
>>>>>>> > replies that I found). The problem started out as a glusterfsd
>>>>>>> > process consuming large amounts of RAM (up to the point where RAM
>>>>>>> > and swap were exhausted and the kernel OOM killer killed off the
>>>>>>> > glusterfsd process). For reasons not clear to me at this time, that
>>>>>>> > resulted in any VMs running on that host and that gluster volume
>>>>>>> > being paused with I/O errors (the glusterfs process is usually
>>>>>>> > unharmed; why it didn't continue I/O with the other servers is
>>>>>>> > confusing to me).
>>>>>>> >
>>>>>>> > I have 3 servers and a total of 4 gluster volumes (engine, iso,
>>>>>>> > data, and data-hdd). The first 3 are replica 2+arb; the 4th
>>>>>>> > (data-hdd) is replica 3. The first 3 are backed by an LVM partition
>>>>>>> > (some thin provisioned) on an SSD; the 4th is on a Seagate hybrid
>>>>>>> > disk (HDD + some internal flash for acceleration). data-hdd is the
>>>>>>> > only thing on that disk. Servers are Dell R610s with the PERC/6i
>>>>>>> > RAID card, with the disks individually passed through to the OS
>>>>>>> > (no RAID enabled).
>>>>>>> >
>>>>>>> > The above RAM usage issue came from the data-hdd volume.
>>>>>>> > Yesterday, I caught one of the glusterfsd processes at high RAM
>>>>>>> > usage before the OOM killer had to run. I was able to migrate the
>>>>>>> > VMs off the machine and, for good measure, reboot the entire
>>>>>>> > machine (after taking this opportunity to run the software updates
>>>>>>> > that ovirt said were pending). Upon booting back up, the necessary
>>>>>>> > volume healing began. However, this time the healing caused all
>>>>>>> > three servers to go to very, very high load averages (I saw just
>>>>>>> > under 200 on one server; typically they've been 40-70), with top
>>>>>>> > reporting I/O wait at 7-20%. The network for this volume is a
>>>>>>> > dedicated gig network. According to bwm-ng, initially the network
>>>>>>> > bandwidth would hit 50MB/s (yes, bytes), but tailed off to mostly
>>>>>>> > in the kB/s range for a while. All machines' load averages were
>>>>>>> > still 40+, and "gluster volume heal data-hdd info" reported 5 items
>>>>>>> > needing healing. Servers were intermittently experiencing I/O
>>>>>>> > issues, even on the 3 gluster volumes that appeared largely
>>>>>>> > unaffected. Even OS activities on the hosts themselves (logging in,
>>>>>>> > running commands) would often be very delayed. The ovirt engine was
>>>>>>> > seemingly randomly throwing engine down / engine up / engine failed
>>>>>>> > notifications. Responsiveness on ANY VM was horrific most of the
>>>>>>> > time, with random VMs being inaccessible.
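>>>>>>> >
>>>>>>> > (The heal status above comes from the standard heal commands -- as
>>>>>>> > a sketch, the per-brick counts can also be watched with
>>>>>>> > "statistics heal-count":)
>>>>>>> >
>>>>>>> >   gluster volume heal data-hdd info
>>>>>>> >   gluster volume heal data-hdd statistics heal-count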
>>>>>>> >
>>>>>>> > I let the gluster heal run overnight. By morning, there were still
>>>>>>> > 5 items needing healing, all three servers were still experiencing
>>>>>>> > high load, and servers were still largely unstable.
>>>>>>> >
>>>>>>> > I've noticed that all of my ovirt outages (and I've had a lot, way
>>>>>>> > more than is acceptable for a production cluster) have come from
>>>>>>> > gluster. I still have 3 VMs whose hard disk images have become
>>>>>>> > corrupted by my last gluster crash and that I haven't had time to
>>>>>>> > repair / rebuild yet (I believe this crash was caused by the OOM
>>>>>>> > issue previously mentioned, but I didn't know it at the time).
>>>>>>> >
>>>>>>> > Is gluster really ready for production yet? It seems so unstable
>>>>>>> > to me.... I'm looking at replacing gluster with a dedicated NFS
>>>>>>> > server, likely FreeNAS. Any suggestions? What is the "right" way to
>>>>>>> > do production storage on this (3-node) cluster? Can I get this
>>>>>>> > gluster volume stable enough to get my VMs to run reliably again
>>>>>>> > until I can deploy another storage solution?
>>>>>>> >
>>>>>>> > --Jim