Hello:
This past week, I created a new Gluster store, as I was running out of disk
space on my main, SSD-backed storage pool. I used 2TB Seagate FireCuda
drives (hybrid SSD/spinning). The hardware is Dell R610s with integrated
PERC 6/i cards. I placed one disk per machine, exported each disk as a
single-disk volume from the RAID controller, formatted it with XFS, mounted
it, and dedicated it to a new replica 3 Gluster volume.
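For reference, the bricks and volume were set up roughly like this (the
hostnames, device name, and brick paths below are illustrative, not my
exact ones):

    # on each of the three hosts: format and mount the brick
    mkfs.xfs -i size=512 /dev/sdb
    mkdir -p /gluster/brick1
    mount /dev/sdb /gluster/brick1
    mkdir /gluster/brick1/data

    # from one host: create and start the replica 3 volume
    gluster volume create data-hdd replica 3 \
        host1:/gluster/brick1/data \
        host2:/gluster/brick1/data \
        host3:/gluster/brick1/data
    gluster volume start data-hdd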
Since doing so, I've been having major performance problems. One of my
Windows VMs sits at 100% disk utilization nearly continuously, and it's
painful to do anything on it. A Zabbix install on CentOS, using MySQL as
the backing database, has 70%+ iowait nearly all the time, and I can't get
graphs to load in the web console. It's also constantly logging errors that
ultimately come down to insufficient disk performance.
All of this was working OK before the changes, of which there were two:
The old storage was SSD-backed, replica 2 + arbiter, and ran on the same
GigE network as management and the main VM network.
The new storage was created on the dedicated Gluster network (em4 on these
servers, a completely different subnet: 174.x vs. 192.x), as replica 3 (no
arbiter), on the FireCuda disks (they seemed to be the fastest non-SSD
drives I could afford, and I needed a lot more storage).
So far my monitoring has NOT shown maxed-out network interfaces (using
bwm-ng on the command line); in fact, the Gluster interface is usually
below 20% utilized.
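For what it's worth, this is roughly how I've been watching the Gluster NIC
(em4 here; the sar variant assumes sysstat is installed):

    # live per-interface throughput
    bwm-ng -o plain -I em4

    # or a short sample of per-interface stats
    sar -n DEV 5 3 | grep em4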
I'm not sure how to meaningfully measure the performance of the disks
themselves, and I'm not sure what else to look at; my cluster is not very
usable at the moment, though. IOWait on my hosts appears to be below 0.5%,
usually 0.0 to 0.1. Inside the VMs it's a whole different story.
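If it would help with diagnosis, I could run something like the following
directly against a brick mount on one of the hosts (the directory path is
just an example, and this assumes fio and sysstat are available):

    # 4k random write test against the brick filesystem
    # (writes only to its own test file under the given directory)
    fio --name=bricktest --directory=/gluster/brick1 --rw=randwrite \
        --bs=4k --size=1G --iodepth=16 --ioengine=libaio --direct=1

    # per-device utilization and latency, sampled every 5 seconds
    iostat -xm 5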
My cluster is currently running oVirt 4.1. I'm interested in moving to 4.2,
but I think I need to fix this first.
Thanks!
--Jim