Hi everybody, my coworker and I have some decent hardware that would make great standalone
servers, and then we threw in a 10Gb switch with 2x10Gbps cards in 4 boxes.
We have 2 excellent boxes: Supermicro boards (SpeedStep disabled) with 14-core Intel
i9-7940X CPUs, 128GB RAM (3200MHz), a 1TB M.2 Samsung 870 EVO, a 1TB Samsung SSD,
one 8TB WD Gold and one 6TB WD Gold.
Then we have 2 boxes (one with an 8-core i7-9700K, one with a 6-core i7-8700), with 128GB RAM
in one and 64GB in the other, all 3000MHz, the same 1TB SSD, 6TB WD Gold and 8TB WD Gold
drives as the other boxes, and 10Gbps cards.
Our problem is performance. At first we used the slower boxes for KVM (libvirt) and FreeNAS,
which performed great. Then we bought the new Supermicro boxes, converted to oVirt + Gluster,
and did some basic write tests with dd, writing zeros to files from 1GB up to 50GB directly
on the Gluster volume, and we were happy with the numbers.
But then we put a Windows VM on it and turned it on... and I'll stop there, because turning
it on ended any performance testing. This thing blew goat cheese. It was so slow that the
oVirt guest agent sometimes doesn't even start, nor does the MS SQL Server engine, and we
get other errors too.
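For anyone who wants to reproduce the node-level numbers, here is a minimal Python sketch of
that kind of zero-write test; it is not the exact dd invocation we used. One caveat it accounts
for: dd without a sync option such as conv=fdatasync or oflag=direct can report page-cache
speed rather than disk speed, so this version calls os.fsync() before stopping the clock. The
target path is a placeholder (a temp directory here); point it at the Gluster or NFS mount
under test.

```python
import os
import tempfile
import time


def sequential_write_mib_s(path: str, total_mb: int = 64, block_kb: int = 1024) -> float:
    """Write zeros to `path` in block_kb chunks, fsync, and return MiB/s.

    Calling os.fsync() before stopping the timer means the result includes
    flushing to the backing storage, not just filling the page cache.
    """
    block = b"\0" * (block_kb * 1024)
    n_blocks = (total_mb * 1024) // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push its cache to storage
    elapsed = time.perf_counter() - start
    return (n_blocks * block_kb / 1024) / elapsed


if __name__ == "__main__":
    # Placeholder target: replace with a directory on the storage under test.
    target = os.path.join(tempfile.gettempdir(), "write_test.bin")
    try:
        print(f"{sequential_write_mib_s(target):.1f} MiB/s")
    finally:
        os.remove(target)
```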
So naturally, we removed Gluster from the equation. We took one of the 8TB WD Gold drives,
made it a Linux NFS share and gave it to oVirt as an NFS storage domain for VMs. Just a
single drive. We migrated the disk with the fresh Windows 10 installation to it, configured
as VirtIO-SCSI, and booted the VM with 16GB RAM and an 8:1:1 CPU topology. To our surprise
it still blew. For example, I ran winsat disk -drive c with the SPICE viewer repeatedly
freezing and Resource Monitor open, watching 10,000ms disk response times. As for the
results, they disappeared because I didn't run it as administrator, so I rebooted.
Opening a command prompt is painful and the disk stays busy. Task Manager opens with no text
in it. The disk is writing at about 1MB/s. The command prompt finally showed up, blank, with
the cursor offset and no text anywhere.
As for the reboot: shutting down took 2 minutes, booting took about 6 minutes 30 seconds,
and logging in took over 1 minute.
So it takes 9-10 minutes to reboot and log back in to a fresh Windows install, then another
2 minutes to open a command prompt, Task Manager and Resource Monitor.
During the write test, disk I/O in the VM stayed under 8MB/s; from the graph it looks like
about 6MB/s. Network traffic averages around 20Mbps, CPU is near zero, and there were a
couple of spikes up to 30MB/s on the disk. I ran the same test on my own PC and it finished
in under a minute. On the VM it was still running after 30 minutes; I'll post the results
here when it finishes. OK, it's been 30 minutes and it's still writing. I don't see the
writes in Resource Monitor; Windows is busy with a bunch of app updates or something
(Candy Crush, on a fresh install). I hit Enter a bunch of times in the prompt and it moved
on to the flush-seq test, and now it shows up in Resource Monitor doing something again.
Whatever it's doing, it's running at about 1MB/s.
I think something went wrong, because no individual test shows more than about 2 minutes
passing, yet the total is 37 minutes. And at no point did the Windows resource graphs or any
of the oVirt node graphs show more than about 6MB/s, and definitely not the ~50MB/s and
~1.2GB/s that winsat reports. Those numbers are flat-out lies.
C:\Windows\system32>winsat disk -drive c
Windows System Assessment Tool
Running: Feature Enumeration ''
Run Time 00:00:00.00
Running: Storage Assessment '-drive c -ran -read'
Run Time 00:00:12.95
Running: Storage Assessment '-drive c -seq -read'
Run Time 00:00:20.59
Running: Storage Assessment '-drive c -seq -write'
Run Time 00:02:04.56
Run Time 00:02:04.56
Running: Storage Assessment '-drive c -flush -seq'
Run Time 00:01:02.75
Running: Storage Assessment '-drive c -flush -ran'
Run Time 00:01:50.20
Dshow Video Encode Time 0.00000 s
Dshow Video Decode Time 0.00000 s
Media Foundation Decode Time 0.00000 s
Disk Random 16.0 Read 5.25 MB/s 5.1
Disk Sequential 64.0 Read 1220.56 MB/s 8.6
Disk Sequential 64.0 Write 53.61 MB/s 5.5
Average Read Time with Sequential Writes 22.994 ms 1.9
Latency: 95th Percentile 85.867 ms 1.9
Latency: Maximum 325.666 ms 6.5
Average Read Time with Random Writes 29.548 ms 1.9
Total Run Time 00:37:35.55
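To put numbers on the timing discrepancy, this short Python sketch adds up the per-stage run
times printed above and compares them with the reported total. It assumes the repeated
00:02:04.56 line belongs to the single sequential-write stage, so it is counted once, and
that the zero-second stages contribute nothing.

```python
def to_seconds(hms: str) -> float:
    """Convert an 'HH:MM:SS.ss' run-time string to seconds."""
    h, m, s = hms.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


# Per-stage run times copied from the winsat output above
# (the sequential-write time is printed twice; it is counted once here).
stages = {
    "ran-read": "00:00:12.95",
    "seq-read": "00:00:20.59",
    "seq-write": "00:02:04.56",
    "flush-seq": "00:01:02.75",
    "flush-ran": "00:01:50.20",
}
stage_total = sum(to_seconds(t) for t in stages.values())
reported_total = to_seconds("00:37:35.55")

print(f"sum of stages:  {stage_total:.2f} s")
print(f"reported total: {reported_total:.2f} s")
print(f"unaccounted:    {reported_total - stage_total:.2f} s")
```

With these figures the stages add up to about 331 seconds (roughly 5.5 minutes) against a
reported 2,255.55 seconds, so about 32 minutes are not attributed to any stage.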
I even ran it again and got exactly the same results. So I tried copying a file to the VM
from a 1Gbps network location with an SSD: a 4GB CentOS 7 ISO, to the desktop. It said
20MB/s, then went up to 90MB/s, then dropped after a couple of gigs. With 2.25 gigs to go
it was running at 2.7MB/s, with some fluctuations up to 5MB/s. To drive this home, at the
same time I copied the file off the same server to another server with SSDs, and it ran at
100MB/s, which is what I'd expect over a 1Gbps network.
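A quick unit check on those copy speeds, as a Python sketch. It assumes decimal units
(1 Gbps = 125 MB/s raw, before protocol overhead) and treats the "2.25 gigs" left as 2,250 MB;
Windows actually reports binary units, so the real figures differ slightly.

```python
def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert a line rate in gigabits/s to megabytes/s (1 MB = 10**6 bytes)."""
    return gbps * 1000 / 8


link_ceiling = gbps_to_mb_per_s(1.0)  # raw 1 Gbps link, before overhead
observed_server = 100.0               # server-to-server copy, MB/s
observed_vm = 2.7                     # copy into the VM near the end, MB/s
remaining_mb = 2.25 * 1000            # "2.25 gigs to go", decimal GB assumed

print(f"1 Gbps ceiling:        {link_ceiling:.0f} MB/s")
print(f"server copy:           {observed_server / link_ceiling:.0%} of link")
print(f"VM copy:               {observed_vm / link_ceiling:.1%} of link")
print(f"time left at 2.7 MB/s: {remaining_mb / observed_vm / 60:.1f} min")
```

On those assumptions the server-to-server copy uses about 80% of the raw link, the copy into
the VM about 2%, and the last 2.25 GB alone would take roughly 14 minutes at 2.7 MB/s.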
All that said, we also have a 2+1 arbiter SSD Gluster volume (which seemed fastest when we
tested different variations) on the 1TB SSDs. Inside a VM I was able to read from it at
550MB/s, which is expected for an SSD. Writing zeros with dd from the oVirt node also gave
about 550MB/s. But inside a VM the best write speed we get is around 10MB/s.
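Because dd on the node looks fine while writes inside VMs crawl, per-write latency may say
more than a bulk throughput number. Here is a hypothetical Python probe (our own sketch, only
loosely in the spirit of winsat's flush tests, not its methodology) that times small 4 KiB
writes, each followed by os.fsync(), and reports the median, 95th percentile and maximum.
The path is again a placeholder.

```python
import os
import statistics
import tempfile
import time


def fsync_latencies_ms(path: str, writes: int = 200, block_kb: int = 4) -> list:
    """Time `writes` small writes, each followed by os.fsync(), in milliseconds."""
    block = b"\0" * (block_kb * 1024)
    latencies = []
    with open(path, "wb") as f:
        for _ in range(writes):
            start = time.perf_counter()
            f.write(block)
            f.flush()             # hand the block to the OS
            os.fsync(f.fileno())  # wait until the OS reports it is on storage
            latencies.append((time.perf_counter() - start) * 1000)
    return latencies


if __name__ == "__main__":
    # Placeholder target: replace with a path on the storage under test.
    target = os.path.join(tempfile.gettempdir(), "fsync_probe.bin")
    try:
        lat = fsync_latencies_ms(target)
        p95 = statistics.quantiles(lat, n=20)[18]  # 19 cut points; index 18 is the 95th percentile
        print(f"median {statistics.median(lat):.2f} ms, "
              f"p95 {p95:.2f} ms, max {max(lat):.2f} ms")
    finally:
        os.remove(target)
```

For comparison, with strictly one-at-a-time synchronous writes, throughput is roughly block
size divided by latency: 4 KiB at 1 ms per write is only about 4 MB/s.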
We did basically the same testing with Windows Server 2016: booting is terrible, opening
applications is terrible. With SQL Server running off the SSD Gluster volume I can read at
550MB/s, but writing is horrific, somewhere around 2-10MB/s.
Ping latency between the nodes is about 100µs. The hardware should manage about 200MB/s on
the HDDs and 550MB/s on the SSDs, but it doesn't, and it shows in every write scenario
inside a VM. Migrating VMs also runs at this speed.
Gluster healing seems to run faster; we've seen it consume 7-9Gbps. So I suspect this is an
oVirt issue and not a Gluster one, especially since all the tests above give the same results
using an NFS mount on the host running the VM.
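Putting the healing and VM write rates on the same scale, as a Python sketch that assumes a
single 10Gbps port carries the traffic (each box actually has 2x10Gbps) and decimal units:

```python
LINK_GBPS = 10.0  # assumption: one 10 Gbps port


def share_of_link(megabytes_per_s: float, link_gbps: float = LINK_GBPS) -> float:
    """Fraction of a link's line rate used by a given MB/s (1 MB = 10**6 bytes)."""
    return (megabytes_per_s * 8 / 1000) / link_gbps


heal_mb_s = (7 * 1000 / 8, 9 * 1000 / 8)  # observed 7-9 Gbps healing, in MB/s
vm_write_mb_s = 10.0                      # best write speed seen inside a VM

for label, rate in [("healing (low)", heal_mb_s[0]),
                    ("healing (high)", heal_mb_s[1]),
                    ("VM writes", vm_write_mb_s)]:
    print(f"{label:15} {rate:7.1f} MB/s = {share_of_link(rate):.1%} of link")
```

So healing fills 70-90% of one port, while the best VM writes use under 1% of it.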
Please guide me. I can post screenshots, logs, whatever is needed. Just ask.