Sorry about the delay. We did confirm the jumbo frames. We dropped iSCSI and
switched to NFS on FreeNAS before I got your reply; that seems to have gotten
rid of the hiccups. And we like being able to see the files better than with
iSCSI anyway.
So we decided we were very confident it's oVirt (the reason I'm here on the
oVirt forum), and we took a complete shit box: a 4-core i5 from several years
ago with 16GB of RAM (unknown speed) and a 1TB hard disk drive running Fedora
27. We mounted the FreeNAS NFS export, ran a VM using libvirt/KVM directly
over the 10Gb NIC, and got 400-900 MBps. We also tested Proxmox and got the
slow speeds again (faster than oVirt, but only around 90 MBps).
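(In case anyone wants to reproduce that baseline: the test was just an NFS
mount plus a plain libvirt guest, roughly like the sketch below. The hostname,
export path and sizes here are placeholders, not our exact setup.)

  # mount the FreeNAS export on the Fedora box
  mount -t nfs -o vers=3 freenas.lab:/mnt/tank/vmstore /mnt/vmstore

  # point a KVM guest straight at it: virtio disk, no host page cache
  virt-install --name nfs-speedtest --memory 8192 --vcpus 4 \
    --disk path=/mnt/vmstore/test.img,size=100,bus=virtio,cache=none \
    --cdrom /mnt/vmstore/Win10.iso --os-variant win10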
So there is definitely something in Proxmox that is slower than plain KVM, and
something in oVirt that is slower than Proxmox. Each is a step up in cool
features... and a large step down in performance.
So we're dropping all this. We've spent way too much time on it. We're going
with FreeNAS plus KVM/virt-manager. Lots of people online talk about oVirt
actually working well, but then don't back it up with any info, and there's no
clear-cut way to create an oVirt setup that actually performs. So we're
junking it. Thanks for the help guys, I think you've set us on the right
course for the next decade.
On Thu, Mar 14, 2019 at 8:12 AM Karli Sjöberg <karli(a)inparadise.se> wrote:
On 2019-03-13 05:20, Drew Rash wrote:
The pictures and speeds below are from the latest setup, which seems to be the
best performance we've ever gotten so far. It still seems like the hardware is
sitting idle, not doing much after an initial burst.
I took a picture of a file copy using the latest setup. You can see it
transfer about 25% of a 7GB file at somewhere around 1 GBps or 600 MBps-ish
(it disappears quickly), then drop down to 40 MBps.
The left VM, "MikeWin10:1", is FreeNAS-backed and achieves much higher highs,
but still crawls down to the lows and has pauses and other weird stuff.
The right VM, "MikeWin10_Drew:1", is a GlusterFS mount. We tried NFS and then
decided to try Gluster again, but with a "negative-timeout=1" option set...
that appears to have made it about 4x faster.
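(For the curious: negative-timeout is a glusterfs FUSE mount option. Outside
of oVirt, a test mount would look roughly like the line below, with node and
volume names as placeholders; inside oVirt we just put it in the storage
domain's custom mount options field.)

  mount -t glusterfs -o negative-timeout=1 gluster-node1:/gv0 /mnt/gv0-test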
https://imgur.com/a/R2w6IcO
*4 Boxes:*
(2) Two are Supermicro C9X299-PG300F boards with 14c/28t i9s and 128GB of
3200MHz RAM.
(1) The FreeNAS box is our weakest of all 4: a 6-core i7 Extreme with 64GB of
RAM.
Heyo!
Not that the thread is about ZFS, but I find this "stop and go" behavior
interesting.
FreeNAS is an excellent NAS platform, I mean, it's in the name, right? ;)
However, the ZFS filesystem and how you configure the system do impact
performance. First of all, how have you configured the drives in the zpool?
RAIDZ is not recommended for virtualization, simply because its random IOPS
performance is limited to roughly that of 1 HDD per vdev. If we assume a SATA
drive does 150 random IOPS and you create an 8 x 6 TB RAIDZ2 vdev, that entire
pool only has about 150 random IOPS total. Can you do a "zpool status" and
post the output?
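To make the difference concrete, here's a rough sketch (pool and disk names
are placeholders, and on FreeNAS you'd normally build this through the GUI
rather than at the command line):

  # one 8-disk RAIDZ2 vdev: the whole pool gets ~1 disk's worth of random IOPS
  zpool create tank raidz2 ada0 ada1 ada2 ada3 ada4 ada5 ada6 ada7

  # four striped 2-disk mirrors: roughly 4 x the random IOPS,
  # at the cost of capacity (50% usable instead of 75%)
  zpool create tank mirror ada0 ada1 mirror ada2 ada3 \
                    mirror ada4 ada5 mirror ada6 ada7

For VM workloads, striped mirrors are usually the way to go.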
Second, it's worth mentioning that block sizes still matter. Most drives still
lie to the OS that they have 512-byte sectors while really being 4k, just so
that older OSes don't freak out because they don't know drives can have
anything other than 512. I don't know if FreeNAS solves this issue for you,
but it's something I always take care of, either with "sysctl
vfs.zfs.min_auto_ashift=12" or by tricking ZFS into thinking the drives are
true 4k disks with "gnop". A way to check is "zdb | grep ashift"; it should be
12. If it's 9, you may have worse performance than you should have, but not
way worse. Still... Then there's alignment, which I also think FreeNAS takes
care of, probably... Most systems place the partition start at 1 MiB, which
makes it OK for any disk regardless. Your disks should be called "adaX"; run
"camcontrol devlist" to get a list of all of them, then pick one disk and
check its partitioning with "gpart show adaX". The "freebsd-zfs" partition
should start at something evenly divisible by 4096 (4k). Most of the time they
start at sector 2048, because 512*2048=1048576 (1 MiB), and 1048576/4096=256,
which is a beautifully even number.
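Put together, the quick checks look something like this (the disk name is just
an example, and on FreeNAS I believe plain "zdb" may need to be pointed at its
cache file in /data/zfs/zpool.cache):

  sysctl vfs.zfs.min_auto_ashift=12              # set *before* creating the pool
  zdb | grep ashift                              # existing pool: want ashift: 12
  zdb -U /data/zfs/zpool.cache | grep ashift     # FreeNAS variant, if needed
  camcontrol devlist                             # list the adaX disks
  gpart show ada0                                # partition start divisible by 4096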
Third, and maybe most important: ZFS _does_ listen to "sync" calls, which is
about everything over iSCSI (with ctld) or NFS. That means, since your hosts
are connecting to it over one of the two, for _every_ write the NAS stops and
waits for it to be actually written safely to disk before doing another write.
It's sooo slow (but super awesome, because it saves you from data corruption).
What you do with ZFS to mitigate that is to add a so-called SLOG (separate
log) device, typically a hella-fast SSD or NVMe that does only that and
nothing else, so that the fast disk takes all the random, small writes and
turns them into big streaming writes that the HDDs can take. You can partition
off just a bit of an SSD and use that as a SLOG; it typically doesn't need to
be bigger than the bandwidth you could maximally take in, times the interval
between write flushes in ZFS, which is 5 secs. So 10 Gb/s is about 1.25 GB/s
tops, and you have two of those: 2.5 GB/s * 5 s = 12.5 GB. Which means 14 GB
should definitely cover it.
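If you go the partition route, the CLI version is something like this (device
and pool names are made up, the FreeNAS GUI can also add a log device for you,
and if you care about in-flight writes surviving an SSD death you'd mirror the
SLOG across two devices):

  gpart create -s gpt nvd0                            # a blank NVMe/SSD
  gpart add -t freebsd-zfs -a 1m -s 16G -l slog0 nvd0
  zpool add tank log gpt/slog0
  # mirrored variant: zpool add tank log mirror gpt/slog0 gpt/slog1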
Lastly, network: are you sure you activated jumbo frames all the way from the
storage to the hosts? That makes a huge difference on 10 Gb Ethernet. A way to
test this is to start tcpdump on the iSCSI/NFS storage interface, looking just
for ping, like "tcpdump -vnni Jumbo_NFS icmp", on both the storage and a host
system. Then, from another terminal (as root) on the storage or the host, send
just _one_ big ping packet and see what you get, like so:
"ping -c 1 -s 8192 XXX.XXX.XXX.XXX". The tcpdump output should show just _one_
ICMP echo request received and just _one_ ICMP echo reply sent back; one to
get there, and one back. That's how you're sure you've got no fragmentation
happening between the storage and the host.
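In command form, that check is just the following (interface name and address
are the placeholders from above; the second ping line is an extra variant for
a Linux host that also sets the Don't Fragment bit, so it fails loudly if any
hop has a small MTU):

  # on both the storage and a host, watch the storage interface:
  tcpdump -vnni Jumbo_NFS icmp

  # from the other end, send one big ping:
  ping -c 1 -s 8192 XXX.XXX.XXX.XXX           # FreeBSD / FreeNAS
  ping -M do -c 1 -s 8972 XXX.XXX.XXX.XXX     # Linux, DF set, 8972 = 9000 - 28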
I swear, I didn't mean to write a book about it, it just happened :) You
put a quarter in the ZFS box and this is what you get...
/K
(1) The last is an 8c/16t i7 with 128GB of 3000MHz RAM.
*Network:*
All tied together with a 10Gbps managed switch, each machine having 2 x
10Gbps nic ports.
*Drives:*
4 x 8TB WD Gold enterprise drives
4 x 6TB WD Gold enterprise drives
4 x 500GB Samsung Pro M.2s
and around 10 SSDs for random things, 4 of them being 1TB Samsungs running a
Gluster volume for a production box, which also still runs at around 13 MBps
inside the VM.
Also, I believe we tried using 9000 MTU on all networks, and the setting is
still set to that.
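(A quick way to double-check the MTU on every box, with interface names as
placeholders:

  ip link show em1 | grep mtu      # oVirt / Linux hosts
  ifconfig lagg0 | grep mtu        # FreeNAS

All of the storage-facing interfaces, and the switch ports between them,
should be at/allow mtu 9000.)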
We're testing with 2 of the 8TB drives in a replica-2 Gluster volume (no
arbiter, just testing). We took the 6TB drives and made a RAID on FreeNAS for
testing. The M.2s are boot devices for the boxes.
It's pretty apparent there's some kind of cache happening, and if the file
copy is big enough, it'll just crawl down to nothing after it hits the end of
whatever that cache is.
I added a picture of the Storage Pool page in FreeNAS, and a picture of the
oVirt Gluster box VM page.
I'm not sure where to find the dirty ratio and background ratio...?
On Tue, Mar 12, 2019 at 1:19 AM Strahil <hunter86_bg(a)yahoo.com> wrote:
> Hi Drew,
>
> What is the host RAM size and what is the setting for VM.dirty_ratio and
> background ratio on those hosts?
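> (For reference, those can be read on the hosts with plain sysctl, e.g.
> "sysctl vm.dirty_ratio vm.dirty_background_ratio", and changed on the fly
> with "sysctl -w".)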
>
> What about your iSCSI target?
>
> Best Regards,
> Strahil Nikolov
> On Mar 11, 2019 23:51, Drew Rash <drew.rash(a)gmail.com> wrote:
>
> Added the nfs.disable=false setting, removed the Gluster storage domain,
> and re-added it using NFS.
> Performance is still in the low 10s of MBps, plus or minus 5.
> Ran the showmount -e "" and it displayed the mount.
>
> Trying right now to re-mount using gluster with a negative-timeout=1
> option.
>
> We converted one of our 4 boxes to FreeNAS, took 4 of the 6TB drives, made
> a RAID, exposed it over iSCSI, and connected it to oVirt. Booted Windows
> (times 2: did 2 boxes with a 7GB file on each), copied the file from one to
> the other, and it copied at 600 MBps average. But then it has weird
> pauses... I think it's doing some kind of caching: it'll go like 2GB and
> choke to zero Bps, then speed up and choke, speed up and choke, averaging
> or getting up to 10 MBps. Then at 99% it waits 15 seconds with 0 bytes
> left...
> Small files are basically instant. No complaint there.
> So... WAY faster, but it suffers from the same thing... it just requires
> writing some more to get to it: a few gigs and then it crawls.
>
> It seems to be related to whether I JUST finished running a test. If I
> wait a while, I can get it to copy almost 4GB or so before choking.
> I made a 3rd Windows 10 VM and copied the same file from the 1st to the
> 2nd (via a Windows share, from the 3rd box), and it didn't choke or do any
> funny business... oddly. Maybe a fluke; I only did that once.
>
> So... switching to FreeNAS appears to have increased the window size
> before it runs horribly. But it will still run horrifically if the disk is
> busy.
>
> And since we're planning on doing actual work on this... idle disks
> catching up via some hidden cache feature of oVirt isn't gonna work. We
> won't be writing gigs of data all over the place... but knowing that this
> chokes a VM to near death... is scary.
>
> It looks like for a Windows 10 install to operate correctly, it expects at
> least 15 MB/s with less than 1s latency. Otherwise services don't start,
> weird stuff happens, and it runs slower than my dog while pooping out that
> extra little stringy bit near the end. So we gotta avoid that.
>
>
>
>
> On Sat, Mar 9, 2019 at 12:44 AM Strahil <hunter86_bg(a)yahoo.com> wrote:
>
> Hi Drew,
>
> For the test, change the Gluster volume option nfs.disable to false.
> Something like: gluster volume set volname nfs.disable false
>
> Then use: showmount -e gluster-node-fqdn
> Note: NFS might not be allowed in the firewall.
>
> Then add this NFS domain (don't forget to remove the gluster storage
> domain before that) and do your tests.
>
> If it works well, you will have to reset nfs.disable (turning the built-in
> Gluster NFS back off) and deploy NFS-Ganesha instead:
>
> gluster volume reset volname nfs.disable
>
> Best Regards,
> Strahil Nikolov
>
>
_______________________________________________
Users mailing list -- users(a)ovirt.org
To unsubscribe send an email to users-leave(a)ovirt.org
Privacy Statement:
https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/C2CEUZTOFKJ...