On Thu, Mar 14, 2019 at 8:12 AM Karli Sjöberg <karli@inparadise.se> wrote:

On 2019-03-13 05:20, Drew Rash wrote:

Pictures and speeds are the latest, which seems to be the best performance we've gotten so far. It still seems like the hardware is sitting idle, not doing much after an initial burst.

Took a picture of a file copy using the latest setup. You can see it transfer about 25% of a 7 GB file at somewhere around 600 MBps to 1 GBps (the number disappears quickly), then drop down to 40 MBps.

The left VM "MikeWin10:1" is on FreeNAS and achieves much higher highs. It still crawls down to the lows and has pauses and other weird stuff.

The right vm "MikeWin10_Drew:1" is a gluster fs mount. We tried nfs and decided to try gluster again but with a "negative-timeout=1" option set...appears to have made it faster by 4x.

https://imgur.com/a/R2w6IcO

4 Boxes:
(2)Two are c9x299-PG300F super micro boards with 14c (28thread) i9's 128GB 3200MHz Ram
(1)FreeNAS is our weakest of all 4 boxes - 6 core, 64GB ram i7 extreme version.

Heyo!

Not that the thread is about ZFS, but I find this "stop and go" behavior interesting.

FreeNAS is an excellent NAS platform, I mean, it's in the name, right? ;) However, the ZFS filesystem and how you configure the system do impact performance. First of all, how have you configured the drives in the zpool? RAIDZ is not recommended for virtualization, because its random IOPS performance is limited to that of one HDD per vdev. If we assume a SATA drive has 150 random IOPS and you create an 8 x 6 TB RAIDZ2 vdev, that entire pool only has 150 random IOPS total. Can you do a "zpool status" and post the output?
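To make the rule of thumb above concrete, here is a minimal sketch of the arithmetic. The 150 IOPS figure and the 8-disk RAIDZ2 layout come from the paragraph above; the mirror layout is only an assumed alternative for comparison, and the function name is made up for illustration. It is a rough estimate of random write IOPS, not a benchmark.

```python
def pool_random_iops(vdev_count: int, iops_per_disk: int = 150) -> int:
    """Rule of thumb: each vdev delivers roughly one disk's worth of random IOPS."""
    return vdev_count * iops_per_disk


# Same 8 disks, two ways of arranging them into vdevs.
layouts = {
    "1 x 8-disk RAIDZ2 vdev": 1,
    "4 x 2-disk mirror vdevs (assumed alternative)": 4,
}

for name, vdevs in layouts.items():
    print(f"{name}: ~{pool_random_iops(vdevs)} random IOPS")
```

The point is only that random IOPS scale with the number of vdevs, not with the number of disks.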

Second, it's worth mentioning that block sizes still matter. Most drives still lie to the OS that they have 512-byte sectors while really being 4k, just so that older OSes don't freak out because they don't know drives can have anything other than 512. I don't know if FreeNAS solves this issue for you but it's something I always take care of, either by "sysctl vfs.zfs.min_auto_ashift=12" or by tricking ZFS into thinking the drives are true 4k disks with "gnop". A way to check is "zdb | grep ashift"; it should be 12 (2^12 = 4096 bytes). If it's 9 (2^9 = 512 bytes), you may have worse performance than you should have, but not way worse. Still... Then there's alignment, which I also think FreeNAS takes care of, probably... Most systems place the partition start at 1 MiB, which makes it OK for any disk regardless. Your disks should be called "adaX"; run "camcontrol devlist" to get a list of all of them, then pick one disk to check the partitioning on with "gpart show adaX". The "freebsd-zfs" partition should start at a byte offset evenly divisible by 4096 (4k). Most of the time they start at sector 2048, because 512*2048=1048576 (1 MiB), and that divided by 4k is 1048576/4096=256, which is a beautifully even number.
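The alignment arithmetic above can be checked with a few lines. This is just a sketch: it assumes "gpart show" reports the partition start in 512-byte sectors, and the function names are hypothetical.

```python
LOGICAL_SECTOR_BYTES = 512    # what the drive reports to the OS
PHYSICAL_SECTOR_BYTES = 4096  # what a 4k drive really uses


def ashift_to_bytes(ashift: int) -> int:
    """ZFS ashift is the base-2 logarithm of the sector size it assumes."""
    return 2 ** ashift


def is_4k_aligned(start_sector: int) -> bool:
    """True if the partition's byte offset is a multiple of 4096."""
    return (start_sector * LOGICAL_SECTOR_BYTES) % PHYSICAL_SECTOR_BYTES == 0


print(ashift_to_bytes(12), ashift_to_bytes(9))
print(is_4k_aligned(2048))  # the common 1 MiB start
print(is_4k_aligned(63))    # an old-style start sector, used here as a counterexample
```

In other words, a start sector works if it is a multiple of 8.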

Third and maybe most important, ZFS _does_ listen to "sync" calls, which covers just about everything over iSCSI (with ctld) or NFS. Since your hosts are connecting to it over one of the two, that means that for _every_ write, the NAS stops and waits for it to be actually written safely to disk before doing another write. It's sooo slow (but super awesome, because it saves you from data corruption). What you do with ZFS to mitigate that is to add a so-called SLOG (separate log) device, typically a hella-fast SSD or NVMe that only does that and nothing else, so that the fast disk takes all the random, small writes and turns them into big streaming writes that the HDDs can take. You can partition off just a bit of an SSD and use that as a SLOG; it typically needs no more than the maximum bandwidth you could take in, times the interval between write flushes in ZFS, which is 5 secs. So 10 Gb/s is about 1.25 GB/s, tops, and you have two of those: 2.5 GB/s * 5 s = 12.5 GB. Which means 14 GB should definitely cover it.
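As a quick sanity check of the SLOG sizing above, here is the same calculation as a small sketch. The 10 Gb/s link speed, the two ports and the 5 second flush interval are the figures from the paragraph; the function name is made up, and protocol overhead is ignored, so this is an upper bound.

```python
def slog_size_gb(link_gbit_per_s: float, links: int, flush_interval_s: float = 5) -> float:
    """Upper bound on SLOG size: max incoming bandwidth times the flush interval."""
    gbyte_per_s_per_link = link_gbit_per_s / 8  # 8 bits per byte, overhead ignored
    return gbyte_per_s_per_link * links * flush_interval_s


# Two 10 Gb/s ports, 5 s between write flushes.
print(slog_size_gb(10, 2))  # 12.5
```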

Lastly, the network: are you sure you activated jumbo frames all the way from the storage to the hosts? That makes a huge difference on 10 Gb ethernet. A way to test this is to start tcpdump on the iSCSI/NFS storage interface, looking for just ping, like "tcpdump -vnni Jumbo_NFS icmp" on both the storage and a host system. Then from another terminal (as root) on the storage or host, send just _one_ big ping packet and see what you get, like so: "ping -c 1 -s 8192 XXX.XXX.XXX.XXX". The tcpdump output should show just _one_ ICMP echo request received and just _one_ ICMP echo reply sent back; one to get there, and one back. That's how you're sure you've got no fragmentation happening between the storage and host.
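To see why an 8192-byte ping is a useful probe, here is a minimal sketch of how many IP packets it takes at a given MTU. It assumes plain IPv4 with a 20-byte header (no options) and the 8-byte ICMP header; the function name is hypothetical.

```python
import math

IP_HEADER_BYTES = 20   # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8  # ICMP echo header


def packets_for_ping(icmp_payload: int, mtu: int) -> int:
    """Number of IP packets (fragments) needed to carry one ICMP echo request."""
    ip_payload = icmp_payload + ICMP_HEADER_BYTES
    # Each fragment carries a multiple of 8 bytes of IP payload.
    per_fragment = (mtu - IP_HEADER_BYTES) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


print(packets_for_ping(8192, 9000))  # jumbo frames end to end
print(packets_for_ping(8192, 1500))  # standard MTU somewhere on the path
```

With a 9000 MTU everywhere the request fits in a single packet; with a 1500 MTU somewhere along the way it gets split into several fragments, which is what would show up in tcpdump.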

I swear, I didn't mean to write a book about it, it just happened :) You put a quarter in the ZFS box and this is what you get...

/K
(1) The last is an 8c (16thrd) i7, 128GB 3000MHz Ram

Network:

All tied together with a 10Gbps managed switch, each machine having 2 x 10Gbps nic ports.

Drives:

4 8TB WD Gold Enterprise drives

4 6TB WD Gold Enterprise drives

4 m.2 500 GB samsung pro's

and like 10 ssd's for random things with 4 being 1TB samsung's running a gluster for a production box. Which still also runs at around 13MBps inside the VM.

Also I believe we tried using 9000 MTU on all networks and the setting is still set to that.

We're testing using two 8TB drives in a mirrored (replica 2, no arbiter, just for testing) gluster volume.

And we took the 6TB drives and made a raid on freenas for testing.

m.2's are boot devices for the boxes.

It's pretty apparent there's some kind of cache happening and then if the file copy is big enough, it'll just crawl down to nothing after it hits the end of whatever it is.

Added a picture of the StoragePool page in freenas. And a picture of the oVirt gluster box VM page.

I'm not sure where to find the dirty ratio and background ratio...?
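For what it's worth, on a Linux host these two settings are kernel sysctls exposed as the files "dirty_ratio" and "dirty_background_ratio" under /proc/sys/vm. Below is a minimal sketch that reads them; the function names are made up for illustration, and it assumes it is run directly on each oVirt host.

```python
from pathlib import Path


def read_vm_setting(name: str, base: str = "/proc/sys/vm") -> int:
    """Read one integer vm.* sysctl from the proc filesystem."""
    return int((Path(base) / name).read_text().strip())


def dirty_settings(base: str = "/proc/sys/vm") -> dict:
    """Return the two writeback thresholds asked about in this thread."""
    names = ("dirty_ratio", "dirty_background_ratio")
    return {name: read_vm_setting(name, base) for name in names}


if __name__ == "__main__":
    try:
        print(dirty_settings())
    except FileNotFoundError:
        print("No /proc/sys/vm here; this only works on a Linux host.")
```

The same values can also be shown with "sysctl vm.dirty_ratio vm.dirty_background_ratio".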

On Tue, Mar 12, 2019 at 1:19 AM Strahil <hunter86_bg@yahoo.com> wrote:

Hi Drew,

What is the host RAM size and what are the settings for vm.dirty_ratio and vm.dirty_background_ratio on those hosts?

What about your iSCSI target?

Best Regards,
Strahil Nikolov

On Mar 11, 2019 23:51, Drew Rash <drew.rash@gmail.com> wrote:

Added the disable:false, removed the gluster, re-added using nfs. Performance still in the low 10's MBps + or - 5
Ran the showmount -e "" and it displayed the mount.

Trying right now to re-mount using gluster with a negative-timeout=1 option.

We converted one of our 4 boxes to FreeNAS, took 4 6TB drives, made a raid, exported it over iSCSI and connected it to oVirt. Booted Windows (times 2; did 2 boxes with a 7GB file on each), copied from one to the other, and it copied at 600MBps average. But then it has weird pauses... I think it's doing some kind of cache...it'll go like 2GB and choke to zero Bps. Then speed up and choke, speed up and choke, averaging or getting up to 10MBps. Then at 99% it waits 15 seconds with 0 bytes left...

Small files, are instant basically. No complaint there.

So...WAY faster. But suffers from the same thing....just requires writing some more to get to it. a few gigs and then it crawls.

Seems to be related to whether I JUST finished running a test. If I wait a while, I get it to copy almost 4GB or so before choking.

I made a 3rd windows 10 VM and copied the same file from the 1st to the 2nd (via a windows share and from the 3rd box) And it didn't choke or do any funny business...oddly. Maybe a fluke. Only did that once.

So....switching to freenas appears to have increased the window size before it runs horribly. But it will still run horrifically if the disk is busy.

And since we're planning on doing actual work on this... idle disks caching up on some hidden cache feature of oVirt isn't gonna work. We won't be writing gigs of data all over the place...but knowing that this chokes a VM to near death...is scary.

It looks like for a windows 10 install to operate correctly, it expects at least 15MB/s with less than 1s latency. Otherwise services don't start and weird stuff happens and it runs slower than my dog while pooping out that extra little stringy bit near the end. So we gotta avoid that.

On Sat, Mar 9, 2019 at 12:44 AM Strahil <hunter86_bg@yahoo.com> wrote:

Hi Drew,

For the test, change the gluster parameter nfs.disable to false.
Something like: gluster volume set volname nfs.disable false

Then use: showmount -e gluster-node-fqdn
Note: NFS might not be allowed in the firewall.

Then add this NFS domain (don't forget to remove the gluster storage domain before that) and do your tests.

If it works well, you will have to switch the built-in gluster NFS server back off by resetting nfs.disable, and deploy NFS Ganesha instead:

gluster volume reset volname nfs.disable

Best Regards,
Strahil Nikolov
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/C2CEUZTOFKJ5BI72JXZTVAJKFHDEF5RN/