Hi Vinícius,
Thank you for your kind analysis.
Responses to your questions follow:
- The setting was sync=standard in the original samples.
- I have two 1 TB NVMe drives striped for the SLOG. You can see the
utilization on NVD1.
- The pool is 65 TB in size, with 4 TB (6%) used today.
- The pool uses lz4 compression (a quick way to confirm these properties is
sketched after this list).
- The disks never get above 5% of their rated sustained write throughput,
even in spikes, even with sync=disabled. In effect, we're barely touching
the hard drives. With sync=disabled, I hit sustained 100% utilization of
the 10 Gbit network for 2 minutes when creating a VM from a template.
During that network spike, the drives were barely active, averaging 3% with
a spike to 5% of rated sustained throughput.
- Yes, we are running raidz2.
- The system is heavily overspecced for the workload, with 256 GB RAM
(196 GB ZFS cache) and two E5-2683 v3 CPUs (14 cores each). We have never
seen more than 7 threads used, even under heavy load.
- Power loss is not a concern since we are colocated in the power
utility's data center. If we lose power, we've got much bigger
issues than a few minutes of lost data - we have a local cataclysm
that affects everyone.
- Out of 89 active processes, over the lifetime of this system, on
average 87 processes have been sleeping, with the range of sleeping
processes being 86 to 88.
- From the analysis that has gone into addressing this question over the
past couple of weeks, we understand that TrueNAS is not the best long-term
solution for us. Our particular usage hits every weakness of TrueNAS, and
TrueNAS's iSCSI implementation is particularly wrong for our usage. The
long-term goal is to move to a Cinder storage option once ease of
management is less of a concern.
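
For reference, here is a minimal sketch (Python shelling out to "zfs get")
of how the dataset properties above could be confirmed; the dataset name
"tank/vmstore" is a placeholder for our actual dataset:

import subprocess

# Query the properties discussed above; -H drops headers, -o limits columns.
out = subprocess.run(
    ["zfs", "get", "-H", "-o", "property,value",
     "sync,compression,used,available", "tank/vmstore"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.splitlines():
    prop, value = line.split("\t")
    print(f"{prop}: {value}")
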
Thank you again for taking the time to look at this with me.
On Thu, Jul 14, 2022 at 11:25 AM Vinícius Ferrão <ferrao(a)versatushpc.com.br>
wrote:
Hi David, a word of advice: you should not set sync=disabled on TrueNAS.
Doing that treats every write as async, and if you have a power loss you
will lose data.
Some more conservative people say you should do the opposite, sync=always,
which bogs down performance, but I personally use sync=standard.
I think you should look for the real issue in your pool. Is it above 80%
occupancy? Do you have an SLOG device (to offload the sync writes, so you
can keep using sync=standard)?
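
(For illustration, a rough way to see what sync writes cost from the client
side is a small fsync-timing loop run against the NFS mount; a minimal
sketch, where the file path is a placeholder on that mount:)

import os
import time

PATH = "/mnt/nfs/synctest.bin"  # placeholder: a file on the NFS-backed storage
BLOCK = b"\0" * 4096
COUNT = 1000

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
start = time.monotonic()
for _ in range(COUNT):
    os.write(fd, BLOCK)
    os.fsync(fd)  # force each block to stable storage, like a sync write
elapsed = time.monotonic() - start
os.close(fd)
os.unlink(PATH)

print(f"{COUNT} fsync'd 4 KiB writes in {elapsed:.2f}s "
      f"({COUNT / elapsed:.0f} IOPS, {elapsed / COUNT * 1000:.2f} ms avg)")
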
Another issue is that RAID-Z is not really recommended for latency-sensitive
workloads (VMs). You should be using a stripe of mirrors instead. RAID-Z2
can be a choice if you have lots of RAM and L2ARC devices to compensate for
the slow disk access. ZFS is not a performance beast; data safety is this
filesystem's first priority.
Regards.
PS: I have been running FreeNAS (and now TrueNAS) since the 0.7 days, and
with VM storage since 8.2. It's good if you know what you're doing.
On 14 Jul 2022, at 12:26, David Johnson <djohnson(a)maxistechnology.com>
wrote:
Thank you Nir, this is good information.
On Thu, Jul 14, 2022 at 9:34 AM Nir Soffer <nsoffer(a)redhat.com> wrote:
> On Tue, Jul 12, 2022 at 9:02 AM David Johnson <
> djohnson(a)maxistechnology.com> wrote:
>
>> Good morning all,
>>
>> I am trying to get the best possible performance out of my cluster.
>>
>> Here are the details of what I have now:
>>
>> oVirt version: 4.4.10.7-1.el8
>> Bare metal for the oVirt engine
>> Two hosts
>> TrueNAS cluster storage
>> 1 NFS share
>> 3 vdevs, 6 drives in raidz2 in each vdev
>> 2 NVMe drives for the SLOG
>> Storage network is 10 Gbit, all static IP addresses
>>
>> Tonight, I built a new VM from a template. It had 5 attached disks
>> totalling 100 GB. It took 30 minutes to deploy the new VM from the
>> template.
>>
>
>> Global utilization was 9%.
>> The SPM has 50% of its memory free and never showed more than 12%
>> network utilization
>>
>> 62 out of 65 TB are available on the newly created NFS backing store (no
>> fragmentation). The TrueNAS system is probably overprovisioned for our use.
>>
>> There were peak throughputs of up to 4 GBytes/second (on a 10 Gbit
>> network), but overall throughput on the NAS and the network was low.
>> ARC hits were 95 to 100%.
>> L2ARC hits were 0 to 70%.
>>
>> Here are the NFS usage stats:
>> <image.png>
>>
>> I believe the first peak is where the SLOG buffered the initial burst of
>> writes, followed by sustained IO as the VM volumes were built in
>> parallel, and then finally tapering off to the one 50 GB volume that took
>> 40 minutes to copy.
>>
>> The NFS stats graph indicates that the network performance is just fine.
>>
>> Here are the disk IO stats covering the same time frame, plus a bit
>> before, to show an IO spike:
>>
>> <image.png>
>> The spike at 2250 (10 minutes before I started building my VM) shows
>> that the spinners actually hit a write speed of almost 20 MBytes per
>> second briefly, then settled in at a sustained 3 to 4 MBytes per second.
>> The SLOG absorbs several spikes, but remains mostly idle, with activity
>> measured in kilobytes per second.
>>
>> The HGST HUS726060AL5210 drives have a 12 Gb/s SAS interface and a rated
>> sustained throughput of 227 MB/s.
>>
>> ------
>> Now to the questions:
>> 1. Am I asking on the right list? Does this look like something where
>> tuning oVirt might make a difference, or is this more likely a
>> configuration issue with my storage appliances?
>>
>> 2. Am I expecting too much? Is this well within the bounds of
>> acceptable (expected) performance?
>>
>> 3. How would I go about identifying the bottleneck, should I need to dig
>> deeper?
>>
>
> One thing that can be interesting to try is to apply this patch to vdsm:
>
> diff --git a/lib/vdsm/storage/sd.py b/lib/vdsm/storage/sd.py
> index 36c393b5a..9cb7486c0 100644
> --- a/lib/vdsm/storage/sd.py
> +++ b/lib/vdsm/storage/sd.py
> @@ -401,7 +401,7 @@ class StorageDomainManifest(object):
>          Unordered writes improve copy performance but are recommended only for
>          preallocated devices and raw format.
>          """
> -        return format == sc.RAW_FORMAT and not self.supportsSparseness
> +        return True
>
>      @property
>      def oop(self):
>
> This enables unordered writes for qemu-img convert, which can be up to 6
> times faster on block storage. When we tested it with file storage it did
> not give a lot of improvement, but this was tested a long time ago, and
> since then we use unordered writes everywhere else in the system.
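>
> (For a standalone feel for the effect, qemu-img convert's out-of-order
> write flag, -W, can also be tried directly; a minimal sketch with
> placeholder file names:)
>
> import subprocess
>
> # Minimal sketch: copy a template image into a raw disk, allowing
> # out-of-order writes to the destination (-W). File names are placeholders.
> subprocess.run(
>     ["qemu-img", "convert", "-p", "-W", "-O", "raw",
>      "template.qcow2", "new-disk.raw"],
>     check=True,
> )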
>
> Another thing to try is NFS 4.2, which can be much faster when copying
> images, since it supports sparseness. But I don't think TrueNAS supports
> NFS 4.2 yet (in 12.x they did not).
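>
> (To check what a given mount actually negotiated on the hypervisor,
> assuming a Linux host, the vers= mount option can be read from
> /proc/self/mounts; a minimal sketch:)
>
> # Print the negotiated NFS version (the vers= option) for every NFS mount.
> with open("/proc/self/mounts") as mounts:
>     for line in mounts:
>         device, mountpoint, fstype, options = line.split()[:4]
>         if fstype.startswith("nfs"):
>             vers = [o for o in options.split(",") if o.startswith("vers=")]
>             print(mountpoint, vers)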
>
> If you must work with older NFS, using qcow2 disks will be much faster
> when copying disks
> (e.g. create VM from template). The way to get qcow2 disks is to check
> "enable incremental backup"
> when creating disks.
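>
> (To confirm which format a disk actually got, qemu-img info can be run
> against the image file on the storage domain; a minimal sketch with a
> placeholder path:)
>
> import json
> import subprocess
>
> # Report the on-disk format (raw vs qcow2) of a disk image; the path is a
> # placeholder.
> info = subprocess.run(
>     ["qemu-img", "info", "--output=json", "/path/to/disk-image"],
>     capture_output=True, text=True, check=True,
> ).stdout
> print(json.loads(info)["format"])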
>
> Nir
>
_______________________________________________
Users mailing list -- users(a)ovirt.org
To unsubscribe send an email to users-leave(a)ovirt.org
Privacy Statement:
https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/US7ENNAE3F3...