On Thu, Nov 23, 2017 at 3:33 PM Yaniv Kaul <ykaul@redhat.com> wrote:
On Thu, Nov 23, 2017 at 1:43 PM, Jiří Sléžka <jiri.slezka@slu.cz> wrote:
well, another idea

when I did not use the direct flag, the performance was much better

15787360256 bytes (16 GB) copied, 422.955159 s, 37.3 MB/s

That means you were hitting the cache.
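
For comparison, the dd figures in this thread can be converted to the units
used for the network numbers (mbit). This is only a small sketch using awk;
the function names mb_per_s and mbit_per_s are made up for illustration, and
MB here means 10^6 bytes, as dd reports it:

```shell
#!/bin/sh
# Hypothetical helpers: turn a dd byte count and elapsed seconds into rates.

# Megabytes per second (1 MB = 10^6 bytes, matching dd's own summary line).
mb_per_s() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1e6 }'
}

# Megabits per second (8 bits per byte, 1 mbit = 10^6 bits).
mbit_per_s() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes * 8 / secs / 1e6 }'
}

mb_per_s 15787360256 422.955159     # buffered dd run
mbit_per_s 15787360256 422.955159
mb_per_s 1778384896 198.565265      # dd run with oflag=direct
mbit_per_s 1778384896 198.565265
```

The buffered run above comes out at 37.3 MB/s, or about 298.6 mbit/s, while
the oflag=direct run quoted below comes out at 9.0 MB/s, or about 71.6 mbit/s,
which is inside the 60-80mbit range seen with qemu-img.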
 

qemu-img probably uses direct writes too, and I understand why. But for
backups I think it is not as critical. Is there a chance to modify
this behavior for the backup case? Is it a good idea? Should I file an RFE?

Probably not. We really prefer direct IO to ensure data is consistent.
Y

I did some research in prehistoric oVirt source, and found why we started to 
use direct I/O. We have two issues:

1. reading stale data from storage
2. trashing host cache

If you don't use direct I/O when accessing shared storage, you risk reading
stale data from the kernel buffer cache. This cache may be stale since the kernel
does not know anything about other hosts writing to the same storage after the
last read from this storage.
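
As a rough illustration of bypassing the cache on reads, GNU dd can open its
input with O_DIRECT via iflag=direct. The sketch below assumes GNU dd; the
helper name read_uncached is hypothetical, and the fallback is there only
because some filesystems reject O_DIRECT:

```shell
#!/bin/sh
# Hypothetical helper: write a file's contents to stdout, reading it with
# O_DIRECT when possible so the data does not come from this host's page cache.
read_uncached() {
    src=$1
    # iflag=direct asks dd to open the input with O_DIRECT.
    if dd if="$src" bs=1M iflag=direct 2>/dev/null; then
        return 0
    fi
    # Assumption: the direct attempt failed before producing any output
    # (for example because the open itself was rejected).
    echo "direct read of $src failed, falling back to a buffered read" >&2
    dd if="$src" bs=1M 2>/dev/null
}
```

For example, "read_uncached /path/to/volume | md5sum" would checksum a volume
without relying on cached pages; the path is a placeholder.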

The -t none option in vdsm was introduced because of

The qemu bug https://bugzilla.redhat.com/713743 explains the issue:
qemu-img was writing disk images using writeback and filling up the cache buffers
which are then flushed by the kernel preventing other processes from accessing
the storage. This is particularly bad in cluster environments where time-based
algorithms might be in place and accessing the storage within certain timeouts
is critical.

I'm not sure if this issue is still relevant. We now use sanlock instead of
safelease (except for export domains, which still use safelease), and qemu or
the kernel may have better options to avoid trashing the host cache or to
guarantee reliable access to storage.

David, do you know if sanlock is affected by trashing the host cache?

Also adding the qemu-block mailing list.

Nir
 
 

Cheers,

Jiri


On 11/23/2017 12:26 PM, Jiří Sléžka wrote:
> Hi,
>
> On 11/22/2017 07:30 PM, Nir Soffer wrote:
>> On Mon, Nov 20, 2017 at 5:22 PM Jiří Sléžka <jiri.slezka@slu.cz> wrote:
>>
>>     Hi,
>>
>>     I am trying to understand why exporting a VM to export storage on
>>     glusterfs is so slow.
>>
>>     I am using oVirt and RHV, both installations on version 4.1.7.
>>
>>     Hosts have dedicated nics for rhevm network - 1gbps, data storage itself
>>     is on FC.
>>
>>     The GlusterFS cluster lives separately on 4 dedicated hosts. It has slow disks
>>     but I can achieve about 200-400mbit throughput in other applications (we
>>     are using it for "cold" data, backups mostly).
>>
>>     I am using this glusterfs cluster as the backend for export storage. When I
>>     am exporting a VM I can see only about 60-80mbit throughput.
>>
>>     What could be the bottleneck here?
>>
>>     Could it be qemu-img utility?
>>
>>     vdsm      97739  0.3  0.0 354212 29148 ?        S<l  15:43   0:06
>>     /usr/bin/qemu-img convert -p -t none -T none -f raw
>>     /rhev/data-center/2ff6d0ee-a10b-473d-b77c-be9149945f5f/ff3cd56a-1005-4426-8137-8f422c0b47c1/images/ba42cbcc-c068-4df8-af3d-00f2077b1e27/c57acd5f-d6cf-48cc-ad0c-4a7d979c0c1e
>>     -O raw
>>     /rhev/data-center/mnt/glusterSD/10.20.30.41:_rhv__export/81094499-a392-4ea2-b081-7c6288fbb636/images/ba42cbcc-c068-4df8-af3d-00f2077b1e27/c57acd5f-d6cf-48cc-ad0c-4a7d979c0c1e
>>
>>     Any idea how to make it work faster, or what throughput I should
>>     expect?
>>
>>
>> Gluster storage operations use a fuse mount, so every write:
>> - travels to the kernel
>> - travels back to the gluster fuse helper process
>> - travels to all 3 replicas - replication is done on the client side
>> - returns to the kernel when all writes have succeeded
>> - returns to the caller
>>
>> So gluster will never set any speed record.
>>
>> Additionally, you are copying from a raw LV on FC - qemu-img cannot do
>> anything smart and avoid copying unused clusters. Instead it copies
>> gigabytes of zeros from FC.
>
> ok, it does make sense
>
>> However 7.5-10 MiB/s sounds too slow.
>>
>> I would try to test with dd - how much time it takes to copy
>> the same image from FC to your gluster storage?
>>
>> dd
>> if=/rhev/data-center/2ff6d0ee-a10b-473d-b77c-be9149945f5f/ff3cd56a-1005-4426-8137-8f422c0b47c1/images/ba42cbcc-c068-4df8-af3d-00f2077b1e27/c57acd5f-d6cf-48cc-ad0c-4a7d979c0c1e
>> of=/rhev/data-center/mnt/glusterSD/10.20.30.41:_rhv__export/81094499-a392-4ea2-b081-7c6288fbb636/__test__
>> bs=8M oflag=direct status=progress
>
> unfortunately dd performs the same
>
> 1778384896 bytes (1.8 GB) copied, 198.565265 s, 9.0 MB/s
>
>
>> If dd can do this faster, please ask on qemu-discuss mailing list:
>> https://lists.nongnu.org/mailman/listinfo/qemu-discuss
>>
>> If both give similar results, I think asking in gluster mailing list
>> about this can help. Maybe your gluster setup can be optimized.
>
> ok, this is definitely on the gluster side. Thanks for your guidance.
>
> I will investigate the gluster side and also will try Export on NFS share.
>
> Cheers,
>
> Jiri
>
>
>>
>> Nir
>>  
>>
>>
>>     Cheers,
>>
>>     Jiri
>>
>>
>>     _______________________________________________
>>     Users mailing list
>>     Users@ovirt.org
>>     http://lists.ovirt.org/mailman/listinfo/users
>>
>
>
>
>
>



