[ovirt-users] slow performance with export storage on glusterfs

Nir Soffer nsoffer at redhat.com
Thu Dec 7 22:45:26 UTC 2017


On Thu, Nov 23, 2017 at 3:33 PM Yaniv Kaul <ykaul at redhat.com> wrote:

> On Thu, Nov 23, 2017 at 1:43 PM, Jiří Sléžka <jiri.slezka at slu.cz> wrote:
>
>> well, another idea
>>
>> when I did not use the direct flag, the performance was much better
>>
>> 15787360256 bytes (16 GB) copied, 422.955159 s, 37.3 MB/s
>>
>
> That means you were hitting the cache.
>
>
>>
>> probably qemu-img uses direct writes too, and I understand why. But in
>> the backup case it is not as important, I think. Is there a chance to
>> modify this behavior for the backup case? Is it a good idea? Should I
>> file an RFE?
>>
>
> Probably not. We really prefer direct IO to ensure data is consistent.
> Y
>
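
For reference, the dd figures quoted in this thread can be checked with a
little arithmetic. This is only a sketch: the helper names are made up for
illustration, and it assumes dd's convention of 1 MB = 1,000,000 bytes.

```python
def mb_per_s(nbytes, seconds):
    """Throughput in MB/s, using 1 MB = 1,000,000 bytes as dd does."""
    return nbytes / seconds / 1_000_000


def mbit_to_mb_per_s(mbit):
    """Convert a rate in megabits per second to megabytes per second."""
    return mbit / 8


# Figures copied from the dd output quoted in this thread.
cached = mb_per_s(15787360256, 422.955159)  # run without the direct flag
direct = mb_per_s(1778384896, 198.565265)   # run with oflag=direct

print(round(cached, 1), round(direct, 1))
print(mbit_to_mb_per_s(60), mbit_to_mb_per_s(80))
```

The 60-80 mbit seen during the export is about the same rate that dd reaches
with oflag=direct, which matches the conclusion further down the thread that
the limit is on the gluster side and not in qemu-img.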

I did some research in the prehistoric oVirt source, and found why we started
to use direct I/O. There were two issues:

1. reading stale data from storage
2. trashing host cache

If you don't use direct I/O when accessing shared storage, you risk reading
stale data from the kernel buffer cache. This cache may be stale since the
kernel does not know anything about other hosts writing to the same storage
after the last read from this storage.
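
To make the first issue concrete, here is a minimal sketch of a read that
bypasses the kernel buffer cache by opening the file with O_DIRECT (Linux
only). The function name and the 4096-byte alignment are assumptions for
illustration; the real alignment requirement depends on the device and the
filesystem.

```python
import mmap
import os

ALIGN = 4096  # assumed block size; check the logical block size of the storage


def read_direct(path, length=ALIGN, offset=0):
    """Read from path with O_DIRECT, so data comes from storage, not the cache."""
    # O_DIRECT generally requires an aligned offset, length and buffer address.
    if length <= 0 or length % ALIGN or offset % ALIGN:
        raise ValueError("length and offset must be multiples of %d" % ALIGN)
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        # An anonymous mmap is page aligned, which satisfies the buffer rule.
        buf = mmap.mmap(-1, length)
        try:
            n = os.preadv(fd, [buf], offset)
            return buf[:n]  # slicing copies the data out as bytes
        finally:
            buf.close()
    finally:
        os.close(fd)
```

A plain open() and read() of the same file may instead be served from pages
cached by an earlier read on this host. Note that some filesystems reject
O_DIRECT, so the open call can fail depending on where the file lives.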

The -t none option in vdsm was introduced because of
https://bugzilla.redhat.com/699976.

The qemu bug https://bugzilla.redhat.com/713743 explains the issue:
qemu-img was writing disk images using writeback and filling up the cache
buffers, which are then flushed by the kernel, preventing other processes
from accessing the storage. This is particularly bad in cluster environments
where time-based algorithms might be in place and accessing the storage
within certain timeouts is critical.
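
For illustration, the cache modes are chosen on the qemu-img command line:
-t sets the cache mode for the destination and -T for the source, and "none"
means direct I/O. The sketch below builds the same command shape as the vdsm
process quoted later in this thread. The helper name and the example paths
are hypothetical.

```python
QEMU_IMG = "/usr/bin/qemu-img"


def convert_cmd(src, dst, src_cache="none", dst_cache="none", fmt="raw"):
    """Build a qemu-img convert command line with explicit cache modes."""
    return [
        QEMU_IMG, "convert", "-p",
        "-t", dst_cache,   # cache mode for writing the destination
        "-T", src_cache,   # cache mode for reading the source
        "-f", fmt, src,
        "-O", fmt, dst,
    ]


# Placeholder paths, not real volumes.
cmd = convert_cmd("/path/to/src-volume", "/path/to/dst-volume")
print(" ".join(cmd))
```

Passing dst_cache="writeback" instead would send the writes through the host
cache, which is the behavior described in the bug above.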

I'm not sure if this issue is still relevant. We now use sanlock instead of
safelease (except for the export domain, which still uses safelease), and
qemu or the kernel may have better options to avoid trashing the host cache,
or to guarantee reliable access to storage.

David, do you know if sanlock is affected by trashing the host cache?

Also adding the qemu-block mailing list.

Nir


>
>
>>
>> Cheers,
>>
>> Jiri
>>
>>
>> On 11/23/2017 12:26 PM, Jiří Sléžka wrote:
>> > Hi,
>> >
>> > On 11/22/2017 07:30 PM, Nir Soffer wrote:
>> >> On Mon, Nov 20, 2017 at 5:22 PM Jiří Sléžka <jiri.slezka at slu.cz> wrote:
>> >>
>> >>     Hi,
>> >>
>> >>     I am trying to understand why exporting a vm to export storage on
>> >>     glusterfs is so slow.
>> >>
>> >>     I am using oVirt and RHV, both installations on version 4.1.7.
>> >>
>> >>     Hosts have dedicated nics for rhevm network - 1gbps, data storage
>> >>     itself is on FC.
>> >>
>> >>     GlusterFS cluster lives separate on 4 dedicated hosts. It has slow
>> >>     disks but I can achieve about 200-400mbit throughput in other
>> >>     applications (we are using it for "cold" data, backups mostly).
>> >>
>> >>     I am using this glusterfs cluster as backend for export storage.
>> >>     When I am exporting a vm I can see only about 60-80mbit throughput.
>> >>
>> >>     What could be the bottleneck here?
>> >>
>> >>     Could it be qemu-img utility?
>> >>
>> >>     vdsm      97739  0.3  0.0 354212 29148 ?        S<l  15:43   0:06
>> >>     /usr/bin/qemu-img convert -p -t none -T none -f raw
>> >>     /rhev/data-center/2ff6d0ee-a10b-473d-b77c-be9149945f5f/ff3cd56a-1005-4426-8137-8f422c0b47c1/images/ba42cbcc-c068-4df8-af3d-00f2077b1e27/c57acd5f-d6cf-48cc-ad0c-4a7d979c0c1e
>> >>     -O raw
>> >>     /rhev/data-center/mnt/glusterSD/10.20.30.41:_rhv__export/81094499-a392-4ea2-b081-7c6288fbb636/images/ba42cbcc-c068-4df8-af3d-00f2077b1e27/c57acd5f-d6cf-48cc-ad0c-4a7d979c0c1e
>> >>
>> >>     Any idea how to make it work faster or what throughput I should
>> >>     expect?
>> >>
>> >>
>> >> gluster storage operations use a fuse mount - so every write:
>> >> - travels to the kernel
>> >> - travels back to the gluster fuse helper process
>> >> - travels to all 3 replicas - replication is done on client side
>> >> - returns to the kernel when all writes succeeded
>> >> - returns to the caller
>> >>
>> >> So gluster will never set any speed record.
>> >>
>> >> Additionally, you are copying from a raw lv on FC - qemu-img cannot do
>> >> anything smart and avoid copying unused clusters. Instead it copies
>> >> gigabytes of zeros from FC.
>> >
>> > ok, it does make sense
>> >
>> >> However 7.5-10 MiB/s sounds too slow.
>> >>
>> >> I would try to test with dd - how much time it takes to copy
>> >> the same image from FC to your gluster storage?
>> >>
>> >> dd
>> >> if=/rhev/data-center/2ff6d0ee-a10b-473d-b77c-be9149945f5f/ff3cd56a-1005-4426-8137-8f422c0b47c1/images/ba42cbcc-c068-4df8-af3d-00f2077b1e27/c57acd5f-d6cf-48cc-ad0c-4a7d979c0c1e
>> >> of=/rhev/data-center/mnt/glusterSD/10.20.30.41:_rhv__export/81094499-a392-4ea2-b081-7c6288fbb636/__test__
>> >> bs=8M oflag=direct status=progress
>> >
>> > unfortunately dd performs the same
>> >
>> > 1778384896 bytes (1.8 GB) copied, 198.565265 s, 9.0 MB/s
>> >
>> >
>> >> If dd can do this faster, please ask on qemu-discuss mailing list:
>> >> https://lists.nongnu.org/mailman/listinfo/qemu-discuss
>> >>
>> >> If both give similar results, I think asking in gluster mailing list
>> >> about this can help. Maybe your gluster setup can be optimized.
>> >
>> > ok, this is definitely on the gluster side. Thanks for your guidance.
>> >
>> > I will investigate the gluster side and also will try Export on NFS
>> > share.
>> >
>> > Cheers,
>> >
>> > Jiri
>> >
>> >
>> >>
>> >> Nir
>> >>
>> >>
>> >>
>> >>     Cheers,
>> >>
>> >>     Jiri
>> >>
>> >>
>> >>     _______________________________________________
>> >>     Users mailing list
>> >>     Users at ovirt.org
>> >>     http://lists.ovirt.org/mailman/listinfo/users
>> >>