<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Thu, Nov 23, 2017 at 3:33 PM Yaniv Kaul <<a href="mailto:ykaul@redhat.com" target="_blank">ykaul@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Nov 23, 2017 at 1:43 PM, Jiří Sléžka <span dir="ltr"><<a href="mailto:jiri.slezka@slu.cz" target="_blank">jiri.slezka@slu.cz</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">well, another idea<br>
<br>
when I did not use the direct flag, the performance was much better<br>
<br>
15787360256 bytes (16 GB) copied, 422.955159 s, 37.3 MB/s<br></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>That means you were hitting the cache.</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
probably qemu-img uses direct write too and I understand why. But in<br>
case of backup it is not as hot I think. Is there a chance to modify<br>
this behavior for backup case? Is it a good idea? Should I file an RFE?<br></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Probably not. We really prefer direct IO to ensure data is consistent.</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Y</div></div></div></div></blockquote><div><br></div></div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div>I did some research in the prehistoric oVirt source, and found why we started to </div><div>use direct I/O. We have two issues:</div><div><br></div><div>1. reading stale data from storage</div><div>2. trashing host cache</div><div><br></div><div>If you don't use direct I/O when accessing shared storage, you risk reading</div><div>stale data from the kernel buffer cache. This cache may be stale since the kernel</div><div>does not know anything about other hosts writing to the same storage after the</div><div>last read from this storage.</div><div><br></div><div>The -t none option in vdsm was introduced because of</div><div><a href="https://bugzilla.redhat.com/699976" target="_blank">https://bugzilla.redhat.com/699976</a>.</div><div><br></div><div>The qemu bug <a href="https://bugzilla.redhat.com/713743" target="_blank">https://bugzilla.redhat.com/713743</a> explains the issue:</div><div><div>qemu-img was writing disk images using writeback and filling up the cache buffers</div><div>which are then flushed by the kernel preventing
other processes from accessing</div><div>the storage. This is particularly bad in cluster environments where time-based </div><div>algorithms might be in place and accessing the storage within
certain timeouts</div><div>is critical.<br><br></div></div><div>I'm not sure if this issue is still relevant. We now use sanlock instead of safelease</div><div>(except for the export domain, which still uses safelease), and qemu or the kernel may have</div><div>better options to avoid trashing the host cache, or to guarantee reliable access </div><div>to storage. </div><div><br></div><div>David, do you know if sanlock is affected by trashing the host cache?</div><div><br></div><div>Also adding the qemu-block mailing list.<br></div><div><br></div><div>Nir</div></div></div></div></div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Cheers,<br>
<br>
Jiri<br>
<div><div class="m_5147757606813524487m_3736004872758655724m_5461044961271560518m_-1965579172604311050h5"><br>
<br>
On 11/23/2017 12:26 PM, Jiří Sléžka wrote:<br>
> Hi,<br>
><br>
> On 11/22/2017 07:30 PM, Nir Soffer wrote:<br>
>> On Mon, Nov 20, 2017 at 5:22 PM Jiří Sléžka <<a href="mailto:jiri.slezka@slu.cz" target="_blank">jiri.slezka@slu.cz</a><br>
>> <mailto:<a href="mailto:jiri.slezka@slu.cz" target="_blank">jiri.slezka@slu.cz</a>>> wrote:<br>
>><br>
>> Hi,<br>
>><br>
>> I am trying to understand why exporting a VM to export storage on<br>
>> glusterfs is so slow.<br>
>><br>
>> I am using oVirt and RHV, both installations on version 4.1.7.<br>
>><br>
>> Hosts have dedicated nics for rhevm network - 1gbps, data storage itself<br>
>> is on FC.<br>
>><br>
>> The GlusterFS cluster lives separately on 4 dedicated hosts. It has slow disks<br>
>> but I can achieve about 200-400 Mbit/s throughput in other applications (we<br>
>> are using it for "cold" data, backups mostly).<br>
>><br>
>> I am using this glusterfs cluster as the backend for export storage. When I<br>
>> am exporting a VM I can see only about 60-80 Mbit/s throughput.<br>
>><br>
>> What could be the bottleneck here?<br>
>><br>
>> Could it be qemu-img utility?<br>
>><br>
>> vdsm 97739 0.3 0.0 354212 29148 ? S<l 15:43 0:06<br>
>> /usr/bin/qemu-img convert -p -t none -T none -f raw<br>
>> /rhev/data-center/2ff6d0ee-a10b-473d-b77c-be9149945f5f/ff3cd56a-1005-4426-8137-8f422c0b47c1/images/ba42cbcc-c068-4df8-af3d-00f2077b1e27/c57acd5f-d6cf-48cc-ad0c-4a7d979c0c1e<br>
>> -O raw<br>
>> /rhev/data-center/mnt/glusterSD/10.20.30.41:_rhv__export/81094499-a392-4ea2-b081-7c6288fbb636/images/ba42cbcc-c068-4df8-af3d-00f2077b1e27/c57acd5f-d6cf-48cc-ad0c-4a7d979c0c1e<br>
>><br>
>> Any idea how to make it work faster or what throughput I should<br>
>> expect?<br>
>><br>
>><br>
>> Gluster storage operations use a FUSE mount, so every write:<br>
>> - travels to the kernel<br>
>> - travels back to the gluster fuse helper process<br>
>> - travels to all 3 replicas - replication is done on the client side<br>
>> - returns to the kernel when all writes have succeeded<br>
>> - returns to the caller<br>
>><br>
>> So gluster will never set any speed record.<br>
>><br>
>> Additionally, you are copying from a raw LV on FC - qemu-img cannot do anything<br>
>> smart to avoid copying unused clusters. Instead it copies gigabytes of zeros<br>
>> from FC.<br>
><br>
> ok, it does make sense<br>
><br>
>> However 7.5-10 MiB/s sounds too slow.<br>
>><br>
>> I would try to test with dd - how much time it takes to copy<br>
>> the same image from FC to your gluster storage?<br>
>><br>
>> dd<br>
>> if=/rhev/data-center/2ff6d0ee-a10b-473d-b77c-be9149945f5f/ff3cd56a-1005-4426-8137-8f422c0b47c1/images/ba42cbcc-c068-4df8-af3d-00f2077b1e27/c57acd5f-d6cf-48cc-ad0c-4a7d979c0c1e<br>
>> of=/rhev/data-center/mnt/glusterSD/10.20.30.41:_rhv__export/81094499-a392-4ea2-b081-7c6288fbb636/__test__<br>
>> bs=8M oflag=direct status=progress<br>
><br>
> unfortunately dd performs the same<br>
><br>
> 1778384896 bytes (1.8 GB) copied, 198.565265 s, 9.0 MB/s<br>
><br>
><br>
>> If dd can do this faster, please ask on qemu-discuss mailing list:<br>
>> <a href="https://lists.nongnu.org/mailman/listinfo/qemu-discuss" rel="noreferrer" target="_blank">https://lists.nongnu.org/mailman/listinfo/qemu-discuss</a><br>
>><br>
>> If both give similar results, I think asking in gluster mailing list<br>
>> about this can help. Maybe your gluster setup can be optimized.<br>
><br>
> ok, this is definitely on the gluster side. Thanks for your guidance.<br>
><br>
> I will investigate the gluster side and also will try Export on NFS share.<br>
><br>
> Cheers,<br>
><br>
> Jiri<br>
><br>
><br>
>><br>
>> Nir<br>
>> <br>
>><br>
>><br>
>> Cheers,<br>
>><br>
>> Jiri<br>
>><br>
>><br>
>> _______________________________________________<br>
>> Users mailing list<br>
>> <a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a> <mailto:<a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a>><br>
>> <a href="http://lists.ovirt.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
>><br>
><br>
><br>
><br>
><br>
</div></div>
><br>
<br>
<br>
<br></blockquote></div></div></div>
</blockquote></div></div></div></div></div>
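<div>A rough cross-check of the throughput figures in this thread, since they mix megabits and megabytes. This is only a sketch: it assumes "mbit" means megabits per second and that dd reports decimal megabytes (1 MB = 1,000,000 bytes). The function names are made up for illustration.</div>
<pre>
```python
def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second (decimal units)."""
    return mbit_per_s / 8


def mbyte_to_mbit(mbyte_per_s):
    """Convert megabytes per second to megabits per second (decimal units)."""
    return mbyte_per_s * 8


def dd_rate_mbyte(total_bytes, seconds):
    """Throughput in MB/s, assuming dd's convention of 1 MB = 1,000,000 bytes."""
    return total_bytes / seconds / 1_000_000


# Export throughput observed during the VM export (60-80 mbit).
export_low = mbit_to_mbyte(60)
export_high = mbit_to_mbyte(80)

# The two dd runs quoted in the thread (byte counts and durations as printed by dd).
direct = dd_rate_mbyte(1778384896, 198.565265)    # with oflag=direct
cached = dd_rate_mbyte(15787360256, 422.955159)   # without the direct flag

print(f"export:    {export_low:.1f} to {export_high:.1f} MB/s")
print(f"dd direct: {direct:.1f} MB/s = {mbyte_to_mbit(direct):.1f} Mbit/s")
print(f"dd cached: {cached:.1f} MB/s = {mbyte_to_mbit(cached):.1f} Mbit/s")
```
</pre>
<div>Under those assumptions, the direct I/O dd run (roughly 72 Mbit/s) lands inside the 60-80 mbit range seen during the export, while the run without the direct flag (roughly 300 Mbit/s) is in the 200-400 mbit range reported for other applications on the same gluster cluster.</div>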