<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Thu, Nov 23, 2017 at 3:33 PM Yaniv Kaul &lt;<a href="mailto:ykaul@redhat.com" target="_blank">ykaul@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Nov 23, 2017 at 1:43 PM, Jiří Sléžka <span dir="ltr">&lt;<a href="mailto:jiri.slezka@slu.cz" target="_blank">jiri.slezka@slu.cz</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">well, another idea<br>

<br>

when I did not use the direct flag, the performance was much better<br>

<br>

15787360256 bytes (16 GB) copied, 422.955159 s, 37.3 MB/s<br></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>That means you were hitting the cache.</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

qemu-img probably uses direct writes too, and I understand why. But in<br>

the backup case I think it matters less. Is there a chance to modify<br>

this behavior for the backup case? Is it a good idea? Should I file an RFE?<br></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Probably not. We really prefer direct IO to ensure data is consistent.</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Y</div></div></div></div></blockquote><div><br></div></div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div>I did some research in the prehistoric oVirt source, and found why we started to </div><div>use direct I/O. We have two issues:</div><div><br></div><div>1. reading stale data from storage</div><div>2. trashing host cache</div><div><br></div><div>If you don&#39;t use direct I/O when accessing shared storage, you risk reading</div><div>stale data from the kernel buffer cache. This cache may be stale since the kernel</div><div>does not know anything about other hosts writing to the same storage after the</div><div>last read from this storage.</div><div><br></div><div>The -t none option in vdsm was introduced because of</div><div><a href="https://bugzilla.redhat.com/699976" target="_blank">https://bugzilla.redhat.com/699976</a>.</div><div><br></div><div>The qemu bug <a href="https://bugzilla.redhat.com/713743" target="_blank">https://bugzilla.redhat.com/713743</a> explains the issue:</div><div><div>qemu-img was writing disk images using writeback and filling up the cache buffers</div><div>which are then flushed by the kernel preventing other processes from accessing</div><div>the storage. This is particularly bad in cluster environments where time-based </div><div>algorithms might be in place and accessing the storage within certain timeouts</div><div>is critical.<br><br></div></div><div>I&#39;m not sure if this issue is still relevant. We now use sanlock instead of safelease</div><div>(except for the export domain, which still uses safelease), and qemu or the kernel may have</div><div>better options to avoid trashing the host cache, or to guarantee reliable access </div><div>to storage. </div><div><br></div><div>David, do you know if sanlock is affected by trashing the host cache?</div><div><br></div><div>Also adding the qemu-block mailing list.<br></div><div><br></div><div>Nir</div></div></div></div></div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Cheers,<br>

<br>

Jiri<br>

<div><div class="m_5147757606813524487m_3736004872758655724m_5461044961271560518m_-1965579172604311050h5"><br>

<br>

On 11/23/2017 12:26 PM, Jiří Sléžka wrote:<br>

&gt; Hi,<br>

&gt;<br>

&gt; On 11/22/2017 07:30 PM, Nir Soffer wrote:<br>

&gt;&gt; On Mon, Nov 20, 2017 at 5:22 PM Jiří Sléžka &lt;<a href="mailto:jiri.slezka@slu.cz" target="_blank">jiri.slezka@slu.cz</a><br>

&gt;&gt; &lt;mailto:<a href="mailto:jiri.slezka@slu.cz" target="_blank">jiri.slezka@slu.cz</a>&gt;&gt; wrote:<br>

&gt;&gt;<br>

&gt;&gt;     Hi,<br>

&gt;&gt;<br>

&gt;&gt;     I am trying to understand why exporting a VM to export storage on<br>

&gt;&gt;     GlusterFS is so slow.<br>

&gt;&gt;<br>

&gt;&gt;     I am using oVirt and RHV, both installations on version 4.1.7.<br>

&gt;&gt;<br>

&gt;&gt;     Hosts have dedicated NICs for the rhevm network (1 Gbps); the data storage itself<br>

&gt;&gt;     is on FC.<br>

&gt;&gt;<br>

&gt;&gt;     The GlusterFS cluster lives separately on 4 dedicated hosts. It has slow disks<br>

&gt;&gt;     but I can achieve about 200-400 Mbit/s throughput in other applications (we<br>

&gt;&gt;     are using it for &quot;cold&quot; data, backups mostly).<br>

&gt;&gt;<br>

&gt;&gt;     I am using this GlusterFS cluster as the backend for export storage. When I<br>

&gt;&gt;     am exporting a VM I can see only about 60-80 Mbit/s throughput.<br>

&gt;&gt;<br>

&gt;&gt;     What could be the bottleneck here?<br>

&gt;&gt;<br>

&gt;&gt;     Could it be qemu-img utility?<br>

&gt;&gt;<br>

&gt;&gt;     vdsm      97739  0.3  0.0 354212 29148 ?        S&lt;l  15:43   0:06<br>

&gt;&gt;     /usr/bin/qemu-img convert -p -t none -T none -f raw<br>

&gt;&gt;     /rhev/data-center/2ff6d0ee-a10b-473d-b77c-be9149945f5f/ff3cd56a-1005-4426-8137-8f422c0b47c1/images/ba42cbcc-c068-4df8-af3d-00f2077b1e27/c57acd5f-d6cf-48cc-ad0c-4a7d979c0c1e<br>

&gt;&gt;     -O raw<br>

&gt;&gt;     /rhev/data-center/mnt/glusterSD/10.20.30.41:_rhv__export/81094499-a392-4ea2-b081-7c6288fbb636/images/ba42cbcc-c068-4df8-af3d-00f2077b1e27/c57acd5f-d6cf-48cc-ad0c-4a7d979c0c1e<br>

&gt;&gt;<br>

&gt;&gt;     Any idea how to make it work faster, or what throughput I should<br>

&gt;&gt;     expect?<br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt; gluster storage operations use a fuse mount, so every write:<br>

&gt;&gt; - travels to the kernel<br>

&gt;&gt; - travels back to the gluster fuse helper process<br>

&gt;&gt; - travels to all 3 replicas - replication is done on the client side<br>

&gt;&gt; - returns to the kernel when all writes have succeeded<br>

&gt;&gt; - returns to the caller<br>

&gt;&gt;<br>

&gt;&gt; So gluster will never set any speed record.<br>

&gt;&gt;<br>

&gt;&gt; Additionally, you are copying from a raw LV on FC - qemu-img cannot do<br>

&gt;&gt; anything smart to avoid copying unused clusters. Instead it copies<br>

&gt;&gt; gigabytes of zeros from FC.<br>

&gt;<br>

&gt; ok, it does make sense<br>

&gt;<br>

&gt;&gt; However 7.5-10 MiB/s sounds too slow.<br>

&gt;&gt;<br>

&gt;&gt; I would try to test with dd - how much time it takes to copy<br>

&gt;&gt; the same image from FC to your gluster storage?<br>

&gt;&gt;<br>

&gt;&gt; dd<br>

&gt;&gt; if=/rhev/data-center/2ff6d0ee-a10b-473d-b77c-be9149945f5f/ff3cd56a-1005-4426-8137-8f422c0b47c1/images/ba42cbcc-c068-4df8-af3d-00f2077b1e27/c57acd5f-d6cf-48cc-ad0c-4a7d979c0c1e<br>

&gt;&gt; of=/rhev/data-center/mnt/glusterSD/10.20.30.41:_rhv__export/81094499-a392-4ea2-b081-7c6288fbb636/__test__<br>

&gt;&gt; bs=8M oflag=direct status=progress<br>

&gt;<br>

&gt; unfortunately dd performs the same<br>

&gt;<br>

&gt; 1778384896 bytes (1.8 GB) copied, 198.565265 s, 9.0 MB/s<br>

&gt;<br>

&gt;<br>

&gt;&gt; If dd can do this faster, please ask on qemu-discuss mailing list:<br>

&gt;&gt; <a href="https://lists.nongnu.org/mailman/listinfo/qemu-discuss" rel="noreferrer" target="_blank">https://lists.nongnu.org/mailman/listinfo/qemu-discuss</a><br>

&gt;&gt;<br>

&gt;&gt; If both give similar results, I think asking in gluster mailing list<br>

&gt;&gt; about this can help. Maybe your gluster setup can be optimized.<br>

&gt;<br>

&gt; ok, this is definitely on the gluster side. Thanks for your guidance.<br>

&gt;<br>

&gt; I will investigate the gluster side and will also try exporting to an NFS share.<br>

&gt;<br>

&gt; Cheers,<br>

&gt;<br>

&gt; Jiri<br>

&gt;<br>

&gt;<br>

&gt;&gt;<br>

&gt;&gt; Nir<br>

&gt;&gt;  <br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt;     Cheers,<br>

&gt;&gt;<br>

&gt;&gt;     Jiri<br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt;     _______________________________________________<br>

&gt;&gt;     Users mailing list<br>

&gt;&gt;     <a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a> &lt;mailto:<a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a>&gt;<br>

&gt;&gt;     <a href="http://lists.ovirt.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>

&gt;&gt;<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;<br>

</div></div>

&gt;<br>

<br>

<br>


<br></blockquote></div></div></div>


</blockquote></div></div></div></div></div>