<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi,<br>
<br>
I have found that it is a little difficult to detect I/O bandwidth
congestion in oVirt storage domains backed by NFS or GlusterFS.<br>
<br>
<font size="3"><font color="#000000">For block-based storage,
congestion is easier to detect, since you can use a tool such as
iostat. For file-system-based storage, it is much harder.<br>
<br>
I investigated the existing solutions. vSphere uses average I/O
latency to detect congestion. I propose a similar scheme in
<a class="moz-txt-link-freetext" href="http://www.ovirt.org/Features/Design/SLA_for_storage_io_bandwidth">http://www.ovirt.org/Features/Design/SLA_for_storage_io_bandwidth</a>
. It simplifies that scheme by making the congestion decision on a
single host instead of using statistics from all the hosts that
use the backend storage. It does not need communication between
hosts; in phase two we could add communication and make a
global decision.<br>
<br>
For now, the scheme detects congestion from the statistics of the
VMs</font></font><font size="3"><font
color="#000000"> that use that backend storage on the local
host (this information is collected by running iostat inside each
VM). It collects the I/O latency of those VMs and computes an
average latency for the backend storage. If the average is higher
than a threshold, the storage is considered congested.<br>
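<br>
As a rough illustration of the check described above, here is a
minimal Python sketch. The function name, the dictionary of per-VM
await values and the numbers are all hypothetical; they are not
taken from the design page.<br>
<pre>
```python
from statistics import mean


def is_congested(vm_await_ms, threshold_ms):
    """Average the per-VM latency for one backend storage.

    vm_await_ms: hypothetical mapping of VM name -> await (ms) reported
    by iostat inside that VM. Returns (average latency, congested?).
    """
    if not vm_await_ms:
        # No VM on this host uses the storage, so nothing to decide.
        return 0.0, False
    avg = mean(vm_await_ms.values())
    return avg, avg > threshold_ms


# Made-up sample values, only to show the call.
samples = {"vm1": 12.0, "vm2": 30.0, "vm3": 48.0}
avg, congested = is_congested(samples, threshold_ms=25.0)
print(avg, congested)
```
</pre>
Note that the decision uses only VMs on the local host, as in the
proposal.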
<br>
However, when I tested this policy, I found that setting the I/O
limit to a smaller value makes the latency longer. That means that
if the average latency exceeds the threshold, our automatic tuning
decreases the I/O limit, which makes the average I/O latency even
longer. That latency then exceeds the threshold again and causes
the I/O limit to be decreased further.</font></font>
Eventually the I/O limit is driven down to its lower bound. <br>
<br>
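The runaway behaviour can be reproduced with a toy model. The
sketch below is only an assumption-laden illustration, not the
actual tuning code: it assumes the guest keeps a fixed number of
requests outstanding, so by Little's law the latency seen in the VM
is roughly the number of outstanding requests divided by the
throttled throughput. All names and numbers are hypothetical.<br>
<pre>
```python
def simulate(limit_iops, threshold_ms, queue_depth=32,
             floor_iops=50, step=0.5, rounds=10):
    """Toy model of the tuning loop; returns [(limit, latency_ms), ...].

    Assumption: the guest keeps `queue_depth` requests outstanding, so
    latency = outstanding requests / throughput (Little's law).
    """
    history = []
    for _ in range(rounds):
        latency_ms = queue_depth * 1000.0 / limit_iops
        history.append((limit_iops, latency_ms))
        if latency_ms > threshold_ms:
            # "Congestion" detected: cut the limit, but not below the floor.
            limit_iops = max(floor_iops, limit_iops * step)
    return history


# Once the threshold is crossed, every cut raises latency and triggers
# another cut until the limit sits at its lower bound.
for limit, latency in simulate(limit_iops=1000.0, threshold_ms=30.0):
    print(limit, latency)
```
</pre>
In this model the limit never recovers, because lowering it can
only raise the latency measured inside the VM.<br>
<br>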
This scheme is affected by the following factors:<br>
1. We collect statistics from the VMs instead of from the host. (This
is because it is hard to collect such information for remote storage
such as NFS or GlusterFS.)<br>
2. The I/O limit affects the latency.<br>
3. The threshold is a constant.<br>
4. I also find that iostat's await (the latency figure) is not a good
enough signal, since the latency is long for both very light and very
heavy I/O. <br>
<font size="3"><font color="#000000"><br>
<br>
Does anybody have an idea or experience with this? Suggestions
are more than welcome. Thanks in advance.</font></font>
</body>
</html>