<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 25, 2016 at 6:17 PM, David Caro <span dir="ltr"><<a href="mailto:dcaro@redhat.com" target="_blank">dcaro@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 05/25 17:06, David Caro wrote:<br>
> On 05/25 16:09, Barak Korren wrote:<br>
> > On 25 May 2016 at 14:52, David Caro <<a href="mailto:dcaro@redhat.com">dcaro@redhat.com</a>> wrote:<br>
> > > On 05/25 14:42, Barak Korren wrote:<br>
> > >> On 25 May 2016 at 12:44, Eyal Edri <<a href="mailto:eedri@redhat.com">eedri@redhat.com</a>> wrote:<br>
> > >> > OK,<br>
> > >> > I suggest testing with a VM on a local disk (preferably on a host with an SSD<br>
> > >> > configured). If it works,<br>
> > >> > let's expedite moving all VMs, or at least a large number of them, to it until<br>
> > >> > we see the network load reduced.<br>
> > >> ><br>
> > >><br>
> > >> This is not that easy: oVirt doesn't support mixing local disks and<br>
> > >> shared storage in the same cluster, so we will need to move hosts to a new<br>
> > >> cluster for this.<br>
> > >> We will also lose the ability to use templates, or otherwise have to<br>
> > >> create the templates on each and every local disk.<br>
> > >><br>
> > >> The scratch disk is a good solution for this, where you can have the<br>
> > >> OS image on the central storage and the ephemeral data on the local<br>
> > >> disk.<br>
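For illustration only - a rough sketch of how an extra scratch disk on a (hypothetical) local storage domain could be attached to an existing VM, assuming the oVirt 4 Python SDK (ovirtsdk4); the engine URL, credentials, VM name and the 'local-scratch' domain name are placeholders, not the real setup:<br>
<pre>
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Connect to the engine (placeholder URL/credentials).
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=jenkins-worker-01')[0]

# Attach a raw, non-bootable scratch disk carved out of the
# hypothetical 'local-scratch' storage domain on the host.
disk_attachments = vms_service.vm_service(vm.id).disk_attachments_service()
disk_attachments.add(
    types.DiskAttachment(
        disk=types.Disk(
            name='jenkins-worker-01_scratch',
            format=types.DiskFormat.RAW,
            provisioned_size=40 * 2**30,  # 40G of ephemeral space
            storage_domains=[types.StorageDomain(name='local-scratch')],
        ),
        interface=types.DiskInterface.VIRTIO,
        bootable=False,
        active=True,
    ),
)

connection.close()
</pre>
The idea is that only the OS image stays on the central storage; everything the jobs write goes to the locally backed disk.<br>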
> > >><br>
> > >> WRT the storage architecture - a single huge (10.9T) ext4 filesystem is<br>
> > >> used on top of the DRBD device; this is probably not the most efficient<br>
> > >> thing one can do (XFS would probably have been better, and raw LUNs via<br>
> > >> iSCSI even better).<br>
> > ><br>
> > > That was done >3 years ago; XFS was not as stable, widely used or well<br>
> > > supported back then.<br>
> > ><br>
> > AFAIK it pre-dates EXT4<br>
><br>
> It does, but on el6 it performed considerably worse and had more bugs (from<br>
> what the reviews said at the time).<br>
><br>
> > in any case this does not detract from the<br>
> > fact that the current configuration is not as efficient as we can make<br>
> > it.<br>
> ><br>
><br>
> It does not; I agree it's better to focus on what we can do from now on, not<br>
> on what should have been done then.<br>
><br>
> ><br>
> > >><br>
> > >> I'm guessing that those 10.9TB are not made from a single disk but<br>
> > >> from a hardware RAID of some sort. In that case, deactivating the<br>
> > >> hardware RAID and re-exposing the drives as multiple separate iSCSI LUNs<br>
> > >> (that are then re-joined into a single storage domain in oVirt) will enable<br>
> > >> different VMs to work concurrently on different disks. This should<br>
> > >> lower the per-VM storage latency.<br>
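A rough sketch of what re-joining several exported LUNs into one oVirt data domain could look like, again assuming the oVirt 4 Python SDK; the LUN IDs, portal address, target names, host and data-center names below are made up:<br>
<pre>
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)

# One LogicalUnit per exported disk; oVirt joins them into a single
# LVM-based iSCSI data domain, so VM disks get spread across them.
luns = [
    types.LogicalUnit(
        id=lun_id,                      # placeholder LUN WWIDs
        address='storage.example.com',  # placeholder iSCSI portal
        port=3260,
        target='iqn.2016-05.example:jenkins-%d' % i,
    )
    for i, lun_id in enumerate(['36001405aaaabbbb0001', '36001405aaaabbbb0002'])
]

sds_service = connection.system_service().storage_domains_service()
sds_service.add(
    types.StorageDomain(
        name='jenkins-iscsi',
        type=types.StorageDomainType.DATA,
        data_center=types.DataCenter(name='jenkins-dc'),
        host=types.Host(name='some-hypervisor'),
        storage=types.HostStorage(
            type=types.StorageType.ISCSI,
            logical_units=luns,
        ),
    ),
)

connection.close()
</pre>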
> > ><br>
> > > That would get rid of the DRBD too; it's a totally different setup, built<br>
> > > from scratch (no NFS either).<br>
> ><br>
> > We can and should still use DRBD, just set up a device for each disk.<br>
> > But yeah, NFS should probably go away.<br>
> > (We are seeing dramatically better performance for iSCSI in<br>
> > integration-engine)<br>
><br>
> I don't understand, then, what you said about splitting the hardware RAIDs - do<br>
> you mean to set up one DRBD device on top of each hard drive instead?<br>
<br>
<br>
</div></div>Though I really think we should move to Gluster/Ceph instead for the Jenkins<br>
VMs - does anyone know the current status of the hyperconverged setup?<br>
<br></blockquote><div><br></div><div>I don't think either Gluster or a hyperconverged setup is stable enough yet to move all the production infra onto.</div><div>Hyperconverged is also not yet supported in oVirt (it might be a 4.x feature).</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
That would give us better, more scalable distributed storage and make proper use<br>
of the hosts' local disks (we currently have more space on the combined hosts than<br>
on the storage servers).<br></blockquote><div><br></div><div>I agree a stable distributed storage solution is the way to go if we can find one :)</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb"><div class="h5"><br>
><br>
><br>
> btw, I think the NFS is also used for something more than just the engine<br>
> storage domain (just keep in mind that this has to be checked if we are<br>
> going to get rid of it)<br>
><br>
> ><br>
> > ><br>
> > >><br>
> > >> Looking at the storage machine I see a strong indication that it is IO-bound<br>
> > >> - the load average is ~12 while there are just 1-5 running processes,<br>
> > >> the CPU is ~80% idle and the rest is IO wait.<br>
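As a quick way to re-check that observation, a small stand-alone sketch that samples /proc/stat twice and prints the idle vs IO-wait share of CPU time (field positions follow the standard /proc/stat layout):<br>
<pre>
import time

def cpu_fields():
    # First line of /proc/stat: cpu user nice system idle iowait irq softirq steal ...
    with open('/proc/stat') as f:
        return [float(x) for x in f.readline().split()[1:]]

before = cpu_fields()
time.sleep(10)
after = cpu_fields()

deltas = [b - a for a, b in zip(before, after)]
total = sum(deltas) or 1.0
print('idle:   %5.1f%%' % (100.0 * deltas[3] / total))
print('iowait: %5.1f%%' % (100.0 * deltas[4] / total))
</pre>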
> > >><br>
> > >> Running 'du *' at:<br>
> > >> /srv/ovirt_storage/jenkins-dc/658e5b87-1207-4226-9fcc-4e5fa02b86b4/images<br>
> > >> one can see that most images are ~40G in size (that is a _real_ 40G, not<br>
> > >> sparse!). This means that despite most VMs being created from<br>
> > >> templates, the VMs are full template copies rather than COW clones.<br>
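To make the "real 40G, not sparse" check easy to repeat, a small stand-alone sketch that walks an images directory and compares the apparent file size with the space actually allocated on the filesystem (the directory path is just an argument, e.g. the storage domain's images/ directory):<br>
<pre>
import os
import sys

def sizes(path):
    st = os.stat(path)
    apparent = st.st_size            # size the guest/qemu sees
    allocated = st.st_blocks * 512   # blocks actually allocated by the FS
    return apparent, allocated

root = sys.argv[1]
for dirpath, _dirnames, filenames in os.walk(root):
    for name in filenames:
        path = os.path.join(dirpath, name)
        apparent, allocated = sizes(path)
        print('%-60s %6.1fG apparent %6.1fG allocated'
              % (os.path.relpath(path, root), apparent / 2.0**30, allocated / 2.0**30))
</pre>
A sparse or COW-cloned image would show allocated well below apparent; here the two are roughly equal.<br>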
> > ><br>
> > > It should not be like that; maybe the templates are misconfigured? Or the<br>
> > > Foreman images?<br>
> ><br>
> > This is the expected behaviour when creating a VM from template in the<br>
> > oVirt admin UI. I thought Foreman might behave differently, but it<br>
> > seems it does not.<br>
> ><br>
> > This behaviour is determined by the parameters you pass to the engine<br>
> > API when instantiating a VM, so it most probably doesn't have anything<br>
> > to do with the template configuration.<br>
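For reference, a hedged sketch of what that API knob looks like through the Python SDK (ovirtsdk4): whether the new VM keeps COW overlays on the template or gets an independent full copy is chosen when adding the VM. The names are placeholders, and this only illustrates the engine API - it does not show how Foreman actually drives it:<br>
<pre>
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)

vms_service = connection.system_service().vms_service()

# Thin copy: the new VM's disks stay qcow2 overlays on top of the template.
vms_service.add(
    types.Vm(
        name='worker-thin',
        cluster=types.Cluster(name='jenkins'),
        template=types.Template(name='el7-worker'),
    ),
    clone=False,
)

# Full copy: every template disk is duplicated (what 'du' shows today).
vms_service.add(
    types.Vm(
        name='worker-full',
        cluster=types.Cluster(name='jenkins'),
        template=types.Template(name='el7-worker'),
    ),
    clone=True,
)

connection.close()
</pre>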
><br>
> So maybe a misconfiguration in Foreman?<br>
><br>
> ><br>
> > ><br>
> > >> What this means is that using pools (where all VMs are COW copies of<br>
> > >> the single pool template) is expected to significantly reduce the<br>
> > >> storage utilization and therefore the IO load on it (the less you<br>
> > >> store, the less you need to read back).<br>
> > ><br>
> > > That should happen too without pools, with normal qcow templates.<br>
> ><br>
> > Not unless you create all the VMs via the API and pass the right<br>
> > parameters. Pools are the easiest way to ensure you never mess that<br>
> > up...<br>
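A minimal sketch of creating such a pool with the Python SDK (ovirtsdk4), where every pool VM is a COW copy of the pool's template; the pool, cluster and template names and the sizes are placeholders:<br>
<pre>
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)

pools_service = connection.system_service().vm_pools_service()
pools_service.add(
    types.VmPool(
        name='jenkins-workers',
        cluster=types.Cluster(name='jenkins'),
        template=types.Template(name='el7-worker'),
        size=20,            # number of VMs in the pool
        prestarted_vms=5,   # keep a few booted and ready to grab
    ),
)

connection.close()
</pre>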
><br>
> That was the idea<br>
><br>
> ><br>
> > > And in any case, that will not lower the normal IO when we are not actually<br>
> > > creating VMs, as any read and write will still hit the disk anyhow; it<br>
> > > only alleviates the IO when creating new VMs.<br>
> ><br>
> > Since you are reading the same bits over and over (for different VMs),<br>
> > you allow the various buffer caches along the way (in the storage<br>
> > machines and in the hypervisors) to do what they are supposed to.<br>
><br>
><br>
> Once the VM is started, most of what's needed is in RAM, so there are not<br>
> that many reads from disk unless you start writing to it - and that's<br>
> mostly what we are hitting: lots of writes.<br>
><br>
> ><br>
> > > The local disk (scratch disk) is the best option<br>
> > > imo, now and for the foreseeable future.<br>
> ><br>
> > This is not an either/or thing; IMO we need to do both.<br>
><br>
> I think that it's way more useful, because it will solve our current issues<br>
> faster and for longer, so IMO it should get more attention sooner.<br>
><br>
> Any improvement that does not remove the current bottleneck is not really<br>
> giving any value to the overall infra (even if it might become valuable later).<br>
><br>
> ><br>
> > --<br>
> > Barak Korren<br>
> > <a href="mailto:bkorren@redhat.com">bkorren@redhat.com</a><br>
> > RHEV-CI Team<br>
><br>
> --<br>
> David Caro<br>
><br>
> Red Hat S.L.<br>
> Continuous Integration Engineer - EMEA ENG Virtualization R&D<br>
><br>
> Tel.: <a href="tel:%2B420%20532%20294%20605" value="+420532294605">+420 532 294 605</a><br>
> Email: <a href="mailto:dcaro@redhat.com">dcaro@redhat.com</a><br>
> IRC: dcaro|dcaroest@{freenode|oftc|redhat}<br>
> Web: <a href="http://www.redhat.com" rel="noreferrer" target="_blank">www.redhat.com</a><br>
> RHT Global #: 82-62605<br>
<br>
<br>
<br>
--<br>
David Caro<br>
<br>
Red Hat S.L.<br>
Continuous Integration Engineer - EMEA ENG Virtualization R&D<br>
<br>
Tel.: <a href="tel:%2B420%20532%20294%20605" value="+420532294605">+420 532 294 605</a><br>
Email: <a href="mailto:dcaro@redhat.com">dcaro@redhat.com</a><br>
IRC: dcaro|dcaroest@{freenode|oftc|redhat}<br>
Web: <a href="http://www.redhat.com" rel="noreferrer" target="_blank">www.redhat.com</a><br>
RHT Global #: 82-62605<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Eyal Edri<br>Associate Manager</div><div>RHEV DevOps<br>EMEA ENG Virtualization R&D<br>Red Hat Israel<br><br>phone: +972-9-7692018<br>irc: eedri (on #tlv #rhev-dev #rhev-integ)</div></div></div></div></div>
</div></div>