<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Jul 20, 2016 at 9:43 PM, Yedidyah Bar David <span dir="ltr"><<a href="mailto:didi@redhat.com" target="_blank">didi@redhat.com</a>></span> wrote:<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Not sure it's my business, but whatever:<br></blockquote><div><br></div><div>It is. I think it is up to everybody who is interested in making the oVirt infra more reliable and easier to maintain. </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
1. Do you intend also separate data centers? Such that if one (or two) of<br>
them loses connectivity/power/etc, we are still up?<br></blockquote><div><br></div><div>I would like to, but so far we have only one physical data center. The only mitigation available to us for now is offsite backups and offsite mirrors. We have some of that already and are working on improving the rest.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<br>
2. If so, does it mean that recreating it means copying from one of the<br>
others many GBs of data? And if so, is that also the plan for recovering<br>
from bad tests?<br></blockquote><div><br></div><div>If they are completely removed, then yes, it is. But that should not be a problem unless new data arrives faster than you can catch up on the old. This is not the case for our infra, so eventually it will sync up. </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
3. If so, it probably means we'll not do that very happily, because<br>
"undo" will take a lot of time and bandwidth.<br></blockquote><div><br></div><div>The good thing about having 3 instances is that you can give one instance even days to sync up its data if needed, while the whole setup stays in a reliable state. So I am not sure about "happily", but with such a configuration I would call it pretty stress-free. Also, the only way to get good at something is, well... to do it. So if it is not a happy process, we should make it one.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
4. If we still want to go that route, perhaps consider having per-site<br>
backup, which allows syncing from the others the changes done on them<br>
since X (where X is "death of power/connectivity", "Start of test", etc).<br>
Some time ago I looked at backup tools, and found out that while there are<br>
several "field tested" tools, such as bacula and amanda, they are considered<br>
old-fashioned, but there are several different contenders for the future<br>
"perfect" tool. For an overview of some of them see [1]. For my own uses<br>
I chose 'bup', which isn't perfect, but seemed good and stable enough.<br></blockquote><div><br></div><div>We are considering both onsite and offsite backups. The thing is that backups are a separate matter, because all "replicating" systems will happily replicate every error you have to all the instances, and good systems will do it very fast. So you essentially need both.</div><div><br></div><div>Also, my proposal is based on reliability at the service level. E.g. some things like "<a href="http://resources.ovirt.org">resources.ovirt.org</a>" are quite easy to make reliable, at least for reads. You just start several instances, and the only problem is that any mutation has to be applied to all of them. There are multiple ways to do that, but I doubt we can find one solution for all the services we have. All of them will need the underlying infra to be ready, though. If we store all copies on one storage domain and it goes down, all copies will obviously be down too, which is less reliable than keeping the copies separate.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
5. This way we still can, perhaps need to, sync over the Internet many<br>
GBs of data if the local-site backup died too, but if it didn't, and we<br>
did everything right, we only need to sync diffs, which hopefully be much<br>
smaller.<br></blockquote><div><br></div><div>This is indeed what should happen in a properly designed service, although I doubt it is possible for all the ones we use. But if we choose a per-service approach, then it can be decided individually for each service. </div></div><br clear="all"><div>Anton.</div><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><span><font color="#888888"><div>Anton Marchukov<br>Senior Software Engineer - <span><span><font color="#888888"><span><span><font color="#888888"><span><span><font color="#888888">RHEV CI - </font></span></span></font></span></span>Red Hat</font></span></span><br><br></div></font></span></div></div></div></div>
</div></div>