[Engine-devel] Design wiki page for trusted compute pools integration with oVirt has been updated

Sun Apr 28 08:34:38 UTC 2013

Hi Dave,

Just to make sure I fully understand, I'll repeat your basic arguments:

1. It takes time to query a large number of hosts (hundreds).

2. When the backend is booting, a user may start a VM on a host which was
hacked while the engine was down.

If the above is your concern, it shouldn't be.
The reason is that no host will become operational before you get a response
from the attestation server and allow it to become operational, so a user
cannot start a new VM on a non-operational host.
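To illustrate the gating argument, here is a minimal sketch of the rule. HostStatus, TrustLevel and TrustGate are hypothetical, simplified stand-ins and not oVirt's real classes. A host in a trusted cluster moves to UP only once a verdict has arrived and that verdict is TRUSTED. The scheduler only places VMs on UP hosts, so a host whose attestation is still pending cannot receive a new VM.

```java
import java.util.Optional;

// Hypothetical, simplified host states (the real engine has many more).
enum HostStatus { NON_OPERATIONAL, UP }

// Hypothetical verdict returned by the attestation server.
enum TrustLevel { TRUSTED, UNTRUSTED }

final class TrustGate {
    // An empty Optional means no answer from the attestation server yet.
    // Only an explicit TRUSTED verdict lets the host become UP; a missing
    // or UNTRUSTED verdict keeps it NON_OPERATIONAL.
    static HostStatus nextStatus(Optional<TrustLevel> verdict) {
        return verdict.filter(v -> v == TrustLevel.TRUSTED)
                      .map(v -> HostStatus.UP)
                      .orElse(HostStatus.NON_OPERATIONAL);
    }

    // The scheduler only considers UP hosts when starting a VM.
    static boolean canRunVm(HostStatus status) {
        return status == HostStatus.UP;
    }
}
```

With this rule, a host that was hacked while the engine was down stays non-operational until its (untrusted) verdict arrives. It never becomes eligible in between.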

What this means is that your thread may need to keep the user informed by
sending a periodic event saying that a large-scale attestation operation is in
progress. Other than that, maybe your thread can work in smaller groups if that
gives better results? I.e., instead of one query for 300 hosts, maybe you can
run 3 serialized queries of 100 hosts each?
If this does not help, maybe you can run a short query for something like
10 hosts, which should get an answer relatively fast. Then you can issue a
query for the other 290 hosts, which will take longer. This way the system
gets 10 hosts to work with quite fast, and the other 290 hosts join later.
So this can actually be made a configurable 2-phase process: a short query
and a longer one. The admin can choose the short query size based on the
setup, and the longer query picks up all the other hosts.
What do you think?
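Both ideas above come down to how the host list is split into queries. The following is a minimal sketch under that reading. The class name and the two parameters are hypothetical. firstPhaseSize would be the admin-configurable short query, and batchSize caps each later serialized query. Each inner list would then become one aggregated request to the attestation server.

```java
import java.util.ArrayList;
import java.util.List;

final class AttestationBatches {
    // Splits hosts into a short first batch of firstPhaseSize hosts, then
    // the remaining hosts in serialized batches of at most batchSize.
    // Pass Integer.MAX_VALUE as batchSize for a single "all the rest" query.
    static <T> List<List<T>> plan(List<T> hosts, int firstPhaseSize, int batchSize) {
        if (firstPhaseSize < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("invalid batch sizes");
        }
        List<List<T>> batches = new ArrayList<>();
        int first = Math.min(firstPhaseSize, hosts.size());
        if (first > 0) {
            batches.add(new ArrayList<>(hosts.subList(0, first)));
        }
        int i = first;
        while (i < hosts.size()) {
            // Widen to long so a huge batchSize cannot overflow the index.
            int end = (int) Math.min((long) i + batchSize, hosts.size());
            batches.add(new ArrayList<>(hosts.subList(i, end)));
            i = end;
        }
        return batches;
    }
}
```

For 300 hosts, plan(hosts, 10, Integer.MAX_VALUE) gives the 2-phase split of 10 + 290, and plan(hosts, 0, 100) gives the three serialized queries of 100 each.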

Doron

----- Original Message -----
> From: "Wei D Chen" <wei.d.chen at intel.com>
> To: "Doron Fediuck" <dfediuck at redhat.com>
> Cc: "Oved Ourfalli" <ovedo at redhat.com>, engine-devel at ovirt.org
> Sent: Saturday, April 27, 2013 9:36:44 AM
> Subject: Re: [Engine-devel] Design wiki page for trusted compute pools integration with oVirt has been updated
> 
> Hi,
> 
> Our current plan is to add a new thread on the engine side that attests all
> hosts at once (an aggregated query to the attestation server) when the
> engine reboots. There is still one potential issue under extreme
> conditions: with hundreds of nodes in a datacenter, attesting all hosts
> may still take a couple of minutes, and during that time a hacked,
> untrusted node whose latest status has not been received yet may be
> considered a trusted host. So the worst case in a datacenter with hundreds
> of nodes is:
> 1. The engine goes down for some reason and boots up again; some trusted
> nodes may be hacked and rebooted during this period.
> 2. Our thread runs to get the status (trusted/untrusted) of all nodes,
> which may take a couple of minutes in a large datacenter.
> 3. A user creates VMs on these hacked nodes, believing they are trusted
> VMs launched on trusted nodes.
> 4. Our thread gets the correct status of these untrusted nodes and sets
> them as non-operational.
> 5. All of these "trusted" VMs running on the untrusted nodes are expected
> to migrate to other trusted nodes.
> 
> So the question is: in a trusted cluster with hundreds of nodes, some VMs
> that are expected to be created on trusted nodes may actually be created on
> untrusted nodes instead, and this window may last a couple of minutes (the
> worst case in my view is 10 minutes with 1000 nodes).
> Is this acceptable from your point of view? Or do you have any other
> suggestion?
> 
> 
> Best Regards,
> Dave Chen
> 
> 
> > -----Original Message-----
> > From: Doron Fediuck [mailto:dfediuck at redhat.com]
> > Sent: Sunday, April 21, 2013 11:58 PM
> > To: Chen, Wei D
> > Cc: Ofri Masad; Oved Ourfalli; engine-devel at ovirt.org
> > Subject: Re: [Engine-devel] Design wiki page for trusted compute pools
> > integration with oVirt has been updated
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Wei D Chen" <wei.d.chen at intel.com>
> > > To: "Ofri Masad" <omasad at redhat.com>
> > > Cc: "Oved Ourfalli" <ovedo at redhat.com>, engine-devel at ovirt.org
> > > Sent: Sunday, April 21, 2013 4:00:55 PM
> > > Subject: Re: [Engine-devel] Design wiki page for trusted compute pools
> > > integration with oVirt has been updated
> > >
> > > Ofri,
> > >
> > > Absolutely right, an aggregated query gives a significant time improvement
> > > compared to separate queries. I agree with an aggregated query on engine
> > > startup. Is it possible to invoke the attestation service in the engine's
> > > initialization code block instead of a "quartz job"? Is there any class
> > > similar to "InitVdsOnUpCommand" for the engine's initialization?
> > >
> > > Best Regards,
> > > Dave Chen
> > >
> > org.ovirt.engine.core.bll.Backend.Initialize()
> > 
> > Note you cannot block this method while waiting for results.
> > Instead I suggest you fire a one-time background request from this method.
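Here is a minimal sketch of the "fire once, don't block" pattern, using only java.util.concurrent. It does not use the engine's own thread pools or Backend.Initialize() itself. StartupAttestation, fireOnce, and the query/apply callbacks are hypothetical. query stands in for the aggregated attestation call, and apply stands in for updating host statuses. The method schedules the work on a dedicated daemon thread and returns immediately.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Function;

final class StartupAttestation {
    // Intended to be called from the initialization path. It never waits
    // for the attestation server; it returns a future that completes once
    // the results have been applied.
    static <R> CompletableFuture<Void> fireOnce(List<String> hostNames,
                                                Function<List<String>, R> query,
                                                Consumer<R> apply) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "startup-attestation");
            t.setDaemon(true); // do not keep the JVM alive for this task
            return t;
        });
        try {
            return CompletableFuture
                    .supplyAsync(() -> query.apply(hostNames), executor)
                    .thenAccept(apply);
        } finally {
            // One-shot: the submitted task still runs, nothing new is accepted.
            executor.shutdown();
        }
    }
}
```

The caller can ignore the returned future, or attach logging or a user-facing event to it. Either way, initialization continues while the query is in flight.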
> > 
> > 
> > > -----Original Message-----
> > > From: Ofri Masad [mailto:omasad at redhat.com]
> > > Sent: Sunday, April 21, 2013 3:29 PM
> > > To: Chen, Wei D
> > > Cc: Oved Ourfalli; engine-devel at ovirt.org; Itamar Heim
> > > Subject: Re: [Engine-devel] Design wiki page for trusted compute pools
> > > integration with oVirt has been updated
> > >
> > > Dave,
> > >
> > > If I'm not mistaken, there is a big difference between separate
> > > queries to the attestation server and an aggregated one?
> > > Is that true?
> > >
> > > Thanks,
> > > Ofri
> > >
> > > ----- Original Message -----
> > > > From: "Itamar Heim" <iheim at redhat.com>
> > > > To: "Ofri Masad" <omasad at redhat.com>
> > > > Cc: "Oved Ourfalli" <ovedo at redhat.com>, "Wei D Chen"
> > > > <wei.d.chen at intel.com>, engine-devel at ovirt.org
> > > > Sent: Sunday, April 21, 2013 10:20:17 AM
> > > > Subject: Re: [Engine-devel] Design wiki page for trusted compute
> > > > pools integration with oVirt has been updated
> > > >
> > > > On 04/21/2013 10:13 AM, Ofri Masad wrote:
> > > > > Hi,
> > > > > One more thing we need to think about for the second approach,
> > > > > the aggregated query: on engine start we need to determine the trust
> > > > > state of all the hosts. Sending a separate query for each host
> > > > > will overload the attestation host and the network, so an initial
> > > > > aggregated query needs to be sent when the engine starts.
> > > > > The same thing can happen after a management network failure, etc.
> > > > > Maybe we can run a quartz job every x minutes, checking whether a
> > > > > large part of the hosts in the cluster (like 30%) are untrusted; in
> > > > > that case, run the aggregated query.
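The periodic check can be sketched as a pure threshold test plus a scheduler. This swaps in java.util.concurrent's ScheduledExecutorService for the Quartz job suggested above. The class name, parameters, and the count suppliers are hypothetical. The 30% figure is passed as threshold = 0.30.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

final class UntrustedRatioCheck {
    // True when the untrusted share of the cluster reaches the threshold.
    // An empty cluster never triggers the aggregated query.
    static boolean needsAggregatedQuery(int untrusted, int total, double threshold) {
        return total > 0 && (double) untrusted / total >= threshold;
    }

    // Stand-in for the periodic Quartz job: every 'minutes' minutes, read the
    // current counts and run the aggregated query if the threshold is hit.
    // Note: if a run throws, scheduleAtFixedRate suppresses later runs, so
    // real code would catch and log exceptions inside the task.
    static ScheduledExecutorService schedule(IntSupplier untrusted, IntSupplier total,
                                             double threshold, long minutes,
                                             Runnable aggregatedQuery) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(() -> {
            if (needsAggregatedQuery(untrusted.getAsInt(), total.getAsInt(), threshold)) {
                aggregatedQuery.run();
            }
        }, minutes, minutes, TimeUnit.MINUTES);
        return ses;
    }
}
```

Keeping the ratio test separate from the scheduling makes it easy to tune the threshold, or reuse it after a management network failure, without touching the timer.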
> > > >
> > > > Are we sure this optimization is needed?
> > > > How heavy or slow is the call to the attestation service?
> > > >
> > > _______________________________________________
> > > Engine-devel mailing list
> > > Engine-devel at ovirt.org
> > > http://lists.ovirt.org/mailman/listinfo/engine-devel
> > >