[Engine-devel] Design wiki page for trusted compute pools integration with oVirt has been updated

Sun Apr 28 11:06:24 UTC 2013

I like the ideas of 2-phase aggregated attestation & cluster-by-cluster
order.
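
To check that I read the proposal correctly, here is a rough sketch of how
I picture the startup flow. It is Python for brevity and not engine code:
attest_on_startup, two_phase_batches, the attest callback and the "pending"
state label are all hypothetical names of mine, and short_size=10 just
echoes the example number from the thread.

```python
from typing import Callable, Dict, List, Set

# Hypothetical stand-in for one aggregated attestation call:
# takes host names, returns host -> True (trusted) / False (untrusted).
AttestFn = Callable[[List[str]], Dict[str, bool]]


def two_phase_batches(hosts: List[str], short_size: int) -> List[List[str]]:
    """Split a cluster's hosts into a short first query and a longer second one."""
    if short_size <= 0:
        raise ValueError("short_size must be positive")
    first, rest = hosts[:short_size], hosts[short_size:]
    # Drop an empty second batch when the cluster is small.
    return [batch for batch in (first, rest) if batch]


def attest_on_startup(
    clusters: Dict[str, List[str]],
    trusted_clusters: Set[str],
    attest: AttestFn,
    short_size: int = 10,
) -> Dict[str, str]:
    """Walk the clusters one by one and return host -> state."""
    state: Dict[str, str] = {}
    for name, hosts in clusters.items():
        if name not in trusted_clusters:
            # Cluster without attestation service: never blocked on this.
            state.update({h: "operational" for h in hosts})
            continue
        # Assumption: hosts wait in a "pending" state until a verdict arrives.
        state.update({h: "pending" for h in hosts})
        for batch in two_phase_batches(hosts, short_size):
            verdicts = attest(batch)  # one aggregated query per batch
            for h in batch:
                trusted = verdicts.get(h, False)  # no answer counts as untrusted
                state[h] = "operational" if trusted else "non-operational"
    return state
```

With short_size=10 and a 300-host trusted cluster this issues one query for
10 hosts and then one for the remaining 290, and a cluster that is not in
trusted_clusters is marked operational without any query.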

But I want to understand the process more clearly.

Without TCP (trusted compute pools), how does engine handle the states of
existing hosts during engine boot? Does engine put all existing hosts in
the non-operational state, perform some check via VDSM, and then turn them
operational? Putting a host in the non-operational state will cause VM
migration, right?

Or is there a global state in engine that indicates whether users are
allowed to create VMs?

Thanks
Jimmy

Itamar Heim wrote on 2013-04-28:
> On 04/28/2013 11:34 AM, Doron Fediuck wrote:
>> Hi Dave,
>> 
>> Just to make sure I fully understand, I'll repeat your basic arguments;
>> 
>> 1. It takes time to query a big number of hosts (hundreds).
>> 
>> 2. When backend is booting, a user may start a VM on a host which was
>> hacked during the downtime of the engine.
>> 
>> If the above is your concern, it shouldn't be so.
>> The reason is that no host will become operational before you get a
>> response from the attestation server and allow it to become operational.
>> So a user cannot start a new VM on a non-operational host.
> 
> i'd do the queries in groups of "cluster", so cluster-by-cluster they get
> unblocked. a cluster without attestation service shouldn't block on this
> of course.
> 
>> 
>> What this means is that your thread may need to update the user by
>> sending a periodic event that a large-scale attestation operation is in
>> progress. Other than that, maybe your thread can work in smaller groups
>> if it gets better results? I.e., instead of one query for 300 hosts,
>> maybe you can run 3 serialized queries for 100 hosts each?
>> If this does not help, maybe you can run a short query for something like
>> 10 hosts, which should get an answer relatively fast. Then you can issue a
>> query for the other 290 hosts, which will take longer. In this way the
>> system may get 10 hosts to work with quite fast, and later on the other
>> 290 hosts will join... So this can actually be configurable as a 2-phase
>> process: a short query and a longer one. The admin can choose the short
>> query size based on his setup, and the longer query can pick up all the
>> other hosts.
>> What do you think?
>> 
>> Doron
>> 
>> ----- Original Message -----
>>> From: "Wei D Chen" <wei.d.chen at intel.com>
>>> To: "Doron Fediuck" <dfediuck at redhat.com>
>>> Cc: "Oved Ourfalli" <ovedo at redhat.com>, engine-devel at ovirt.org
>>> Sent: Saturday, April 27, 2013 9:36:44 AM
>>> Subject: Re: [Engine-devel] Design wiki page for trusted compute pools
>>> integration with oVirt has been updated
>>> 
>>> Hi,
>>> 
>>> Our current plan is to add a new thread on the engine side that
>>> attests all hosts at once (one aggregated query to the attestation
>>> server) when the engine reboots. There is still one potential issue
>>> under extreme conditions: with hundreds of nodes in a datacenter,
>>> attesting all hosts may still take a couple of minutes, and a hacked,
>>> untrusted node may be treated as a trusted host until its latest
>>> status arrives. So the worst case in a datacenter with hundreds of
>>> nodes is:
>>> 1. The engine goes down for some reason and boots up again; some
>>>    trusted nodes may be hacked and rebooted during this period.
>>> 2. Our thread runs to get the status (trusted/untrusted) of all nodes,
>>>    which may take a couple of minutes in a large datacenter.
>>> 3. Users create VMs on these hacked nodes and believe these VMs are
>>>    trusted VMs launched on trusted nodes.
>>> 4. Our thread gets the correct status of these untrusted nodes and
>>>    sets them as non-operational.
>>> 5. All of the "trusted" VMs running on these untrusted nodes are
>>>    expected to migrate to other trusted nodes.
>>> 
>>> So the question is: in a trusted cluster with hundreds of nodes, some
>>> VMs expected to be created on trusted nodes may actually be created on
>>> untrusted nodes instead, and this window may last a couple of minutes
>>> (the worst case in my view is 10 minutes with 1000 nodes). Is this
>>> acceptable from your point of view? Or do you have any other suggestion?
>>> 
>>> 
>>> Best Regards,
>>> Dave Chen
>>> 
>>> 
>>> Doron Fediuck wrote on 2013-04-21:
>>>> 
>>>> 
>>>> 
>>>> ----- Original Message -----
>>>>> From: "Wei D Chen" <wei.d.chen at intel.com>
>>>>> To: "Ofri Masad" <omasad at redhat.com>
>>>>> Cc: "Oved Ourfalli" <ovedo at redhat.com>, engine-devel at ovirt.org
>>>>> Sent: Sunday, April 21, 2013 4:00:55 PM
>>>>> Subject: Re: [Engine-devel] Design wiki page for trusted compute pools
>>>>> integration with oVirt has been updated
>>>>> 
>>>>> Ofri,
>>>>> 
>>>>> Absolutely right, an aggregated query is significantly faster than
>>>>> separate queries. I agree with running an aggregated query at
>>>>> engine startup. Is it possible to invoke the attestation service in
>>>>> the engine's initialization code block instead of a "quartz job"? Is
>>>>> there any class similar to "InitVdsOnUpCommand" for the engine's
>>>>> initialization?
>>>>> 
>>>>> Best Regards,
>>>>> Dave Chen
>>>>> 
>>>> org.ovirt.engine.core.bll.Backend.Initialize()
>>>> 
>>>> Note you cannot block this method while waiting for results. Instead
>>>> I suggest you fire a one-time background request from this method.
>>>> 
>>>> 
>>>> Ofri Masad wrote on 2013-04-21:
>>>>> 
>>>>> Dave,
>>>>> 
>>>>> If I'm not mistaken, there is a big difference between separate
>>>>> queries to the attestation server and an aggregated one.
>>>>> Is that true?
>>>>> 
>>>>> Thanks,
>>>>> Ofri
>>>>> 
>>>>> ----- Original Message -----
>>>>>> From: "Itamar Heim" <iheim at redhat.com>
>>>>>> To: "Ofri Masad" <omasad at redhat.com>
>>>>>> Cc: "Oved Ourfalli" <ovedo at redhat.com>, "Wei D Chen"
>>>>>> <wei.d.chen at intel.com>, engine-devel at ovirt.org
>>>>>> Sent: Sunday, April 21, 2013 10:20:17 AM
>>>>>> Subject: Re: [Engine-devel] Design wiki page for trusted compute
>>>>>> pools integration with oVirt has been updated
>>>>>> 
>>>>>> On 04/21/2013 10:13 AM, Ofri Masad wrote:
>>>>>>> Hi,
>>>>>>> One more thing we need to think about for the second approach -
>>>>>>> the aggregated query. On engine start we need to determine the
>>>>>>> trust state of all the hosts. Sending a separate query for each
>>>>>>> host will overload the attestation host and the network, so an
>>>>>>> initial aggregated query needs to be sent when the engine starts.
>>>>>>> The same thing can happen after a management network failure and
>>>>>>> so on. Maybe we can run a quartz job every x minutes, checking
>>>>>>> whether a large part of the hosts in the cluster (like 30%) are
>>>>>>> untrusted - in that case run the aggregated query.
>>>>>> 
>>>>>> are we sure this optimization is needed?
>>>>>> how heavy/latent is the call to the attestation service?
>>>>>> 
>>>>> _______________________________________________
>>>>> Engine-devel mailing list
>>>>> Engine-devel at ovirt.org
>>>>> http://lists.ovirt.org/mailman/listinfo/engine-devel
>>>>> 

Jimmy
