[ovirt-devel] [VDSM] [ENGINE] [RFC] A configuration verb for contextual vdsm operation mode

Michal Skrivanek michal.skrivanek at redhat.com
Tue Apr 4 15:33:59 UTC 2017


> On 4 Apr 2017, at 17:26, Dan Kenigsberg <danken at redhat.com> wrote:
> 
> On Tue, Apr 4, 2017 at 1:16 PM, Michal Skrivanek
> <michal.skrivanek at redhat.com> wrote:
>> 
>> On 4 Apr 2017, at 12:10, Roy Golan <rgolan at redhat.com> wrote:
>> 
>> 
>> 
>> On Tue, Apr 4, 2017 at 12:49 PM Yaniv Kaul <ykaul at redhat.com> wrote:
>>> 
>>> On Tue, Apr 4, 2017 at 12:29 PM, Roy Golan <rgolan at redhat.com> wrote:
>>>> 
>>>> I'm working on a POC lately on a change to stats collection and retrieval
>>>> by VDSM. The motto is to cut all we can from host/vm stats (possibly caps)
>>>> and report only core-business stuff to the engine. Engine will retrieve the
>>>> rest through a 3rd party provider
>>>> 
>>>> (nevermind what is it atm)
>> 
>> 
>> I hope it’s the same one as for VM stats, collectd:)
>> 
>>>> 
>>>> Being backward compatible by design, I have to support 2 API versions for
>>>> Host.getStats, '4.1' and '4.2'.
>>>> Apart from supplying fewer parameters, I want VDSM to do less stuff. It
>>>> doesn't need to sample what it doesn't report. In other words I want
>>>> '4.1-sampling' and '4.2-sampling'
>>>> 
>>>> # Introducing 'configuration' Verb:
>>>> 
>>>> As the engine always knows (Hosted Engine as well) what cluster version this
>>>> host belongs to, it can configure VDSM to operate in cluster version mode.
>> 
>> 
>> why not run it in parallel for one version?
>> 
>>>> 
>>>>  Host.configure(config={version: 4.2})
>>>> 
>>>> Consider this verb, pre-activating using 'Host.getCaps' to set the
>>>> context.
>>>> It will set the right sampling method, and other stuff if needed; then
>>>> API endpoints will have the right permutation of the API to answer it.
>>>> 
>>>> 4.2 host can operate in 4.1 mode:
>>>>  Host.configure(config={version: 4.1})
>>>> 
>>>> Issue: moving a 4.2 host from 4.2 cluster to 4.1 is a problem since
>>>> engine needs to know this is a new vdsm that has the verb available. One way
>>>> to overcome that is to fire the verb for every host regardless of the
>>>> version and disregard an error that implies the verb doesn't exist.
>>> 
>>> 
>>> Isn't it solved by host re-installation?
>> 
>> 
>> We allow maintenance + change host cluster so not always. Was this changed?
>>> 
>>> 
>>>> 
>>>> # Engine:
>>>> Engine will handle the verb per version.
>>>> Host/Vms monitoring should be changed - I suggest to move out of the
>>>> monitoring code the whole stats collection as it is a different task which
>>>> is orthogonal to 'monitoring' and in 4.2 more than before.
>>>> 
>>>> 
>>>> I know configuration for VDSM has been discussed before and there are
>>>> probably tons of ways to do it. When you share your thoughts please remember
>>>> that configuration is a by-product of the effort.
>>> 
>>> 
>>> How do we persist this level on VDSM? Or we don't, and if VDSM is
>>> restarted it is again back to 4.1 mode until Engine tells it otherwise?
>>> 
>>> Y.
>> 
>> 
>> Must persist it somehow, otherwise there is a race when the engine will
>> send a stats request and will get the wrong answer.  I'm wondering if using
>> different endpoints is the right solution here to prevent that from
>> happening.
>>  method: Host.getStats version: 4.1
>> 
>> 
>> would it be a problem? assuming that the code is easily started/stopped
>> within vdsm, we can just change the behavior based on receiving one or the
>> other verb for the first time after vdsm starts
> 
> It does not feel right to have such a state in Vdsm. And making this
> state depend implicitly on a verb feels even worse than an explicit
> "configure" verb. We already have something like that in the
> debug-oriented setLogLevel verb; but that's not how client/server
> applications usually operate.

I don’t mind either way

> 
> I think that the proper way to do this would be to reconfigure
> vdsm.conf, set there collect_statistics=false (via ovirt-host-deploy
> or Ansible), and restart vdsmd+supervdsmd. This way we are sure that
> all threads and services see the new config and act accordingly. This
> can be done by Engine whenever a host is added to a new cluster, based
> on the statistic-gathering policy in that cluster.

But why would you require a restart for such a simple thing? We have the collection pretty well isolated already; it’s not even using periodic sampling. There are nice-to-drop things which can wait (repoStats, HE HA status?)
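
To make the no-restart argument concrete, here is a minimal sketch of a collector whose sampling mode can be switched while the process keeps running. Everything in it is an assumption for illustration: the StatsCollector class, the SAMPLERS_BY_VERSION table and the sampler names are hypothetical and are not taken from the Vdsm code base.

```python
import threading

# Hypothetical mapping of cluster version to the stats that get sampled.
# The names are placeholders, not real Vdsm sampler identifiers.
SAMPLERS_BY_VERSION = {
    "4.1": ("cpu", "memory", "network", "disk", "repo_stats", "ha_status"),
    "4.2": ("cpu", "memory"),
}


class StatsCollector:
    """Samples only what the currently selected mode reports."""

    def __init__(self, samplers, version="4.1"):
        self._samplers = samplers  # name -> zero-argument callable
        self._lock = threading.Lock()
        self._active = ()
        self.configure(version)

    def configure(self, version):
        """Switch the sampling mode at runtime, without a restart."""
        if version not in SAMPLERS_BY_VERSION:
            raise ValueError("unsupported version: %r" % (version,))
        with self._lock:
            self._active = SAMPLERS_BY_VERSION[version]

    def get_stats(self):
        """Run only the samplers that belong to the active mode."""
        with self._lock:
            active = self._active
        return {name: self._samplers[name]() for name in active}
```

Because the active set is swapped under a lock, a request that arrives during the switch sees either the old mode or the new one, never a mix. Whether the mode should also be persisted across a Vdsm restart is a separate question that this sketch does not address.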


