On Wed, Apr 5, 2017 at 7:16 AM, Dan Kenigsberg <danken@redhat.com> wrote:
On Tue, Apr 4, 2017 at 6:33 PM, Michal Skrivanek
<michal.skrivanek@redhat.com> wrote:
>> On 4 Apr 2017, at 17:26, Dan Kenigsberg <danken@redhat.com> wrote:
>> On Tue, Apr 4, 2017 at 1:16 PM, Michal Skrivanek
>> <michal.skrivanek@redhat.com> wrote:
>>> On 4 Apr 2017, at 12:10, Roy Golan <rgolan@redhat.com> wrote:
>>> On Tue, Apr 4, 2017 at 12:49 PM Yaniv Kaul <ykaul@redhat.com> wrote:
>>>> On Tue, Apr 4, 2017 at 12:29 PM, Roy Golan <rgolan@redhat.com> wrote:
>>>>> I'm working on a POC lately on a change to stats collection and retrieval
>>>>> by VDSM. The motto is to cut all we can from host/vm stats (possibly caps)
>>>>> and report only core-business stuff to the engine. Engine will retrieve the
>>>>> rest through a 3rd party provider
>>>>> (never mind what it is atm)
>>> I hope it’s the same one as for VM stats, collectd:)
>>>>> Being backward compatible by design, I have to support 2 API versions for
>>>>> Host.getStats , '4.1' and '4.2'.
>>>>> Apart from supplying fewer parameters, I want VDSM to do less stuff. It
>>>>> doesn't need to sample what it doesn't report. In other words I want
>>>>> '4.1-sampling' and '4.2-sampling'
>>>>> # Introducing 'configuration' Verb:
>>>>> As engine always knows (Hosted Engine as well) what cluster version this
>>>>> host belongs to, it can configure VDSM to operate in cluster version mode.
>>> why not running it in parallel for one version?
>>>>>  Host.configure(config={version: 4.2})
>>>>> Consider this verb, pre-activating using 'Host.getCaps' to set the
>>>>> context.
>>>>> It will set the right sampling method, and other stuff if needed; then
>>>>> API endpoints will have the right permutation of the API to answer it.
>>>>> 4.2 host can operate in 4.1 mode:
>>>>>  Host.configure(config={version: 4.1})
>>>>> Issue: moving a 4.2 host from 4.2 cluster to 4.1 is a problem since
>>>>> engine needs to know this is a new vdsm that has the verb available. One way
>>>>> to overcome that is to fire the verb for every host regardless of the
>>>>> version and disregard an error that implies the verb doesn't exist.
>>>> Isn't it solved by host re-installation?
>>> We allow maintenance + change host cluster so not always. Was this changed?
>>>>> # Engine:
>>>>> Engine will have a handling of the verb per version.
>>>>> Host/Vms monitoring should be changed - I suggest to move out of the
>>>>> monitoring code the whole stats collection as it is a different task which
>>>>> is orthogonal to 'monitoring' and in 4.2 more than before.
>>>>> I know configuration for VDSM has been discussed before and there are
>>>>> probably tons of ways to do it. When you share your thoughts please remember
>>>>> that configuration is a by-product of the effort.
>>>> How do we persist this level on VDSM? Or we don't, and if VDSM is
>>>> restarted it is again back to 4.1 mode until Engine tells it otherwise?
>>>> Y.
>>> Must persist it somehow, otherwise there is a race: the engine will send
>>> a stats request and get the wrong answer. I'm wondering if using
>>> different endpoints is the right solution here to prevent that from
>>> happening.
>>>  method: Host.getStats version: 4.1
>>> would it be a problem? assuming that the code is easily started/stopped
>>> within vdsm, we can just change the behavior based on receiving one or the
>>> other verb for the first time after vdsm starts
>> It does not feel right to have such a state in Vdsm. And making this
>> state depend implicitly on a verb feels even worse than an explicit
>> "configure" verb. We already have something like that in the
>> debug-oriented setLogLevel verb; but that's not how client/server
>> applications usually operate.
> I don’t mind either way
>> I think that the proper way to do this would be to reconfigure
>> vdsm.conf, set there collect_statistics=false (via ovirt-host-deploy
>> or Ansible), and restart vdsmd+supervdsmd. This way we are sure that
>> all threads and services see the new config and act accordingly. This
>> can be done by Engine whenever a host is added to a new cluster, based
>> on the statistic-gathering policy in that cluster.
> But why would you require restart for such a simple thing? We have the collection pretty well isolated already, it’s not even using periodic sampling, there are nice-to-drop things which can wait (repoStats, HE HA status?)

A specific verb for enablePeriodicStatsCollection(False) can work like that.
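To make the idea concrete, here is a rough sketch of how such a runtime toggle could look. It is not Vdsm code: the class name PeriodicStatsCollector, the sample callable, and the 15-second default interval are all assumptions made up for illustration. The only point is that enabling and disabling can be idempotent and need no restart.

```python
import threading


class PeriodicStatsCollector:
    """Hypothetical sampler that can be switched on and off at runtime."""

    def __init__(self, sample, interval=15.0):
        self._sample = sample        # callable doing one sampling pass (assumed)
        self._interval = interval    # seconds between passes (assumed default)
        self._lock = threading.Lock()
        self._stop = None            # Event used to stop the running thread
        self._thread = None          # None means collection is disabled

    def enabled(self):
        with self._lock:
            return self._thread is not None

    def enable(self, on):
        """Idempotent toggle; the kind of thing a verb such as
        enablePeriodicStatsCollection(bool) might delegate to."""
        with self._lock:
            if on and self._thread is None:
                self._stop = threading.Event()
                self._thread = threading.Thread(
                    target=self._run, args=(self._stop,), daemon=True)
                self._thread.start()
                return
            if on or self._thread is None:
                return               # already in the requested state
            self._stop.set()
            thread, self._thread = self._thread, None
        # Join outside the lock so a slow sampling pass cannot block enabled().
        thread.join()

    def _run(self, stop):
        # Event.wait() returns False on timeout, True once stop is set.
        while not stop.wait(self._interval):
            self._sample()
```

Calling enable(False) twice, or enable(True) twice, is harmless, so Engine would not need to track what it last asked for.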

Roy suggested a more general configure verb, which I think can lead to
all sorts of unsatisfiable assumptions.

I wonder when we would enable it, and whether we would want to disable it at some point.
How would it fit into the host life cycle? (maintenance, engine not connected, etc.)

On the other hand, if we want to send stats either by events or by collectd, we could
do it always. If collectd is not configured, the stats would not be sent (collectd's decision).
It would work the same way for events: if the engine connection is not there, they would
not be sent. The only drawback of this approach is the CPU utilization needed to prepare/collect
the stats. As far as I remember, Yaniv B. already measured the impact.
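
A minimal sketch of that "always prepare, let the sink decide" approach follows. Everything in it is hypothetical: EngineEventSink, CollectdSink and report() are names invented for illustration, and the collectd side is modelled as a simple flag even though in reality the decision would live in collectd itself.

```python
class EngineEventSink:
    """Hypothetical sink: delivers stats only while Engine is connected."""

    def __init__(self):
        self.connected = False
        self.sent = []

    def publish(self, stats):
        if not self.connected:
            return False             # dropped: no engine connection
        self.sent.append(stats)
        return True


class CollectdSink:
    """Hypothetical sink standing in for collectd; 'configured' models
    collectd's own decision whether to accept the data."""

    def __init__(self, configured=False):
        self.configured = configured
        self.sent = []

    def publish(self, stats):
        if not self.configured:
            return False             # dropped: collectd not set up
        self.sent.append(stats)
        return True


def report(stats, sinks):
    """Offer one prepared sample to every sink; return how many took it."""
    return sum(1 for sink in sinks if sink.publish(stats))
```

Note that report() is handed stats that were already collected, which is exactly the CPU cost mentioned above: the work is done even when every sink drops the sample.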