On Wed, Apr 5, 2017 at 7:16 AM, Dan Kenigsberg <danken(a)redhat.com> wrote:
On Tue, Apr 4, 2017 at 6:33 PM, Michal Skrivanek
<michal.skrivanek(a)redhat.com> wrote:
>
>> On 4 Apr 2017, at 17:26, Dan Kenigsberg <danken(a)redhat.com> wrote:
>>
>> On Tue, Apr 4, 2017 at 1:16 PM, Michal Skrivanek
>> <michal.skrivanek(a)redhat.com> wrote:
>>>
>>> On 4 Apr 2017, at 12:10, Roy Golan <rgolan(a)redhat.com> wrote:
>>>
>>>
>>>
>>> On Tue, Apr 4, 2017 at 12:49 PM Yaniv Kaul <ykaul(a)redhat.com> wrote:
>>>>
>>>> On Tue, Apr 4, 2017 at 12:29 PM, Roy Golan <rgolan(a)redhat.com> wrote:
>>>>>
>>>>> I'm working on a POC lately on a change to stats collection and
>>>>> retrieval by VDSM. The motto is to cut all we can from host/vm stats
>>>>> (possibly caps) and report only core-business stuff to the engine.
>>>>> Engine will retrieve the rest through a 3rd party provider
>>>>>
>>>>> (never mind what it is atm)
>>>
>>>
>>> I hope it’s the same one as for VM stats, collectd:)
>>>
>>>>>
>>>>> Being backward compatible by design, I have to support 2 API
>>>>> versions for Host.getStats, '4.1' and '4.2'.
>>>>> Apart from supplying fewer parameters, I want VDSM to do less stuff.
>>>>> It doesn't need to sample what it doesn't report. In other words I
>>>>> want '4.1-sampling' and '4.2-sampling'
>>>>>
>>>>> # Introducing 'configuration' Verb:
>>>>>
>>>>> As engine always knows (Hosted Engine as well) what cluster version
>>>>> this host belongs to, it can configure VDSM to operate in cluster
>>>>> version mode.
>>>
>>>
>>> why not running it in parallel for one version?
>>>
>>>>>
>>>>> Host.configure(config={version: 4.2})
>>>>>
>>>>> Consider this verb, pre-activating using 'Host.getCaps' to set the
>>>>> context.
>>>>> It will set the right sampling method, and other stuff if needed;
>>>>> then API endpoints will have the right permutation of the api to
>>>>> answer it.
>>>>> 4.2 host can operate in 4.1 mode:
>>>>> Host.configure(config={version: 4.1})
>>>>>
>>>>> Issue: moving a 4.2 host from 4.2 cluster to 4.1 is a problem since
>>>>> engine needs to know this is a new vdsm that has the verb available.
>>>>> One way to overcome that is to fire the verb for every host
>>>>> regardless of the version and disregard an error that implies the
>>>>> verb doesn't exist.
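
The "fire the verb for every host and disregard the error" idea above can be sketched roughly as follows. This is only an illustration: RpcError and the call() transport are hypothetical stand-ins, and it assumes an older VDSM answers an unknown verb with the standard JSON-RPC 2.0 "Method not found" code (-32601), which is not confirmed anywhere in this thread.

```python
# JSON-RPC 2.0 reserves -32601 for "Method not found". Whether VDSM
# reports a missing verb with exactly this code is an assumption.
METHOD_NOT_FOUND = -32601


class RpcError(Exception):
    """Hypothetical error raised by the transport for a failed call."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def configure_host(call, version):
    """Send the proposed Host.configure verb.

    Returns True if the host accepted it, False if the host is too old
    to know the verb. Any other failure is re-raised to the caller.
    """
    try:
        call("Host.configure", {"config": {"version": version}})
    except RpcError as err:
        if err.code == METHOD_NOT_FOUND:
            return False  # old vdsm: keep treating it as a 4.1 host
        raise
    return True
```

Only the "verb does not exist" error is swallowed here; network or validation errors still surface, so a real failure is not mistaken for an old host.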
>>>>
>>>>
>>>> Isn't it solved by host re-installation?
>>>
>>>
>>> We allow maintenance + change host cluster so not always. Was this
>>> changed?
>>>>
>>>>
>>>>>
>>>>> # Engine:
>>>>> Engine will have a handling of the verb per version.
>>>>> Host/Vms monitoring should be changed - I suggest moving the whole
>>>>> stats collection out of the monitoring code, as it is a different
>>>>> task which is orthogonal to 'monitoring', and in 4.2 more than
>>>>> before.
>>>>>
>>>>>
>>>>> I know configuration for VDSM has been discussed before and there
>>>>> are probably tons of ways to do it. When you share your thoughts
>>>>> please remember that configuration is a by-product of the effort.
>>>>
>>>>
>>>> How do we persist this level on VDSM? Or we don't, and if VDSM is
>>>> restarted it is again back to 4.1 mode until Engine tells it
>>>> otherwise?
>>>>
>>>> Y.
>>>
>>>
>>> Must persist it somehow, otherwise there is a race when the engine
>>> will send a stats request and will get the wrong answer. I'm wondering
>>> if using different endpoints is the right solution here to prevent
>>> that from happening.
>>> method: Host.getStats version: 4.1
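
A rough sketch of the "version travels with every request" alternative, so that nothing has to be persisted on the host and there is no restart race. The field names and the REPORTED_FIELDS table are made up for illustration; the real 4.1 and 4.2 stat sets are not listed in this thread.

```python
# Hypothetical per-version field sets. None means "report everything".
REPORTED_FIELDS = {
    "4.1": None,
    "4.2": frozenset({"status", "memory_free"}),
}


def get_stats(sample, version):
    """Return the stats view matching the API version of this request.

    The choice is made per call, so the host keeps no mode between
    requests and a vdsm restart cannot produce a stale answer.
    """
    try:
        fields = REPORTED_FIELDS[version]
    except KeyError:
        raise ValueError("unsupported API version: %s" % version)
    if fields is None:
        return dict(sample)
    return {key: value for key, value in sample.items() if key in fields}
```

This only covers what is reported. Sampling less in 4.2 mode would still need the host to know which versions are actually being asked for.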
>>>
>>>
>>> would it be a problem? assuming that the code is easily started/stopped
>>> within vdsm, we can just change the behavior based on receiving one or
>>> the other verb for the first time after vdsm starts
>>
>> It does not feel right to have such a state in Vdsm. and making this
>> state depend implicitly on a verb feels even worse than an explicit
>> "configure" verb. We already have something like that in the
>> debug-oriented setLogLevel verb; but that's not how client/server
>> applications usually operate.
>
> I don’t mind either way
>
>>
>> I think that the proper way to do this would be to reconfigure
>> vdsm.conf, set there collect_statistics=false (via ovirt-host-deploy
>> or Ansible), and restart vdsmd+supervdsmd. This way we are sure that
>> all threads and services see the new config and act accordingly. This
>> can be done by Engine whenever a host is added to a new cluster, based
>> on the statistic-gathering policy in that cluster.
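
For the vdsm.conf route, reading such a flag at startup could look roughly like this. It is a sketch only: collect_statistics is the option name proposed above, placing it under a [vars] section is an assumption, and defaulting to True (keep collecting) when the option is absent is a choice made here, not something decided in the thread.

```python
import configparser


def stats_collection_enabled(conf_text):
    """Return whether stats should be collected, given vdsm.conf content.

    Reads the proposed collect_statistics option from an assumed [vars]
    section. A missing section or option falls back to True so that an
    untouched host behaves as before.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getboolean("vars", "collect_statistics", fallback=True)
```

Because the value is read once at startup, every thread sees the same setting, which is the point of restarting vdsmd+supervdsmd after the change.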
>
> But why would you require restart for such a simple thing? We have the
> collection pretty well isolated already, it’s not even using periodic
> sampling, there are nice-to-drop things which can wait (repoStats, HE HA
> status?)
A specific verb for enablePeriodicStatsCollection(False) can work like
that. Roy suggested a more general configure verb, which I think can lead
to all sorts of unsatisfiable assumptions.
I wonder when we would enable it and whether we want to disable it at some
point. How would it fit into the host life cycle? (maintenance, engine not
connected etc)
On the other hand, if we want to send stats either by events or collectd,
we could do it always. If collectd is not configured the stats would not
be sent (collectd's decision).
In the same way it would work for events: if the engine connection is not
there they would not be sent. The only drawback of this approach is cpu
utilization to prepare/collect the stats. As far as I remember Yaniv B.
already measured the impact.
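
To make the "always send, let the receiving side decide" idea concrete, here is a minimal sketch. StatsPublisher and its sinks are hypothetical names, not existing vdsm code, and the sinks stand in for collectd or the event channel to the engine.

```python
class StatsPublisher:
    """Hand every prepared sample to whatever sinks are configured."""

    def __init__(self):
        self._sinks = []

    def add_sink(self, send):
        """Register a callable that delivers one stats dict."""
        self._sinks.append(send)

    def publish(self, stats):
        """Offer stats to every sink and return how many accepted them.

        With no sink configured nothing is sent, and a sink whose
        connection is down is skipped instead of failing the caller.
        """
        delivered = 0
        for send in self._sinks:
            try:
                send(stats)
            except ConnectionError:
                continue  # e.g. engine not connected: drop this sample
            delivered += 1
        return delivered
```

No enable/disable verb is needed in this shape, but the sample is still prepared on every cycle, which is the cpu cost mentioned above.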