<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 5, 2017 at 7:16 AM, Dan Kenigsberg <span dir="ltr"><<a href="mailto:danken@redhat.com" target="_blank">danken@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Tue, Apr 4, 2017 at 6:33 PM, Michal Skrivanek<br>
<div><div class="gmail-h5"><<a href="mailto:michal.skrivanek@redhat.com">michal.skrivanek@redhat.com</a>> wrote:<br>
><br>
>> On 4 Apr 2017, at 17:26, Dan Kenigsberg <<a href="mailto:danken@redhat.com">danken@redhat.com</a>> wrote:<br>
>><br>
>> On Tue, Apr 4, 2017 at 1:16 PM, Michal Skrivanek<br>
>> <<a href="mailto:michal.skrivanek@redhat.com">michal.skrivanek@redhat.com</a>> wrote:<br>
>>><br>
>>> On 4 Apr 2017, at 12:10, Roy Golan <<a href="mailto:rgolan@redhat.com">rgolan@redhat.com</a>> wrote:<br>
>>><br>
>>><br>
>>><br>
>>> On Tue, Apr 4, 2017 at 12:49 PM Yaniv Kaul <<a href="mailto:ykaul@redhat.com">ykaul@redhat.com</a>> wrote:<br>
>>>><br>
>>>> On Tue, Apr 4, 2017 at 12:29 PM, Roy Golan <<a href="mailto:rgolan@redhat.com">rgolan@redhat.com</a>> wrote:<br>
>>>>><br>
>>>>> I'm working on a POC lately on a change to stats collection and retrieval<br>
>>>>> by VDSM. The motto is to cut all we can from host/vm stats (possibly caps)<br>
>>>>> and report only core-business stuff to the engine. Engine will retrieve the<br>
>>>>> rest through a 3rd party provider<br>
>>>>><br>
>>>>> (never mind what it is atm)<br>
>>><br>
>>><br>
>>> I hope it’s the same one as for VM stats, collectd:)<br>
>>><br>
>>>>><br>
>>>>> Being backward compatible by design, I have to support 2 API versions for<br>
>>>>> Host.getStats , '4.1' and '4.2'.<br>
>>>>> Apart from supplying fewer parameters, I want VDSM to do less stuff. It<br>
>>>>> doesn't need to sample what it doesn't report. In other words I want<br>
>>>>> '4.1-sampling' and '4.2-sampling'<br>
>>>>><br>
>>>>> # Introducing 'configuration' Verb:<br>
>>>>><br>
>>>>> As engine always knows (Hosted Engine as well) what cluster version this<br>
>>>>> host belongs to, it can configure VDSM to operate in cluster version mode.<br>
>>><br>
>>><br>
>>> why not run it in parallel for one version?<br>
>>><br>
>>>>><br>
>>>>> Host.configure(config={<wbr>version: 4.2})<br>
>>>>><br>
>>>>> Consider this verb, pre-activating using 'Host.getCaps' to set the<br>
>>>>> context.<br>
>>>>> It will set the right sampling method, and other stuff if needed; then<br>
>>>>> API endpoints will have the right permutation of the api to answer it.<br>
>>>>><br>
>>>>> 4.2 host can operate in 4.1 mode:<br>
>>>>> Host.configure(config={<wbr>version: 4.1})<br>
>>>>><br>
>>>>> Issue: moving a 4.2 host from 4.2 cluster to 4.1 is a problem since<br>
>>>>> engine needs to know this is a new vdsm that has the verb available. One way<br>
>>>>> to overcome that is to fire the verb for every host regardless of the<br>
>>>>> version and disregard an error that implies the verb doesn't exist.<br>
>>>><br>
>>>><br>
>>>> Isn't it solved by host re-installation?<br>
>>><br>
>>><br>
>>> We allow maintenance + change host cluster so not always. Was this changed?<br>
>>>><br>
>>>><br>
>>>>><br>
>>>>> # Engine:<br>
>>>>> Engine will have a handling of the verb per version.<br>
>>>>> Host/Vms monitoring should be changed - I suggest to move out of the<br>
>>>>> monitoring code the whole stats collection as it is a different task which<br>
>>>>> is orthogonal to 'monitoring' and in 4.2 more than before.<br>
>>>>><br>
>>>>><br>
>>>>> I know configuration for VDSM has been discussed before and there are<br>
>>>>> probably tons of ways to do it. When you share your thoughts please remember<br>
>>>>> that configuration is a by-product of the effort.<br>
>>>><br>
>>>><br>
>>>> How do we persist this level on VDSM? Or we don't, and if VDSM is<br>
>>>> restarted it is again back to 4.1 mode until Engine tells it otherwise?<br>
>>>><br>
>>>> Y.<br>
>>><br>
>>><br>
>>> Must persist it somehow, otherwise there is a race when the engine sends<br>
>>> a stats request and gets the wrong answer. I'm wondering if using<br>
>>> different endpoints is the right solution here to prevent that from<br>
>>> happening.<br>
>>> method: Host.getStats version: 4.1<br>
>>><br>
>>><br>
>>> would it be a problem? assuming that the code is easily started/stopped<br>
>>> within vdsm, we can just change the behavior based on receiving one or the<br>
>>> other verb for the first time after vdsm starts<br>
>><br>
>> It does not feel right to have such a state in Vdsm. And making this<br>
>> state depend implicitly on a verb feels even worse than an explicit<br>
>> "configure" verb. We already have something like that in the<br>
>> debug-oriented setLogLevel verb; but that's not how client/server<br>
>> applications usually operate.<br>
><br>
> I don’t mind either way<br>
><br>
>><br>
>> I think that the proper way to do this would be to reconfigure<br>
>> vdsm.conf, set there collect_statistics=false (via ovirt-host-deploy<br>
>> or Ansible), and restart vdsmd+supervdsmd. This way we are sure that<br>
>> all threads and services see the new config and act accordingly. This<br>
>> can be done by Engine whenever a host is added to a new cluster, based<br>
>> on the statistic-gathering policy in that cluster.<br>
><br>
> But why would you require restart for such a simple thing? We have the collection pretty well isolated already, it’s not even using periodic sampling, there are nice-to-drop things which can wait (repoStats, HE HA status?)<br>
<br>
</div></div>A specific verb for enablePeriodicStatsCollection(<wbr>False) can work like that.<br>
<br>
Roy suggested a more general configure verb, which I think can lead to<br>
all sorts of unsatisfiable assumptions.<br></blockquote><div><br></div><div>I wonder when we would enable it, and whether we would want to disable it at some point.<br></div><div>How would it fit into the host life cycle? (maintenance, engine not connected, etc.) <br><br></div><div>On the other hand, if we want to send stats either by events or by collectd, we could<br></div><div>do it always. If collectd is not configured, the stats would not be sent (collectd's decision).<br>In the same way, it would work for events: if the engine connection is not there, they would<br></div><div>not be sent. The only drawback of this approach is the CPU utilization needed to prepare/collect<br></div><div>the stats. As far as I remember, Yaniv B. already measured the impact.<br></div></div><br></div></div>
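
To make the "always send" idea above concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the names `Sink`, `publish`, and `is_ready` are hypothetical and are not VDSM, Engine, or collectd APIs. It shows collection always running, with each destination deciding on its own whether anything is actually sent.

```python
from typing import Callable, Dict, List

Stats = Dict[str, float]


class Sink:
    """Hypothetical destination for stats (e.g. collectd or engine events).

    The sink itself decides whether to send, based on its readiness check.
    """

    def __init__(self, name: str, is_ready: Callable[[], bool]) -> None:
        self.name = name
        self.is_ready = is_ready
        self.sent: List[Stats] = []

    def offer(self, stats: Stats) -> bool:
        # Drop the sample when the destination is not configured/connected.
        if not self.is_ready():
            return False
        self.sent.append(stats)
        return True


def publish(collect: Callable[[], Stats], sinks: List[Sink]) -> Dict[str, bool]:
    # Collection always runs, so its CPU cost is paid even if no sink is ready.
    stats = collect()
    return {sink.name: sink.offer(stats) for sink in sinks}


if __name__ == "__main__":
    collectd = Sink("collectd", is_ready=lambda: False)  # assumed: not configured
    events = Sink("events", is_ready=lambda: True)       # assumed: engine connected
    print(publish(lambda: {"cpu_user": 1.5}, [collectd, events]))
```

With this shape no enable/disable verb is needed on the host; the cost is that `collect()` runs on every cycle regardless of whether any sink accepts the result.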