Changing the statistics monitoring interval to 30s


Roy Golan

5 Jul 2017

4:57 p.m.

Hi all,

I would like to get feedback on $subject and see if I'm missing something. The impact of this is simply lower resource consumption, which lets us support an even greater number of hosts [1] and VMs in the system.

If you think more relaxed statistics collection will affect a core flow, let me know. As far as I can see, nothing critical is affected.

The overhead of a cycle per host is something like this: 2 roundtrips per host in a cycle (VM + host stats), and tons of memory allocation for char[] -> JSON -> maps of maps -> VM/Vds statistics -> Maps -> serializing to DB.

To minimize the effect of this change we can keep a call to the 'list' verb, to at least detect VM existence at the same rate as today.

Pros
- Engine has more resources to support more hosts/VMs/other activities of the engine
- Vdsm will have more resources as well (need to tweak Vdsm to collect at the same frequency)
- Fewer DB writes and reads, approximately half of what the system does over its lifespan (because this is mainly what it does all the time)

Cons
- DWH/Dashboard will have fewer entries. I'm not sure what the graphical effect is, given our hourly resolution (cmiiw here)

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1430876
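To make the "approximately half" claim concrete, here is a rough back-of-the-envelope sketch in Python. It only counts the 2 roundtrips per host per cycle mentioned above. The 15-second "current" interval is an assumption inferred from the "half" estimate and is not stated in the message, the host count is hypothetical, and the 'list' call that would be kept at today's rate is ignored.

```python
# Rough estimate of engine-to-host statistics roundtrips per hour.
# Only the "2 roundtrips per host per cycle" figure comes from the message;
# everything else here is an illustrative assumption.

ROUNDTRIPS_PER_CYCLE = 2  # one VM stats call + one host stats call


def roundtrips_per_hour(hosts: int, interval_seconds: int) -> float:
    """Return the number of stats roundtrips per hour for a given cycle interval."""
    cycles_per_hour = 3600 / interval_seconds
    return hosts * ROUNDTRIPS_PER_CYCLE * cycles_per_hour


if __name__ == "__main__":
    hosts = 500  # hypothetical deployment size
    current = roundtrips_per_hour(hosts, 15)   # ASSUMED current interval
    proposed = roundtrips_per_hour(hosts, 30)  # interval proposed in the subject
    print(f"current:  {current:.0f} roundtrips/hour")
    print(f"proposed: {proposed:.0f} roundtrips/hour")
    print(f"ratio:    {proposed / current:.2f}")
```

Under these assumptions, 500 hosts go from 240,000 to 120,000 stats roundtrips per hour. The real saving depends on what the retained 'list' call costs.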


Arik Hadas

5 Jul

5:35 p.m.

On Wed, Jul 5, 2017 at 5:57 PM, Roy Golan <rgolan@redhat.com> wrote:


Cons - DWH/Dashboard will have fewer entries. I'm not sure what the graphical effect is, given our hourly resolution (cmiiw here)

What's the frequency of the queries done by DWH/Dashboard? Do they rely on the _update_date column of the queried data? I'm asking because if they query the database every minute and say "the time now is 10:30 and the queried data is ..." then there should not be fewer entries.


_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel

Shirly Radco

8:36 p.m.

--
SHIRLY RADCO
BI SOFTWARE ENGINEER,
Red Hat Israel <https://www.redhat.com/>
sradco@redhat.com <https://red.ht/sig> <https://redhat.com/summit>

On Wed, Jul 5, 2017 at 6:35 PM, Arik Hadas <ahadas@redhat.com> wrote:


To minimize the effect of this change we can keep a call to the 'list' verb, to at least detect VM existence at the same rate as today.

+1


What's the frequency of the queries done by DWH/Dashboard? Do they count on the _update_date column of the queried data?

Current frequency is 20 seconds. The configurations are queried based on the _update_date, but statistics are queried every interval.

The effect will be less accuracy in the hourly calculations.


Arik Hadas

6 Jul

8:38 a.m.

On Wed, Jul 5, 2017 at 9:36 PM, Shirly Radco <sradco@redhat.com> wrote:


Current frequency is 20 seconds. The configurations are queried based on the _update_date, but statistics are queried every interval.

The effect will be less accuracy in the hourly calculations.

Ack. So if the proposed change is made, it would probably make sense to increase the interval of those queries to more than 30 seconds, or at least to take the _update_date of vm_statistics into consideration as well.


Oved Ourfali

9:04 a.m.

On Thu, Jul 6, 2017 at 9:38 AM, Arik Hadas <ahadas@redhat.com> wrote:


Ack. So if the proposed change is made, it would probably make sense to increase the interval of those queries to more than 30 seconds, or at least to take the _update_date of vm_statistics into consideration as well.

Note that changing those queries to 30 seconds will cause issues with CloudForms, so I guess we'll still query every 20 seconds (although the data won't change in some of those queries).
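As a rough illustration of "the data won't change in some of those queries", the following Python sketch counts how many 20-second samples would see the same statistics update as the previous sample when the engine writes statistics every 30 seconds. It is a simplified model under stated assumptions: both timers start at t=0, never drift, and writes are instantaneous. The function name is hypothetical and is not part of DWH or the engine.

```python
# Simplified model: the engine writes statistics every `update_interval` seconds,
# and a consumer samples them every `sample_interval` seconds.
# ASSUMPTIONS: aligned clocks starting at t=0, no jitter, instantaneous writes.


def stale_sample_fraction(update_interval: int, sample_interval: int, horizon: int) -> float:
    """Fraction of samples that see the same update as the previous sample did."""
    last_seen_update = None
    stale = 0
    total = 0
    for t in range(0, horizon, sample_interval):
        # Timestamp of the most recent statistics write at sample time t.
        latest_update = (t // update_interval) * update_interval
        if total > 0 and latest_update == last_seen_update:
            stale += 1
        last_seen_update = latest_update
        total += 1
    return stale / total


if __name__ == "__main__":
    fraction = stale_sample_fraction(update_interval=30, sample_interval=20, horizon=3600)
    print(f"samples with unchanged data: {fraction:.1%}")
```

In this idealized model, one in three of the 20-second samples sees unchanged data once statistics are written every 30 seconds. With jitter or unaligned timers the exact fraction would differ.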


Yaniv Kaul

9:58 a.m.

On Thu, Jul 6, 2017 at 10:04 AM, Oved Ourfali <oourfali@redhat.com> wrote:


Note that changing those queries to 30 seconds will cause issues with CloudForms, so I guess we'll still query every 20 seconds (although the data won't change in some of those queries).

I thought the query frequency was configurable in ManageIQ. In any case, even if we query every 20 seconds, we'll get updated VM stats, which is fine, and less up-to-date host stats, which is fine as well from my perspective.

Y.


Roy Golan

11:18 a.m.

Action items:
- Demonstrate the effect of the reduction of stats collection on the system - WIP
- Code changes:
  - Config item change: NumberVmRefreshesBeforeSave from 5 to 10
  - Make the 'poll' VMs job fire at NumberVmRefreshesBeforeSave / 2 (or just make the code support an explicit time interval)
  - VDSM should get a config set with the sampling interval, to support back-compat
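The arithmetic behind the config change can be sketched as follows. This is an illustration, not engine code: the class and field names are hypothetical, and the 3-second base refresh period is an assumption chosen so that the value 10 yields the 30-second interval from the subject. The thread itself does not state the base period.

```python
# Illustrative model of how the proposed config values map to time intervals.
# ASSUMPTION: statistics are saved every `number_vm_refreshes_before_save`
# refresh cycles, and each refresh cycle takes `refresh_seconds` (assumed 3 s).
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringConfig:
    refresh_seconds: int                  # assumed base refresh period
    number_vm_refreshes_before_save: int  # config item named in the action items

    @property
    def stats_interval_seconds(self) -> int:
        """Seconds between full statistics collections."""
        return self.refresh_seconds * self.number_vm_refreshes_before_save

    @property
    def poll_interval_seconds(self) -> int:
        """Seconds between 'poll' VM jobs, fired at NumberVmRefreshesBeforeSave / 2."""
        return self.refresh_seconds * (self.number_vm_refreshes_before_save // 2)


if __name__ == "__main__":
    today = MonitoringConfig(refresh_seconds=3, number_vm_refreshes_before_save=5)
    proposed = MonitoringConfig(refresh_seconds=3, number_vm_refreshes_before_save=10)
    print("today:    stats every", today.stats_interval_seconds, "s")
    print("proposed: stats every", proposed.stats_interval_seconds, "s,",
          "poll every", proposed.poll_interval_seconds, "s")
```

Under the assumed 3-second base period, firing the 'poll' job at half of the new value keeps VM detection at the same cadence as today's statistics cycle, which matches the intent of detecting VM existence "at the same rate as today".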


Roy Golan

11:19 a.m.

On Thu, Jul 6, 2017 at 12:18 PM Roy Golan <rgolan@redhat.com> wrote:

Action items:
- Demonstrate the effect of the reduction of stats collection on the system - WIP
- Code changes:
  - Config item change: NumberVmRefreshesBeforeSave from 5 to 10
  - Make the 'poll' VMs job fire at NumberVmRefreshesBeforeSave / 2 (or just make the code support an explicit time interval)
  - VDSM should get a config set with the sampling interval, to support back-compat

- Changes to DWH sampling and ManageIQ?


Oved Ourfali

11:39 a.m.

On Thu, Jul 6, 2017 at 12:19 PM, Roy Golan <rgolan@redhat.com> wrote:


- Changes to DWH sampling and ManageIQ?

I think ManageIQ can cope with either 60-second or 20-second intervals (after a change we made when we moved to 20 seconds). Indeed, add an action item to check that with us if we decide to do so.


Shirly Radco

3:37 p.m.

On Thu, Jul 6, 2017 at 12:39 PM, Oved Ourfali <oourfali@redhat.com> wrote:


I think ManageIQ can cope with either 60-second or 20-second intervals (after a change we made when we moved to 20 seconds). Indeed, add an action item to check that with us if we decide to do so.

Indeed, 20 or 60 seconds. Their implementation is very strict and coupled to VMware statistics, which use a 20-second interval.

...

...
...
On Thu, Jul 6, 2017 at 11:00 AM Yaniv Kaul <ykaul@redhat.com> wrote:

...
On Thu, Jul 6, 2017 at 10:04 AM, Oved Ourfali <oourfali@redhat.com> wrote:

...
On Thu, Jul 6, 2017 at 9:38 AM, Arik Hadas <ahadas@redhat.com> wrote:

...
On Wed, Jul 5, 2017 at 9:36 PM, Shirly Radco <sradco@redhat.com> wrote:

> > > -- > > SHIRLY RADCO > > BI SOFTWARE ENGINEER, > > Red Hat Israel <https://www.redhat.com/> > > sradco@redhat.com > <https://red.ht/sig> > <https://redhat.com/summit> > > > On Wed, Jul 5, 2017 at 6:35 PM, Arik Hadas <ahadas@redhat.com> > wrote: > >> >> >> On Wed, Jul 5, 2017 at 5:57 PM, Roy Golan <rgolan@redhat.com> >> wrote: >> >>> Hi all, >>> >>> I would like to get feedback on $subject and see if I'm missing >>> something. The impact of this is simply less resource consumption and by >>> that we can support even greater number of hosts [1] and vms in the system. >>> >> >>> If you think more relaxed statistics collection will affect a core >>> flow let me know - as far as I see I didn't spot anything critical. >>> >> >>> The overhead of a cycle per host something like that: 2 roundtrips >>> per host in a cycle, (vm + host stats) and tons of memory allocation for >>> char[] -> json-> maps of maps -> VM/Vds statistics -> Maps -> serialiazing >>> to DB. >>> >>> To minimize the effect of this change we can leave a call to >>> 'list' verb to at least detect vms existence in the same rate as today. >>> >> >> +1 >> >> >>> >>> Pros >>> - Engine has rore resources to support more hosts/vms/other >>> activities of the engine >>> - Vdsm will have more resources as well (need to tweak vdsm to >>> collect in the same >>> frequency) >>> - less DB writes and reads, approx half of what the system will do >>> in the in its lifefpan (cause this is what is mainly does all the time) >>> >>> Cons >>> - DWH/Dashboard will have less entries, I'm not sure what is >>> graphical affect given our hourly resolution (cmiiw here) >>> >> >> What's the frequency of the queries done by DWH/Dashboard? Do they >> count on the _update_date column of the queried data? >> > > Current frequency is 20 seconds. > The configurations are queried based on the _update_date, but > statistics are queried every interval. > > The affect will be less accuracy in the hourly calculations. >

Ack. So if the proposed change is done, it would probably make sense to increase the inverval of those queries to be higher than 30 sec, or at least taking into consideration the _update_date of vm_statistics as well.

Note that it will cause issues with cloudforms to change those queries to 30 seconds, so I guess we'll still query it every 20 seconds (although the data won't change in some of those queries).

I thought it was configurable in ManageIQ how often to query, but in any case even if we query every 20 seconds, we'll get updated VM stats, which is fine, and not as updated hosts stats, which is fine as well, from my perspective. Y.

...
...
> >> I'm asking because if they query the database every minute and say >> "the time now is 10:30 and the queried data is ..." then there should not >> be less entries. >> >> >>> >>> >>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1430876 >>> >> >>> >>> _______________________________________________ >>> Devel mailing list >>> Devel@ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/devel >>> >> >> >> _______________________________________________ >> Devel mailing list >> Devel@ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > >

_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel


Yaniv Lavi (Dary)

5:33 p.m.

We will probably keep it at 20 seconds in any case for CFME. But first let's measure the benefit, added a needinfo in the bug.

Thanks,

YANIV LAVI (YANIV DARY)
SENIOR TECHNICAL PRODUCT MANAGER
Red Hat Israel Ltd. <https://www.redhat.com/>
34 Jerusalem Road, Building A, 1st floor
Ra'anana, Israel 4350109
ylavi@redhat.com T: +972-9-7692306/8272306 F: +972-9-7692223 IM: ylavi
<https://red.ht/sig>

On Thu, Jul 6, 2017 at 4:37 PM, Shirly Radco <sradco@redhat.com> wrote:

...

--

SHIRLY RADCO

BI SOFTWARE ENGINEER,

Red Hat Israel <https://www.redhat.com/>

sradco@redhat.com <https://red.ht/sig> <https://redhat.com/summit>

On Thu, Jul 6, 2017 at 12:39 PM, Oved Ourfali <oourfali@redhat.com> wrote:

...
On Thu, Jul 6, 2017 at 12:19 PM, Roy Golan <rgolan@redhat.com> wrote:

...
On Thu, Jul 6, 2017 at 12:18 PM Roy Golan <rgolan@redhat.com> wrote:

...
Action items:
- Demonstrate the effect of the reduction of stats collection on the system - WIP
- Code changes:
  - config item change: NumberVmRefreshesBeforeSave from 5 to 10
  - make the 'poll' vms job fire at NumberVmRefreshesBeforeSave / 2 (or just make the code support an explicit time interval)
  - VDSM should get a config set with the sampling interval - to support back-compat

- Changes to DWH sampling and ManageIQ?

I think ManageIQ can cope with either 60-second or 20-second intervals (after a change we made when we moved to 20 seconds). Indeed, put an action item to check that with us if we decide to do so.

Indeed, 20 or 60 seconds. Their implementation is very strict and coupled with the VMware statistics interval, which is 20 seconds.
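
The arithmetic behind the config item change in the action items above can be sketched as follows. This is only an illustration: the base refresh period of 3 seconds is an assumption (the thread does not state it), picked so that 10 refreshes give the 30 s in the subject, and the function names are hypothetical.

```python
# Assumption: base VM refresh period in seconds. Not stated in the thread;
# 3 is picked so that 10 refreshes match the 30 s from the subject line.
BASE_REFRESH_SECONDS = 3


def stats_save_interval(refreshes_before_save, base_refresh=BASE_REFRESH_SECONDS):
    """Seconds between statistics saves for a given NumberVmRefreshesBeforeSave."""
    return refreshes_before_save * base_refresh


def poll_interval(refreshes_before_save, base_refresh=BASE_REFRESH_SECONDS):
    """Seconds between 'poll' job runs if it fires every
    NumberVmRefreshesBeforeSave / 2 refreshes."""
    return (refreshes_before_save // 2) * base_refresh


if __name__ == "__main__":
    print(stats_save_interval(5))   # 15 (current value)
    print(stats_save_interval(10))  # 30 (proposed value)
    print(poll_interval(10))        # 15 (VM existence still checked at today's rate)
```

With these assumed numbers, firing the 'poll' job at half of the new value keeps VM detection at the same rate as today's statistics cycle, which is the intent stated in the original proposal.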

...
...
...
On Thu, Jul 6, 2017 at 11:00 AM Yaniv Kaul <ykaul@redhat.com> wrote:

...
On Thu, Jul 6, 2017 at 10:04 AM, Oved Ourfali <oourfali@redhat.com> wrote:

...
On Thu, Jul 6, 2017 at 9:38 AM, Arik Hadas <ahadas@redhat.com> wrote:


_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel


Nir Soffer

7:10 p.m.

On Wed, Jul 5, 2017 at 5:57 PM Roy Golan <rgolan@redhat.com> wrote:

...

Hi all,

I would like to get feedback on $subject and see if I'm missing something. The impact of this is simply less resource consumption and by that we can support even greater number of hosts [1] and vms in the system.

If you think more relaxed statistics collection will affect a core flow let me know - as far as I see I didn't spot anything critical.

The overhead of a cycle per host is something like this: 2 roundtrips per host in a cycle (vm + host stats) and tons of memory allocation for char[] -> json -> maps of maps -> VM/Vds statistics -> Maps -> serializing to DB.

To minimize the effect of this change we can leave a call to the 'list' verb to at least detect vms existence at the same rate as today.

Pros
- Engine has more resources to support more hosts/vms/other activities of the engine
- Vdsm will have more resources as well (need to tweak vdsm to collect in the same frequency)
- less DB writes and reads, approx half of what the system will do in its lifespan (cause this is what it mainly does all the time)

Cons
- DWH/Dashboard will have less entries, I'm not sure what the graphical effect is given our hourly resolution (cmiiw here)

Why do we have a monitoring interval at all? Why not move the stats to events?

Vdsm should collect stats and send updates to engine; engine can fall back to polling only if vdsm did not send any update in the last couple of minutes or so.

Same for stats collected by collectd: we want to stream updates to engine from the host, not poll every host for the stats.

Nir
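
The "events first, poll only as a fallback" idea above could look roughly like the following Python sketch. The class, its method names, and the 120-second threshold are hypothetical and chosen for illustration; nothing here reflects actual engine or Vdsm code.

```python
import time


class HostStatsTracker:
    """Keeps the latest stats pushed by each host and reports which hosts
    have been silent long enough that they should be polled instead."""

    def __init__(self, hosts, fallback_after=120.0, clock=time.monotonic):
        self._fallback_after = fallback_after  # seconds of silence before polling
        self._clock = clock                    # injectable for testing
        self._last_update = {host: None for host in hosts}
        self._stats = {}

    def on_event(self, host, stats):
        """Called when a host pushes a stats update."""
        self._stats[host] = stats
        self._last_update[host] = self._clock()

    def hosts_to_poll(self):
        """Hosts that never sent an update, or were silent for too long."""
        now = self._clock()
        return [
            host
            for host, seen in self._last_update.items()
            if seen is None or now - seen >= self._fallback_after
        ]
```

A periodic job would call `hosts_to_poll()` and poll only those hosts, so a host that keeps sending events is never polled.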

...

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1430876 _______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel

Piotr Kliczewski

7:47 p.m.

On Thu, Jul 6, 2017 at 7:10 PM, Nir Soffer <nsoffer@redhat.com> wrote:

...

Why do we have a monitoring interval at all? Why not move the stats to events?

It was one of the main reasons to build the event infrastructure. I advocated for this many times, but it seems that the priority to have it done is very low. Less work is required to change the polling cycle than to trigger events, even at the cost of metric resolution.

...

Vdsm should collect stats and send updates to engine; engine can fall back to polling only if vdsm did not send any update in the last couple of minutes or so.

Same for stats collected by collectd: we want to stream updates to engine from the host, not poll every host for the stats.

Nir

...
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1430876 _______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel


Roy Golan

9 Jul 9 Jul

4:52 p.m.

On Thu, Jul 6, 2017 at 8:10 PM Nir Soffer <nsoffer@redhat.com> wrote:

...

Why do we have a monitoring interval at all? Why not move the stats to events?

Events are not suited for everything, and the current vdsm can only guarantee 'at most once' semantics. We cannot rely on that, and that is why we poll.

...

Vdsm should collect stats and send updates to engine; engine can fall back to polling only if vdsm did not send any update in the last couple of minutes or so.

Again, you'd have to work harder to guarantee it. One of my main motivations here is to put in as little effort as we can to gain more resources.

...

Same for stats collected by collectd: we want to stream updates to engine from the host, not poll every host for the stats.

Again, not always. Consider, for example, that we want to support big setups: in the end they are going to stream a huge amount of information to the engine, and we would have a problem handling this pressure. Polling allows us to choose who to poll and when (backpressure, if that helps).
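
As a rough illustration of "choose who to poll and when", here is a minimal Python sketch in which the poller caps the work it accepts per cycle and picks the least recently polled hosts first. The function names and the budget parameter are hypothetical, not taken from the engine.

```python
import heapq


def pick_hosts_to_poll(last_polled, budget):
    """Return up to `budget` host names, least recently polled first.

    `last_polled` maps host name -> timestamp of its last poll.
    """
    return heapq.nsmallest(budget, last_polled, key=last_polled.get)


def run_cycle(last_polled, budget, now, poll):
    """Poll at most `budget` hosts and record when each was polled."""
    for host in pick_hosts_to_poll(last_polled, budget):
        poll(host)
        last_polled[host] = now


if __name__ == "__main__":
    last = {"h1": 10.0, "h2": 5.0, "h3": 20.0}
    print(pick_hosts_to_poll(last, 2))  # ['h2', 'h1']
```

Because the poller decides the budget, the load on the engine stays bounded no matter how many hosts exist; the cost is that each host is refreshed less often as the setup grows.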

Nir

...

...
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1430876 _______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel

Piotr Kliczewski

9:20 p.m.

On Sun, Jul 9, 2017 at 4:52 PM, Roy Golan <rgolan@redhat.com> wrote:

...

On Thu, Jul 6, 2017 at 8:10 PM Nir Soffer <nsoffer@redhat.com> wrote:

...
Why do we have a monitoring interval at all? Why not move the stats to events?

Events are not suited for everything, and the current vdsm can only guarantee 'at most once' semantics. We cannot rely on that, and that is why we poll.

I agree about the guarantees we have, but why are they not enough for stats? Are the stats so critical?

...

Vdsm should collect stats and send updates to engine; engine can fall back to polling only if vdsm did not send any update in the last couple of minutes or so.

Again, you'd have to work harder to guarantee it. One of my main motivations here is to put in as little effort as we can to gain more resources.

...
Same for stats collected by collectd: we want to stream updates to engine from the host, not poll every host for the stats.

Again, not always. Consider, for example, that we want to support big setups: in the end they are going to stream a huge amount of information to the engine, and we would have a problem handling this pressure. Polling allows us to choose who to poll and when (backpressure, if that helps).

You have described the issue that we have with polling. In large envs we have a constant time between the polls, and a single cycle takes longer. Back pressure was implemented only on the engine side; the vdsm side is still missing.

...

Nir

...
...
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1430876 _______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
