"Host.queryVms": A proposal for new a VDSM API verb

Hi,
This proposal is a follow-up to a previous discussion about optimizing the communication between VDSM and the ovirt-engine.
Currently the engine polls VDSM every 3 seconds with the `list` verb to retrieve the status of all VMs. Every 5th time (every 15 seconds) it calls `getAllVmStats` instead, which returns more information about the VMs.
A large portion of the data, mainly the reply of `getAllVmStats`, is only of interest when it actually changes, so that the engine/back-end can act on the changes and update rows accordingly.
The problem is that a lot of data is transferred unnecessarily, and there is a 15-second delay in communicating changes to certain VMs before the engine can react to them and before they can be communicated to the user.
As part of an improvement on this matter I'd like to propose a new API verb for VDSM:
Host.queryVms(vmIds=[], fields=[], exclude=[], changedSince='')
This new verb is intended to eventually replace the two currently used verbs `list` and `getAllVmStats`. `queryVms` is supposed to allow requesting any data fields of a VM that can be requested through the public API:
- Statistics
- Configuration
- Status information
- Guest OS information
I ran some tests, and in the tested scenarios the new verb can reduce the data transferred and the average response body size by 75%-90%, depending on the scenario and usage.
The test results can be found here: http://www.ovirt.org/Feature/VDSM_VM_Query_API/Measurements#Results (An explanation of the tested methods is at the top of the page, and each section describes its scenario.)
The benefit of introducing this verb would be a lower volume of data transferred over the management network, which avoids congestion on huge setups, and increased responsiveness in communicating state changes to the user, for example live migration status. Even SLA-related changes, for things like QoS, could be communicated faster.
Preliminary Feature Proposal Wiki: http://www.ovirt.org/Feature/VDSM_VM_Query_API
--
Regards,
Vinzenz Feenstra | Senior Software Engineer
RedHat Engineering Virtualization R & D
Phone: +420 532 294 625
IRC: vfeenstr or evilissimo
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
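A minimal Python sketch of how the parameters of the proposed `Host.queryVms(vmIds, fields, exclude, changedSince)` signature might combine. Everything beyond the parameter names is an assumption made for illustration: the in-memory `vms` mapping, the per-field change counters, and the reading of `changedSince` as a counter encoded in a string are hypothetical and are not taken from the draft patch or the VDSM API.

```python
def query_vms(vms, vm_ids=(), fields=(), exclude=(), changed_since=''):
    """Hypothetical model of the proposed queryVms filtering.

    vms maps vmId -> {field: (value, generation)}, where generation is an
    assumed per-field counter that grows whenever the value changes.
    """
    # Assumption: an empty changedSince means "return everything".
    since = int(changed_since) if changed_since else -1
    # Assumption: an empty vmIds list means "all VMs".
    wanted_ids = vm_ids or list(vms)
    result = {}
    for vm_id in wanted_ids:
        data = vms.get(vm_id)
        if data is None:
            continue  # unknown VM ids are simply skipped in this sketch
        selected = {}
        for name, (value, generation) in data.items():
            if fields and name not in fields:
                continue  # not among the explicitly requested fields
            if name in exclude:
                continue  # explicitly excluded
            if generation <= since:
                continue  # unchanged since the caller's last query
            selected[name] = value
        if selected:
            result[vm_id] = selected
    return result


vms = {
    'vm-1': {'status': ('Up', 3), 'cpuUser': ('0.5', 7), 'memUsage': ('20', 2)},
    'vm-2': {'status': ('Up', 1), 'cpuUser': ('0.1', 1), 'memUsage': ('10', 1)},
}

full = query_vms(vms)                                   # every field of every VM
delta = query_vms(vms, exclude=['memUsage'], changed_since='2')
print(delta)
```

In this toy data set the second call returns only the two fields of `vm-1` that changed after marker 2, which is the kind of reduction in response size the proposal is aiming for.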

Thanks for this effort. I hope I understand this correctly: your new API still works the same way, so the engine constantly pulls from VDSM?
If I understand the architecture correctly, this should not be necessary: VDSM could push updates to the engine when there are any, and the engine would only query VDSM when a specified timeout passes without a push notification.
I think this would scale better for higher numbers of VDSM instances; in fact, for a very high number of VDSM instances connected to one engine, something like a message queue implementation (ZeroMQ) would scale even better.
--
Mit freundlichen Grüßen / Regards
Sven Kieske
Systemadministrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

On Jul 14, 2014, at 16:20, Sven Kieske <S.Kieske@mittwald.de> wrote:
Thanks for this effort. I hope I understand this correctly: your new API still works the same way, so the engine constantly pulls from VDSM?
yes
If I understand the architecture correctly, this should not be necessary: VDSM could push updates to the engine when there are any, and the engine would only query VDSM when a specified timeout passes without a push notification.
yes, push notifications are still planned; we only recently added the infrastructural foundations with json-rpc, so it will take some time before we can adjust.
Thanks,
michal
I think this would scale better for higher numbers of VDSM instances; in fact, for a very high number of VDSM instances connected to one engine, something like a message queue implementation (ZeroMQ) would scale even better.
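The push-with-fallback idea discussed in this exchange can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-process `queue.Queue` stands in for whatever transport would carry push notifications, and `next_update` and `poll_host` are hypothetical names, not part of VDSM, the engine, or json-rpc.

```python
import queue


def next_update(events, poll, timeout):
    """Return a pushed event if one arrives within timeout, else poll.

    events: queue.Queue receiving pushed notifications (assumed transport).
    poll:   callable performing a regular status query (fallback path).
    """
    try:
        return 'push', events.get(timeout=timeout)
    except queue.Empty:
        # No notification arrived in time, so fall back to an explicit query.
        return 'poll', poll()


def poll_host():
    # Stand-in for a full status query against the host.
    return {'vm-1': {'status': 'Up'}}


events = queue.Queue()
events.put({'vmId': 'vm-1', 'status': 'Paused'})

first = next_update(events, poll_host, timeout=0.1)   # served from the push queue
second = next_update(events, poll_host, timeout=0.1)  # queue empty, falls back to polling
print(first[0], second[0])
```

With this shape the polling interval only bounds the worst case; in the normal case the engine would react as soon as a notification arrives.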
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

On 07/14/2014 04:05 PM, Vinzenz Feenstra wrote:
Hi,
Since this mail did not receive enough attention, I am bumping it again.
A draft patch for this proposal currently exists: http://gerrit.ovirt.org/#/c/28819/
The patch is not final, since the query function should not need to update all the data on every call. (That should be done directly by the code that modifies the data; however, this would be implemented in follow-up patches.)
The trackable.py implementation can also easily be extended in the future to enable push notifications to the engine once we have switched to the new communication. This could be done by subscribing to certain keys in the TrackableMapping instance. (Not implemented yet.)
Anyway, I am asking once more for comments on the proposal.
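To illustrate the idea of a mapping that tracks changes and could later notify subscribers, here is a small Python sketch. It is not the trackable.py code from the draft patch: the class name `TrackableMappingSketch`, the generation counter, and the `subscribe` and `changed_since` methods are all hypothetical and only model the behaviour described above.

```python
class TrackableMappingSketch(dict):
    """Hypothetical dict that stamps each key with a change counter.

    Only item assignment is tracked; dict.update() and similar methods
    bypass __setitem__ and are deliberately left out of this sketch.
    """

    def __init__(self):
        super().__init__()
        self._generation = 0
        self._stamps = {}        # key -> generation of its last change
        self._subscribers = {}   # key -> list of callbacks

    @property
    def generation(self):
        return self._generation

    def __setitem__(self, key, value):
        if key in self and self[key] == value:
            return               # an unchanged value is not recorded as a change
        super().__setitem__(key, value)
        self._generation += 1
        self._stamps[key] = self._generation
        for callback in self._subscribers.get(key, []):
            callback(key, value)  # possible hook for push notifications

    def subscribe(self, key, callback):
        self._subscribers.setdefault(key, []).append(callback)

    def changed_since(self, generation):
        return {k: self[k] for k, g in self._stamps.items() if g > generation}


stats = TrackableMappingSketch()
pushed = []
stats.subscribe('status', lambda key, value: pushed.append((key, value)))

stats['status'] = 'Up'
stats['cpuUser'] = '0.5'
marker = stats.generation        # the caller remembers where it left off
stats['status'] = 'Up'           # same value, so nothing is recorded
stats['status'] = 'Paused'
delta = stats.changed_since(marker)
print(delta, pushed)
```

Because the code that modifies the data records the change itself, a query only has to compare counters instead of refreshing all data on every call.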
--
Regards,
Vinzenz Feenstra

----- Original Message -----
From: "Vinzenz Feenstra" <vfeenstr@redhat.com> To: devel@ovirt.org Sent: Tuesday, July 22, 2014 11:29:40 AM Subject: Re: [ovirt-devel] "Host.queryVms": A proposal for new a VDSM API verb
On 07/14/2014 04:05 PM, Vinzenz Feenstra wrote:
Hi, Since this mail did not receive enough attention I am bumping it again.
A draft patch for this proposal currently exists: http://gerrit.ovirt.org/#/c/28819/ The patch is not final, since the query function should not need to update all the data on every call. (That should be done directly by the code that modifies the data; however, this would be implemented in follow-up patches.)
The trackable.py implementation can also easily be extended in the future to enable push notifications to the engine once we have switched to the new communication. This could be done by subscribing to certain keys in the TrackableMapping instance. (Not implemented yet.)
Nice! It is nice to have now, and it will also be useful on top of json/push notifications tomorrow. I just had a quick look at the changes to vm.py and to the interface, and they look nice as well.
[...]
I ran some tests, and in the tested scenarios the new verb can reduce the data transferred and the average response body size by 75%-90%, depending on the scenario and usage.
The test results can be found here: http://www.ovirt.org/Feature/VDSM_VM_Query_API/Measurements#Results (An explanation of the tested methods is at the top of the page, and each section describes its scenario.)
Nice graphs :)
Silly comment: having the units in bytes on the X-axis makes the numbers somewhat hard to parse (for me). I suggest converting them to KiB for better readability.
The savings look really nice.
Bests,
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani

On 07/22/2014 02:00 PM, Francesco Romani wrote:
Nice graphs :) Silly comment: having the units in bytes on the X-axis makes the numbers somewhat hard to parse (for me). I suggest converting them to KiB for better readability. The savings look really nice.
The table below shows it in MiB in a separate column.
--
Regards,
Vinzenz Feenstra

In 3.6 we intend to have events in the json-rpc interface. That means that a lot of the problems this proposal is trying to fix will no longer be an issue. Changes in VM status will be sent as soon as they are available instead of when the engine polls. This means that the messages will be smaller. We will also be introducing compression, which would make things even faster.
I also expressed my problems with VDSM keeping state. When we move to an event-driven system there would be no reason to cache data anymore, as internal polling would generate messages and external polling would be used *only* to get up-to-date information. With call overhead being minimal, getAllVmStats would just become a list of getVmStats() calls, one for every VM of interest, making the filtering redundant.
Also, without internal caching, an interface that specifies the fields to include or exclude becomes a bit problematic. Having a bitflag (or a list of flags) to specify what type of information is needed would be preferable, as it makes it easier to figure out which actual calls need to be made. It is also easier for the user to know that they are adding overhead to the response this way.
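A rough Python sketch of the bitflag idea described above, using the standard `enum.IntFlag`. The flag names mirror the information categories listed in the original proposal, but the `VmInfo` enum, the `get_vm_info` helper, and the collector functions are hypothetical; they only show how each flag could map to one underlying call, so the caller can see which work a request triggers.

```python
import enum


class VmInfo(enum.IntFlag):
    """Hypothetical flags, one per category of VM information."""
    STATUS = 1
    STATISTICS = 2
    CONFIGURATION = 4
    GUEST_OS = 8


def get_vm_info(vm_id, flags, collectors):
    """Run only the collectors whose flag is set and merge their results."""
    result = {}
    for flag, collect in collectors.items():
        if flag & flags:
            result.update(collect(vm_id))
    return result


calls = []  # records which collectors actually ran


def make_collector(name, payload):
    # Stand-in for a real data source; each call is logged to show the overhead.
    def collect(vm_id):
        calls.append(name)
        return dict(payload)
    return collect


collectors = {
    VmInfo.STATUS: make_collector('status', {'status': 'Up'}),
    VmInfo.STATISTICS: make_collector('statistics', {'cpuUser': '0.5'}),
    VmInfo.CONFIGURATION: make_collector('configuration', {'memSize': 1024}),
    VmInfo.GUEST_OS: make_collector('guest_os', {'guestOs': 'Linux'}),
}

info = get_vm_info('vm-1', VmInfo.STATUS | VmInfo.STATISTICS, collectors)
print(info, calls)
```

Only the two requested collectors run here, so the cost of a request is visible directly from the flags passed in.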
participants (5)
- Francesco Romani
- Michal Skrivanek
- Saggi Mizrahi
- Sven Kieske
- Vinzenz Feenstra