Lowering the bar for wiki contribution?
by Roy Golan
I'm getting the feeling I'm not alone in this: authoring and publishing a
wiki page hasn't been as easy as it used to be for a long time.
I want to suggest a slightly lighter workflow:
1. Everyone can merge their own page - (it's a wiki)
As with public, open code, no one has an incentive to publish a badly
written wiki page under their own name. True, a bad page can have an
impact, but not as much as broken code.
2. Use a Page-Status marker
The author first merges the draft. It's now out there and should be
updated as time goes on, and its status is DRAFT. Maintainers will come
later and, after review, change the status to PUBLISH. That could be a
header on the page:
---
page status: DRAFT/PUBLISH
---
Simple, I think, and it should work.
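To make the marker actionable, here is a minimal sketch of a checker
script; the repository layout, glob pattern, and field name are
illustrative assumptions, not the actual wiki setup. It lists every page
still waiting for review:

import glob
import re

# Hypothetical wiki checkout layout; adjust the glob for the real repo.
PAGES_GLOB = "source/**/*.md"
STATUS_RE = re.compile(r"^page status:\s*(DRAFT|PUBLISH)\s*$", re.MULTILINE)

def pages_by_status():
    """Map each page path to its front-matter status (or 'MISSING')."""
    result = {}
    for path in glob.glob(PAGES_GLOB, recursive=True):
        with open(path) as page:
            match = STATUS_RE.search(page.read())
        result[path] = match.group(1) if match else "MISSING"
    return result

if __name__ == "__main__":
    for path, status in sorted(pages_by_status().items()):
        if status != "PUBLISH":
            print("%-7s %s" % (status, path))

Run from the wiki checkout, it prints one line per DRAFT (or unmarked)
page, which maintainers can work through at their own pace.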
7 years, 5 months
Re: [ovirt-devel] [monitoring][collectd] the collectd virt plugin is now on par with Vdsm needs
by Yaniv Dary
Yaniv Dary
Technical Product Manager
Red Hat Israel Ltd.
34 Jerusalem Road
Building A, 4th floor
Ra'anana, Israel 4350109
Tel : +972 (9) 7692306
8272306
Email: ydary(a)redhat.com
IRC : ydary
On Feb 21, 2017 13:06, "Francesco Romani" <fromani(a)redhat.com> wrote:
Hello everyone,
in the last few weeks I've been submitting PRs to collectd upstream to
bring the virt plugin up to date with Vdsm's and oVirt's needs.
Previously, the collectd virt plugin reported only a subset of the
metrics oVirt uses.
In current collectd master, the collectd virt plugin provides all the
data Vdsm (thus Engine) needs. This means that it is now
possible for Vdsm or Engine to query collectd, not Vdsm/libvirt, and
have the same data.
There are only two caveats:
1. it is yet to be seen which version of collectd will ship all those
enhancements
2. collectd *intentionally* reports metrics as rates, not as absolute
values as Vdsm does. This may be an issue in the presence of restarts or
data loss in the link between collectd and the metrics store.
How does this work?
If we want to show memory usage over time, for example, we need the
usage, not the rate.
How would this be reported?
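To make the concern concrete, here is a minimal sketch (plain Python, not
any collectd API) of how a consumer could integrate counter-style rates,
e.g. disk_octets or if_octets, back into a cumulative total, and why gaps
in the samples hurt:

def accumulate(samples):
    """Reconstruct a cumulative total from (timestamp, rate) samples by
    rectangle integration: total += rate * elapsed. Samples lost between
    collectd and the metrics store, or a collectd restart, leave a hole
    that this reconstruction silently underestimates -- the caveat above."""
    total = 0.0
    prev_ts = None
    for ts, rate in samples:
        if prev_ts is not None:
            total += rate * (ts - prev_ts)
        prev_ts = ts
    return total

# Three seconds of 1 MiB/s disk traffic -> 3145728.0 (~3 MiB in total).
print(accumulate([(0, 0.0), (1, 1048576.0), (2, 1048576.0), (3, 1048576.0)]))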
Please keep reading for more details:
How to get the code?
--------------------------------
This is somewhat tricky until we get an official release. If you are
familiar with the RPM build process, it is easy to build custom packages
from a snapshot of collectd master
(https://github.com/collectd/collectd) and a recent 5.7.1 RPM (like
https://koji.fedoraproject.org/koji/buildinfo?buildID=835669)
How to configure it?
------------------------------
Most things work out of the box. A Vdsm patch currently in progress
ships the recommended configuration:
https://gerrit.ovirt.org/#/c/71176/6/static/etc/collectd.d/virt.conf
The meaning of each configuration option is documented in man 5 collectd.conf.
What does it look like?
--------------------------
Let me post one "screenshot" :)
$ collectdctl listval | grep a0
a0/virt/disk_octets-hdc
a0/virt/disk_octets-vda
a0/virt/disk_ops-hdc
a0/virt/disk_ops-vda
a0/virt/disk_time-hdc
a0/virt/disk_time-vda
a0/virt/if_dropped-vnet0
a0/virt/if_errors-vnet0
a0/virt/if_octets-vnet0
a0/virt/if_packets-vnet0
a0/virt/memory-actual_balloon
a0/virt/memory-rss
a0/virt/memory-total
a0/virt/ps_cputime
a0/virt/total_requests-flush-hdc
a0/virt/total_requests-flush-vda
a0/virt/total_time_in_ms-flush-hdc
a0/virt/total_time_in_ms-flush-vda
a0/virt/virt_cpu_total
a0/virt/virt_vcpu-0
a0/virt/virt_vcpu-1
How to consume the data?
-----------------------------------------
Among the ways to query collectd, the two most popular (and most fitting
for the oVirt use case) are perhaps the network protocol
(https://collectd.org/wiki/index.php/Binary_protocol)
and the plain text protocol
(https://collectd.org/wiki/index.php/Plain_text_protocol). The first
could be used by Engine to get the data directly, or to consolidate the
metrics in one database (e.g. to run any kind of query, for historical
series...).
The latter will be used by Vdsm to keep reporting the metrics (again
https://gerrit.ovirt.org/#/c/71176/6)
Please note that the performance of the plain text protocol is known to
be lower than that of the binary protocol.
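For illustration, here is a minimal sketch of a plain-text-protocol
client, assuming the unixsock plugin is loaded at its default socket path
(see the protocol page linked above for the exact grammar):

import socket

SOCKET_PATH = "/var/run/collectd-unixsock"  # unixsock plugin's default path

def query(command):
    """Send one plain-text-protocol command and return the payload lines.
    The first response line is "<count> <message>"; a negative count
    signals an error, otherwise <count> payload lines follow."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(SOCKET_PATH)
    try:
        sock.sendall(command.encode("utf-8") + b"\n")
        reader = sock.makefile("rb")
        header = reader.readline().decode("utf-8").strip()
        count = int(header.split(" ", 1)[0])
        if count < 0:
            raise RuntimeError(header)
        return [reader.readline().decode("utf-8").strip()
                for _ in range(count)]
    finally:
        sock.close()

# Every known value identifier, then the current reading of one of them.
for line in query("LISTVAL"):
    print(line)                               # "<timestamp> a0/virt/memory-rss" etc.
print(query('GETVAL "a0/virt/memory-rss"'))  # e.g. ['value=...']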
What about the unresponsive hosts?
-------------------------------------------------------
We know from experience that hosts may become unresponsive, and this can
disrupt monitoring. However, we do want to keep monitoring the
responsive hosts, so that one rogue host does not make us lose all the
monitoring data.
To cope with this need, the virt plugin gained support for a "partition
tag". With it, we can group VMs together using an arbitrary tag. This
is completely transparent to collectd, and also completely optional.
oVirt can use this tag to group VMs per-storage-domain, or however it
sees fit, trying to minimize the disruption should one host become
unresponsive.
Read the full docs here:
https://github.com/collectd/collectd/commit/999efc28d8e2e96bc15f535254d412a79755ca4f
What about the collectd-ovirt plugin?
--------------------------------------------------------
Some time ago I implemented an out-of-tree collectd plugin leveraging
the libvirt bulk stats: https://github.com/fromanirh/collectd-ovirt
This plugin is meant to be a modern, drop-in replacement for the
existing virt plugin.
The development of that out-of-tree plugin is now halted, because we
have everything we need in the upstream collectd plugin.
Future work
------------------
We believe we have reached feature parity, so we are now looking at
bugfixes and performance tuning in the near future. I'll be happy to
provide more patches/PRs for that.
Thanks and best regards,
--
Francesco Romani
Red Hat Engineering Virtualization R & D
IRC: fromani
_______________________________________________
Devel mailing list
Devel(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel
7 years, 8 months
HE setup failure
by Sahina Bose
Hi all,
The HE setup fails in ovirt-system-tests while deploying HE on a
hyperconverged Gluster setup using master.
Error:
Failed to execute stage 'Misc configuration': <ProtocolError for
localhost:54321/RPC2: 400 Bad Request>"
Traceback from hosted-engine log:
ProtocolError: <ProtocolError for localhost:54321/RPC2: 400 Bad Request>
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py",
line 279, in create_volume
volUUID=volume_uuid
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py",
line 245, in _get_volume_path
volUUID
File "/usr/lib64/python2.7/xmlrpclib.py", line **FILTERED**3, in __call__
return self.__send(self.__name, args)
File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
verbose=self.__verbose
File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
return self.single_request(host, handler, request_body, verbose)
File "/usr/lib64/python2.7/xmlrpclib.py", line 1321, in single_request
response.msg,
ProtocolError: <ProtocolError for localhost:54321/RPC2: 400 Bad Request>
Is this a regression?
7 years, 8 months
dashboard jenkins jobs
by Sandro Bonazzola
Hi,
I've just seen https://gerrit.ovirt.org/#/c/73182/ merged, with no info on
why all those jobs have been dropped.
Can anybody give an update on this? As far as I can tell, the 4.y jobs are
still needed. Or am I missing something?
Thanks,
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
7 years, 8 months
proposing Filip Krepinsky as moVirt maintainer
by Tomas Jelinek
Hi All,
it is my pleasure to propose Filip Krepinsky (suomiy) as the new maintainer
of the moVirt project!
Filip joined the moVirt project on 11 November 2015 with this commit:
https://github.com/matobet/moVirt/commit/d37e71a436e38b6342903b991a30a8f3...
and since then, with his 140 commits, he has become the contributor with
the second-most patches (out of 13 contributors in total).
He has also contributed to the aSPICE project and has been active in
finding issues in oVirt core (around the dashboard).
His patches are well crafted and I'm absolutely sure he will do great as a
maintainer.
have a nice day,
Tomas
7 years, 8 months
[VDSM] granting network+2 to Eddy
by Dan Kenigsberg
Hi,
After more than a year of substantial contribution to Vdsm networking,
and after several months of me upgrading his score, I would like to
nominate Eddy as a maintainer for network-related code in Vdsm, in
master and stable branches.
Current Vdsm maintainers and others: please approve my suggestion if
you agree with it.
Regards,
Dan.
7 years, 9 months
[monitoring][collectd] the collectd virt plugin is now on par with Vdsm needs
by Francesco Romani
Hello everyone,
in the last few weeks I've been submitting PRs to collectd upstream to
bring the virt plugin up to date with Vdsm's and oVirt's needs.
Previously, the collectd virt plugin reported only a subset of the
metrics oVirt uses.
In current collectd master, the collectd virt plugin provides all the
data Vdsm (thus Engine) needs. This means that it is now
possible for Vdsm or Engine to query collectd, not Vdsm/libvirt, and
have the same data.
There are only two caveats:
1. it is yet to be seen which version of collectd will ship all those
enhancements
2. collectd *intentionally* reports metrics as rates, not as absolute
values as Vdsm does. This may be an issue in the presence of restarts or
data loss in the link between collectd and the metrics store.
Please keep reading for more details:
How to get the code?
--------------------------------
This is somewhat tricky until we get an official release. If you are
familiar with the RPM build process, it is easy to build custom packages
from a snapshot of collectd master
(https://github.com/collectd/collectd) and a recent 5.7.1 RPM (like
https://koji.fedoraproject.org/koji/buildinfo?buildID=835669)
How to configure it?
------------------------------
Most things work out of the box. A Vdsm patch currently in progress
ships the recommended configuration:
https://gerrit.ovirt.org/#/c/71176/6/static/etc/collectd.d/virt.conf
The meaning of each configuration option is documented in man 5 collectd.conf.
What does it look like?
--------------------------
Let me post one "screenshot" :)
$ collectdctl listval | grep a0
a0/virt/disk_octets-hdc
a0/virt/disk_octets-vda
a0/virt/disk_ops-hdc
a0/virt/disk_ops-vda
a0/virt/disk_time-hdc
a0/virt/disk_time-vda
a0/virt/if_dropped-vnet0
a0/virt/if_errors-vnet0
a0/virt/if_octets-vnet0
a0/virt/if_packets-vnet0
a0/virt/memory-actual_balloon
a0/virt/memory-rss
a0/virt/memory-total
a0/virt/ps_cputime
a0/virt/total_requests-flush-hdc
a0/virt/total_requests-flush-vda
a0/virt/total_time_in_ms-flush-hdc
a0/virt/total_time_in_ms-flush-vda
a0/virt/virt_cpu_total
a0/virt/virt_vcpu-0
a0/virt/virt_vcpu-1
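As an aside, the identifiers above follow collectd's
host/plugin[-instance]/type[-instance] naming, with the first dash acting
as the separator; a small parsing sketch:

def parse_identifier(ident):
    """Split "host/plugin[-pinst]/type[-tinst]" into its five parts;
    empty strings stand in for missing instances. partition() splits on
    the first dash, so instances containing dashes still parse."""
    host, plugin_part, type_part = ident.split("/")
    plugin, _, plugin_instance = plugin_part.partition("-")
    vtype, _, type_instance = type_part.partition("-")
    return host, plugin, plugin_instance, vtype, type_instance

print(parse_identifier("a0/virt/disk_octets-vda"))
# -> ('a0', 'virt', '', 'disk_octets', 'vda')
print(parse_identifier("a0/virt/total_time_in_ms-flush-vda"))
# -> ('a0', 'virt', '', 'total_time_in_ms', 'flush-vda')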
How to consume the data?
-----------------------------------------
Among the ways to query collectd, the two most popular (and most fitting
for the oVirt use case) are perhaps the network protocol
(https://collectd.org/wiki/index.php/Binary_protocol)
and the plain text protocol
(https://collectd.org/wiki/index.php/Plain_text_protocol). The first
could be used by Engine to get the data directly, or to consolidate the
metrics in one database (e.g. to run any kind of query, for historical
series...).
The latter will be used by Vdsm to keep reporting the metrics (again
https://gerrit.ovirt.org/#/c/71176/6)
Please note that the performance of the plain text protocol is known to
be lower than that of the binary protocol.
What about the unresponsive hosts?
-------------------------------------------------------
We know from experience that hosts may become unresponsive, and this can
disrupt monitoring. However, we do want to keep monitoring the
responsive hosts, so that one rogue host does not make us lose all the
monitoring data.
To cope with this need, the virt plugin gained support for a "partition
tag". With it, we can group VMs together using an arbitrary tag. This
is completely transparent to collectd, and also completely optional.
oVirt can use this tag to group VMs per-storage-domain, or however it
sees fit, trying to minimize the disruption should one host become
unresponsive.
Read the full docs here:
https://github.com/collectd/collectd/commit/999efc28d8e2e96bc15f535254d412a79755ca4f
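How the tag is derived is entirely up to the management side; here is a
toy sketch of the per-storage-domain grouping idea (the input shape is
hypothetical, not an oVirt API):

from collections import defaultdict

def partition_tags(vm_domains):
    """Group VMs into partitions keyed by storage domain, so that one
    stuck host or domain only stalls the stats of its own group.
    vm_domains: iterable of (vm_name, storage_domain) pairs."""
    partitions = defaultdict(list)
    for vm, domain in vm_domains:
        partitions["sd-" + domain].append(vm)
    return dict(partitions)

print(partition_tags([("vm1", "data1"), ("vm2", "data1"), ("vm3", "data2")]))
# -> {'sd-data1': ['vm1', 'vm2'], 'sd-data2': ['vm3']}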
What about the collectd-ovirt plugin?
--------------------------------------------------------
Some time ago I implemented an out-of-tree collectd plugin leveraging
the libvirt bulk stats: https://github.com/fromanirh/collectd-ovirt
This plugin is meant to be a modern, drop-in replacement for the
existing virt plugin.
The development of that out-of-tree plugin is now halted, because we
have everything we need in the upstream collectd plugin.
Future work
------------------
We believe we have reached feature parity, so we are now looking at
bugfixes and performance tuning in the near future. I'll be happy to
provide more patches/PRs for that.
Thanks and best regards,
--
Francesco Romani
Red Hat Engineering Virtualization R & D
IRC: fromani
7 years, 9 months
[ OST Failure Report ] [ oVirt master ] [ 26.02.2017 ] [test-repo_ovirt_experimental_master]
by Shlomo Ben David
Hi,
Test failed: [ test-repo_ovirt_experimental_master ]
Link to Job: [1]
Link to all logs: [2]
Link to error log: [3]
[1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5538
[2]
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5538/art...
[3]
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5538/art...
Error snippet from the log:
<error>
2017-02-26 05:35:53,340-0500 ERROR (jsonrpc/3)
[storage.TaskManager.Task]
(Task='22828901-d87f-4607-9690-a106c474ebe4') Unexpected error
(task:871)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line
878, in _run
return fn(*args, **kargs)
File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/storage/hsm.py", line 3200, in teardownImage
dom.deactivateImage(imgUUID)
File "/usr/share/vdsm/storage/blockSD.py", line 1278, in deactivateImage
lvm.deactivateLVs(self.sdUUID, volUUIDs)
File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line
1304, in deactivateLVs
_setLVAvailability(vgName, toDeactivate, "n")
File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line
845, in _setLVAvailability
raise error(str(e))
CannotDeactivateLogicalVolume: Cannot deactivate Logical Volume:
('General Storage Exception: ("5 [] [\' Logical volume
1dd0ee2a-f26a-423c-90f9-79703343aa1e/8323e511-eb93-4b1c-a9fa-ad66409994e7
in use.\', \' Logical volume
1dd0ee2a-f26a-423c-90f9-79703343aa1e/99662dfa-acf2-4392-8ab8-106412c2afa5
in use.\']\\n1dd0ee2a-f26a-423c-90f9-79703343aa1e/[\'99662dfa-acf2-4392-8ab8-106412c2afa5\',
\'8323e511-eb93-4b1c-a9fa-ad66409994e7\']",)',)
2017-02-26 05:35:53,347-0500 INFO (jsonrpc/3)
[storage.TaskManager.Task]
(Task='22828901-d87f-4607-9690-a106c474ebe4') aborting: Task is
aborted: 'Cannot deactivate Logical Volume' - code 552 (task:1176)
2017-02-26 05:35:53,348-0500 ERROR (jsonrpc/3) [storage.Dispatcher]
{'status': {'message': 'Cannot deactivate Logical Volume: (\'General
Storage Exception: ("5 [] [\\\' Logical volume
1dd0ee2a-f26a-423c-90f9-79703343aa1e/8323e511-eb93-4b1c-a9fa-ad66409994e7
in use.\\\', \\\' Logical volume
1dd0ee2a-f26a-423c-90f9-79703343aa1e/99662dfa-acf2-4392-8ab8-106412c2afa5
in use.\\\']\\\\n1dd0ee2a-f26a-423c-90f9-79703343aa1e/[\\\'99662dfa-acf2-4392-8ab8-106412c2afa5\\\',
\\\'8323e511-eb93-4b1c-a9fa-ad66409994e7\\\']",)\',)', 'code': 552}}
(dispatcher:78)
2017-02-26 05:35:53,349-0500 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer]
RPC call Image.teardown failed (error 552) in 19.36 seconds
(__init__:552)
</error>
Best Regards,
Shlomi Ben-David | Software Engineer | Red Hat ISRAEL
RHCSA | RHCVA | RHCE
IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)
OPEN SOURCE - 1 4 011 && 011 4 1
7 years, 9 months