[RFC] Proposal for dropping FC22 jenkins tests on master branch
by Sandro Bonazzola
Hi,
can we drop FC22 testing in Jenkins now that the FC23 jobs are up and running?
It will reduce the Jenkins load. If needed we can keep the FC22 builds and just
drop the check jobs.
Comments?
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
AppErrors cleanup
by Allon Mureinik
Hi all,
A recent bug [1] reported as part of the translation effort alerted me to the fact that we have a lot (and I mean a LOT - over 100 per file) of deprecated, unused keys in the various AppErrors files. They serve no purpose, just take up space and waste translators' time when they examine them.
To make a long story short - I've just merged a patch to remove all these useless messages and to enforce, via unit tests, that EVERY key there has a corresponding constant in the EngineMessage or EngineError enums.
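For illustration, a minimal sketch of the kind of check such a unit test can perform; the class name, bundle path and helper below are placeholders, not the actual oVirt test code:

import java.io.InputStream;
import java.util.Properties;

import org.junit.Test;

import static org.junit.Assert.assertTrue;

public class AppErrorsKeysTest {

    // Illustrative only: the bundle path stands in for the various AppErrors files.
    @Test
    public void everyKeyHasMatchingConstant() throws Exception {
        Properties bundle = new Properties();
        try (InputStream in = getClass().getResourceAsStream("/AppErrors.properties")) {
            bundle.load(in);
        }
        for (String key : bundle.stringPropertyNames()) {
            // EngineMessage and EngineError are the engine's enums (imports omitted here).
            assertTrue("Orphan key in AppErrors: " + key,
                    isConstantOf(EngineMessage.class, key) || isConstantOf(EngineError.class, key));
        }
    }

    private static <E extends Enum<E>> boolean isConstantOf(Class<E> clazz, String name) {
        for (E constant : clazz.getEnumConstants()) {
            if (constant.name().equals(name)) {
                return true;
            }
        }
        return false;
    }
}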
Many thanks to my reviewers!
I know this was a tedious patch that couldn't have been too much fun to review.
-Allon
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1244766
Proposal: Hystrix for realtime command monitoring
by Roman Mohr
Hi All,
I have been contributing to the engine for three months now. While digging into
the code I started to wonder how to visualize what the engine is actually doing.
To get better insights I added Hystrix [1] to the engine. Hystrix is a
circuit-breaker library developed by Netflix, and it has one particularly
interesting feature: real-time metrics for commands.
In combination with hystrix-dashboard [2] it allows very interesting insights.
You can easily get an overview of the commands involved in an operation, their
performance and complexity. Look at [2] and the attachments in [5] and [6] for
screenshots to get an impression.
I want to propose integrating Hystrix permanently, because from my perspective
the results were really useful, and I have also had good experiences with
Hystrix in past projects.
A first implementation can be found on gerrit[3].
# Where is it immediately useful?
During development and QA.
An example: I tested the Hystrix integration on the /api/vms and /api/hosts REST
endpoints and immediately saw that the number of command executions grew
linearly with the number of VMs and hosts. The bug reports [5] and [6] are the
result.
# How to monitor the engine?
It is as easy as starting a hystrix-dashboard [2] with
$ git clone https://github.com/Netflix/Hystrix.git
$ cd Hystrix/hystrix-dashboard
$ ../gradlew jettyRun
and pointing the dashboard to
https://<customer.engine.ip>/ovirt-engine/hystrix.stream.
# Other possible benefits?
* Live metrics at customer site for admins, consultants and support.
* Historical metrics for analysis in addition to the log files.
The metrics information is directly usable in Graphite [7]. Therefore it
would be possible to collect the JSON stream for a certain time period and
analyze it later, as in [4]. To do that someone just has to run
curl --user admin@internal:engine http://localhost:8080/ovirt-engine/api/hystrix.stream > hystrix.stream
for as long as necessary. The results can be analyzed later.
# Possible architectural benefits?
In addition to the live metrics we might also have use for the real hystrix
features:
* Circuit Breaker
* Bulk execution of commands
* De-duplication of commands (caching)
* Synchronous and asynchronous execution support
* ...
Our commands already have a lot of features, so I don't think there are any
quick wins, but maybe there are interesting opportunities for infra.
# Overhead?
In [8] the Netflix engineers describe their results regarding the overhead of
wrapping every command into a new instance of a Hystrix command.
They ran their tests on a standard 4-core Amazon EC2 server with a load of 60
requests per second.
When using thread pools they measured a mean overhead of less than one
millisecond (so negligible). At the 90th percentile they measured an overhead
of 3 ms, and at the 99th percentile about 9 ms.
When the Hystrix commands are configured to use semaphores instead of thread
pools they are even faster.
# How to integrate?
A working implementation can be found on gerrit [3]. These patch sets wrap a
Hystrix command around every VdcAction, every VdcQuery and every VDSCommand.
This required just four small modifications in the code base.
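For a rough idea of what such a wrapper looks like, here is a minimal sketch using the real HystrixCommand API; the MonitoredCall class and the usage line are placeholders, not the actual code from the gerrit patches:

import java.util.concurrent.Callable;

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandKey;

// Illustrative wrapper: the gerrit patches [3] do the equivalent around
// VdcAction, VdcQuery and VDSCommand execution.
public class MonitoredCall<T> extends HystrixCommand<T> {

    private final Callable<T> work;

    public MonitoredCall(String group, String name, Callable<T> work) {
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey(group))
                .andCommandKey(HystrixCommandKey.Factory.asKey(name)));
        this.work = work;
    }

    @Override
    protected T run() throws Exception {
        // Everything executed here shows up in the hystrix.stream metrics.
        return work.call();
    }
}

Usage would then be along the lines of (runTheActualAction is a placeholder):
new MonitoredCall<>("VdcAction", "RunVm", () -> runTheActualAction()).execute();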
# Security?
In the provided patches the hystrix-metrics-servlet is accessible at
/ovirt-engine/api/hystrix.stream. It is protected by basic auth, but it is
accessible to everyone who can authenticate. We should probably restrict it to
admins.
# Todo?
1) We do report failed actions with return values, while Hystrix expects failing
commands to throw an exception, so on the dashboard almost every command looks
like a success. To overcome this, it would be pretty easy to throw an exception
inside the command and catch it immediately after it leaves the Hystrix wrapper
(see the sketch after this list).
2) Fine-tuning: do we want semaphores or a thread pool? If a thread pool, what
size do we want?
3) Three unpackaged dependencies: archaius, hystrix-core, hystrix-contrib
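A minimal sketch of the failure-to-exception idea from item 1, combined with the semaphore isolation knob from item 2; ActionResult, ActionFailedException and the surrounding logic are hypothetical placeholders, not the engine's classes:

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandProperties;
import com.netflix.hystrix.HystrixCommandProperties.ExecutionIsolationStrategy;

// Placeholder for the engine's return value object (hypothetical).
interface ActionResult {
    boolean succeeded();
}

// Placeholder exception used only to signal failure to Hystrix (hypothetical).
class ActionFailedException extends RuntimeException {
}

public class ActionCommand extends HystrixCommand<ActionResult> {

    private final ActionResult result;  // in reality this would be computed inside run()

    public ActionCommand(ActionResult result) {
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("VdcAction"))
                // Item 2: semaphore isolation avoids the extra thread hop.
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        .withExecutionIsolationStrategy(ExecutionIsolationStrategy.SEMAPHORE)));
        this.result = result;
    }

    @Override
    protected ActionResult run() {
        if (!result.succeeded()) {
            // Item 1: throw inside the wrapper so the dashboard counts a failure.
            throw new ActionFailedException();
        }
        return result;
    }
}

// ... and catch it right after execute() to restore the original behaviour:
//
// try {
//     return new ActionCommand(result).execute();
// } catch (HystrixRuntimeException e) {
//     if (e.getCause() instanceof ActionFailedException) {
//         return result;  // hand the failed return value back to the caller
//     }
//     throw e;
// }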
# References
[1] https://github.com/Netflix/Hystrix
[2] https://github.com/Netflix/Hystrix/tree/master/hystrix-dashboard
[3] https://gerrit.ovirt.org/#/q/topic:hystrix
[4] http://www.nurkiewicz.com/2015/02/storing-months-of-historical-metrics.html
[5] https://bugzilla.redhat.com/show_bug.cgi?id=1268216
[6] https://bugzilla.redhat.com/show_bug.cgi?id=1268224
[7] http://graphite.wikidot.com
[8] https://github.com/Netflix/Hystrix/wiki/FAQ#what-is-the-processing-overhe...
[vdsm][oVirt 4.0] trimming down vm.py, stage 1
by Francesco Romani
Hi all,
as many already know, we have a problem in the vdsm/virt/* area with the size of vm.py.
As of today's master (SHA1: ed7618102e4e3aedf859adb1084eecded8e5158d), vm.py weighs in at:
1082 11:28:09 fromani@musashi ~/Projects/upstream/vdsm $ ls -lh vdsm/virt/vm.py
-rw-rw-r--. 1 fromani fromani 209K Nov 24 11:15 vdsm/virt/vm.py
1083 11:30:22 fromani@musashi ~/Projects/upstream/vdsm $ wc vdsm/virt/vm.py
5202 17125 213026 vdsm/virt/vm.py
I'd like to renew my commitment to bring this monster down to a sane size.
I don't think we can make it in one go. We should tackle this monster task in stages.
We have roughly five weeks left until the end of the year.
Sounds like a good timespan for the first stage, and I'm aiming to go below
the 4000-line mark.
So we have roughly 1200 lines to eliminate somehow.
This is the initial plan I'm thinking of, in order of expected increasing difficulty,
from easiest to hardest:
1. LiveMergeCleanupThread:
Chance to break things: minimal
- roughly 120 lines (10% of the goal)
- can be moved verbatim (cut/paste) elsewhere, like vdsm/virt/livemerge.py,
fixing only import paths.
2. the findDrive* family of functions:
Chance to break things: minimal/very low
- roughly 50 lines (~4% of the goal)
- can be moved with minimal changes elsewhere, like vdsm/virt/storage/????.py,
3. the getConf* family of functions:
Chance to break things: medium-to-high in the 3.x branch, low on 4.x because we drop backward compat.
- roughly 110 lines (~9% of the goal)
- just drop them for 4.0!
- alternatively, need to figure out a proper place, maybe a new module?
4. the Devices/VM setup bits
All scattered through vm.py (e.g. _buildDomainXML, the DeviceMapping table)
Chance to break things: medium-to-low, depends on individual patches
- all summed up, roughly 200 lines (~18% of the goal)
- need serious cleanup
- no definite plan
- mostly cut/paste or transforming Vm methods into functions
5. the getUnderlying* family of functions:
Chance to break things: low (maybe medium in some cases, to stay on the safe side)
- roughly 550 lines (~46% of the goal)
- easy to test (just run vms!)
- can be moved into the devices/ subpackage with some changes.
- require further work in the area to properly integrate this code in the device objects
- more work is planned in this area, including move from minidom to ElementTree
So far, I've gathered ~1030 lines, so I'm short of 170 lines to reach the goal, but I'm confident
I can find something more.
The plan now is to start with the easy parts *and*, in the background, with #4,
which requires way more work than everything else.
Thoughts, objections and suggestions are all welcome, especially from storage people,
because the first items in my list affect them.
Thanks and bests,
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani
[oVirt 3.6 Localization Question #39] "prefixMsg,"
by Yuko Katabami
Hello all,
Here is another question:
File: UIMessages
Resource IDs:
numberValidationNumberBetweenInvalidReason
numberValidationNumberGreaterInvalidReason
numberValidationNumberLessInvalidReason
Strings:
{0} between {1} and {2}.
{0} greater than or equal to {1}.
{0} less than or equal to {1}.
Question: Could you please provide some examples of those messages? The comment
section of each of those strings states "0=prefixMsg", but I would like to
know what will actually replace {0}, so that I can translate them properly.
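(Not from the oVirt code, just an illustration with a hypothetical prefix: the placeholders are filled in like in any parameterized message, so whatever replaces {0} becomes the beginning of the sentence.)

import java.text.MessageFormat;

public class PrefixMsgExample {
    public static void main(String[] args) {
        // {0} is the prefixMsg: a sentence fragment naming the value being
        // validated; {1} and {2} are the allowed bounds.
        String pattern = "{0} between {1} and {2}.";
        String prefixMsg = "Memory size must be a number";  // hypothetical prefix
        System.out.println(MessageFormat.format(pattern, prefixMsg, "1", "262144"));
        // Prints: Memory size must be a number between 1 and 262144.
    }
}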
Kind regards,
Yuko
[vdsm] strange network test failure on FC23
by Francesco Romani
Hi,
Jenkins doesn't like my (trivial) https://gerrit.ovirt.org/#/c/49271/
which is about moving one log line (!).
The failure is
00:08:54.680 ======================================================================
00:08:54.680 FAIL: testEnablePromisc (ipwrapperTests.TestDrvinfo)
00:08:54.680 ----------------------------------------------------------------------
00:08:54.680 Traceback (most recent call last):
00:08:54.680 File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/ipwrapperTests.py", line 130, in testEnablePromisc
00:08:54.680 "Could not enable promiscuous mode.")
00:08:54.680 AssertionError: Could not enable promiscuous mode.
00:08:54.680 -------------------- >> begin captured logging << --------------------
00:08:54.680 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 /usr/sbin/brctl show (cwd None)
00:08:54.680 root: DEBUG: SUCCESS: <err> = ''; <rc> = 0
00:08:54.680 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 /sbin/ip link add name vdsm-HIRjJp type bridge (cwd None)
00:08:54.680 root: DEBUG: SUCCESS: <err> = ''; <rc> = 0
00:08:54.680 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 /sbin/ip link set dev vdsm-HIRjJp up (cwd None)
00:08:54.680 root: DEBUG: SUCCESS: <err> = ''; <rc> = 0
00:08:54.680 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 /sbin/ip link set dev vdsm-HIRjJp promisc on (cwd None)
00:08:54.680 root: DEBUG: SUCCESS: <err> = ''; <rc> = 0
00:08:54.680 --------------------- >> end captured logging << ---------------------
Here in fullest: http://jenkins.ovirt.org/job/vdsm_master_check-patch-fc23-x86_64/638/console
The command line looks OK, and I can't think of any reason it could fail, except a startup race.
Using taskset, the ip command now takes a little longer to complete.
Maybe - just wild guessing - the code isn't properly waiting for the command to complete?
Otherwise I haven't got the slightest clue :)
Bests,
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani
[VDSM] Coverage reports
by Nir Soffer
Hi all,
Thanks to Edward, we now have coverage reports in Jenkins.
The way to access a report on Jenkins is to use this URL:
http://jenkins.ovirt.org/job/vdsm_master_check-patch-fc23-x86_64/<build-number>/artifact/exported-artifacts/htmlcov/index.html
Here is an example, hopefully it will be accessible when you try:
http://jenkins.ovirt.org/job/vdsm_master_check-patch-fc23-x86_64/648/arti...
Todo:
- Easy way to access the report from gerrit
It should be easy to add a link to the coverage report in the comment added
by Jenkins after a build finishes.
- Store the coverage reports for a longer time, maybe a week?
- We have only 45% coverage, instead of the minimum of 100%.
Note that 25% coverage can mean that *no* code was run by the tests; the only
code that ran was the function and class definitions. Here is an example:
http://jenkins.ovirt.org/job/vdsm_master_check-patch-fc23-x86_64/648/arti...
Modules that were never imported during the tests have 0% coverage:
http://jenkins.ovirt.org/job/vdsm_master_check-patch-fc23-x86_64/648/arti...
- coverage creates a lot of useless noise in the Jenkins logs; we need to silence
this output.
http://jenkins.ovirt.org/job/vdsm_master_check-patch-fc23-x86_64/648/cons...
I did not find a way to do this in nosetests; we may need to hack the
nose coverage plugin.
- The report includes only the tests in the tests directory.
We have additional tests in lib/vdsm/infra/* which are not included.
We should move these to the tests directory.
- The report uses absolute paths, but we would like shorter relative paths.
I don't see a way to configure nosetests or coverage to generate relative paths.
We may need to hack the generated html/json in htmlcov.
- Add "make coverage" target for running coverage locally
- An easy way to enable coverage for the functional tests or for running
a single test module.
This can be done using the nosetests --cover* options. It should be documented in
tests/README, and maybe automated using a script or a Makefile.
When running locally, one would like the script to open the report
in the browser automatically:
xdg-open tests/htmlcov/index.html
- An easy way to enable coverage when testing flows in vdsm
Petr sent a patch for enabling coverage using vdsm.conf:
https://gerrit.ovirt.org/49168
We discussed adding a vdsm-coverage package that will make this easy to set up.
Nir
[ATN] [master] SSO patchset was merged
by Alon Bar-Lev
Hello,
We have merged the SSO patchset into master.
These kinds of deep infra changes are non-trivial; we hope we reduced most of the side effects within the 171 revisions and the testing.
Thanks to Ravi Nori for his great effort!
The SSO is based on the OAuth2 specification; a full description is available [1]. It is a stable, supported interface of the engine.
In a nutshell, the major change is that the login dialog is now handled by a separate non-GWT webapp, which provides authentication and authorization services to the other webapps.
The immediate bonus: no need to re-authenticate to the user portal and/or admin portal; maybe soon we will integrate reports as well.
Performance bonus: when using SPNEGO (Kerberos) there is no performance penalty (double request).
Usability bonus: support for many authentication sequences that we were unable to provide with the previous implementation.
Regards,
Alon Bar-Lev.
[1] http://www.ovirt.org/Features/UniformSSOSupport