[vdsm][virt] what's new in virt (20151215)
by Francesco Romani
Hi all,
I'm happy to start a weekly summary of what's going on in the virt world (VDSM edition).
General topics
* I've got some mixed feedback about my Vm-on-a-diet effort. I still believe that fat trimming
is a worthwhile goal per se, but I'm willing to adapt the strategy in response to the comments,
so we will focus on virt-specific topics.
* As a consequence of the above, I'm focusing on the last bits of device fixing, which involves
- fixing all devices to update themselves from the libvirt XML after domain boot, instead of using
all the getUnderlying* methods
- switching the device-related code to use etree instead of minidom. This will involve changes to
the domain_descriptor.
I estimate this task will still trim ~600 lines out of vm.py, so it still gets some
size trimming done, albeit not as much as planned.
This is a complex topic; I will post a plan and ideas in a separate mail (a rough sketch of the
etree direction follows after this list).
* Finally, some series are still worth pushing forward, see below.
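As a minimal illustration of the etree direction (the device dict and XML below are made up for
the example, not taken from the actual patches), a device can refresh itself from the libvirt
domain XML with xml.etree.ElementTree instead of a getUnderlying* method walking minidom:

    import xml.etree.ElementTree as ET

    # Illustrative domain XML fragment; in VDSM the XML would come from libvirt.
    DOM_XML = """
    <domain type='kvm'>
      <devices>
        <disk type='file' device='disk'>
          <target dev='vda' bus='virtio'/>
          <alias name='virtio-disk0'/>
        </disk>
      </devices>
    </domain>
    """

    def update_disk_from_xml(disk, dom_xml):
        # Hypothetical helper: copy the alias assigned by libvirt at boot
        # back into the device object.
        root = ET.fromstring(dom_xml)
        for dev in root.findall('./devices/disk'):
            target = dev.find('target')
            if target is not None and target.get('dev') == disk['name']:
                alias = dev.find('alias')
                if alias is not None:
                    disk['alias'] = alias.get('name')

    disk = {'name': 'vda'}
    update_disk_from_xml(disk, DOM_XML)
    print(disk)  # {'name': 'vda', 'alias': 'virtio-disk0'}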
Patches in need of attention
* topic branches
- mpolednik started a much-needed cleanup and fix of the fake_qemu and fake_kvm code, with the ultimate goal of moving all
the remaining bits into the faqemu hook and making it useful on ppc64.
Lots of refactoring is needed to support this change, which produced
https://gerrit.ovirt.org/#/q/topic:cpuinfo
- we want to improve the reporting when a migration is aborted. The ultimate goal is to let Engine (and thus the user)
know why a migration failed. To export this information, however, we need some cleanup first. Hence:
https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic...
- last and less urgent: some cleanup of the existing getUnderlying* methods of the Vm class, preparing for
the last step of the big vmdevices split. I believe this is useful anyway:
https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic...
* single patches
- https://gerrit.ovirt.org/#/c/46846
this is the first of a series aiming to improve migration support in 4.0. Probably worth merging all together,
even though this one seems ready for broader review to me.
- https://gerrit.ovirt.org/#/c/49173/
Listed just to raise awareness; I'm still working on ensuring backward compatibility and a smooth upgrade.
- https://gerrit.ovirt.org/#/c/48672
v2v xen support
- https://gerrit.ovirt.org/#/c/49951/
OVA support improvements. Worth a look, but note that we are working toward a split of this big patch
- https://gerrit.ovirt.org/#/c/49636
V2V refactoring, also almost ready
- https://gerrit.ovirt.org/#/c/49570/
still in the context of migration enhancements. We want to throttle incoming migrations; to do so we want
to use a semaphore which needs to be held by the creation thread until the VM is created.
This helper makes this possible, using a uniform interface for both this case and the common, simpler case
(a rough sketch follows below).
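Purely as an illustration of the idea (the names here are made up and the actual helper in the
patch differs): one interface that either holds a real semaphore on the incoming-migration path or
a no-op one on the plain create path:

    import threading
    from contextlib import contextmanager

    # Hypothetical sketch: at most N incoming migrations create VMs
    # concurrently; plain VM creation uses the same code path with a
    # null semaphore.
    incoming_migrations = threading.BoundedSemaphore(2)

    class NullSemaphore(object):
        def acquire(self):
            pass

        def release(self):
            pass

    @contextmanager
    def held(sem):
        sem.acquire()
        try:
            yield
        finally:
            sem.release()

    def create_vm(conf, sem=NullSemaphore()):
        # The creation thread holds the semaphore until the VM is created.
        with held(sem):
            pass  # actual VM creation would happen here

    create_vm({'vmId': 'abc'}, sem=incoming_migrations)  # throttled path
    create_vm({'vmId': 'xyz'})  # common, simpler case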
That's all for now. As usual, reviews welcome! :)
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani
Automation CI for vdsm
by Yaniv Bronheim
Hi all,
We want to run functional tests as part of Vdsm CI for each patch before
merge. Therefore we need to decide how to automate this process without
overloading our Jenkins machines.
The functional tests will run using lago (https://github.com/ovirt/lago) -
it will spin up multiple VMs, install vdsm and exercise it with nosetests
or other procedures such as upgrade, removal and so on.
Currently the standard CI provides check-patch and check-merged scripts (
http://ovirt-infra-docs.readthedocs.org/en/latest/CI/Build_and_test_stand...).
The problem with check-merged is that it runs after merge, which doesn't
help if something fails.
We want to allow developers to trigger the script once reviews and
verification are ready (the last step before merge). To do so we agreed to add
a Continuous Integration flag to each vdsm patch. Once this flag is set
to +1 it will trigger Jenkins CI to run the check-merged script
(adding a new button to Gerrit is not an option - you can think of that flag as
a trigger button). On success the Jenkins CI flag will turn to +2; on failure
we'll get -1, and once a new patchset is ready the developer will remove the
+1 and add it back to the Continuous Integration flag to re-trigger the job.
Please ack the process before we move on with it.
The patch for those scripts is still under review and testing -
https://gerrit.ovirt.org/#/c/48268
Thanks
--
*Yaniv Bronhaim.*
[RFC] Proposal for dropping FC22 jenkins tests on master branch
by Sandro Bonazzola
Hi,
Can we drop FC22 testing in Jenkins now that the FC23 jobs are up and running?
It would reduce Jenkins load. If needed we can keep the FC22 builds, just
dropping the check jobs.
Comments?
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
Looking for advice regarding user portal development
by Thomas Shaw
Evening all,
I'm a student from the UK, currently studying for a BSc (Hons) in Computer Forensics
& Security, looking for a bit of advice regarding front-end development for
the oVirt user portal. For my final year project I'm going to be extending
the current user portal to produce a user interface that fits the needs of
Computer Security education.
I want to create an interface that supports the workflow of a tutor
configuring one to many virtual machines / networks for a scenario-based
lab session. The project will involve developing two different interfaces
depending on whether a student or a tutor is logged in.
The project is going to be based on a paper that was presented in the UK at
the first Cybersecurity Education and Training conference as part of the
Vibrant Workshop, and is available in full here:
http://z.cliffe.schreuders.org/publications/VibrantWorkshop2015%20-%20An%...
The tutor portal will have an avenue for tutors to upload images and a way
to group templates of virtual machines for students to clone. The student
portal will have one webpage with a list of their owned VMs on the left,
like the current user portal, and all lab VM templates on the right
organised/grouped by lab. Students will be able to clone a VM or group of
VMs from the right with a couple of clicks at most.
I'm planning on forking the user-portal code and making my changes there. I
want a different interface to appear for students and tutors.
My initial thought was to do this based on the 'Role' concept within oVirt -
I plan on creating two new user groups with predefined roles, Student and
Tutor, and having the tutor-only functionality appear on an additional tab
which only appears if the logged-in user has the Tutor role.
We have a powerful development server at university which is currently
running oVirt 3.6 with 3 nodes. As of right now I've set up the engine
development environment on Fedora 23. I'm still getting familiar with the
code and build process and have not yet deployed a build to the server.
It is my hope and intention that any work I produce will be of value to the
wider community, and I will open-source all of it. I would really appreciate
any input / advice on developing for the user portal, and I hope to work in a
way that makes my code changes more likely to be accepted upstream.
Thanks for reading,
Thomas Shaw
Re: [ovirt-devel] Actively triggering of CI jobs
by Barak Korren
>>> > [1]: http://www.ovirt.org/CI/Build_and_test_standards
>>> >
>
> That's nice, but most of us are not aware of all that..
>
Well, we can do a better job advocating that; I try to mention this
in almost every infra/devel thread where 'CI' is mentioned.
I'm open to suggestions about how to make developers more aware of the
fact that the ultimate power to determine what happens in CI has
mostly been placed in their hands...
>
> From what I'm seeing, most of the developers here don't make their patches
> drafts.. moreover,
> - personally I didn't even know that it will not trigger jobs if it is a
> draft. (and I'm not the only one)
Well, now you know... Adding 'devel' in the hope that more devs will read this.
> - sometimes I need to label my patches, therefore I can't make them drafts
>
By 'label' you mean set a topic?
Not sure those are mutually exclusive; the 'git review' options seem to
indicate they are not. I will look deeper into that.
> nowadays we are waiting too long for the jobs to finish. and the reality is
> that too many jobs shouldn't run at all - despite all of the nice things you
> guys show here..
In which cases, besides the patch not being "ready" (= a draft...), should
jobs not run?
>
> I still think that it would be a better solution to force the developer to
> activate the tests manually (by adding a flag when pushing, or even doing it
> with the Jenkins client..)
>
We tried to add the 'workflow' flag for that at some point (it is used
by most infra projects), but it was not received with any enthusiasm
by the devs; you can search back for the discussion on 'devel'.
--
Barak Korren
bkorren(a)redhat.com
RHEV-CI Team
vdsm_master_unit-tests_merged is failing
by Sandro Bonazzola
http://jenkins.ovirt.org/job/vdsm_master_unit-tests_merged/1155/consoleFull
is failing.
Looks like python3-nose is not installed inside the mock chroot.
23:51:01 if [ -x "/usr/bin/python3" ]; then \
23:51:01     PYTHON_EXE="/usr/bin/python3" ../tests/run_tests_local.sh \
23:51:01     apiData.py cmdutilsTests.py cpuProfileTests.py ; \
23:51:01 fi
23:51:01 Traceback (most recent call last):
23:51:01   File "../tests/testrunner.py", line 42, in <module>
23:51:01     import testlib
23:51:01   File "/tmp/run/vdsm/tests/testlib.py", line 38, in <module>
23:51:01     from nose import config
23:51:01 ImportError: No module named 'nose'
23:51:01 Makefile:1162: recipe for target 'check-local' failed
23:51:01 make[3]: *** [check-local] Error 1
23:51:01 make[3]: Leaving directory '/tmp/run/vdsm/tests'
Please fix ASAP
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
com.google.gwt.event.shared.UmbrellaException
by Sandro Bonazzola
Hi,
not sure the following traceback is enough, but I still have the system
around and can provide more logs / info if needed.
While adding a host, in the dialog I selected "use Foreman / Satellite" and then
disabled the checkbox.
An exception is reported at the top of the admin interface, and
in the Firefox console I see:
Wed Dec 23 08:23:59 GMT+100 2015
SEVERE: Uncaught exception: com.google.gwt.event.shared.UmbrellaException:
Exception caught: Exception caught: (TypeError)
__gwt$exception: <skipped>: zab(...) is null
at Unknown.ms(Unknown Source)
at Unknown.us(Unknown Source)
at Unknown.e3(Unknown Source)
at Unknown.h3(Unknown Source)
at Unknown.s2(Unknown Source)
at Unknown.Im(Unknown Source)
at Unknown.Sm(Unknown Source)
at Unknown.g$(Unknown Source)
at Unknown.Km(Unknown Source)
at Unknown.Vm(Unknown Source)
at Unknown.$Ve(Unknown Source)
at Unknown.$Xe(Unknown Source)
at Unknown.Yt(Unknown Source)
at Unknown.au(Unknown Source)
at Unknown._t/<(Unknown Source)
at Unknown.anonymous(Unknown Source)
Caused by: com.google.gwt.event.shared.UmbrellaException: Exception caught:
(TypeError)
__gwt$exception: <skipped>: zab(...) is null
at Unknown.ms(Unknown Source)
at Unknown.us(Unknown Source)
at Unknown.e3(Unknown Source)
at Unknown.h3(Unknown Source)
at Unknown.s2(Unknown Source)
at Unknown.Im(Unknown Source)
at Unknown.Sm(Unknown Source)
at Unknown.g2(Unknown Source)
at Unknown.N4e(Unknown Source)
at Unknown.B$(Unknown Source)
at Unknown.C2(Unknown Source)
at Unknown.s2(Unknown Source)
at Unknown.Im(Unknown Source)
at Unknown.Sm(Unknown Source)
at Unknown.g$(Unknown Source)
at Unknown.Km(Unknown Source)
at Unknown.Vm(Unknown Source)
at Unknown.$Ve(Unknown Source)
at Unknown.$Xe(Unknown Source)
at Unknown.Yt(Unknown Source)
at Unknown.au(Unknown Source)
at Unknown._t/<(Unknown Source)
at Unknown.anonymous(Unknown Source)
Caused by: com.google.gwt.core.client.JavaScriptException: (TypeError)
__gwt$exception: <skipped>: zab(...) is null
at Unknown.s2u(Unknown Source)
at Unknown.RHr(Unknown Source)
at Unknown.hno(Unknown Source)
at Unknown.xno(Unknown Source)
at Unknown.n1u(Unknown Source)
at Unknown.p2u(Unknown Source)
at Unknown.RHr(Unknown Source)
at Unknown.hno(Unknown Source)
at Unknown.xno(Unknown Source)
at Unknown.o4u(Unknown Source)
at Unknown.s4u(Unknown Source)
at Unknown.Tuk(Unknown Source)
at Unknown.Zuk(Unknown Source)
at Unknown.f2(Unknown Source)
at Unknown.C2(Unknown Source)
at Unknown.s2(Unknown Source)
at Unknown.Im(Unknown Source)
at Unknown.Sm(Unknown Source)
at Unknown.g2(Unknown Source)
at Unknown.N4e(Unknown Source)
at Unknown.B$(Unknown Source)
at Unknown.C2(Unknown Source)
at Unknown.s2(Unknown Source)
at Unknown.Im(Unknown Source)
at Unknown.Sm(Unknown Source)
at Unknown.g$(Unknown Source)
at Unknown.Km(Unknown Source)
at Unknown.Vm(Unknown Source)
at Unknown.$Ve(Unknown Source)
at Unknown.$Xe(Unknown Source)
at Unknown.Yt(Unknown Source)
at Unknown.au(Unknown Source)
at Unknown._t/<(Unknown Source)
at Unknown.anonymous(Unknown Source)
com.google.gwt.event.shared.UmbrellaException
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
Re: [ovirt-devel] [VDSM] Running make check on *your* development machine
by Nir Soffer
On Tue, Dec 29, 2015 at 7:04 PM, Yaniv Kaul <ykaul(a)redhat.com> wrote:
> On Tue, Dec 29, 2015 at 6:29 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>
>> Hi all,
>>
>> Recently we added a new test, breaking make check when run on a
>> development
>> machine as non-privileged user.
>> https://gerrit.ovirt.org/50984
>>
>> This test passes on the CI environment, because the tests are running as
>> root.
>
>
> Should we do something different in CI?
We should, but I want to discuss this in the devel mailing list.
Nir
CPU sockets, threads, cores and NUMA
by Martin Polednik
Hello developers,
tl;dr version:
* deprecate report_host_threads_as_cores
* remove cpuSockets, use sum(numaNodes.keys())
* report threadsPerCore for ppc64le / report total number of threads
for ppc64le
* work on our naming issues
I've been going over our capabilities reporting code in VDSM due to
specific threading requirements on the ppc64le platform and noticed a few
issues. Before trying to fix something that "works", I'm sending this
mail to start a discussion about the current and future state of the
code.
The first thing is terminology. What we report as CPU sockets,
cores and threads are in fact NUMA cells, the sum of cores present in the NUMA
nodes, and likewise for threads. I'd like to see the code move in a
direction that is correct in this sense.
More important are the actual calculations. I believe we should draw
an uncrossable line between cores and threads and not blur it, at
least on VDSM's side. That would mean deprecating the
report_host_threads_as_cores option. The algorithm used at
present does calculate the NUMA cores and NUMA threads correctly, provided
that there are no offline CPUs - most likely good enough. We don't
have to report the actual number of sockets though, as it is already reported
in the numa* keys (a sketch of where that data comes from follows below).
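For illustration only (not the actual VDSM code), this is roughly where the per-cell data comes
from: the libvirt capabilities XML exposes the topology under <host><topology><cells>, and the
cells, cores and threads can be counted with ElementTree:

    import xml.etree.ElementTree as ET

    # Toy capabilities XML: 2 NUMA cells, 1 core per cell, 2 threads per core.
    CAPS_XML = """
    <capabilities>
      <host>
        <topology>
          <cells num='2'>
            <cell id='0'>
              <cpus num='2'>
                <cpu id='0' socket_id='0' core_id='0' siblings='0,1'/>
                <cpu id='1' socket_id='0' core_id='0' siblings='0,1'/>
              </cpus>
            </cell>
            <cell id='1'>
              <cpus num='2'>
                <cpu id='2' socket_id='1' core_id='8' siblings='2,3'/>
                <cpu id='3' socket_id='1' core_id='8' siblings='2,3'/>
              </cpus>
            </cell>
          </cells>
        </topology>
      </host>
    </capabilities>
    """

    root = ET.fromstring(CAPS_XML)
    cells = root.findall('./host/topology/cells/cell')
    cores = set()
    threads = 0
    for cell in cells:
        for cpu in cell.findall('./cpus/cpu'):
            cores.add((cell.get('id'), cpu.get('core_id')))
            threads += 1
    print(len(cells), len(cores), threads)  # 2 cells, 2 cores, 4 threads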
The current reporting does, however, fail to provide the information needed in a
ppc64le environment, where for POWER8 we want to run the host without
SMT while VMs would have multiple CPUs assigned. There are various
configurations of so-called subcores in POWER8, where each CPU core
can contain 1, 2 or 4 subcores. This configuration must be taken into
consideration: given e.g. 160 threads overall, it is possible to run
either 20 VMs in smt8 mode, 40 VMs in smt4 mode or 80 VMs in smt2
mode. We have to report either the total number of threads OR just the
threadsPerCore setting, so that users know how many "CPUs" should be
assigned to machines for optimal performance (the arithmetic is sketched below).
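To make the 160-thread example concrete, a tiny hypothetical helper (names made up) that turns a
total thread count into the number of guests per SMT mode:

    # Hypothetical: how many guests get one full core's worth of threads
    # in each POWER8 SMT mode, given the total number of threads.
    def guests_per_smt_mode(total_threads, smt_modes=(8, 4, 2)):
        return dict(('smt%d' % mode, total_threads // mode) for mode in smt_modes)

    print(guests_per_smt_mode(160))  # smt8: 20, smt4: 40, smt2: 80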
As always, I welcome any opinions regarding the proposed ideas. Also
note that all of the changes can be done via deprecation to be fully
backwards compatible - except for the ppc part.
Regards,
mpolednik
[VDSM] Running make check on *your* development machine
by Nir Soffer
Hi all,
Recently we added a new test that breaks make check when run on a development
machine as a non-privileged user:
https://gerrit.ovirt.org/50984
This test passes on the CI environment, because the tests are running as root.
Please verify that "make check" succeeds in your development environment;
otherwise you may break the tests for other developers who follow this practice.
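A minimal sketch of one way such a root-only test can guard itself, using plain unittest (VDSM's
own test helpers and decorators may differ; the class and test names here are made up):

    import os
    import unittest

    class PrivilegedTests(unittest.TestCase):

        @unittest.skipUnless(os.geteuid() == 0, "requires root")
        def test_needs_root(self):
            # The privileged check would go here; non-root runs of
            # "make check" skip it instead of failing.
            self.assertEqual(os.geteuid(), 0)

    if __name__ == '__main__':
        unittest.main()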
Best practice for running the tests:
1. Run the tests for the module you change:
$ ./run_tests_local.sh rwlock_test.py
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
rwlock_test.RWLockStressTests
test_fairness(1, 2) SKIP:
Stress tests are disabled
test_fairness(2, 8) SKIP:
Stress tests are disabled
test_fairness(3, 32) SKIP:
Stress tests are disabled
test_fairness(4, 128) SKIP:
Stress tests are disabled
rwlock_test.RWLockTests
test_concurrent_readers OK
test_demotion_no_waiters OK
test_demotion_with_blocked_reader SKIP:
Slow tests are disabled
test_demotion_with_blocked_writer SKIP:
Slow tests are disabled
test_exclusive_context_blocks_reader SKIP:
Slow tests are disabled
test_exclusive_context_blocks_writer SKIP:
Slow tests are disabled
test_fifo SKIP:
Slow tests are disabled
test_promotion_forbidden OK
test_recursive_read_lock OK
test_recursive_write_lock OK
test_release_other_thread_read_lock OK
test_release_other_thread_write_lock OK
test_shared_context_allows_reader OK
test_shared_context_blocks_writer SKIP:
Slow tests are disabled
test_wakeup_all_blocked_readers SKIP:
Slow tests are disabled
test_wakeup_blocked_reader SKIP:
Slow tests are disabled
test_wakeup_blocked_writer SKIP:
Slow tests are disabled
----------------------------------------------------------------------
Ran 21 tests in 0.005s
This should be very fast and should run after every change.
2. Check that slow and stress tests pass
Some modules have @slowtest and @stresstest tests, which are skipped by default.
When modifying such modules, also enable these tests:
$ ./run_tests_local.sh rwlock_test.py --enable-slow-tests --enable-stress-tests
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
rwlock_test.RWLockStressTests
test_fairness(1, 2) OK
test_fairness(2, 8) OK
test_fairness(3, 32) OK
test_fairness(4, 128) OK
rwlock_test.RWLockTests
test_concurrent_readers OK
test_demotion_no_waiters OK
test_demotion_with_blocked_reader SKIP:
Known issue in current code
test_demotion_with_blocked_writer OK
test_exclusive_context_blocks_reader OK
test_exclusive_context_blocks_writer OK
test_fifo OK
test_promotion_forbidden OK
test_recursive_read_lock OK
test_recursive_write_lock OK
test_release_other_thread_read_lock OK
test_release_other_thread_write_lock OK
test_shared_context_allows_reader OK
test_shared_context_blocks_writer OK
test_wakeup_all_blocked_readers OK
test_wakeup_blocked_reader OK
test_wakeup_blocked_writer OK
----------------------------------------------------------------------
Ran 21 tests in 14.054s
This may take more time.
3. When the module tests pass, run "make check"
make check
This takes about 90 seconds.
To run all tests, including slow and stress tests, use:
make check-all
This may take a couple of minutes, so it is not recommended for routine use.
Thanks,
Nir