Engine Health
by Stuart Gott
All,
We're working on a script that stands up an oVirt Engine and adds a node to
it. The issue is that we don't know how long to wait before trying to
add the node. What we do right now is check the status of the engine using:
https://ENGINE_IP/ovirt-engine/services/health
to determine when the oVirt engine itself has booted. That link reports "DB
Up!Welcome to Health Status!" as soon as the web UI is accessible, but this
is not the same thing as having an actual usable cluster attached.
Would it be possible to have separate status messages to distinguish
between an engine that has a usable cluster and one that doesn't? Is
that already possible some other way? Blindly waiting for arbitrary
time periods is error-prone.
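For reference, the kind of check we would rather not have to hand-roll
looks roughly like the sketch below (untested, using the v4 Python SDK
ovirt-engine-sdk-python; URL, credentials and polling intervals are
placeholders):

    import time
    import requests
    import ovirtsdk4 as sdk

    ENGINE = "https://ENGINE_IP"

    # Step 1: wait until the health servlet answers at all.
    while True:
        try:
            r = requests.get(ENGINE + "/ovirt-engine/services/health",
                             verify=False)
            if r.status_code == 200:
                break
        except requests.ConnectionError:
            pass
        time.sleep(5)

    # Step 2: wait until the REST API reports at least one cluster
    # (a fresh engine-setup normally creates "Default") before we try
    # to add the host.
    conn = sdk.Connection(url=ENGINE + "/ovirt-engine/api",
                          username="admin@internal", password="PASSWORD",
                          insecure=True)
    while not conn.system_service().clusters_service().list():
        time.sleep(5)
    conn.close()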
Thanks!
Stu
Upgrade from 3.6 el7 job broken?
by Roman Mohr
Hi,
I saw a lot of failing builds lately on this job:
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-3.6_el7_cre...
One log which still exists is:
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-3.6_el7_cre...
It seems like the BUILD ENGINE RPM step is failing, but I can't see any
reason why:
BUILDING ENGINE RPM
+ create_rpms /home/jenkins/workspace/ovirt-engine_master_upgrade-from-3.6_el7_created/tmp_repo/ovirt-engine-4.1.0-0.0.master.20161014001903.el7.centos.src.rpm
/home/jenkins/workspace/ovirt-engine_master_upgrade-from-3.6_el7_created/tmp_repo
.gitee47dd2
+ local src_rpm=/home/jenkins/workspace/ovirt-engine_master_upgrade-from-3.6_el7_created/tmp_repo/ovirt-engine-4.1.0-0.0.master.20161014001903.el7.centos.src.rpm
+ local dst_dir=/home/jenkins/workspace/ovirt-engine_master_upgrade-from-3.6_el7_created/tmp_repo
+ local release=.gitee47dd2
+ local workspace=/home/jenkins/workspace/ovirt-engine_master_upgrade-from-3.6_el7_created
+ local 'BUILD_JAVA_OPTS_MAVEN= -XX:MaxPermSize=1G
-Dgwt.compiler.localWorkers=1 '
+ local 'BUILD_JAVA_OPTS_GWT= -XX:PermSize=512M
-XX:MaxPermSize=1G -Xms1G -Xmx6G '
+ env 'BUILD_JAVA_OPTS_MAVEN= -XX:MaxPermSize=1G
-Dgwt.compiler.localWorkers=1 ' 'BUILD_JAVA_OPTS_GWT=
-XX:PermSize=512M -XX:MaxPermSize=1G -Xms1G -Xmx6G '
rpmbuild -D 'ovirt_build_minimal 1' -D 'release_suffix .gitee47dd2' -D
'ovirt_build_extra_flags -gs
/home/jenkins/workspace/ovirt-engine_master_upgrade-from-3.6_el7_created/artifactory-ovirt-org-settings.xml'
-D '_srcrpmdir /home/jenkins/workspace/ovirt-engine_master_upgrade-from-3.6_el7_created/tmp_repo'
-D '_specdir /home/jenkins/workspace/ovirt-engine_master_upgrade-from-3.6_el7_created/tmp_repo'
-D '_sourcedir /home/jenkins/workspace/ovirt-engine_master_upgrade-from-3.6_el7_created/tmp_repo'
-D '_rpmdir /home/jenkins/workspace/ovirt-engine_master_upgrade-from-3.6_el7_created/tmp_repo'
-D '_builddir /home/jenkins/workspace/ovirt-engine_master_upgrade-from-3.6_el7_created/tmp_repo'
--rebuild /home/jenkins/workspace/ovirt-engine_master_upgrade-from-3.6_el7_created/tmp_repo/ovirt-engine-4.1.0-0.0.master.20161014001903.el7.centos.src.rpm
+ return 1
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for :.* : True
Logical operation result is TRUE
Running script : #!/bin/bash -x
Best Regards,
Roman
engine-setup: ***L:ERROR Internal error: No module named dwh
by Jakub Niedermertl
Hi all,
Does anyone else experience the following error from `engine-setup`?
$ ~/target/bin/engine-setup
***L:ERROR Internal error: No module named dwh
I have a suspicion it might be related to commit '221c7ed packaging: setup: Remove constants duplication'.
Jakub
Project Proposal - Vagrant provider
by Marc Young
Project Proposal - Vagrant Provider
A vagrant provider for oVirt v4
Abstract
This will be a provider plugin for the Vagrant suite that makes
virtual machine provisioning and lifecycle management easy from the
command line.
Proposal
This Vagrant provider plugin will interface with the oVirt REST API
(version 4 and higher) using the oVirt provided ruby SDK
'ovirt-engine-sdk-ruby'. This lets users drive the complete lifecycle
of virtual machines (create, provision, destroy, manage) through a set
of command-line operations. It also allows the Vagrant configuration
files, and any external configuration management they reference, to be
committed as code.
Background
I have previously forked and maintained the 'vagrant-ovirt' gem as
'vagrant-ovirt3' due to Gems requiring unique names. The original
author has officially abandoned the project. For the last few years
the project has been maintained by myself and a few ad-hoc GitHub
contributors. This provider interfaced directly with
oVirt v3 using fog and rbovirt. The new project would be a fresh start
using the oVirt provided ruby SDK to work directly with version 4.
Rationale
The trend in configuration management, operations, and devops has been
to describe as much of the development process as possible in terms of
the virtual machines and hosts it runs on. With software like
Terraform, tasks such as creating the underlying infrastructure
(network rules, etc.) have moved into 'Infrastructure as code' with
great success. The company behind Terraform built its reputation on
Vagrant, which applies the same approach to the virtual machines
themselves. The core software exposes standard commands such as 'up',
'provision', and 'destroy' across a provider framework. A provider for
oVirt makes managing VMs easier and lets them be controlled through
code and source control.
Initial Goals
The initial goal is to get the base steps of 'up', 'down' (halt), and
'destroy' to succeed using the oVirt provided ruby SDK for v4.
Stretch/follow-up goals are to ensure testability, support additional
commands such as 'provision', and allow configuration management suites
like Puppet to work via 'userdata' (cloud-init).
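For illustration, these lifecycle steps map onto the v4 API roughly as
in the sketch below. It uses the Python SDK (ovirt-engine-sdk-python)
only because it is compact to show here; the Ruby SDK exposes the
equivalent services. Names, credentials and the template are
placeholders, and status polling between steps is omitted:

    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    conn = sdk.Connection(url="https://engine.example.com/ovirt-engine/api",
                          username="admin@internal", password="PASSWORD",
                          insecure=True)
    vms = conn.system_service().vms_service()

    # 'vagrant up': create a VM from a template and start it.
    vm = vms.add(types.Vm(name="vagrant-box",
                          cluster=types.Cluster(name="Default"),
                          template=types.Template(name="Blank")))
    vm_service = vms.vm_service(vm.id)
    vm_service.start()

    # 'vagrant halt': stop the VM.
    vm_service.stop()

    # 'vagrant destroy': remove it.
    vm_service.remove()

    conn.close()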
Current Status
Version 3 of this software has been heavily used. The original
project, known as 'vagrant-ovirt', has been abandoned with no plans to
communicate or move forward. My fork has had great success, with nearly
4x the downloads from rubygems.org. Until my GitHub fork has more
'stars' I cannot take it over completely, so the gem was renamed
'vagrant-ovirt3'; the same applies on rubygems.org, since gems are not
namespaced and therefore could not be published without a unique name.
The v4 provider is still pending my initial POC commit, but there are
no current barriers except initial oVirt hosting: the oVirt v3 instance
used for testing is a laptop on a UPS at my home, and v4 runs on a
different PC, also attached to a UPS.
External Dependencies
RHEVM/oVirt REST API - This provider must interact with the API itself
to manage virtual machines.
Initial Committers
Marcus Young ( 3vilpenguin at gmail dot com )
[VDSM] Tests failing because of ordering dependencies
by Nir Soffer
Hi all,
Trying to run the vdsm tests via tox (so the correct nose is used
automatically), some of the tests fail.
The failures are all about ordering expectations, which look wrong.
Please check and fix your tests.
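For the dict-key cases an order-insensitive assertion should be enough;
a minimal standalone example (Python 2.7 unittest, not a patch against
the actual tests):

    import unittest

    class OrderInsensitiveExample(unittest.TestCase):

        def test_keys(self):
            status = {'name': 'music', 'bricks': []}
            # dict key order is not guaranteed; compare sorted lists
            self.assertEqual(sorted(status.keys()), ['bricks', 'name'])
            # or, with Python 2.7 unittest:
            self.assertItemsEqual(status.keys(), ['bricks', 'name'])

    if __name__ == '__main__':
        unittest.main()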
Thanks,
Nir
----
18:04:10 ======================================================================
18:04:10 FAIL: test_parseVolumeStatus (gluster_cli_tests.GlusterCliTests)
18:04:10 ----------------------------------------------------------------------
18:04:10 Traceback (most recent call last):
18:04:10 File
"/home/jenkins/workspace/vdsm_master_check-patch-el7-x86_64/vdsm/tests/gluster_cli_tests.py",
line 1121, in test_parseVolumeStatus
18:04:10 self._parseVolumeStatusClients_test()
18:04:10 File
"/home/jenkins/workspace/vdsm_master_check-patch-el7-x86_64/vdsm/tests/gluster_cli_tests.py",
line 449, in _parseVolumeStatusClients_test
18:04:10 self.assertEquals(status.keys(), ['bricks', 'name'])
status.keys() order is not defined.
18:04:10 AssertionError: Lists differ: ['name', 'bricks'] != ['bricks', 'name']
18:04:10
18:04:10 First differing element 0:
18:04:10 name
18:04:10 bricks
18:04:10
18:04:10 - ['name', 'bricks']
18:04:10 + ['bricks', 'name']
18:04:10
18:04:10 ======================================================================
18:04:10 FAIL: testSetPolicyParameters (momTests.MomPolicyTests)
18:04:10 ----------------------------------------------------------------------
18:04:10 Traceback (most recent call last):
18:04:10 File
"/home/jenkins/workspace/vdsm_master_check-patch-el7-x86_64/vdsm/tests/momTests.py",
line 118, in testSetPolicyParameters
18:04:10 self.assertEqual(api.last_policy_content, expected)
18:04:10 AssertionError: "(set a 5)\n(set b True)\n(set c 'test')" !=
"(set a 5)\n(set c 'test')\n(set b True)"
Nothing obvious, need deeper checking.
18:04:10 -------------------- >> begin captured logging << --------------------
18:04:10 2016-10-12 18:01:56,902 INFO (MainThread) [MOM] Preparing
MOM interface (momIF:49)
18:04:10 2016-10-12 18:01:56,903 INFO (MainThread) [MOM] Using
named unix socket /tmp/tmpqOQZvm/test_mom_vdsm.sock (momIF:58)
18:04:10 --------------------- >> end captured logging << ---------------------
18:04:10
18:04:10 ======================================================================
18:04:10 FAIL: test_disk_virtio_cache (vmStorageTests.DriveXMLTests)
18:04:10 ----------------------------------------------------------------------
18:04:10 Traceback (most recent call last):
18:04:10 File
"/home/jenkins/workspace/vdsm_master_check-patch-el7-x86_64/vdsm/tests/vmStorageTests.py",
line 84, in test_disk_virtio_cache
18:04:10 self.check(vm_conf, conf, xml, is_block_device=False)
18:04:10 File
"/home/jenkins/workspace/vdsm_master_check-patch-el7-x86_64/vdsm/tests/vmStorageTests.py",
line 222, in check
18:04:10 self.assertXMLEqual(drive.getXML().toxml(), xml)
18:04:10 File
"/home/jenkins/workspace/vdsm_master_check-patch-el7-x86_64/vdsm/tests/testlib.py",
line 253, in assertXMLEqual
18:04:10 (actualXML, expectedXML))
18:04:10 AssertionError: XMLs are different:
18:04:10 Actual:
18:04:10 <disk device="disk" snapshot="no" type="file">
18:04:10 <source file="/path/to/volume" />
18:04:10 <target bus="virtio" dev="vda" />
18:04:10 <shareable />
18:04:10 <serial>54-a672-23e5b495a9ea</serial>
18:04:10 <driver cache="writethrough" error_policy="enospace"
io="threads" name="qemu" type="qcow2" />
18:04:10 <iotune>
18:04:10 <total_iops_sec>800</total_iops_sec>
18:04:10 <read_bytes_sec>6120000</read_bytes_sec>
18:04:10 </iotune>
18:04:10 </disk>
18:04:10
18:04:10 Expected:
18:04:10 <disk device="disk" snapshot="no" type="file">
18:04:10 <source file="/path/to/volume" />
18:04:10 <target bus="virtio" dev="vda" />
18:04:10 <shareable />
18:04:10 <serial>54-a672-23e5b495a9ea</serial>
18:04:10 <driver cache="writethrough" error_policy="enospace"
io="threads" name="qemu" type="qcow2" />
18:04:10 <iotune>
18:04:10 <read_bytes_sec>6120000</read_bytes_sec>
18:04:10 <total_iops_sec>800</total_iops_sec>
The order of these elements differs; need to check why.
18:04:10 </iotune>
18:04:10 </disk>
18:04:10
18:04:10
18:04:10
18:04:10 ======================================================================
18:04:10 FAIL: testCpuXML (vmTests.TestVm)
18:04:10 ----------------------------------------------------------------------
18:04:10 Traceback (most recent call last):
18:04:10 File
"/home/jenkins/workspace/vdsm_master_check-patch-el7-x86_64/vdsm/tests/vmTests.py",
line 434, in testCpuXML
18:04:10 self.assertXMLEqual(find_xml_element(xml, "./cputune"), cputuneXML)
18:04:10 File
"/home/jenkins/workspace/vdsm_master_check-patch-el7-x86_64/vdsm/tests/testlib.py",
line 253, in assertXMLEqual
18:04:10 (actualXML, expectedXML))
18:04:10 AssertionError: XMLs are different:
18:04:10 Actual:
18:04:10 <cputune>
18:04:10 <vcpupin cpuset="0-1" vcpu="0" />
18:04:10 <vcpupin cpuset="2-3" vcpu="1" />
18:04:10 </cputune>
18:04:10
18:04:10 Expected:
18:04:10 <cputune>
18:04:10 <vcpupin cpuset="2-3" vcpu="1" />
18:04:10 <vcpupin cpuset="0-1" vcpu="0" />
Same ordering issue.
18:04:10 </cputune>
18:04:10
18:04:10
18:04:10
18:04:10 ======================================================================
18:04:10 FAIL: testSetIoTune (vmTests.TestVm)
18:04:10 ----------------------------------------------------------------------
18:04:10 Traceback (most recent call last):
18:04:10 File
"/home/jenkins/workspace/vdsm_master_check-patch-el7-x86_64/vdsm/tests/vmTests.py",
line 936, in testSetIoTune
18:04:10 self._xml_sanitizer(expected_xml))
18:04:10 AssertionError: '<disk device="hdd" snapshot="no"
type="block"><source dev="/dev/dummy"/><target bus="ide"
dev="hda"/><iotune><write_bytes_sec>1</write_bytes_sec><total_bytes_sec>0</total_bytes_sec><read_bytes_sec>2</read_bytes_sec></iotune></disk>'
!= '<disk device="hdd" snapshot="no" type="block"><source
dev="/dev/dummy"/><target bus="ide"
dev="hda"/><iotune><read_bytes_sec>2</read_bytes_sec><write_bytes_sec>1</write_bytes_sec><total_bytes_sec>0</total_bytes_sec></iotune></disk>'
18:04:10 -------------------- >> begin captured logging << --------------------
18:04:10 2016-10-12 18:03:34,110 INFO (MainThread) [virt.vm]
vmId=`TESTING`::New device XML for hda: <disk device="hdd"
snapshot="no" type="block">
18:04:10 <source dev="/dev/dummy"/>
18:04:10 <target bus="ide" dev="hda"/>
18:04:10 <iotune>
18:04:10 <write_bytes_sec>1</write_bytes_sec>
18:04:10 <total_bytes_sec>0</total_bytes_sec>
18:04:10 <read_bytes_sec>2</read_bytes_sec>
18:04:10 </iotune>
18:04:10 </disk>
Seems that this test is not using assertXMLEqual, so we don't get a
meaningful error like in the tests above (see the sketch after this log).
18:04:10 (vm:2687)
18:04:10 --------------------- >> end captured logging << ---------------------
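For the XML cases where child order does not matter, a normalizing
comparison could be used where assertXMLEqual is too strict. A rough
standalone sketch (not what testlib does today):

    import unittest
    import xml.etree.ElementTree as ET

    def _key(elem):
        return (elem.tag, sorted(elem.attrib.items()),
                (elem.text or "").strip())

    def normalize(elem):
        # Recursively sort children so the comparison ignores their order.
        for child in elem:
            normalize(child)
        children = sorted(elem, key=_key)
        for child in list(elem):
            elem.remove(child)
        elem.extend(children)

    def xml_equal(a, b):
        ta, tb = ET.fromstring(a), ET.fromstring(b)
        normalize(ta)
        normalize(tb)
        return ET.tostring(ta) == ET.tostring(tb)

    class XMLOrderExample(unittest.TestCase):

        def test_child_order_ignored(self):
            actual = ('<iotune><write_bytes_sec>1</write_bytes_sec>'
                      '<read_bytes_sec>2</read_bytes_sec></iotune>')
            expected = ('<iotune><read_bytes_sec>2</read_bytes_sec>'
                        '<write_bytes_sec>1</write_bytes_sec></iotune>')
            self.assertTrue(xml_equal(actual, expected))

    if __name__ == '__main__':
        unittest.main()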
[ACTION REQUIRED] oVirt 4.0.5 RC2 build starting
by Sandro Bonazzola
FYI oVirt product maintainers,
An oVirt build for an official release is going to start right now.
If you're a maintainer for any of the projects included in the oVirt
distribution and you have changes in your package ready to be released,
please:
- bump version and release to be GA ready
- tag your release within git (implies a GitHub Release to be automatically
created)
- build your packages within jenkins / koji / copr / whatever
- verify all bugs on MODIFIED have target release and target milestone set.
- add your builds to releng-tools/releases/ovirt-4.0.5_rc2.conf within
releng-tools project
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
<https://www.redhat.com/it/about/events/red-hat-open-source-day-2016>
[vdsm] exploring a possible integration between collectd and Vdsm
by Francesco Romani
Hi all,
Over the last 2.5 days I have been exploring whether and how we can integrate collectd and Vdsm.
The final picture could look like:
1. collectd does all the monitoring and reporting that Vdsm currently does
2. Engine consumes data from collectd
3. Vdsm consumes *notifications* from collectd - for a few but important tasks, like drive high water mark monitoring
Benefits (aka: why to bother?):
1. less code in Vdsm / long-awaited modularization of Vdsm
2. better integration with the system, reuse of well-known components
3. more flexibility in monitoring/reporting: collectd is an existing, special-purpose solution
4. faster, more scalable operation because all the monitoring can be done in C
At first glance, Collectd seems to have all the tools we need.
1. A plugin interface (https://collectd.org/wiki/index.php/Plugin_architecture and https://collectd.org/wiki/index.php/Table_of_Plugins)
2. Support for notifications and thresholds (https://collectd.org/wiki/index.php/Notifications_and_thresholds)
3. a libvirt plugin https://collectd.org/wiki/index.php/Plugin:virt
So, the picture looks like this:
1. we start requiring collectd as a dependency of Vdsm
2. we either configure it appropriately (collectd supports config drop-ins: /etc/collectd.d) or we document our requirements (or both)
3. collectd monitors the hosts and libvirt
4. Engine polls collectd (a minimal polling sketch follows this list)
5. Vdsm listens for notifications
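To give an idea of point 4, polling collectd can be as simple as
talking to its unixsock plugin. A minimal sketch, assuming the unixsock
plugin is loaded and listening on its default socket path (the metric
identifier at the end is only illustrative):

    import socket

    SOCKET_PATH = "/var/run/collectd-unixsock"

    def unixsock_command(cmd):
        # Line-based protocol: the first reply line is "<count> <message>",
        # a negative count means error, then <count> data lines follow.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(SOCKET_PATH)
        sock.sendall((cmd + "\n").encode("utf-8"))
        reader = sock.makefile("r")
        status = reader.readline().strip()
        count = int(status.split(" ", 1)[0])
        if count < 0:
            raise RuntimeError(status)
        lines = [reader.readline().strip() for _ in range(count)]
        sock.close()
        return lines

    # Enumerate everything collectd currently knows about...
    for line in unixsock_command("LISTVAL"):
        print(line.split()[-1])          # "<host>/<plugin>/<type>"

    # ...and read a single metric (illustrative identifier).
    print(unixsock_command('GETVAL "myhost/virt-vm01/virt_cpu_total"'))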
Should libvirt deliver the event we need (see https://bugzilla.redhat.com/show_bug.cgi?id=1181659),
we can simply stop using collectd notifications; everything else keeps working as before.
Challenges:
1. Collectd does NOT consider the plugin API stable (https://collectd.org/wiki/index.php/Plugin_architecture#The_interface.27s...),
so plugins should be included in the main tree, much like the modules of the Linux kernel.
Worth mentioning that the plugin API itself has a good number of rough edges.
We will need to maintain this plugin ourselves, *and* we need to maintain our thin API
layer, to make sure the plugin loads and works with recent versions of collectd.
2. the virt plugin is out of date and doesn't report some data we need: see https://github.com/collectd/collectd/issues/1945
3. the notification messages are tailored for human consumption; they are not easy for machines to parse.
4. the threshold support in collectd seems to match values against constants; it doesn't seem possible
to match a value against another one, as we need to do for high water mark monitoring (capacity vs. allocation) - see the sketch after this list.
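To make challenge 4 concrete, the high water mark check compares two
samples of the same drive rather than a sample against a constant. An
illustrative sketch (numbers and names are made up, this is not the
actual Vdsm logic):

    # Extend a thin-provisioned drive when the guest has written close
    # enough to the end of the currently allocated volume.
    WATERMARK_MB = 512      # illustrative free-space threshold
    CHUNK_MB = 1024         # illustrative extension chunk

    def needs_extension(capacity_bytes, allocation_bytes):
        free = capacity_bytes - allocation_bytes
        return free < WATERMARK_MB * 1024 * 1024

    # A collectd threshold can only express "allocation > CONSTANT";
    # it cannot express "allocation > capacity - WATERMARK", because
    # capacity changes every time the volume is extended.
    if needs_extension(capacity_bytes=4 << 30,
                       allocation_bytes=(4 << 30) - (100 << 20)):
        print("extend drive by %d MB" % CHUNK_MB)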
How I'm addressing, or plan to address, those challenges (aka action items):
1. I've been experimenting with out-of-tree plugins, and I managed to develop, build, install and run
one out-of-tree plugin: https://github.com/mojaves/vmon/tree/master/collectd
The development pace of collectd looks sustainable, so this doesn't look like such a big deal.
Furthermore, we can engage with upstream to merge our plugins, either as-is or to extend existing ones.
2. Write another collectd plugin based on the Vdsm python code and/or my past accelerator executable project
(https://github.com/mojaves/vmon) - a minimal skeleton follows this list
3. patch the collectd notification code (it is yet another plugin)
OR
4. send notifications from the new virt module as per #2, bypassing the threshold system. This move could preclude
the new virt module from being merged into the collectd tree.
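Just to show the shape of item 2, here is a minimal read-plugin
skeleton using collectd's python binding. It is a sketch only: it is
loaded by collectd's python plugin rather than run standalone, the real
plugin would likely be C, and the metric below is a placeholder:

    import collectd

    def read_cb():
        # In the real plugin this would come from libvirt / Vdsm data.
        vals = collectd.Values(plugin='vdsm_virt', type='gauge')
        vals.type_instance = 'dummy_metric'
        vals.dispatch(values=[42.0])

    collectd.register_read(read_cb)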
Current status of the action items:
1. done BUT PoC quality
2. To be done (more work than #1/possible dupe with github issue)
3. need more investigation, conflicts with #4
4. need more investigation, conflicts with #3
All the code I'm working on can be found at https://github.com/mojaves/vmon
Comments are appreciated.
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani