With libgfapi gone - ? - qcow2 backups
by lejeczek
hi guys
Now with libgfapi support removed from qemu/libvirt -
certainly in binaries from oVirt repos for CentOS 9 - how
does one take qcow2 backups?
Would that be some specific options to mount gluster vol in
order to overcome:
-> $ qemu-img convert -O qcow2 ./ubuntu.qcow2.bkp ubuntu.qcow2
qemu-img: Could not open './ubuntu-tor.qcow2.bkp': Could not
open backing file: Failed to get shared "write" lock
Is another process using the image [/VMs3/ubuntu.qcow2]?
or perhaps 'qemu-img' can do the trick somehow?
all thoughts shared are much appreciated.
many thanks, L>
2 years, 4 months
Lack of attribute "decode" in v2v module
by Diego Ercolani
Hello,
As asked by Stefano Stagnaro, I'm writing about an issue I'm experiencing during
VM import from an external VMware farm.
The system log is full of this error:
Dec 30 13:23:42 ovirt-node2.ovirt vdsm[3420]: ERROR Internal server error
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/yajsonrpc/__init__.py", line 349, in _handle_request
    res = method(**params)
  File "/usr/lib/python3.6/site-packages/vdsm/rpc/Bridge.py", line 194, in _dynamicMethod
    result = fn(*methodArgs)
  File "<decorator-gen-471>", line 2, in getStats
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/API.py", line 1456, in getStats
    multipath=True)
  File "/usr/lib/python3.6/site-packages/vdsm/host/api.py", line 49, in get_stats
    decStats = stats.produce(first_sample, last_sample)
  File "/usr/lib/python3.6/site-packages/vdsm/host/stats.py", line 108, in produce
    stats['v2vJobs'] = v2v.get_jobs_status()
  File "/usr/lib/python3.6/site-packages/vdsm/v2v.py", line 290, in get_jobs_status
    'description': job.description,
AttributeError: 'str' object has no attribute 'decode'
As stated in the error, line 290 of /usr/lib/python3.6/site-packages/vdsm/v2v.py
calls the method "decode", which does not seem to be available on a 'str' object:
'description': job.description.decode('utf-8'),
[root@ovirt-node2 ~]# rpm -qf /usr/lib/python3.6/site-packages/vdsm/v2v.py
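For context on the AttributeError above: in Python 3 only `bytes` has a `decode` method, while `str` does not. Below is a minimal sketch of a tolerant conversion, assuming the job description may arrive as either type. The helper name `to_text` is hypothetical and is not part of vdsm.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding only when it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already a str (or another type): hand it back unchanged.
    return value
```

With such a helper, a call like `to_text(job.description)` would work whether the description is `bytes` or `str`; whether that matches how vdsm fixed it upstream is not something this sketch claims.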
--
Ing. Diego Ercolani
S.S.I.S. s.p.a.
T. 0549-875910
2 years, 4 months
VMs do not run with Q35 BIOS
by Martin Marusinec
Hello,
yesterday I installed a new oVirt node into the existing cluster, and I found out I cannot start any of my VMs on it. After some tries, I found out the VM simply does not start up with Q35 BIOS. It starts with legacy BIOS, or Q35 UEFI, but not with Q35 BIOS. Blank screen, nothing. The node is slightly newer than the others. What could I do about it? I would rather avoid changing the chipset on all my VMs to legacy just to be able to run them on the new node.
Martin
2 years, 4 months
Suggested upgrading path from CentOS based 4.4.8 to 4.4.9
by Gianluca Cecchi
I have a lab with an environment based on 4.4.8.6-1, with 3 CentOS Linux
8.4 hosts and a CentOS 8.4 external engine system (that is a VM on vSphere,
so that I can leverage a snapshot methodology for the process...).
I would like to move to 4.4.9 and keep a full plain OS on the hosts for the
moment, without going through oVirt nodes. Given the repo problems
and CentOS 8.x reaching EOL, this is what I'm planning to do:
1. stop engine service on engine system
2. convert engine to CentOS Stream
This step needs some confirmation.
Could you provide an official link about the process?
I'm not able to find it again. Is it just me, or do all sources (CentOS
website, RHEL website) seem to point only to conversion from CentOS Linux
to RHEL?
Apart from workflows provided by external websites, I was only able to find a
mid-January YouTube video, from when CentOS was at 8.3, with these steps:
yum install centos-release-stream
yum swap centos-{linux,stream}-repos
yum repolist
yum distro-sync
reboot
The video link is here:
https://www.youtube.com/watch?v=Ba2ytp_8x7s
No mention at
https://www.redhat.com/en/blog/faq-centos-stream-updates
And on CentOS page I only found this:
https://centos.org/distro-faq/
with Q7 containing only the two instructions:
dnf swap centos-linux-repos centos-stream-repos
dnf distro-sync
What to use safely?
Is it possible to include some sort of documentation or links on oVirt
page, to migrate from CentOS Linux to CentOS Stream for oVirt upgrade
purposes?
3. After the reboot (implied, I think, in step 2), use the usual steps to update
engine to 4.4.9
4. update the first out of three hosts from CentOS Linux to CentOS Stream
and to 4.4.9.
4.a follow the same approach as for the engine (when defined) and move it to
Stream while retaining 4.4.8.
4.b upgrade from the web admin gui to 4.4.9
5. Do the same for second host and third hosts
Any hints, comments, limitations in having mixed 4.4.8 and 4.4.9 hosts for
a while and such?
Thanks,
Gianluca
2 years, 4 months
Unable to start ovirt-ha-agent on all hosts
by martin@fulmo.org
Hi list!
on a hyperconverged cluster with three hosts I am unable to start the ovirt-ha-agent.
The history:
As all three hosts were running CentOS 8, I tried to upgrade host3 to CentOS 8 Stream first and left all VMs and host1 and host2 untouched, basically as a test. After all migrations of VMs to host3 failed with:
```
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:01.0/pcie-root-port'
2021-12-24T00:56:49.428234Z qemu-kvm: load of migration failed: Invalid argument
```
and since I haven't had the time to dig into that, I decided to roll back the upgrade, rebooted host3 into CentOS 8 again and re-installed host3 through the engine appliance. During that process (and the restart of host3) the engine appliance became unresponsive and crashed.
The problem:
Currently all ovirt-ha-agent services on all hosts fail with the following message in /var/log/ovirt-hosted-engine-ha/agent.log
```
MainThread::INFO::2021-12-24 03:56:03,500::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.4.9 started
MainThread::INFO::2021-12-24 03:56:03,516::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Certificate common name not found, using hostname to identify host
MainThread::INFO::2021-12-24 03:56:03,575::hosted_engine::548::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2021-12-24 03:56:03,576::brokerlink::82::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'addr': 'GATEWAY_IP', 'network_test': 'dns', 'tcp_t_address': '', 'tcp_t_port': ''}
MainThread::ERROR::2021-12-24 03:56:03,577::hosted_engine::564::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
```
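To pick the failures out of a long agent.log, the `::`-separated fields seen in the excerpt above can be split programmatically. This is only a sketch: it assumes the layout `thread::level::timestamp::module::line::logger::(function) message` as it appears in these lines, and the function names are hypothetical.

```python
def parse_agent_line(line):
    """Split one agent.log line into fields; return None if it does not match."""
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None
    thread, level, timestamp, module, lineno, logger, rest = parts
    # The last field looks like "(function) free-text message".
    function, _, message = rest.partition(" ")
    return {
        "thread": thread,
        "level": level,
        "timestamp": timestamp,
        "module": module,
        "line": lineno,
        "logger": logger,
        "function": function.strip("()"),
        "message": message,
    }


def errors_only(lines):
    """Return the parsed entries whose level is ERROR."""
    parsed = (parse_agent_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry["level"] == "ERROR"]
```

For example, `errors_only(open("/var/log/ovirt-hosted-engine-ha/agent.log"))` would list only the ERROR entries, such as the "Failed to start necessary monitors" line.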
Now I've stumbled upon this one [1984262](https://bugzilla.redhat.com/show_bug.cgi?id=1984262) but it doesn't seem to apply. All hosts resolve properly, all hosts also have proper hostnames set, unique /etc/hosts entries and proper A records set (in the form of hostname.subdomain.domain.tld).
The versions involved are:
```
[root@host2 ~]# rpm -qa ovirt*
ovirt-hosted-engine-setup-2.5.4-2.el8.noarch
ovirt-imageio-daemon-2.3.0-1.el8.x86_64
ovirt-host-dependencies-4.4.9-2.el8.x86_64
ovirt-vmconsole-1.0.9-1.el8.noarch
ovirt-imageio-client-2.3.0-1.el8.x86_64
ovirt-host-4.4.9-2.el8.x86_64
ovirt-python-openvswitch-2.11-1.el8.noarch
ovirt-openvswitch-ovn-host-2.11-1.el8.noarch
ovirt-provider-ovn-driver-1.2.34-1.el8.noarch
ovirt-openvswitch-ovn-2.11-1.el8.noarch
ovirt-release44-4.4.9.2-1.el8.noarch
ovirt-openvswitch-2.11-1.el8.noarch
ovirt-ansible-collection-1.6.5-1.el8.noarch
ovirt-openvswitch-ovn-common-2.11-1.el8.noarch
ovirt-hosted-engine-ha-2.4.9-1.el8.noarch
ovirt-vmconsole-host-1.0.9-1.el8.noarch
ovirt-imageio-common-2.3.0-1.el8.x86_64
```
Any hint how to fix this is really appreciated. I'd like to get the engine appliance back, remove host 3 and re-initialize it since this is a production cluster (with hosts 1 and 2 replicating the gluster storage and host 3 acting as an arbiter).
Thanks in advance, Martin
2 years, 4 months
Re: Help installing oVirt on single machine, without cockpit
by Cameron Showalter
The reply button is making me write an email, so hopefully this reaches you
all.
> I think this thread (even if the title is not so clear about the
> discussions born inside) could be a good read regarding single host
> limitations in terms of updating the environment, after the initial
> deployment
Cool, thanks for the advice! Based on the link, it seems like the hosted
engine can't update on a "single machine" install. But if you install the
engine directly to the host, the VM doesn't need to be running, so it might
be able to. Thankfully I wasn't too far down playing with the hosted
engine, or container setup yet.
So with installing an engine directly to the node, I got it to install
ovirt-engine, by enabling the appstream, baseos, extras, and powertools
repo. Then I commented out all "includepkgs = ..." in both
`/etc/yum.repos.d/ovirt-*.repo` files. I can also get through all of the
`engine-setup` questions, but then it fails with starting the
`ovirt-imageio` service at the very end.
The logs in `/var/log/ovirt-imageio/daemon.log`:
```txt
File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/ssl.py",
line 16, in server_context
purpose=ssl.Purpose.CLIENT_AUTH, cafile=cafile)
File "/usr/lib64/python3.6/ssl.py", line 468, in create_default_context
context.load_verify_locations(cafile, capath, cadata)
FileNotFoundError: [Errno 2] No such file or directory
```
This is the same error as in this thread:
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/DNB73ZMUB5DI...
But the only info about solving it is "Found it; Default route was set to
the NFS / SAN Network Gateway, not the Actual gateway.", and I'm not
exactly sure what that means.
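The traceback above ends in `load_verify_locations`, which raises `FileNotFoundError` when the file passed as `cafile` does not exist. Here is a minimal sketch for checking a candidate CA path with the same `ssl` call. The helper name `check_ca_file` is hypothetical, and which path ovirt-imageio actually uses is not shown in the log, so the path you pass in is an assumption you have to fill in from your own configuration.

```python
import os
import ssl


def check_ca_file(cafile):
    """Report whether cafile exists and can be loaded as a CA bundle."""
    if not os.path.isfile(cafile):
        # This is the situation that produces FileNotFoundError in the traceback.
        return "missing"
    try:
        ssl.create_default_context(purpose=ssl.Purpose.CLIENT_AUTH, cafile=cafile)
    except ssl.SSLError:
        # The file exists but does not contain usable certificates.
        return "unreadable"
    return "ok"
```

Running this on both the working CentOS VM and the node, with the CA path from each machine's configuration, would show whether the file is simply absent on the node.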
I also tried the engine install/setup in a CentOS workstation VM, and it
worked great, so I thought the node is missing a package? I installed
`openssl-devel` without any luck. In both the VM and the node, I tried
`route -n`, and both CentOS vm / node had similar/expected output. I tried
to do as many default options as possible on both, so I'm surprised the
gateway is different between them.
Any idea how to get past this? I'm also open to switching to plain CentOS
if that might be more stable, and installing to that. I just love how a lot
of the node comes pre-packaged for you, and I'm not sure if you can install
a "node hypervisor" straight to CentOS. That's also outside my comfort
zone, so I'm open to advice either way.
Thanks all for letting me get just this far!
2 years, 4 months
possible actions on host remaining as nonresponsive
by Gianluca Cecchi
Hello,
I have a 4.4.8 host that shows as nonresponsive.
The DC is FC based.
I tried restarting some daemons (vdsmd, mom-vdsmd, wdmd) without effect.
Then I executed an SSH host reboot, but the host remains nonresponsive after
rebooting.
From the storage and network point of view everything seems OK on the host.
In vdsm.log of the host I see every 5 seconds:
2021-12-23 18:54:53,053+0100 INFO (vmrecovery) [vdsm.api] START
getConnectedStoragePoolsList() from=internal,
task_id=916bc455-ce37-4b50-9f38-b69e3b03807f (api:48)
2021-12-23 18:54:53,053+0100 INFO (vmrecovery) [vdsm.api] FINISH
getConnectedStoragePoolsList return={'poollist': []} from=internal,
task_id=916bc455-ce37-4b50-9f38-b69e3b03807f (api:54)
2021-12-23 18:54:53,053+0100 INFO (vmrecovery) [vds] recovery: waiting for
storage pool to go up (clientIF:735)
2021-12-23 18:54:53,444+0100 INFO (periodic/0) [vdsm.api] START
repoStats(domains=()) from=internal,
task_id=eb5540e0-0f90-4996-bc9a-7c73949f390f (api:48)
2021-12-23 18:54:53,445+0100 INFO (periodic/0) [vdsm.api] FINISH repoStats
return={} from=internal, task_id=eb5540e0-0f90-4996-bc9a-7c73949f390f
(api:54)
In engine.log
2021-12-23 18:54:38,745+01 INFO
[org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService]
(EE-ManagedScheduledExecutorService-engineThreadMonitoringThreadPool-Thread-1)
[] Thread pool 'hostUpdatesChecker' is using 0 threads out of 5, 5 threads
waiting for tasks.
2021-12-23 18:55:27,479+01 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-73) []
EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ov300 command Get Host
Capabilities failed: Message timeout which can be caused by communication
issues
2021-12-23 18:55:27,479+01 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-73) []
Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException:
VDSNetworkException: Message timeout which can be caused by communication
issues
I would like to put the host into maintenance and then activate or
reinstall it, but a power action has been in place for an hour (since my
SSH host reboot attempt, which rebooted the host but apparently left it
disconnected) and prevents this... what is its timeout?
What can I check to understand the source of these supposed communication
problems?
Thanks,
Gianluca
2 years, 4 months
Ovirt version 4.4.9.5-1.el8 getting error PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
by florianvanoudgaarden@gmail.com
Hi All,
I have just installed an oVirt host using all the default settings from the manual.
I started with a CentOS 8 minimal install.
Then I followed the oVirt installation guide to install oVirt version 4.4.9.5-1.el8.
Now when I try to log on to the Administration Portal I get the following message:
PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Searching the internet turns up several answers, but all of them are about using third-party certificates.
I don't use third-party certificates.
The oVirt Administration Guide also only talks about third-party certificates.
So can anyone help to fix this error?
2 years, 4 months
using stop_reason as a vdsm hook trigger into the UI
by Nathanaël Blanchet
Hello,
I'm writing some code to make the following workflow:
* setting a custom stop_reason such as 'force_delete' when
stopping/deleting a vm
* this will trigger a vdsm hook on the after_vm_destroy event.
* this hook will call back AWX/ansible to:
o remove DNS entries
o remove vm file backup
o delete supervision
o (optional) delete the vm
* the vm is destroyed and removed by vdsm.
Here is the code (I'm not a python expert)
#!/usr/libexec/platform-python
# 211217 NBT
# This is a vdsm hook that aims to auto delete oVirt dependencies when
# removing a VM directly from engine.
# It is triggered when filling the stop_reason field in oVirt with
# strict 'clean'.
# It initially concerns following actions:
# - Centreon subscription Removing
# - Backup erase from Sotora
# - DNS cleaning on Lilas
# - IPA deletion
# This set of actions can be extended into the concerned AWX
# job_template (180 or 160)
# When finished, vm can be manually removed from Engine.
# When string is 'force_delete' or 'force_remove', then in addition, the
# vm will be automatically erased at the same time.
import os
from vdsm.hook import hooking
from xml.dom import minidom
import requests
from xml.etree import ElementTree
import sys
import urllib3
import time
import traceback
import subprocess
from subprocess import PIPE, STDOUT
import logging

urllib3.disable_warnings()
logger = logging.getLogger("register_migration")


def exec_cmd(*args):
    retcode, out, err = hooking.execCmd(args, sudo=True)
    if retcode != 0:
        raise RuntimeError("Failed to execute %s, due to: %s" %
                           (args, err))
    return out


if __name__ == '__main__':
    logging.basicConfig(filename="/var/log/vdsm/custom_hooks.log",
                        level=logging.INFO,
                        format='%(asctime)s %(levelname)s %(name)s:%(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
    if len(sys.argv) > 1:
        vm_name = sys.argv[1]
    else:
        domxml = hooking.read_domxml()
        vm_name = domxml.getElementsByTagName('name')[0].firstChild.nodeValue
    print(vm_name)
    # API oVirt: Initialize variables
    user = 'admin@internal'
    password = 'password'
    url = "https://air-dev.v100.abes.fr/ovirt-engine/api/vms?search=name%3D" + vm_name
    headers = {'Accept': 'application/xml'}
    print('name: ' + vm_name)
    # API oVirt: Test if VM stop_reason has been defined
    while True:
        # r = requests.get(url, headers=headers, auth=('admin@internal',
        #                  'password'), verify=False)
        # tree = ElementTree.fromstring(r.content)
        r = exec_cmd('curl', '--insecure', '--header',
                     'Accept: application/xml', '--user',
                     'admin@internal:password',
                     'https://air-dev.v100.abes.fr/ovirt-engine/api/vms?search=name%3D' + vm_name)
        tree = ElementTree.fromstring(b''.join(r))
        stop_reason = None
        for vm in tree.findall('vm'):
            status = vm.find('status')
            stop_reason = vm.find('stop_reason')
            print(status.text)
        # Loops forever if the stop_reason element never shows up
        if stop_reason is not None:
            print(status.text, stop_reason.text)
            break
        time.sleep(1)
    for vm in tree.findall('vm'):
        stop_reason = vm.find('stop_reason')
        if stop_reason is None:
            exit('stop_reason is not defined')
        else:
            # Compare the element's text, not the Element object itself
            reason = stop_reason.text
            # API AWX: Initialize variables
            header1 = 'Content-Type: application/json'
            header2 = 'Authorization: Bearer token'
            curl_server = "nbt"
            curl_extra_vars = ("{\\\"comment\\\": \\\"Nbt\\\", "
                               "\\\"survey_ovirt_password\\\": \\\"password\\\", "
                               "\\\"force_erase\\\": \\\"yes\\\", "
                               "\\\"survey_vms_list\\\": %s}" % (vm_name))
            curl_config = '{"extra_vars": "%s"}' % (curl_extra_vars)
            if reason in ["clean"]:
                curl_job_template = "180"
                print('Cleaning ' + vm_name + ' from oVirt on ancolie-' + curl_server +
                      ' with workflow_job_template ' + curl_job_template)
                curl_url = "http://ancolie-{}.v106.abes.fr/api/v2/workflow_job_templates/{}/launch/".format(curl_server, curl_job_template)
                exec_cmd('curl', '-f', '-H', header1, '-H', header2, '-XPOST', '-d',
                         curl_config, curl_url)
            elif reason in ["force_delete", "force_remove"]:
                curl_job_template = "160"
                print('Deleting and cleaning ' + vm_name + ' from oVirt on ancolie-' +
                      curl_server + ' with workflow_job_template ' + curl_job_template)
                curl_url = "http://ancolie-{}.v106.abes.fr/api/v2/workflow_job_templates/{}/launch/".format(curl_server, curl_job_template)
                exec_cmd('curl', '-f', '-H', header1, '-H', header2, '-XPOST', '-d',
                         curl_config, curl_url)
            else:
                exit('Stop reason is %s and there is no reason to do anything for %s'
                     % (reason, vm_name))
The idea is to use the stop_reason element from the vm xml definition.
But after hours, I realized that this element is written to the vm
definition file only after the VM has been destroyed.
So if I test this value (if existing) when executing the hook, the text
value does not exist yet at that early stage.
I added a 'while' loop to wait for the stop_reason element to be
present, but vdsm hangs because of an infinite loop:
2021-12-20 18:13:30,148+0100 INFO (jsonrpc/7) [root]
/usr/libexec/vdsm/hooks/after_vm_destroy/clean_vm_dependencies_2.py:
rc=1 err=b'Traceback (most recent call last):\n File
"/usr/libexec/vdsm/hooks/after_vm_destroy/clean_vm_dependencies_2.py",
line 84, in <module>\n print(status.text,
stop_reason.text)\nAttributeError: \'NoneType\' object has no attribute
\'text\'\n' (hooks:122)
....
So I'm deducing I'm not able to accomplish my initial goal to use
stop_reason as a trigger with after_vm_destroy event.
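On the hang and the 'NoneType' error described above: a bounded wait combined with a None-safe lookup at least keeps the hook from blocking vdsm forever. The sketch below does not make stop_reason available any earlier; it only gives up after a timeout. The function names and the 30-second default are assumptions, and the XML layout (a root element containing `vm` elements) is taken from how the hook above parses the API response.

```python
import time
from xml.etree import ElementTree


def find_stop_reason(xml_bytes):
    """Return the first stop_reason text found on a vm element, or None if absent."""
    tree = ElementTree.fromstring(xml_bytes)
    for vm in tree.findall("vm"):
        # findtext returns None when the element is missing, so no .text on None.
        reason = vm.findtext("stop_reason")
        if reason is not None:
            return reason
    return None


def wait_for(fetch, timeout=30.0, interval=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-None value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        value = fetch()
        if value is not None:
            return value
        if time.monotonic() >= deadline:
            return None
        sleep(interval)
```

In the hook, `fetch` could be a small function that runs the curl call and passes the joined output to `find_stop_reason`; if `wait_for` returns None, the hook can exit cleanly instead of looping.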
I looked for another way: I thought of replacing the oVirt API query
with reading the value entered in the UI, but I can't find a suitable
database query. Is there a way to do such a thing? Do engine hooks
exist for stopped VMs?
Thank you for your help.
PS: I'm already able to do this from ansible/AWX, but I have to do it
from UI/vdsm for any reason.
--
Nathanaël Blanchet
Supervision réseau
SIRE
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5
Tél. 33 (0)4 67 54 84 55
Fax 33 (0)4 67 54 84 14
blanchet(a)abes.fr
2 years, 4 months