December 2021 - Users - oVirt List Archives

Unable to start ovirt-ha-agent on all hosts
by martin＠fulmo.org 28 Dec '21

Hi list!

On a hyperconverged cluster with three hosts, I am unable to start the ovirt-ha-agent.

The history: as all three hosts were running CentOS 8, I tried to upgrade host3 to CentOS 8 Stream first and left all VMs and host1 and host2 untouched, basically as a test. All migrations of VMs to host3 failed with:

```
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:01.0/pcie-root-port'
2021-12-24T00:56:49.428234Z qemu-kvm: load of migration failed: Invalid argument
```

Since I haven't had the time to dig into that, I decided to roll back the upgrade: I rebooted host3 into CentOS 8 again and re-installed host3 through the engine appliance. During that process (and the restart of host3) the engine appliance became unresponsive and crashed.

The problem: currently the ovirt-ha-agent service fails on all hosts with the following messages in /var/log/ovirt-hosted-engine-ha/agent.log:

```
MainThread::INFO::2021-12-24 03:56:03,500::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.4.9 started
MainThread::INFO::2021-12-24 03:56:03,516::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Certificate common name not found, using hostname to identify host
MainThread::INFO::2021-12-24 03:56:03,575::hosted_engine::548::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2021-12-24 03:56:03,576::brokerlink::82::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'addr': 'GATEWAY_IP', 'network_test': 'dns', 'tcp_t_address': '', 'tcp_t_port': ''}
MainThread::ERROR::2021-12-24 03:56:03,577::hosted_engine::564::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
```

Now, I've stumbled upon [1984262](https://bugzilla.redhat.com/show_bug.cgi?id=1984262), but it doesn't seem to apply. All hosts resolve properly; all hosts also have proper hostnames set, unique /etc/hosts entries, and proper A records (in the form hostname.subdomain.domain.tld).

The versions involved are:

```
[root@host2 ~]# rpm -qa ovirt*
ovirt-hosted-engine-setup-2.5.4-2.el8.noarch
ovirt-imageio-daemon-2.3.0-1.el8.x86_64
ovirt-host-dependencies-4.4.9-2.el8.x86_64
ovirt-vmconsole-1.0.9-1.el8.noarch
ovirt-imageio-client-2.3.0-1.el8.x86_64
ovirt-host-4.4.9-2.el8.x86_64
ovirt-python-openvswitch-2.11-1.el8.noarch
ovirt-openvswitch-ovn-host-2.11-1.el8.noarch
ovirt-provider-ovn-driver-1.2.34-1.el8.noarch
ovirt-openvswitch-ovn-2.11-1.el8.noarch
ovirt-release44-4.4.9.2-1.el8.noarch
ovirt-openvswitch-2.11-1.el8.noarch
ovirt-ansible-collection-1.6.5-1.el8.noarch
ovirt-openvswitch-ovn-common-2.11-1.el8.noarch
ovirt-hosted-engine-ha-2.4.9-1.el8.noarch
ovirt-vmconsole-host-1.0.9-1.el8.noarch
ovirt-imageio-common-2.3.0-1.el8.x86_64
```

Any hint on how to fix this is really appreciated. I'd like to get the engine appliance back, remove host3 and re-initialize it, since this is a production cluster (with host1 and host2 replicating the Gluster storage and host3 acting as an arbiter).

Thanks in advance,
Martin
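When comparing agent.log across the three hosts, it can help to pull out only the ERROR records. The lines above use a `::`-separated layout (thread, level, timestamp, module, line number, logger, message); the sketch below assumes every record follows that seven-field layout and skips anything else, such as traceback continuation lines. The function names are illustrative helpers, not part of ovirt-hosted-engine-ha.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class AgentLogRecord(NamedTuple):
    thread: str
    level: str
    timestamp: str
    module: str
    lineno: int
    logger: str
    message: str


def parse_agent_line(line: str) -> Optional[AgentLogRecord]:
    """Split one agent.log line on '::' into its seven fields, or return None."""
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7 or not parts[4].isdigit():
        return None  # continuation line (e.g. part of a traceback) or another layout
    thread, level, timestamp, module, lineno, logger, message = parts
    return AgentLogRecord(thread, level, timestamp, module, int(lineno), logger, message)


def errors(lines: Iterable[str]) -> Iterator[AgentLogRecord]:
    """Yield only the records logged at ERROR level."""
    for line in lines:
        record = parse_agent_line(line)
        if record is not None and record.level == "ERROR":
            yield record


def print_errors(path: str) -> None:
    """Print timestamp, source location and message of every ERROR record in a file."""
    with open(path, encoding="utf-8", errors="replace") as log:
        for rec in errors(log):
            print(rec.timestamp, "%s:%d" % (rec.module, rec.lineno), rec.message)
```

For example, `print_errors("/var/log/ovirt-hosted-engine-ha/agent.log")` run on each host lists only the failures, such as the `Failed to start necessary monitors` record above.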

Re: Help installing oVirt on single machine, without cockpit
by Cameron Showalter 26 Dec '21

The reply button is making me write an email, so hopefully this reaches you all.

> I think this thread (even if the title is not so clear about the discussions born inside) could be a good read regarding single host limitations in terms of updating the environment, after the initial deployment

Cool, thanks for the advice! Based on the link, it seems like the hosted engine can't be updated on a "single machine" install. But if you install the engine directly on the host, the engine VM doesn't need to be running, so it might be able to update. Thankfully I wasn't too far down the road with the hosted engine or the container setup yet.

So, installing the engine directly on the node, I got ovirt-engine to install by enabling the appstream, baseos, extras, and powertools repos. Then I commented out all `includepkgs = ...` lines in both `/etc/yum.repos.d/ovirt-*.repo` files. I can also get through all of the `engine-setup` questions, but it then fails at the very end while starting the `ovirt-imageio` service. The logs in `/var/log/ovirt-imageio/daemon.log`:

```txt
File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/ssl.py", line 16, in server_context
  purpose=ssl.Purpose.CLIENT_AUTH, cafile=cafile)
File "/usr/lib64/python3.6/ssl.py", line 468, in create_default_context
  context.load_verify_locations(cafile, capath, cadata)
FileNotFoundError: [Errno 2] No such file or directory
```

This is the same error as in this thread: https://lists.ovirt.org/archives/list/users@ovirt.org/thread/DNB73ZMUB5DIMV…

But the only information there about solving it is "Found it; Default route was set to the NFS / SAN Network Gateway, not the Actual gateway.", and I'm not exactly sure what that means. I also tried the engine install/setup in a CentOS Workstation VM, and it worked great, so I thought the node might be missing a package. I installed `openssl-devel` without any luck. On both the VM and the node I ran `route -n`, and both gave similar/expected output. I tried to use as many default options as possible on both, so I'd be surprised if the gateway were different between them.

Any idea how to get past this? I'm also open to switching to plain CentOS and installing on that, if it might be more stable. I just love how much of the node comes pre-packaged for you, and I'm not sure whether you can install a "node hypervisor" straight onto CentOS. That's also outside my comfort zone, so I'm open to advice either way. Thanks all for helping me get this far!
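The hint quoted above ("Default route was set to the NFS / SAN Network Gateway") and the traceback point to two separate things worth comparing between the working VM and the node: which gateway each default route uses, and whether the CA file that ovirt-imageio passes to `ssl.create_default_context` actually exists. A minimal Python sketch of both checks follows. `default_gateways` reads the same IPv4 table that `route -n` prints, from `/proc/net/route`. The CA path is not shown in the log, so `check_cafile` takes it as a parameter (look it up in your ovirt-imageio configuration). Both function names are hypothetical helpers.

```python
import os
import socket
import ssl
import struct


def default_gateways(route_text):
    """Return (interface, gateway) pairs for the default routes in /proc/net/route text."""
    gateways = []
    for line in route_text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 3:
            continue
        iface, destination, gateway = fields[0], fields[1], fields[2]
        if destination == "00000000":  # destination 0.0.0.0 marks a default route
            # The kernel prints addresses as hex in native byte order
            addr = socket.inet_ntoa(struct.pack("=I", int(gateway, 16)))
            gateways.append((iface, addr))
    return gateways


def check_cafile(cafile):
    """Repeat the call that fails in daemon.log; return None if it works, else the reason."""
    if not os.path.isfile(cafile):
        return "missing: %s" % cafile
    try:
        ssl.create_default_context(purpose=ssl.Purpose.CLIENT_AUTH, cafile=cafile)
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return "unusable: %s (%s)" % (cafile, exc)
    return None
```

For example, compare `default_gateways(open("/proc/net/route").read())` on both machines; if the node's default route goes out through a storage interface, that would match the situation described in the linked thread.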

possible actions on host remaining as nonresponsive
by Gianluca Cecchi 23 Dec '21

Hello,
I have a 4.4.8 host that shows as nonresponsive. The DC is FC-based.
I tried restarting some daemons without effect (vdsmd, mom-vdsmd, wdmd). Then I executed an SSH host reboot, but the host stays in the same state after rebooting. From the storage and network point of view everything seems OK on the host.

In vdsm.log on the host I see this every 5 seconds:

```
2021-12-23 18:54:53,053+0100 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList() from=internal, task_id=916bc455-ce37-4b50-9f38-b69e3b03807f (api:48)
2021-12-23 18:54:53,053+0100 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=916bc455-ce37-4b50-9f38-b69e3b03807f (api:54)
2021-12-23 18:54:53,053+0100 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:735)
2021-12-23 18:54:53,444+0100 INFO (periodic/0) [vdsm.api] START repoStats(domains=()) from=internal, task_id=eb5540e0-0f90-4996-bc9a-7c73949f390f (api:48)
2021-12-23 18:54:53,445+0100 INFO (periodic/0) [vdsm.api] FINISH repoStats return={} from=internal, task_id=eb5540e0-0f90-4996-bc9a-7c73949f390f (api:54)
```

In engine.log:

```
2021-12-23 18:54:38,745+01 INFO [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] (EE-ManagedScheduledExecutorService-engineThreadMonitoringThreadPool-Thread-1) [] Thread pool 'hostUpdatesChecker' is using 0 threads out of 5, 5 threads waiting for tasks.
2021-12-23 18:55:27,479+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-73) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ov300 command Get Host Capabilities failed: Message timeout which can be caused by communication issues
2021-12-23 18:55:27,479+01 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-73) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
```

I would like to try putting the host into maintenance and then activating or reinstalling it, but a power action has been in progress for 1 hour (since the SSH host reboot attempt, which rebooted the host but apparently did not get it connected), and it prevents this... What is its timeout? What can I check to understand the source of these supposed communication problems?

Thanks,
Gianluca
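The engine errors above report a `Message timeout` while calling the host, so one first check, run from the engine machine, is whether the host's VDSM port accepts TCP connections at all. The sketch assumes VDSM listens on its default port, 54321, and only tests TCP reachability: it does not check the TLS handshake or whether VDSM answers its JSON-RPC requests. `can_connect` is a hypothetical helper name.

```python
import socket

VDSM_PORT = 54321  # default VDSM port; assumption: not changed on this host


def can_connect(host, port=VDSM_PORT, timeout=5.0):
    """Return (True, None) if a TCP connection succeeds, else (False, error text)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:  # covers refused, unreachable, timed out and DNS failures
        return False, str(exc)
```

For example, `can_connect("ov300")`, using the host name from engine.log. A refusal or timeout points at the network or a firewall between engine and host; a successful connection suggests looking further up the stack (TLS certificates or VDSM itself).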

Ovirt version 4.4.9.5-1.el8 getting error PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
by florianvanoudgaarden＠gmail.com 23 Dec '21

Hi All,
I have just installed an oVirt host using all the default settings from the manual. I started with a CentOS 8 minimal install, then followed the oVirt installation guide to install oVirt version 4.4.9.5-1.el8. Now, when I try to log on to the Administration Portal, I get the following message:

PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

When searching the internet I find several answers, but they are all about using certificates from third parties. I don't use third-party certificates, and the oVirt Administration Guide also only talks about third-party certificates.

So can anyone help me fix this error?

using stop_reason as a vdsm hook trigger into the UI
by Nathanaël Blanchet 22 Dec '21

Hello,

I'm writing some code to implement the following workflow:

* setting a custom stop_reason such as 'force_delete' when stopping/deleting a VM
* this triggers a vdsm hook on the after_vm_destroy event
* this hook calls back AWX/Ansible to:
  o remove DNS entries
  o remove the VM file backup
  o delete supervision
  o (optional) delete the VM
* the VM is destroyed and removed by vdsm.

Here is the code (I'm not a Python expert):

```python
#!/usr/libexec/platform-python
# 211217 NBT
# This is a vdsm hook that aims to auto-delete oVirt dependencies when removing a VM directly from the engine.
# It is triggered when the stop_reason field in oVirt is filled with the exact string 'clean'.
# It initially covers the following actions:
# - Centreon subscription removal
# - Backup erase from Sotora
# - DNS cleaning on Lilas
# - IPA deletion
# This set of actions can be extended in the corresponding AWX job_template (180 or 160).
# When finished, the VM can be manually removed from the engine.
# When the string is 'force_delete' or 'force_remove', the VM is additionally erased automatically at the same time.
import json
import logging
import sys
import time
from xml.etree import ElementTree

import requests  # only used by the commented-out alternative below
import urllib3
from vdsm.hook import hooking

urllib3.disable_warnings()
logger = logging.getLogger("register_migration")


def exec_cmd(*args):
    retcode, out, err = hooking.execCmd(args, sudo=True)
    if retcode != 0:
        raise RuntimeError("Failed to execute %s, due to: %s" % (args, err))
    return out


if __name__ == '__main__':
    logging.basicConfig(filename="/var/log/vdsm/custom_hooks.log", level=logging.INFO,
                        format='%(asctime)s %(levelname)s %(name)s: %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
    if len(sys.argv) > 1:
        vm_name = sys.argv[1]
    else:
        domxml = hooking.read_domxml()
        vm_name = domxml.getElementsByTagName('name')[0].firstChild.nodeValue
    print(vm_name)

    # API oVirt: initialize variables
    user = 'admin@internal'
    password = 'password'
    url = "https://air-dev.v100.abes.fr/ovirt-engine/api/vms?search=name%3D" + vm_name
    headers = {'Accept': 'application/xml'}
    print('name: ' + vm_name)

    # API oVirt: wait until the VM stop_reason has been defined
    while True:
        # r = requests.get(url, headers=headers, auth=(user, password), verify=False)
        # tree = ElementTree.fromstring(r.content)
        r = exec_cmd('curl', '--insecure', '--header', 'Accept: application/xml',
                     '--user', user + ':' + password, url)
        tree = ElementTree.fromstring(b''.join(r))
        for vm in tree.findall('vm'):
            status = vm.find('status')
            stop_reason = vm.find('stop_reason')
            print(status.text)
            if stop_reason is not None:
                print(status.text, stop_reason.text)
                break
        else:
            # No VM reported a stop_reason yet: wait and poll again
            time.sleep(1)
            continue
        break  # a stop_reason was found: leave the while loop as well

    # API AWX: initialize variables
    reason = stop_reason.text or ''  # compare the element's text, not the Element object
    header1 = 'Content-Type: application/json'
    header2 = 'Authorization: Bearer token'
    curl_server = "nbt"
    extra_vars = {"comment": "Nbt", "survey_ovirt_password": "password",
                  "force_erase": "yes", "survey_vms_list": vm_name}
    # extra_vars is sent as a JSON-encoded string, like the hand-escaped original
    curl_config = json.dumps({"extra_vars": json.dumps(extra_vars)})
    if reason == "clean":
        curl_job_template = "180"
        print('Cleaning ' + vm_name + ' from oVirt on ancolie-' + curl_server +
              ' with workflow_job_template ' + curl_job_template)
    elif reason in ("force_delete", "force_remove"):
        curl_job_template = "160"
        print('Deleting and cleaning ' + vm_name + ' from oVirt on ancolie-' + curl_server +
              ' with workflow_job_template ' + curl_job_template)
    else:
        sys.exit('Stop reason is ' + reason + ' and there is no reason to do anything for ' + vm_name)
    curl_url = "http://ancolie-{}.v106.abes.fr/api/v2/workflow_job_templates/{}/launch/".format(
        curl_server, curl_job_template)
    exec_cmd('curl', '-f', '-H', header1, '-H', header2, '-XPOST', '-d', curl_config, curl_url)
```

The idea is to use the stop_reason element in the VM XML definition. But after hours, I realized that this element is written to the VM definition only after the VM has been destroyed. So if I test this value (if it exists) when the hook executes, the text value doesn't exist yet at that early point. I added a 'while' loop to wait for the stop_reason element to be present, but vdsm hangs because of an infinite loop:

```
2021-12-20 18:13:30,148+0100 INFO (jsonrpc/7) [root] /usr/libexec/vdsm/hooks/after_vm_destroy/clean_vm_dependencies_2.py: rc=1 err=b'Traceback (most recent call last):\n File "/usr/libexec/vdsm/hooks/after_vm_destroy/clean_vm_dependencies_2.py", line 84, in <module>\n print(status.text, stop_reason.text)\nAttributeError: \'NoneType\' object has no attribute \'text\'\n' (hooks:122)
....
```
So I conclude that I can't accomplish my initial goal of using stop_reason as a trigger with the after_vm_destroy event. I looked for another way: I thought of replacing the oVirt API query with getting the value coming from the UI, but I can't find the suitable database query. Is there a way to do such a thing? Do engine hooks exist for stopped VMs?

Thank you for your help.

PS: I'm already able to do this from Ansible/AWX, but I have to do it from the UI/vdsm for other reasons.

--
Nathanaël Blanchet
Network monitoring
SIRE
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5
Tel. 33 (0)4 67 54 84 55
Fax 33 (0)4 67 54 84 14
blanchet(a)abes.fr
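Whatever trigger ends up being used, a hook that polls the engine API should not wait forever, because vdsm waits for the hook to finish. Below is a minimal sketch of a bounded wait, assuming the `/vms?search=name=...` response shape used in the hook above (`<vms><vm><name/><status/><stop_reason/></vm></vms>`). It does not solve the ordering problem: if, as observed above, stop_reason is only recorded after the destroy completes, the wait simply times out and the hook exits instead of hanging vdsm. `read_stop_reason`, `wait_for_stop_reason` and the `fetch` callable are hypothetical names.

```python
import time
from xml.etree import ElementTree


def read_stop_reason(api_xml, vm_name):
    """Return (status, stop_reason) for vm_name from a /vms search response.

    stop_reason is None when the element is absent, which is what the hook
    sees while the engine has not recorded it yet.
    """
    root = ElementTree.fromstring(api_xml)
    for vm in root.findall('vm'):
        if vm.findtext('name') == vm_name:
            return vm.findtext('status'), vm.findtext('stop_reason')
    return None, None


def wait_for_stop_reason(fetch, vm_name, timeout=30.0, interval=1.0):
    """Poll fetch() until a stop_reason appears or the timeout expires.

    fetch is any zero-argument callable returning the API XML as bytes
    (hypothetical: e.g. a wrapper around the curl call in the hook).
    Returns the stop_reason text, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        _, reason = read_stop_reason(fetch(), vm_name)
        if reason is not None:
            return reason
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

In the hook, a `None` result could be logged and followed by `sys.exit(0)`, so vdsm regains control after at most `timeout` seconds.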

virsh list --all only shows running on host
by richmoch＠yahoo.com 22 Dec '21

Hi *,

I am having a problem where, when I run virsh list --all on a KVM host, I only see the running VMs:

```
[root@MBGVRT3323 ~]# virsh list --all
Please enter your authentication name:
Please enter your password:
 Id   Name                State
-----------------------------------
 3    MBPDBS0583U-CLONE   running
 4    MBGDBS3156U-CLONE   running
```

I know there are two more VMs that are shut down, and if I run virsh list --state-shutoff, nothing is returned. In fact, look at the numbering of the two running VMs: 3 and 4. The GUI shows the two running VMs and the other two as shut down. Why is this happening? Given this, it seems the only way to start the two VMs that are down is through the GUI rather than with virsh start <domain>, which seems limiting. Any ideas would be greatly appreciated.

```
[root@MBGVRT3323 ~]# virsh version
Please enter your authentication name: root
Please enter your password:
Compiled against library: libvirt 5.7.0
Using library: libvirt 5.7.0
Using API: QEMU 5.7.0
Running hypervisor: QEMU 4.2.1
You have new mail in /var/spool/mail/root
[root@MBGVRT3323 ~]#
```

Oauth token lifetime
by Nathanaël Blanchet 21 Dec '21

[ANN] oVirt 4.4.10 First Release Candidate is now available for testing
by Sandro Bonazzola 21 Dec '21

oVirt 4.4.10 First Release Candidate is now available for testing

The oVirt Project is pleased to announce the availability of oVirt 4.4.10 First Release Candidate for testing, as of December 21st, 2021.

This update is the tenth in a series of stabilization updates to the 4.4 series.

Documentation
- If you want to try oVirt as quickly as possible, follow the instructions on the Download <https://ovirt.org/download/> page.
- For complete installation, administration, and usage instructions, see the oVirt Documentation <https://ovirt.org/documentation/>.
- For upgrading from a previous version, see the oVirt Upgrade Guide <https://ovirt.org/documentation/upgrade_guide/>.
- For a general overview of oVirt, see About oVirt <https://ovirt.org/community/about.html>.

Important notes before you try it
Please note this is a pre-release build.
The oVirt Project makes no guarantees as to its suitability or usefulness.
This pre-release must not be used in production.

Installation instructions
For installation instructions and additional information please refer to:
https://ovirt.org/documentation/

This release is available now on x86_64 architecture for:
* Red Hat Enterprise Linux 8.5 or similar
* CentOS Stream 8

This release supports Hypervisor Hosts on x86_64 and ppc64le architectures for:
* Red Hat Enterprise Linux 8.5 or similar
* CentOS Stream 8
* oVirt Node 4.4 based on CentOS Stream 8 (available for x86_64 only)

See the release notes [1] for installation instructions and a list of new features and bugs fixed.

Notes:
- oVirt Appliance is already available based on CentOS Stream 8
- oVirt Node NG is already available based on CentOS Stream 8

Additional Resources:
* Read more about the oVirt 4.4.10 pre-release highlights: http://www.ovirt.org/release/4.4.10/
* Get more oVirt project updates on Twitter: https://twitter.com/ovirt
* Check out the latest project news on the oVirt blog: http://www.ovirt.org/blog/

[1] http://www.ovirt.org/release/4.4.10/
[2] http://resources.ovirt.org/pub/ovirt-4.4-pre/iso/

--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo(a)redhat.com
*Red Hat respects your work life balance. Therefore there is no need to answer this email out of your office hours.*

vdsm-client delete_checkpoints
by Tommaso - Shellrent 20 Dec '21

Hi,

can someone give us an example of the command vdsm-client VM delete_checkpoints? We have tried a lot of combinations, like:

```
vdsm-client VM delete_checkpoints vmID="ce5d0251-e971-4d89-be1b-4bc28283614c" checkpoint_ids=["e0c56289-bfb3-4a91-9d33-737881972116"]
```

without success.

Regards,
Tommaso.

--
Shellrent - The first Italian Security First hosting provider
*Tommaso De Marchi*
/COO - Chief Operating Officer/
Shellrent Srl
Via dell'Edilizia, 19 - 36100 Vicenza
Tel. 0444321155 | Fax 04441492177
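In a POSIX shell, the unquoted argument checkpoint_ids=["e0c56289-..."] loses its double quotes before vdsm-client sees it (the shell performs quote removal), so the tool receives checkpoint_ids=[e0c56289-...], which is not a valid JSON list. One way around shell quoting is to put the parameters in a JSON file. The sketch below assumes that your vdsm-client supports reading method arguments from a JSON file with `-f` (check `vdsm-client --help` on the host) and reuses the parameter names vmID and checkpoint_ids from the command above; it has not been checked against the VM.delete_checkpoints schema. The helper names are hypothetical.

```python
import json
import shlex


def delete_checkpoints_args(vm_id, checkpoint_ids):
    """Build the method arguments as JSON; checkpoint_ids stays a real list."""
    return json.dumps({"vmID": vm_id, "checkpoint_ids": list(checkpoint_ids)}, indent=2)


def write_args(path, vm_id, checkpoint_ids):
    """Write the JSON arguments to path so no shell quoting is involved."""
    with open(path, "w") as f:
        f.write(delete_checkpoints_args(vm_id, checkpoint_ids))


def command_for(path):
    """Shell command that would pass the file to vdsm-client (assumes -f support)."""
    return "vdsm-client -f {} VM delete_checkpoints".format(shlex.quote(path))
```

For example, `write_args("/tmp/args.json", "ce5d0251-e971-4d89-be1b-4bc28283614c", ["e0c56289-bfb3-4a91-9d33-737881972116"])` followed by running the string returned by `command_for("/tmp/args.json")` on the host.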

copr-be.cloud.fedoraproject.org down again
by tony.stivers＠gmail.com 20 Dec '21

Can we get more than one mirror for this repo? It's down every other day...
