March 2019 - Users - oVirt List Archives

Install hosted-engine - Task Get local VM IP failed
by florentl 10 Mar '22

Hi all,

I am trying to install hosted-engine on a node running ovirt-node-ng-4.2.3-0.20180518. Every time, I get stuck on:

[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:6c:5a:91 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.108872", "end": "2018-06-01 11:17:34.421769", "rc": 0, "start": "2018-06-01 11:17:34.312897", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

I tried with a static IP address and with DHCP, but both failed. To be more specific, I installed three nodes and deployed GlusterFS with the wizard. I'm in a nested virtualization environment for this lab (VMware ESXi hypervisor). My node IP is 192.168.176.40 / and I want the hosted-engine VM to have 192.168.176.43.

Thanks,
Florent
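The failing task retries a shell pipeline that looks up the local VM's MAC address in the libvirt DHCP leases and prints the IP; an empty stdout with rc 0 means no lease line matched. As a rough sketch of the same lookup in Python (the function name and the sample lease line are illustrative assumptions, not taken from the installer), the logic amounts to:

```python
def lease_ip(leases_output, mac):
    """Mimic: grep -i MAC | awk '{ print $5 }' | cut -f1 -d'/'.

    Returns the first matching address, or None when no lease line
    mentions the MAC (the situation reported in the error above).
    """
    for line in leases_output.splitlines():
        if mac.lower() not in line.lower():
            continue
        fields = line.split()
        if len(fields) >= 5:
            # Field 5 is "address/prefix"; keep only the address part.
            return fields[4].split("/", 1)[0]
    return None


# Hypothetical output of `virsh -r net-dhcp-leases default`;
# the lease row is made up for illustration.
SAMPLE_LEASES = """\
 Expiry Time          MAC address        Protocol  IP address         Hostname  Client ID or DUID
---------------------------------------------------------------------------------------------------
 2018-06-01 12:17:34  00:16:3e:6c:5a:91  ipv4      192.168.122.57/24  -         -
"""

print(lease_ip(SAMPLE_LEASES, "00:16:3E:6C:5A:91"))
print(lease_ip("", "00:16:3e:6c:5a:91"))
```

Unlike the shell pipeline, this sketch stops at the first match. Running the original command by hand and checking whether any lease row appears at all is probably the quickest way to tell a DHCP problem on the default libvirt network from a parsing problem.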

Lots of storage.MailBox.SpmMailMonitor
by Fabrice Bacchella 06 Jan '22

My vdsm log files are huge:

-rw-r--r-- 1 vdsm kvm 1.8G Nov 22 11:32 vdsm.log

And this is just half an hour of logs:

$ head -1 vdsm.log
2018-11-22 11:01:12,132+0100 ERROR (mailbox-spm) [storage.MailBox.SpmMailMonitor] mailbox 2 checksum failed, not clearing mailbox, clearing new mail (data='...lots of data', expected='\xa4\x06\x08\x00') (mailbox:612)

I just upgraded vdsm:

$ rpm -qi vdsm
Name : vdsm
Version : 4.20.43
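To check how much of such a log comes from a single logger, a small tally by level and logger name can help. The sketch below assumes every line follows the layout of the one quoted above (timestamp, level, thread in parentheses, logger in square brackets); the regular expression and function name are my own, not part of vdsm.

```python
import re
from collections import Counter

# timestamp (two tokens), LEVEL, (thread), [logger]
LINE = re.compile(
    r"^\S+ \S+ (?P<level>[A-Z]+)\s+\((?P<thread>[^)]*)\) \[(?P<logger>[^\]]+)\]"
)


def count_by_logger(lines):
    """Count log lines per (level, logger); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


SAMPLE_LINES = [
    "2018-11-22 11:01:12,132+0100 ERROR (mailbox-spm) "
    "[storage.MailBox.SpmMailMonitor] mailbox 2 checksum failed (mailbox:612)",
    "2018-11-22 11:01:14,140+0100 ERROR (mailbox-spm) "
    "[storage.MailBox.SpmMailMonitor] mailbox 2 checksum failed (mailbox:612)",
    "a continuation line that does not start with a timestamp",
]

print(count_by_logger(SAMPLE_LINES).most_common(5))
```

For a real file, passing the open file object (for example `count_by_logger(open("vdsm.log", errors="replace"))`) reads it line by line, so a 1.8G log does not need to fit in memory.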

poweroff and reboot with ovirt_vm ansible module
by Nathanaël Blanchet 18 May '21

Hello,

Is there a way to power off or reboot a VM (without going through the stopped and running states) with the ovirt_vm Ansible module?

--
Nathanaël Blanchet
Supervision réseau
Pôle Infrastrutures Informatiques
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5
Tél. 33 (0)4 67 54 84 55
Fax 33 (0)4 67 54 84 14
blanchet(a)abes.fr

OVN and change of mgmt network
by Gianluca Cecchi 26 Jan '21

Hello,

I previously had OVN running on the engine (as OVN provider, with northd and the northbound and southbound DBs) and on the hosts (with the OVN controller). After changing the mgmt IP of the hosts (the engine has instead retained the same IP), I executed this command on them again:

vdsm-tool ovn-config <ip_of_engine> <nel_local_ip_of_host>

Now I think I have to clean up some things, e.g.:

1) On the engine, where I get the lines below:

systemctl status ovn-northd.service -l
. . .
Sep 29 14:41:42 ovmgr1 ovsdb-server[940]: ovs|00005|reconnect|ERR|tcp: 10.4.167.40:37272: no response to inactivity probe after 5 seconds, disconnecting
Oct 03 11:52:00 ovmgr1 ovsdb-server[940]: ovs|00006|reconnect|ERR|tcp: 10.4.167.41:52078: no response to inactivity probe after 5 seconds, disconnecting

The two IPs are the old ones of two hosts. It seems that a restart of the services has fixed this... Can anyone confirm whether I have to do anything else?

2) On the hosts (there are 3 hosts with OVN on IPs 10.4.192.32/33/34), where I currently have this output:

[root@ov301 ~]# ovs-vsctl show
3a38c5bb-0abf-493d-a2e6-345af8aedfe3
    Bridge br-int
        fail_mode: secure
        Port "ovn-1dce5b-0"
            Interface "ovn-1dce5b-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.192.32"}
        Port "ovn-ddecf0-0"
            Interface "ovn-ddecf0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.192.33"}
        Port "ovn-fd413b-0"
            Interface "ovn-fd413b-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.168.74"}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.7.2"
[root@ov301 ~]#

The IPs of the kind 10.4.192.x are OK. But there is a leftover of an old host I initially used for tests, corresponding to 10.4.168.74, which no longer exists.

How can I clean up the records for 1) and 2)?

Thanks,
Gianluca
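Spotting leftover tunnels by eye gets harder as the number of hosts grows. The sketch below scans `ovs-vsctl show` output for tunnel ports whose remote_ip is not in a known set of host addresses. The function name and the parsing are assumptions that rely only on the text layout quoted in the post; the sketch reports stale entries and does not remove anything.

```python
import re

PORT = re.compile(r'^\s*Port\s+"?([^"\s]+)"?\s*$')
REMOTE_IP = re.compile(r'remote_ip="([^"]+)"')


def stale_tunnels(ovs_show_output, expected_ips):
    """Return (port, remote_ip) pairs whose remote_ip is not expected."""
    stale = []
    port = None
    for line in ovs_show_output.splitlines():
        port_match = PORT.match(line)
        if port_match:
            port = port_match.group(1)  # remember the port we are inside
            continue
        ip_match = REMOTE_IP.search(line)
        if ip_match and ip_match.group(1) not in expected_ips:
            stale.append((port, ip_match.group(1)))
    return stale


# Abridged from the output quoted in the post.
SAMPLE_SHOW = '''\
    Bridge br-int
        Port "ovn-1dce5b-0"
            Interface "ovn-1dce5b-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.192.32"}
        Port "ovn-fd413b-0"
            Interface "ovn-fd413b-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.168.74"}
        Port br-int
            Interface br-int
                type: internal
'''

HOSTS = {"10.4.192.32", "10.4.192.33", "10.4.192.34"}
print(stale_tunnels(SAMPLE_SHOW, HOSTS))
```

With the three current host IPs as the expected set, only the port pointing at 10.4.168.74 is reported.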

"gluster-ansible-roles is not installed on Host" error on Cockpit
by Hesham Ahmed 26 Nov '20

On a new oVirt Node 4.3.1 installation, when trying to deploy HCI using Cockpit (and also when trying to add a new gluster volume to existing clusters), an error is displayed: "gluster-ansible-roles is not installed on Host. To continue deployment, please install gluster-ansible-roles on Host and try again".

There is no package named gluster-ansible-roles in the repositories:

[root@localhost ~]# yum install gluster-ansible-roles
Loaded plugins: enabled_repos_upload, fastestmirror, imgbased-persist, package_upload, product-id, search-disabled-repos, subscription-manager, vdsmupgrade
This system is not registered with an entitlement server. You can use subscription-manager to register.
Loading mirror speeds from cached hostfile
 * ovirt-4.3-epel: mirror.horizon.vn
No package gluster-ansible-roles available.
Error: Nothing to do
Uploading Enabled Repositories Report
Cannot upload enabled repos report, is this client registered?

This is due to the check introduced here: https://gerrit.ovirt.org/#/c/98023/1/dashboard/src/helpers/AnsibleUtil.js

Changing the line from:

[ "rpm", "-qa", "gluster-ansible-roles" ], { "superuser":"require" }

to

[ "rpm", "-qa", "gluster-ansible" ], { "superuser":"require" }

resolves the issue. The above code snippet is installed at /usr/share/cockpit/ovirt-dashboard/app.js on oVirt Node and can be patched by running:

sed -i 's/gluster-ansible-roles/gluster-ansible/g' /usr/share/cockpit/ovirt-dashboard/app.js && systemctl restart cockpit

deprecating export domain?
by Charles Kozler 30 Aug '20

Hello,

I recently read on this list, from a Red Hat member, that the export domain is either being deprecated or being considered for deprecation.

To that end, can you share details? Can you share any notes, postings, or BZs that document this? I would imagine something like this would be discussed with a larger audience.

This seems like a somewhat significant change to make, and I am curious where this is scheduled. Currently, a lot of my backups rely explicitly on an export domain for online snapshots, so I'd like to plan accordingly.

Thanks!

How to connect to a guest with vGPU ?
by Josep Manel Andrés Moscardó 29 May '20

Hi,

I got vGPU working through mdev, but I am wondering how I would connect to the client and make use of the GPU. So far I have tried to access the console through SPICE, and at some point in the boot process it switches to the GPU and I cannot see anything else.

Thanks.

--
Josep Manel Andrés Moscardó
Systems Engineer, IT Operations
EMBL Heidelberg
T +49 6221 387-8394

Vm suddenly paused with error "vm has paused due to unknown storage error"
by Jasper Siero 18 Feb '20

Hi all,

Since we upgraded our oVirt nodes to CentOS 7, a VM (not a specific one, but never more than one) will sometimes pause suddenly with the error "VM ... has paused due to unknown storage error". It has now happened twice in a month. The oVirt node uses SAN storage for the VMs running on it. When a specific VM pauses with this error, the other VMs keep running without problems. The VM runs without problems after unpausing it.

Versions:
CentOS Linux release 7.1.1503
vdsm-4.14.17-0
libvirt-daemon-1.2.8-16

vdsm.log:
VM Channels Listener::DEBUG::2015-10-25 07:43:54,382::vmChannels::95::vds::(_handle_timeouts) Timeout on fileno 78.
libvirtEventLoop::INFO::2015-10-25 07:43:56,177::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother
libvirtEventLoop::DEBUG::2015-10-25 07:43:56,178::vm::5204::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::event Suspended detail 2 opaque None
libvirtEventLoop::INFO::2015-10-25 07:43:56,178::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother
...........
libvirtEventLoop::INFO::2015-10-25 07:43:56,180::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother

Specific error part in the libvirt VM log:
block I/O error in device 'drive-virtio-disk0': Unknown error 32758 (32758)
...........
block I/O error in device 'drive-virtio-disk0': Unknown error 32758 (32758)

engine.log:
2015-10-25 07:44:48,945 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-40) [a43dcc8] VM diataal-prod-cas1 77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb moved from Up --> Paused
2015-10-25 07:44:49,003 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-40) [a43dcc8] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM diataal-prod-cas1 has paused due to unknown storage error.

Has anyone experienced the same problem, or does anyone know a way to solve this?

Kind regards,
Jasper

Ovirt-engine-ha cannot see live status of Hosted Engine
by asm＠pioner.kz 01 Feb '20

Good day to all.

I have some issues with oVirt 4.2.6. Here is the main one: I have two CentOS 7 nodes with the same config and the latest oVirt 4.2.6, with the HostedEngine disk on NFS storage. Some of the virtual machines are also working fine. When HostedEngine runs on one node (srv02.local), everything is fine. After migrating it to the other node (srv00.local), I see that the agent cannot check the liveliness of HostedEngine. After a few minutes HostedEngine reboots, and after some time I see the same situation. After migration to another node (srv00.local) all looks OK.

Output of hosted-engine --vm-status when HostedEngine is on the srv00 node:

--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date : True
Hostname : srv02.local
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down_unexpected", "detail": "unknown"}
Score : 0
stopped : False
Local maintenance : False
crc32 : ecc7ad2d
local_conf_timestamp : 78328
Host timestamp : 78328
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=78328 (Tue Sep 18 12:44:18 2018)
    host-id=1
    score=0
    vm_conf_refresh_time=78328 (Tue Sep 18 12:44:18 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUnexpectedlyDown
    stopped=False
    timeout=Fri Jan 2 03:49:58 1970

--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date : True
Hostname : srv00.local
Host ID : 2
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 1d62b106
local_conf_timestamp : 326288
Host timestamp : 326288
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=326288 (Tue Sep 18 12:44:21 2018)
    host-id=2
    score=3400
    vm_conf_refresh_time=326288 (Tue Sep 18 12:44:21 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStarting
    stopped=False

agent.log from srv00.local:

MainThread::INFO::2018-09-18 12:40:51,749::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:40:52,052::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:01,066::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:01,374::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:11,393::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
MainThread::INFO::2018-09-18 12:41:11,393::state_machine::174::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host srv02.local.pioner.kz (id 1): {'conf_on_shared_storage': True, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=78128 (Tue Sep 18 12:40:58 2018)\nhost-id=1\nscore=0\nvm_conf_refresh_time=78128 (Tue Sep 18 12:40:58 2018)\nconf_on_shared_storage=True\nmaintenance=False\nstate=EngineUnexpectedlyDown\nstopped=False\ntimeout=Fri Jan 2 03:49:58 1970\n', 'hostname': 'srv02.local.pioner.kz', 'alive': True, 'host-id': 1, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down_unexpected', 'detail': 'unknown'}, 'score': 0, 'stopped': False, 'maintenance': False, 'crc32': 'e18e3f22', 'local_conf_timestamp': 78128, 'host-ts': 78128}
MainThread::INFO::2018-09-18 12:41:11,393::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 2): {'engine-health': {'reason': 'failed liveliness check', 'health': 'bad', 'vm': 'up', 'detail': 'Up'}, 'bridge': True, 'mem-free': 12763.0, 'maintenance': False, 'cpu-load': 0.0364, 'gateway': 1.0, 'storage-domain': True}
MainThread::INFO::2018-09-18 12:41:11,393::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:11,703::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:21,716::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:22,020::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:31,033::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:31,344::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)

As we can see, the agent thinks that HostedEngine is still powering up. I cannot do anything about it. I have already reinstalled the srv00 node many times without success. One time I even had to uninstall the ovirt* and vdsm* software.

There is also one interesting point: after installing just "yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release42.rpm" on this node, I tried to install the node from the engine web interface with the "Deploy" action. But the installation was unsuccessful until I installed ovirt-hosted-engine-ha on the node. I don't see in the documentation that this is needed before installing new hosts. This is just for information and checking. After installing ovirt-hosted-engine-ha, the node was installed with HostedEngine support. But the main issue has not changed.

Thanks in advance for any help.

BR,
Alexandr
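The situation described here (VM reported as up, health reported as bad) can be picked out of the status text mechanically. The following is a minimal sketch that assumes the "Key : value" layout and the JSON "Engine status" value shown in the post; the function names are hypothetical, and treating "good" as the healthy value is an assumption about the agent's output.

```python
import json
import re

HOST_HEADER = re.compile(r"--== Host (\d+) status ==--")
FIELD = re.compile(r"^\s*(Hostname|Engine status|Score)\s*:\s*(.+?)\s*$")


def parse_vm_status(text):
    """Map host id -> the few fields of interest from --vm-status text."""
    hosts = {}
    current = None
    for line in text.splitlines():
        header = HOST_HEADER.search(line)
        if header:
            current = hosts.setdefault(int(header.group(1)), {})
            continue
        field = FIELD.match(line)
        if field and current is not None:
            key, value = field.groups()
            if key == "Engine status":
                value = json.loads(value)  # the value is a JSON object
            elif key == "Score":
                value = int(value)
            current[key] = value
    return hosts


def up_but_unhealthy(hosts):
    """Hostnames where the engine VM is up but health is not 'good'."""
    result = []
    for info in hosts.values():
        status = info.get("Engine status", {})
        if status.get("vm") == "up" and status.get("health") != "good":
            result.append(info.get("Hostname"))
    return result


# Abridged from the status output quoted in the post.
SAMPLE_STATUS = """\
--== Host 1 status ==--
Hostname : srv02.local
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down_unexpected", "detail": "unknown"}
Score : 0
--== Host 2 status ==--
Hostname : srv00.local
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
Score : 3400
"""

print(up_but_unhealthy(parse_vm_status(SAMPLE_STATUS)))
```

On the quoted output this singles out srv00.local, which matches the "failed liveliness check" reason in its Engine status.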

Hyperconverged setup - storage architecture - scaling
by Leo David 10 Jan '20

Hello everyone,

Reading through the document "Red Hat Hyperconverged Infrastructure for Virtualization 1.5 Automating RHHI for Virtualization deployment", I see the following statements regarding storage scaling:

2.7. SCALING
Red Hat Hyperconverged Infrastructure for Virtualization is supported for one node, and for clusters of 3, 6, 9, and 12 nodes. The initial deployment is either 1 or 3 nodes. There are two supported methods of horizontally scaling Red Hat Hyperconverged Infrastructure for Virtualization:
1 Add new hyperconverged nodes to the cluster, in sets of three, up to the maximum of 12 hyperconverged nodes.
2 Create new Gluster volumes using new disks on existing hyperconverged nodes. You cannot create a volume that spans more than 3 nodes, or expand an existing volume so that it spans across more than 3 nodes at a time.

2.9.1. Prerequisites for geo-replication
Be aware of the following requirements and limitations when configuring geo-replication:
One geo-replicated volume only
Red Hat Hyperconverged Infrastructure for Virtualization (RHHI for Virtualization) supports only one geo-replicated volume. Red Hat recommends backing up the volume that stores the data of your virtual machines, as this usually contains the most valuable data.

------

Also, in the oVirt Engine UI, when I add a brick to an existing volume I get the following warning:

"Expanding gluster volume in a hyper-converged setup is not recommended as it could lead to degraded performance. To expand storage for cluster, it is advised to add additional gluster volumes."

These things raise a couple of questions that may be easy to answer for some of you, but that confuse me a bit. I am also referring to the Red Hat product documentation because I treat oVirt as being as production-ready as RHHI is.

1. Is there any reason for not going with distributed-replicated volumes (i.e. spreading one volume across 6, 9, or 12 nodes)? That is, is it recommended that in a 9-node scenario I should have 3 separate volumes? If so, how should I deal with the following question:

2. If only one geo-replicated volume can be configured, how should I handle replication of the 2nd and 3rd volumes for disaster recovery?

3. If the limit of hosts per datacenter is 250, then (in theory) would the recommended way of reaching this threshold be to create 20 separate oVirt logical clusters with 12 nodes each (with the datacenter managed from one HA engine)?

4. At present, I have the following: one 9-node cluster, with all hosts contributing 2 disks each to a single replica 3 distributed-replicated volume. They were added to the volume in the following order:

node1 - disk1
node2 - disk1
......
node9 - disk1
node1 - disk2
node2 - disk2
......
node9 - disk2

At the moment the volume is arbitrated, but I intend to go for a fully distributed replica 3. Is this a bad setup? Why? It obviously breaks the Red Hat recommended rules...

Is anyone kind enough to discuss these things?

Thank you very much!

Leo

--
Best regards,
Leo David
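For question 4, it may help to see which bricks end up replicating each other. Gluster forms replica sets from consecutive bricks in the order they are listed, so the order above determines the layout. The sketch below groups a brick list into sets of three and flags any set that places two copies on one node; the brick paths are hypothetical placeholders, and it ignores the arbiter distinction.

```python
def replica_sets(bricks, replica=3):
    """Group bricks into replica sets in listing order (consecutive bricks)."""
    if len(bricks) % replica:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def sets_sharing_a_node(sets):
    """Return the replica sets in which some node holds more than one brick."""
    return [s for s in sets if len({b.split(":")[0] for b in s}) < len(s)]


# Order described in the post: disk1 of node1..node9, then disk2 of node1..node9.
# "/diskN" is a made-up brick path used only for illustration.
bricks = [f"node{n}:/disk{d}" for d in (1, 2) for n in range(1, 10)]
sets = replica_sets(bricks)

for group in sets:
    print(group)
print("sets sharing a node:", sets_sharing_a_node(sets))
```

With the order from the post, this gives six replica sets (node1-3, node4-6, and node7-9, once per disk), each spanning three distinct nodes. Listing both disks of a node next to each other instead would put two copies on the same host, which is the layout to avoid.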
